Principles of Population Genetics
Various aspects of "Population"

     gene pool (a genetic unit):
          all alleles at a (single) locus in a:
     deme (an ecological unit):
          all conspecific individuals in an area
          (areas may be more or less defined)

     panmictic unit (a reproductive unit):
          group of interbreeding individuals
          (interbreeding may be more or less random)

     sample (a statistical unit):
          subset of size 'N'
          (population of N individuals has 2N alleles)


Theory of allele frequencies: minding  p's & q's

Genetic variation in populations described by genotype & allele frequencies
            (not "gene" frequencies) [NS 01-01]

Consider a diploid autosomal locus with two alleles & no dominance
      (=> semi-dominance: AA , Aa , aa  phenotypes distinguishable)

      # AA = x    # Aa = y    # aa = z    x + y + z = N (sample size)

      f(AA) = x / N       f(Aa) = y / N       f(aa) = z / N

      f(A) = (2x + y) / 2N          f(a) = (2z + y) / 2N

            or    f(A) = f(AA) + 1/2 f(Aa)      f(a) = f(aa) + 1/2 f(Aa)

            let p = f(A), q = f(a)    p & q are allele frequencies 
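
      A minimal Python sketch of the counting rule above. The function name
      allele_freqs and the example counts are illustrative assumptions, not part of the notes.

```python
def allele_freqs(x, y, z):
    """Return (p, q) from counts of AA (x), Aa (y), and aa (z) genotypes."""
    n = x + y + z                  # sample size N
    p = (2 * x + y) / (2 * n)      # each AA carries two A, each Aa carries one
    q = (2 * z + y) / (2 * n)      # each aa carries two a, each Aa carries one
    return p, q

# Hypothetical sample: 50 AA, 30 Aa, 20 aa
p, q = allele_freqs(50, 30, 20)
print(p, q)
```

      Counting alleles directly and averaging genotype frequencies give the same p & q.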

      Properties of p & q

        p + q = 1     p = 1 - q    q = 1 - p

            (p + q)^2 = p^2 + 2pq + q^2 = 1

            (1 - q)^2 + 2(1 - q)(q) + q^2 = 1

        p & q interchangeable wrt [read, "with respect to"] A & a

        q usually used for
                  rarer, recessive, deleterious (disadvantageous), or "interesting" allele

              BUT   'common' & 'rare' are statistical properties
                         'dominant' & 'recessive' are genotypic properties
                         'advantageous' & 'deleterious' are phenotypic properties
                  *** any combination of these properties is possible ***



The Hardy-Weinberg Theorem

What happens to p & q in one generation of random mating?

Consider a population of monoecious organisms that reproduce by random union of gametes
      ("tide pool" model)

      (1) Determine expectation
                 of parental alleles coming together in various genotype combinations
                 expectation: the anticipated value of a variable
                                       not quite the same as probability [NS 03-Box1]

            Proofs by probability, binomial expansion, & Punnett square methods [SR2019 3.1]
            all show that expectation of f(AA) = p^2
                          expectation of f(Aa) = 2pq
                          expectation of f(aa) = q^2

     (2) Re-describe offspring allele frequencies f(A') & f(a')  

       f(A') = f(AA) + 1/2 f(Aa)
             = p^2 + (1/2)(2pq) = p^2 + pq = (p)(p+q) = p' = p

       f(a') = f(aa) + 1/2 f(Aa)
             = q^2 + (1/2)(2pq) = q^2 + pq = (q)(p+q) = q' = q
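
      Steps (1) and (2) can be checked numerically. This is only a sketch: the function
      name next_generation and the starting value p = 0.3 are arbitrary assumptions.

```python
def next_generation(p):
    """One generation of random union of gametes.

    Returns the expected genotype frequencies (AA, Aa, aa) and the
    allele frequency of A among the offspring.
    """
    q = 1 - p
    f_AA, f_Aa, f_aa = p * p, 2 * p * q, q * q   # step (1): HW expectations
    p_next = f_AA + 0.5 * f_Aa                    # step (2): f(A') = f(AA) + 1/2 f(Aa)
    return (f_AA, f_Aa, f_aa), p_next

genotypes, p_next = next_generation(0.3)          # arbitrary example value
print(genotypes, p_next)
```

      Feeding p_next back into next_generation returns the same genotype proportions.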



Hardy - Weinberg Theorem (1908):
     Absent other genetic or evolutionary factors,
        allele frequencies are invariant between generations,
            & constant genotype frequencies reached in one generation.

     p^2 : 2pq : q^2 are Hardy-Weinberg expectations / proportions (cf. Mendelian ratios 1 : 2 : 1 )

     Not HW "equilibrium": frequencies change within & between generations during evolution


      Hardy-Weinberg Expectations (HWE) obtained under more realistic conditions

            (1) multiple alleles / locus

                  p + q + r = 1
                  (p + q + r)^2 = p^2 + 2pq + q^2 + 2qr + r^2 + 2pr = 1

                  Proportion of heterozygotes (H = 'heterozygosity')
                         measures genetic variation at a locus

              Hobs      = f(Aa) = observed heterozygosity
              Hexp (He) = 2pq   = expected heterozygosity (for two alleles)

              He = 2pq + 2pr + 2qr = 1 - (p^2 + q^2 + r^2)    for three alleles

              He = 1 - \sum_{i=1}^{n} (q_i)^2      for n alleles

                        where q_i = freq. of i-th allele of n alleles at a locus

             Ex.: if q_1 = 0.5, q_2 = 0.3, & q_3 = 0.2
                            then He = 1 - (0.5^2 + 0.3^2 + 0.2^2) = 0.62

            *** HOMEWORK: Calculate He
                    1) if q_1 = 0.4, q_2 = 0.3, q_3 = 0.2, & q_4 = 0.1
                    2) for a locus with 10 or 100 alleles, all at equal frequency
                    3) with one allele at q = 0.5 & 10 or 100 at equal frequency [note change]

            (2) sex-linked loci
                    iff [read: "if and only if"] allele frequencies in males & females equal
                    If frequencies initially unequal, they converge over several generations.

            (3) dioecious organisms
                    sexes separate
                    HWE produced by random mating of individuals
                        expand (p^2 'AA' + 2pq 'AB' + q^2 'BB')^2 :
                               nine possible mating types among genotypes
                    selfing (self-fertilization) remains possible
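
      Two of the cases above lend themselves to a short numerical sketch: expected
      heterozygosity with multiple alleles (1), and convergence at a sex-linked locus (2).
      The function names are illustrative. The sex-linked step assumes XX females and
      XY males (sons take their X from the mother, daughters one X from each parent);
      the notes do not spell this recursion out.

```python
def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies at one locus."""
    return 1 - sum(q * q for q in freqs)

def sex_linked_step(p_f, p_m):
    """One generation at an X-linked locus (assumes XX females, XY males).

    Returns (new female frequency, new male frequency).
    """
    return (p_f + p_m) / 2, p_f

he = expected_heterozygosity([0.5, 0.3, 0.2])   # worked example from (1)

p_f, p_m = 1.0, 0.0                 # hypothetical, maximally unequal start
for _ in range(10):
    p_f, p_m = sex_linked_step(p_f, p_m)
gap = abs(p_f - p_m)                # difference halves every generation
print(he, p_f, p_m, gap)
```

      Under these assumptions the male-female difference halves (and changes sign)
      each generation, so the frequencies approach a common value without ever
      matching it exactly.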



Application of Hardy-Weinberg Expectations (HWE) to evolutionary genetics

Genotype proportions in natural populations can be tested for HWE
     Ho (null hypothesis): no other phenomena acting
     Note: HWE often called a HW equilibrium, BUT
                HWE observed only at birth of any single generation
                         changes between newborns & adults due to other factors
                => HWE not an "equilibrium"
    See Excel spreadsheets for Chi-Square calculations   

    Ex.: MN blood groups in Homo

      Among Euro-Americans:
            MM       MN       NN       Sum
            1787     3039     1303     6129

        f(M) = [(2)(1787) + 3039] / (2)(6129) = 0.539

        f(N) = [(2)(1303) + 3039] / (2)(6129)  = 0.461    = 1.0 - 0.539

     Chi-square (χ^2) test (NS 01-Box 3):

      Genotype   Expected #   Calculation                Observed   Expected   d = (obs-exp)   d^2/exp
      MM         p^2 N        (0.539)^2 (6129)             1787       1781            6         0.020
      MN         2pq N        (2)(0.539)(0.461)(6129)      3039       3046           -7         0.016
      NN         q^2 N        (0.461)^2 (6129)             1303       1302            1         0.001
      Sum                                                  6129       6129                χ^2 = 0.037 ns
(cf. critical value p.05[1 d.f.] = 3.84)                               ( p >> 0.05)
 
      Use one degree of freedom: although there are three observed classes,
            all expectations are derived from a single estimated allele frequency (q = 1 - p),
            so d.f. = 3 classes - 1 - 1 estimated parameter = 1
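
      The same test can be sketched in a few lines of Python, using the MN counts
      given in the notes. The function name hw_chi_square is an illustrative assumption;
      the critical value 3.84 is the one quoted in the notes.

```python
def hw_chi_square(n_MM, n_MN, n_NN):
    """Chi-square statistic for fit of genotype counts to HW expectations."""
    n = n_MM + n_MN + n_NN
    p = (2 * n_MM + n_MN) / (2 * n)            # f(M), estimated from the data
    q = 1 - p                                   # f(N)
    observed = [n_MM, n_MN, n_NN]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

chi2 = hw_chi_square(1787, 3039, 1303)          # Euro-American MN counts
CRITICAL_05_1DF = 3.84                          # critical value, 1 d.f., from the notes
print(chi2 < CRITICAL_05_1DF)                   # True means HWE is not rejected
```

      Because p is not rounded to three places here, the statistic can differ
      slightly from a hand calculation; the conclusion is the same.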

      HOMEWORK: S&R Table 3.1 & Eqn 3.2 are wrong: explain the error, correct the calculation
            See notes on Chi-Square calculations for some hints



     But (you ask) won't "expected" always more or less equal "observed",
            cuz that's where "expected" comes from?
 
            Consider artificial data set : MN blood types
 
 
                       MM     MN     NN     Sum     f(M)     f(N)
            Diné       305     52      4     361    0.917    0.083
            Koori       22    216    492     730    0.178    0.822
            Combined   327    268    496    1091    0.423    0.577

    Homework: show Diné & Koori populations separately exhibit HWP

         Chi-square test on combined data:
 
                      Obs     Exp     d = (O-E)    d^2/Exp
            MM        327     195        132         89.35
            MN        268     532       -264        131.01
            NN        496     364        132         47.87
                                           χ^2 =    268.23***     (p << 0.001)

      *=> A mixture of populations, each of which shows HWP,
            will not show expected HWP
            if allele frequencies differ in the separate populations.

      Wahlund Effect: artificial mixture of populations deficient in heterozygotes [NS 01-02]
                    (Relate this to F statistics and population structure, later on)
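
      A short sketch of the heterozygote deficit in the pooled Diné + Koori data from
      the notes. The function name heterozygote_deficit is an illustrative assumption.

```python
def heterozygote_deficit(counts_1, counts_2):
    """Observed minus HW-expected heterozygote count after pooling two samples.

    Each argument is a (MM, MN, NN) tuple of genotype counts.
    """
    mm, mn, nn = (a + b for a, b in zip(counts_1, counts_2))
    n = mm + mn + nn
    p = (2 * mm + mn) / (2 * n)                # pooled f(M)
    expected_het = 2 * p * (1 - p) * n         # HW expectation for the pooled sample
    return mn - expected_het

deficit = heterozygote_deficit((305, 52, 4), (22, 216, 492))
print(deficit)                                 # negative: fewer heterozygotes than expected
```

      A negative value means the pooled sample has fewer heterozygotes than its own
      allele frequencies predict, even though neither source population departs from HWP.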


Advanced topics in allele / phenotype frequencies:
        Estimating & testing phenotype proportions, with multiple alleles & dominance
        Ex. ABO blood group system

        Three alleles (A, B, O) produce
            six genotypes (AA, AO, BB, BO, AB, OO) with
                four phenotypes ("A", "B", "AB", "O")
                        A & B dominant to O; "A" = AA + AO; "B" = BB + BO
                        A & B co-dominant as "AB"

Challenge: Cannot obtain exact algebraic solution for four phenotypes from three variables
                    Therefore use Likelihood method with correction
            Ex.: Best a priori likelihood estimate of f(O) is sqrt[ observed f("O") ], since f("O") = [f(O)]^2 under HWE
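
      One way to obtain likelihood estimates numerically is iterative gene counting
      (an EM algorithm). This is a swapped-in technique, not necessarily the corrected
      method used in the notes, and the phenotype counts below are hypothetical:
      they were constructed to fit f(A) = 0.2, f(B) = 0.1, f(O) = 0.7 exactly.

```python
def abo_gene_counting(n_A, n_B, n_AB, n_O, iterations=200):
    """Estimate (p, q, r) = f(A), f(B), f(O) from ABO phenotype counts by EM."""
    n = n_A + n_B + n_AB + n_O
    p = q = r = 1 / 3                           # arbitrary starting values
    for _ in range(iterations):
        # E-step: split "A" and "B" phenotypes into expected genotype counts
        n_AA = n_A * p * p / (p * p + 2 * p * r)
        n_AO = n_A - n_AA
        n_BB = n_B * q * q / (q * q + 2 * q * r)
        n_BO = n_B - n_BB
        # M-step: count alleles among the 2N gene copies
        p = (2 * n_AA + n_AO + n_AB) / (2 * n)
        q = (2 * n_BB + n_BO + n_AB) / (2 * n)
        r = (n_AO + n_BO + 2 * n_O) / (2 * n)
    return p, q, r

# Hypothetical counts (not the Aka data)
p, q, r = abo_gene_counting(n_A=320, n_B=150, n_AB=40, n_O=490)
print(round(p, 4), round(q, 4), round(r, 4))
```

      Reconstructed phenotype counts for a chi-square comparison then follow from
      N(p^2 + 2pr), N(q^2 + 2qr), N(2pq), and N(r^2).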

Data from Aka (Mbenga) (Central African Republic) (Cavalli-Sforza & Bodmer 1971)

[Figure: ABO calculations, from Cavalli-Sforza & Bodmer 1971]

HOMEWORK: calculate Chi-square for the Observed vs Reconstructed counts


Evolutionary Genetics:
    modification of Hardy-Weinberg conditions

Hardy-Weinberg Proportions offer 'null hypothesis':
      Consequences of other genetic / evolutionary phenomena?

     Five major, interacting factors:

      1. Natural selection
            Change of allele frequencies (Δq) [read 'delta q']
                  occurs due to differential effects of alleles on 'fitness'
            Consequences depend on dominance of fitness
                    [See hardy-weinberg.m MATLAB laboratory exercise]

            Natural Selection is the principal concern of micro-evolutionary theory

      2. Mutation
             A & A' inter-converted at some rate µ
             If µ (A -> A') ≠ µ' (A' -> A), net change in frequency

      3. Gene flow
            Movement of alleles between populations at some rate m
            (Im)migration introduces new alleles, changes frequency of existing allele (SR2019 3.12)

      4. Statistical sampling error
            Chance fluctuations occur in finite populations, especially with small N (SR2019 3.7)
            Genetic drift: random change of allele frequencies
                                   over time and (or) space, within and (or) among populations
            Modification of N from non-random reproduction: variable sex ratio, offspring number, population size, etc.

      5. Population structure
           Inbreeding (SR2019 3.2): preferential mating of relatives at some rate F
                Inbreeding modifies genotype proportions but not allele frequencies
           Assortative Mating (SR2019 3.4): differential mating of phenotypes and (or) genotypes

           Meta-population structure (SR2019 3.8): sub-populations differ wrt total population (F-statistics)         
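
      The factors listed above can each be sketched as a one-generation change in p = f(A).
      The recursions below are the standard textbook forms, assumed here rather than
      taken from the notes; all function names, parameter names (mu_back stands in for µ'),
      and numerical values are illustrative.

```python
import random

def select(p, w_AA, w_Aa, w_aa):
    """Frequency of A after one generation of selection on genotype fitnesses."""
    q = 1 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa   # mean fitness
    return (p * p * w_AA + p * q * w_Aa) / w_bar

def mutate(p, mu, mu_back):
    """A -> A' at rate mu, A' -> A at rate mu_back (p is the frequency of A)."""
    return p * (1 - mu) + (1 - p) * mu_back

def migrate(p, p_source, m):
    """A fraction m of the population is replaced by immigrants each generation."""
    return (1 - m) * p + m * p_source

def drift(p, n, rng):
    """Draw 2N gametes at random from a gene pool with allele frequency p."""
    copies = sum(1 for _ in range(2 * n) if rng.random() < p)
    return copies / (2 * n)

def inbred_genotypes(p, f):
    """Genotype proportions (AA, Aa, aa) with inbreeding coefficient F = f."""
    q = 1 - p
    return p * p + f * p * q, 2 * p * q * (1 - f), q * q + f * p * q

rng = random.Random(1)                    # fixed seed so the run is repeatable
p_sel = select(0.5, 1.0, 1.0, 0.5)        # recessive allele a is deleterious
p_mut = mutate(0.5, 1e-5, 1e-6)           # unequal forward and back rates
p_mig = migrate(0.5, 0.9, 0.1)            # source population has p = 0.9
p_drift = drift(0.5, 10, rng)             # N = 10 diploid individuals
AA, Aa, aa = inbred_genotypes(0.5, 0.25)  # allele frequency stays at 0.5
print(p_sel, p_mut, p_mig, p_drift, AA + Aa / 2)
```

      Selection, mutation, gene flow, and drift all move p; inbreeding shifts genotype
      proportions toward homozygotes while AA + 1/2 Aa still returns the original p.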



All text material © 2022 by Steven M. Carr