Principles of Human Population Genetics

Dr. Steven M. Carr
Department of Biology
Memorial University of Newfoundland
St. John's NL A1B 3X9, Canada

e-mail: scarr@mun.ca

Genetic Research in my lab

Med6390 - Principles of Human Population Genetics
Index
    1. Theory of Allele Frequencies
    2. The Hardy-Weinberg Theorem
    3. Evolutionary Genetics
    4. Natural Selection [ZIP file for NatSel executable & Simulation assignment]
             See discussion of Norm of Reaction
    5. Mutation
    6. Gene Flow
    7. Population Structure
    8. Genetic Drift
     Mutation, Migration, Inbreeding, & Drift (updated 07 Oct 2004)

Updated 07 October 2004

Mendelian Genetics concerns the behavior of gene loci in single crosses
       two parents each contribute one allele / locus
       expected outcomes are characteristic ratios: 1:2:1, 3:1
          for multiple loci: 1:2:1:2:4:2:1:2:1, 9:3:3:1, etc

Population Genetics concerns the behavior of loci in multiple crosses
       N parents each contribute one allele / locus

Various aspects of "Population"
     gene pool (a genetic unit):
          all the alleles at a (single) locus
     deme (an ecological or demographic unit):
          all the individuals in an area
     panmictic unit (a reproductive unit):
          a group of randomly interbreeding individuals
     sample (a numerical unit):
          a statistical subset of size 'N'

Theory of allele frequencies: p's & q's

Genetic variation in populations can be described by genotype and allele frequencies.
(not "gene" frequencies)

Consider a diploid autosomal locus with two alleles and no dominance
(=> semi-dominance: AA , Aa , aa phenotypes are distinguishable)

# AA = x     # Aa = y     # aa = z          x + y + z = N (sample size)

f(AA) = x / N     f(Aa) = y / N     f(aa) = z / N

f(A) = (2x + y) / 2N          f(a) = (2z + y) / 2N

or  f(A) = f(AA) + 1/2 f(Aa)          f(a) = f(aa) + 1/2 f(Aa)

let p = f(A), q = f(a)          p & q are allele frequencies
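
The counting rule above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample counts (50 AA, 30 Aa, 20 aa) are hypothetical and do not come from the course data.

```python
def allele_frequencies(x, y, z):
    """Return (p, q) from genotype counts: x = #AA, y = #Aa, z = #aa."""
    n = x + y + z                  # N = number of individuals sampled
    if n == 0:
        raise ValueError("need at least one individual")
    p = (2 * x + y) / (2 * n)      # each AA carries two A alleles, each Aa one
    q = (2 * z + y) / (2 * n)      # each aa carries two a alleles, each Aa one
    return p, q


# Hypothetical sample: 50 AA, 30 Aa, 20 aa
p, q = allele_frequencies(50, 30, 20)
print(round(p, 3), round(q, 3), round(p + q, 3))
```

Counting alleles (2N per sample) rather than individuals is what makes p + q = 1 hold automatically.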

Properties of p & q

p + q = 1          p = 1 - q          q = 1 - p

(p + q)² = p² + 2pq + q² = 1

(1 - q)² + 2(1 - q)(q) + q² = 1 [Homework: show this algebraically]

p & q are interchangeable wrt [read, "with respect to"] A & a;

q is usually used for the
rarer, recessive, or deleterious (disadvantageous) allele;

              BUT   'common' & 'rare' are statistical properties
                           'dominant' & 'recessive' are genotypic properties
                           'advantageous' & 'deleterious' are phenotypic properties
                  *** any combination of these properties is possible ***

The Hardy-Weinberg Theorem

What happens to p & q in one generation of random mating?

Consider a population of monoecious organisms
reproduction by random union of gametes ("tide pool" model)...

      (1) Determine the expectations
            of parental alleles coming together in various genotype combinations.
            [expectation: the anticipated value of a variable probability]

            The probability / binomial expansion / Punnett Square methods
            all show that expectation of f(AA) = p²
                                 expectation of f(Aa) = 2pq
                                 expectation of f(aa) = q²

(2) Re-describe allele frequencies among offspring (A' & a').

f(A') = f(AA) + 1/2 f(Aa)
= p² + (1/2)(2pq) = p² + pq = (p)(p+q) = p' = p

f(a') = f(aa) + 1/2 f(Aa)
= q² + (1/2)(2pq) = q² + pq = (q)(p+q) = q' = q
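
A quick numerical check of the two steps above, written as a minimal sketch. The starting value p = 0.3 and the function name are arbitrary choices for illustration.

```python
def next_generation(p):
    """One generation of random union of gametes at a two-allele locus."""
    q = 1.0 - p
    # Step (1): expected genotype frequencies among offspring
    f_AA, f_Aa, f_aa = p * p, 2 * p * q, q * q
    # Step (2): re-describe the allele frequency among offspring
    p_next = f_AA + 0.5 * f_Aa
    return (f_AA, f_Aa, f_aa), p_next


genotypes, p_next = next_generation(0.3)
print([round(f, 3) for f in genotypes], round(p_next, 3))
```

Feeding p_next back into the function returns the same genotype frequencies, which is the sense in which the proportions are reached in one generation and then stay put.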

The Hardy - Weinberg Theorem (1908):
     In the absence of other genetic or evolutionary factors,
        allele frequencies are invariant between generations, &
            constant genotype frequencies are reached in one generation.

p² : 2pq : q² are Hardy-Weinberg proportions (cf. Mendelian ratios 1 : 2 : 1 )

[avoid the phrase "Hardy-Weinberg equilibrium":
H-W proportions occur under non-equilibrium conditions].

The Hardy-Weinberg Theorem holds under "more realistic" conditions:

(1) multiple alleles / locus

p + q + r = 1
(p + q + r)²= p² + 2pq + q² + 2qr + r² + 2pr = 1

The proportion of heterozygotes (H = 'heterozygosity')
is a measure of genetic variation at a locus.

H_obs = f(Aa) = observed heterozygosity
H_exp = 2pq = expected heterozygosity (for two alleles)

H_e = 2pq + 2pr + 2qr = 1 - (p² + q² + r²) for three alleles

                  H_e = 1 - Σ (q_i)²   (summed over i = 1 to n)      for n alleles

                        where q_i = freq. of ith allele of n alleles at a locus
             Ex.: if q₁ = 0.5, q₂ = 0.3, & q₃ = 0.2
                            then H_e = 1 - (0.5² + 0.3² + 0.2²) = 0.62
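
The n-allele formula is easy to check by machine. A minimal sketch, using the three-allele example above; the function name is an assumption of this sketch.

```python
def expected_heterozygosity(freqs):
    """H_e = 1 - sum of squared allele frequencies at one locus."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in freqs)


h_e = expected_heterozygosity([0.5, 0.3, 0.2])
print(round(h_e, 2))
```

With two alleles the same function returns 2pq, so it covers both the two-allele and multi-allele cases.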

Ex.: PopGen at ABO locus

            (2) sex-linked loci
                    iff [read: "if and only if"] allele frequencies in males and females are identical
                    If frequencies are initially unequal, they converge over several generations.

            (3) dioecious organisms (like humans)
                    sexes are separate
                    H-W is produced by random mating of individuals (random union of genotypes).
                        expand (p² 'AA' + 2pq 'AB' + q² 'BB')² :
                               nine possible 'matings' among genotypes
                              (See derivation)

[Also holds if no selfing (self-fertilization) is possible]

Application of Hardy-Weinberg to population biology

Genotype proportions in natural populations can be tested for H-W conditions
H_o (null hypothesis): no outside factors are acting.

Ex.: MN blood groups in Homo

Among North American whites:

MM MN NN Sum

1787 3039 1303 6129

f(M) = [(2)(1787) + 3039] / (2)(6129)= 0.539

f(N) = [(2)(1303) + 3039] / (2)(6129)= 0.461 = 1.0 - 0.539

Chi-square (χ²) test:

      expected (formula)                    exp     obs    d = (obs - exp)    d²/exp

MM    (0.539)²(6129)                =      1781    1787          6            0.020
MN    (2)(0.539)(0.461)(6129)       =      3046    3039         -7            0.016
NN    (0.461)²(6129)                =      1302    1303          1            0.001

                                                                   χ² =       0.037 (ns)

(cf. critical value χ²_{.05[1 d.f.]} = 3.84) ( p >> 0.05)

note: there is only one degree of freedom, because there are only two alleles
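
The same test can be sketched in Python. This version carries the allele frequencies at full precision rather than rounding to three decimals, so its χ² differs slightly from the hand calculation; either way the value is far below the critical value of 3.84. The function name is an assumption of this sketch.

```python
def hw_chi_square(n_mm, n_mn, n_nn):
    """Chi-square of observed genotype counts against H-W expectations."""
    n = n_mm + n_mn + n_nn
    p = (2 * n_mm + n_mn) / (2 * n)        # f(M), estimated from the sample
    q = 1.0 - p                            # f(N)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_mm, n_mn, n_nn)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


p_m, expected, chi2 = hw_chi_square(1787, 3039, 1303)
print(round(p_m, 3), [round(e) for e in expected], chi2 < 3.84)
```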

     But (you ask) won't "expected" always more or less equal "observed",
            cuz that's where "expected" comes from?
            Consider an artificial data set :

	MM	MN	NN	Sum	f(M)	f(N)
Navaho (US)	305	52	4	361	0.92	0.08
Koori (Aus.)	22	216	492	730	0.18	0.82
Combined	327	268	496	1091	0.42	0.58

(Homework: show that Navaho & Koori populations exhibit H-W proportions)

Chi-square test on combined data:

       exp     obs     d = (o - e)     d²/exp

MM     192     327        135            94.9
MN     532     268       -264           131.0
NN     367     496        129            45.3

                                 χ² =   271.2 **

(p << 0.01)

      A mixture of populations, each of which shows Hardy-Weinberg proportions,
            will not show expected Hardy-Weinberg proportions
            if the allele frequencies are different in the separate populations.

           Wahlund Effect: an artificial mixture of populations
                                        (or a structured metapopulation)
                                           will have a deficiency of heterozygotes
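
The heterozygote deficiency in the pooled sample can be shown directly from the counts in the table above. A minimal sketch; the helper name is hypothetical.

```python
def het_observed_expected(n_mm, n_mn, n_nn):
    """Return (H_obs, H_exp) for one sample of MN genotype counts."""
    n = n_mm + n_mn + n_nn
    p = (2 * n_mm + n_mn) / (2 * n)
    h_obs = n_mn / n               # observed proportion of heterozygotes
    h_exp = 2 * p * (1 - p)        # H-W expectation, 2pq
    return h_obs, h_exp


navaho = (305, 52, 4)
koori = (22, 216, 492)
combined = tuple(a + b for a, b in zip(navaho, koori))

for name, counts in (("Navaho", navaho), ("Koori", koori), ("Combined", combined)):
    h_obs, h_exp = het_observed_expected(*counts)
    print(name, round(h_obs, 3), round(h_exp, 3))
```

Within each population H_obs and H_exp are close; only the pooled sample shows a large shortfall of heterozygotes.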

Evolutionary Genetics:
modification of Hardy-Weinberg conditions

The Hardy-Weinberg conditions are the 'null hypothesis':
What are the consequences of other genetic / evolutionary phenomena?
How do they interact?

Five major factors:

      1. Natural selection
            Change of allele frequencies (Δq) [read as 'delta q']
                  occurs due to differential effects of alleles on 'fitness'
            Consequences depend on dominance of fitness
            Natural Selection is the principal concern of "microevolutionary" theory

      2. Mutation
             A and A' are inter-converted at some rate µ .
             If µ (A → A') ≠ µ' (A' → A), net change will occur in one direction.

      3. Gene flow
            Net movement of alleles between populations occurs at some rate m .
            (Im)migration introduces new alleles, changes frequency of existing alleles.

      4. Population structure
           Inbreeding: preferential mating of relatives at some rate F (see Homework).
           Non-random reproduction: variable sex ratio, offspring number, population size

      5. Statistical sampling error
            Chance fluctuations occur in finite populations, especially those with small size N.
            Genetic drift: random change of allele frequencies
                                 over time & among populations (see Homework)

The Mathematical Theory of Natural Selection

      "Natural Selection" is the name given to an evolutionary process
            in which "adaptation" occurs in such a way that "fitness" increases.
            Under certain conditions, this results in descent with modification.

      If:     variation exists for some trait, and
                a fitness difference is correlated with that trait, and
                the trait is to some degree heritable (determined by genetics),
      Then: the trait distribution will change
                over the life history of organisms in a single generation,
                    and between generations.

The process of change in the population is called "adaptation"

That's all.

The General Selection Model

Evolution & Natural Selection can be modeled genetically.
       variation = variable p & q
      fitness = differential phenotypes of corresponding genotypes
      heritability = Mendelian principles

Natural Selection results in change of allele frequency (Δq) [read as "delta q"]
in consequence of differences in the relative fitness (W)
of the phenotypes to which the alleles contribute.

Fitness is a phenotype of individual organisms.
    Fitness is determined genetically (at least in part).
    Fitness is related to success at survival AND reproduction.
    Fitness can be measured & quantified (see below).
          i.e., the relative fitness of genotypes can be assigned numerical values.

The consequences of natural selection depend on the dominance of fitness:
e.g., whether the "fit" phenotype is due to a dominant or recessive allele.

Then, allele frequency change is predicted by the General Selection Equation:

Δq = [pq] [(q)(W₂ - W₁) + (p)(W₁ - W₀)] / W̄

where W̄ = p²W₀ + 2pqW₁ + q²W₂ is the mean fitness of the population,
and W₀, W₁, & W₂ are the fitness phenotypes
of the AA, AB, & BB genotypes, respectively [see derivation]
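
A minimal sketch of the General Selection Equation, assuming the denominator is the mean fitness W̄ = p²W₀ + 2pqW₁ + q²W₂ (the standard form). It is checked against a direct count of B alleles after selection. The function names and the fitness values (1.0, 0.9, 0.5) are hypothetical.

```python
def mean_fitness(q, w0, w1, w2):
    p = 1.0 - q
    return p * p * w0 + 2 * p * q * w1 + q * q * w2


def delta_q(q, w0, w1, w2):
    """Change in f(B) in one generation, from the General Selection Equation."""
    p = 1.0 - q
    numerator = p * q * (q * (w2 - w1) + p * (w1 - w0))
    return numerator / mean_fitness(q, w0, w1, w2)


def q_next_direct(q, w0, w1, w2):
    """Same result by bookkeeping: share of B alleles among survivors."""
    p = 1.0 - q
    return (p * q * w1 + q * q * w2) / mean_fitness(q, w0, w1, w2)


q0 = 0.3
dq = delta_q(q0, 1.0, 0.9, 0.5)
print(round(dq, 4), round(q_next_direct(q0, 1.0, 0.9, 0.5) - q0, 4))
```

The two routes agree, which is a useful sanity check when adapting the equation to the special cases that follow.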

Consider the simplest case: Complete Dominance

genotype: AA AB BB
phenotype: W₀ = W₁ W₂ (AA and AB have identical phenotypes)

Then the GSE simplifies to Δq = pq²(W₂ - W₁) / W̄ (since W₁ - W₀ = 0; W̄ = mean fitness)

If 'B' phenotype is more fit than 'A' phenotype,
W₂ > W₁ & Δq > 0 so q increases.

If 'B' phenotype is less fit than 'A' phenotype,
W₂ < W₁ & Δq < 0 so q decreases.

            then Δq ∝ (W₂ - W₁) : the greater the difference in fitness,
                                                     the greater the intensity of selection
                                                    and the more rapid the change

A numerical example of Selection:
       Tay-Sachs Disease is caused by an allele
             that is rare         (q ≈ 0.001)
                      recessive (W₀ = W₁= 1)
                      lethal         (W₂ = 0)

Then Δq = pq²(W₂ - W₁) / W̄ = -pq² / (1 - q²) ≈ -q² (since p ≈ 1 & mean fitness W̄ = 1 - q² ≈ 1)

That is, Natural Selection results in a decrease in the frequency of
the Tay-Sachs allele of about one part in a million (0.001²) per generation

An alternate notation with selection coefficients simplifies the math

s = 1 - W

        The selection coefficient (s) is the difference in fitness
            of the phenotype relative to some 'standard' phenotype
            that has a fitness W = 1
            [The math is simpler because only one variable is used for fitness.]

(1) Complete dominance

      genotype:   AA      AB     BB
      phenotype: W₀ = W₁ W₂    (AA and AB have identical phenotypes)
                or       1    =   1   1 - s

if 0 < s < 1 : 'B' is deleterious (at a selective disadvantage)
if s < 0 : 'B' is advantageous

then Δq = -spq² / (1 - sq²) [see derivation]
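
A short sketch of the complete-dominance formula in selection-coefficient notation, applied to the Tay-Sachs numbers used earlier (q = 0.001, s = 1). The function name is an assumption of this sketch.

```python
def delta_q_recessive(q, s):
    """Change in f(B) when BB has fitness 1 - s and AA, AB have fitness 1."""
    p = 1.0 - q
    return -s * p * q * q / (1.0 - s * q * q)


dq = delta_q_recessive(0.001, 1.0)
print(dq)   # close to -q**2, about one part in a million
```

Because the change scales with q², selection against a rare recessive becomes steadily less effective as the allele gets rarer.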

(2) Incomplete dominance

      genotype:    AA     AB        BB
      phenotype: W₀ W₁    W₂    (all phenotypes different)
         or            1 - s₁   1   1 - s₂

      if 0 < s₁ & s₂ < 1 : overdominance of fitness (heterozygote advantage)
      The population has optimal fitness when both alleles are retained:
           q will reach an equilibrium where Δq = 0
                   0 < q̂ < 1   (read as, "q hat")

then q̂ = (s₁) / (s₁ + s₂) [see derivation]
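
The overdominant equilibrium can be checked by iterating selection until q stops changing. A minimal sketch with hypothetical coefficients s₁ = 0.1 and s₂ = 0.3, for which s₁ / (s₁ + s₂) = 0.25.

```python
def next_q(q, s1, s2):
    """One generation of selection with fitnesses 1 - s1, 1, 1 - s2."""
    p = 1.0 - q
    w0, w1, w2 = 1.0 - s1, 1.0, 1.0 - s2
    w_bar = p * p * w0 + 2 * p * q * w1 + q * q * w2
    return (p * q * w1 + q * q * w2) / w_bar


def iterate(q, s1, s2, generations):
    for _ in range(generations):
        q = next_q(q, s1, s2)
    return q


s1, s2 = 0.1, 0.3                         # hypothetical selection coefficients
q_hat = s1 / (s1 + s2)                    # predicted equilibrium
q_from_high = iterate(0.9, s1, s2, 2000)  # start with B common
q_from_low = iterate(0.05, s1, s2, 2000)  # start with B rare
print(round(q_hat, 4), round(q_from_high, 4), round(q_from_low, 4))
```

Both starting points converge on the same value, which is what makes this equilibrium stable.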

The General Selection Model: Summary

      Direction of allele frequency change is due to fitness difference of alleles
            (whether the effect of the allele on phenotype is deleterious or advantageous).
      Ultimate consequences depend on the dominance of fitness
            (whether the allele is dominant, semi-dominant, or recessive).
      Rate of change is an interplay of both of these factors (see Lab #1)

AA AB BB Consequence of natural selection [ let Δq = change in f(B) ]

W₀ = W₁ = W₂ No selection (neither allele has a selective advantage):
then Δq = 0, H-W proportions remain constant

W₀ = W₁ > W₂ deleterious recessive (advantageous dominant):
then Δq < 0, q → 0.00 (loss): how fast? [Does it get there?]

W₀ = W₁ < W₂ advantageous recessive (deleterious dominant):
then Δq > 0, q → 1.00 (fixation): how fast?

      W₀ < W₁ > W₂    overdominance [special case of semi-dominance]:
                                          heterozygote superiority
                                    q → q̂, where Δq = 0

Demonstration #1: Natural Selection on Deleterious & Advantageous Recessive alleles
Overdominant Selection

Fitness, Adaptation, & Natural Selection in real populations
Fitness is
      a phenotype of organisms and populations
      quantifiable relative to other organisms and populations
      related to capacity for survival and reproduction
      variable in space & time; short-term and long-term (see below)

Short-term measures: "Life Table" parameters

rate of instantaneous increase (r) of a phenotype

recall logistic equation: dN / dt = rN (K - N) / K [≈ rN when N << K]
where K = carrying capacity

net reproductive rate: exp(r) = e^r
r is "compound interest" on N

replacement rate (R_o): lifetime reproductive output
~ e^r (at low density)

         components of fitness: traits that contribute to survival & reproduction
         Ex.: survivorship (expected survival time)
                 fecundity       (# offspring at age x)

Adaptation is the phenotypic consequence for populations of natural selection on individuals
[cf. adjust / acclimate]

Phenotypic traits that change as a result of selection
are sometimes referred to as "adaptations" or "adaptive characters"

Measuring 'fitness' and observing 'adaptation' in natural populations

Life table analysis: survivorship and fecundity vary with age

     l_x = prob. of survival from birth to age x (cumulative)
           survivorship = probability of survival to age x+1 from age x
     m_x = fecundity (# offspring) at age x

     then   Σ (l_x)(m_x) exp(-rx) = 1   (summed over x = 1 to L)
                (in a stable population, where L = life expectancy)

      R_o = Σ (l_x)(m_x)   (summed over x = 1 to L)
                replacement rate ~ e^r at low density

This equation is a discrete solution to the continuous logistic equation

R_o can be calculated for two reproductive 'strategies'
as a measure of their relative 'fitness'

Consider a population with two demographic phenotypes:
      These phenotypes correspond to two reproductive 'strategies'
       iteroparous strategy: offspring produced over several seasons
       semelparous strategy: offspring produced all in one season

A survivorship and fecundity schedule will compare their life histories
*=> life table parameters can be measured experimentally <=*

      Under 'typical' environmental conditions, survivorship is 50% / year:
            both strategies produce 2 young / female / lifetime
                  => both phenotypes are equally 'fit' [and N is stable]

      In 'good times', survivorship increases to 75% / year:
            iteroparous strategy produces 4 young / female / lifetime
            semelparous strategy produces 3 young / female / lifetime
                  => iteroparous phenotype is 'more fit' [and N is increasing]

      In 'bad times', survivorship decreases to 25% / year:
            iteroparous R_o = 0.72,   semelparous R_o = 1.00
                  => semelparous phenotype is 'more fit' [and N is decreasing]
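
The three scenarios can be sketched with R_o = Σ (l_x)(m_x), assuming constant annual survivorship. The fecundity schedules below are hypothetical, since the original schedule is not reproduced here: a single brood of 4 at age 1 happens to give the semelparous values quoted above (2, 3, and 1.00), while the iteroparous schedule is only illustrative and does not reproduce the quoted 4 and 0.72 exactly.

```python
def replacement_rate(survival, fecundity):
    """R_o = sum of l_x * m_x, where l_x = survival ** x.

    fecundity[0] is m_1 (offspring at age 1), fecundity[1] is m_2, etc.
    """
    r_o = 0.0
    l_x = 1.0
    for m_x in fecundity:
        l_x *= survival            # cumulative survival from birth to age x
        r_o += l_x * m_x
    return r_o


semelparous = [4]       # hypothetical: one brood of 4 at age 1
iteroparous = [2, 4]    # hypothetical: broods at ages 1 and 2

for label, s in (("typical", 0.50), ("good", 0.75), ("bad", 0.25)):
    print(label,
          round(replacement_rate(s, iteroparous), 2),
          round(replacement_rate(s, semelparous), 2))
```

Even with made-up schedules the ranking flips in the same way: spreading reproduction over later ages pays off when survivorship is high and costs when it is low.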

=> Population phenotypes will adapt to changing conditions

      In a favourable environment, K increases:
       e.g., productivity of meadow increases
                    iteroparity more advantageous, population density increases

      In an unfavourable environment, r increases:
       e.g., severity of winter highly variable
                  semelparity more advantageous, early reproduction favoured

K-strategy: maintain population size N close to K
long-lived, reproduce late, smaller # offspring, lots of parental care
E.g., many bird species, primates (including Homo)

r-strategy: maximize growth potential r
short-lived, reproduce early, larger # offspring, little parental care
E.g., most invertebrates, some rodents

Natural Selection on multilocus traits: Quantitative genetics

We can extend single-locus models to multilocus quantitative models

                          single-locus                          quantitative
      variation:          p²:2pq:q²                     =>     normal distribution
      fitness:            W₀,W₁,W₂                      =>     fitness function
      heritability:       Mendel's Laws & H-W Theorem   =>     heritability

Variation can be quantified (see a Primer of Statistics for review)

       mean ± standard deviation: x̄ ± σ
      variance: σ²
      coefficient of variation (CV) = (σ / x̄) x 100

      CV removes size effect when comparing variance:
        Ex.: Suppose X = leg length     Y = eye width
                X = 100 ± 1.0 versus Y = 1.0 ± 0.1
                CV of X = 1%        CV of Y = 10%
                Y is more variable, though σ_X is larger
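
The leg-length / eye-width comparison as a two-line calculation. A minimal sketch; the function name is hypothetical.

```python
def coefficient_of_variation(mean, sd):
    """CV in percent: standard deviation scaled by the mean."""
    return sd / mean * 100.0


cv_x = coefficient_of_variation(100.0, 1.0)   # leg length
cv_y = coefficient_of_variation(1.0, 0.1)     # eye width
print(round(cv_x, 1), round(cv_y, 1))
```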

      Quantitative variation follows "normal distribution" (bell-curve) iff
              Multiple loci are involved
              Each locus has about the same effect
              Each locus acts independently
                    [interaction variance (see below) is minimal]

Variation has two sources: genetic (σ_G²) & environmental (σ_E²) variance

      phenotypic variance      σ_P² = σ_G² + σ_E² + σ_GxE²
      additive variance            σ_A² = σ_G² + σ_E²
      heritability                        h² = σ_G² / σ_A² = σ_G² / (σ_G² + σ_E²)

          "heritability in the narrow sense": ignores σ_GxE² interaction variance:
           Identical genotypes produce different phenotypes in different environments.
               Ex.: same breed of cows produces different milk yield on different feed
           The Norm of Reaction describes the relationship between genetics and environment

Artificial breeding indicates that organismal variation is highly heritable
      ex.: Darwin's pigeon breeding experiments
             Artificial selection on agricultural species
                  Commercially useful traits can be improved by selective breeding
             IQ scores in Homo: h² ≈ 0.7
                   [But: IQ scores improve with education: σ_GxE² is large]
            Offspring / Midparent correlation

            For many traits in many organisms:
            CV   =     5 ~ 10 %
            h²    =     0.5 ~ 0.9

Fitness function expresses relationship between genotype & fitness
Function is a continuous variable, rather than discrete values for W₀, W₁, & W₂

=> Most traits vary & are heritable.
Many traits do respond to 'artificial' selection.
Many traits should respond to 'natural' selection.

=> To demonstrate & measure Natural Selection,
we must show experimentally that heritable variation has consequences for fitness <=

Modes of Selection in natural populations

Quantitative trait distribution can be described as a bell curve
with a particular mean & variance:

What happens to this distribution under Selection?

(1) Directional Selection

      Fitness function has constant slope:
      Trait mean shifted towards favored phenotype
            trait variance unaffected

In single-locus models, the limit of selection is
Elimination of variation by fixation of favored allele

    In quantitative models, rate is limited by
   substitutional genetic load:
            "cost" of replacing non-favored allele (∝ "intensity" of selection)

"Hard" selection
          Mortality is density-independent
          In Lab #1: N_(after) < N_(before)
        Load is cumulative (N) over time as q → 0
               Fitness is more or less absolute: less realistic, easier to model
       Ex.: Exercise #2, in a malarial environment, 50% die before reproduction.
                 Population "after" is much smaller than "before",
                 but rebounds to N only at start of next generation

   "Soft" selection
          Mortality is density-dependent
          In 'real' stable populations:   N_(after) ≈ N_(before)

          Survivorship is proportional to fitness up to K: more realistic
              Selection will affect recruitment to next generation
         Ex.: If the first-born dies of malaria, s/he will be replaced.
                       More births occur such that N is continually "topped up".
                       Birth of succeeding offspring will maintain N near K

(2) Stabilizing Selection (AKA truncation selection)
      Fitness function has a "peak"
      Trait variance reduced around (existing) optimal phenotype,
            trait mean unaffected

      Limits: elimination of variant alleles
              or, 'weeding out' of disadvantageous variants
              homozygosity at multiple loci:
                    difficult iff variance due to recessive alleles
             inbreeding depression: loss of 'health' in inbred lines

Examples:
    Elimination of non-cryptic pepper moths (Biston)
        melanistic variants are eliminated rapidly in light-colored environments
    peppered variants are reduced slowly in dark-colored environments

Birthweight in Homo (Karn & Penrose 1951)
Modal birthweight is optimum for survival

(3) Diversifying Selection (two kinds)
There is a lot of variation: does selection explain it?

(A) Balancing Selection:
Fitness function has more than one peak (multi-modal)

Maintaining heterozygosity (allelic & genotypic variation) by selection

      Overdominance: heterozygotes have superior fitness at a locus
                  because different alleles are favoured in different environments
      Examples:
       sickle-cell hemoglobin in Homo ('Contradictory' selection)
            Leucine Aminopeptidase (LAP) & salinity tolerance in Mytilus mussels
       heterodimers:
                  multimeric enzymes with polypeptides from different alleles
                     often show wider substrate specificity, kinetic properties (V_max & K_M)
           myoglobin in diving mammals

Heterosis: heterozygosity at multiple loci improves general fitness
Hybrid vigour: crossbreeding of inbred lines improves fitness in F₁

      Marginal epistasis: high 'H_obs' is 'good for you'
       Ex.: correlation between phenotype & genotype: antler points in Odocoileus deer
       Ex.: fluctuating asymmetry: Acinonyx cheetahs are lopsided

Maintaining polymorphic phenotypic variation by selection

      Sexual Selection (Darwin 1871):
            'exaggerated' phenotypes are disadvantageous somatically
                but are favoured in competition for mates

            secondary sex characteristics:
          Sexual dimorphism in mallards, peafowl, & lions
          Antlers in Cervidae are used in male-male combat
          Tail displays in peacocks attract mates

       'Runaway sexual selection': the Madonna / Ozzy Osbourne Effect
                Females choose males on basis of some distinctive trait
                  Offspring have exaggerated trait (males) & preference for trait (females)
                     => selection reinforces trait & preference for trait simultaneously
                            New phenotype spreads rapidly in population

(B) Disruptive selection
Fitness function is a valley
Trait variance increases (like balancing), BUT polymorphism is unstable

[Try NatSel with: q = 0.5, N = 9999, W0 = 1.0, W1 = 0.7, W2 = 1.0]

      Polymorphism can usually be maintained only temporarily:
            One of the phenotypes will outcompete the other
       unless different phenotypes choose different niches (Ludwig Effect)
                [and then this becomes Balancing Selection]
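
The instability of the polymorphism can be seen in a deterministic sketch, using the same fitness values suggested for NatSel (W0 = 1.0, W1 = 0.7, W2 = 1.0). This is not the NatSel program itself: it has no finite population size, so it omits the drift that nudges q away from 0.5 in a real run. Function names are hypothetical.

```python
def next_q(q, w0, w1, w2):
    """One generation of selection on f(B) in an infinite population."""
    p = 1.0 - q
    w_bar = p * p * w0 + 2 * p * q * w1 + q * q * w2
    return (p * q * w1 + q * q * w2) / w_bar


def run(q, w0, w1, w2, generations):
    for _ in range(generations):
        q = next_q(q, w0, w1, w2)
    return q


W = (1.0, 0.7, 1.0)                      # heterozygote disadvantage
one_step_at_half = next_q(0.5, *W)       # q = 0.5 is an equilibrium point...
q_up = run(0.51, *W, 200)                # ...but a small push upward fixes B
q_down = run(0.49, *W, 200)              # and a small push downward loses B
print(round(one_step_at_half, 3), round(q_up, 3), round(q_down, 3))
```

Which allele wins depends only on which side of 0.5 the population starts, so the outcome in a finite population is decided by chance.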

      Scutellar bristles in Drosophila (Thoday & Gibson 1962)
            Selection for 'high #' versus 'low #' lines
                  => 'pseudo-populations' with reduced interfertility
          Might disruptive selection contribute to speciation?

Natural Selection at other levels: Genic & Kin Selection

Natural selection is ordinarily defined as
    differential survival & reproduction of individuals:
      Can selection operate on other biological units?
      Can such selection 'oppose' individual selection?

Genic (Gametic) Selection
Differential survival & 'reproduction' of alleles

      Meiotic Drive: t-alleles in Mus
       tt is sterile (W = 0)
       Tt is 'tail-less' (cf. Manx cats) (W < 1)
           t alleles are preferentially segregated into gametes (80~90%)
                  => f(t) is high in natural populations (40~70%)
                       even though it is deleterious to individuals

Kin (Interdemic) Selection
Differential survival & reproduction of related (kin) groups (families)

      Related individuals share alleles: r = coefficient of relationship [see derivation]
            offspring & parents are related by r = 0.50   [They share half their alleles]
            full-sibs                 "    "                     r = 0.50
            half-sibs                "    "                    r = 0.25
            first-cousins           "    "                   r = 0.125

Inclusive fitness (W_i) of phenotype for individual i
= direct fitness of i + indirect fitness of relatives j,k,l,...

W_i = a_i + Σ (r_ij)(b_ij)   summed over all relatives j,k,l,...

            where: a_i    = fitness of i due to own phenotype
                    b_ij = fitness of j due to i's phenotype
                    r_ij   = coefficient of relationship of i & j

Example: What is the fitness value of an alarm call?
When a predator approaches, should i warn j , or keep silent?

              If i & j are unrelated
                   warn:          W_individual = 0.0 + (0.0)(1.0) = 0.0
                   don't warn: W_individual = 1.0 + (0.0)(0.0) = 1.0
                => Such behaviors should not evolve among unrelated individuals

            What is the fitness value in a kin group?
                   W_brothers= 0.0 + [(0.5)(1.0) + (0.5)(1.0)] = 1.0
                   W_cousins= 0.0 + [8][(0.125)(1.0)] = 1.0

J.B.S. Haldane (1892-1964):
"I would lay down my life for two brothers or eight cousins."

Mutation, Migration, Inbreeding, & Genetic Drift in natural populations

How do mutation, migration, inbreeding, and genetic drift interact with selection?

Do they maintain or reduce variation?
Can they maintain variation at a high level?
What is their significance in population (short-term) & evolutionary (long-term) biology?

(1) Mutation / selection equilibrium

      Deleterious alleles are maintained by recurrent mutation.
      A stable equilibrium (where Δq = 0) is reached
            when the rate of replacement (by mutation)
            balances the rate of removal (by selection).

       µ = frequency of new mutant alleles per locus per generation
           typical µ = 10^-6: 1 in 1,000,000 gametes has new mutant
            then q̂ = √(µ / s)      [see derivation]

Ex.: For a recessive lethal allele (s = 1) with a mutation rate of µ = 10^-6
then q̂ = √(µ / s) = √(10^-6 / 1.0) = 0.001
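
A minimal sketch of the balance, assuming the standard result for a recessive allele, q̂ = √(µ / s), which matches the worked value of 0.001. It also compares the per-generation gain from mutation with the loss to selection at that frequency. Back-mutation is ignored, and the function names are hypothetical.

```python
import math


def mutation_selection_equilibrium(mu, s):
    """Equilibrium frequency of a deleterious recessive allele."""
    return math.sqrt(mu / s)


def gain_and_loss(q, mu, s):
    """Per-generation input by mutation and removal by selection at frequency q."""
    p = 1.0 - q
    gain = mu * p                               # A alleles converted to B
    loss = s * p * q * q / (1.0 - s * q * q)    # magnitude of delta-q from selection
    return gain, loss


mu, s = 1e-6, 1.0
q_hat = mutation_selection_equilibrium(mu, s)
gain, loss = gain_and_loss(q_hat, mu, s)
print(q_hat, gain, loss)
```

At q̂ the two rates agree to within rounding, which is the meaning of "replacement balances removal".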

mutational genetic load
    Lowering selection against alleles increases their frequency.
        Medical intervention has increased the frequency of heritable conditions
            in Homo (e.g., diabetes, myopia)
    Eugenics: modification of human condition by selective breeding
            'positive eugenics': encouraging people with "good genes" to breed
            'negative eugenics': discouraging people with 'bad genes' from breeding
           e.g., immigration control, compulsory sterilization
                          [See: S. J. Gould, "The Mismeasure of Man"]

Is eugenics effective at reducing frequency of deleterious alleles?
What proportion of 'deleterious alleles' are found in heterozygous carriers?

(2pq) / 2q² = p/q ≈ 1/q (if q << 1)

if s = 1 as above, ratio is 1000 / 1 : most of variation is in heterozygotes,
not subject to selection
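
The ratio of deleterious alleles carried in heterozygotes to those carried in homozygotes, for the recessive lethal example (q = 0.001). A minimal sketch; the function name is hypothetical.

```python
def heterozygote_to_homozygote_ratio(q):
    """Copies of allele B in AB carriers relative to copies in BB individuals."""
    p = 1.0 - q
    in_carriers = 2 * p * q        # one B allele per heterozygote
    in_homozygotes = 2 * q * q     # two B alleles per homozygote
    return in_carriers / in_homozygotes


ratio = heterozygote_to_homozygote_ratio(0.001)
print(round(ratio))   # p/q, close to 1/q
```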

(2) Migration / selection equilibrium

Directional selection is balanced by influx of 'immigrant' alleles;
a stable 'equilibrium' can be reached iff migration rate is constant.

Consider an island adjacent to a mainland, with unidirectional migration to the island.
The fitness values of the AA, AB, and BB genotypes differ in the two environments,
so that the allele frequencies differ between the mainland (q_m) and the island (q_i).

              AA      AB       BB
              W₀      W₁       W₂        q

Island        1       1-t      1-2t      q_i ≈ 0
Mainland      0       0        1         q_m ≈ 1

B has high fitness on mainland, and low fitness on island.
[For this model only, allele A is semi-dominant to allele B,
so we use t for the selection coefficient to avoid confusion]

m = freq. of new migrants (with q_m) as fraction of residents (with q_i)
if m << t, then q_i ≈ (m / t)(q_m) [see derivation]
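
The approximation can be compared with a direct iteration of the island model. This is a sketch under stated assumptions: selection acts first within each generation and migrants then replace a fraction m of residents, and the parameter values (m = 0.001, t = 0.1, q_m = 1) are hypothetical, chosen so that m << t holds comfortably.

```python
def next_q_island(q, m, t, q_m):
    """Selection on the island (fitnesses 1, 1 - t, 1 - 2t), then immigration."""
    p = 1.0 - q
    w0, w1, w2 = 1.0, 1.0 - t, 1.0 - 2.0 * t
    w_bar = p * p * w0 + 2 * p * q * w1 + q * q * w2
    q_selected = (p * q * w1 + q * q * w2) / w_bar
    return (1.0 - m) * q_selected + m * q_m


def island_equilibrium(m, t, q_m, q0=0.5, generations=5000):
    q = q0
    for _ in range(generations):
        q = next_q_island(q, m, t, q_m)
    return q


m, t, q_m = 0.001, 0.1, 1.0       # hypothetical values with m << t
q_eq = island_equilibrium(m, t, q_m)
approx = (m / t) * q_m
print(round(q_eq, 5), round(approx, 5))
```

The iterated value sits very close to (m / t)(q_m); the agreement degrades as m approaches t, which is why the condition m << t is attached to the formula.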

Gene flow can hinder optimal adaptation of a population to local conditions.

     Ex: Water snakes (Natrix sipedon) live on islands in Lake Erie (Camin & Ehrlich 1958)
            Island Natrix mostly unbanded; on adjacent mainland, all banded.
            Banded snakes are non-cryptic on limestone islands, eaten by gulls
            Suppose   A = unbanded     B = banded   [AB are intermediate]
                   Let      q_m = 1.0     ["B" allele is fixed on mainland]
                               m   = 0.05   [5% of island snakes are new migrants]
                                 t    = 0.5     so W₂ = 0 ["Banded" trait is lethal on island]
                   then    q_i = (0.05/0.5)(1) = 0.10
                 and     H_exp = 2pq = (2)(0.90)(0.10) = 18%
                            i.e., about 18% of snakes show intermediate banding, despite strong selection

=> Recurrent migration can maintain a disadvantageous trait at high frequency.

(3) Inbreeding / selection

Inbreeding is the mating of (close) relatives
or, mating of individuals with at least one common ancestor

     F (Inbreeding Coefficient) = prob. of "identity by descent":
            Expectation that two alleles in an individual are
                exact genetic copies of an allele in the common ancestor
             or, proportion of population with two alleles identical by descent

This is determined by the consanguinity (relatedness) of parents.

Inbreeding reduces H_exp by a proportion F
(& increases the proportion of homozygotes). [see derivation]

        f(AB) = 2pq (1-F)
            f(BB) = q² + Fpq
            f(AA) = p² + Fpq

Inbreeding affects genotype proportions,
inbreeding does not affect allele frequencies.

Inbreeding increases the frequency of individuals
with deleterious recessive genetic diseases by F/q [see derivation]

Ex.: if q = 10^-3 and F = 0.10 , F/q = 100
=> 100-fold increase in f(BB) births
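
The genotype frequencies under inbreeding, and the resulting excess of BB homozygotes, can be sketched as follows, using the example values q = 10^-3 and F = 0.10. The function name is hypothetical.

```python
def inbred_genotype_freqs(q, f):
    """Genotype frequencies (AA, AB, BB) with inbreeding coefficient f."""
    p = 1.0 - q
    f_aa = p * p + f * p * q
    f_ab = 2 * p * q * (1.0 - f)
    f_bb = q * q + f * p * q
    return f_aa, f_ab, f_bb


q, F = 1e-3, 0.10
f_AA, f_AB, f_BB = inbred_genotype_freqs(q, F)
fold = f_BB / (q * q)              # relative to the random-mating value q**2
q_after = f_BB + 0.5 * f_AB        # allele frequency is unchanged
print(round(fold, 1), q_after)
```

The ratio works out to 1 + Fp/q, which is approximately F/q when q is small, and the allele frequency recovered from the inbred genotypes is still q.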

Inbreeding coefficient of a population can be estimated from experimental data:

F = ( 2pq - H_obs ) / 2pq [see derivation]

Ex.: Selander (1970) studied structure of Mus house mice living in chicken sheds in Texas

Variation at *Est*-4 locus
	obs	exp
AA	0.226	0.181
AB	0.400	0.489
BB	0.374	0.329

since p = 0.226 + (1/2)(0.400) = 0.426

& q = 0.374 + (1/2)(0.400) = 0.574

Then F = (0.489 - 0.400) / (0.489) = 0.182
which is intermediate between F_full-sib = 0.250
& F_1st-cousin = 0.125

=> Mice live in small family groups with close inbreeding
[This is typical for small mammals]
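
The estimate of F from the Est-4 data can be reproduced from the observed genotype frequencies alone. A minimal sketch; the function name is hypothetical.

```python
def inbreeding_coefficient(f_aa, f_ab, f_bb):
    """F = (2pq - H_obs) / 2pq, from observed genotype frequencies."""
    p = f_aa + 0.5 * f_ab
    q = f_bb + 0.5 * f_ab
    h_exp = 2 * p * q
    return (h_exp - f_ab) / h_exp


F_mice = inbreeding_coefficient(0.226, 0.400, 0.374)
print(round(F_mice, 3))
```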

      Paradoxes of inbreeding:
            Inbreeding is usually thought of as "harmful":
                inbreeding increases the probability that deleterious recessive alleles
                    will come together in homozygous combinations
                    "Harmful" alleles are reinforced
            Inbreeding depression: a loss of fitness in the short-term due to
                    difficulty in conception, increased spontaneous abortion, pre- & peri-natal deaths
                    Ex.: First-cousin marriages in Homo
                        Two-fold increase in spontaneous abortion & infant mortality
                        Every human carries 3 ~ 4 "lethal equivalents"

               Demonstration #2: Selection & inbreeding in small populations

         However, in combination with natural selection, inbreeding can be "advantageous":
           increases rate of evolution in the long-term (q → 0 more quickly)
                    deleterious alleles are eliminated more quickly.
           increases phenotypic variance (homozygotes are more common).
                    advantageous alleles are also reinforced in homozygous form

(4) Genetic Drift / selection

Genetic Drift is stochastic Δq [unpredictable, random]
(cf. deterministic Δq [predictable, due to selection, mutation, migration])

Sewall Wright (1889 - 1988): "Evolution and the Genetics of Populations"

Stochastic Δq is greater than deterministic Δq in small populations:
allele frequencies drift more in 'small' than 'large' populations.

Drift is most noticeable if s ≈ 0, and/or N small (< 10) [N ≈ 1/s]

q drifts between generations (variation decreases within populations over time) [DEMO];
eventually, allele is lost (q = 0) or fixed (q = 1) (50:50 odds)

           Ex: [Demonstration #3]
                      [Try: q = 0.5, W0 = W1 = W2 = 1.0, and N = 10, 50, 200, 1000;
                                repeat 10 trials each, note q at endpoint]

q drifts among populations (variation increases among populations over time);
eventually, half lose the allele, half fix it.

**=> Variation is 'fixed' or 'lost' & populations will diverge by chance <=**
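
Drift without selection can be sketched as repeated binomial sampling of 2N gene copies per generation. The code below is a minimal illustration under that assumption (a Wright-Fisher style model); it is not the course's demonstration program, and the names `drift_trial` and `run_trials` are hypothetical.

```python
import random


def drift_trial(q0, n_individuals, generations, rng):
    """Follow q through random sampling of 2N gene copies per generation (no selection)."""
    copies = 2 * n_individuals
    q = q0
    for _ in range(generations):
        if q in (0.0, 1.0):          # allele already lost or fixed
            break
        # each gene copy in the next generation is allele 'a' with probability q
        count = sum(1 for _ in range(copies) if rng.random() < q)
        q = count / copies
    return q


def run_trials(q0, n_individuals, generations, trials, seed=1):
    """Run independent populations from the same starting frequency."""
    rng = random.Random(seed)
    return [drift_trial(q0, n_individuals, generations, rng) for _ in range(trials)]


if __name__ == "__main__":
    # q at the endpoint of 10 trials of 100 generations, for several population sizes
    for n in (10, 50, 200, 1000):
        print(n, [round(q, 2) for q in run_trials(0.5, n, 100, 10)])
```

With small N most trials should end at q = 0 or q = 1, while with large N the endpoint values should stay closer to the starting frequency over the same number of generations.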

      Evolutionary significance:
            "Gambler's Dilemma" : if you play long enough, you win or lose everything.
            All populations are finite: many are very small, somewhere or sometime.
            Evolution occurs on vast time scales: "one in a million chance" is a certainty.
            Reproductive success of individuals is variable: "The race is not to the swift ..."

What happens in the really long run?

      Effective Population Size (N_e)
            = size of an 'ideal' population with same genetic variation (measured as H)
                  as the observed 'real' population.
            = The 'real' population behaves evolutionarily like one of size N_e :
                  e.g., the population will drift like one of size N_e
            loosely, the number of breeding individuals in the population

Consider three special cases where N_e < or << N_obs [the 'count' of individuals]:

(1) Unequal sex ratio

N_e = (4)(N_m)(N_f) / (N_m + N_f)
where N_m & N_f are numbers of breeding males & females, respectively.

           "harem" structures in mammals (N_m << N_f)
            Ex.: if N_m = 1 "alpha male" and N_f = 200
                    then     N_e = (4)(1)(200)/(1 + 200) ≈ 4
              A single male elephant seal (Mirounga) does most of the breeding
                    Elephant seals have very low genetic variation

           eusocial (colonial) insects like ants & bees (N_f << N_m)
                     Ex.: if N_f = 1 "queen" and N_m = 1,000 drones
                                   then     N_e = (4)(1)(1,000)/(1 + 1,000) ≈ 4
                    Hives are like single small families
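
The sex-ratio formula is easy to evaluate directly. The following sketch assumes the formula exactly as given above; the function name `ne_sex_ratio` is illustrative.

```python
def ne_sex_ratio(n_males, n_females):
    """Effective size with unequal numbers of breeding males and females."""
    return 4.0 * n_males * n_females / (n_males + n_females)


if __name__ == "__main__":
    print(round(ne_sex_ratio(1, 200), 2))     # harem: 800 / 201
    print(round(ne_sex_ratio(1000, 1), 2))    # hive: 4000 / 1001
    print(ne_sex_ratio(100, 100))             # equal sex ratio: N_e = N_m + N_f
```

Both skewed cases come out just under 4, however large the more numerous sex is, because N_e is limited by the rarer sex.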

      (2) Unequal reproductive success
             In a stable population, N_{offspring/parent} = 1
           "Random" reproduction follows Poisson distribution (N = 1 ± 1)
                  (some parents have 0, most have 1, some have 2, a few have 3 or more)

X̄ (mean)	σ² (variance)	N_e =	Reproductive strategy
1	1	N_obs	Breeding success is random
1	0	2 x N_obs	A zoo-breeding strategy
1	>1	< N_obs	K-strategy, as in Homo
1	>>1	<< N_obs	r-strategy, as in Gadus

(3) Population size variation over time

           N_e = harmonic mean of N = inverse of arithmetic mean of inverses
                    [a harmonic mean is much closer to lowest value in series]
           N_e = n / [ Σ (1/N_i) ], summed over i = 1 to n, where N_i = pop size in the i-th generation

       Populations exist in changing environments:
                Populations are unlikely to be stable over very long periods of time
                  10^-2 forest fire / 10^-3 flood / 10^-4 ice age

Ex.: if typical N = 1,000,000 & every 100th generation N = 10 :
then N_e = (100) / [(99)(10^-6) + (1)(1/10)] ≈ 100 / 0.1 = 1,000
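
The harmonic-mean example can be verified with a short sketch. It assumes a series of 100 generations in which 99 have N = 1,000,000 and one has N = 10; `ne_harmonic` is an illustrative name.

```python
def ne_harmonic(sizes):
    """Harmonic mean of per-generation population sizes: n / sum(1/N_i)."""
    return len(sizes) / sum(1.0 / n for n in sizes)


if __name__ == "__main__":
    sizes = [1_000_000] * 99 + [10]          # one crash every 100th generation
    print(round(ne_harmonic(sizes)))         # harmonic mean, close to 1,000
    print(round(sum(sizes) / len(sizes)))    # arithmetic mean, close to 990,000
```

The single generation at N = 10 dominates the sum of inverses, which is why the harmonic mean sits so close to the lowest value in the series.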

       Founder Effect & Bottlenecks:
                Populations are started by (very) small number of individuals,
                    or undergo dramatic reduction in size.

         Ex.: Origin of Newfoundland moose (Alces):
                            2 bulls + 2 cows at Howley in 1904
                            [1 bull + 1 cow at Gander in 1878 didn't succeed].

       Population cycles: Hudson Bay Co. trapping records (Elton 1925)
                Population densities of lynx, hare, muskrat cycle over several orders of magnitude
                Lynx cycle appears to "chase" hare cycle

The effect of drift on genetic variation in populations

          Larger populations are more variable (higher H) than smaller
                  if s = 0: H reflects balance between loss of alleles by drift
                              and replacement by mutation

H = (4N_eµ) / (4N_eµ + 1)

Ex.: if µ = 10^-7 & N_e = 10⁶ then N_eµ = 0.1, 4N_eµ = 0.4, and H_exp = (0.4)/(0.4 + 1) = 0.29

But typical H_obs ≈ 0.20, which suggests N_e ~ 10⁵ (solving the formula with µ = 10^-7 gives N_e ≈ 6 x 10⁵)
Most natural populations have a much smaller effective size than their typically observed size.
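
The relation between H and N_e can be explored in both directions. This sketch assumes the formula H = 4N_eµ / (4N_eµ + 1) given above and µ = 10^-7 as in the example; the names `expected_H` and `ne_from_H` are hypothetical.

```python
def expected_H(ne, mu):
    """Expected heterozygosity at drift-mutation balance (s = 0)."""
    theta = 4.0 * ne * mu
    return theta / (theta + 1.0)


def ne_from_H(h, mu):
    """Invert H = 4*Ne*mu / (4*Ne*mu + 1) to get Ne for an observed H."""
    return h / (4.0 * mu * (1.0 - h))


if __name__ == "__main__":
    mu = 1e-7
    print(round(expected_H(1e6, mu), 2))   # the worked example: 0.4 / 1.4
    print(round(ne_from_H(0.20, mu)))      # Ne implied by H_obs = 0.20
```

Because H saturates toward 1 as N_e grows, modest differences in observed H can correspond to large differences in inferred N_e.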

Stochastic effects may be as or more important than deterministic processes in evolution.

	#	expected	observed	(obs-exp)	d²/exp
MM	(0.539)²(6129)	1781	1787	6	0.020
MN	(2)(0.539)(0.461)(6129)	3046	3039	-7	0.012
NN	(0.461)²(6129)	1302	1303	1	0.000
				χ² =	0.032^ns

	exp	obs	d=(o-e)	d²/exp
MM	192	327	135	94.9
MN	532	268	-264	131.0
NN	367	496	129	45.3
			χ² =	271.2^**
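
The two chi-square tables above follow the same recipe: estimate p from the observed counts, compute Hardy-Weinberg expected counts, and sum d²/exp. The sketch below is one way to do this, assuming the observed counts from the tables; `hw_chi_square` is an illustrative name. It uses the unrounded allele frequency, so its values differ slightly from the tabulated ones, which were computed from rounded figures.

```python
def hw_chi_square(n_MM, n_MN, n_NN):
    """Chi-square for departure of genotype counts from Hardy-Weinberg proportions."""
    n = n_MM + n_MN + n_NN
    p = (2 * n_MM + n_MN) / (2 * n)              # f(M) estimated from the sample
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_MM, n_MN, n_NN)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    CRITICAL_1DF = 3.84   # chi-square critical value, 1 degree of freedom, alpha = 0.05
    for counts in ((1787, 3039, 1303), (327, 268, 496)):
        chi2 = hw_chi_square(*counts)
        print(counts, round(chi2, 3), "significant" if chi2 > CRITICAL_1DF else "ns")
```

One degree of freedom is used because there are three genotype classes, less one for the fixed total and one for the estimated allele frequency.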

	AA	AB	BB
	W₀	W₁	W₂	q
Island	1	1-t	1-2t	q_i ≈ 0
Mainland	0	0	1	q_m ≈ 1