Consider three sub-populations of a global population, with the observed genotype counts for each shown in the grey box. The following calculations are easiest to follow when N is the same in every sub-population; otherwise, each sub-population's contribution must be weighted by its size N.
For each sub-population, the observed allele counts #A and #a and the allele frequencies f(A) and f(a) are easily calculated. Expected genotype frequencies then follow from the observed allele frequencies; for example, Hexp = (2)f(A)f(a).
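A minimal Python sketch of these per-sub-population calculations follows. The grey box is not reproduced in this text, so the genotype counts used here (384 AA, 432 Aa, 184 aa) are an assumption: they are back-calculated from the figures quoted below for the first sub-population (#A = 1200, observed f(Aa) = 0.432), taking N = 1000 individuals. The function name allele_summary is hypothetical.

```python
def allele_summary(n_AA, n_Aa, n_aa):
    """Return allele counts, allele frequencies, and observed/expected
    heterozygosity for one diploid sub-population at a two-allele locus."""
    n = n_AA + n_Aa + n_aa          # number of individuals, N
    count_A = 2 * n_AA + n_Aa       # each AA carries two A alleles, each Aa carries one
    count_a = 2 * n_aa + n_Aa
    f_A = count_A / (2 * n)         # 2N alleles in total
    f_a = count_a / (2 * n)
    h_obs = n_Aa / n                # observed f(Aa)
    h_exp = 2 * f_A * f_a           # Hardy-Weinberg expected f(Aa), Hexp = (2)f(A)f(a)
    return count_A, count_a, f_A, f_a, h_obs, h_exp


# Hypothetical counts back-calculated for the first sub-population (N = 1000).
count_A, count_a, f_A, f_a, h_obs, h_exp = allele_summary(384, 432, 184)
print(count_A, count_a, round(f_A, 3), round(f_a, 3), round(h_obs, 3), round(h_exp, 3))
# prints: 1200 800 0.6 0.4 0.432 0.48
```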
The global f(A) ("f(A) bar") is the mean of the observed f(A) over all three sub-populations. This is easily calculated as f(A) bar = (1200 + 1400 + 1600) / (2 × 3000) = 4200 / 6000 = 0.7000. Then the global f(a) ("f(a) bar") = 1 - f(A) bar = 0.3000.
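The pooled calculation above (total A alleles over total alleles) automatically weights each sub-population by its size, so it matches the simple mean of f(A) only when N is equal. A minimal sketch, assuming N = 1000 per sub-population (3000 individuals in total, split equally); both function names are hypothetical:

```python
def global_freq_A(allele_counts_A, sizes):
    """Global f(A) bar: total A alleles over total alleles (2N per sub-population).
    Pooling alleles weights each sub-population by its size N."""
    return sum(allele_counts_A) / (2 * sum(sizes))


def mean_freq_A(freqs_A):
    """Unweighted mean of sub-population f(A); equals the pooled value only for equal N."""
    return sum(freqs_A) / len(freqs_A)


counts_A = [1200, 1400, 1600]   # #A in each sub-population (from the text)
sizes = [1000, 1000, 1000]      # assumed equal N = 1000
p_bar = global_freq_A(counts_A, sizes)
q_bar = 1 - p_bar               # global f(a) bar
print(round(p_bar, 4), round(q_bar, 4))         # 0.7 0.3
print(round(mean_freq_A([0.6, 0.7, 0.8]), 4))   # 0.7
```

With unequal N the two differ: global_freq_A([1200, 2800], [1000, 2000]) gives 4000 / 6000 ≈ 0.667, whereas the unweighted mean of 0.6 and 0.7 is 0.65.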
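Finally, we showed previously that the "local" F for each sub-population is F = (Hexp - Hobs) / Hexp = 1 - (Hobs / Hexp). In this example, F = 1 - (0.432 / 0.480) = 0.10 for the first sub-population, and likewise 0.10 for the other two (0.378 / 0.420 = 0.288 / 0.320 = 0.9).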
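The local F formula can be applied to each sub-population in a few lines. This sketch assumes the observed and expected f(Aa) values quoted in this section pair up in the order listed (first with first, and so on); local_F is a hypothetical name.

```python
def local_F(h_obs, h_exp):
    """Local inbreeding coefficient: F = (Hexp - Hobs) / Hexp = 1 - Hobs / Hexp."""
    return 1 - h_obs / h_exp


h_obs = [0.432, 0.378, 0.288]   # observed f(Aa) per sub-population
h_exp = [0.480, 0.420, 0.320]   # expected f(Aa) = 2 f(A) f(a) per sub-population
F_values = [local_F(o, e) for o, e in zip(h_obs, h_exp)]
print([round(F, 4) for F in F_values])   # [0.1, 0.1, 0.1]
```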
The heterozygosity indices Hi, Hs, and Ht are simply H calculated at different levels of the population structure. With equal N, they are easily calculated from the bold values in the table above:
Hi = mean of observed f(Aa) = (0.432 + 0.378 + 0.288) / 3 = 0.3660
This is the observed probability of heterozygosity for an individual drawn at random from the global population.
Hs = mean of expected f(Aa) = (0.480 + 0.420 + 0.320) / 3 = 0.4067
This is the expected probability of heterozygosity for an individual drawn at random from the global population, if mating were random within each sub-population.
Hi and Hs differ when mating within sub-populations is non-random (local F ≠ 0); here Hi < Hs because every sub-population has a deficiency of heterozygotes.
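With equal N, Hi and Hs are plain averages over sub-populations. A minimal sketch using Python's standard statistics.mean and the per-sub-population values quoted above:

```python
from statistics import mean

h_obs = [0.432, 0.378, 0.288]   # observed f(Aa) in each sub-population
h_exp = [0.480, 0.420, 0.320]   # expected f(Aa) = 2 f(A) f(a) in each sub-population

# Equal N, so unweighted means are appropriate.
H_I = mean(h_obs)   # observed heterozygosity averaged over sub-populations
H_S = mean(h_exp)   # within-sub-population expected heterozygosity, averaged
print(round(H_I, 4), round(H_S, 4))   # 0.366 0.4067
```

H_I comes out below H_S here because each sub-population has the same heterozygote deficiency (local F = 0.1).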
Ht = expected "global" heterozygosity from f(A) bar and f(a) bar = (2)(0.7)(0.3) = 0.4200
This is simply the global (total) expectation of heterozygosity based on the global allele frequencies.
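Ht uses only the pooled allele frequencies, so it ignores both inbreeding and allele-frequency differences among sub-populations. A minimal sketch with the counts from this example (N = 1000 per sub-population assumed, as above):

```python
p_bar = (1200 + 1400 + 1600) / (2 * 3000)   # global f(A) bar
q_bar = 1 - p_bar                            # global f(a) bar
H_T = 2 * p_bar * q_bar                      # expected heterozygosity from global frequencies
print(round(H_T, 4))   # 0.42
```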
Previously, we used F to measure the deficiency of heterozygotes due to inbreeding within sub-populations. This is one form of non-random mating, based on relationship. Non-random mating may also arise from geographic population structure. For example, if members of a sub-population are spread out geographically and do not mix uniformly, individuals are more likely to mate with a nearby neighbor. Sub-populations are more likely to comprise related individuals, so geographic structure also increases non-random mating of relatives. The same is true of different sub-populations separated by larger or smaller geographic distances.
The three F statistics are hierarchical versions of the same concept. When N and local F are constant across sub-populations:
Fis = mean deficiency of observed heterozygotes among individuals with respect to that expected within sub-populations (Hs). In this example, where local F is the same across sub-populations, Fis is equivalent to local F.
Fit = mean deficiency of observed heterozygotes among individuals with respect to that expected for the total population (Ht). When allele frequencies differ across sub-populations, this deficiency includes the Wahlund effect as well as any inbreeding within sub-populations.
Fst = mean deficiency of expected heterozygotes among sub-populations (Hs) with respect to that expected for the total population (Ht), which in this case is a measure of population differentiation within the total.
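The three verbal definitions above correspond to the heterozygosity-based forms commonly given in textbooks: Fis = (Hs - Hi) / Hs, Fit = (Ht - Hi) / Ht, and Fst = (Ht - Hs) / Ht. Treat these exact expressions as an assumption here, since the figure that presents them is not reproduced in this text. The sketch uses Python's fractions module to keep the arithmetic exact.

```python
from fractions import Fraction

# Heterozygosities from this example, kept as exact fractions to avoid rounding.
H_I = (Fraction("0.432") + Fraction("0.378") + Fraction("0.288")) / 3
H_S = (Fraction("0.480") + Fraction("0.420") + Fraction("0.320")) / 3
H_T = 2 * Fraction(7, 10) * Fraction(3, 10)

F_IS = (H_S - H_I) / H_S   # observed vs. within-sub-population expectation
F_IT = (H_T - H_I) / H_T   # observed vs. total-population expectation
F_ST = (H_T - H_S) / H_T   # sub-population expectation vs. total expectation

print(float(F_IS), round(float(F_IT), 4), round(float(F_ST), 4))   # 0.1 0.1286 0.0317
```

As the text states, Fis equals the common local F (0.1). Fit is larger than Fis because it also captures the allele-frequency differences measured by Fst.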
Fst, in its various forms, is the most widely used descriptor of population genetic structure for diploid data (nuclear DNA sequences or allozymes). The concept can be extended to multiple sub-populations within a population, or to multiple population levels within a species. Equivalent measures can be calculated for haploid data (mtDNA).
Heterozygosity and F-statistics can also be thought of as random-draw-with-replacement experiments. Draw an allele at random from a particular sub-population, and replace it. What is the probability that a second allele, drawn from the same or a different sub-population, will be different? That is, what is the chance of drawing a heterozygous pair? Genetic structure will always reduce this probability below the expectation calculated from the global allele frequencies if sub-populations exhibit inbreeding and (or) differ in allele frequencies.
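The draw-with-replacement view can be checked by simulation. This is a minimal Monte Carlo sketch, assuming equal-sized sub-populations with the f(A) values of this example; prob_different is a hypothetical name. Each allele is drawn from its sub-population's allele frequencies, so drawing both from the same sub-population targets Hs (the random-mating expectation, not the observed Hi). Drawing each allele from an independently chosen sub-population (possibly the same one) amounts to sampling the pooled global population and targets Ht.

```python
import random


def prob_different(freqs_A, same_subpop, trials=200_000, seed=1):
    """Estimate the chance that two alleles drawn with replacement differ.

    freqs_A: f(A) in each (equal-sized) sub-population.
    same_subpop: if True, the second allele comes from the same sub-population
    as the first; if False, each allele comes from an independently chosen one.
    """
    rng = random.Random(seed)
    k = len(freqs_A)
    different = 0
    for _ in range(trials):
        i = rng.randrange(k)                          # sub-population of first allele
        j = i if same_subpop else rng.randrange(k)    # sub-population of second allele
        first_is_A = rng.random() < freqs_A[i]
        second_is_A = rng.random() < freqs_A[j]
        different += first_is_A != second_is_A        # count heterozygous pairs
    return different / trials


freqs = [0.6, 0.7, 0.8]
print(round(prob_different(freqs, same_subpop=True), 3))    # close to Hs = 0.4067
print(round(prob_different(freqs, same_subpop=False), 3))   # close to Ht = 0.42
```

Restricting the second draw to a strictly different sub-population would give a somewhat higher value (about 0.427 here), which is why the sketch pools instead.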
HOMEWORK:
Two ways of calculating Fst are shown, in terms of Fit and Fis or Ht and Hs. SHOW that the two calculations are equivalent.
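As a numerical sanity check (not the algebraic demonstration the exercise asks for), the sketch below evaluates both forms with this example's values. It assumes the standard expressions Fst = (Ht - Hs) / Ht and Fst = (Fit - Fis) / (1 - Fis), with Fis and Fit defined from heterozygosities as usual; the figure that shows the two forms is not reproduced here.

```python
from fractions import Fraction

# Exact heterozygosities from the worked example.
H_I = Fraction(366, 1000)    # (0.432 + 0.378 + 0.288) / 3
H_S = Fraction(1220, 3000)   # (0.480 + 0.420 + 0.320) / 3
H_T = Fraction(42, 100)      # (2)(0.7)(0.3)

F_IS = (H_S - H_I) / H_S
F_IT = (H_T - H_I) / H_T

fst_from_H = (H_T - H_S) / H_T
fst_from_F = (F_IT - F_IS) / (1 - F_IS)

print(fst_from_H, fst_from_F, fst_from_H == fst_from_F)   # 2/63 2/63 True
```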