Calculation of F-statistics

F-Statistics as measures of population structure: a numerical example

    Consider three sub-populations of a global population, with observed genotype counts for each as indicated in the grey box. The following calculations are easiest to follow if N is constant over sub-populations; otherwise, the contribution of each sub-population must be weighted by its size.

    For each sub-population, observed allele counts #A & #a and frequencies f(A) & f(a) are easily calculated. Expected genotype frequencies are also easily calculated from the observed allele frequencies, for example Hexp = (2)(fA)(fa). Global fA ("f(A) bar") is the mean of the observed f(A) over all three sub-populations. This is easily calculated as fA = (1200 + 1400 + 1600) / (2)(3000) = 0.7000. Then global fa ("f(a) bar") = (1 - fA) = 0.3000. Finally, we showed previously that the "local" F for each sub-population is F = (Hexp - Hobs) / Hexp = 1 - (Hobs / Hexp).
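The per-sub-population arithmetic can be sketched in a few lines of Python. This is a minimal sketch, not part of the original figure: it assumes N = 1000 individuals in each sub-population (consistent with the 3000 total used above), and the genotype counts are reconstructed from the allele counts and observed f(Aa) values quoted in the text. The function name `local_stats` is hypothetical.

```python
# Genotype counts (AA, Aa, aa) for the three sub-populations.
# ASSUMPTION: N = 1000 per sub-population; counts are reconstructed from the
# allele counts (1200, 1400, 1600) and the observed f(Aa) values in the text.
GENOTYPE_COUNTS = [
    (384, 432, 184),
    (511, 378, 111),
    (656, 288, 56),
]


def local_stats(n_AA, n_Aa, n_aa):
    """Allele counts, allele frequencies, Hobs, Hexp and local F for one sub-population."""
    n = n_AA + n_Aa + n_aa
    count_A = 2 * n_AA + n_Aa        # each AA carries two A alleles, each Aa one
    count_a = 2 * n_aa + n_Aa
    f_A = count_A / (2 * n)          # 2N alleles in N diploid individuals
    f_a = 1 - f_A
    h_obs = n_Aa / n                 # observed heterozygote frequency
    h_exp = 2 * f_A * f_a            # Hardy-Weinberg expectation
    F = 1 - h_obs / h_exp            # undefined if the sub-population is fixed (h_exp == 0)
    return {"count_A": count_A, "count_a": count_a, "f_A": f_A, "f_a": f_a,
            "h_obs": h_obs, "h_exp": h_exp, "F": F}


stats = [local_stats(*counts) for counts in GENOTYPE_COUNTS]

# Global f(A): total A alleles over total alleles (equals the mean f(A) when N is equal).
total_alleles = 2 * sum(sum(counts) for counts in GENOTYPE_COUNTS)
global_f_A = sum(s["count_A"] for s in stats) / total_alleles

for s in stats:
    print(f"f(A) = {s['f_A']:.3f}  Hobs = {s['h_obs']:.3f}  "
          f"Hexp = {s['h_exp']:.3f}  F = {s['F']:.3f}")
print(f"global f(A) = {global_f_A:.4f}")
```

Under these assumed counts, each sub-population has the same local F of 0.1 even though the allele frequencies differ.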

    Heterozygosity indices Hi, Hs, and Ht are simply H, calculated at different levels of the population structure. With equal N, these are easily calculated from the bold values in the table above, as


    Hi = mean of observed f(Aa) = (0.432 + 0.378 + 0.288) / 3 = 0.3660
            This is the observed probability of heterozygosity for an individual drawn at random from the global population.


    Hs = mean of expected f(Aa) =  (0.480 + 0.420 + 0.320) / 3 = 0.4067
            This is the expected probability of heterozygosity for an individual drawn at random from the global population, if mating were random within each sub-population.
            Hi and Hs differ when sub-populations have different genetic structures.

    Ht = expected "global" heterozygosity from f(A) bar and f(a) bar = (2)(0.7)(0.3) = 0.4200
             This is simply the global (total) expectation of heterozygosity based on the global allele frequencies.
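The three heterozygosity indices above can be reproduced directly from the values quoted in the text. This is a minimal sketch that assumes equal N across sub-populations, so that simple unweighted means apply; the variable names are illustrative only.

```python
# Values quoted in the text for the three sub-populations.
H_OBS = [0.432, 0.378, 0.288]   # observed f(Aa)
H_EXP = [0.480, 0.420, 0.320]   # expected f(Aa) = 2 * f(A) * f(a)
F_A_BAR = 0.7                   # global f(A), "f(A) bar"


def mean(values):
    """Unweighted mean; appropriate only because N is equal across sub-populations."""
    return sum(values) / len(values)


H_i = mean(H_OBS)                      # observed heterozygosity of individuals
H_s = mean(H_EXP)                      # expected heterozygosity within sub-populations
H_t = 2 * F_A_BAR * (1 - F_A_BAR)      # expected heterozygosity of the total population

print(f"Hi = {H_i:.4f}, Hs = {H_s:.4f}, Ht = {H_t:.4f}")
# prints: Hi = 0.3660, Hs = 0.4067, Ht = 0.4200
```

With unequal N, each term in the two means would instead be weighted by sub-population size.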

    Previously, we used F to measure the deficiency of heterozygotes due to inbreeding within sub-populations. This is one form of non-random mating, based on relationship. Non-random mating may also arise due to geographic population structure. For example,  if members of a sub-population are spread out geographically and do not mix uniformly, individuals are more likely to mate with a closer neighbor. Sub-populations are more likely to comprise related individuals, so geographic structure also increases non-random mating of relatives. The same is true of different sub-populations separated by larger or smaller geographic distances.

The three F statistics are hierarchical versions of the same concept. When N and local F are constant,

     Fis = mean deficiency of observed heterozygotes among individuals with respect to that expected across sub-populations.
                In this example, where local F is the same across sub-populations, Fis is equivalent to local F
    
     Fit = mean deficiency of observed heterozygotes among individuals with respect to that expected for the total population,
                which is equivalent to the Wahlund Effect when allele frequencies differ across sub-populations.
     Fst = mean deficiency of expected heterozygotes among sub-populations with respect to that expected for the total population,
                which in this case is a measure of population differentiation within the total.
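The three definitions above correspond to the standard heterozygosity ratios Fis = (Hs - Hi) / Hs, Fit = (Ht - Hi) / Ht, and Fst = (Ht - Hs) / Ht. The sketch below evaluates them for this example, using the Hi, Hs, and Ht values from the text. It also checks numerically that Fst computed from Fit and Fis, as (Fit - Fis) / (1 - Fis), gives the same value; this is a numerical check for this one data set, not a proof.

```python
# Heterozygosity indices from the worked example (equal N assumed).
H_I = 0.366                    # mean observed f(Aa)
H_S = (0.480 + 0.420 + 0.320) / 3   # mean expected f(Aa), about 0.4067
H_T = 0.42                     # 2 * 0.7 * 0.3

# Each F is a proportional deficiency of heterozygotes at one level
# relative to the expectation at the next level up.
F_is = (H_S - H_I) / H_S       # individuals relative to sub-populations
F_it = (H_T - H_I) / H_T       # individuals relative to the total
F_st = (H_T - H_S) / H_T       # sub-populations relative to the total

# Alternative route to Fst, from Fit and Fis.
F_st_from_F = (F_it - F_is) / (1 - F_is)

print(f"Fis = {F_is:.4f}, Fit = {F_it:.4f}, Fst = {F_st:.4f}")
# prints: Fis = 0.1000, Fit = 0.1286, Fst = 0.0317
print(f"Fst from Fit and Fis = {F_st_from_F:.4f}")
```

Here Fis equals the local F of 0.1, as expected when local F is the same in every sub-population.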
               
    Fst in various forms is the most widely used descriptor of population genetic structure with diploid data (nuclear DNA sequences, or allozymes). The concept can be extended to multiple sub-populations within a population, or multiple population levels within species. Equivalent measures can be calculated for haploid data (mtDNA).

    Heterozygosity and F-statistics can also be thought of as random-draw-with-replacement experiments. Draw an allele at random from a particular sub-population, and replace it. What is the expectation that a second allele drawn from the same, or a different, sub-population will be different? That is, what is the chance of drawing a heterozygous pair? Genetic structure will always reduce this chance below the expectation calculated from the global allele frequencies, if different sub-populations exhibit different degrees of inbreeding and (or) differences in allele frequencies.
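The random-draw view can be illustrated with a small simulation. This is a rough sketch under stated assumptions: it uses the observed f(A) values from the example (0.6, 0.7, 0.8), picks sub-populations with equal probability, and models only random union of alleles, so it approximates Hs (both alleles from the same sub-population) and Ht (each allele from an independently chosen sub-population), not the observed Hi, which also reflects inbreeding. The function names are hypothetical.

```python
import random

# Observed f(A) in each sub-population, from the worked example.
F_A = [0.6, 0.7, 0.8]


def draw_allele(rng, f_A):
    """Draw one allele with replacement from a sub-population with frequency f_A of A."""
    return "A" if rng.random() < f_A else "a"


def prob_different(rng, same_subpop, trials=100_000):
    """Estimate the chance that two randomly drawn alleles differ.

    same_subpop=True  : both alleles come from one randomly chosen sub-population.
    same_subpop=False : each allele comes from an independently chosen sub-population.
    """
    different = 0
    for _ in range(trials):
        first_pop = rng.choice(F_A)
        second_pop = first_pop if same_subpop else rng.choice(F_A)
        if draw_allele(rng, first_pop) != draw_allele(rng, second_pop):
            different += 1
    return different / trials


rng = random.Random(0)   # fixed seed so repeated runs give the same estimates
within = prob_different(rng, same_subpop=True)    # should be close to Hs
across = prob_different(rng, same_subpop=False)   # should be close to Ht

print(f"same sub-population: {within:.3f}")
print(f"any sub-population:  {across:.3f}")
```

The estimates are approximate and vary slightly with the seed and number of trials; the within-sub-population value should fall near 0.407 and the across-population value near 0.420.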

HOMEWORK: Two ways of calculating FST are shown, in terms of FIT & FIS  or HT & HS. SHOW that the two calculations are equivalent.


Figures & Text material © 2022 by Steven M. Carr