Derivation of Inbreeding equations 

Inbreeding coefficient (F) = probability that the two alleles at a locus in an individual are identical by descent (autozygous):
        that is, both are exact copies of a single DNA sequence in a common ancestor.

Alleles may be identical by allelic state without being identical by descent.
    The theory and equations apply only to identity by descent.
          If it is assumed [see note below] that observed allelic variants arise only once,
                identity by allelic state is the same as identity by descent.
            

F is also the proportion of the population that is inbred at any locus: the fraction of individuals with two alleles identical by descent.
        Under the single-origin assumption above, homozygosity at any locus then indicates identity by descent.

What is the effect of inbreeding on genotype proportions?

In the absence of inbreeding, expected f(AA) = p^2
                                       f(AB) = 2pq
                                       f(BB) = q^2
        where p = f(A), q = f(B), and p + q = 1
In the presence of inbreeding,

f(AA) = (1 - F)(p^2)  +  (F)(p)(1)  =  p^2 - Fp^2 + Fp  =  p^2 + Fp(1 - p)  =  p^2 + Fpq
       fraction (1 - F) of population not inbred:
            expected frequency of AA homozygotes among these = p^2
       fraction (F) of population inbred:
            fraction p of these individuals carry an A allele
            if inbred, the other allele must also be A, with probability = 1

f(AB) = (1 - F)(2pq) + (F)(0)     =  2pq - 2Fpq      =  2pq (1 - F)
        fraction (1 - F) of population not inbred:
           the expected frequency of AB heterozygotes among these is 2pq
        fraction (F) of population inbred:
           among these there are no heterozygotes, since the two alleles of an inbred individual are identical by descent and so cannot differ.

f(BB) = (1 - F)(q^2) + (F)(q)(1)  =  q^2 - Fq^2 + Fq  =  q^2 + Fq(1 - q)  =  q^2 + Fpq
     Follow the same logic as for f(AA) above, applied to the B allele
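
The three expressions above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function name genotype_frequencies and the example values p = 0.6 and F = 0.25 are assumptions chosen for illustration. It computes each frequency from the "not inbred" and "inbred" fractions exactly as in the derivation, then confirms that the results agree with the simplified forms and sum to 1.

```python
def genotype_frequencies(p, F):
    """Expected genotype frequencies at a two-allele locus.

    p is the frequency of allele A (q = 1 - p is the frequency of B),
    F is the inbreeding coefficient. Both names follow the notes;
    the function itself is a hypothetical helper for illustration.
    """
    if not (0.0 <= p <= 1.0 and 0.0 <= F <= 1.0):
        raise ValueError("p and F must both lie between 0 and 1")
    q = 1.0 - p
    # Non-inbred fraction (1 - F) follows Hardy-Weinberg proportions;
    # inbred fraction F is homozygous AA with probability p, BB with probability q.
    f_AA = (1 - F) * p**2 + F * p
    f_AB = (1 - F) * 2 * p * q
    f_BB = (1 - F) * q**2 + F * q
    return f_AA, f_AB, f_BB


if __name__ == "__main__":
    p, F = 0.6, 0.25          # illustrative values, not from the notes
    q = 1.0 - p
    f_AA, f_AB, f_BB = genotype_frequencies(p, F)
    print("f(AA), f(AB), f(BB):", f_AA, f_AB, f_BB)
    # Agreement with the simplified forms p^2 + Fpq, 2pq(1 - F), q^2 + Fpq
    assert abs(f_AA - (p**2 + F * p * q)) < 1e-12
    assert abs(f_AB - 2 * p * q * (1 - F)) < 1e-12
    assert abs(f_BB - (q**2 + F * p * q)) < 1e-12
    # Genotype frequencies must sum to 1
    assert abs(f_AA + f_AB + f_BB - 1.0) < 1e-12
```

Setting F = 0 recovers the Hardy-Weinberg proportions, and setting F = 1 leaves only homozygotes, at frequencies p and q. Each homozygote class gains Fpq and the heterozygote class loses 2Fpq, so allele frequencies are unchanged by inbreeding alone.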


Historical note: Classical versus Balanced views on genetic variation in natural populations

    The so-called Classical School argued that only a small fraction of 1% of loci were polymorphic (with more than one allele), and that such alternative alleles as existed were typically rare. Such alleles were most obvious in genetic diseases caused by homozygosity for a deleterious recessive allele (aa), where the a allele arose by new mutation. Under such circumstances, the assumption in the above calculations that identity by allelic state is equivalent to identity by descent is justified. The Hardy-Weinberg expectation of homozygosity for a rare recessive allele is extremely low: recall that if f(a) = 0.001 then f(aa) = 10^-6. However, if deleterious alleles arrive in a small population by chance migration of a single family that includes multiple Aa heterozygotes, so as to increase f(a), marriages among relatives in subsequent generations may dramatically increase f(aa) [see calculations]. Studies of particular medico-genetic anomalies in small, closed populations seemed to support this.
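
    As a rough illustration of the scale of this effect, the sketch below compares f(aa) under random mating with f(aa) = q^2 + Fpq from the derivation above, using f(a) = 0.001 as in the text. The value F = 1/16 (the standard inbreeding coefficient for offspring of first cousins) and the function name homozygote_frequency are assumptions chosen for illustration; they are not taken from the notes.

```python
def homozygote_frequency(q, F):
    """Expected frequency of aa homozygotes: q^2 + F*p*q, with p = 1 - q."""
    p = 1.0 - q
    return q**2 + F * p * q


q = 0.001              # f(a), as in the text
F_cousins = 1 / 16     # assumption: offspring of first cousins

random_mating = homozygote_frequency(q, 0.0)      # Hardy-Weinberg value q^2
inbred = homozygote_frequency(q, F_cousins)       # with inbreeding
fold_increase = inbred / random_mating            # equals 1 + F*p/q

print("f(aa), random mating:", random_mating)
print("f(aa), F = 1/16:     ", inbred)
print("fold increase:       ", fold_increase)
```

    Under these assumptions f(aa) rises from about 10^-6 to about 6 x 10^-5, roughly a 63-fold increase. Because the ratio is 1 + Fp/q, the proportional effect of inbreeding is largest when the recessive allele is rare.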

    The alternative Balanced School argued that a considerably larger fraction of loci, on the order of several percent, were polymorphic, because the alternative alleles at a locus were beneficial to a population or species when maintained in heterozygous genotypes (Aa). Such variation was evident in natural populations, where morphological variation could be shown to follow Mendelian rules. Different allelic variants were adaptive in different places and at different times, and perhaps in different tissues. The altitudinal and seasonal variation in chromosome types of wild Drosophila described in classical studies was argued to be due to different alleles brought together over multiple loci. In such circumstances, inbreeding might be beneficial to a species, as a means of creating multiple homozygous genotypes (AA, A'A', A"A", aa, etc.) that may be adaptive for novel environments. Studies of natural populations seemed to support this.

    The introduction of protein electrophoretic data to studies of humans and other species starting in 1967 showed much more variation than even the Balanced School had anticipated [see lecture notes]. Rather than resolving the argument in favor of the Balanced interpretation, the ground shifted to an argument as to whether the observed variation (amino acid substitutions resulting in protein charge differences) was adaptively significant (Selectionists vs Neutralists). The ground has shifted again with the widespread availability of DNA sequence data, which shows directly the relative proportions of third-position "silent" mutations versus substitution mutations that alter amino acid sequences, which may or may not alter function.
The essential argument, whether any particular class of observable genetic variation is adaptive, and if so how, remains after more than 100 years.


Text material © 2021 by Steven M. Carr