Estimation of allele frequencies from genotype data, with multiple alleles and dominance

 


    Estimation of allele frequencies for a locus with two co-dominant alleles, for example the MN blood system with two alleles M & N and three genotypes MM, MN, and NN corresponding to three phenotypes M, MN, and N, is straightforward from basic population genetics principles.
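
    As a minimal sketch of the co-dominant case: every genotype is visible as its own phenotype, so alleles can simply be counted. The function name and the genotype counts below are hypothetical, chosen only for illustration.

```python
def mn_allele_frequencies(n_mm, n_mn, n_nn):
    """Estimate the frequencies of alleles M and N by direct allele counting."""
    total_alleles = 2 * (n_mm + n_mn + n_nn)  # each individual carries two alleles
    p_m = (2 * n_mm + n_mn) / total_alleles   # MM carries two M, MN carries one
    q_n = (2 * n_nn + n_mn) / total_alleles   # NN carries two N, MN carries one
    return p_m, q_n


# Hypothetical genotype counts (not data from the text)
p_m, q_n = mn_allele_frequencies(n_mm=30, n_mn=50, n_nn=20)
print(f"f(M) = {p_m:.3f}, f(N) = {q_n:.3f}")
```

    The two estimates necessarily sum to 1, because every allele counted is either M or N.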

    Estimation of allele frequencies for a locus with multiple alleles & dominance is more complicated. The ABO blood group system is a good example. There are three alleles (A, B, & O) that give rise to six genotypes (AA, AO, BB, BO, OO, & AB), which determine four blood group phenotypes (A, B, AB, & O). Alleles A & B are dominant to O: AA & AO are both type A, and BB & BO are both type B. Population genetic data are typically reported as the observed frequencies or counts of phenotypes, based on the agglutination test. The task is to estimate the allele frequencies from these data, so as to generate the expected frequencies and counts of phenotypes. Observed and expected data can then be compared. However, because the dominant phenotypes conceal the underlying genotypes, alleles cannot be counted directly, and basic algebra says that an exact solution for n variables cannot be obtained from n-1 quantities (the system is under-determined).

    We instead use an approximate solution based on a likelihood approach, with successive corrections. Likelihood methods use observed data or informed predictions to make or modify an a priori expectation.
 
Let
 

[Figure: ABO blood group estimations]
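
One common way to carry out a likelihood estimate by successive correction is the gene-counting (expectation-maximization) iteration sketched below. This is an assumption about how such a procedure can be implemented, not necessarily the method shown in the figure. The symbols p, q, & r for the frequencies of A, B, & O, the function names, and the example counts are all hypothetical; Hardy-Weinberg proportions are assumed throughout.

```python
def abo_expected_phenotypes(p, q, r):
    """Expected phenotype frequencies under Hardy-Weinberg proportions."""
    return {
        "A": p * p + 2 * p * r,   # genotypes AA + AO
        "B": q * q + 2 * q * r,   # genotypes BB + BO
        "AB": 2 * p * q,
        "O": r * r,
    }


def estimate_abo(n_a, n_b, n_ab, n_o, tol=1e-10, max_iter=1000):
    """Estimate ABO allele frequencies from phenotype counts by gene counting."""
    n = n_a + n_b + n_ab + n_o
    p = q = r = 1.0 / 3.0                      # arbitrary starting guess
    for _ in range(max_iter):
        # Split the ambiguous phenotypes into their expected genotype counts,
        # using the current allele frequency estimates.
        n_aa = n_a * p / (p + 2 * r)           # share of type A that is AA
        n_ao = n_a - n_aa
        n_bb = n_b * q / (q + 2 * r)           # share of type B that is BB
        n_bo = n_b - n_bb
        # Re-count alleles as if those genotype counts had been observed.
        p_new = (2 * n_aa + n_ao + n_ab) / (2 * n)
        q_new = (2 * n_bb + n_bo + n_ab) / (2 * n)
        r_new = (2 * n_o + n_ao + n_bo) / (2 * n)
        change = max(abs(p_new - p), abs(q_new - q), abs(r_new - r))
        p, q, r = p_new, q_new, r_new
        if change < tol:                       # successive corrections have settled
            break
    return p, q, r


# Hypothetical counts built from p = 0.3, q = 0.1, r = 0.6 with n = 10000,
# so the iteration should recover approximately those values.
p, q, r = estimate_abo(n_a=4500, n_b=1300, n_ab=600, n_o=3600)
n_total = 10000
expected_counts = {k: f * n_total for k, f in abo_expected_phenotypes(p, q, r).items()}
print(p, q, r)
print(expected_counts)
```

Each pass uses the current estimates to apportion type A between AA & AO and type B between BB & BO, then re-counts alleles; the expected ("reconstructed") phenotype counts follow from the final estimates.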

HOMEWORK [NOTE: the data are from a tribe properly referred to as the Aka, of the Mbenga people]: Perform a Chi-Square analysis of the difference between the Observed and Expected ("Reconstructed") counts, based on n = 163. Does the population show the expected Hardy-Weinberg proportions? Would it make a difference if n = 1630, with the same proportions?
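
The mechanics of the test can be sketched as follows. The phenotype proportions here are hypothetical placeholders, not the Aka data, and the function name is illustrative. With four phenotype classes and two independently estimated allele frequencies, the test has 4 - 1 - 2 = 1 degree of freedom, for which the 5% critical value is 3.841.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical phenotype proportions in the order A, B, AB, O (not the Aka data)
obs_prop = [0.40, 0.20, 0.10, 0.30]
exp_prop = [0.42, 0.18, 0.08, 0.32]

CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, 1 degree of freedom

for n in (163, 1630):
    # Counts are left as fractions here, since only the proportions are fixed.
    observed = [n * f for f in obs_prop]
    expected = [n * f for f in exp_prop]
    stat = chi_square(observed, expected)
    print(n, round(stat, 3), "significant" if stat > CRITICAL_5PCT_DF1 else "not significant")
```

Because every term (O - E)^2 / E scales linearly with n when the proportions are held fixed, the statistic for n = 1630 is ten times that for n = 163.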


Text © 2022 by Steven M. Carr