Estimation of selection coefficients for the sickle-cell allele S
The standard allele at the Beta-Globin
locus is designated A, and most individuals
are homozygous for this allele (AA). An alternative
allele, S, when homozygous (SS) results in
sickle-cell anemia, a severe form of anemia that
typically results in death at an early age. Individuals
that are heterozygous (AS) are said to have sickle-cell
trait, a much milder form of anemia that is
seldom life-threatening. [Be sure to distinguish "trait"
from "anemia"]. In the early 1950s, it was proposed
that AS individuals in West Africa had a
selective advantage over AA individuals, because
the AS phenotype protected individuals from
malaria. [Both A and S actually
represent multiple alleles with the same or similar
effect. Recall that S alleles arise from a
2nd-position SNP in the sixth triplet, which
substitutes Val for Glu (Glu -> Val)
in the protein].
In a study designed to test this
hypothesis, a group of 30,923 West African adults was
typed, with results as shown (Line 1). From these
data, the observed frequencies of each genotype
are easily determined (Line 2, left), and from
these the observed allele frequencies f(A)
and f(S) are determined by the usual calculation (Line
2, right).
Given p = f(A)
and q = f(S), the expected genotype
frequencies f(AA), f(AS), and f(SS)
are p^2, 2pq, and q^2,
respectively (Line 3), and the expected number
of individuals of each genotype is that proportion of 30,923 (Line 4).
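The calculations of Lines 2 to 4 can be sketched in a few lines of Python. This is only an illustration, not the worksheet itself: the function names are hypothetical, and the genotype counts are an assumption, since Line 1 is not reproduced in this text. They were chosen because they sum to 30,923 and give f(S) = 0.0908, as quoted for Line 2.

```python
def allele_frequencies(n_aa, n_as, n_ss):
    """Return (p, q) = (f(A), f(S)) by counting alleles in diploid genotypes."""
    total_alleles = 2 * (n_aa + n_as + n_ss)
    p = (2 * n_aa + n_as) / total_alleles  # each AA carries two A, each AS one
    q = (2 * n_ss + n_as) / total_alleles  # each SS carries two S, each AS one
    return p, q


def hardy_weinberg_expected(p, q, n):
    """Return expected counts of (AA, AS, SS) among n individuals."""
    return p * p * n, 2 * p * q * n, q * q * n


# ASSUMPTION: illustrative counts for (AA, AS, SS); Line 1 is not shown here.
observed = (25374, 5482, 67)
n = sum(observed)

p, q = allele_frequencies(*observed)
expected = hardy_weinberg_expected(p, q, n)

print(f"f(A) = {p:.4f}, f(S) = {q:.4f}")
for name, o, e in zip(("AA", "AS", "SS"), observed, expected):
    print(f"{name}: observed {o}, expected {e:.1f}")
```

The same two functions can be applied to any set of three genotype counts, including those in the homework below.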
A Chi-Square analysis (Line 5)
of the observed (Line 1) versus the expected
(Line 4) numbers of adults with each genotype
indicates a highly significant (P << 0.0001)
deviation from expectation. As predicted by the
hypothesis, the deviation is due to (1) a higher than
expected proportion of AS individuals, consistent
with a selective advantage relative to AA individuals,
and (2) a much lower than expected number of SS individuals,
consistent with the known selective disadvantage of
sickle-cell anemia.
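A minimal sketch of the Chi-Square test of Line 5 follows. Two things are assumptions here: the genotype counts (Line 1 is not reproduced in this text; the values below sum to 30,923 and give f(S) = 0.0908), and the use of one degree of freedom, which is the usual convention for three genotype classes when one allele frequency is estimated from the same data. The function names are hypothetical.

```python
import math


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over the genotype classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square statistic with 1 d.f."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# ASSUMPTION: illustrative counts for (AA, AS, SS); Line 1 is not shown here.
observed = (25374, 5482, 67)
n = sum(observed)

# Allele frequencies from the adults, then Hardy-Weinberg expected counts.
q = (2 * observed[2] + observed[1]) / (2 * n)
p = 1.0 - q
expected = (p * p * n, 2 * p * q * n, q * q * n)

chi2 = chi_square(observed, expected)
print(f"Chi-square = {chi2:.1f}, P = {p_value_df1(chi2):.2e}")

# The sign of (observed - expected) shows the direction of each deviation.
for name, o, e in zip(("AA", "AS", "SS"), observed, expected):
    print(f"{name}: observed - expected = {o - e:+.0f}")
```

With these counts the AS class shows an excess and the SS class a deficit, which is the pattern described above.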
To estimate the selection
coefficients against the AA (s_AA)
and SS (s_SS)
genotypes in a malarial environment, we would ideally need
the genotype counts in a group of Newborns and in those
group members who survive to adulthood. In this study, we
lack the former; however, estimation requires only
knowledge of relative viabilities of AA and
SS with respect to AS as the optimal
genotype (W = 1.0). We may assume that allele
frequencies are presently at equilibrium (Δq = 0),
that is, f(A) = 0.9092 and f(S) = 0.0908 (Line 2),
such that the expected frequencies in
Newborns are given by Line 3. Suppose that the 30,923
adults examined in Line 1 are the survivors of a group of
arbitrary size*
(here, 40,000) with genotype counts as expected
for a population of that size (Line 6). Then, the Viability
(V) of each genotype is simply the observed
adult count divided by the estimated newborn
count (Line 1 / Line 6 = Line 7). Divide all
viabilities by the optimum viability (V_AS)
to obtain their normalized Fitness with respect
to W_AS = 1.0 (Line 8). Express the Fitness values W_AA
and W_SS as selection
coefficients s_AA = 1 - W_AA
and s_SS = 1 - W_SS,
respectively. [Note the
alternative calculations below the lower Box 2 that
combine these into a single operation].
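The steps of Lines 6 to 8 can be sketched as follows. As before, the adult genotype counts are an assumption (Line 1 is not reproduced in this text); f(S) = 0.0908 and the newborn group size of 40,000 are taken from the paragraph above. The function name and its return format are hypothetical.

```python
def selection_coefficients(adults, q, n_newborn=40000):
    """Estimate (s_AA, s_SS) from adult counts of (AA, AS, SS).

    Newborn counts are the Hardy-Weinberg expectations for a group of
    n_newborn individuals with f(S) = q (Line 6).
    """
    p = 1.0 - q
    newborns = (p * p * n_newborn, 2 * p * q * n_newborn, q * q * n_newborn)

    # Viability = observed adults / estimated newborns (Line 7).
    v_aa, v_as, v_ss = (a / b for a, b in zip(adults, newborns))

    # Fitness relative to the optimal genotype, W_AS = 1.0 (Line 8).
    w_aa = v_aa / v_as
    w_ss = v_ss / v_as

    return 1.0 - w_aa, 1.0 - w_ss


# ASSUMPTION: illustrative adult counts for (AA, AS, SS); Line 1 is not shown.
adults = (25374, 5482, 67)

s_aa, s_ss = selection_coefficients(adults, q=0.0908)
print(f"s_AA = {s_aa:.4f}, s_SS = {s_ss:.4f}")
```

The newborn group size is left as a parameter so that other values can be tried.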
HOMEWORK 1: The S
allele occurs at lower frequencies in some Middle
Eastern countries. A recent survey of 56,000 hospitalized
patients in one such country identified 1,120 with
sickle-cell disease (SS) and 13,440 with
sickle-cell trait (AS). Based on the example here,
calculate observed and expected allele and genotype
frequencies, perform the Chi-Square analysis, and estimate
the selection coefficients against AA and SS.
HOMEWORK 2:
(1) We estimated selection coefficients from estimated
numbers of newborns, which were themselves derived from
equilibrium allele frequencies calculated from adults.
Is this valid, or not? Explain. (2) The arbitrary
assumption of a newborn group of size N (Line 6)
may strike you as odd: why does the exact size not make a
difference? [Hint: both questions can be answered
numerically].
HOMEWORK 3: The example is based on
calculations in Box 7.7 in Nielsen
& Slatkin (2013), who obtain
different numbers in Lines 4 to 7. Can you account for
the differences? [Hint: is there a possible rounding
error from using 3- vs 4-place numbers?]
Example
modified from © 2013 Sinauer; © 2021 by Steven M. Carr