Coalescence Theory: of Marbles, SNPs & Taxis

(I) Place 2N marbles on a circular table, and let them roll about at random. It is a certainty that one marble will be the first to roll off, then another, and another, and so on until only one marble remains on the table. We want to know (or predict) something about the statistics of marbles, and proceed as follows.

The chance that any particular marble will be the first to roll off the table is 1/(2N); the chance that any particular marble will be the last to roll off is also 1/(2N). If the rate at which marbles roll off the table is r, then the expected interval between events is 1/r. Note that r is stochastically constant.
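The 1/(2N) claim can be checked with a short simulation. This is a minimal sketch, not part of the original exercise: it assumes the marbles are interchangeable, so that every roll-off order is equally likely, and the function name, the table size of 10 marbles, and the seed are all illustrative choices.

```python
import random
from collections import Counter

def first_and_last_counts(n_marbles, trials, seed=0):
    """Count how often each marble is the first and the last off the table.

    Assumption: marbles are interchangeable, so each roll-off order is a
    uniformly random permutation of the marbles.
    """
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    first = Counter()
    last = Counter()
    marbles = list(range(n_marbles))
    for _ in range(trials):
        order = marbles[:]
        rng.shuffle(order)        # one random roll-off order
        first[order[0]] += 1      # first marble off the table
        last[order[-1]] += 1      # last marble remaining
    return first, last

# Here 2N = 10 marbles, so each share should be close to 1/10.
first, last = first_and_last_counts(n_marbles=10, trials=20_000)
print("expected share:  ", 1 / 10)
print("first-off shares:", [round(first[m] / 20_000, 3) for m in range(10)])
print("last-off shares: ", [round(last[m] / 20_000, 3) for m in range(10)])
```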

(II) Now consider a population of 2N individuals, where writing the size as 2N allows the same results to be applied to the 2N gene copies carried by N diploid individuals. Nielsen & Slatkin (2013) show

Eqn 3.1: The probability that any particular individual will coalesce [disappear into the population past] in the present generation is Pr(C) = 1/(2N)
                Then the probability that any particular individual will not coalesce is Pr(C') = 1 - 1/(2N)

Eqn 3.2: The probability that any particular individual will not coalesce in r successive generations is Pr(C'r) = (1 - 1/(2N))^r

Eqn 3.3: The probability that any particular individual will not coalesce in (r - 1) generations, and then coalesce in the rth generation, is
                    Pr(Cr) = [(1 - 1/(2N))^(r-1)][1/(2N)]
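Eqns 3.2 and 3.3 can be tabulated directly. The sketch below is an illustration only: the function names and the choice 2N = 20 are assumptions, not taken from Nielsen & Slatkin. It checks that the probabilities in Eqn 3.3 sum to 1 over all generations, and that the average waiting time comes out at 2N generations.

```python
def pr_no_coalescence(r, two_n):
    """Eqn 3.2: probability of no coalescence in r successive generations."""
    return (1 - 1 / two_n) ** r

def pr_first_coalescence(r, two_n):
    """Eqn 3.3: no coalescence for r - 1 generations, then coalescence in the rth."""
    return (1 - 1 / two_n) ** (r - 1) * (1 / two_n)

two_n = 20                      # illustrative population size (2N)
generations = range(1, 2001)    # long enough that the remaining tail is negligible
probs = [pr_first_coalescence(r, two_n) for r in generations]

total = sum(probs)                                          # should be very close to 1
mean_wait = sum(r * p for r, p in zip(generations, probs))  # should be close to 2N

print("sum of Pr(Cr):     ", round(total, 6))
print("mean waiting time: ", round(mean_wait, 6), "generations")
```

The two equations are linked: Eqn 3.3 for generation r is the drop in Eqn 3.2 between generation r - 1 and generation r.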

Eqn 3.4: Define a rescaled time t = (r)(1/(2N)), that is, time measured in units of 2N generations. Then r = (2N)(t).
                    From Eqn 3.2, the probability of no coalescence up to time t is then

              Pr(C'r) = (1 - 1/(2N))^(2Nt)

            "The Calculus" shows that as N approaches infinity, Pr(C'r) approaches e^(-t), where e is the base of natural logarithms.

            That is, the waiting time between coalescence events follows an exponential distribution: the chance of waiting longer than t is e^(-t), an Exponential Function of t.
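The limit can be seen numerically without any calculus. This is a small sketch with an illustrative function name: it evaluates the probability of no coalescence at rescaled time t = 1 for increasing population sizes and compares each value with e^(-1).

```python
import math

def pr_no_coalescence_scaled(t, two_n):
    """Probability of no coalescence through rescaled time t, with r = 2N * t."""
    return (1 - 1 / two_n) ** (two_n * t)

t = 1.0
print("e^(-t) =", round(math.exp(-t), 6))
for two_n in (10, 100, 1_000, 100_000):
    value = pr_no_coalescence_scaled(t, two_n)
    print(f"2N = {two_n:>7}: {value:.6f}")
```

The values rise toward e^(-1) as 2N grows, so for large populations the exponential form is a close approximation.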

(III) This function is also called the Taxi Cab Function. Suppose taxis arrive at a taxi stand at a stochastically constant rate of one every six minutes (r = 1/6 per minute). Having just arrived, we ask the other person waiting, "How long since the last cab?" She says, "10 minutes." Intuitively, we might think that this means that a cab is more likely to arrive sooner rather than later. However, the probability of a cab arriving in the next minute remains 1/6, no matter what has happened before. The next cab arrives after a further 2 minutes. Another person arrives, and asks you the same question. You answer, "2 minutes," and the newcomer says, "Well, I guess it will be a while." A cab arrives after 3 minutes; you get in and commence to cypher. The first two intervals average (10+2)/2 = 6 minutes, and all three average (10+2+3)/3 = 5 minutes.
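The claim that the past does not matter can be tested by simulation. The sketch below makes one simplifying assumption: time is counted in whole minutes, with an independent 1/6 chance of a cab in each minute. The function names, trial count, and seed are illustrative. It estimates the chance of a cab in the next minute, first for a newcomer and then for someone who has already waited 10 minutes.

```python
import random

def waiting_time(rng, p=1 / 6):
    """Minutes until the next cab, with one independent trial per minute."""
    minutes = 1
    while rng.random() >= p:   # no cab this minute, keep waiting
        minutes += 1
    return minutes

def next_minute_chance(already_waited, trials=200_000, seed=1):
    """Estimate Pr(cab in the next minute | no cab for `already_waited` minutes)."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    still_waiting = 0
    arrived_next = 0
    for _ in range(trials):
        w = waiting_time(rng)
        if w > already_waited:             # no cab came during the wait so far
            still_waiting += 1
            if w == already_waited + 1:    # cab comes in the very next minute
                arrived_next += 1
    return arrived_next / still_waiting

print("just arrived:        ", round(next_minute_chance(0), 3))
print("after 10 minutes:    ", round(next_minute_chance(10), 3))
print("expected in each case:", round(1 / 6, 3))
```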

(IV) Simulate this with a six-sided die [one of a pair of dice]. A roll of six is the arrival of a cab. Keeping track of the count, roll the die until you get a six. Repeat for 50 to 100 or more sixes: determine the average interval between sixes, and plot the distribution of intervals. As the number of recorded intervals increases, the average will approach 6 rolls, and the distribution will approach an exponential-like decay in which an interval of n rolls occurs with probability (5/6)^(n-1)(1/6).

For the advanced student: A larger sample can be obtained in Excel with the RANDBETWEEN function, entered in cell A1 as =RANDBETWEEN(1,6), which returns a random integer between 1 and 6. If repeated for cells A1 to A12000, the expectation is n = 2000 '6s', and the distribution of intervals between '6s' approaches the expected form more closely.
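The same experiment can be run outside a spreadsheet. The sketch below is one possible version in Python, using 12000 rolls to match the count suggested above; the function name and the seed are arbitrary choices. It records the interval between successive sixes and compares the observed share of each interval length with (5/6)^(n-1)(1/6).

```python
import random
from collections import Counter

def intervals_between_sixes(n_rolls, seed=2022):
    """Roll a fair die n_rolls times and return the intervals between sixes."""
    rng = random.Random(seed)   # fixed seed only so the run is repeatable
    intervals = []
    count = 0
    for _ in range(n_rolls):
        count += 1
        if rng.randint(1, 6) == 6:   # a six is the arrival of a cab
            intervals.append(count)
            count = 0                # start counting toward the next six
    return intervals

intervals = intervals_between_sixes(12_000)
mean_interval = sum(intervals) / len(intervals)
histogram = Counter(intervals)

print("number of sixes:", len(intervals))
print("mean interval:  ", round(mean_interval, 2))
for length in range(1, 11):
    observed = histogram[length] / len(intervals)
    expected = (5 / 6) ** (length - 1) * (1 / 6)
    print(f"{length:2d}  observed {observed:.3f}  expected {expected:.3f}")
```

Rolls after the final six are left out of the intervals, since that last interval is incomplete.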
 

Figure & Text material © 2022 by Steven M. Carr