   µ   |                       N                      |
       |    0      1      2      3      4      5     |   > 0   > (0+1)
 0.100 |  90.5%   9.0%   0.5%   0.0%   0.0%   0.0%   |   9.5%    0.5%
 0.125 |  88.2%  11.0%   0.7%   0.0%   0.0%   0.0%   |  11.8%    0.7%
 0.250 |  77.9%  19.5%   2.4%   0.2%   0.0%   0.0%   |  22.1%    2.6%
 0.500 |  60.7%  30.3%   7.6%   1.3%   0.2%   0.0%   |  39.3%    9.0%
 0.750 |  47.2%  35.4%  13.3%   3.3%   0.6%   0.1%   |  52.8%   17.3%
 1.000 |  36.8%  36.8%  18.4%   6.1%   1.5%   0.3%   |  63.2%   26.4%

The Poisson Distribution

    The Poisson distribution is a limiting case of the binomial distribution that applies where the phenomenon under study occurs as rare, discrete events (count data). The characteristic statistical property of a Poisson distribution is that the variance equals the mean (σ² = µ). For a Poisson-distributed process, the probability P of observing Y events given a mean of µ is

P(Y, µ) = e^(-µ) µ^Y / Y!

    (read as "the probability of Y given µ"), where e is the base of natural logarithms, and Y! = Y factorial [e.g., if Y = 5, Y! = 5 x 4 x 3 x 2 x 1 = 120]. The table above gives these probabilities for N = 0 to 5 events at several values of µ. This distribution provides a number of useful tools.
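    The formula can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names poisson_pmf and table_row are illustrative choices, and the output format only approximates the layout of the table. It evaluates the formula for each µ in the table and prints the N = 0 to 5 classes along with the "> 0" and "> (0+1)" columns.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Probability of exactly y events when the mean is mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def table_row(mu: float, max_n: int = 5) -> dict:
    """Probabilities for N = 0..max_n, plus the two tail columns."""
    probs = [poisson_pmf(n, mu) for n in range(max_n + 1)]
    return {
        "probs": probs,
        "gt0": 1.0 - probs[0],              # more than zero events
        "gt1": 1.0 - probs[0] - probs[1],   # more than one event
    }


# The values of mu used in the table.
for mu in (0.100, 0.125, 0.250, 0.500, 0.750, 1.000):
    row = table_row(mu)
    cells = " ".join(f"{100 * p:5.1f}%" for p in row["probs"])
    print(f"{mu:5.3f} {cells}   {100 * row['gt0']:5.1f}% {100 * row['gt1']:5.1f}%")
```

    Each printed row should agree with the corresponding table row to one decimal place.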

    (1) In a population of constant size, each individual produces on average µ = 1 ± 1 offspring. If reproductive success occurs randomly, the distribution of offspring numbers follows the last line in the table above: 37% of individuals will have no offspring, another 37% one offspring, 18% two offspring, and the remaining 8% three or more. This means that more than a third of individuals in any one generation do not contribute genetically to the next generation.

   (2) In an ecological study of the distribution of a rare plant species among a number of standardized quadrat plots, a majority of plots may be expected to contain no plants, a smaller number a single plant, and still smaller numbers two, three, or more plants. If 16 plants are distributed randomly over the 4 x 4 checkerboard of quadrat squares [heavy outlines] (mean µ = 1 ± 1), the same last line of the table shows that among the 16 cells, cells with "0" and "1" plants occur at 37% each, with "2" plants at 18%, with "3" plants at 6%, and with "4+" plants taking up the remaining 2%. In the example, the 16 plants are distributed over 6, 5, 4, and 1 cells with 0, 1, 2, and 3 plants, respectively. A Chi-square test based on an expected µ = 1 ± 1 distribution would indicate whether or not the rare plant species is distributed randomly.
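    The comparison of observed and expected cell counts can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the author's worked solution: the observed counts are those given in the text, the classes with three or more plants are pooled into a single "3+" class (an assumption made here so that the expected counts sum to 16), and only the chi-square statistic is computed. The choice of degrees of freedom (3 if µ is specified in advance, 2 if it is estimated from the data) and the lookup of the probability are left to the reader. With only 16 cells the expected counts are small, so the chi-square approximation is rough.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Probability of exactly y events when the mean is mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


n_cells = 16
mu = 1.0                      # 16 plants over 16 cells
observed = [6, 5, 4, 1]       # cells with 0, 1, 2, and 3+ plants (from the text)

# Expected cell counts for 0, 1, and 2 plants; the remainder is pooled as "3+".
expected = [n_cells * poisson_pmf(k, mu) for k in range(3)]
expected.append(n_cells - sum(expected))

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for label, o, e in zip(("0", "1", "2", "3+"), observed, expected):
    print(f"{label:>2} plants: observed {o:2d}, expected {e:5.2f}")
print(f"chi-square statistic = {chi_sq:.3f}")
```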

[Figure: Poisson quadrat, 4 x 4 with 16 plants]

    (3) The Poisson can simplify analysis of an "either / or" data set. In the quadrat example with µ = 1, the Poisson random expectation is that 37% of the quadrat plots will be unoccupied (0) and the remaining 63% occupied (> 0). In the example, there are 6 unoccupied cells and 10 occupied cells in the 4 x 4 quadrat, thus 37.5% of cells are unoccupied. A 2x2 contingency test (for example, Fisher's Exact Test) can test for a significant excess of empty cells (plants are clumped), or a significant deficiency (plant distribution is more uniform). The former might occur if suitable soil is patchily distributed, the latter if successful plants are spaced out as a result of competition for resources.

    (4) The same principle can be extended to a multiple hits correction. Suppose I throw rocks at a building with 100 windows. A good early estimate of the number of thrown rocks is the count of broken windows. After a bit, this count is an underestimate, because once a window is broken, any subsequent rock that goes through the same window space is not counted. The underestimate becomes worse as time goes on. We can revise the estimate by applying a Poisson Correction to estimate the total number of hits, based on the zero class (the number of unbroken windows). From the formula above, the expected probability of the zero class (P0) simplifies to

      P0 = e^(-µ) µ^0 / 0! = e^(-µ)

     where µ = the mean number of hits per window (the corrected fraction of hits). For example, if 39 out of 100 windows are broken, then 61 are unbroken (P0 = 0.61), so set 0.61 = e^(-µ)
 
     Taking the negative natural log of both sides gives µ = - ln(0.61) = 0.494, or approximately 0.50

    The expected number of "hits" is (100)(0.50) = 50 rather than the observed 39 broken windows: the correction is 11 extra "hits", or (50 - 39) / 39 = (11 / 39) = 28% of the observed count. For µ = 0.50 in the table, note that the extra hits occur as roughly 8 windows with "double" hits and 1 with a "triple" hit, for a total of (8)(1) + (1)(2) = 10; rounding and the rare windows with four or more hits account for the remainder.
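    The window calculation can be written as a short Python sketch. The function name poisson_corrected_hits and its parameters are illustrative, not taken from the notes. It estimates the total number of hits from the zero class alone; the unrounded result for the window example is about 49.4, which the text rounds to 50.

```python
import math


def poisson_corrected_hits(n_units: int, n_hit: int) -> float:
    """Estimate total hits from the number of units hit at least once.

    Uses the zero class: P0 = (unhit units) / (all units) = e^(-mu),
    so mu = -ln(P0) and the total is n_units * mu.
    """
    if not 0 <= n_hit < n_units:
        # If every unit is hit, P0 = 0 and the estimate is undefined.
        raise ValueError("n_hit must be at least 0 and less than n_units")
    p0 = (n_units - n_hit) / n_units
    mu = -math.log(p0)
    return n_units * mu


# Window example from the text: 39 of 100 windows broken.
total = poisson_corrected_hits(100, 39)
extra = total - 39
print(f"estimated hits: {total:.1f} ({extra:.1f} more than the 39 broken windows)")
print(f"correction relative to observed: {100 * extra / 39:.0f}%")
```

    The correction grows quickly as the zero class shrinks, and it cannot be computed at all once every window is broken.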

    The Poisson Correction is valuable in evolutionary population genetics, where it can be used to estimate the expected number of nucleotide or amino acid substitutions from the number of differences observed between two macromolecules (King & Wilson 1975).

 (5)  In a classic study, Bortkiewicz (1898) studied the distribution of 122 soldiers kicked to death by horses among ten Prussian army corps over 20 years. The data show that, in most years in most corps, nobody died from horse kicks, whereas in one corps in one year, four men were kicked to death. Do the data suggest that members of this particular corps were careless?  Statistical analysis indicates that the observed counts conform quite closely to the Poisson Expectation: the mean and variance are nearly equal. The corps was "unlucky" rather than careless: it fell in the extreme tail of the expected distribution of events.

Number of men kicked to death by horses in ten Prussian army corps

 # men killed          Observation       Poisson Expectation
 / year / corps        (# deaths)        (# deaths)
       0               109 (0)           108.7 (0.0)
       1                65 (65)           66.3 (66.3)
       2                22 (44)           20.2 (40.4)
       3                 3 (9)             4.1 (12.3)
       4                 1 (4)             0.6 (2.4)
       5+                0 (0)             0.1 (0.5)

 # corps-years         200               200.0
 Total deaths          122               121.9
 Mean                  0.610             0.610
 Variance              0.611             0.610
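
    The summary rows of the table can be reproduced from the observed counts. The Python sketch below is an illustration rather than the original calculation: it assumes that the mean is total deaths divided by corps-years, and that the observed variance uses the sample (n - 1) denominator, which is the choice that yields 0.611.

```python
import math

# Observed corps-years for each number of deaths per corps per year (from the table).
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(observed.values())                            # corps-years
deaths = sum(k * c for k, c in observed.items())      # total deaths
mean = deaths / n

# Sample variance with an (n - 1) denominator (assumed; see lead-in).
variance = sum(c * (k - mean) ** 2 for k, c in observed.items()) / (n - 1)

# Poisson expectation for each class, using the observed mean.
expected = {
    k: n * math.exp(-mean) * mean ** k / math.factorial(k) for k in observed
}

print(f"corps-years = {n}, deaths = {deaths}")
print(f"mean = {mean:.3f}, variance = {variance:.3f}")
for k in observed:
    print(f"{k} deaths: observed {observed[k]:3d}, expected {expected[k]:6.1f}")
```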


Homework:

Calculate the Chi-square probability of the deviation from Poisson expectation of the quadrat data, as the distribution of 16 plants over (1) 4 x 4 = 16 squares (µ = 1), and (2) 8 x 8 = 64 squares (µ = 0.25).

Analyze the occupied / unoccupied data set as a 2x2 contingency test (Fisher's exact test).

Calculate the Chi-square probability of the Bortkiewicz data.


All figures & text material ©2025 by Steven M. Carr