µ | N = 0 | N = 1 | N = 2 | N = 3 | N = 4 | N = 5 | N > 0 | N > 1 |
0.100 | 90.5% | 9.0% | 0.5% | 0.0% | 0.0% | 0.0% | 9.5% | 0.5% |
0.125 | 88.2% | 11.0% | 0.7% | 0.0% | 0.0% | 0.0% | 11.8% | 0.7% |
0.250 | 77.9% | 19.5% | 2.4% | 0.2% | 0.0% | 0.0% | 22.1% | 2.6% |
0.500 | 60.7% | 30.3% | 7.6% | 1.3% | 0.2% | 0.0% | 39.3% | 9.0% |
0.750 | 47.2% | 35.4% | 13.3% | 3.3% | 0.6% | 0.1% | 52.8% | 17.3% |
1.000 | 36.8% | 36.8% | 18.4% | 6.1% | 1.5% | 0.3% | 63.2% | 26.4% |
The Poisson distribution is a limiting case of the binomial distribution that applies where the phenomenon under study occurs as rare, discrete events (count data). The characteristic statistical property of a Poisson distribution is that the variance equals the mean (σ^{2} = µ).
For a Poisson-distributed process, the probability P of observing Y events given a mean u (µ in the table above) is
P(Y, u) = e^{-u} u^{Y} / Y!
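The formula can be checked against the table with a few lines of Python. This is a minimal sketch using only the standard library; the function names `poisson_pmf` and `table_row` are illustrative and not part of the original notes.

```python
from math import exp, factorial

def poisson_pmf(y: int, u: float) -> float:
    """Probability of observing exactly y events when the mean is u."""
    return exp(-u) * u**y / factorial(y)

def table_row(u: float, max_n: int = 5) -> list:
    """Percentages for N = 0..max_n, followed by P(N > 0) and P(N > 1)."""
    probs = [poisson_pmf(n, u) for n in range(max_n + 1)]
    more_than_zero = 1.0 - probs[0]
    more_than_one = 1.0 - probs[0] - probs[1]
    return [round(100 * p, 1) for p in probs + [more_than_zero, more_than_one]]

# Same means as the rows of the table
for u in (0.100, 0.125, 0.250, 0.500, 0.750, 1.000):
    print(f"{u:.3f}", table_row(u))
```

Each printed row lists the classes N = 0 through 5, then the "> 0" and "> 1" columns, as percentages rounded to one decimal place.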
(1) In an ecological study of the distribution of a rare plant species among a number of standardized quadrat plots, a majority of plots may be expected to contain no plants, a smaller number a single plant, and still smaller numbers two, three, or more plants. If 16 plants are distributed randomly over the 4 x 4 checkerboard of 2 m^{2} quadrats [heavy outlines] below (mean µ = 1), the table shows that a random Poisson distribution over the cells should produce "0" and "1" classes at 37% each, a "2" class at 18%, and a "3" class at 6%, with the higher classes (four or more plants) taking up the remaining 2%. In the example, the 16 plants are distributed over 6, 5, 4, and 1 cells with 0, 1, 2, and 3 plants, respectively. A Chi-square test against the Poisson distribution expected for µ = 1 (and hence σ^{2} = 1) would indicate whether or not the rare plant species is distributed randomly.
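The Chi-square comparison for the quadrat example can be sketched as follows. The pooling of the upper classes into a single "3 or more" class and the tabulated 5% critical value for 3 degrees of freedom (7.815) are choices made for this illustration, not steps stated in the notes. With expected counts this small, the Chi-square approximation is rough, so the result should be read as indicative only.

```python
from math import exp, factorial

MU = 1.0                      # expected mean: 16 plants over 16 quadrats
observed = [6, 5, 4, 1]       # quadrats holding 0, 1, 2, and 3+ plants
n_quadrats = sum(observed)    # 16

# Poisson probabilities for 0, 1, and 2 plants; the last class pools "3 or more"
probs = [exp(-MU) * MU**k / factorial(k) for k in range(3)]
probs.append(1.0 - sum(probs))

expected = [n_quadrats * p for p in probs]

# Pearson Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# mu is fixed in advance rather than estimated from the data,
# so degrees of freedom = number of classes - 1
degrees_of_freedom = len(observed) - 1
CRITICAL_5_PERCENT_DF3 = 7.815  # assumed tabulated value, alpha = 0.05, df = 3

print("expected counts:", [round(e, 1) for e in expected])
print("chi-square:", round(chi_square, 2), "df:", degrees_of_freedom)
if chi_square > CRITICAL_5_PERCENT_DF3:
    print("departure from a random distribution")
else:
    print("consistent with a random distribution")
```

For these counts the statistic falls far below the critical value, so the example layout gives no evidence against a random distribution.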
(2) The Poisson can simplify analysis of a simple "either / or" data set. In the quadrat example with µ = 1, the random Poisson expectation is that 37% of the quadrat plots will be unoccupied (0) and the remaining 63% occupied (> 0). In the example, there are 6 unoccupied cells and 10 occupied cells in the 4 x 4 quadrat. In a 2x2 contingency test (for example, Fisher's Exact Test), a significant excess of empty cells means the plants are clumped, and a significant deficiency means the plant distribution is more uniform. The former might occur if suitable soil is patchily distributed, the latter if successful plants are spaced out so as to avoid competition for resources.
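As a rough illustration of the occupied / unoccupied comparison, the sketch below uses an exact one-sample binomial test in place of the Fisher's Exact Test named above, because the expectation here is a fixed theoretical proportion (e^{-1}) rather than a second sample. It treats the 16 cells as independent trials, which is only approximately true when the total number of plants is fixed. The helper `binom_pmf` is illustrative.

```python
from math import comb, exp

n_quadrats = 16
n_empty = 6
p_empty = exp(-1.0)   # Poisson zero class for mu = 1, about 0.37

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k empty cells among n independent cells."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

expected_empty = n_quadrats * p_empty

# One-sided tail probabilities around the observed count of empty cells
p_excess = sum(binom_pmf(k, n_quadrats, p_empty)
               for k in range(n_empty, n_quadrats + 1))   # clumping
p_deficit = sum(binom_pmf(k, n_quadrats, p_empty)
                for k in range(0, n_empty + 1))           # uniformity

print("expected empty cells:", round(expected_empty, 1))
print("P(at least 6 empty):", round(p_excess, 3))
print("P(at most 6 empty):", round(p_deficit, 3))
```

A small `p_excess` would point to clumping and a small `p_deficit` to a more uniform spacing. With 6 empty cells observed against about 5.9 expected, neither tail is small.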
(3) The same principle can be extended to a multiple hits correction. For example, if I throw rocks at a building with 100 windows, a good early estimate of the number of thrown rocks is the count of broken windows. After a bit, this count is an underestimate, because once a window is broken, any subsequent rock that goes through the same space is not counted. We can then apply a Poisson Correction to estimate the number of multiple hits from the zero class (the number of unbroken windows).
From the above, the expected probability of the zero class (P_{0}) simplifies to
P_{0} = e^{-u} u^{0} / 0! = e^{-u}
where u is the corrected fraction of hits, that is, the mean number of hits per window. For example, if 39 out of 100 windows are broken, then 61 are unbroken (P_{0} = 0.61), so set
0.61 = e^{-u}
Taking the negative natural log of both sides gives
u = -ln(0.61) ≈ 0.49, or roughly 0.50
That is, the estimated number of "hits" is about (100)(0.50) = 50 rather than the observed 39 broken windows, or 11 extra "hits". This requires a correction of (50 - 39) / 39 = 11 / 39 = 28%. From the table above (row µ = 0.500), note that the correction of 11 extra events corresponds roughly to 8 windows with "double" hits and 1 with "triple" hits, for a total of (8)(1) + (1)(2) = 10 extra hits.
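The window arithmetic can be reproduced with a short function. This is a sketch under the assumptions of the example (every window is equally likely to be hit, and hits are independent); the name `poisson_corrected_hits` is illustrative.

```python
from math import log

def poisson_corrected_hits(n_sites: int, n_hit: int) -> float:
    """Estimate total hits from the zero class: u = -ln(P0), hits = n_sites * u."""
    if not 0 <= n_hit < n_sites:
        raise ValueError("need 0 <= n_hit < n_sites (the zero class must not be empty)")
    p0 = (n_sites - n_hit) / n_sites   # fraction of sites never hit
    u = -log(p0)                       # mean number of hits per site
    return n_sites * u

observed_broken = 39
hits = poisson_corrected_hits(100, observed_broken)
extra = hits - observed_broken

print("estimated hits:", round(hits, 1))
print("extra hits:", round(extra, 1))
print("correction:", f"{extra / observed_broken:.0%}")
```

Without rounding u to 0.50, the estimate is about 49.4 hits and the correction about 27%, slightly below the rounded figures of 50 and 28% used in the text.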
The Poisson Correction is extremely valuable in molecular population genetics, where it can be used to estimate the actual number of nucleotide and (or) amino acid differences from the number observed between two macromolecules.
(4) In a classic study, Bortkiewicz (1898) examined the distribution of 122 soldiers kicked to death by horses among ten Prussian army corps over 20 years. The data show that in most years in most corps nobody died from horse kicks, whereas in one corps in one year four men were kicked to death. Do the data suggest that members of this particular corps were careless? Statistical analysis indicates that the observed frequencies conform quite closely to the expected Poisson frequencies: the mean and variance are nearly identical (0.610 and 0.611). The corps was "unlucky" rather than careless: it fell in the extreme tail of the expected distribution of events.
Number of men kicked to death by horses in ten Prussian army corps
# men killed / year / corps | Observation (# deaths) | Poisson Expectation (# deaths) |
0 | 109 (0) | 108.7 (0.0) |
1 | 65 (65) | 66.3 (66.3) |
2 | 22 (44) | 20.2 (40.4) |
3 | 3 (9) | 4.1 (12.3) |
4 | 1 (4) | 0.6 (2.4) |
5+ | 0 (0) | 0.1 (0.5) |
# corps-years | 200 | 200.0 |
Total deaths | 122 | 121.9 |
Mean | 0.610 | 0.610 |
Variance | 0.611 | 0.610 |
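The summary rows and the Poisson expectations in the table can be recomputed from the observed counts. The sketch below assumes the tabulated variance uses the sample (n - 1) denominator, which reproduces the value 0.611.

```python
from math import exp, factorial

# Observed corps-years with 0, 1, 2, 3, and 4 deaths (from the table)
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(observed.values())                               # corps-years
total_deaths = sum(k * c for k, c in observed.items())
mean = total_deaths / n

# Sample variance of deaths per corps-year (n - 1 denominator, an assumption)
variance = sum(c * (k - mean) ** 2 for k, c in observed.items()) / (n - 1)

# Expected number of corps-years in each class for a Poisson with the same mean
expected = {k: n * exp(-mean) * mean**k / factorial(k) for k in observed}
expected["5+"] = n - sum(expected.values())              # remaining upper tail

print("corps-years:", n, "deaths:", total_deaths)
print("mean:", round(mean, 3), "variance:", round(variance, 3))
for k, e in expected.items():
    print(k, round(e, 1))
```

The near equality of mean and variance is the Poisson signature referred to in the text.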