µ | N=0 | N=1 | N=2 | N=3 | N=4 | N=5 | N>0 | N>1 |
0.100 | 90.5% | 9.0% | 0.5% | 0.0% | 0.0% | 0.0% | 9.5% | 0.5% |
0.125 | 88.2% | 11.0% | 0.7% | 0.0% | 0.0% | 0.0% | 11.8% | 0.7% |
0.250 | 77.9% | 19.5% | 2.4% | 0.2% | 0.0% | 0.0% | 22.1% | 2.6% |
0.500 | 60.7% | 30.3% | 7.6% | 1.3% | 0.2% | 0.0% | 39.3% | 9.0% |
0.750 | 47.2% | 35.4% | 13.3% | 3.3% | 0.6% | 0.1% | 52.8% | 17.3% |
1.000 | 36.8% | 36.8% | 18.4% | 6.1% | 1.5% | 0.3% | 63.2% | 26.4% |
The Poisson distribution is a limiting case of the binomial distribution that applies where the phenomenon under study occurs as rare, discrete events (count data). The characteristic statistical property of a Poisson distribution is that the variance equals the mean (σ² = µ).
For a Poisson-distributed process, the probability P of observing Y events given a mean of u (the µ of the table) is
$P(Y, u) = \frac{e^{-u}\, u^{Y}}{Y!}$
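The formula can be checked numerically against the table. The following is a minimal Python sketch; the function names `poisson_pmf` and `table_row` are illustrative and not part of the original notes.

```python
import math


def poisson_pmf(y: int, u: float) -> float:
    """Probability of observing exactly y events when the mean is u."""
    return math.exp(-u) * u**y / math.factorial(y)


def table_row(u: float, max_n: int = 5) -> dict:
    """Probabilities for N = 0..max_n plus the two tail columns of the table."""
    probs = [poisson_pmf(y, u) for y in range(max_n + 1)]
    return {
        "probs": probs,
        "more_than_0": 1.0 - probs[0],             # the N>0 column
        "more_than_1": 1.0 - probs[0] - probs[1],  # the N>1 column
    }


row = table_row(1.0)
print([f"{p:.1%}" for p in row["probs"]])
print(f"{row['more_than_0']:.1%}", f"{row['more_than_1']:.1%}")
```

With u = 1.0 the printed percentages should agree with the last row of the table (36.8%, 36.8%, 18.4%, 6.1%, 1.5%, 0.3%, then 63.2% and 26.4%).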
(2) In an ecological study of the distribution of a rare plant species among a number of standardized quadrat plots, a majority of plots may be expected to contain no plants, a smaller number a single plant, and still smaller numbers two, three, or more plants. If 16 plants are distributed randomly over the 4 x 4 checkerboard of quadrat squares [heavy outlines] (mean µ = 1 ± 1), the last line of the table shows that among the 16 cells, cells with "0" and "1" plants are each expected at 37%, with "2" plants at 18%, with "3" plants at 6%, and with "4+" plants taking up the remaining 2%. In the example, the 16 plants are distributed over 6, 5, 4, and 1 cells with 0, 1, 2, and 3 plants, respectively. A Chi-square test of these observed counts against the expected µ = 1 ± 1 distribution would indicate whether or not the rare plant species is distributed randomly.
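A rough sketch of such a Chi-square test in Python follows. Two choices here are assumptions not stated in the notes: the sparse "2" and "3+" classes are pooled into a single "2+" class so that no expected count is very small, and the mean µ = 1 is treated as fixed in advance, which leaves 2 degrees of freedom. For exactly 2 degrees of freedom the Chi-square tail probability has the closed form exp(-X²/2), so no statistics library is needed.

```python
import math

# Observed number of cells holding 0, 1, 2, and 3 plants (from the example).
observed_cells = [6, 5, 4, 1]
n_cells = sum(observed_cells)  # 16 cells
u = 1.0                        # expected mean number of plants per cell

# Assumption: pool the classes with 2 and 3 plants into one "2+" class.
observed = [observed_cells[0], observed_cells[1], sum(observed_cells[2:])]

p0 = math.exp(-u)      # P(0 plants)
p1 = u * math.exp(-u)  # P(1 plant)
expected = [n_cells * p0, n_cells * p1, n_cells * (1.0 - p0 - p1)]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # 2, because the mean was fixed rather than estimated

# Closed-form upper-tail probability, valid only for df = 2.
p_value = math.exp(-chi_square / 2.0)

print(f"chi-square = {chi_square:.3f}, df = {df}, p = {p_value:.2f}")
```

Under these assumptions the statistic is far below the 5% critical value for 2 degrees of freedom (5.99), so the example layout gives no evidence against a random distribution.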
(3) The Poisson can simplify analysis of an "either / or" data set. In the quadrat example with µ = 1, the Poisson random expectation is that 37% of the quadrat plots will be unoccupied (0) and the remaining 63% occupied (> 0). In the example, there are 6 unoccupied cells and 10 occupied cells in the 4 x 4 quadrat, thus 37.5% of cells are unoccupied, close to the random expectation. A 2x2 contingency test (for example, Fisher's Exact Test) can test for a significant excess of empty cells (plants are clumped), or a significant deficiency (plant distribution is more uniform). The former might occur if suitable soil is patchily distributed, the latter if successful plants are spaced out as a result of competition for resources.
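As an illustration of the "either / or" comparison, the sketch below uses an exact one-sample binomial test. This is a substitute for the 2x2 contingency test named above, not a version of it: it asks how probable the observed number of empty cells is if each cell is independently empty with probability e^-1. Treating cells as independent is an approximation when the total number of plants is fixed, and the variable names are illustrative.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


n_cells = 16
empty_observed = 6        # unoccupied cells in the example
p_empty = math.exp(-1.0)  # Poisson zero class for a mean of 1

# One-sided tail probabilities under the random (Poisson) expectation.
# An excess of empty cells would suggest clumping.
p_excess = sum(binom_pmf(k, n_cells, p_empty)
               for k in range(empty_observed, n_cells + 1))
# A deficiency of empty cells would suggest a more uniform distribution.
p_deficit = sum(binom_pmf(k, n_cells, p_empty)
                for k in range(0, empty_observed + 1))

print(f"P(>= {empty_observed} empty) = {p_excess:.2f}")
print(f"P(<= {empty_observed} empty) = {p_deficit:.2f}")
```

Neither tail probability is small for 6 empty cells out of 16, which is consistent with the observed 37.5% sitting close to the expected 37%.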
(4) The same principle can be extended to a multiple-hits correction. Suppose I throw rocks at a building with 100 windows. Early on, the count of broken windows is a good estimate of the number of rocks thrown. After a while this count becomes an underestimate, because once a window is broken, any subsequent rock that goes through the same window space is not counted, and the underestimate grows worse as time goes on. We can revise the estimate by applying a Poisson Correction, which estimates the total number of hits from the zero class (the number of unbroken windows). From the formula above, the expected probability of the zero class ($P_0$) simplifies to
$P_0 = e^{-u}\, u^{0} / 0! = e^{-u}$
where u is the corrected mean number of hits per window. For example, if 39 out of 100 windows are broken, then 61 are unbroken ($P_0$ = 0.61), so set
$0.61 = e^{-u}$
Taking the negative natural log of both sides gives
$u = -\ln(0.61) = 0.494 \approx 0.50$
The expected number of "hits" is (100)(0.50) = 50 rather than the observed 39 broken windows: the correction is 11 extra "hits", or (50 - 39) / 39 = 11 / 39 = 28% of the observed count. In the table row for µ = 0.500, note that these extra events occur as roughly 8 windows with "double" hits (one extra hit each) and 1 window with a "triple" hit (two extra hits), for a total of (8)(1) + (1)(2) = 10; rounding and the rare windows with four hits account for the remainder.
The Poisson Correction is valuable in evolutionary population genetics, where it can be used to obtain the expected from the observed number of nucleotide or amino acid differences between two macromolecules (King & Wilson 1975).
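The window calculation above can be written as a small Python function. This is a sketch only; the name `poisson_correction` and its parameters are hypothetical and do not come from the notes.

```python
import math


def poisson_correction(n_hit_units: int, n_units: int):
    """Estimate total hits from the zero class.

    n_hit_units: units hit at least once (broken windows).
    n_units:     all units (windows).
    Returns (u, estimated_total_hits), where u is the mean hits per unit.
    """
    if not 0 <= n_hit_units < n_units:
        # With no unhit units the zero class is empty and u is undefined.
        raise ValueError("need at least one unhit unit and a valid count")
    p_zero = (n_units - n_hit_units) / n_units  # observed zero class, P0
    u = -math.log(p_zero)                       # solve P0 = e^(-u) for u
    return u, u * n_units


u, total_hits = poisson_correction(39, 100)
print(f"u = {u:.3f}, estimated hits = {total_hits:.1f}")
```

For 39 broken windows out of 100 this gives u of about 0.494 and about 49.4 estimated hits, which the text rounds to 0.50 and 50.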
(5) In a classic study, Bortkiewicz (1898) examined the distribution of 122 soldiers kicked to death by horses among ten Prussian army corps over 20 years. The data show that, in most years in most corps, nobody died from horse kicks, whereas in one corps in one year, four men were kicked to death. Do the data suggest that members of this particular corps were careless? Statistical analysis indicates that the observed counts conform quite closely to the Poisson Expectation: the mean and variance are nearly equal (0.610 and 0.611). The corps was "unlucky" rather than careless: it fell in the extreme tail of the expected distribution of events.
Number of men kicked to death by horses in ten Prussian army corps
# men killed / year / corps | Observation (# deaths) | Poisson Expectation (# deaths) |
0 | 109 (0) | 108.7 (0.0) |
1 | 65 (65) | 66.3 (66.3) |
2 | 22 (44) | 20.2 (40.4) |
3 | 3 (9) | 4.1 (12.3) |
4 | 1 (4) | 0.6 (2.4) |
5+ | 0 (0) | 0.1 (0.5) |
# corps-years | 200 | 200.0 |
Total deaths | 122 | 121.9 |
Mean | 0.610 | 0.610 |
Variance | 0.611 | 0.610 |
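The summary rows of the table can be recomputed from the observed counts. The Python sketch below is illustrative; in particular, the n - 1 denominator for the variance is an assumption, chosen because it reproduces the tabulated value of 0.611.

```python
import math

# Observed number of corps-years with 0, 1, 2, 3, or 4 deaths (from the table).
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(observed.values())                             # corps-years
total_deaths = sum(k * c for k, c in observed.items())
mean = total_deaths / n

# Assumption: sample variance with an n - 1 denominator.
variance = sum(c * (k - mean) ** 2 for k, c in observed.items()) / (n - 1)

# Expected corps-years per class under a Poisson with the observed mean.
expected = {k: n * math.exp(-mean) * mean**k / math.factorial(k)
            for k in range(5)}
expected["5+"] = n - sum(expected.values())  # remainder goes to the 5+ class

print(f"n = {n}, deaths = {total_deaths}, "
      f"mean = {mean:.3f}, variance = {variance:.3f}")
for k, e in expected.items():
    print(k, f"{e:.1f}")
```

The expected counts printed for classes 0 through 5+ should agree with the Poisson Expectation column (108.7, 66.3, 20.2, 4.1, 0.6, 0.1).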