The Poisson Distribution
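The Poisson probability distribution assigns probabilities to the counts of events that occur in a fixed interval of time or space. The events must be independent of each other: the occurrence of one event does not affect the probability of the occurrence of another event. If the mean number of events per interval is $\mu$, the probability of observing exactly $k$ events is
$$ P(X=k) = \frac{e^{-\mu}\, \mu^k}{k!}, \qquad k = 0, 1, 2, \dots $$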
Tracking caribou herds from a helicopter, we expect to observe $3$ arctic foxes per day. What is the probability that we will observe exactly $5$ arctic foxes in a day? With mean $\mu = 3$:
$$ P(X=5) = \frac{e^{-3}\, 3^5}{5!} \approx 0.1008 $$
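The calculation above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `poisson_pmf` is an illustrative choice, not something defined in these notes.

```python
import math

def poisson_pmf(k: int, mu: float) -> float:
    """Probability of exactly k events when the mean count per interval is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Arctic fox example: mean of 3 sightings per day, exactly 5 sightings.
p_five = poisson_pmf(5, 3.0)
print(round(p_five, 4))  # 0.1008
```

The same function can be evaluated for any other count, for example `poisson_pmf(0, 3.0)` for the probability of a day with no sightings.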
The number of potholes in 10 m stretches on the service road of HW15 has been recorded in a frequency table:
$$ \begin{array}{c|ccccc} \text{Number of potholes} & 0 & 1 & 2 & 3 & 4 \\ \hline \text{Frequency} & 48 & 43 & 27 & 10 & 2 \end{array} $$
We will assume that the potholes occur independently, and we will build a Poisson distribution to represent the number of potholes in a 10 m stretch of road. We will use the frequency table to approximate the mean of the distribution with the sample mean.
$$ \mu \approx \bar{x} = \frac{0\cdot 48 + 1\cdot 43 + 2\cdot 27 + 3\cdot 10 + 4\cdot 2}{130} \approx 1.038 $$
In a later lecture we will learn how to check if the Poisson distribution is a good fit for the data. In this example we will simply use the Poisson distribution to make predictions.
$$\begin{aligned} P(X=0) &= \frac{e^{-1.038}\cdot (1.038)^0}{0!} \approx 0.354 \\ P(X=2) &= \frac{e^{-1.038}\cdot (1.038)^2}{2!} \approx 0.191 \end{aligned} $$
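The two steps of this example, estimating the mean from the frequency table and then evaluating the Poisson probabilities, can be sketched in Python as follows. The variable names (`freq`, `n`, `mean`) and the helper `poisson_pmf` are illustrative assumptions. The sketch uses the unrounded sample mean, which gives the same results to three decimal places.

```python
import math

# Pothole frequency table: {potholes per 10 m stretch: number of stretches}.
freq = {0: 48, 1: 43, 2: 27, 3: 10, 4: 2}

n = sum(freq.values())                          # 130 stretches in total
mean = sum(x * f for x, f in freq.items()) / n  # sample mean, used as the estimate of mu

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean count is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

print(round(mean, 3))                  # 1.038
print(round(poisson_pmf(0, mean), 3))  # 0.354
print(round(poisson_pmf(2, mean), 3))  # 0.191
```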
In a Poisson distribution, the mean and variance are equal. Thus the coefficient of dispersion is 1:
$$CD=\frac{\sigma^2}{\mu}=1$$
We can use this fact for a quick check of whether a sample could have been generated from a Poisson distribution. For a sample of data, if the coefficient of dispersion is near 1, then the Poisson distribution is a plausible model for the data. A value well above or below 1 suggests that it is not.
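For readers who want to see where the equality of mean and variance comes from, here is a brief sketch of the standard derivation. It assumes the Poisson probability formula $P(X=k)=\frac{e^{-\mu}\mu^k}{k!}$ and the fact that these probabilities sum to 1.
$$\begin{aligned} E[X] &= \sum_{k=0}^{\infty} k\,\frac{e^{-\mu}\mu^k}{k!} = \mu \sum_{k=1}^{\infty} \frac{e^{-\mu}\mu^{k-1}}{(k-1)!} = \mu \\ E[X(X-1)] &= \sum_{k=0}^{\infty} k(k-1)\,\frac{e^{-\mu}\mu^k}{k!} = \mu^2 \sum_{k=2}^{\infty} \frac{e^{-\mu}\mu^{k-2}}{(k-2)!} = \mu^2 \\ \sigma^2 &= E[X^2] - \left(E[X]\right)^2 = \left(\mu^2 + \mu\right) - \mu^2 = \mu \end{aligned} $$
The last line uses $E[X^2] = E[X(X-1)] + E[X]$.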
Consider again the data on the occurrence of potholes on a service road. We already computed the sample mean: $\bar{x}=1.038$. For the sample variance we will use the following formula, which takes into account the frequency $f$ of each value:
$$\begin{aligned} s^2 &= \frac{\sum f(x-\bar{x})^2}{\left(\sum f\right)-1} \\ &= \frac{48(0-1.038)^2+43(1-1.038)^2+27(2-1.038)^2+10(3-1.038)^2+2(4-1.038)^2}{129} \\ &\approx 1.030 \end{aligned} $$
Since $s^2 \approx \bar{x}$, the coefficient of dispersion is $CD=\frac{s^2}{\bar{x}}=\frac{1.030}{1.038}\approx 0.99$, which is close to 1. It looks like the Poisson distribution will be a good fit for this data.
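The frequency-weighted variance and the coefficient of dispersion can be computed directly from the table. The following is a minimal sketch; the function name `dispersion` and its return format are illustrative assumptions. It uses the unrounded sample mean, so the last digit can differ slightly from a hand calculation with rounded values.

```python
def dispersion(freq):
    """Return (sample mean, sample variance, coefficient of dispersion)
    for a frequency table given as {value: frequency}."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    # Frequency-weighted squared deviations, divided by (sum of f) - 1.
    var = sum(f * (x - mean) ** 2 for x, f in freq.items()) / (n - 1)
    return mean, var, var / mean

potholes = {0: 48, 1: 43, 2: 27, 3: 10, 4: 2}
mean, var, cd = dispersion(potholes)
print(f"mean={mean:.3f}  variance={var:.3f}  CD={cd:.3f}")
```

Because the function takes any `{value: frequency}` table, the same check can be applied to other count data.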
This is a classical example. The data shows the number of men killed by horse kicks in ten Prussian Army Corps over the course of 20 years (Bortkiewicz, 1898). With ten Army Corps observed for twenty years, we have $10\times 20=200$ observations, one for each corps in each year. The data is as follows:
$$ \begin{array}{c|ccccc} x & 0 & 1 & 2 & 3 & 4 \\ \hline \text{Freq.} & 109 & 65 & 22 & 3 & 1 \end{array} $$
The sample mean is:
$$\bar{x}=\frac{109\cdot 0+65\cdot 1+22\cdot 2+3\cdot 3+1\cdot 4}{200}=0.61$$
The sample variance is:
$$\begin{aligned} s^2&=\frac{109(0-0.61)^2+65(1-0.61)^2+22(2-0.61)^2+3(3-0.61)^2+1(4-0.61)^2}{200-1} \\ &\approx 0.611 \end{aligned} $$
The coefficient of dispersion is $CD=\frac{s^2}{\bar{x}}=\frac{0.611}{0.61}\approx 1.0016$, which is very close to 1. It looks like the Poisson distribution will be a good fit.
We can make predictions using the Poisson distribution: What is the probability that exactly four men in a given corps will be killed by horse kicks in a given year?
$$P(X=4)=\frac{e^{-0.61}\cdot (0.61)^4}{4!}\approx 0.003 $$
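One way to put such predictions next to the data is to multiply each Poisson probability by the 200 observations, which gives the number of corps-years the model would expect for each count. The sketch below is an illustration under that assumption, not a formal goodness-of-fit test. The names `observed`, `expected` and `poisson_pmf` are hypothetical.

```python
import math

# Horse-kick data: {deaths per corps per year: number of corps-years}.
observed = {0: 109, 1: 65, 2: 22, 3: 3, 4: 1}

n = sum(observed.values())                        # 200 corps-years
mu = sum(x * f for x, f in observed.items()) / n  # sample mean, 0.61

def poisson_pmf(k, mu):
    """Probability of exactly k events when the mean count is mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

# Expected number of corps-years with k deaths under the Poisson model.
expected = {k: n * poisson_pmf(k, mu) for k in observed}

for k in observed:
    print(f"k={k}  observed={observed[k]:3d}  "
          f"P(X=k)={poisson_pmf(k, mu):.3f}  expected={expected[k]:.1f}")
```

Comparing the `observed` and `expected` columns row by row gives an informal picture of how closely the model tracks the data.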