Lecture 8

The Binomial Distribution

Remark

Probability distributions are models that assign a probability to each possible outcome of a random process. The empirical frequency distributions of commonly encountered types of data often match a probability distribution model closely. When the data and a model match, we can use the model to make generalizations and predictions about the population that generated the data. We will consider four of the most frequently encountered probability distributions in this course.

For discrete variables: Binomial, Poisson, and Negative Binomial distributions.

For continuous variables: Normal (Gaussian) distribution.

Note

As we will use data to develop and validate our models, we will have to carefully distinguish between variables with similar names but different meanings. The basic distinction is that some of these variables denote quantities in the population and some denote sample (data-based) quantities.

The population quantities are denoted by Greek letters, such as $\mu$ for the population mean, $\sigma$ for the population standard deviation. These quantities are fixed and unchanging and typically are not directly observable.

The sample quantities are denoted by Latin letters, such as $\overline{X}$ for the sample mean, $s$ for the sample standard deviation. These quantities are computed from the data and are subject to change from sample to sample.

Binomial Probabilities

Remark

The binomial probabilities arise in contexts where multiple independent trials are conducted, and each trial has only two possible outcomes.

Example 1

In a certain area of upstate NY black-legged deer ticks carry Lyme disease with a probability $0.2$. While hiking you find two ticks. What is the probability distribution of the number of carriers out of 2 ticks?

Solution

We will assume that each tick is a carrier of Lyme disease independently of any other tick. Finding a tick is then an independent trial with two possible outcomes: the tick is a carrier or it is not. Let $C=$ tick is a carrier of Lyme disease, let $H=$ tick is healthy, and let $X$ be the number of carriers out of the 2 ticks you found.
Then$$\begin{array} {c|ccc} x & 0 & 1 & 2 \\ \hline & HH & \lbrace HC \, \text{or} \, CH \rbrace & CC \\ \hline P(x) & (0.8)(0.8) & (0.8)(0.2)+(0.2)(0.8) & (0.2)(0.2) \\ & 0.64 & 0.32 & 0.04 \end{array} $$Here, $P(x) = P(X=x)$ is the probability of finding $x$ carriers out of 2 ticks.

Example 2

What would be the probability distribution of the number of carriers out of three ticks?

Solution

$$\begin{array} {c|cccc} x & 0 & 1 & 2 & 3\\ \hline & HHH & \lbrace HHC, \, HCH, \, CHH \rbrace & \lbrace CCH, \, CHC, \,HCC \rbrace & CCC \\ \hline P(x)& (0.8)^3 & 3(0.8)^2(0.2) & 3(0.8)(0.2)^2 & (0.2)^3\\ & 0.512 & 0.384 & 0.096 & 0.008 \end{array} $$We can present these probabilities in a histogram, which looks just like an empirical frequency table.
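The outcome tables in Examples 1 and 2 can also be built by brute force. The following Python sketch (not part of the original notes) lists every sequence of carrier/healthy ticks and adds up the probabilities of the sequences that contain the same number of carriers. The names `enumerate_distribution` and `P_CARRIER` are illustrative choices, and the independence of the ticks is the same assumption made in Example 1.

```python
from itertools import product

P_CARRIER = 0.2  # probability that a single tick is a carrier (value from Example 1)


def enumerate_distribution(n_ticks, p=P_CARRIER):
    """Return {x: P(X = x)} by listing every sequence of C (carrier) / H (healthy)."""
    dist = {x: 0.0 for x in range(n_ticks + 1)}
    for outcome in product("CH", repeat=n_ticks):
        carriers = outcome.count("C")
        # Independence: multiply the per-tick probabilities along the sequence.
        prob = p ** carriers * (1 - p) ** (n_ticks - carriers)
        dist[carriers] += prob
    return dist


three_ticks = enumerate_distribution(3)
for x, prob in three_ticks.items():
    print(x, round(prob, 3))
```

Listing all sequences is only practical for small numbers of ticks, since there are $2^n$ of them; this is why a counting formula becomes useful for larger $n$.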

Example 3

What would be the probability distribution of the number of carriers out of six ticks?

Solution

Let $X=$ the number of ticks carrying Lyme disease out of six.$$ \begin{array} {c|l} x & P(X=x) \\ \hline 0 & (0.8)^6 =0.262 \\ 1 & {}_6 C_1 (0.8)^{5}(0.2)=0.393 \\ 2 & {}_6 C_2 (0.8)^{4}(0.2)^2=0.246 \\ 3 & {}_6 C_3 (0.8)^{3}(0.2)^3=0.082 \\ 4 & {}_6 C_4 (0.8)^{2}(0.2)^4=0.015 \\ 5 & {}_6 C_5 (0.8)(0.2)^5=0.0015 \\ 6 & (0.2)^6=0.000064 \\ \end{array} $$The solution uses the notation ${}_nC_k$ for the combinatorial coefficients, which are introduced in the following note.

Note

The quantity$$_{n}C_k = \frac{n!}{k!(n-k)!}$$ is called a combinatorial coefficient. It is also denoted as $\binom{n}{k}$. This combinatorial coefficient counts in how many ways we can choose $k$ items from a set of $n$ items. In the context of the binomial distribution, $n$ is the number of trials and $k$ is the number of successes. The combinatorial coefficient is the number of ways to choose $k$ successes within the sequence of $n$ trials.

Remark

Here are some examples of factorials and combinatorial coefficients:$$3!= 3\times 2 \times 1 = 6; \qquad 5!=120; \qquad {}_8C_2=\frac{8!}{6!2!}=28; \qquad {}_{12}C_5=792 $$
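The values in this remark can be checked numerically. The short Python sketch below computes the combinatorial coefficient directly from the factorial definition and compares it with the standard-library function `math.comb` (available in Python 3.8 and later). The helper name `n_choose_k` is an illustrative choice, not notation from the notes.

```python
from math import comb, factorial


def n_choose_k(n, k):
    """Combinatorial coefficient n! / (k! (n-k)!), computed from the definition."""
    # Integer division is exact here because the quotient is always a whole number.
    return factorial(n) // (factorial(k) * factorial(n - k))


print(factorial(3), factorial(5))
print(n_choose_k(8, 2), n_choose_k(12, 5))
# The built-in comb() should agree with the definition.
print(comb(8, 2) == n_choose_k(8, 2))
```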

Formula for the Binomial Distribution

Formula

The binomial probability formula applies in contexts where there is a series of $n$ independent trials, each with two outcomes, which we will call $S=\text{Success}$ and $F=\text{Failure}$.

Let $p=P(S)$ be the probability of Success and let $q=1-p$ be the probability of Failure.

Let $X=$ the number of successes in $n$ trials.

Then$$P(X=x)={}_nC_x\, \cdot p^x\cdot q^{n-x} \quad;\quad x=0,1,2, \dots,n $$

Example 4

About $10\%$ of people are left-handed. Compute the probability distribution of the number of lefties in a sample of 4 randomly selected ecologists.

Here $n=4$, $p=0.1$, and $q=0.9$, which gives$$ \begin{array}{c|ccccc} x & 0 & 1 & 2 & 3 & 4 \\ \hline P(x) & 0.656 & 0.292 & 0.049 & 0.0036 & 0.0001 \end{array} $$Here is how we compute the probabilities:$$P(X=0)={}_4C_0\; (0.1)^0\; (0.9)^4=0.656$$$$P(X=1)={}_4C_1\; (0.1)^1\; (0.9)^3=0.292$$$$P(X=2)={}_4C_2\; (0.1)^2\; (0.9)^2=0.049$$$$P(X=3)={}_4C_3\; (0.1)^3\; (0.9)^1=0.0036$$$$P(X=4)={}_4C_4\; (0.1)^4\; (0.9)^0=0.0001$$
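The binomial formula translates almost directly into code. The following is a minimal Python sketch, assuming Python 3.8+ for `math.comb`; the function name `binomial_pmf` and the variable `lefties` are illustrative. It evaluates the formula for every $x$ from $0$ to $n$ using the values of Example 4.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    q = 1 - p  # probability of Failure
    return comb(n, x) * p ** x * q ** (n - x)


# Example 4: number of lefties among n = 4 people, with p = 0.1.
lefties = [binomial_pmf(x, 4, 0.1) for x in range(5)]
for x, prob in enumerate(lefties):
    print(x, round(prob, 4))

# The probabilities over all possible x should add up to 1 (up to rounding).
print(round(sum(lefties), 10))
```

The same function can be reused for the tick examples by calling it with $p=0.2$ and $n=2$, $3$, or $6$.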

Mean and Variance of the Binomial Distribution

Formula

The mean of a Binomial distribution is denoted by $\mu$ and is given by$$\mu=np$$
The standard deviation of a Binomial distribution is denoted by $\sigma$ and is given by$$\sigma=\sqrt{np(1-p)}$$so the variance is $\sigma^2=np(1-p)$.

Example 5

In a class of $23$ students how many lefties are expected and what is the standard deviation?

$ n=23 \quad p=0.1 \quad q=0.9 $

$ \mu=np=23(0.1)=2.3 $. The expected value does not have to be an integer.

$ \sigma=\sqrt{np(1-p)}=\sqrt{23(0.1)(0.9)}=\sqrt{2.07}\approx 1.44 $
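As a numerical check of the two formulas, the Python sketch below computes the mean and standard deviation for the class of 23 students in two ways: from the closed-form expressions, and by summing over the full binomial distribution. The function names are illustrative, and the second function is included only to confirm the first.

```python
from math import comb, sqrt


def binomial_mean_sd(n, p):
    """Mean and standard deviation from the closed-form formulas."""
    return n * p, sqrt(n * p * (1 - p))


def mean_sd_from_pmf(n, p):
    """Same quantities, computed by summing over the whole distribution."""
    pmf = [comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(x * px for x, px in enumerate(pmf))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))
    return mean, sqrt(var)


mu, sigma = binomial_mean_sd(23, 0.1)
mu_check, sigma_check = mean_sd_from_pmf(23, 0.1)
print(round(mu, 2), round(sigma, 2))
print(round(mu_check, 2), round(sigma_check, 2))
```

Both approaches should agree up to floating-point rounding, which is a quick way to catch arithmetic slips when evaluating the square root by hand.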

Example 6

The probability of a male birth is $P_m=0.505$ and the probability of a female birth is $P_f=0.495$. Consider families with $3$ children. The table below uses the binomial formula to compute the probabilities for $X=0,1,2,3$ males out of $3$. It then uses these probabilities to compute the expected number of families with $X=0,1,2,3$ males in $200$ families with three children $(n=3)$.
We have:$$\begin{array}{r|cccc} x & 0 & 1 & 2 & 3 \\ \hline p(x) & 0.121 & 0.371 & 0.379 & 0.129 \\ \hline \text{freq} & 24 & 74 & 76 & 26\end{array}$$The frequencies are obtained by multiplying the probabilities by the total number of families.

In practice, we start building a binomial model from an observed empirical frequency distribution. Suppose this distribution has the same counts as above: $24$ families with no male children, $74$ with one, $76$ with two, and $26$ with three male children. How do we recover the probabilities $P_m$ and $P_f$ from the empirical frequency distribution?

We first compute the mean of the empirical frequency distribution of the number of male births, which serves as our estimate of $\mu$:$$ \bar{x} = \frac{24(0)+74(1)+76(2)+26(3)}{200}=\frac{304}{200} = 1.52$$
Since $\mu=np$, we can then recover the (empirical) probability of a male birth:$$p\approx\frac{\bar{x}}{n}=\frac{1.52}{3}\approx 0.507$$
We see that we recovered a value close to the true probability $P_m=0.505$. Now we can deploy our binomial model to make predictions about male and female children in the population. For example, let us compute the probability of $3$ boys and $2$ girls in a family with $5$ children:$$ P(X=3)={ }_5 C_3(0.507)^3(0.493)^2\approx 0.317 $$
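The fitting procedure of Example 6 can be sketched in a few lines of Python. The code below starts from the observed family counts, estimates the probability of a male birth as the sample mean divided by the number of children, and then uses that estimate in the binomial formula. All variable names (`counts`, `p_hat`, `expected`, and so on) are illustrative, and the result is an estimate from this one data set rather than a definitive value.

```python
from math import comb

# Observed number of families with x male children (counts from Example 6).
counts = {0: 24, 1: 74, 2: 76, 3: 26}
n_children = 3

n_families = sum(counts.values())
# Sample mean of the number of male children per family.
sample_mean = sum(x * f for x, f in counts.items()) / n_families
# Since mu = n p, dividing the mean by n gives an estimate of p.
p_hat = sample_mean / n_children

# Expected number of families for each x under the fitted binomial model.
expected = {
    x: n_families * comb(n_children, x) * p_hat ** x * (1 - p_hat) ** (n_children - x)
    for x in counts
}

# Prediction: probability of 3 boys and 2 girls among 5 children.
p_three_of_five = comb(5, 3) * p_hat ** 3 * (1 - p_hat) ** 2

print(sample_mean, round(p_hat, 4))
print({x: round(e, 1) for x, e in expected.items()})
print(round(p_three_of_five, 4))
```

Comparing the `expected` counts with the observed `counts` gives an informal impression of how well the binomial model fits the data.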

Coefficient of Dispersion

Remark

In ecology, one application of the discrete probability distributions we will consider is to model the dispersion of individuals through space. Some species are clumped, some are regularly dispersed, and some are randomly dispersed. To decide which model is appropriate, we will use the coefficient of dispersion.

Definition 1

The coefficient of dispersion of a sample of data is defined as $$CD=\frac{s^2}{\bar{x}}$$

Remark

For a binomial distribution the coefficient of dispersion is smaller than one:$$CD=\frac{\sigma^2}{\mu}=\frac{np(1-p)}{np}=1-p <1$$The binomial distribution will be used to describe regularly dispersed populations.
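To illustrate the definition on data, the Python sketch below computes the coefficient of dispersion for the family counts of Example 6. It assumes that $s^2$ is the usual sample variance with an $n-1$ denominator, which is what `statistics.variance` computes; the notes do not state the denominator explicitly. The function name `coefficient_of_dispersion` is an illustrative choice.

```python
from statistics import mean, variance


def coefficient_of_dispersion(sample):
    """CD = s^2 / x-bar, with s^2 the sample variance (n - 1 denominator)."""
    return variance(sample) / mean(sample)


# Family counts from Example 6: number of families with x male children.
counts = {0: 24, 1: 74, 2: 76, 3: 26}
# Expand the frequency table into one observation per family.
sample = [x for x, f in counts.items() for _ in range(f)]

cd = coefficient_of_dispersion(sample)
print(round(cd, 3))
```

For these data the result should fall below one and close to $1-p$, in line with the binomial relation above.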
