Lecture 17

Confidence Intervals for Difference of Means

Large Samples

Remark

Considerations similar to those for a single population mean allow us to construct a confidence interval for the difference of two population means. The formula depends on whether both samples are large (more than 30 observations each) or not.

Formula

A $100(1-\alpha)$% confidence interval for the difference of population means $\mu_1 - \mu_2$ is given by$$ \overline{X}_1-\overline{X}_2-z_{\alpha / 2} \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} \leq \mu_1-\mu_2 \leq \overline{X}_1-\overline{X}_2+z_{\alpha / 2} \sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} $$This confidence interval holds for large samples $n_1 > 30, n_2 > 30$.

Example 1

A sample of $40$ Blue Jays has average weight $\overline{X}_1=87.6 \mathrm{~g}$ with $s_1=2 \mathrm{~g}$.
A sample of $48$ American Robins has average weight $\overline{X}_2=74.5 \mathrm{~g}$ with $s_2=3 \mathrm{~g}$.
Construct a $90 \%$ CI for the difference in weights.
$$87.6-74.5 \pm 1.645 \sqrt{\frac{2^2}{40}+\frac{3^2}{48}}$$$12.22 \leq \mu_1-\mu_2 \leq 13.98 \mathrm{~g} $ with $90 \%$ confidence.
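The arithmetic in Example 1 can be checked with a short Python sketch. The function name `z_interval_diff_means` and its parameters are illustrative choices, not part of the lecture; the critical value $z_{\alpha/2}$ is taken from the standard library's `statistics.NormalDist` rather than from a table, and the second sample mean is the $74.5 \mathrm{~g}$ used in the displayed computation.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_diff_means(mean1, s1, n1, mean2, s2, n2, confidence):
    """Large-sample confidence interval for mu_1 - mu_2 (hypothetical helper)."""
    alpha = 1 - confidence
    # z_{alpha/2}: the standard normal value with area alpha/2 to its right
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference of the two sample means
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


# Example 1: Blue Jays (sample 1) versus American Robins (sample 2)
low, high = z_interval_diff_means(87.6, 2, 40, 74.5, 3, 48, confidence=0.90)
print(f"{low:.2f} <= mu_1 - mu_2 <= {high:.2f}")
# prints: 12.22 <= mu_1 - mu_2 <= 13.98
```

The same function applies to any pair of large samples; only the summary statistics and the confidence level change.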

Example 2

An ecology graduate student, Hanah, studying the skink Scelotes bojerii (a sort of lizard) on small islands off the coast of Mauritius, observes that specimens captured on Gunner's Island are smaller than those on Round Island. She suggests that rats, which have been introduced to Gunner's Island, selectively prey on larger animals, thus reducing the average size.
Hanah collected two samples to quantify the difference of lengths (mm).$$\begin{array} {|l|c|c|c|} \hline \text{Gunner's Island} & n_1=40 & \bar{x}_1=35.33 & s_1=7.124 \\ \hline \text{Round Island} & n_2=40 & \bar{x}_2=56.50 & s_2=7.714 \\ \hline \end{array} $$Let us compute a $95\%$ confidence interval for the difference in lengths:
$$56.50-35.33 \pm 1.96 \sqrt{\frac{7.124^2}{40}+\frac{7.714^2}{40}}$$
$17.92 \leq \mu_2-\mu_1 \leq 24.42 \mathrm{~mm}$ with $95\%$ confidence. Since the interval lies entirely above zero, we can conclude that the average length of skinks on Round Island is greater than that on Gunner's Island with $95\%$ confidence.

Small Samples

Formula

A $100(1-\alpha)$% confidence interval for the difference of population means $\mu_1 - \mu_2$ is given by$$ \overline{X}_1-\overline{X}_2 \pm t_{\alpha / 2} \sqrt{\frac{\left(n_1-1\right) s_1^2+\left(n_2-1\right) s_2^2}{n_1+n_2-2}} \sqrt{\frac{1}{n_1}+\frac{1}{n_2}} $$This confidence interval holds for small samples, $n_1 \leq 30$ or $n_2 \leq 30$, under the assumption that the two populations are normal with equal variances. The degrees of freedom are $df=n_1+n_2-2$.
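The first square root in this formula is the pooled standard deviation of the two samples. Giving it its own symbol (here $s_p$, a label introduced only for this remark) makes the structure of the interval easier to read:
$$ s_p=\sqrt{\frac{\left(n_1-1\right) s_1^2+\left(n_2-1\right) s_2^2}{n_1+n_2-2}}, \qquad \overline{X}_1-\overline{X}_2 \pm t_{\alpha / 2}\, s_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}} $$
When $n_1=n_2$, $s_p^2$ is simply the average of $s_1^2$ and $s_2^2$.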

Example 3

Another graduate student, Briana, is also studying the Mauritius skinks, among other species. Briana collects two small samples.$$\begin{array} {|l|c|c|c|} \hline \text{Gunner's Island} & n_1=12 & \bar{x}_1=35.33 & s_1=7.124 \\ \hline \text{Round Island} & n_2=10 & \bar{x}_2=56.50 & s_2=7.714 \\ \hline \end{array} $$
Assuming normal populations with equal variances, Briana computes a $95\%$ confidence interval for the difference in lengths. With $df=12+10-2=20$, the critical value at $95\%$ confidence is $t=2.086$ instead of $1.96$.
$$ 56.50-35.33 \pm 2.086 \sqrt{\frac{11(7.124)^2+9(7.714)^2}{12+10-2}} \sqrt{\frac{1}{12}+\frac{1}{10}}$$
$14.56 \leq \mu_2-\mu_1 \leq 27.78 \mathrm{~mm}$ with $95 \%$ confidence. Since the interval lies entirely above zero, Briana concludes that the average length of skinks on Round Island is greater than that on Gunner's Island with $95\%$ confidence.
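As a check on Example 3, here is a minimal Python sketch of the pooled small-sample interval. The function name `pooled_t_interval` is hypothetical, and the critical value is passed in as `t_crit` because it is read from a $t$ table, as in the lecture, rather than computed. Round Island is entered as the first sample so that the result is an interval for $\mu_2-\mu_1$ in the notation of the table.

```python
from math import sqrt


def pooled_t_interval(mean_a, s_a, n_a, mean_b, s_b, n_b, t_crit):
    """Interval for mu_a - mu_b assuming normal populations with equal variances.

    t_crit is t_{alpha/2} with n_a + n_b - 2 degrees of freedom,
    looked up in a t table (hypothetical helper, not from the lecture).
    """
    df = n_a + n_b - 2
    # Pooled standard deviation: weighted combination of the two sample variances
    pooled_sd = sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / df)
    margin = t_crit * pooled_sd * sqrt(1 / n_a + 1 / n_b)
    diff = mean_a - mean_b
    return diff - margin, diff + margin


# Example 3: Round Island (n = 10) minus Gunner's Island (n = 12), df = 20
low, high = pooled_t_interval(56.50, 7.714, 10, 35.33, 7.124, 12, t_crit=2.086)
print(f"{low:.2f} <= difference <= {high:.2f} (mm)")
```

The endpoints can differ from a hand computation in the last digit, depending on how intermediate values are rounded.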

Example 4

Grass mass produced when using $100\mathrm{~g}$ of two different fertilizers:

$$\begin{array} {|r|c|c|c|c|c|c|c|c|} \hline \text{Fertilizer 1} & 91.50 & 94.18 & 92.18 & 95.39 & 91.79 & 89.07 & 94.72 & 89.21 \\ \hline \text{Fertilizer 2} & 89.19 & 90.95 & 91.46 & 93.21 & 97.19 & 97.04 & 91.07 & 92.75 \\ \hline \end{array} $$Here is the data summary:
$n_1=8 \quad \quad \overline{X}_1= 92.255 \quad \quad s_1=2.39$
$n_2=8 \quad \quad \overline{X}_2= 92.733 \quad \quad s_2=2.89$
$df=14$
Let us compute a $95\%$ confidence interval for the difference of means:
$$92.733-92.255 \pm 2.145 \sqrt{\frac{(7)(2.39)^2+(7)(2.89)^2 }{8+8-2} }\sqrt{\frac{1}{8}+\frac{1}{8}}$$
Crunching the numbers gives: $0.478 \pm 2.145(1.326).$
$-2.37 \leq \mu_2-\mu_1 \leq 3.32 \mathrm{~g}$ with $95\%$ confidence. Since the interval contains zero, we cannot conclude that the average mass of grass produced by the two fertilizers is different with $95\%$ confidence.

Paired Samples

Formula

In many situations the samples are paired. For example, the same group of patients is tested before and after a treatment.
In these cases, the differences between the paired observations are computed and the confidence interval is constructed for the mean of the differences.
A $100(1-\alpha)$% confidence interval for the mean $\mu_D$ of the population differences is given by$$ \bar{d}-t_{\alpha / 2} \frac{s_d}{\sqrt{n}} \leq \mu_D \leq \bar{d}+t_{\alpha / 2} \frac{s_d}{\sqrt{n}} $$with $(n-1)$ degrees of freedom, where $n$ is the number of pairs.

Example 5

An article in Neurology discussed shared traits between monozygotic twins. An intelligence test was administered to each twin. The scores and the differences $d$ (first-born minus second-born) are shown below:$$\begin{array} {l|cccccccccc} \text{Birth Order: 1} & 6.08 & 6.22 & 7.99 & 7.44 & 6.48 & 7.99 & 6.32 & 7.60 & 6.03 & 7.52\\ \hline \text{Birth Order: 2} & 5.73 & 5.80 & 8.42 & 6.84 & 6.43 & 8.76 & 6.32 & 7.62 & 6.59 & 7.67 \\ \hline d & 0.35 & 0.42 & -0.43 & 0.60 & 0.05 & -0.77 & 0 & -0.02 & -0.56 & -0.15 \end{array} $$

$\bar{d}=-0.051 \quad \quad s_d=0.44 \quad \quad df=10-1=9 \quad \quad t=2.262$
Plugging into the formula for a $95\%$ confidence interval we have:$$ -0.051 \pm 2.262 \frac{0.44}{\sqrt{10}}$$
$-0.366 \leq \mu_D \leq 0.264 $ with $95 \%$ confidence. Since the interval contains zero, we cannot conclude that the average intelligence scores of the two groups are different with $95\%$ confidence.
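The paired procedure in Example 5 can be sketched in Python directly from the two rows of scores. The function name `paired_t_interval` and the variable names are illustrative only; the critical value $2.262$ (9 degrees of freedom, $95\%$ confidence) is supplied from a $t$ table rather than computed.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_interval(first, second, t_crit):
    """Confidence interval for the mean of paired differences (first - second).

    t_crit is t_{alpha/2} with len(first) - 1 degrees of freedom,
    looked up in a t table (hypothetical helper, not from the lecture).
    """
    d = [a - b for a, b in zip(first, second)]
    d_bar = mean(d)
    s_d = stdev(d)  # sample standard deviation, with n - 1 in the denominator
    margin = t_crit * s_d / sqrt(len(d))
    return d_bar, s_d, (d_bar - margin, d_bar + margin)


birth_order_1 = [6.08, 6.22, 7.99, 7.44, 6.48, 7.99, 6.32, 7.60, 6.03, 7.52]
birth_order_2 = [5.73, 5.80, 8.42, 6.84, 6.43, 8.76, 6.32, 7.62, 6.59, 7.67]

d_bar, s_d, (low, high) = paired_t_interval(birth_order_1, birth_order_2, t_crit=2.262)
print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.2f}")
print(f"{low:.3f} <= mu_D <= {high:.3f}")
```

Working from the raw pairs rather than from the rounded summary avoids sign slips in $\bar{d}$, since the direction of subtraction is fixed once by the order of the arguments.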

Example 6

Every year volunteers count the number of horseshoe crabs on the beaches of Delaware Bay. Here are the data for 2011 and 2012, together with the differences $d$ (2012 count minus 2011 count):$$\begin{array} {r|cccccc|} 2011 & 35\,282 & 49\,005 & 24\,936 & 17\,620 & 12\,120 & 44\,022 \\ 2012 & 21\,814 & 30\,015 & 29\,800 & 23\,114 & 11\,987 & 32\,140 \\ \hline d & -13\,468 & -18\,855 & 4\,864 & 5\,494 & -133 & -11\,882 \end{array} $$
This data is summarized as follows: $\bar{d}=-5\,663.33 \quad \quad s_d=10\,387.49 \quad \quad df=6-1=5$
Let us compute a $99\%$ confidence interval. With $5$ degrees of freedom, $t=4.032$:
$$ -5\,663.33 \pm 4.032 \frac{10\,387.49}{\sqrt{6}}$$
$-22\,761.7\leq \mu_D \leq 11\,435.1 $ with $99 \%$ confidence. Since the interval contains zero, we cannot conclude that the average number of horseshoe crabs counted on the beaches of Delaware Bay for these two years is different with $99\%$ confidence.