Chi-Square Test for Independence
The following table shows the distribution of bison in Yellowstone National Park by age and location. $$ \begin{array}{l|c|c|c|c} \text{Age} & \text{Lamar} & \text{Nez Percé} & \text{Firehole} & \text{Total} \\ \hline \text{Calf} & 13 & 13 & 15 & 41 \\ \hline \text{Yearling} & 10 & 11 & 12 & 33 \\ \hline \text{Adult} & 34 & 28 & 30 & 92 \\ \hline \text{Total} & 57 & 52 & 57 & 166\end{array} $$
Is the age distribution independent of the location in the park? Test at the $\alpha=0.05$ level of significance.
Null hypothesis, $H_0$: Age distribution is independent of location.
Alternative hypothesis, $H_1:$ Age distribution and location are dependent.
The expected frequency of each cell is computed by multiplying its row and column marginal totals and dividing by the grand total. For example, for calves in Lamar the expected frequency is: $$ E_{11}=\frac{57\times 41}{166}\approx 14.08 $$ The remaining expected frequencies are computed similarly. $$ \begin{array}{l|c|c|c|c} \text{Age} & \text{Lamar} & \text{Nez Percé} & \text{Firehole} & \text{Total} \\ \hline \text{Calf} & 14.08 & 12.84 & 14.08 & 41 \\ \hline \text{Yearling} & 11.33 & 10.34 & 11.33 & 33 \\ \hline \text{Adult} & 31.59 & 28.82 & 31.59 & 92 \\ \hline \text{Total} & 57 & 52 & 57 & 166\end{array} $$ Next we compute the $\chi^2$ statistic from the observed frequencies in the data table and the expected frequencies. $$ \begin{aligned} \chi^2 &=\frac{(13-14.08)^2}{14.08}+\cdots+\frac{(30-31.59)^2}{31.59} \\ &=0.670355, \quad \quad df=(3-1)(3-1)=4 \end{aligned} $$ Using software (e.g. Excel) we find: $p\text{-value} = 0.9549 > \alpha = 0.05$.
We fail to reject $H_0$. These data do not provide enough evidence to refute the claim that location and age distribution of bison in Yellowstone are independent.
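The bison calculation can be reproduced with a short script. This is a minimal sketch, not the method used above (which relied on Excel): the function names `chi_square_independence` and `chi2_sf_even_df` are illustrative, and the tail probability uses a closed form that is valid only when the degrees of freedom are even, as they are here ($df=4$).

```python
from math import exp, factorial

def chi_square_independence(observed):
    """Return (expected, chi2, df) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Sum (O - E)^2 / E over every cell of the table.
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

# Observed counts; columns are Lamar, Nez Percé, Firehole.
bison = [
    [13, 13, 15],  # calf
    [10, 11, 12],  # yearling
    [34, 28, 30],  # adult
]

expected, chi2, df = chi_square_independence(bison)
p_value = chi2_sf_even_df(chi2, df)
print(f"chi2 = {chi2:.4f}, df = {df}, p-value = {p_value:.4f}")
```

Because the script keeps the expected counts unrounded, the statistic may differ in the last digits from a hand calculation that uses the two-decimal values in the table.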
Consider the following data from a Myers-Briggs personality test and the occupations of the test takers. $$ \begin{array}{l|c|c|c} \text{Occupation} & \text{Extroverted} & \text{Introverted} & \text{Total} \\ \hline \text{Clergy} & 62 & 45 & 107 \\ \text{Medical Doctor} & 68 & 94 & 162 \\ \text{Lawyer} & 56 & 81 & 137 \\ \hline \text{Total} & 186 & 220 & 406 \end{array} $$ Use a $\chi^2$-test at the $0.05$ level of significance to determine whether the listed occupations and personality traits are independent.
Null hypothesis, $H_0$: Occupation and personality are independent.
Alternative hypothesis, $H_1:$ Occupation and personality are dependent.
We start by computing the expected frequency for each cell in the table under the assumption of independence. Each expected frequency is the product of the cell's row and column marginal totals divided by the grand total. $$ \begin{array}{l|c|c|c} \text{Occupation} & \text{Extroverted} & \text{Introverted} & \text{Total} \\ \hline \text{Clergy} & 49.02 & 57.98 & 107 \\ \text{Medical Doctor} & 74.22 & 87.78 & 162 \\ \text{Lawyer} & 62.76 & 74.24 & 137 \\ \hline \text{Total} & 186 & 220 & 406\end{array} $$ The $\chi^2$ statistic is computed using the observed and expected frequencies. $$ \begin{aligned} \chi^2 &=\frac{(62-49.02)^2}{49.02}+\cdots+\frac{(81-74.24)^2}{74.24} \\ &=8.65, \quad \quad df=(3-1)(2-1)=2 \end{aligned} $$ Using software we find: $p\text{-value} = 0.0132 < \alpha=0.05$.
Since the $p$-value is less than the level of significance, we reject $H_0$ in favor of $H_1$. We conclude that occupation and personality are dependent variables.
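To see the terms hidden behind the $\cdots$ in the sum for the occupation data, the sketch below lists the contribution $(O-E)^2/E$ of every cell before adding them up. The variable names are illustrative, and the $p$-value line relies on the fact that for $df=2$ (and only then) the upper-tail probability of the $\chi^2$ distribution reduces to $e^{-\chi^2/2}$.

```python
from math import exp

rows = ["Clergy", "Medical Doctor", "Lawyer"]
cols = ["Extroverted", "Introverted"]
observed = [
    [62, 45],
    [68, 94],
    [56, 81],
]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

# One term (O - E)^2 / E per cell, with E = row total * column total / n.
terms = {}
for i, row_name in enumerate(rows):
    for j, col_name in enumerate(cols):
        e = row_totals[i] * col_totals[j] / n
        terms[(row_name, col_name)] = (observed[i][j] - e) ** 2 / e

chi2 = sum(terms.values())
df = (len(rows) - 1) * (len(cols) - 1)

# Valid for df = 2 only: P(X > chi2) = exp(-chi2 / 2).
p_value = exp(-chi2 / 2)

for cell, term in terms.items():
    print(f"{cell[0]:>14} / {cell[1]:<11} {term:.3f}")
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```

In this table the two Clergy cells contribute roughly 6.3 of the total 8.65, so most of the departure from independence comes from that row.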