Lecture 12

Chi-Square Goodness of Fit Test

Remark

When building a model to describe a set of data, it is always important to ensure that the model fits the data. The Chi-Square Goodness of Fit Test is a statistical test used to determine if there is a significant difference between the expected and observed frequency distribution in one or more categories.

In our context, we can use the coefficient of dispersion $CD$ to guide us in the selection of a modeling distribution. To test whether the selected distribution really fits the data, we will use a $\chi^2$ (chi-square) goodness of fit test.

Statistical tests are formulated in terms of a null hypothesis $H_0$ and an alternative hypothesis $H_1$. For a $\chi^2$-goodness of fit test the null hypothesis is the statement that the model is appropriate, and the alternative hypothesis is the statement that the model is not appropriate. The $\chi^2$-value defined next is computed from the data and is used to decide whether to reject the null hypothesis and discard the model.

Formula

Before we proceed with examples, let us write down the formula for the $\chi^2-$goodness of fit test. The test statistic for the $\chi^2-$goodness of fit test is given by$$\chi^2 =\sum \frac{\left(O_i-E_i\right)^2}{E_i}$$where $O_i$ is the observed frequency and $E_i$ is the expected frequency for each category.

The degrees of freedom for this test are equal to $k-m-1,$ where $k$ is the number of cells and $m$ is the number of parameters we estimated from the data.

The $p-$value for the test is the probability of observing a test statistic as large as the one we observed, assuming the null hypothesis is true. A large test statistic here means that the observed and expected frequencies are very different.
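
This computation can be sketched in a few lines of Python. The sketch below uses scipy in place of Excel, and the function name chi_square_gof and its argument names are illustrative, not part of the lecture.

```python
# Minimal sketch of the chi-square goodness of fit computation (scipy assumed).
from scipy.stats import chi2

def chi_square_gof(observed, expected, n_estimated_params):
    """Return the chi-square statistic, degrees of freedom, and p-value."""
    # Test statistic: sum of (O_i - E_i)^2 / E_i over all cells.
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of freedom: k - m - 1, where k = number of cells and
    # m = number of parameters estimated from the data.
    df = len(observed) - n_estimated_params - 1
    # p-value: probability of a statistic at least this large under H0.
    p_value = chi2.sf(stat, df)
    return stat, df, p_value
```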

Example 1

Let us treat the historical dataset on the number of men killed by horse kicks in the Prussian Army formally. This sample has $n=200$ observations.$$ \begin{aligned} \begin{array}{c|ccccc} X & 0 & 1 & 2 & 3 & 4 \\ \hline \text { freq. } & 109 & 65 & 22 & 3 & 1 \end{array} \end{aligned} $$Previously we computed$$\overline{X}=0.61; \quad s^2=0.611; \quad CD=1.002 $$The coefficient of dispersion $CD$ is close to 1, so we expect that the Poisson distribution is a good fit for this data.

To run the $\chi^2-$goodness of fit test, we need to compute the expected frequency for each category. The expected frequency for each category is given by $E_i=nP(x_i)=200 P(x_i)$, where $P(x_i)$ is the probability of the $i^{th}$ category. These probabilities are computed using the Poisson distribution with parameter $\mu=\overline X = 0.61$.$$ \begin{array}{c|cccc} x & 0 & 1 & 2 & \geq 3 \\ \hline \text { observed freq. } & 109 & 65 & 22 & 4 \\ \hline P(x) & 0.543 & 0.331 & 0.101 & 0.024 \\ \hline \text { predicted freq. } & 108.7 & 66.3 & 20.2 & 4.8 \end{array} $$Notice that the last two columns of the data table were merged. In a $\chi^2-$goodness of fit test, we need at least 3 observations in each category, so cells containing fewer than 3 counts should be merged.
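
The expected frequencies in this table can be reproduced, up to rounding, with a short Python sketch (scipy is assumed here in place of Excel):

```python
# Sketch of the expected-frequency computation for Example 1 (Poisson model).
from scipy.stats import poisson

n, mu = 200, 0.61
observed = [109, 65, 22, 3 + 1]             # cells 0, 1, 2, >=3 (last two merged)

p = [poisson.pmf(k, mu) for k in range(3)]  # P(0), P(1), P(2)
p.append(poisson.sf(2, mu))                 # P(X >= 3) = 1 - P(X <= 2)
expected = [n * pk for pk in p]
# expected is approximately [108.7, 66.3, 20.2, 4.8], matching the table.
```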

Next we run the $\chi^2-$goodness of fit test. The null and alternative hypotheses are:$\; H_0:$ Poisson is an appropriate model.
$\; H_1:$ Poisson is not an appropriate model.

The test statistic is given by$$\begin{aligned} \chi^2 &=\sum \frac{\left(O_i-E_i\right)^2}{E_i} \\ &=\frac{(109-108.7)^2}{108.7}+\frac{(65-66.3)^2}{66.3}+\frac{(22-20.2)^2}{20.2}+\frac{(4-4.8)^2}{4.8} \\ &=0.331 \end{aligned}$$The degrees of freedom for this test are equal to $k-m-1,$ where $k$ is the number of cells and $m$ is the number of parameters we estimated from the data. In this case, $m=1$ because we estimated one parameter, the population mean $\mu$, from the data. So, $df=4-1-1=2$.

Next we look up the $p-$value for the test statistic in Excel or in a $\chi^2-$table. The $p-$value is the probability of observing a test statistic as extreme as the one we observed, assuming the null hypothesis is true. In this case, Excel tells us that the $p-$value is $0.847$.$$ p-value =0.847 \qquad 0.50 \leq p-value \leq 0.90 $$Observing a dataset as extreme as the one we observed is not rare. Since the $p-$value is more than $0.05$, we fail to reject the null hypothesis $H_0$. The null hypothesis is that the Poisson is an appropriate distribution. There is no indication that the Poisson distribution is not an appropriate fit for this data.
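
For reference, the whole test can also be run in Python. Since the sketch below plugs in the expected frequencies rounded to one decimal place, the statistic and $p-$value come out slightly different from the quoted $0.331$ and $0.847$.

```python
# Sketch of the test for Example 1, using the rounded expected frequencies above.
from scipy.stats import chi2

observed = [109, 65, 22, 4]
expected = [108.7, 66.3, 20.2, 4.8]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                 # one estimated parameter (mu)
p_value = chi2.sf(stat, df)
print(round(stat, 3), df, round(p_value, 3))   # roughly 0.32, 2, 0.85
```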

Example 2

Consider again the dataset consisting of $200$ families with $3$ children each. Let $X=$ the number of boys in a family. A standard computation yields$$\overline{X}=1.52; \quad s^2=0.753; \quad CD=0.5 $$The coefficient of dispersion $CD$ is less than 1, so we expect that the Binomial distribution is a good fit for this data. We estimate the parameter $p = \frac{\mu}{n}$ with $p=\frac{\overline{X}}{3}=0.5067.$ Next we compute the probabilities and the expected frequency for each category using the Binomial formula with parameters $p=0.5067$ and $n=3$. The expected frequency for each category is given by $E_i=200\cdot P(x_i)$.$$ \begin{array}{c|cccc} x & 0 & 1 & 2 & 3 \\ \hline \text { Observed freq. } & 24 & 74 & 76 & 26 \\ \hline P(x) & 0.120 & 0.370 & 0.380 & 0.130 \\ \hline \text { Predicted freq. } & 24.01 & 73.99 & 75.99 & 26.01 \end{array} $$$$ \chi^2 =\frac{(24-24.01)^2}{24.01}+\cdots+\frac{(26-26.01)^2}{26.01}=0.00002; \quad df=4-1-1=2 $$The $p-$value for the test statistic is$$ p-value =0.9999; \quad p-value > 0.95 $$The null hypothesis is that the Binomial is an appropriate distribution. Since the $p-$value is more than $0.05$, we fail to reject the null hypothesis. There is no indication that the Binomial distribution is not an appropriate fit for this data.
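
The same check can be sketched in Python, again with scipy standing in for Excel:

```python
# Sketch of the check for Example 2 (Binomial model with n = 3 trials).
from scipy.stats import binom, chi2

n_obs, n_trials, p_hat = 200, 3, 0.5067
observed = [24, 74, 76, 26]

probs = [binom.pmf(k, n_trials, p_hat) for k in range(4)]
expected = [n_obs * pk for pk in probs]       # ~ [24.0, 74.0, 76.0, 26.0]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1                    # one estimated parameter (p)
p_value = chi2.sf(stat, df)                   # very close to 1
```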

Example 3

Consider again the number of aquatic invertebrates in a quadrat on the lake bottom. This sample has $n=400$ observations. In the original table the count in the last category was less than 3, so we merged the last two categories. A standard computation yields$$\overline{X}=0.68; \quad s^2=0.795; \quad CD=1.17 $$The coefficient of dispersion $CD$ is larger than 1, so we expect that the negative binomial distribution is a good fit for this data. To estimate the parameters of the negative binomial distribution, we use the formulas from the previous lecture. We estimate the parameters $p=\frac{\overline{X}}{s^2}=0.856$ and $r=\frac{\overline{X}^2}{s^2-\overline{X}}=4.04$. Next we compute the probabilities and the expected frequency $E_i=400\cdot P(x_i)$ for each category using Excel.$$ \begin{array}{c|ccccc} x & 0 & 1 & 2 & 3 & 4\\ \hline \text { Observed freq. } & 213 & 128 & 37 & 18 & 4 \\ \hline P(x) & 0.533 & 0.310 & 0.113 & 0.033 & 0.008 \\ \hline \text { Predicted freq. } & 213.4 & 124.1 & 45.1 & 13.1 & 3.3 \end{array} $$The value of the test statistic is$$ \chi^2=\frac{(213-213.4)^2}{213.4}+\cdots+\frac{(4-3.3)^2}{3.3}=3.560 \quad df=5-2-1=2 $$Here $k=5$ because the table has five cells, and $m=2$ because we estimated two parameters, $r$ and $p$, from the data. The $p-$value for the test statistic is$$ p-value =0.169; \quad 0.10 \leq p-value \leq 0.50 $$The null hypothesis is that the Negative Binomial is an appropriate distribution. Since the $p-$value is more than $0.05$, we fail to reject the null hypothesis. We found no sufficient evidence to claim that the Negative Binomial distribution is not an appropriate model for this data. Notice however that the $p-$value is noticeably smaller than in the previous two examples, so while we retain the model, the fit here is less convincing.
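
A Python sketch of this computation is given below. It assumes scipy's negative binomial parametrization (number of successes $r$ and success probability $p$, with mean $r(1-p)/p$), which agrees with the estimates above, and it treats the last cell as $x=4$, as in the table.

```python
# Sketch of the check for Example 3 (Negative Binomial model).
from scipy.stats import nbinom, chi2

n_obs, r, p_hat = 400, 4.04, 0.856
observed = [213, 128, 37, 18, 4]

probs = [nbinom.pmf(k, r, p_hat) for k in range(5)]
expected = [n_obs * pk for pk in probs]       # ~ [213.4, 124.1, 45.1, 13.1, 3.3]

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 2 - 1                    # two estimated parameters (r and p)
p_value = chi2.sf(stat, df)                   # statistic ~ 3.56-3.57, p-value ~ 0.17
```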