Lecture 5

Measures of Dispersion

Definition 1

The range is the difference between the largest and smallest values in the sample.

Example 1

Ladybug sizes (mm):

$$\begin{array}{llllllll} 7.6 & 8.4 & 8.8 & 7.5 & 9.2 & 9.0 & 8.9 & 6.2\end{array}$$

$$\text{Range} = 9.2-6.2 = 3.0 \, \mathrm{mm}$$

Remark

A much more useful measure of the variability (dispersion) of the values is the standard deviation.

Definition 2

Let $Y_1,Y_2,\dots,Y_n $ be sample values and let $\overline{Y}=\frac{1}{n}\sum Y_i$ be the sample mean.

Then the sample variance is defined by the following formula (the second form follows from the first by an algebraic identity):

$$s^2 = \frac{1}{n-1}\sum(Y_i-\overline{Y})^2 = \frac{1}{n-1}\left[\sum Y_i^2 - n\overline{Y}^2\right] $$

The sample standard deviation is the square root of the sample variance:

$$s=\sqrt{s^2} = \sqrt{\frac{1}{n-1}\sum(Y_i-\overline{Y})^2} = \sqrt{\frac{1}{n-1}\left[\sum Y_i^2 - n\overline{Y}^2\right]} $$
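The algebraic identity behind the second form of the variance can be checked by expanding the square and using $\sum Y_i = n\overline{Y}$ (all sums run over $i=1,\dots,n$):

$$\begin{aligned} \sum\left(Y_i-\overline{Y}\right)^2 &= \sum Y_i^2 - 2\overline{Y}\sum Y_i + n\overline{Y}^2 \\ &= \sum Y_i^2 - 2\overline{Y}\left(n\overline{Y}\right) + n\overline{Y}^2 \\ &= \sum Y_i^2 - n\overline{Y}^2 \end{aligned}$$

Dividing both sides by $(n-1)$ gives the two expressions for $s^2$.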

Remark

The formula for the sample variance has a denominator of $(n-1)$. There are $(n-1)$ independent deviations from the average. Indeed, by the very definition of average the sum of all deviations from the average is zero: $\sum\left(Y_i-\overline{Y}\right)=0.$ In technical terms we say that there are $(n-1)$ degrees of freedom.
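As a short worked check of the claim that the deviations sum to zero (again with sums over $i=1,\dots,n$):

$$\sum\left(Y_i-\overline{Y}\right) = \sum Y_i - n\overline{Y} = n\overline{Y} - n\overline{Y} = 0$$

Consequently the last deviation is determined by the other $(n-1)$:

$$Y_n-\overline{Y} = -\sum_{i=1}^{n-1}\left(Y_i-\overline{Y}\right)$$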

Example 2

Ladybug data

$$\overline{Y}=\frac{1}{8}\left(7.6+8.4+\cdots +6.2 \right)=8.2 \, \mathrm{mm}$$

$$ \begin{array}{c|c|c} Y_i & Y_i-\overline{Y} & \left(Y_i-\overline{Y}\right)^2 \\ \hline 7.6 & -0.6 & 0.36 \\ 8.4 & 0.2 & 0.04 \\ 8.8 & 0.6 & 0.36 \\ 7.5 & -0.7 & 0.49 \\ 9.2 & 1.0 & 1.00 \\ 9.0 & 0.8 & 0.64 \\ 8.9 & 0.7 & 0.49 \\ 6.2 & -2.0 & 4.00 \\ \hline && 7.38 \end{array}$$

$$\begin{aligned} s^2&=\frac{1}{8-1} \left( 0.36+0.04+\cdots +4.00 \right) \\&=\frac{1}{7} (7.38) = 1.054286 \\ &\\ s&=\sqrt{1.054286} = 1.027 \, \mathrm{mm} \end{aligned}$$
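The hand computation can be cross-checked numerically. The following is a minimal Python sketch, not part of the lecture itself; the variable names are illustrative, and `statistics.stdev` from the standard library is used only as an independent check because it also uses the $(n-1)$ denominator.

```python
import math
import statistics

# Ladybug sizes (mm) from Example 1.
sizes = [7.6, 8.4, 8.8, 7.5, 9.2, 9.0, 8.9, 6.2]
n = len(sizes)
mean = sum(sizes) / n

# Definition: sum of squared deviations divided by (n - 1).
var_def = sum((y - mean) ** 2 for y in sizes) / (n - 1)

# Shortcut form: (sum of squares - n * mean^2) divided by (n - 1).
var_short = (sum(y * y for y in sizes) - n * mean ** 2) / (n - 1)

# Sample standard deviation is the square root of the sample variance.
s = math.sqrt(var_def)

print(f"mean             = {mean:.2f} mm")
print(f"s^2 (definition) = {var_def:.6f}")
print(f"s^2 (shortcut)   = {var_short:.6f}")
print(f"s                = {s:.3f} mm")
print(f"statistics.stdev = {statistics.stdev(sizes):.3f} mm")
```

The two variance formulas should agree up to floating-point rounding. In hand calculations the shortcut form can lose precision when the mean is rounded before squaring, so the definition form is the safer one to check against.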

Remark

It is possible to get a rough estimate for the standard deviation from the range. Namely, the standard deviation is approximated as the range divided by 4. We are not going to use this approximation, since it could be quite inaccurate.
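To see how rough the range-based estimate can be, here is a small Python sketch (an illustration only, reusing the ladybug data from Example 1) that compares range divided by 4 with the sample standard deviation.

```python
import statistics

# Ladybug sizes (mm) from Example 1.
sizes = [7.6, 8.4, 8.8, 7.5, 9.2, 9.0, 8.9, 6.2]

sample_range = max(sizes) - min(sizes)
rough = sample_range / 4        # range-based rough estimate of s
s = statistics.stdev(sizes)     # sample standard deviation, (n - 1) denominator

print(f"range / 4      = {rough:.3f} mm")
print(f"s              = {s:.3f} mm")
print(f"relative error = {abs(rough - s) / s:.1%}")
```

For this sample the rough estimate falls noticeably below $s$, largely because the single small value 6.2 contributes heavily to the sum of squared deviations.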

Remark

What happens to the sample mean $\overline{Y}$ and the standard deviation $s$ if we:


1. Shift all values $\lbrace Y_1, \dots, Y_n \rbrace \rightarrow \lbrace Y_1+a, \dots, Y_n+a \rbrace $?


Then, $\overline{Y}\rightarrow \overline{Y}+a$ and $s\rightarrow s.$

Thus, the mean shifts but the standard deviation does not. Adding the same quantity to all values in the sample shifts the mean by the same quantity. On the other hand the standard deviation is a measure of the spread of the values and does not change.


2. Scale all values $\lbrace Y_1, \dots, Y_n \rbrace \rightarrow \lbrace k\cdot Y_1, \dots, k\cdot Y_n \rbrace $?


Then, $\overline{Y}\rightarrow k\cdot \overline{Y}$ and $s\rightarrow |k|\cdot s.$

Thus, if we rescale all sample values by the same factor $k$, then the mean scales by $k$ and the standard deviation by $|k|$, since a standard deviation is never negative. For $k>0$ both scale by the same factor. For example, such rescaling happens if we change the units of measurement.
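Both rules can be checked numerically. The sketch below is illustrative only: the helper `mean_and_sd`, the shift `a = 10.0`, and the factor `k = 0.1` (converting millimetres to centimetres) are assumptions chosen for the demonstration, applied to the ladybug data.

```python
import statistics


def mean_and_sd(values):
    """Return (sample mean, sample standard deviation) of a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


sizes_mm = [7.6, 8.4, 8.8, 7.5, 9.2, 9.0, 8.9, 6.2]
a = 10.0  # hypothetical shift added to every value
k = 0.1   # positive scale factor: mm -> cm

mean0, sd0 = mean_and_sd(sizes_mm)
mean_shift, sd_shift = mean_and_sd([y + a for y in sizes_mm])
mean_scale, sd_scale = mean_and_sd([k * y for y in sizes_mm])

print(f"original: mean = {mean0:.3f}, s = {sd0:.3f}")
print(f"shifted:  mean = {mean_shift:.3f}, s = {sd_shift:.3f}")
print(f"scaled:   mean = {mean_scale:.4f}, s = {sd_scale:.4f}")
```

Shifting should move the mean by `a` and leave `s` unchanged, while scaling by the positive factor `k` should multiply both by `k`.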

Remark

The value of the standard deviation depends on the units used. To normalize the standard deviation, in many contexts it is meaningful to divide it by the mean. This gives us the coefficient of variation which is a dimensionless quantity (no units can be assigned to it).

Definition 3

The coefficient of variation is defined as follows:

$$CV=\frac{s}{\overline{Y}}$$

Example 3

Fishing in the Lake of Two Mountains over the weekend, you caught six pumpkinseed with weights (g)

$$ \begin{array}{llllll} 540 & 520 & 460 & 600 & 580 & 570 \end{array}$$

and five brown bullhead with weights (g)

$$ \begin{array}{lllll} 850 & 2120 & 3160 & 940 & 1050 \end{array}$$

Based on these samples, compare the coefficients of variation of the weights of the two species.

Solution

Pumpkinseed

$$ \begin{array}{c|c} Y & (Y-\overline{Y})^2 \\ \hline 540 & 25 \\ 520 & 625 \\ 460 & 7225 \\ 600 & 3025 \\ 580 & 1225 \\ 570 & 625 \\ \hline 3270 & 12750 \end{array} $$

$$\overline{Y}_P =\frac{3270}{6}=545 \quad s^2_P =\frac{1}{5}(12750)=2550 \quad s_P=50.4975 $$

$$CV_P=\frac{50.4975}{545}=0.093 = 9.3\% $$

Brown Bullhead

$$ \begin{array}{c|c} Y & (Y-\overline{Y})^2 \\ \hline 850 & 599076 \\ 2120 & 246016 \\ 3160 & 2359296 \\ 940 & 467856 \\ 1050 & 329476 \\ \hline 8120 & 4001720 \end{array} $$

$$\overline{Y}_B =\frac{8120}{5}=1624 \quad s^2_B =\frac{1}{4}(4001720)=1000430 \quad s_B=1000.215 $$

$$CV_B=\frac{1000.215}{1624}=0.6159 = 61.59\% $$

Comparing the coefficients of variation, we see that the brown bullhead weights are far more variable relative to their mean (about 62%) than the pumpkinseed weights (about 9%).
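The two coefficients of variation can be recomputed with a few lines of Python. This is a sketch for checking the arithmetic, not part of the original solution; the function name `coefficient_of_variation` is an illustrative choice, and it assumes a sample with a nonzero mean.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean (dimensionless)."""
    return statistics.stdev(values) / statistics.mean(values)


pumpkinseed = [540, 520, 460, 600, 580, 570]   # weights (g)
brown_bullhead = [850, 2120, 3160, 940, 1050]  # weights (g)

cv_p = coefficient_of_variation(pumpkinseed)
cv_b = coefficient_of_variation(brown_bullhead)

print(f"CV pumpkinseed    = {cv_p:.1%}")
print(f"CV brown bullhead = {cv_b:.1%}")
```

Because the coefficient of variation is dimensionless, the same values would be obtained if the weights were recorded in kilograms instead of grams.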