Lecture 3

Frequency Distributions

Frequency Table of a Qualitative Variable

Example 1

Fish caught and released in the Lake of Two Mountains in four days.

Frequency Table of a Discrete Variable

Example 2

Number of poison ivy plants per quadrat in a suburban park.

Frequency Table of a Continuous Variable

Remark

For continuous variables, we need to group the data into classes (bins). The number of classes depends on the size of the dataset: 4 to 8 classes for small datasets, and up to 20 classes for large ones. The class width is calculated as follows:

$$ \text{Class Width} = \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of Classes}} $$

The class width is then rounded up to the precision of the data (for example, to one decimal place if the data are recorded to one decimal place).
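The rule above can be sketched in Python. This is only an illustration: the function name `class_width` and its `precision` parameter are assumptions introduced here, and the numbers in the usage line are hypothetical.

```python
import math
from decimal import Decimal


def class_width(smallest, largest, n_classes, precision):
    """Return (largest - smallest) / n_classes, rounded UP to a multiple of precision."""
    # Decimal keeps values such as 0.56 exact, so the rounding-up step
    # is not thrown off by binary floating-point error.
    raw = (Decimal(str(largest)) - Decimal(str(smallest))) / n_classes
    step = Decimal(str(precision))
    return float(math.ceil(raw / step) * step)


# Hypothetical values recorded to one decimal place, split into 4 classes.
print(class_width(2.0, 5.0, 4, 0.1))  # raw width 0.75 is rounded up to 0.8
```

Rounding up rather than to the nearest value guarantees that the classes together span the whole range of the data.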

Example 3

Size of ladybugs that I found in my apple trees (in mm).

$$ \begin{array}{llllllllll} 7.6 & 7.2 & 8.6 & 7.0 & 8.6 & 6.6 & 8.6 & 7.8 & 8.6 & 7.6 \\ 8.2 & 7.8 & 8.8 & 7.6 & 9.4 & 7.2 & 8.2 & 8.8 & 9.0 & 7.2 \\ 7.6 & 8.2 & 7.2 & 8.4 & 7.8 & & & & & \end{array} $$

Let's group the data into 5 classes (use up to 20 classes for really large datasets).

$$ \text{Class Width} = \frac{9.4 - 6.6}{5} = 0.56 \quad \rightarrow \quad 0.6 $$
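As a rough sketch, the frequency table for these data can be tallied in Python. Two assumptions are made here that the lecture does not state: the first class starts at the smallest value (6.6 mm), and each class includes its lower limit but not its upper limit.

```python
ladybugs_mm = [
    7.6, 7.2, 8.6, 7.0, 8.6, 6.6, 8.6, 7.8, 8.6, 7.6,
    8.2, 7.8, 8.8, 7.6, 9.4, 7.2, 8.2, 8.8, 9.0, 7.2,
    7.6, 8.2, 7.2, 8.4, 7.8,
]

# Work in tenths of a millimetre so class limits are exact integers.
tenths = [round(x * 10) for x in ladybugs_mm]

n_classes = 5
width = 6            # class width of 0.6 mm, in tenths
start = min(tenths)  # assumed lower limit of the first class (6.6 mm)

counts = [0] * n_classes
for t in tenths:
    counts[(t - start) // width] += 1

for i, frequency in enumerate(counts):
    low = start + i * width
    high = low + width
    print(f"{low / 10:.1f} to <{high / 10:.1f}: {frequency}")

# Frequencies by class: 2, 8, 6, 7, 2
```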

Note that this distribution is bimodal: it has two peaks. Could these be the two sexes (males tend to be smaller)?

Example 4

Distributions (histograms) can be classified by shape as bell-shaped, skewed, or multimodal.

Stem and Leaf Displays

Remark

A stem and leaf plot is used to organize data as they are collected, and it looks something like a bar graph. Each number in the data is broken down into a stem and a leaf, hence the name. The stem includes all but the last digit of the number; the leaf is the last digit, so it is always a single digit. For example, 7.6 has stem 7 and leaf 6.

Example 5

Ladybug data

$$ \begin{array}{llllllllll} 7.6 & 7.2 & 8.6 & 7.0 & 8.6 & 6.6 & 8.6 & 7.8 & 8.6 & 7.6 \\ 8.2 & 7.8 & 8.8 & 7.6 & 9.4 & 7.2 & 8.2 & 8.8 & 9.0 & 7.2 \\ 7.6 & 8.2 & 7.2 & 8.4 & 7.8 & & & & & \end{array} $$

Stem & Leaf Plot

$$ \begin{array}{l|llllllllllll} 6 & 6 & & & & & & & & & & & \\ 7 & 6 & 2 & 0 & 8 & 6 & 8 & 6 & 2 & 2 & 6 & 2 & 8 \\ 8 & 6 & 6 & 6 & 6 & 2 & 8 & 2 & 8 & 2 & 4 & & \\ 9 & 4 & 0 & & & & & & & & & & \end{array} $$
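A minimal Python sketch of how such a display can be built from the raw ladybug data. The function name `stem_and_leaf` is an assumption made for illustration, and the code assumes values recorded to one decimal place; leaves are kept in the order in which the measurements appear.

```python
from collections import defaultdict

ladybugs_mm = [
    7.6, 7.2, 8.6, 7.0, 8.6, 6.6, 8.6, 7.8, 8.6, 7.6,
    8.2, 7.8, 8.8, 7.6, 9.4, 7.2, 8.2, 8.8, 9.0, 7.2,
    7.6, 8.2, 7.2, 8.4, 7.8,
]


def stem_and_leaf(values):
    """Map each stem (whole part) to its leaves (tenths digit), in data order."""
    plot = defaultdict(list)
    for x in values:
        # round(x * 10) turns 7.6 into 76; divmod splits it into stem 7, leaf 6.
        stem, leaf = divmod(round(x * 10), 10)
        plot[stem].append(leaf)
    return dict(sorted(plot.items()))


for stem, leaves in stem_and_leaf(ladybugs_mm).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting the leaves within each stem (for example with `sorted(leaves)`) gives an ordered stem and leaf plot, which makes the median easier to read off.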

Example 6

Back-to-back stem and leaf plot

$$ \begin{array}{r|c|l} \text{ Sample A} & & \text{ Sample B } \\ \hline 9 & 10 & 0\quad 5 \quad7\quad 7\quad8 \\ 9\quad 9\quad 8\quad 6\quad 2 & 11 & 1\quad 6 \\ 8\quad 0\quad 8\quad 1\quad 3\quad 7 & 12 & 0 \quad 1\quad 3 \end{array} $$

These two samples clearly differ in distribution.
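The display above can be rebuilt with a short Python sketch. The values below are read off the plot under the assumption that the stem holds the tens and the leaf the units (the lecture does not state the units), and the function name `back_to_back` is hypothetical. Unlike the display above, this sketch sorts the leaves so that they increase outward from the stem on both sides.

```python
# Values read from the plot, assuming stem = tens and leaf = units.
sample_a = [109, 119, 119, 118, 116, 112, 128, 120, 128, 121, 123, 127]
sample_b = [100, 105, 107, 107, 108, 111, 116, 120, 121, 123]


def back_to_back(a, b):
    """Return one (left leaves, stem, right leaves) row per stem."""
    rows = []
    for stem in sorted({x // 10 for x in a + b}):
        # Left-hand leaves are sorted in reverse so they grow away from the stem.
        left = sorted((x % 10 for x in a if x // 10 == stem), reverse=True)
        right = sorted(x % 10 for x in b if x // 10 == stem)
        rows.append((" ".join(map(str, left)), stem, " ".join(map(str, right))))
    return rows


for left, stem, right in back_to_back(sample_a, sample_b):
    print(f"{left:>12} | {stem} | {right}")
```

Counting leaves per stem makes the difference concrete: Sample A has 1, 5 and 6 values on stems 10, 11 and 12, while Sample B has 5, 2 and 3.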