Lecture 14

The Normal (Gaussian) Distribution

Bell shaped curves

Remark

Continuous values cannot be measured precisely and it is not possible to attach probabilities to individual values. Instead, we can attach probabilities to intervals of values. These probabilities are described as the area under a curve and above the interval.

Very frequently, but not always, the curve is a bell-shaped curve, called a Gaussian curve. Later in the course we will see why the Gaussian curve is so important, why it is so frequently used, and how to check that the Gaussian curve is an appropriate model for data. For the moment, we will just use it as a convenient way to describe the probability of obtaining a continuous random value within a given interval.

Definition 1

The Gaussian curve is a bell-shaped curve given by the equation

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

It is used to describe the probability of obtaining a continuous random value within a given interval.

The curve is described by two parameters, the mean $\mu$ and the standard deviation $\sigma$. The total area, in yellow, under the curve is 1. We will use the designations Gaussian, normal, and bell-shaped interchangeably.
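The formula in Definition 1 can be checked numerically. The following is a minimal sketch, not part of the original lecture: the helper names `gaussian_pdf` and `area_under_curve` and the parameter values are illustrative assumptions. It evaluates the curve directly from the formula, compares it with Python's standard-library `statistics.NormalDist`, and estimates the total area with the trapezoidal rule.

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mu, sigma):
    """Height of the Gaussian curve f(x) from Definition 1 (illustrative helper)."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_curve(mu, sigma, n=20_000, width=8.0):
    """Trapezoidal estimate of the area over [mu - width*sigma, mu + width*sigma]."""
    a = mu - width * sigma
    b = mu + width * sigma
    h = (b - a) / n
    total = 0.5 * (gaussian_pdf(a, mu, sigma) + gaussian_pdf(b, mu, sigma))
    for i in range(1, n):
        total += gaussian_pdf(a + i * h, mu, sigma)
    return total * h


mu, sigma = 2.3, 0.8  # assumed values, chosen only for illustration
peak = gaussian_pdf(mu, mu, sigma)
area = area_under_curve(mu, sigma)

print("height at the mean:", peak)
print("library value:     ", NormalDist(mu, sigma).pdf(mu))
print("estimated area:    ", area)
```

The estimated area should come out very close to 1 for any choice of $\mu$ and $\sigma > 0$, since almost none of the area lies more than eight standard deviations from the mean.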

Remark

The mean $\mu$ is the value around which the curve is centered. Curves with different means are shifted to the left or right.

The standard deviation $\sigma$ is a measure of the spread of the curve. Curves with different standard deviations are wider or narrower.

Remark

For any continuous random variable modelled by the Gaussian distribution the probability of obtaining a value within one standard deviation from the mean is approximately 68.26%.

The probability of obtaining a value within two standard deviations from the mean is approximately 95.44%.

The probability of obtaining a value within three standard deviations from the mean is approximately 99.74%.
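These three probabilities can be reproduced with software. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `prob_within` is an assumption made for this illustration. Computed without table rounding, the values are about 68.27%, 95.45% and 99.73%; the figures quoted above differ in the last digit, which is consistent with working from a table rounded to four decimals.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal X (illustrative helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The result does not depend on the particular mean and standard deviation, which is why the rule holds for every normal distribution.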

Remark

We can also ask the question: what interval symmetric around the mean contains a given percentage of the values? For example, we can ask what interval contains 95% of the values. The answer is the interval that extends 1.96 standard deviations on either side of the mean, from $\mu - 1.96\sigma$ to $\mu + 1.96\sigma$.

Similarly, the interval that extends 2.58 standard deviations on either side of the mean contains 99% of the values.
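The multipliers 1.96 and 2.58 can be recovered from the inverse of the cumulative distribution. A minimal sketch, assuming Python's `statistics.NormalDist`; the helpers `symmetric_z` and `symmetric_interval` are hypothetical names introduced here. The multiplier is measured from the mean to each end of the interval.

```python
from statistics import NormalDist


def symmetric_z(coverage):
    """z such that P(-z < Z < z) = coverage for a standard normal Z."""
    tail = (1.0 - coverage) / 2.0          # area left over in each tail
    return NormalDist().inv_cdf(1.0 - tail)


def symmetric_interval(coverage, mu, sigma):
    """Interval centred on mu that contains the given fraction of values."""
    z = symmetric_z(coverage)
    return mu - z * sigma, mu + z * sigma


print(round(symmetric_z(0.95), 2))   # multiplier for 95% coverage
print(round(symmetric_z(0.99), 2))   # multiplier for 99% coverage
print(symmetric_interval(0.95, mu=0.0, sigma=1.0))
```

Splitting the leftover probability equally between the two tails is what makes the interval symmetric around the mean.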

The standard normal distribution

Definition 2

The standard normal distribution is the Gaussian distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$. A value from the standard normal distribution is typically denoted by $z$.

Example 1

Probabilities for the standard normal distribution are tabulated in the standard normal table. These probabilities can also be looked up with the help of computer software, e.g. Excel.
For example, the probability of obtaining a standard normal value less than $-0.44$ is $0.33$.

Probabilities for values which are very far from the mean and fall outside the range of the standard normal table can be approximated:

$$ P(z<-4.47) \approx 0,\quad P(z<6.2) \approx 1. $$

The probability of obtaining a standard normal value greater than a given value is the complement of the probability of obtaining a value less than the given value. For example, the probability of obtaining a standard normal value greater than $-0.44$ is $P(z>-0.44) = 1- P(z<-0.44) = 1-0.33=0.67$.

The probability that a standard normal value will fall within a given interval can be calculated by subtracting the probability of obtaining a value less than the lower limit of the interval from the probability of obtaining a value less than the upper limit of the interval. For example, $P(-0.44<z<1.96)=P(z<1.96)-P(z<-0.44)=0.975-0.33=0.645$.
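As an alternative to the table or Excel, the three kinds of probability in this example (less than, greater than, between) can be computed in a few lines. This is a sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_less = Z.cdf(-0.44)                    # P(z < -0.44)
p_greater = 1.0 - p_less                 # P(z > -0.44), by the complement rule
p_between = Z.cdf(1.96) - Z.cdf(-0.44)   # P(-0.44 < z < 1.96)

print(round(p_less, 4), round(p_greater, 4), round(p_between, 4))

# Values far outside the range of the table: effectively 0 and 1.
print(Z.cdf(-4.47), Z.cdf(6.2))
```

The cumulative distribution function `cdf` plays the role of the table: it returns the area under the curve to the left of its argument.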

Z-scores

Theorem

If an observation $X$ is taken from a normal distribution with mean $\mu$ and standard deviation $\sigma$, then its z-score

$$ z = \frac{X-\mu}{\sigma} $$

is also normally distributed, with mean $0$ and standard deviation $1$. In other words, z-scores follow the standard normal distribution.

Example 2

The above theorem allows us to transfer probabilities between the standard normal distribution and any other normal distribution. Consider the normal distribution with mean $\mu = 2.3$ and standard deviation $\sigma = 0.8$. What is the probability of obtaining a value less than $1.1$?

We can calculate the z-score for $1.1$:

$$ z = \frac{1.1-2.3}{0.8} = -1.5 $$

The probability of obtaining a value less than $1.1$ is the probability of obtaining a value less than the z-score $-1.5$:

$$ P(X<1.1) = P(z<-1.5) = 0.0668 $$
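The calculation in Example 2 can be cross-checked in two ways: by standardizing first and using the standard normal distribution, or by asking the original distribution directly. The theorem says both routes must agree. A minimal sketch, assuming Python's `statistics.NormalDist`; `z_score` is a helper name chosen for this illustration.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


z = z_score(1.1, mu=2.3, sigma=0.8)

p_via_z = NormalDist().cdf(z)              # standard normal route
p_direct = NormalDist(2.3, 0.8).cdf(1.1)   # original distribution route

print(round(z, 2), round(p_via_z, 4), round(p_direct, 4))
```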

Remark

The z-score is a measure of how many standard deviations a value is from the mean. The z-score is positive if the value is above the mean and negative if the value is below the mean. The z-score is zero if the value is equal to the mean.

Example 3

The weights of Hermit Thrushes are normally distributed with mean $\mu = 28$ grams and standard deviation $\sigma = 4.8$ grams. How likely is it that a Hermit Thrush caught for banding has a weight of more than $34$ grams?

We can calculate the z-score for $34$:

$$ z = \frac{34-28}{4.8} = 1.25 $$

The probability of obtaining a value larger than $34$ in this population is the probability of obtaining a value more than $1.25$ standard deviations above the mean:

$$ P(X>34) = P(z>1.25) = 1-P(z<1.25) = 1- 0.8944 = 0.1056 $$
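The upper-tail probability in Example 3 can be sketched the same way, using the complement of the cumulative probability. The function name `prob_greater` is an assumption made for this illustration; it relies on Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def prob_greater(x, mu, sigma):
    """P(X > x) for a normal X, computed as 1 - P(X < x)."""
    return 1.0 - NormalDist(mu, sigma).cdf(x)


p_heavy = prob_greater(34, mu=28, sigma=4.8)
print(round(p_heavy, 4))
```

So, under this model, roughly one Hermit Thrush in ten would weigh more than 34 grams.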

Example 4

The common garter snake, which is the state reptile of Massachusetts, has litter sizes that are approximately normally distributed with mean $\mu = 64$ and standard deviation $\sigma = 25$. What proportion of litters are in the range $[30, 80]$?

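We need to calculate the z-scores for both interval bounds:

$$ z_1 = \frac{30-64}{25} = -1.36, \quad z_2 = \frac{80-64}{25} = 0.64 $$

The proportion is the probability below the upper bound minus the probability below the lower bound:

$$ P(30\leq X \leq 80) \approx P(-1.36\leq z \leq 0.64) = P(z \leq 0.64) - P(z \leq -1.36) = 0.7389 - 0.0869 = 0.6520 $$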
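The interval calculation in Example 4 follows the same subtraction rule used for the standard normal distribution. A minimal sketch, assuming Python's `statistics.NormalDist`; `prob_between` is a hypothetical helper name.

```python
from statistics import NormalDist


def prob_between(lo, hi, mu, sigma):
    """P(lo < X < hi) for a normal X: area below hi minus area below lo."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(hi) - dist.cdf(lo)


p_litter = prob_between(30, 80, mu=64, sigma=25)
print(round(p_litter, 4))
```

Because the distribution is continuous, it makes no difference whether the interval bounds are included or excluded.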

Percentiles and z-scores

Remark

The z-score describes the location of a value in a normal distribution in terms of standard deviations from the mean. But we can also describe the location of a value in terms of percentiles. The percentile of a value is the percentage of values in the distribution that are less than the value. The z-score and the percentile are related and we can use the z-score to find the percentile and vice versa. For example, the z-score for the $95$th percentile is approximately $1.645$, while $z = 1.96$ corresponds to the $97.5$th percentile.

We can also relate the z-score to an actual value in the distribution by using the mean and standard deviation.

$$ X = \mu + z\sigma $$

This equation simply follows from the definition of the z-score by using simple algebra to solve for the value $X$.
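Spelled out, the algebra is a two-step rearrangement of the z-score formula: multiply both sides by $\sigma$, then add $\mu$.

$$ z = \frac{X-\mu}{\sigma} \quad\Longrightarrow\quad z\sigma = X - \mu \quad\Longrightarrow\quad X = \mu + z\sigma $$

As a quick check with the numbers from Example 2, $\mu = 2.3$, $\sigma = 0.8$ and $z = -1.5$ give $X = 2.3 + (-1.5)(0.8) = 1.1$, which is the value we started from.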

Example 5

Consider a normal population with mean $\mu = 24.2$ and standard deviation $\sigma = 3.5$. What is the $10$th percentile, $P_{10}$, and what is the $75$th percentile, $P_{75}$, of this population?

We can look up the z-score for the $10$th percentile: $z = -1.28$. The value at the $10$th percentile is

$$ X = 24.2 - 1.28 \times 3.5 = 19.72 $$

Similarly, the z-score for the $75$th percentile is $z = 0.67$. The value at the $75$th percentile is

$$ X = 24.2 + 0.67 \times 3.5 \approx 26.55 $$
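The two steps of Example 5 (look up the z-score for a percentile, then convert it with $X = \mu + z\sigma$) can be sketched in code. This assumes Python's `statistics.NormalDist`, whose `inv_cdf` method replaces the reverse table lookup; `percentile_value` is a helper name introduced for this illustration. Because the software does not round $z$ to two decimals, its answers can differ from the hand calculation in the second decimal place.

```python
from statistics import NormalDist


def percentile_value(p, mu, sigma):
    """Value below which p percent of a normal population lies."""
    z = NormalDist().inv_cdf(p / 100.0)   # z-score for the p-th percentile
    return mu + z * sigma                 # convert back to the original scale


p10 = percentile_value(10, mu=24.2, sigma=3.5)
p75 = percentile_value(75, mu=24.2, sigma=3.5)

print(round(p10, 2), round(p75, 2))
```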

Example 6

The adult grey wolf males in a certain area of Northern Ontario have an average weight of $\mu = 44$ kg and standard deviation $\sigma = 3.5$ kg. Assuming the weights are normally distributed, determine the weights that mark off the bottom 1% and the top 1% of the weight distribution.

For both the bottom and the top 1% we need to look up the z-scores and then use the z-scores to find the actual weights. The bottom 1% lies below the 1st percentile.

$$ P_1 \rightarrow z = -2.33 \rightarrow X = 44 - 2.33\times 3.5 \approx 35.85\ \text{kg} $$

The top 1% lies above the 99th percentile.

$$ P_{99} \rightarrow z = 2.33 \rightarrow X = 44 + 2.33\times 3.5 \approx 52.16\ \text{kg} $$

Example 7

In the same context as in the previous example, a random adult male grey wolf is captured in this area of Northern Ontario and its weight is measured. The weight of the wolf is $X = 34$ kg. Is the event of finding a wolf of this weight unlikely?

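Yes, the event of finding a wolf of this weight is highly unlikely. The weight of $34$ kg is below the 1st percentile of about $35.85$ kg found in the previous example, so this wolf's weight is in the bottom 1% of the weight distribution. Its z-score is $z = \frac{34-44}{3.5} \approx -2.86$, and $P(z<-2.86) \approx 0.0021$, so only about 0.2% of wolves in this population are this light or lighter.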
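The reasoning in Examples 6 and 7 runs in both directions: from a percentile to a weight, and from an observed weight back to a probability. The following sketch assumes Python's `statistics.NormalDist` and that the wolf weights are normally distributed; the variable names are illustrative. The cutoffs it computes use the unrounded z-score, so they can differ slightly from the hand-calculated ones.

```python
from statistics import NormalDist

wolves = NormalDist(mu=44, sigma=3.5)   # weights in kg, parameters from Example 6

# Percentile -> weight: cutoffs for the bottom 1% and the top 1%.
low_cutoff = wolves.inv_cdf(0.01)
high_cutoff = wolves.inv_cdf(0.99)

# Weight -> probability: how unusual is a 34 kg wolf?
weight = 34
z = (weight - wolves.mean) / wolves.stdev
p_at_most = wolves.cdf(weight)          # P(X < 34)
in_bottom_1_percent = weight < low_cutoff

print(round(low_cutoff, 2), round(high_cutoff, 2))
print(round(z, 2), round(p_at_most, 4), in_bottom_1_percent)
```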
