Lecture 7

Conditional Probability, Bayes' Rule

Conditional Probability

Remark

Conditional probability is the probability of an event occurring given that another event has already occurred. Observing an event allows us to update our belief about the probabilities of all other events. The rule for these updates is called Bayes' rule. To build intuition for conditional probability, we will start with examples.

Example 1

Apple trees on a farm in Saint-Joseph-du-Lac:$$ \begin{array}{l|cc|c} & \text { Soft } & \text { Hard } & \\ \hline \text { Red } & 500 & 2500 & 3000 \\ \text { Green } & 1500 & 500 & 2000 \\ \hline & 2000 & 3000 & 5000 \end{array}$$What is the probability that a randomly selected apple tree bears red apples given that it bears hard apples?

Given the observation that the tree bears hard apples, we can restrict our attention to the second column of the table. There are 3000 trees bearing hard apples in total, of which 2500 bear red apples. Therefore, the probability that a randomly selected apple tree bears red apples given that it bears hard apples is$$P(R | H)=\frac{2500}{3000}=0.833 \quad \Rightarrow \quad 83.3\% $$Notice the notation $P(R | H)$ for this conditional probability. There are two events, $R$ and $H$, and the vertical bar $|$ separates the event we are interested in, $R$, from the event that we are conditioning on, $H$.

Notice that the same conditional probability can be computed as a ratio of probabilities (instead of a ratio of counts):$$P(R | H)=\frac{P(R\cap H)}{P(H)} = \frac{2500/5000}{3000/5000}=0.833 $$
Let us compute a few more conditional probabilities:$$P(S | R)=\frac{P(S\cap R)}{P(R)} = \frac{500/5000}{3000/5000}=0.167 $$$$P(R | S)=\frac{P(S\cap R)}{P(S)} = \frac{500/5000}{2000/5000}=0.25 $$
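The count-ratio calculations above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the helper name `conditional` are choices made for this sketch, not part of the lecture. Exact fractions are used so that the results can be compared with the hand computations.

```python
from fractions import Fraction

# Tree counts from the table in Example 1, keyed by (color, texture).
counts = {
    ("Red", "Soft"): 500, ("Red", "Hard"): 2500,
    ("Green", "Soft"): 1500, ("Green", "Hard"): 500,
}

def conditional(counts, event, given):
    """P(event | given); event and given are predicates on a table key."""
    n_given = sum(n for key, n in counts.items() if given(key))
    n_both = sum(n for key, n in counts.items() if given(key) and event(key))
    return Fraction(n_both, n_given)

def is_red(key):
    return key[0] == "Red"

def is_soft(key):
    return key[1] == "Soft"

def is_hard(key):
    return key[1] == "Hard"

p_r_given_h = conditional(counts, is_red, is_hard)   # 2500/3000 = 5/6
p_s_given_r = conditional(counts, is_soft, is_red)   # 500/3000 = 1/6
p_r_given_s = conditional(counts, is_red, is_soft)   # 500/2000 = 1/4
print(float(p_r_given_h), float(p_s_given_r), float(p_r_given_s))
```

Conditioning amounts to restricting the sum to the cells where the given event holds, which mirrors restricting attention to one row or column of the table.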

Definition 1

The conditional probability of the event $A$ given that the event $B$ has occurred is defined, provided $P(B) > 0$, as:$$P(A | B)=\frac{P(A\cap B)}{P(B)}$$

Remark

If we want to know the probability of $A$ and $B$ occurring together, we can use the following equation which is a rearrangement of the definition of conditional probability:$$P(A \cap B) = P(A | B)\ P(B)$$This equation has the following interpretation: the probability of $A$ and $B$ occurring is the probability of $A$ occurring given that $B$ has occurred times the probability of $B$ occurring.
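As a quick check of this rearrangement, here is a worked instance using the apple tree counts from Example 1 (the numbers come from that table, nothing new is assumed):

$$P(R \cap H) = P(R | H)\ P(H) = \frac{2500}{3000}\times\frac{3000}{5000} = \frac{2500}{5000} = 0.5$$

This agrees with reading the joint count directly from the table: 2500 of the 5000 trees bear apples that are both red and hard.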

Remark

Notice that in general $P(A | B) \neq P(B | A)$. In conditional probability, the order of the events matters, both in terms of the algebraic expression and in terms of the interpretation.

Example 2

This example is based on a sample of fish from the Orinoco river. The sample contains 200 fish. The labels of the column headings are as follows:

  • $D $: Detritivore (feeds on dead organic material)
  • $O $: Omnivore (will eat just about anything)
  • $I $: Invertivore (only feeds on invertebrates)
  • $P $: Piscivore (eats other fish)

For the row labels we have $Y$ if the fish had an empty stomach and $N$ if the fish had a non-empty stomach.$$ \begin{array}{l|cccc|c} & D & O & I & P & \\ \hline Y & 12 & 1 & 24 & 5 & 42 \\ N & 38 & 69 & 16 & 35 & 158 \\ \hline & 50 & 70 & 40 & 40 & 200\end{array}$$We will compute a number of unconditional and conditional probabilities as frequency ratios. The first three are unconditional probabilities. In the third one the probabilities add because a fish belongs to only one diet category, so $O$ and $P$ are disjoint:$$P(Y) = \frac{42}{200} = 0.21, \quad P(O) = \frac{70}{200} = 0.35 $$$$ P(O\cup P) = \frac{70}{200} + \frac{40}{200} = 0.55$$The next six are conditional probabilities. Notice that switching the order of the events in the conditional probability changes the result:

    $P(Y | O) = \frac{1}{70} = 0.0143$: The probability that an Omnivore has an empty stomach is $1.43\%$.

    $P(Y | D) = \frac{12}{50} = 0.24$: The probability that a Detritivore has an empty stomach is $24\%$.

    $P(N | P) = \frac{35}{40} = 0.875$: The probability that a Piscivore has a non-empty stomach is $87.5\%$.

    $P(O | Y) = \frac{1}{42} = 0.0238$: The probability that a fish with an empty stomach is an Omnivore is $2.38\%$.

    $P(D | Y) = \frac{12}{42} = 0.286$: The probability that a fish with an empty stomach is a Detritivore is $28.6\%$.

    $P(P | N) = \frac{35}{158} = 0.222$: The probability that a fish with a non-empty stomach is a Piscivore is $22.2\%$.
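The effect of switching the order of the events can be made concrete with a small sketch. The nested-dictionary layout and the two helper names below are assumptions of this illustration. The point is that $P(Y | O)$ divides by a column total while $P(O | Y)$ divides by a row total.

```python
# Counts from the table in Example 2: rows are stomach state, columns are diet.
table = {
    "Y": {"D": 12, "O": 1, "I": 24, "P": 5},
    "N": {"D": 38, "O": 69, "I": 16, "P": 35},
}

def p_row_given_col(table, row, col):
    """P(row | col): the cell count divided by the column total."""
    col_total = sum(table[r][col] for r in table)
    return table[row][col] / col_total

def p_col_given_row(table, row, col):
    """P(col | row): the cell count divided by the row total."""
    row_total = sum(table[row].values())
    return table[row][col] / row_total

p_y_given_o = p_row_given_col(table, "Y", "O")  # 1 / 70
p_o_given_y = p_col_given_row(table, "Y", "O")  # 1 / 42
print(round(p_y_given_o, 4), round(p_o_given_y, 4))  # 0.0143 0.0238
```

Both functions read the same cell of the table; only the denominator changes, which is why the two conditional probabilities differ.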

    Independence

    Example 3

    Draw two cards $\it without\ replacement $ from a deck of 52 cards. What is the probability that the second card is a Spade given that the first card is a Spade? Since the first card is a Spade, there are 12 Spades left among the remaining 51 cards.$$ P(S_2 | S_1) = \frac{12}{51}$$Now, draw two cards $\it with\ replacement $ from a deck of 52 cards. What is the probability that the second card is a Spade given that the first card is a Spade?$$ P(S_2 | S_1) = \frac{13}{52} = \frac{1}{4}$$The probability of drawing a Spade on the second draw (with replacement) is the same whether the first card was a Spade or not.
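Both card calculations can be checked by brute-force enumeration of ordered pairs of cards. This is a minimal sketch: the deck encoding and the function name are assumptions made for illustration. `itertools.permutations` models drawing without replacement, and `itertools.product` models drawing with replacement.

```python
from fractions import Fraction
from itertools import permutations, product

# A deck as (rank, suit) pairs; the suit letters are labels chosen for this sketch.
deck = [(rank, suit) for suit in "SHDC" for rank in range(1, 14)]

def p_second_spade_given_first_spade(pairs):
    """Count ordered pairs (first, second) to get P(S_2 | S_1)."""
    first_spade = [(a, b) for a, b in pairs if a[1] == "S"]
    both_spade = [(a, b) for a, b in first_spade if b[1] == "S"]
    return Fraction(len(both_spade), len(first_spade))

# Without replacement: the two cards are distinct.
without_replacement = p_second_spade_given_first_spade(permutations(deck, 2))
# With replacement: the same card may be drawn twice.
with_replacement = p_second_spade_given_first_spade(product(deck, repeat=2))

print(without_replacement, with_replacement)  # 4/17 1/4
```

Note that `Fraction` reduces $12/51$ to $4/17$, so the printed value is the same number as in the example.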

    Definition 2

    Two events $A$ and $B$ are independent if the occurrence of one does not affect the probability of the occurrence of the other. There are three equivalent ways to express independence:
  • $P(A | B) = P(A)$
  • $P(B | A) = P(B)$
  • $P(A \cap B) = P(A)P(B)$
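
To see why these three conditions agree, here is a short derivation sketch from Definition 1, assuming $P(A) > 0$ and $P(B) > 0$ so that both conditional probabilities are defined:

$$P(A | B) = P(A) \iff \frac{P(A\cap B)}{P(B)} = P(A) \iff P(A\cap B) = P(A)P(B) \iff \frac{P(A\cap B)}{P(A)} = P(B) \iff P(B | A) = P(B)$$
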
    Example 4

    Consider the following data on eye color and gender.$$ \begin{array}{l|ccc|c} & \text{ Brown } & \text{ Blue } & \text{ Green } & \\ \hline \text{ M } & 40 & 30 & 10 & 80 \\ \text{ F } & 60 & 45 & 15 & 120 \\ \hline & 100 & 75 & 25 & 200 \end{array}$$We can compare unconditional and conditional probabilities to check for independence.$$ P(Br) = \frac{100}{200} = 0.5, \; P(Br | M) = \frac{40}{80} = 0.5, \; P(Br | F) = \frac{60}{120} = 0.5 $$$$ P(Bl) = \frac{75}{200} = 0.375, \; P(Bl | M) = \frac{30}{80} = 0.375, \; P(Bl | F) = \frac{45}{120} = 0.375 $$$$ P(Gr) = \frac{25}{200} = 0.125, \; P(Gr | M) = \frac{10}{80} = 0.125, \; P(Gr | F) = \frac{15}{120} = 0.125 $$This sample supports independence of eye color and gender. This is indeed approximately true in reality. Later in the course we will develop a robust test for independence, which takes into account the randomness of the sample.

    Example 5

    Referring back to the apple data above, we can check for independence of color and hardness. Again we compare unconditional and conditional probabilities.$$ P(R) = \frac{3000}{5000} = 0.6, \quad P(R | H) = \frac{2500}{3000} = 0.833 $$Since the two values differ, color and hardness are dependent. Indeed, observing that a tree bears hard apples increases the probability that it bears red apples.
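The checks in Examples 4 and 5 can be sketched with the product form $P(A \cap B) = P(A)P(B)$ applied to every cell of a table of counts. The function name and table layout are assumptions of this illustration. Multiplying through by the squared total turns the condition into an integer comparison, which avoids rounding issues. Requiring exact equality is a strict check that real samples will rarely pass, so this is not the robust test mentioned above.

```python
def counts_are_independent(table):
    """True if cell * total == row_total * col_total for every cell.

    This is P(row and col) == P(row) * P(col), multiplied by total**2.
    """
    total = sum(sum(row.values()) for row in table.values())
    col_totals = {}
    for row in table.values():
        for col, n in row.items():
            col_totals[col] = col_totals.get(col, 0) + n
    for row in table.values():
        row_total = sum(row.values())
        for col, n in row.items():
            if n * total != row_total * col_totals[col]:
                return False
    return True

eye_color = {
    "M": {"Brown": 40, "Blue": 30, "Green": 10},
    "F": {"Brown": 60, "Blue": 45, "Green": 15},
}
apples = {
    "Red": {"Soft": 500, "Hard": 2500},
    "Green": {"Soft": 1500, "Hard": 500},
}

print(counts_are_independent(eye_color))  # True
print(counts_are_independent(apples))     # False
```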

    Bayes' Rule

    Definition 3

    Bayes' rule is a way to update our beliefs about the probability of an event given new information. It is a fundamental tool in data analysis, statistics, cognitive science and machine learning. It is named after the Reverend Thomas Bayes, who first formulated it in the 18th century. The rule is a direct consequence of the definition of conditional probability:$$P(A | B)=\frac{P(A\cap B)}{P(B)}$$To obtain the rule, note that the event $B$ can be written as the union of the disjoint events $A\cap B$ and $A'\cap B$, where $A'$ is the complement of $A$. Indeed, any sample point in $B$ is either in $A$ and $B$ or in $A'$ and $B$, but not both. Correspondingly,$$P(B)=P(A\cap B)+P(A'\cap B) = P(B|A)P(A) + P(B|A')P(A')$$Substituting this expression for $P(B)$ into the denominator of the first equation, and writing the numerator as $P(A\cap B)=P(B|A)\ P(A)$, we get Bayes' rule:$$P(A | B)=\frac{P(B|A)\ P(A)}{P(B|A)\ P(A)+P(B|A')\ P(A')}$$Notice that Bayes' rule allows us to update our belief about the probability of $A$ given that $B$ has occurred: $P(A)$ is updated to $P(A | B)$.
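As a sanity check, the formula can be written as a small function and applied to the apple data from Example 1, where the answer $P(R | H) = 2500/3000$ is already known from the table. The function name `bayes` and its argument names are assumptions of this sketch.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A | B) from P(B | A), P(A) and P(B | A'), using P(A') = 1 - P(A)."""
    numerator = p_b_given_a * p_a
    evidence = numerator + p_b_given_not_a * (1 - p_a)  # P(B), by total probability
    return numerator / evidence

# Inputs read from the apple table: A = red, B = hard, A' = green.
p_red = Fraction(3000, 5000)
p_hard_given_red = Fraction(2500, 3000)
p_hard_given_green = Fraction(500, 2000)

p_red_given_hard = bayes(p_hard_given_red, p_red, p_hard_given_green)
print(p_red_given_hard)  # 5/6
```

The result $5/6 \approx 0.833$ matches the value obtained directly from the second column of the table.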

    Remark

    Notice that in general $P(A | B) \neq P(B | A)$. In conditional probability, the order of the events matters, both in terms of the algebraic expression and in terms of the interpretation.

    Example 6

    $57\% $ of the citizens of Brossard have cats. $17\% $ of the citizens who have cats also have dogs, while $48\% $ of the citizens who do not have cats have dogs.
  • What is the proportion of citizens of Brossard who have dogs?
  • What is the probability that a citizen of Brossard has a cat given that they have a dog?
  • How does owning a dog affect the probability of owning a cat in Brossard?

    Solution

    As a first step we will translate the information into probabilities:$$P(C)=0.57, \quad P(D|C)=0.17, \quad P(D|C')=0.48$$The proportion of citizens of Brossard who have dogs is the sum of the proportion of citizens who have cats and dogs and the proportion of citizens who have dogs but no cats:$$P(D)=P(D|C)P(C)+P(D|C')P(C')=0.17\times 0.57+0.48\times (1-0.57)=0.3033$$The probability that a citizen of Brossard has a cat given that they have a dog is:$$P(C|D)=\frac{P(D|C)P(C)}{P(D)}=\frac{0.17\times 0.57}{0.3033}=0.3195$$Owning a dog decreases the probability of owning a cat in Brossard from $57\% $ to $31.95\% $.
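One way to build confidence in this answer is a small simulation: generate many hypothetical citizens using the three given rates and look at the fraction of cat owners among the dog owners. This is only a sketch. The sample size, the seed and the function name are arbitrary choices made for this illustration, and the simulated value is only close to the exact one.

```python
import random

def simulate_cat_given_dog(n=200_000, seed=0):
    """Estimate P(C | D) by simulating n citizens with the rates from Example 6."""
    rng = random.Random(seed)
    dog_owners = 0
    dog_and_cat_owners = 0
    for _ in range(n):
        has_cat = rng.random() < 0.57          # P(C) = 0.57
        p_dog = 0.17 if has_cat else 0.48      # P(D | C) or P(D | C')
        has_dog = rng.random() < p_dog
        if has_dog:
            dog_owners += 1
            dog_and_cat_owners += has_cat
    return dog_and_cat_owners / dog_owners

estimate = simulate_cat_given_dog()
exact = 0.17 * 0.57 / (0.17 * 0.57 + 0.48 * (1 - 0.57))
print(f"simulated: {estimate:.3f}, exact: {exact:.4f}")
```

The simulation computes the conditional probability as a frequency ratio, in the same way as the table-based examples earlier in the lecture.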

    Example 7

    The prevalence of prostate cancer in white males is $110$ in $100{,}000$. A test for prostate cancer has the following properties:
  • The probability of a positive test for a person with cancer is $0.97$. This is the sensitivity of the test.
  • The probability of a negative test for a person without cancer is $0.94$. This is the specificity of the test.

    What is the probability that a white male who had a positive test has prostate cancer?

    Solution

    Again, as a first step we will translate the information into probabilities, writing $T$ for a positive test and $T'$ for a negative test:$$P(C)=0.0011, \quad P(T|C)=0.97, \quad P(T'|C')=0.94$$Note that $P(T|C')=1-P(T'|C')=0.06$. Now answering the question is a matter of applying Bayes' rule:$$P(C|T)=\frac{P(T|C)P(C)}{P(T|C)P(C)+P(T|C')P(C')}=\frac{0.97\times 0.0011}{0.97\times 0.0011+(1-0.94)\times (1-0.0011)}=0.0175$$So observing the positive test increases the probability of having prostate cancer from $0.11\% $ to $1.75\% $. Since the initial probability was very low, the updated probability is still low.
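The same computation can be expressed in terms of prevalence, sensitivity and specificity. This is a minimal sketch with an assumed function name. The second call uses a made-up prevalence of $10\%$, which is not from the lecture, purely to illustrate how strongly the result depends on the initial probability.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(C | T) for a positive test T, by Bayes' rule."""
    true_positive = sensitivity * prevalence               # P(T | C) P(C)
    false_positive = (1 - specificity) * (1 - prevalence)  # P(T | C') P(C')
    return true_positive / (true_positive + false_positive)

# Values from Example 7.
p = posterior_given_positive(prevalence=0.0011, sensitivity=0.97, specificity=0.94)
print(round(p, 4))  # 0.0175

# Hypothetical prevalence (not from the lecture), same test properties.
p_hypothetical = posterior_given_positive(prevalence=0.10, sensitivity=0.97, specificity=0.94)
print(round(p_hypothetical, 2))
```

With a rare condition, most positive tests come from the large group of people without the condition, which is why the updated probability stays low.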

    Example 8

    K2 is the most dangerous mountain to climb in the world. Climbing attempts are made by experienced climbers who are aware of the risks. Still, only $29\%$ of the climbers use oxygen tanks. Every climber who has used oxygen tanks on K2 has survived, while $17\%$ of the climbers who have not used oxygen tanks have perished. What is the probability that a climber who has survived K2 has not used oxygen tanks?

    Solution

    This is a matter of applying Bayes' rule. However, instead of using the formula directly, we will take an equivalent approach based on a contingency table. We will start filling in this contingency table with the given information and then complete it step by step.

    Without loss of generality we can assume that there are $100$ climbers. The numbers in yellow represent the given information, and the assumed total is shown in light blue.$$ \begin{array}{l|cc|c} & \text { Oxygen } & \text { No Oxygen } & \\ \hline \text { Survived } & 29 & 59 & 88 \\ \text { Perished } & \textcolor{yellow}{0} & \textcolor{yellow}{12} & 12 \\ \hline & \textcolor{yellow}{29} & 71 & \textcolor{lightblue}{100} \end{array}$$Here is how we complete the table:
  • Since $29\%$ of the climbers use oxygen, the total in the Oxygen column is $29$.
  • Since the total count is $100$, the total in the No Oxygen column is $100 - 29 = 71$.
  • No climbers who have used oxygen have perished, which accounts for the $0$ in the Perished row of the Oxygen column.
  • Since $17\%$ of the climbers who have not used oxygen tanks have perished, the entry in the Perished row of the No Oxygen column is $71\times 0.17 \approx 12$.
  • The rest of the values in the table follow by addition or subtraction of the available values.


    Now we can easily compute the required conditional probability:$$P(O' | S)=\frac{P(S\cap O')}{P(S)} = \frac{59}{88} = 0.67 $$
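
Because the table rounds the number of climbers who perished without oxygen to a whole number, it is worth checking the result without rounding. The sketch below works with exact proportions instead of counts out of $100$. The variable names are assumptions of this illustration.

```python
from fractions import Fraction

# Given information from Example 8.
p_oxygen = Fraction(29, 100)
p_perish_given_oxygen = Fraction(0)
p_perish_given_no_oxygen = Fraction(17, 100)

# Joint proportions for the Survived row of the table.
survived_oxygen = p_oxygen * (1 - p_perish_given_oxygen)
survived_no_oxygen = (1 - p_oxygen) * (1 - p_perish_given_no_oxygen)

# P(O' | S) = P(S and O') / P(S)
p_no_oxygen_given_survived = survived_no_oxygen / (survived_oxygen + survived_no_oxygen)
print(p_no_oxygen_given_survived, round(float(p_no_oxygen_given_survived), 2))  # 5893/8793 0.67
```

The exact value agrees with $59/88$ to two decimal places, so the rounding in the table does not change the answer at this precision.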