Chapter 12 Bayes’ Theorem
12.1 Introduction
Now that you have an idea of how simple, complex, and conditional probabilities work, it is time to introduce a new formula called Bayes’ Theorem. This formula, although a bit more complicated than the others, can be incredibly useful. Sometimes, we know the probability of A given B, but need to know the probability of B given A. Bayes’ Theorem provides a way of converting one to the other.
For example, imagine that you have recently donated a pint of blood to your local blood bank. You receive in the mail a letter informing you that your blood has tested positive for HIV antibodies. The letter informs you that you could have AIDS. How worried should you be?
You need to know the probability that you have the disease given a positive test for the disease, or \(\Pr(D \vert +)\). Now, if you contact the company that produces the test, they will be glad to give you some information about the test. Each test has certain true positive, false positive, true negative, and false negative rates. These rates have been determined by extensive testing.
The true positive rate is the percentage of times that the test will correctly identify the samples that have the disease. The false positive rate is the percentage of times that the test will say that the disease is present when it really is not. The true negative rate is the percentage of times that the test says negative when the subject does not have the disease. The false negative rate is the percentage of times that the test says negative when the subject does have the disease.
In terms of probabilities, it looks like this:
- True Positive: \(\Pr(+ \vert D)\)
- False Positive: \(\Pr(+ \vert \neg D)\)
- True Negative: \(\Pr(- \vert \neg D)\)
- False Negative: \(\Pr(- \vert D)\)
If you know the true positive rate and the true negative rate, you can figure out the other two. The false negative rate is equal to one minus the true positive rate, and so on.
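Written out with the notation above, the two complement relationships are:

\[ \Pr(- \vert D) = 1 - \Pr(+ \vert D) \qquad \Pr(+ \vert \neg D) = 1 - \Pr(- \vert \neg D) \]

For instance, a test with a hypothetical true positive rate of 0.99 would have a false negative rate of 0.01.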
The blood bank is concerned about contamination of the blood supply. Therefore, they want a test that has a false negative rate of zero. They aren’t as concerned about the false positive rate, though. They don’t want a high false positive rate, because then they would end up throwing out blood that was just fine. If they have to needlessly dispose of a few pints out of the many that they receive, though, it’s no great loss. The other thing they need is an inexpensive test. They have to test every single donation, so the test they use must be one they can afford. It turns out that tests with low false negative rates are fairly inexpensive, but tests with low false positive rates are not. The blood bank takes the cheaper test, because it adequately keeps the blood supply safe, even though a few donations test positive that really were not contaminated.
So, the blood bank gives you the false positive rate: \(\Pr(+ \vert \neg D)\), but you need to know \(\Pr(D \vert +)\). What do you do? You use Bayes’ Theorem.
12.2 The Theorem
The short form of Bayes’ Theorem is this: \[ \Pr (A\vert B)= \frac{\Pr (A)\times \Pr (B \vert A)}{\Pr (B)} \]
Here is an expanded version of the same formula: \[ \Pr (A\vert B)= \frac{\Pr (A)\times \Pr (B \vert A)}{\Pr (A)\times \Pr (B \vert A) + \Pr (\neg A)\times \Pr (B \vert \neg A)} \]
Believe it or not, it’s often easier to use the expanded version than the shorter version. In fact, you will very rarely use the shorter version. It’s generally easier to just automatically begin working the problem with the longer version.
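The expanded formula translates directly into a few lines of code. The following is a minimal Python sketch rather than anything from the original text; the function name `bayes_expanded` and its parameter names are chosen here for illustration, and the numbers in the usage line are made up. The denominator is just \(\Pr(B)\) split into the case where A is true and the case where A is false, which is why the short and expanded forms agree.

```python
def bayes_expanded(prior_a, p_b_given_a, p_b_given_not_a):
    """Return Pr(A | B) using the expanded form of Bayes' Theorem.

    prior_a          -- Pr(A)
    p_b_given_a      -- Pr(B | A)
    p_b_given_not_a  -- Pr(B | not A)
    """
    numerator = prior_a * p_b_given_a
    # Pr(B) = Pr(A) * Pr(B | A) + Pr(not A) * Pr(B | not A)
    denominator = numerator + (1 - prior_a) * p_b_given_not_a
    if denominator == 0:
        raise ValueError("Pr(B) is zero, so Pr(A | B) is undefined")
    return numerator / denominator


# Illustrative values only: Pr(A) = 0.5, Pr(B | A) = 0.8, Pr(B | not A) = 0.2
print(bayes_expanded(0.5, 0.8, 0.2))
```

Only three inputs are needed, because \(\Pr(\neg A)\) is always one minus \(\Pr(A)\).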
12.3 Examples
Let’s look at some examples, beginning with a very easy one. This is a problem that you wouldn’t need the formula to solve, but it helps us understand how the formula works.
- What is the probability that a card is a heart given that it is red?
We know that the answer to this problem is 1/2. Half of the red cards are hearts. Just to get used to the formula, we’ll solve it using Bayes’ Theorem:
\[ \Pr(H \vert R) = \frac{\Pr (H)\times \Pr (R \vert H)}{\Pr (R)} \]
Since 1/4 of all cards in a deck are hearts, \(\Pr(H)=1/4\). All hearts are red, so \(\Pr(R \vert H)=1\), and half of the cards are red, so \(\Pr(R)=1/2\).
So,
\[ \begin{aligned} \Pr(H \vert R) &= \frac{\Pr (H)\times \Pr (R \vert H)}{\Pr (R)}\\ &= \frac{\frac{1}{4} \times 1}{\frac{1}{2}}\\ &= \frac{\frac{1}{4}}{\frac{1}{2}}\\ &=\frac{1}{2} \end{aligned} \]
The longer formula will give us the same answer:
\[ \Pr (H\vert R)= \frac{\Pr (H)\times \Pr (R \vert H)}{\Pr (H)\times \Pr (R \vert H) + \Pr (\neg H)\times \Pr (R \vert \neg H)} \]
The only additions are the probability that a card is not a heart, which is 3/4, and the probability that a card is red given that it is not a heart, which is 1/3 (if we don’t include the hearts, there are three suits, one of which is red).
So,
\[ \begin{aligned} \Pr (H\vert R) &= \frac{\Pr (H)\times \Pr (R \vert H)}{\Pr (H)\times \Pr (R \vert H) + \Pr (\neg H)\times \Pr (R \vert \neg H)}\\ &= \frac{\frac{1}{4}\times 1}{\left( \frac{1}{4}\times 1 \right) + \left( \frac{3}{4} \times \frac{1}{3} \right)}\\ &= \frac{\frac{1}{4}}{ \left(\frac{1}{4} + \frac{1}{4} \right)}\\ &=\frac{1}{2} \end{aligned} \]
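Because a deck is small, the card example can also be checked by brute force. The sketch below, written in Python as an illustration (the variable names are not from the text), builds a standard 52-card deck, counts the hearts among the red cards, and compares that count with the expanded formula using exact fractions.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suit_colors = {"hearts": "red", "diamonds": "red",
               "clubs": "black", "spades": "black"}

# Every (rank, suit) pair: 13 ranks x 4 suits = 52 cards
deck = list(product(ranks, suit_colors))

# Direct count: restrict to red cards, then see what share are hearts
red_cards = [card for card in deck if suit_colors[card[1]] == "red"]
red_hearts = [card for card in red_cards if card[1] == "hearts"]
by_counting = Fraction(len(red_hearts), len(red_cards))

# Expanded Bayes' Theorem with the values used in the text
p_h = Fraction(1, 4)              # Pr(H)
p_r_given_h = Fraction(1)         # Pr(R | H)
p_r_given_not_h = Fraction(1, 3)  # Pr(R | not H)
by_formula = (p_h * p_r_given_h) / (
    p_h * p_r_given_h + (1 - p_h) * p_r_given_not_h
)

print(by_counting, by_formula)  # 1/2 1/2
```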
Now, for a more difficult one. Here’s a question you might remember from the pre-test:
Exactly two cab companies operate in Belleville. The Blue Company has blue cabs, and the Green Company has green cabs. Exactly 85% of the cabs are blue and the other 15% are green. A cab was involved in a hit-and-run accident at night. A witness, Wilbur, identified the cab as a green cab. Careful tests were done to ascertain people’s ability to distinguish between blue and green cabs at night. The tests showed that people were able to identify the color correctly 80% of the time, but they were wrong 20% of the time. What is the probability that the cab involved in the accident was indeed a green cab, as Wilbur says?
The probability that a cab is blue is 0.85, and the probability that it is green is 0.15. The probability that Wilbur will say it is green if it is in fact green is 0.80. The probability that Wilbur will say it is green if it is in fact blue is 0.20. Symbolized, if G = “The cab was green” and W = “Wilbur says the cab was green,” this is \(\Pr(G)=.15\), \(\Pr(\neg G)=.85\), \(\Pr(W \vert G)=.80\), and \(\Pr(W \vert \neg G)= .20\).
Using the expanded formula:
\[ \begin{aligned} \Pr (G\vert W) &= \frac{\Pr (G)\times \Pr (W \vert G)}{\Pr (G)\times \Pr (W \vert G) + \Pr (\neg G)\times \Pr (W \vert \neg G)}\\ &= \frac{.15 \times .80}{(.15 \times .80) + (.85 \times .20)}\\ &= \frac{.12}{.12 + .17} \\ &= \frac{.12}{.29} \\ &\approx .41 \end{aligned} \]
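Many people find this result surprising, so it can help to check it by simulation. The following Python sketch is an illustration under the problem's stated assumptions (15% green cabs, a witness who is right 80% of the time regardless of color); the function name, trial count, and seed are arbitrary choices made here. It replays the accident many times and looks only at the cases where the witness says "green".

```python
import random


def simulate_cab_problem(trials=200_000, seed=0):
    """Estimate Pr(cab was green | witness says green) by simulation."""
    rng = random.Random(seed)
    said_green = 0
    said_green_and_was_green = 0
    for _ in range(trials):
        is_green = rng.random() < 0.15        # 15% of cabs are green
        witness_correct = rng.random() < 0.80  # right 80% of the time
        # A correct witness reports the true color; a mistaken one
        # reports the other color.
        says_green = is_green if witness_correct else not is_green
        if says_green:
            said_green += 1
            if is_green:
                said_green_and_was_green += 1
    return said_green_and_was_green / said_green


estimate = simulate_cab_problem()
print(round(estimate, 2))
```

With enough trials the estimate should land close to .12/.29, the value given by the formula.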
12.4 The Blood Donation
Now we can solve the problem of the blood donor’s positive test. The probability that the donor has the disease given a positive test is a function of three things: the probability that a person in the population has the disease, which is called the base-rate of the disease; the probability that a person tests positive if they have the disease, which is the true positive rate for the test; and the probability that a person will test positive.
One current estimate that I read is that one million people in the USA now have AIDS. The most recent census reports the population as 304,059,724. This means that the base-rate, or the percentage of the population that has AIDS, is roughly 0.32%. False-positive rates for the most common, low-cost AIDS test vary. They range from 50% to 90%. A more expensive test, the Western Blot test, appears to have a false positive rate of 4.8% among Western blood donors.
Let’s plug some data into the expanded formula:
\[ \Pr(D \vert +) = \frac{\Pr(D) \times \Pr( + \vert D)}{\Pr(D) \times \Pr( + \vert D) + \Pr(\neg D) \times \Pr( + \vert \neg D)} \]
Let’s go with the base-rate of .32%. So, \(\Pr(D) = .0032\). We’ll also assume the more expensive test, so \(\Pr(+ \vert \neg D) = .048\). Let’s also assume that the test is very good at catching all cases of the disease, so \(\Pr(+ \vert D) = 1\).
\[ \begin{aligned} \Pr(D \vert +) &= \frac{\Pr(D) \times \Pr( + \vert D)}{\Pr(D) \times \Pr( + \vert D) + \Pr(\neg D) \times \Pr( + \vert \neg D)} \\ &= \frac{.0032 \times 1}{(.0032 \times 1) + (.9968 \times .048)} \\ &= \frac{.0032}{.0032 + .0478} \\ &= \frac{.0032}{.051} \\ &= .063 \end{aligned} \]
So, assuming these numbers, there is only a 6.3% chance that you have AIDS given that you got a positive test for the disease, and this is assuming that the Western Blot was used and not the ELISA test, which has a much worse false-positive rate! There are more expensive and reliable ways to test for the disease, so if a person gets a positive result on one of these screening tests, they should not panic, but get the more expensive test. There have been tragic reports of people committing suicide because they got a positive result on one of the initial screening tests.
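To see how strongly the answer depends on the false positive rate, the same calculation can be repeated for the other rates mentioned in this section. This is a minimal Python sketch, assuming the base-rate of .0032 and a true positive rate of 1 as in the worked example; the function name `posterior_given_positive` is illustrative.

```python
def posterior_given_positive(false_positive_rate,
                             base_rate=0.0032,
                             true_positive_rate=1.0):
    """Pr(D | +) from the expanded form of Bayes' Theorem."""
    numerator = base_rate * true_positive_rate
    denominator = numerator + (1 - base_rate) * false_positive_rate
    return numerator / denominator


# 0.048 is the Western Blot figure used above; 0.5 and 0.9 are the ends
# of the range quoted for the cheaper screening test.
for rate in (0.048, 0.5, 0.9):
    print(f"false positive rate {rate}: "
          f"Pr(D | +) = {posterior_given_positive(rate):.4f}")
```

Under these assumptions, the probability of disease given a positive result drops below 1% once the false positive rate reaches the range quoted for the cheaper test.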
12.5 Hints
As with other probability problems, once the right numbers are plugged into the right formula, then the answers are generally easy to find. The most common problem is finding the right values in what looks like a complex paragraph.
Here’s an example conditional probability problem requiring Bayes’ Theorem:
1% of OBU students are philosophy majors. 90% of OBU philosophy majors are accepted into their preferred graduate program. 30% of OBU non-philosophy majors are accepted into their preferred graduate program. Jane is an OBU student that was accepted into her preferred graduate program. What is the probability that she is a philosophy major?
Let P = “An OBU student is a philosophy major” and A = “An OBU student was accepted into her preferred graduate program.”
The first step is to determine the conditional probability that the problem is asking us to solve. The first part is generally easy: just look for the question, in this case, “What is the probability that she is a philosophy major?” To find the given, look for the one thing that is known for certain; it won’t be a probability or percentage. We know that she was accepted into her preferred graduate program. So, we want to know the probability that Jane is a philosophy major given that she was accepted into her preferred graduate program, or \(\Pr(P\vert A)\). Once that is determined, then simply write out the formula:
\[ \Pr(P \vert A) = \frac{\Pr(P) \times \Pr( A \vert P)}{\Pr(P) \times \Pr( A \vert P) + \Pr(\neg P) \times \Pr( A \vert \neg P)} \]
Now, we have to find the numbers to plug into the formula. Many, if not most, of the problems are stated in terms of percentages. The probability of A given B is a function of the percentage of B’s that are A’s. That is, if A’s comprise half of the B’s, then \(\Pr (A \vert B) = 0.5\).
So,
- \(\Pr(A \vert P) = 0.9\)
- \(\Pr(A \vert \neg P) = 0.3\)
- \(\Pr(P) = .01\)
\[ \begin{aligned} \Pr(P \vert A) &= \frac{\Pr(P) \times \Pr( A \vert P)}{\Pr(P) \times \Pr( A \vert P) + \Pr(\neg P) \times \Pr( A \vert \neg P)} \\ &= \frac{.01 \times .9}{(.01 \times .9) + (.99 \times .3)} \\ &= \frac{.009}{.009 + .297} \\ &= \frac{.009}{.306} \\ &\approx .029 \end{aligned} \]
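One way to check that the right numbers went into the right slots is to count heads in an imagined student body. The Python sketch below assumes a hypothetical cohort of 10,000 OBU students (the cohort size is an arbitrary choice made here, not a figure from the problem) and applies the three percentages listed above: 1% philosophy majors, 90% acceptance for majors, and 30% acceptance for non-majors.

```python
cohort = 10_000  # hypothetical number of students, chosen for easy arithmetic

majors = cohort * 1 // 100                    # 1% are philosophy majors
non_majors = cohort - majors                  # everyone else

accepted_majors = majors * 90 // 100          # 90% of majors accepted
accepted_non_majors = non_majors * 30 // 100  # 30% of non-majors accepted

# Jane is known to be accepted, so only accepted students count.
all_accepted = accepted_majors + accepted_non_majors
p_major_given_accepted = accepted_majors / all_accepted

print(accepted_majors, all_accepted)     # 90 3060
print(round(p_major_given_accepted, 4))  # 0.0294
```

Although philosophy majors are accepted at three times the rate of other students, there are so few of them that an accepted student is still very unlikely to be one.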