5 Bayes’ Rule

The mechanism that underpins all of Bayesian statistical analysis is Bayes’ rule, which describes how to update uncertainty in light of new information, evidence, or data.

Example 5.1 A recent survey of American adults asked: “Based on what you have heard or read, which of the following two statements best describes the scientific method?”

  • 70% selected “The scientific method produces findings meant to be continually tested and updated over time”. (We’ll call this the “iterative” opinion.)
  • 14% selected “The scientific method identifies unchanging core principles and truths”. (We’ll call this the “unchanging” opinion.)
  • 16% were not sure which of the two statements was best.

How does the response to this question change based on education level? Suppose education level is classified as: high school or less (HS), some college but no Bachelor’s degree (college), Bachelor’s degree (Bachelor’s), or postgraduate degree (postgraduate). The education breakdown is

  • Among those who agree with “iterative”: 31.3% HS, 27.6% college, 22.9% Bachelor’s, and 18.2% postgraduate.
  • Among those who agree with “unchanging”: 38.6% HS, 31.4% college, 19.7% Bachelor’s, and 10.3% postgraduate.
  • Among those “not sure”: 57.3% HS, 27.2% college, 9.7% Bachelor’s, and 5.8% postgraduate.
  1. Use the information to construct an appropriate two-way table.




  2. Overall what percentage of adults have a postgraduate degree? How is this related to the values 18.2%, 10.3%, and 5.8%?




  3. What percent of those with a postgraduate degree agree that the scientific method is “iterative”? How is this related to the values provided?




Bayes’ rule for events specifies how a prior probability \(P(H)\) of event \(H\) is updated in response to the evidence \(E\) to obtain the posterior probability \(P(H|E)\).

\[ P(H|E) = \frac{P(E|H)P(H)}{P(E)} \]

Example 5.2 Continuing Example 5.1. Randomly select an American adult.

  1. Consider the conditional probability that a randomly selected American adult agrees that the scientific method is “iterative” given that they have a postgraduate degree. Identify the prior probability, hypothesis, evidence, likelihood, and posterior probability, and use Bayes’ rule to compute the posterior probability.




  2. Find the conditional probability that a randomly selected American adult with a postgraduate degree agrees that the scientific method is “unchanging”.




  3. Find the conditional probability that a randomly selected American adult with a postgraduate degree is not sure about which statement is best.




  4. How many times more likely is it for an American adult to have a postgraduate degree and agree with the “iterative” statement than to have a postgraduate degree and agree with the “unchanging” statement?




  5. How many times more likely is it for an American adult with a postgraduate degree to agree with the “iterative” statement than to agree with the “unchanging” statement?




  6. What do you notice about the answers to the two previous parts?




  7. How many times more likely is it for an American adult to agree with the “iterative” statement than to agree with the “unchanging” statement?




  8. How many times more likely is it for an American adult to have a postgraduate degree when the adult agrees with the “iterative” statement than when the adult agrees with the “unchanging” statement?




  9. How many times more likely is it for an American adult with a postgraduate degree to agree with the “iterative” statement than to agree with the “unchanging” statement?




  10. How are the values in the three previous parts related?




Bayes’ rule is often used when there are multiple hypotheses or cases. Suppose \(H_1,\ldots, H_k\) is a collection of distinct hypotheses which together account for all possibilities, and \(E\) is any event (evidence). Then Bayes’ rule implies that the posterior probability of any particular hypothesis \(H_j\) satisfies

\[ P(H_j |E) = \frac{P(E|H_j)P(H_j)}{P(E)} \]

The marginal probability of the evidence, \(P(E)\), in the denominator can be calculated using the law of total probability

\[ P(E) = \sum_{i=1}^k P(E|H_i) P(H_i) \]

The law of total probability says that we can interpret the unconditional probability \(P(E)\) as a probability-weighted average of the case-by-case conditional probabilities \(P(E|H_i)\) where the weights \(P(H_i)\) represent the probability of encountering each case.
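For instance, with the percentages from Example 5.1, the overall probability that a randomly selected adult has a postgraduate degree is a weighted average of the within-group postgraduate rates. A minimal sketch in R:

```r
# Law of total probability: P(postgraduate) is a weighted average of the
# conditional probabilities P(postgraduate | H_i), weighted by P(H_i)
prior <- c(0.70, 0.14, 0.16)          # P(H_i): iterative, unchanging, not sure
likelihood <- c(0.182, 0.103, 0.058)  # P(postgraduate | H_i)
p_postgrad <- sum(likelihood * prior)
p_postgrad
# [1] 0.1511
```

Note that 0.1511 lies between the smallest (0.058) and largest (0.182) of the conditional probabilities, as any weighted average must.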

Combining Bayes’ rule with the law of total probability,

\[\begin{align*} P(H_j |E) & = \frac{P(E|H_j)P(H_j)}{P(E)}\\ & = \frac{P(E|H_j)P(H_j)}{\sum_{i=1}^k P(E|H_i) P(H_i)}\\ P(H_j |E) & \propto P(E|H_j)P(H_j) \end{align*}\]

In short, Bayes’ rule says

\[ \textbf{posterior} \propto \textbf{likelihood} \times \textbf{prior} \]

Bayes’ rule calculations are often organized in a Bayes table, which illustrates “posterior is proportional to likelihood times prior”. The table has one row for each hypothesis and columns for the prior probabilities, the likelihoods of the observed evidence, the products of prior and likelihood, and the posterior probabilities (the products renormalized to sum to 1).

The process of conditioning can be thought of as “slicing and renormalizing”: slice out the prior-times-likelihood products for the observed evidence, then renormalize them so that they sum to 1.
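A minimal sketch of this computation in R, using the prior and the postgraduate likelihoods from Example 5.1 (the function name `bayes_update` is just for illustration):

```r
# "Slice and renormalize": keep the products P(H_i) P(E | H_i) for the
# observed evidence E (the slice), then rescale them to sum to 1
bayes_update <- function(prior, likelihood) {
  product <- prior * likelihood  # posterior is proportional to this
  product / sum(product)         # the normalizing constant is P(E)
}

bayes_update(prior = c(0.70, 0.14, 0.16),         # iterative, unchanging, not sure
             likelihood = c(0.182, 0.103, 0.058)) # P(postgraduate | H_i)
# posterior: roughly 0.843, 0.095, 0.061
```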

Example 5.3 Continuing Example 5.1. Randomly select an American adult. Now suppose we want to compute the posterior probabilities for an American adult’s perception of the scientific method given that the randomly selected American adult has some college but no Bachelor’s degree (“College”).

  1. Before computing, make an educated guess for the posterior probabilities. In particular, will the changes from prior to posterior be more or less extreme given the American has some college but no Bachelor’s degree than when given the American has a postgraduate degree? Why?




  2. Construct a Bayes table and compute the posterior probabilities. Compare to the posterior probabilities given postgraduate degree from the previous examples.




Example 5.4 Suppose that you are presented with six boxes, labeled 0, 1, 2, \(\ldots\), 5, each containing five marbles. Box 0 contains 0 green and 5 gold marbles, box 1 contains 1 green and 4 gold, and so on with box \(i\) containing \(i\) green and \(5-i\) gold. One of the boxes is chosen uniformly at random (perhaps by rolling a fair six-sided die), and then you will randomly select marbles from that box, without replacement. Based on the colors of the marbles selected, you will update the probabilities of which box had been chosen.

  1. Suppose that a single marble is selected and it is green. Which box do you think is the most likely to have been chosen? Make a guess for the posterior probabilities for each box. Then construct a Bayes table to compute the posterior probabilities. How do they compare to the prior probabilities?




  2. Now suppose a second marble is selected from the same box, without replacement, and its color is gold. Which box do you think is the most likely to have been chosen given these two marbles? Make a guess for the posterior probabilities for each box. Then construct a Bayes table to compute the posterior probabilities, using the posterior probabilities from the previous part after the selection of the green marble as the new prior probabilities before seeing the gold marble.




  3. Now construct a Bayes table corresponding to the original prior probabilities (1/6 each) and the combined evidence that the first marble selected was green and the second was gold. How do the posterior probabilities compare to the previous part?




  4. In the previous part, the first marble selected was green and the second was gold. Suppose you only knew that in a sample of two marbles, 1 was green and 1 was gold. That is, you didn’t know which was first or second. How would the previous part change? Should knowing the order matter? Does it?



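The updating in Example 5.4 can be sketched in R (assuming the box setup above), checking that the two-step update, green then gold, matches conditioning on both marbles at once:

```r
box <- 0:5
prior <- rep(1 / 6, 6)            # each box equally likely

# Update 1: first marble is green; P(green | box i) = i / 5
post1 <- prior * (box / 5)
post1 <- post1 / sum(post1)

# Update 2: second marble is gold; after removing one green,
# box i holds (i - 1) green and (5 - i) gold out of 4 remaining
post2 <- post1 * ((5 - box) / 4)
post2 <- post2 / sum(post2)

# All at once: likelihood of (green, then gold) from the original prior
post_both <- prior * (box / 5) * ((5 - box) / 4)
post_both <- post_both / sum(post_both)

isTRUE(all.equal(post2, post_both))  # TRUE: sequential = all-at-once
round(post2, 3)                      # 0.0 0.2 0.3 0.3 0.2 0.0
```

Boxes 0 and 5 have posterior probability 0 since neither could produce one marble of each color.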

5.1 Notes

5.1.1 Two-way table

library(tibble)      # add_column()
library(janitor)     # adorn_totals()
library(kableExtra)  # kbl(), kable_styling()

hypothesis = c("iterative", "unchanging", "not sure")

prior = c(0.70, 0.14, 0.16)

evidence = c("HS", "College", "Bachelors", "Postgrad")

likelihood_E_given_H1 = c(0.313, 0.276, 0.229, 0.182) # given iterative
likelihood_E_given_H2 = c(0.386, 0.314, 0.197, 0.103) # given unchanging
likelihood_E_given_H3 = c(0.573, 0.272, 0.097, 0.058) # given not sure

n = 100000

twoway_table = data.frame(n * rbind( 
             prior[1] * likelihood_E_given_H1,
             prior[2] * likelihood_E_given_H2,
             prior[3] * likelihood_E_given_H3))

colnames(twoway_table) = evidence

twoway_table |>
  add_column(hypothesis, .before = 1) |>
  adorn_totals(c("row", "col")) |>
  kbl() |>
  kable_styling()
hypothesis       HS  College  Bachelors  Postgrad   Total
iterative     21910    19320      16030     12740   70000
unchanging     5404     4396       2758      1442   14000
not sure       9168     4352       1552       928   16000
Total         36482    28068      20340     15110  100000
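Marginal and conditional probabilities can be read directly off this table of counts (out of 100,000 hypothetical adults); for example:

```r
# P(postgraduate): column total over grand total
15110 / 100000   # 0.1511
# P(iterative | postgraduate): cell count over column total
12740 / 15110    # roughly 0.843
# P(postgraduate | iterative): cell count over row total
12740 / 70000    # 0.182
```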

5.1.2 Bayes table, given postgraduate degree

hypothesis = c("iterative", "unchanging", "not sure")

prior = c(0.70, 0.14, 0.16)

# likelihood of evidence = postgraduate degree, given each hypothesis
likelihood = c(0.182, 0.103, 0.058) 

product = prior * likelihood

posterior = product / sum(product)

bayes_table = data.frame(hypothesis,
                         prior,
                         likelihood,
                         product,
                         posterior)

bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 4) |>
  kable_styling()
hypothesis   prior  likelihood  product  posterior
iterative     0.70       0.182   0.1274     0.8432
unchanging    0.14       0.103   0.0144     0.0954
not sure      0.16       0.058   0.0093     0.0614
Total         1.00       0.343   0.1511     1.0000

5.1.3 Bayes table, given college

hypothesis = c("iterative", "unchanging", "not sure")

prior = c(0.70, 0.14, 0.16)

# likelihood of evidence = college, given each hypothesis
likelihood = c(0.276, 0.314, 0.272)

product = prior * likelihood

posterior = product / sum(product)

bayes_table = data.frame(hypothesis,
                         prior,
                         likelihood,
                         product,
                         posterior)

bayes_table |>
  adorn_totals("row") |>
  kbl(digits = 4) |>
  kable_styling()
hypothesis   prior  likelihood  product  posterior
iterative     0.70       0.276   0.1932     0.6883
unchanging    0.14       0.314   0.0440     0.1566
not sure      0.16       0.272   0.0435     0.1551
Total         1.00       0.862   0.2807     1.0000
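One simple way to quantify the intuition from Example 5.3 (this comparison is a sketch, not part of the original examples): the posterior given “college” moves much less from the prior than the posterior given “postgraduate”, because the education breakdown across the three opinion groups is more similar for the college category.

```r
prior <- c(0.70, 0.14, 0.16)
post_postgrad <- c(0.8432, 0.0954, 0.0614)  # from the Bayes table given postgraduate
post_college  <- c(0.6883, 0.1566, 0.1551)  # from the Bayes table given college

# total variation distance from the prior (half the sum of absolute changes)
tv_postgrad <- sum(abs(post_postgrad - prior)) / 2
tv_college  <- sum(abs(post_college - prior)) / 2
c(tv_postgrad, tv_college)   # about 0.143 versus 0.017
```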