1.1 The Bayes’ rule

As expected, the point of departure for performing Bayesian inference is the Bayes’ rule,1 that is, the conditional probability of \(A_i\) given \(B\) is equal to the conditional probability of \(B\) given \(A_i\) times the marginal probability of \(A_i\) over the marginal probability of \(B\),

\[\begin{align} P(A_i|B)&=\frac{P(A_i,B)}{P(B)}\\ &=\frac{P(B|A_i) \times P(A_i)}{P(B)}, \tag{1.1} \end{align}\]

where, by the law of total probability, \(P(B)=\sum_i P(B|A_i)P(A_i)\neq 0\), and \(\left\{A_i, i=1,2,\dots\right\}\) is a finite or countably infinite partition of the sample space.
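Equation (1.1) can be sketched in a few lines of R for a finite partition. The function name `bayes_rule` and the numbers below are hypothetical, chosen only to illustrate the mechanics; they are not part of the examples in this section.

```r
# Hypothetical helper (not from the text): posterior over a finite partition
bayes_rule <- function(prior, likelihood) {
  # prior: P(A_i), must sum to one; likelihood: P(B | A_i), same length
  stopifnot(length(prior) == length(likelihood),
            isTRUE(all.equal(sum(prior), 1)))
  joint <- likelihood * prior # P(B | A_i) P(A_i) = P(A_i, B)
  marginal <- sum(joint)      # P(B), by the law of total probability
  stopifnot(marginal > 0)     # The Bayes' rule requires P(B) != 0
  joint / marginal            # P(A_i | B)
}

# Illustrative numbers only: three states with made-up probabilities
prior <- c(0.5, 0.3, 0.2)
likelihood <- c(0.1, 0.4, 0.7)
posterior <- bayes_rule(prior, likelihood)
round(posterior, 3)
```

By construction the posterior probabilities sum to one, because the denominator \(P(B)\) is just the sum of the numerators over the partition.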

In the Bayesian framework, \(B\) is sample information that updates a probabilistic statement about an unknown object \(A_i\) following probability rules. This is done by means of the Bayes’ rule using three ingredients: prior “beliefs” about \(A_i\), that is, \(P(A_i)\); sample information relating \(B\) to the particular state of nature \(A_i\) through a probabilistic statement, \(P(B|A_i)\); and the probability of observing that specific sample information, \(P(B)\).

Let’s see a simple example, the base rate fallacy:

Assume that the sample information comes from a positive result from a test whose true positive rate (sensitivity) is 98%, \(P(+|\text{disease})=0.98\). On the other hand, the prior information regarding being infected with this disease comes from a base incidence rate that is equal to 0.002, that is, \(P(\text{disease})=0.002\). Then, what is the probability of actually being infected given the positive result?

This is an example of the base rate fallacy, where having a positive test result from a disease whose base incidence rate is tiny gives a low probability of actually having the disease.

The key to answering the question is understanding the difference between the probability of having the disease given a positive result, \(P(\text{disease}|+)\), and the probability of a positive result given the disease, \(P(+|\text{disease})\). The former is the important result, and the Bayes’ rule helps us to get the answer. Using the Bayes’ rule (equation (1.1)):

\[\begin{equation} P(\text{disease}|+) = \frac{P(+|\text{disease})\times P(\text{disease})}{P(+)} = \frac{0.98 \times 0.002}{0.98 \times 0.002 + (1-0.98) \times (1-0.002)}\approx 0.09, \end{equation}\]

where \(P(+)=P(+|\text{disease})\times P(\text{disease})+P(+|\lnot\text{disease})\times P( \lnot\text{disease})\), and the false positive rate is taken to be \(P(+|\lnot\text{disease})=1-0.98=0.02\).2

PD <- 0.002 # Probability of disease
PPD <- 0.98 # True positive (Sensitivity)
PDP <- PD * PPD / (PD * PPD + (1 - PD) * (1 - PPD)) # Probability of disease given positive
paste("Probability of disease given a positive test is", sep = " ", round(PDP, 2))
## [1] "Probability of disease given a positive test is 0.09"

We observe that despite having a positive result, the probability of having the disease is low. This is due to the base rate being tiny.
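To see how strongly the answer depends on the base rate, the same calculation can be repeated over a few prior values. This is only a sketch: the helper `post_disease` is a hypothetical name, the base rates other than 0.002 are illustrative, and the false positive rate is assumed to be one minus the sensitivity, as in the example.

```r
# Hypothetical helper: P(disease | +) for a given base rate.
# Assumes, as in the example, a false positive rate equal to 1 - sensitivity.
post_disease <- function(base_rate, sens = 0.98, fpr = 1 - sens) {
  base_rate * sens / (base_rate * sens + (1 - base_rate) * fpr)
}
base_rates <- c(0.002, 0.02, 0.2, 0.5) # 0.002 is from the text; the rest are illustrative
data.frame(base_rate = base_rates, posterior = round(post_disease(base_rates), 2))
```

Under these assumptions, the posterior probability only reaches the sensitivity of the test (0.98) when the base rate is 0.5, that is, when the prior does not favor either state.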

Another interesting example, which is at the heart of the origin of the Bayes’ theorem (Bayes 1763), is related to the existence of God (Stigler 2018). Section X of David Hume’s “An Inquiry concerning Human Understanding, 1748” is named Of Miracles. There, Hume argues that when someone claims to have seen a miracle, this is poor evidence that it actually happened, since it goes against what we see every day. Richard Price, who finished and published “An essay towards solving a problem in the doctrine of chances” in 1763 after Bayes died in 1761, argues against Hume, saying that there is a huge difference between impossibility as the word is commonly used in conversation and physical impossibility. Price used the example of a die with a million sides, where the former is getting a particular side when throwing this die, and the latter is getting a side that does not exist. In millions of throws, the latter would never occur, but the former eventually would.

Let’s say that there are two cases of resurrection (Res), Jesus Christ and Elvis, and that the total number of people who have ever lived is 108.5 billion.3 Then the prior base rate is \(2/(108.5\times 10^{9})\). On the other hand, the sample information comes from a very reliable witness whose true positive rate is 0.9999999. Then, what is the probability of this miracle given the witness’s testimony?4

Using the Bayes’ rule:

\[\begin{equation} P(\text{Res}|\text{Witness}) = \frac{P(\text{Witness}|\text{Res})\times P(\text{Res})}{P(\text{Witness})}, \end{equation}\]

where \(P(\text{Witness})=P(\text{Witness}|\text{Res})\times P(\text{Res})+(1-P(\text{Witness}|\text{Res}))\times (1-P(\text{Res}))\).

PR <- 2/(108.5 * 10^9) # Probability of resurrection
PWR <- 0.9999999 # Very reliable witness (true positive rate)
PRW <- PR * PWR / (PR * PWR + (1 - PR) * (1 - PWR)) # Probability of resurrection given witness
paste("Probability of resurrection given witness is", sep = " ", PRW)
## [1] "Probability of resurrection given witness is 0.000184297806959661"
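As a rough check on the order of magnitude, the numbers can be substituted by hand. This is only a sketch using rounded values, with \(P(\text{Res})=2/(108.5\times 10^{9})\approx 1.843\times 10^{-11}\) and \(1-P(\text{Witness}|\text{Res})=10^{-7}\):

\[\begin{equation} P(\text{Res}|\text{Witness}) \approx \frac{0.9999999\times 1.843\times 10^{-11}}{0.9999999\times 1.843\times 10^{-11}+10^{-7}\times (1-1.843\times 10^{-11})}\approx \frac{1.843\times 10^{-11}}{1.0002\times 10^{-7}}\approx 1.84\times 10^{-4}. \end{equation}\]

The denominator is dominated by the false positive term \(10^{-7}\), which is several thousand times larger than the prior base rate, so even this very reliable witness leaves the posterior probability below 0.02%.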

Observe that we can get a conditional version of the Bayes’ rule. Given two conditioning events \(B\) and \(C\), equation (1.1) becomes

\[\begin{align} P(A_i|B,C)&=\frac{P(A_i,B,C)}{P(B,C)}\\ &=\frac{P(B|A_i,C) \times P(A_i|C) \times P(C)}{P(B|C)P(C)}\\ &=\frac{P(B|A_i,C) \times P(A_i|C)}{P(B|C)}. \tag{1.2} \end{align}\]

Let’s use one of the most intriguing statistical puzzles, the Monty Hall problem, to illustrate how to use equation (1.2) (Selvin 1975a, 1975b). This was the situation faced by a contestant in the American television game show Let’s Make a Deal. There, the contestant was asked to choose one of three doors, where behind one door there is a car, and behind the others, goats. Let’s say that the contestant picks door No. 1, and the host (Monty Hall), who knows what is behind each door, opens door No. 3, where there is a goat. Then, the host asks the contestant the tricky question: do you want to switch to door No. 2?

Let’s name \(P_i\) the event that the contestant picks door No. \(i\), which stays closed, \(H_i\) the event that the host picks door No. \(i\), which is opened and reveals a goat, and \(C_i\) the event that the car is behind door No. \(i\). In this particular setting, the contestant is interested in the probability \(P(C_2|H_3,P_1)\). A naive answer would be that switching is irrelevant, as initially \(P(C_i)=1/3, \ i=1,2,3\), and now \(P(C_i|H_3)=1/2, \ i=1,2\), as the host opened door No. 3. So, why bother changing the initial guess if the odds are the same (1:1)? The important point here is that the host knows what is behind each door: he never opens the contestant’s door or the door hiding the car, and he picks at random only when two goat doors are available. That is, \(P(H_3|C_3,P_1)=0\), \(P(H_3|C_2,P_1)=1\) and \(P(H_3|C_1,P_1)=1/2\). Then, using equation (1.2)

\[\begin{align} P(C_2|H_3,P_1)&= \frac{P(C_2,H_3,P_1)}{P(H_3,P_1)}\\ &= \frac{P(H_3|C_2,P_1)P(C_2|P_1)P(P_1)}{P(H_3|P_1)\times P(P_1)}\\ &= \frac{P(H_3|C_2,P_1)P(C_2)}{P(H_3|P_1)}\\ &=\frac{1\times 1/3}{1/2}\\ &=\frac{2}{3}, \end{align}\] where the third equality uses the fact that \(C_i\) and \(P_i\) are independent events, and \(P(H_3|P_1)=\sum_{i=1}^{3}P(H_3|C_i,P_1)P(C_i)=\frac{1}{2}\times\frac{1}{3}+1\times\frac{1}{3}+0\times\frac{1}{3}=\frac{1}{2}\) by the law of total probability.

Therefore, changing the initial decision increases the probability of getting the car from 1/3 to 2/3!
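The same posterior can be computed for all three doors at once. The following is a minimal sketch of the calculation above; the variable names (`prior_car`, `lik_h3`, `p_h3`, `post_car`) are introduced here only for illustration.

```r
prior_car <- rep(1/3, 3)        # P(C_i) = P(C_i | P_1): car location is independent of the pick
lik_h3 <- c(1/2, 1, 0)          # P(H_3 | C_i, P_1) for i = 1, 2, 3
p_h3 <- sum(lik_h3 * prior_car) # P(H_3 | P_1), by the law of total probability
post_car <- lik_h3 * prior_car / p_h3 # P(C_i | H_3, P_1), equation (1.2)
names(post_car) <- paste0("C", 1:3)
post_car
```

The three entries are the posterior probabilities that the car is behind doors No. 1, 2 and 3, respectively, given that the contestant picked door No. 1 and the host opened door No. 3.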

Let’s see a simulation exercise to check this answer:

set.seed(0101) # Set simulation seed
S <- 100000 # Simulations
Game <- function(switch = 0){
  # switch = 0 is not change, and switch = 1 is to change
  opts <- 1:3 
  car <- sample(opts, 1) # car location
  guess1 <- sample(opts, 1) # Initial guess pick
  if(car != guess1) {
    host <- opts[-c(car, guess1)] # Host must open the only remaining goat door
  } else {
    host <- sample(opts[-c(car, guess1)], 1) # Host picks one of the two goat doors at random
  }
  win1 <- guess1 == car # Win given no change
  guess2 <- opts[-c(host, guess1)]
  win2 <- guess2 == car # Win given change
  if(switch == 0){
    win <- win1
  } else {
    win <- win2
  }
  return(win)
}

Prob <- mean(replicate(S, Game(switch = 0))) #Win probabilities not changing
paste("Winning probabilities no changing door is", Prob, sep = " ")
## [1] "Winning probabilities no changing door is 0.3334"
Prob <- mean(replicate(S, Game(switch = 1))) #Win probabilities changing
paste("Winning probabilities changing door is", Prob, sep = " ")
## [1] "Winning probabilities changing door is 0.6654"
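The simulation above averages over all picks and all doors opened by the host. As a complementary check, the specific conditional probability \(P(C_2|H_3,P_1)\) can be estimated by keeping only the games in which the contestant picked door No. 1 and the host opened door No. 3. This is a sketch under the same host behavior; the seed and the names `host_door`, `keep` and `est_c2` are arbitrary choices made here.

```r
set.seed(10101) # Arbitrary seed, used only for this sketch
S <- 100000     # Simulations
car <- sample(1:3, S, replace = TRUE)  # Car location in each game
pick <- sample(1:3, S, replace = TRUE) # Contestant's initial pick in each game
# Hypothetical helper: door opened by the host in one game
host_door <- function(car, pick) {
  goats <- setdiff(1:3, c(car, pick)) # Doors the host is allowed to open
  if (length(goats) == 1) goats else sample(goats, 1)
}
host <- mapply(host_door, car, pick)
keep <- pick == 1 & host == 3    # Condition on P_1 and H_3
est_c2 <- mean(car[keep] == 2)   # Monte Carlo estimate of P(C_2 | H_3, P_1)
est_c2
```

The estimate should be close to 2/3, up to Monte Carlo error, since roughly one sixth of the simulated games satisfy the conditioning events.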


Bayes, Thomas. 1763. “LII. An Essay Towards Solving a Problem in the Doctrine of Chances. By the Late Rev. Mr. Bayes, F.R.S. Communicated by Mr. Price, in a Letter to John Canton, A.M.F.R.S.” Philosophical Transactions of the Royal Society of London, no. 53: 370–418.
Laplace, Pierre Simon. 1774. “Mémoire Sur La Probabilité de Causes Par Les évenements.” Mémoire de l’académie Royale Des Sciences.
Selvin, Steve. 1975a. “A Problem in Probability (Letter to the Editor).” The American Statistician 29 (1): 67–71.
———. 1975b. “A Problem in Probability (Letter to the Editor).” The American Statistician 29 (3): 131–34.
Stigler, Stephen. 2018. “Richard Price, the First Bayesian.” Statistical Science 33 (1): 117–25.

  1. Observe that I use the term “Bayes’ rule” rather than “Bayes’ theorem.” It was Laplace (Laplace 1774) who actually generalized the Bayes’ theorem (Bayes 1763). His generalization is named the Bayes’ rule.

  2. \(\lnot\) is the negation symbol. In addition, we have that \(P(B|A)=1-P(B|A^c)\) in this example, where \(A^c\) is the complement of \(A\). However, this is not true in general.

  3. https://www.wolframalpha.com/input/?i=number+of+people+who+have+ever+lived+on+Earth

  4. https://www.r-bloggers.com/2019/04/base-rate-fallacy-or-why-no-one-is-justified-to-believe-that-jesus-rose/