1.1 The Bayes’ rule

As expected, the starting point for performing Bayesian inference is Bayes’ rule, which provides the solution to the inverse problem of determining causes from observed effects. This rule combines prior beliefs with objective probabilities based on repeatable experiments, allowing us to move from observations to probable causes.¹

Formally, the conditional probability of $A_i$ given $B$ is equal to the conditional probability of $B$ given $A_i$ , multiplied by the marginal probability of $A_i$ , divided by the marginal probability of $B$ :

$\begin{align} P(A_i|B)&=\frac{P(A_i,B)}{P(B)}\\ &=\frac{P(B|A_i) \times P(A_i)}{P(B)}, \tag{1.1} \end{align}$ where equation (1.1) is Bayes’ rule.

Here, $\{ A_i, i = 1, 2, \dots \}$ is a finite or countably infinite partition of the sample space, and, by the law of total probability, $P(B) = \sum_i P(B \mid A_i) P(A_i)$ , which is assumed to be nonzero.

In the Bayesian framework, $B$ represents sample information that updates a probabilistic statement about an unknown object $A_i$ according to probability rules. This is done using Bayes’ rule, which incorporates prior “beliefs” about $A_i$ , i.e., $P(A_i)$ , sample information relating $B$ to the particular state of nature $A_i$ through a probabilistic statement, $P(B \mid A_i)$ , and the probability of observing that specific sample information, $P(B)$ .
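The update described above can be sketched for a finite partition as follows. This is only a minimal illustration: the helper name `bayes_rule` and the numbers in `prior` and `likelihood` are assumptions made up for this sketch, not part of the original exposition.

```r
# Hypothetical helper (not from the original text): Bayes' rule on a
# finite partition A_1, ..., A_k of the sample space.
bayes_rule <- function(prior, likelihood) {
  # prior[i]      = P(A_i), must sum to one
  # likelihood[i] = P(B | A_i)
  stopifnot(length(prior) == length(likelihood),
            abs(sum(prior) - 1) < 1e-12)
  marginal <- sum(likelihood * prior)  # P(B), law of total probability
  stopifnot(marginal > 0)              # Bayes' rule requires P(B) != 0
  likelihood * prior / marginal        # P(A_i | B) for every i
}

# Illustrative numbers only: three states with made-up probabilities
prior <- c(0.5, 0.3, 0.2)
likelihood <- c(0.1, 0.4, 0.7)
posterior <- bayes_rule(prior, likelihood)
round(posterior, 3)
sum(posterior)  # posterior probabilities sum to one
```

Dividing by the marginal $P(B)$ is what makes the posterior probabilities sum to one across the partition.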

Let’s consider a simple example, the base rate fallacy:

Assume that the sample information comes from a positive result from a test whose true positive rate (sensitivity) is 98%, i.e., $P(+ \mid \text{disease}) = 0.98$ . On the other hand, the prior information regarding being infected with this disease comes from a base incidence rate of 0.002, i.e., $P(\text{disease}) = 0.002$ . The question is: What is the probability of actually being infected, given a positive test result?

The base rate fallacy consists of ignoring the base incidence rate: for a disease with a very low base rate, even a positive result from an accurate test implies a low probability of actually having the disease.

The key to answering this question lies in understanding the difference between the probability of having the disease given a positive test result, $P(\text{disease} \mid +)$ , and the probability of a positive result given the disease, $P(+ \mid \text{disease})$ . The former is the crucial result, and Bayes’ rule helps us to compute it. Using Bayes’ rule (equation (1.1)):

$P(\text{disease} \mid +) = \frac{P(+ \mid \text{disease}) \times P(\text{disease})}{P(+)} = \frac{0.98 \times 0.002}{0.98 \times 0.002 + (1-0.98) \times (1-0.002)} \approx 0.09$

where $P(+) = P(+ \mid \text{disease}) \times P(\text{disease}) + P(+ \mid \lnot \text{disease}) \times P(\lnot \text{disease})$ , and the false positive rate is taken to be $P(+ \mid \lnot \text{disease}) = 1 - P(+ \mid \text{disease}) = 0.02$ in this example.²

The following code shows how to perform this exercise in R.

PD <- 0.002 # Probability of disease
PPD <- 0.98 # True positive (Sensitivity)
PDP <- PD * PPD / (PD * PPD + (1 - PD)*(1 - PPD))
paste("Probability of disease given a positive test is", sep = " ", round(PDP, 2))

## [1] "Probability of disease given a positive test is 0.09"

We observe that despite having a positive result, the probability of actually having the disease remains low. This is due to the base rate being so small.
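To see how strongly the result depends on the base rate, the following sketch recomputes the posterior for a few base rates. The helper `post_disease` and the two larger base rates are hypothetical choices made for illustration; the false positive rate is assumed to be one minus the sensitivity, as in the example above.

```r
# Hypothetical helper: posterior probability of disease given a positive
# test, assuming P(+ | no disease) = 1 - sensitivity
post_disease <- function(base_rate, sensitivity = 0.98) {
  false_pos <- 1 - sensitivity
  sensitivity * base_rate /
    (sensitivity * base_rate + false_pos * (1 - base_rate))
}

# The first value is the base rate used in the text; the others are made up
base_rates <- c(0.002, 0.02, 0.2)
round(post_disease(base_rates), 2)
```

With the same test, the posterior probability rises from about 0.09 to about 0.50 and 0.92 as the base rate increases tenfold and a hundredfold, so the low posterior in the example is driven by the prior rather than by the quality of the test.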

Another interesting example, which lies at the heart of the origin of Bayes’ theorem (Bayes 1763), is related to the existence of God (Stigler 2018). In Section X of David Hume’s “An Inquiry concerning Human Understanding” (1748), titled Of Miracles, Hume argues that when someone claims to have seen a miracle, this provides poor evidence that the event actually occurred, as it contradicts our everyday observations. In response, Richard Price, who finished and published “An Essay Towards Solving a Problem in the Doctrine of Chances” in 1763 (after Bayes’ death in 1761), argues against Hume by highlighting the difference between impossibility in casual conversation and physical impossibility. Price used the example of a die with a million sides, where conversational impossibility refers to rolling a specific side, and physical impossibility refers to rolling a side that does not exist. In millions of throws, the latter would never happen, while the former would eventually occur.

Now, let’s consider a scenario involving two cases of resurrection (Res): Jesus Christ and Elvis. The total number of people who have ever lived is approximately 108.5 billion,³ so the prior base rate is given by $\frac{2}{108.5 \times 10^9}$ . On the other hand, suppose the sample information comes from a highly reliable witness with a true positive rate of 0.9999999. The question then is: What is the probability of this miracle occurring?⁴

Using Bayes’ rule:

$\begin{align*} P(\text{Res}\mid \text{Witness}) & = \frac{P(\text{Witness}\mid \text{Res})\times P(\text{Res})}{P(\text{Witness})}\\ & =\frac{2/(108.5 \times 10^9) \times 0.9999999}{2/(108.5 \times 10^9) \times 0.9999999 + (1-2/(108.5 \times 10^9)) \times (1-0.9999999)}\\ & = 0.000184297806959661 \end{align*}$

where $P(\text{Witness}) = P(\text{Witness} \mid \text{Res}) \times P(\text{Res}) + (1 - P(\text{Witness} \mid \text{Res})) \times (1 - P(\text{Res}))$ .

Thus, the probability of a resurrection, given a very reliable witness, is approximately $1.843 \times 10^{-4}$ .

The following code shows how to perform this exercise in R.

# Probability of resurrection
PR <- 2/(108.5 * 10^9) 
PWR <- 0.9999999 # True positive rate
PRW <- PR * PWR / (PR * PWR + (1 - PR)*(1 - PWR)) 
paste("Probability of resurrection given witness is", sep = " ", PRW)

## [1] "Probability of resurrection given witness is 0.000184297806959661"

Observe that we can condition on multiple events in Bayes’ rule. Let’s consider two conditioning events, $B$ and $C$ . Then, equation (1.1) becomes

$\begin{align} P(A_i\mid B,C)&=\frac{P(A_i,B,C)}{P(B,C)}\nonumber\\ &=\frac{P(B\mid A_i,C) \times P(A_i\mid C) \times P(C)}{P(B\mid C)P(C)}. \tag{1.2} \end{align}$
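As a short worked step, assuming $P(B, C) > 0$ and that $\{A_j\}$ is a partition of the sample space, the common factor $P(C)$ in equation (1.2) cancels, and the denominator can be expanded using the law of total probability conditional on $C$ :

$\begin{align*} P(A_i\mid B,C)&=\frac{P(B\mid A_i,C) \times P(A_i\mid C)}{P(B\mid C)},\\ P(B\mid C)&=\sum_j P(B\mid A_j,C) \times P(A_j\mid C). \end{align*}$

In other words, equation (1.2) is equation (1.1) with every probability conditioned on $C$ .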

Let’s use this rule in one of the most intriguing statistical puzzles, the Monty Hall problem, to illustrate how to use equation (1.2) (Selvin 1975; Morgan et al. 1991). This was the situation faced by a contestant in the American television game show Let’s Make a Deal. In this game, the contestant was asked to choose one of three doors; behind one of them there is a car, and behind the other two, goats.

Let’s say the contestant picks door No. 1, and the host (Monty Hall), who knows what is behind each door, opens door No. 3, revealing a goat. Then, the host asks the tricky question: Do you want to pick door No. 2?

Let’s define the following events:

$P_i$ : the event contestant picks door No. $i$ , which stays closed,
$H_i$ : the event host picks door No. $i$ , which is open and contains a goat,
$C_i$ : the event car is behind door No. $i$ .

In this particular setting, the contestant is interested in the probability $P(C_2 \mid H_3, P_1)$ . A naive answer would be that switching is irrelevant because, initially, $P(C_i) = \frac{1}{3}, \ i = 1, 2, 3$ , and now $P(C_i \mid H_3) = \frac{1}{2}, \ i = 1, 2$ , since the host opened door No. 3. So, why bother changing the initial guess if the odds are the same (1:1)?

The important point here is that the host knows what is behind each door and always picks a door where there is a goat, given the contestant’s choice. In this particular setting:

$P(H_3 \mid C_3, P_1) = 0, \quad P(H_3 \mid C_2, P_1) = 1, \quad P(H_3 \mid C_1, P_1) = \frac{1}{2}.$

Then, using equation (1.2), we can calculate the posterior probability.

$\begin{align*} P(C_2\mid H_3,P_1)&= \frac{P(C_2,H_3,P_1)}{P(H_3,P_1)}\\ &= \frac{P(H_3\mid C_2,P_1)P(C_2\mid P_1)P(P_1)}{P(H_3\mid P_1)\times P(P_1)}\\ &= \frac{P(H_3\mid C_2,P_1)P(C_2)}{P(H_3\mid P_1)}\\ &=\frac{1\times 1/3}{1/2}=\frac{2}{3}, \end{align*}$

where the third equality uses the fact that $C_i$ and $P_i$ are independent events, and $P(H_3 \mid P_1) = \sum_{i=1}^{3} P(H_3 \mid C_i, P_1) P(C_i) = \frac{1}{2} \times \frac{1}{3} + 1 \times \frac{1}{3} + 0 \times \frac{1}{3} = \frac{1}{2}$ by the law of total probability.

Therefore, changing the initial decision increases the probability of getting the car from $\frac{1}{3}$ to $\frac{2}{3}$ ! Thus, it is always a good idea to change the door.
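Before simulating, the posterior can also be checked exactly. The following is a minimal sketch that plugs the three conditional probabilities of the host opening door No. 3 into equation (1.2); the variable names `prior`, `like`, `marg`, and `post` are illustrative choices for this sketch.

```r
# Exact check of P(C_i | H_3, P_1), i = 1, 2, 3, using equation (1.2)
prior <- rep(1/3, 3)   # P(C_i | P_1) = P(C_i): car location independent of pick
like <- c(1/2, 1, 0)   # P(H_3 | C_i, P_1) for i = 1, 2, 3
marg <- sum(like * prior)    # P(H_3 | P_1), law of total probability
post <- like * prior / marg  # P(C_i | H_3, P_1)
names(post) <- paste0("C", 1:3)
post
```

The three posterior probabilities are 1/3 for door No. 1, 2/3 for door No. 2, and 0 for door No. 3, in agreement with the derivation.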

Let’s see a simulation exercise in R to check this answer:

set.seed(0101) # Set simulation seed
S <- 100000 # Simulations
Game <- function(switch = 0){
    # switch = 0: keep the initial door
    # switch = 1: switch to the other closed door
    opts <- 1:3 
    car <- sample(opts, 1) # car location
    guess1 <- sample(opts, 1) # Initial guess 
    
    if(car != guess1) {
     host <- opts[-c(car, guess1)]
    } else {
     host <- sample(opts[-c(car, guess1)], 1)
    }   
    win1 <- guess1 == car # Win no change
    guess2 <- opts[-c(host, guess1)]    
    win2 <- guess2 == car # Win change
    if(switch == 0){
        win <- win1
    } else {
        win <- win2
    }
    return(win)
}

# Win probability without changing door
Prob <- mean(replicate(S, Game(switch = 0))) 
paste("Winning probability without changing door is", Prob, sep = " ")

## [1] "Winning probability without changing door is 0.3334"

# Win probability changing door
Prob <- mean(replicate(S, Game(switch = 1))) 
paste("Winning probability changing door is", Prob, sep = " ")

## [1] "Winning probability changing door is 0.6654"

References

Bayes, Thomas. 1763. “LII. An Essay Towards Solving a Problem in the Doctrine of Chances. By the Late Rev. Mr. Bayes, F. R. S. Communicated by Mr. Price, in a Letter to John Canton, A. M. F. R. S.” Philosophical Transactions of the Royal Society of London 53: 370–418.

Laplace, Pierre Simon. 1774. “Mémoire Sur La Probabilité Des Causes Par Les Événements.” Mémoire de l’académie Royale Des Sciences.

Morgan, John P, N Rao Chaganty, Ram C Dahiya, and Michael J Doviak. 1991. “Let’s Make a Deal: The Player’s Dilemma.” The American Statistician 45 (4): 284–87.

Selvin, Steve. 1975. “A Problem in Probability (Letter to the Editor).” The American Statistician 29 (1): 67–71.

Stigler, Stephen. 2018. “Richard Price, the First Bayesian.” Statistical Science 33 (1): 117–25.

¹ Note that I use the term “Bayes’ rule” rather than “Bayes’ theorem.” It was Laplace (1774) who generalized Bayes’ theorem (Bayes 1763), and his generalization is referred to as Bayes’ rule.
² $\lnot$ is the negation symbol. In addition, we have that $P(B \mid A) = 1 - P(B \mid A^c)$ in this example, where $A^c$ is the complement of $A$ . However, it is not always the case that $P(B \mid A) = 1 - P(B \mid A^c)$ .
³ Source.
⁴ Source.