3.5 Bayes Factors

The Bayes Factor (BF) is a measure of the relative evidence of one model over another. Take another look at Bayes’ formula:

\[P(\theta|D) = \frac{P(D|\theta)P(\theta)}{P(D)}.\]

Suppose you want to compare how two models explain an observed data outcome, \(D\). Model \(M_1:f_1(D|\theta_1)\) says the observed data \(D\) was produced by a generative model with pdf \(f_1\) parameterized by \(\theta_1\). Model \(M_2:f_2(D|\theta_2)\) says it was produced by a generative model with pdf \(f_2\) parameterized by \(\theta_2\). In each model you specify a prior probability distribution for the parameter.

If you take the ratio of the posterior probabilities (the posterior odds), the \(P(D)\) terms cancel and you have

\[\frac{P(\theta_1|D)}{P(\theta_2|D)} = \frac{P(D|\theta_1)}{P(D|\theta_2)} \cdot \frac{P(\theta_1)}{P(\theta_2)}\]

The posterior odds equal the ratio of the likelihoods multiplied by the prior odds. That likelihood ratio is the Bayes factor (BF). Rearranging, the BF is the ratio of the posterior odds to the prior odds.

\[BF = \frac{P(D|\theta_1)}{P(D|\theta_2)} = \frac{\text{Posterior Odds}}{\text{Prior Odds}}\]
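To make the bookkeeping concrete, here is a quick sketch in R (assuming, purely for illustration, two point hypotheses \(\theta_1\) = .5 and \(\theta_2\) = .7 for data of 7 ones and 3 zeros, with equal prior odds):

```r
# Likelihood of 7 ones in 10 trials under each point hypothesis
lik_1 <- dbinom(7, 10, 0.5)   # P(D | theta_1 = .5)
lik_2 <- dbinom(7, 10, 0.7)   # P(D | theta_2 = .7)

# Bayes factor: likelihood ratio of model 1 to model 2
bf <- lik_1 / lik_2           # ~0.439

# Posterior odds = BF * prior odds (equal prior odds here)
prior_odds <- 1
posterior_odds <- bf * prior_odds
```

With equal prior odds the posterior odds equal the BF itself, about 0.44, so these data mildly favor \(\theta_2\) = .7 over \(\theta_1\) = .5.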

Return to the example of observing \(D\) = 7 ones and 3 zeros. You can compare a hypothesized \(\theta_1\) = .5 to a completely agnostic model where \(\theta_2\) is uniform over [0, 1], i.e., a Beta(1, 1) prior (complete agnosticism). The likelihood of observing \(D\) when \(\theta_1\) = .5 is \(P(D|\theta_1) = \binom{10}{7}.5^7(1-.5)^3 = 0.117\). The marginal likelihood of observing \(D\) when \(\theta_2\) is uniform on [0, 1] averages the binomial likelihood over the prior, \(P(D|\theta_2) = \int_0^1 \binom{10}{7}q^7(1-q)^3dq = 1/11 \approx 0.091\), so the Bayes factor in favor of \(\theta_1\) = .5 is about 1.29.

dbinom(7, 10, .5)
## [1] 0.1171875
choose(10, 7) * beta(8, 4)  # closed form of the integral
## [1] 0.09090909
dbinom(7, 10, .5) / (choose(10, 7) * beta(8, 4))
## [1] 1.289062
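The integral above can also be checked numerically with base R's `integrate()` (a quick sanity check on the marginal likelihood, not part of the derivation):

```r
# Binomial likelihood of 7 ones and 3 zeros, as a function of q
lik <- function(q) choose(10, 7) * q^7 * (1 - q)^3

# Average the likelihood over the uniform prior on [0, 1]
marginal <- integrate(lik, 0, 1)$value
marginal  # 0.09090909 (= 1/11)
```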

For the data below (\(D\) = 10 ones and 10 zeros), the Bayes factor at \(\theta\) = .5 quantifies how much the data shift the odds in favor of H0: \(\theta\) = .5 over the agnostic alternative H1: \(\theta\) uniform on [0, 1]. It equals the ratio of the posterior density to the prior density evaluated at \(\theta\) = .5.

# Beta(alpha, beta) prior density at theta
prior <- function(theta, alpha, beta) {
  (1 / beta(alpha, beta)) * theta^(alpha-1) * (1-theta)^(beta-1)
}
# Beta posterior density at theta after observing a ones and b zeros
posterior <- function(theta, alpha, beta, a, b) {
  (1 / beta(alpha + a, beta + b)) * theta^(alpha-1+a) * (1-theta)^(beta-1+b)
}

# Prior densities at theta = .5
prior(.5, 1, 1)     # uniform Beta(1, 1) prior
## [1] 1
1/beta(1, 1) * .5^(1-1) * (1-.5)^(1-1)
## [1] 1
dbeta(.5, 1, 1)
## [1] 1
prior(.5, 115, 85)  # a more informative Beta(115, 85) prior, for comparison
## [1] 1.164377
dbeta(.5, 115, 85)
## [1] 1.164377
1 / beta(115, 85)   # normalizing constant of the Beta(115, 85) density
## [1] 4.677704e+59

# Posterior density at theta = .5: Beta(1+10, 1+10)
posterior(.5, 1, 1, 10, 10)
## [1] 3.700138
1/beta(1+10, 1+10) * .5^(1-1+10) * (1-.5)^(1-1+10)
## [1] 3.700138
dbeta(.5, 11, 11)
## [1] 3.700138

# Bayes factor: posterior density over prior density at theta = .5
posterior(.5, 1, 1, 10, 10) / prior(.5, 1, 1)
## [1] 3.700138

The Bayes factor measures how much the evidence alters your prior belief. It is the ratio of the density at the hypothesized value after observing the data to the density before. In this case, confidence in \(\theta\) = .5 increased by a factor of…

theta <- 0.5   # hypothesized value

alpha <- 1     # Beta(1, 1) prior
beta <- 1      # (the base beta() function is still found in calls below)
a <- 10        # observed ones
b <- 10        # observed zeros

(prior_likelihood <- (1 / beta(alpha, beta)) * theta^(alpha-1) * (1-theta)^(beta-1))
## [1] 1
(posterior_likelihood <- (1 / beta(alpha + a, beta + b)) * theta^(alpha-1+a) * (1-theta)^(beta-1+b))
## [1] 3.700138
(bayes_factor <- posterior_likelihood / prior_likelihood)
## [1] 3.700138

# BF = 3.70 with alpha = beta = 1 (uniform prior)
# BF = 1.91 with alpha = beta = 4 (a prior already concentrated near .5)
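Those two figures can be verified directly with `dbeta` (a small sketch; `bf_point` is a hypothetical helper name, not defined in the text above):

```r
# Posterior density over prior density at theta, for a symmetric
# Beta(s, s) prior and a ones / b zeros observed
bf_point <- function(s, a = 10, b = 10, theta = 0.5) {
  dbeta(theta, s + a, s + b) / dbeta(theta, s, s)
}

bf_point(1)  # 3.700138, matching the Beta(1, 1) result above
bf_point(4)  # ~1.91 with a Beta(4, 4) prior
```

The stronger Beta(4, 4) prior already places density near .5, so the same data move the density there less, yielding a smaller Bayes factor.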