2.4 The likelihood principle
The likelihood principle states that, in making inferences or decisions about the state of nature, all the relevant experimental information is contained in the likelihood function. The Bayesian framework adheres to this principle, since inference is conditional on the observed data.
We follow J. Berger (1993), who in turn followed D. V. Lindley and Phillips (1976), to illustrate the likelihood principle. We are given a coin and are interested in the probability, θ, of it landing heads when flipped. We wish to test H0:θ=1/2 versus H1:θ>1/2. An experiment is conducted by flipping the coin (independently) in a series of trials, with the result being the observation of 9 heads and 3 tails.
This is not yet enough information to specify p(y|θ), since the series of trials has not been explained. Two possibilities arise:
The experiment consisted of a predetermined 12 flips, so that Y, the number of heads, follows a B(12, θ) distribution. In this case, for the observed y = 9,
p_1(y|\theta) = \binom{12}{y} \theta^y (1 - \theta)^{12 - y} = 220 \times \theta^9 (1 - \theta)^3.
The experiment consisted of flipping the coin until 3 tails were observed (r = 3). In this case, Y, the number of heads (failures) before obtaining 3 tails, follows a NB(3, 1 - \theta) distribution. Here, for the observed y = 9,
p_2(y|\theta) = \binom{y + r - 1}{r - 1} \theta^y (1 - \theta)^r = 55 \times \theta^9 (1 - \theta)^3.
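Since the two likelihoods differ only by the constant factor 220/55 = 4, they are proportional as functions of θ. A quick check in R (a sketch using the built-in `dbinom` and `dnbinom` densities; the grid of θ values is arbitrary):

```r
# Likelihood of observing y = 9 heads under each sampling model,
# evaluated on a grid of theta values
theta <- seq(0.1, 0.9, by = 0.1)
lik_binom  <- dbinom(9, size = 12, prob = theta)       # B(12, theta)
lik_negbin <- dnbinom(9, size = 3, prob = 1 - theta)   # NB(3, 1 - theta): heads are "failures"
# The ratio is the same constant (220/55 = 4) for every theta
lik_binom / lik_negbin
```

Because the ratio does not depend on θ, both experiments carry the same information about θ under the likelihood principle.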
Using a frequentist approach, the significance level (p-value) of the observation y = 9 under the Binomial model against θ = 1/2 would be:
\alpha_1=P_{1/2}(Y\geq 9)=p_1(9|1/2)+p_1(10|1/2)+p_1(11|1/2)+p_1(12|1/2)=0.073.
# Number of observed successes (heads) in n trials
success <- 9
# Number of trials
n <- 12
siglevel <- sum(dbinom(success:n, n, 0.5))
siglevel
## [1] 0.07299805
For the Negative Binomial model, the significance level would be:
\alpha_2=P_{1/2}(Y\geq 9)=p_2(9|1/2)+p_2(10|1/2)+\ldots=0.0327.
# Number of target successes (tails)
success <- 3
# Number of observed failures (heads)
failures <- 9
siglevel <- 1 - pnbinom(failures - 1, success, 0.5)
siglevel
## [1] 0.03271484
At a 5% significance level we thus arrive at different conclusions under the two models, whereas a Bayesian approach yields the same outcome in both cases because the kernels of the two likelihoods are identical, \theta^9 \times (1 - \theta)^3.
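To illustrate the Bayesian agreement, assume (for this sketch only) a uniform Beta(1, 1) prior on θ. Multiplying the prior by the common kernel \theta^9 (1 - \theta)^3 gives a Beta(10, 4) posterior under either sampling model, so the posterior probability of H1 is identical in both cases:

```r
# Posterior under a Beta(1, 1) prior: the common kernel theta^9 (1 - theta)^3
# yields a Beta(10, 4) posterior regardless of the sampling model
a_post <- 1 + 9  # prior a + observed heads
b_post <- 1 + 3  # prior b + observed tails
# Posterior probability of H1: theta > 1/2
1 - pbeta(0.5, a_post, b_post)
## [1] 0.9538574
```

Any other prior would give the same agreement between the two models, since only the shared kernel enters the posterior.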