6.7 The likelihood ratio test

Definition 6.7 (Likelihood ratio test) Let

\[\begin{align*} H_0:\boldsymbol{\theta}\in\Theta_0\quad \text{vs.}\quad H_1:\boldsymbol{\theta}\in\Theta_1, \end{align*}\]

be a testing problem where \(\Theta_0\) and \(\Theta_1\) are complementary subsets of the parameter space, i.e., such that \(\Theta_1=\bar{\Theta}_0.\)80 For a srs \((X_1,\ldots,X_n)\) from \(F(\cdot;\theta),\) let \(\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)\) be the maximized likelihood attained inside \(\Theta_0,\) that is,

\[\begin{align*} \mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)=\sup_{\boldsymbol{\theta}\in\Theta_0}\mathcal{L}(\boldsymbol{\theta};X_1,\ldots,X_n), \end{align*}\]

and, with the same notation, let \(\mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n)\) be the maximum likelihood attained inside \(\Theta.\) The Likelihood Ratio Test (LRT) is defined as the test with critical region of the form

\[\begin{align*} C=\left\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:\frac{\mathcal{L}(\hat{\boldsymbol{\theta}}_0;x_1,\ldots,x_n)}{\mathcal{L}(\hat{\boldsymbol{\theta}};x_1,\ldots,x_n)}\leq k\right\}. \end{align*}\]

The quantity

\[\begin{align*} \lambda_n:=\frac{\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)}{\mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n)} \end{align*}\]

is referred to as the likelihood ratio statistic.

Remark. Note that the rejection region lies in the left tail of \(\lambda_n\) because a small \(\lambda_n\) indicates that \(\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)\ll \mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n),\) i.e., that the null hypothesis \(H_0\) is unlikely given the data.

The likelihood ratio test is a generalization of the test provided by the Neyman–Pearson Lemma. In fact, if \(\Theta=\{\theta_0,\theta_1\},\) \(\Theta_0=\{\theta_0\},\) and \(\Theta_1=\{\theta_1\},\) then the denominator of \(\lambda_n\) is \(\max\{\mathcal{L}(\theta_0;X_1,\ldots,X_n),\mathcal{L}(\theta_1;X_1,\ldots,X_n)\},\) hence \(\lambda_n=\min\{1,\mathcal{L}(\theta_0;X_1,\ldots,X_n)/\mathcal{L}(\theta_1;X_1,\ldots,X_n)\}\) and, for \(k<1,\) the LRT is equivalent to the UMP test for (6.5). In the following theorem, however, \(\Theta_1=\{\theta_1\}\) is not allowed.

Theorem 6.2 (Asymptotic distribution of the likelihood ratio statistic) Under certain regularity conditions81 on \(F(\cdot;\theta),\) the distribution of \(X,\) and under \(H_0,\) it holds that

\[\begin{align} -2\log\lambda_n\stackrel{d}{\longrightarrow} \chi_p^2, \tag{6.7} \end{align}\]

where \(p=\dim\Theta-\dim\Theta_0\geq 1\) is the number of parameters specified under \(H_0.\)

Remark. Theorem 6.2 is also known as Wilks’ Theorem and result (6.7) is often referred to as a Wilks phenomenon (the null distribution of the log-likelihood ratio does not depend on \(\theta\) asymptotically).

The LRT based on \(-2\log\lambda_n\) rejects for large values of the test statistic, as these correspond to small values of \(\lambda_n\) (see the remark after Definition 6.7). Therefore, the asymptotic \(p\)-value for the LRT using the test statistic \(-2\log\lambda_n\) is the upper-tail probability of the \(\chi^2_p\) distribution: \(\mathbb{P}[\chi^2_p>-2\log\lambda_n].\) Note that this is the \(p\)-value for a two-sided test, yet still an upper tail is involved!82
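
As a quick illustration of Theorem 6.2, the following minimal simulation sketch (not part of the original text; it assumes a srs from \(\mathrm{Exp}(1/\theta)\) and \(H_0:\theta=\theta_0,\) the setting of Example 6.21 below) compares the simulated null distribution of \(-2\log\lambda_n\) with the \(\chi^2_1\) distribution:

# Simulation sketch of the Wilks phenomenon (assumed setup: srs from
# Exp(1 / theta) and H0: theta = theta_0, as in Example 6.21 below)
set.seed(42)
theta_0 <- 2
n <- 200
M <- 5000 # Monte Carlo replicates

# -2 * log(lambda_n) for samples generated under H0
stats <- replicate(M, {
  x_bar <- mean(rexp(n, rate = 1 / theta_0))
  -2 * ((-n * log(theta_0) - n * x_bar / theta_0) - (-n * log(x_bar) - n))
})

# Empirical vs. chi^2_1 upper-tail probabilities at some cutoffs
cutoffs <- c(1, 2.71, 3.84, 6.63)
rbind(empirical = sapply(cutoffs, function(q) mean(stats > q)),
      chi2_1 = pchisq(cutoffs, df = 1, lower.tail = FALSE))

The two rows of the output should be close to each other, in agreement with (6.7).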

Example 6.20 A labor union registers the number of complaints that are filed per week by the workers of two different shifts in the production line of a factory. The union gathers \(n_1=87\) and \(n_2=61\) independent observations on the numbers of complaints \(X_1\) and \(X_2\) of the two shifts, with resulting sample means \(\bar{X}_1=20\) and \(\bar{X}_2=22.\) Assume that the number of complaints per week of the \(k\)-th shift follows a \(\mathrm{Pois}(\theta_k)\) distribution, \(k=1,2.\) The labor union wants to test, at the significance level \(\alpha=0.01,\) whether the average numbers of complaints per week of the two shifts differ significantly.

The hypothesis to test is

\[\begin{align*} H_0:\theta_1=\theta_2\quad \text{vs.} \quad H_1:\theta_1\neq \theta_2. \end{align*}\]

Recall that the parameter space is

\[\begin{align*} \Theta=\{(\theta_1,\theta_2)'\in\mathbb{R}^2: \theta_1\geq 0,\theta_2\geq 0\}, \end{align*}\]

and that the space \(\Theta_0\) that determines the null hypothesis is

\[\begin{align*} \Theta_0=\{(\theta_1,\theta_2)'\in\mathbb{R}^2: \theta_1=\theta_2\geq 0\}, \end{align*}\]

so the number of specified parameters is \(p=2-1=1.\)

The pmf of the Poisson of mean \(\theta_k\) is

\[\begin{align*} p(x;\theta_k)=\frac{\theta_k^x e^{-\theta_k}}{x!},\quad x=0,1,2,\ldots \end{align*}\]

Two samples of sizes \(n_1=87\) and \(n_2=61\) were observed, one for each shift, and we denote them by

\[\begin{align*} \boldsymbol{X}_1=(X_{11},\ldots,X_{1n_1})',\quad \boldsymbol{X}_2=(X_{21},\ldots,X_{2n_2})'. \end{align*}\]

The joint likelihood is

\[\begin{align*} \mathcal{L}(\theta_1,\theta_2;\boldsymbol{X}_1,\boldsymbol{X}_2)=\frac{\theta_1^{\sum_{i=1}^{n_1}X_{1i}}\theta_2^{\sum_{j=1}^{n_2}X_{2j}}e^{-(n_1\theta_1+n_2\theta_2)}}{c}, \end{align*}\]

where \(c=X_{11}!\cdots X_{1n_1}!\,X_{21}!\cdots X_{2n_2}!\) does not depend on the parameters.

We first compute the maximum likelihood in \(\Theta_0,\) which is attained at the MLE restricted to \(\Theta_0.\) Under \(H_0:\theta_1=\theta_2=:\theta,\) the likelihood is

\[\begin{align*} \mathcal{L}(\theta;\boldsymbol{X}_1,\boldsymbol{X}_2)=\frac{\theta^{\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}}e^{-(n_1+n_2)\theta}}{c}. \end{align*}\]

Computing the maximum of this likelihood, we obtain the MLE restricted to \(\Theta_0.\) Taking the log-likelihood

\[\begin{align*} \ell(\theta;\boldsymbol{X}_1,\boldsymbol{X}_2)=\left(\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}\right)\log\theta -(n_1+n_2)\theta-\log c \end{align*}\]

and differentiating and equating to zero, we have

\[\begin{align*} \hat{\theta}_0=\frac{\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}}{n_1+n_2}=\frac{n_1 \bar{X}_{1}+n_2 \bar{X}_{2}}{n_1+n_2}. \end{align*}\]

The unrestricted MLEs of \(\theta_1\) and \(\theta_2\) are obtained from the unrestricted log-likelihood, which is

\[\begin{align*} \ell(\theta_1,\theta_2;\boldsymbol{X}_1,\boldsymbol{X}_2)=\left(\sum_{i=1}^{n_1} X_{1i}\right) \log\theta_1+\left(\sum_{j=1}^{n_2} X_{2j}\right) \log\theta_2-(n_1\theta_1+n_2\theta_2)-\log c. \end{align*}\]

Differentiating and equating to zero, we have

\[\begin{align*} \hat{\theta}_1=\bar{X}_1,\quad \hat{\theta}_2=\bar{X}_2. \end{align*}\]
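
As a numerical sanity check (a sketch, not part of the original solution), these closed-form MLEs can also be obtained by directly maximizing the log-likelihoods above, which depend on the data only through \(n_1,\) \(n_2,\) \(\bar{X}_1,\) and \(\bar{X}_2\):

# Numerical check of the restricted and unrestricted MLEs (the additive
# constant -log(c) is dropped, as it does not affect the maximizers)
n_1 <- 87; n_2 <- 61; xbar_1 <- 20; xbar_2 <- 22

# Restricted log-likelihood (theta_1 = theta_2 = theta)
ell_0 <- function(theta) {
  (n_1 * xbar_1 + n_2 * xbar_2) * log(theta) - (n_1 + n_2) * theta
}
optimize(f = ell_0, interval = c(0.01, 100), maximum = TRUE)$maximum
# Close to (n_1 * xbar_1 + n_2 * xbar_2) / (n_1 + n_2) = 20.82432

# Unrestricted log-likelihood
ell <- function(theta) {
  n_1 * xbar_1 * log(theta[1]) + n_2 * xbar_2 * log(theta[2]) -
    (n_1 * theta[1] + n_2 * theta[2])
}
optim(par = c(10, 10), fn = ell, method = "L-BFGS-B", lower = c(0.01, 0.01),
      control = list(fnscale = -1))$par
# Close to (xbar_1, xbar_2) = (20, 22)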

Then, the maximum likelihoods in \(\Theta_0\) and \(\Theta\) are, respectively,

\[\begin{align*} \mathcal{L}(\hat{\theta}_0;\boldsymbol{X}_1,\boldsymbol{X}_2) &=\frac{\hat{\theta}_0^{n_1\bar{X}_1+n_2\bar{X}_2}e^{-(n_1+n_2)\hat{\theta}_0}}{c},\\ \mathcal{L}(\hat{\theta}_1,\hat{\theta}_2;\boldsymbol{X}_1,\boldsymbol{X}_2)&=\frac{\hat{\theta}_1^{n_1\bar{X}_1}\hat{\theta}_2^{n_2\bar{X}_2}e^{-(n_1\hat{\theta}_1+n_2\hat{\theta}_2)}}{c}. \end{align*}\]

Therefore, the likelihood ratio statistic is

\[\begin{align*} \lambda_n=\frac{\mathcal{L}(\hat{\theta}_0;\boldsymbol{X}_1,\boldsymbol{X}_2)}{\mathcal{L}(\hat{\theta}_1,\hat{\theta}_2 ;\boldsymbol{X}_1,\boldsymbol{X}_2)}=\frac{\hat{\theta}_0^{n_1\bar{X}_1+n_2\bar{X}_2}}{\hat{\theta}_1^{n_1\bar{X}_1}\hat{\theta}_2^{n_2\bar{X}_2}}e^{-(n_1+n_2)\hat{\theta}_0+(n_1\hat{\theta}_1+n_2\hat{\theta}_2)} \end{align*}\]

and the log-likelihood ratio statistic is

\[\begin{align*} \log\lambda_n=&\;(n_1\bar{X}_1+n_2\bar{X}_2)\log(\hat{\theta}_0)-(n_1\bar{X}_1)\log(\hat{\theta}_1)-(n_2\bar{X}_2)\log(\hat{\theta}_2)\\ &-(n_1+n_2)\hat{\theta}_0+n_1\hat{\theta}_1+n_2\hat{\theta}_2. \end{align*}\]

We use R to compute \(-2\log\lambda_n\):

# Data
xbar_1 <- 20
xbar_2 <- 22
n_1 <- 87
n_2 <- 61

# MLEs under H0 and under H1
theta_hat_0 <- (n_1 * xbar_1 + n_2 * xbar_2) / (n_1 + n_2)
theta_hat_1 <- xbar_1
theta_hat_2 <- xbar_2

# Log-likelihood ratio statistic
log_lambda_n <- (n_1 * xbar_1 + n_2 * xbar_2) * log(theta_hat_0) -
  (n_1 * xbar_1) * log(theta_hat_1) - (n_2 * xbar_2) * log(theta_hat_2) -
  (n_1 + n_2) * theta_hat_0 + (n_1 * theta_hat_1 + n_2 * theta_hat_2)
-2 * log_lambda_n
## [1] 6.851838

Then, \(-2\log\lambda_n\approx6.85>\chi_{1;0.01}^2\approx6.635.\) Therefore, \(H_0:\theta_1=\theta_2\) is rejected, that is, the data indicate that the average numbers of complaints per week of the two shifts differ significantly. In addition, since the \(p\)-value of the test is

# p-value of the test
pchisq(-2 * log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.008855011

we will also reject \(H_0\) for any significance level \(\alpha>0.0089\) (approximately).
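
For reference, the critical value \(\chi_{1;0.01}^2\approx6.635\) used above can be obtained directly in R:

# Critical value of the chi^2_1 distribution for alpha = 0.01
qchisq(p = 0.01, df = 1, lower.tail = FALSE)
## [1] 6.634897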

Example 6.21 Derive the LRT for \(H_0:\theta=\theta_0\) vs. \(H_1:\theta\neq\theta_0\) for a srs \((X_1,\ldots,X_n)\sim\mathrm{Exp}(1/\theta).\) For \(\theta_0=1.5\), compute the \(p\)-value of the test for the sample realizations (1.69, 1.15, 2.66, 0.06, 0.11) and (1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39) using R. What are the rejection decisions for \(\alpha=0.05\)?

First, from Exercise 4.11 we know that

\[\begin{align*} \ell(\theta;X_1,\ldots,X_n)=-n\log\theta-\frac{n\bar{X}}{\theta} \end{align*}\]

and that the unrestricted MLE of \(\theta\) is \(\hat{\theta}_\mathrm{MLE}=\bar{X}.\)

We now obtain \(-2\log\lambda_n.\) Since \(\Theta_0=\{\theta_0\}\) and \(\Theta_1=\{\theta\in\mathbb{R}_+: \theta\neq\theta_0\},\) then \(\hat{\theta}_0=\theta_0,\) i.e., there is no estimation involved under \(H_0.\) Therefore: \[\begin{align*} -2\log\lambda_n=-2\log\frac{\mathcal{L}(\theta_0;X_1,\ldots,X_n)}{\mathcal{L}(\hat{\theta};X_1,\ldots,X_n)}=-2(\ell(\theta_0;X_1,\ldots,X_n)-\ell(\hat{\theta};X_1,\ldots,X_n)), \end{align*}\]

where \(\hat{\theta}\) is obtained by maximizing the log-likelihood under \(\Theta,\) giving \(\hat{\theta}=\hat{\theta}_\mathrm{MLE}=\bar{X}.\)

We are now ready to compute \(-2\log\lambda_n\) with the help of R:

# Log-likelihood of the Exp(1 / theta) sample
log_lik <- function(x, theta) {
  n <- length(x)
  -n * log(theta) - n * mean(x) / theta
}

# Unrestricted MLE of theta
theta_hat <- function(x) {
  mean(x)
}

# theta_0
theta_0 <- 1.5

# Statistic and p-value for the first sample
x <- c(1.69, 1.15, 2.66, 0.06, 0.11)
log_lambda_n <- log_lik(x = x, theta = theta_0) -
  log_lik(x = x, theta = theta_hat(x = x))
-2 * log_lambda_n
## [1] 0.357139
pchisq(q = -2 * log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.5500995

# Statistic and p-value for the second sample
x <- c(1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39)
log_lambda_n <- log_lik(x = x, theta = theta_0) -
  log_lik(x = x, theta = theta_hat(x = x))
-2 * log_lambda_n
## [1] 4.174641
pchisq(q = -2 * log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.04103325

In conclusion, at \(\alpha=0.05,\) we do not reject \(H_0\) for the first sample and we reject it for the second sample.
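
Given the small sample sizes (\(n=5\) and \(n=7\)), the \(\chi^2_1\) approximation of Theorem 6.2 may be somewhat crude. The following minimal Monte Carlo sketch (not part of the original exercise) approximates the exact null distribution of \(-2\log\lambda_n\) by simulating samples from \(\mathrm{Exp}(1/\theta_0),\) and can be used to double-check the asymptotic \(p\)-value for the second sample:

# Monte Carlo approximation of the exact p-value under H0 (sketch); reuses
# log_lik(), theta_hat(), and theta_0 defined above
set.seed(42)
x <- c(1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39)
n <- length(x)
stat_obs <- -2 * (log_lik(x = x, theta = theta_0) -
                    log_lik(x = x, theta = theta_hat(x = x)))

M <- 1e4
stat_sim <- replicate(M, {
  x_sim <- rexp(n, rate = 1 / theta_0) # sample generated under H0
  -2 * (log_lik(x = x_sim, theta = theta_0) -
          log_lik(x = x_sim, theta = theta_hat(x = x_sim)))
})

# Monte Carlo p-value; compare with the asymptotic p-value 0.0410
mean(stat_sim >= stat_obs)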

References

Lehmann, E. L., and J. P. Romano. 2005. Testing Statistical Hypotheses. Third. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/0-387-27605-X.

  80. Note that this setup rules out one-sided tests where \(\bar{\Theta}_0\neq\Theta_1.\)

  81. See the list at Theorem 12.4.2 in Lehmann and Romano (2005).

  82. An explanation for this different behavior of two-sided tests is that here we are working with test statistics based on differences of log-likelihoods, rather than with differences (or ratios) of parameters and estimators, as done with normal-, CLT-, and MLE-based two-sided tests.