6.7 The likelihood ratio test

Definition 6.7 (Likelihood ratio test) Let

\begin{align*} H_0:\boldsymbol{\theta}\in\Theta_0\quad \text{vs.}\quad H_1:\boldsymbol{\theta}\in\Theta_1, \end{align*}

be a testing problem where \Theta_0 and \Theta_1 are complementary subsets of the parameter space, i.e., such that \Theta_1=\bar{\Theta}_0.80 For a srs (X_1,\ldots,X_n) from F(\cdot;\boldsymbol{\theta}), let \mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n) be the maximum likelihood attained inside \Theta_0, that is,

\begin{align*} \mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)=\sup_{\boldsymbol{\theta}\in\Theta_0}\mathcal{L}(\boldsymbol{\theta};X_1,\ldots,X_n), \end{align*}

and, with the same notation, let \mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n) be the maximum likelihood attained inside \Theta. The Likelihood Ratio Test (LRT) is defined as the test with critical region of the form

\begin{align*} C=\left\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:\frac{\mathcal{L}(\hat{\boldsymbol{\theta}}_0;x_1,\ldots,x_n)}{\mathcal{L}(\hat{\boldsymbol{\theta}};x_1,\ldots,x_n)}\leq k\right\}. \end{align*}

The quantity

\begin{align*} \lambda_n:=\frac{\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)}{\mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n)} \end{align*}

is referred to as the likelihood ratio statistic.

Remark. Note that the rejection region lies on the left tail of \lambda_n: a small \lambda_n indicates that \mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)\ll \mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n), i.e., that the null hypothesis H_0 is unlikely given the data.

The likelihood ratio test is a generalization of the test given by the Neyman–Pearson Lemma. In fact, if \Theta=\{\theta_0,\theta_1\}, \Theta_0=\{\theta_0\}, and \Theta_1=\{\theta_1\}, then the LRT is equivalent to the UMP test for (6.5). In the following theorem, however, a singleton alternative \Theta_1=\{\theta_1\} is not allowed.

Theorem 6.2 (Asymptotic distribution of the likelihood ratio statistic) Under certain regularity conditions81 on F(\cdot;\boldsymbol{\theta}), the distribution of X, and under H_0, it holds that

\begin{align} -2\log\lambda_n\stackrel{d}{\longrightarrow} \chi_p^2, \tag{6.7} \end{align}

where p=\dim\Theta-\dim\Theta_0\geq 1 is the number of parameters specified under H_0.

Remark. Theorem 6.2 is also known as Wilks' Theorem and result (6.7) is often referred to as a Wilks phenomenon (the null distribution of the log-likelihood ratio does not depend on \boldsymbol{\theta} asymptotically).

The LRT based on -2\log\lambda_n rejects for large values of the test statistic, as these correspond to small values of \lambda_n (see the remark after Definition 6.7). Therefore, the asymptotic p-value for the LRT using the test statistic -2\log\lambda_n is the upper-tail probability of the \chi^2_p distribution: \mathbb{P}[\chi^2_p>-2\log\lambda_n]. Note that this is the p-value for a two-sided test, yet still an upper tail is involved!82
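
To appreciate the Wilks phenomenon in practice, the following minimal sketch simulates, in a hypothetical setting with a \mathrm{Pois}(\theta) model and H_0:\theta=\theta_0 (so p=1), the null distribution of -2\log\lambda_n and compares its upper quantiles with those of the \chi^2_1 distribution; the sample size n=50, the value \theta_0=3, and the number of replicates are arbitrary choices.

# Monte Carlo illustration of the Wilks phenomenon for a Pois(theta) model
# with H0: theta = theta_0 (hypothetical setting; p = 1)
set.seed(42)
n <- 50
theta_0 <- 3
M <- 1e4  # Monte Carlo replicates
stat <- replicate(M, {
  x <- rpois(n = n, lambda = theta_0)  # srs generated under H0
  theta_hat <- mean(x)  # unrestricted MLE
  # -2 * log(lambda_n); the x!-terms cancel in the likelihood ratio
  2 * (sum(dpois(x = x, lambda = theta_hat, log = TRUE)) -
         sum(dpois(x = x, lambda = theta_0, log = TRUE)))
})

# Upper quantiles of the simulated statistic vs. those of chi^2_1
quantile(stat, probs = c(0.90, 0.95, 0.99))
qchisq(p = c(0.90, 0.95, 0.99), df = 1)

The agreement between the two sets of quantiles should improve as n grows.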

Example 6.20 A labor union registers the number of complaints that are filed per week by the workers of two different shifts in the production line of a factory. The union gathers n_1=87 and n_2=61 independent weekly observations of the numbers of complaints X_1 and X_2 of the two shifts, resulting in the means \bar{X}_1=20 and \bar{X}_2=22. Assume that the number of complaints per week of the k-th shift follows a \mathrm{Pois}(\theta_k) distribution, k=1,2. The labor union wants to test whether the average numbers of complaints per week of the two shifts differ significantly at the significance level \alpha=0.01.

The hypothesis to test is

\begin{align*} H_0:\theta_1=\theta_2\quad \text{vs.} \quad H_1:\theta_1\neq \theta_2. \end{align*}

Recall that the parameter space is

\begin{align*} \Theta=\{(\theta_1,\theta_2)'\in\mathbb{R}^2: \theta_1\geq 0,\theta_2\geq 0\}, \end{align*}

and that the space \Theta_0 that determines the null hypothesis is

\begin{align*} \Theta_0=\{(\theta_1,\theta_2)'\in\Theta: \theta_1=\theta_2\}, \end{align*}

so the number of specified parameters is p=2-1=1.

The pmf of the Poisson of mean \theta_k is

\begin{align*} p(x;\theta_k)=\frac{\theta_k^x e^{-\theta_k}}{x!},\quad x=0,1,2,\ldots \end{align*}

Two samples of sizes n_1=87 and n_2=61 were observed, one for each shift, and we denote them by

\begin{align*} \boldsymbol{X}_1=(X_{11},\ldots,X_{1n_1})',\quad \boldsymbol{X}_2=(X_{21},\ldots,X_{2n_2})'. \end{align*}

The joint likelihood is

\begin{align*} \mathcal{L}(\theta_1,\theta_2;\boldsymbol{X}_1,\boldsymbol{X}_2)=\frac{\theta_1^{\sum_{i=1}^{n_1}X_{1i}}\theta_2^{\sum_{j=1}^{n_2}X_{2j}}e^{-(n_1\theta_1+n_2\theta_2)}}{c}, \end{align*}

where c=X_{11}!\cdots X_{1n_1}!\,X_{21}!\cdots X_{2n_2}! does not depend on the parameters.

We first compute the maximum likelihood in \Theta_0, which is attained at the MLE restricted to \Theta_0. Under H_0:\theta_1=\theta_2=:\theta, the likelihood is

\begin{align*} \mathcal{L}(\theta;\boldsymbol{X}_1,\boldsymbol{X}_2)=\frac{\theta^{\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}}e^{-(n_1+n_2)\theta}}{c} \end{align*}

Computing the maximum of this likelihood, we obtain the MLE restricted to \Theta_0. Taking the log-likelihood

\begin{align*} \ell(\theta;\boldsymbol{X}_1,\boldsymbol{X}_2)=\left(\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}\right)\log\theta -(n_1+n_2)\theta-\log c \end{align*}

and differentiating and equating to zero, we have

\begin{align*} \hat{\theta}_0=\frac{\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}}{n_1+n_2}=\frac{n_1 \bar{X}_{1}+n_2 \bar{X}_{2}}{n_1+n_2}. \end{align*}
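
For instance, with the observed data,

\begin{align*} \hat{\theta}_0=\frac{87\cdot 20+61\cdot 22}{87+61}=\frac{3082}{148}\approx 20.82 \end{align*}

complaints per week; this value is recomputed in the R code below.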

The unrestricted MLEs of \theta_1 and \theta_2 are obtained from the unrestricted log-likelihood, which is

\begin{align*} \ell(\theta_1,\theta_2;\boldsymbol{X}_1,\boldsymbol{X}_2)=\left(\sum_{i=1}^{n_1} X_{1i}\right) \log\theta_1+\left(\sum_{j=1}^{n_2} X_{2j}\right) \log\theta_2-(n_1\theta_1+n_2\theta_2)-\log c. \end{align*}

Differentiating and equating to zero, we have

\begin{align*} \hat{\theta}_1=\bar{X}_1,\quad \hat{\theta}_2=\bar{X}_2. \end{align*}

Then, the maximum likelihoods in \Theta_0 and \Theta are, respectively,

\begin{align*} \mathcal{L}(\hat{\theta}_0;\boldsymbol{X}_1,\boldsymbol{X}_2) &=\frac{\hat{\theta}_0^{n_1\bar{X}_1+n_2\bar{X}_2}e^{-(n_1+n_2)\hat{\theta}_0}}{c},\\ \mathcal{L}(\hat{\theta}_1,\hat{\theta}_2;\boldsymbol{X}_1,\boldsymbol{X}_2)&=\frac{\hat{\theta}_1^{n_1\bar{X}_1}\hat{\theta}_2^{n_2\bar{X}_2}e^{-(n_1\hat{\theta}_1+n_2\hat{\theta}_2)}}{c}. \end{align*}

Therefore, the likelihood ratio statistic is

\begin{align*} \lambda_n=\frac{\mathcal{L}(\hat{\theta}_0;\boldsymbol{X}_1,\boldsymbol{X}_2)}{\mathcal{L}(\hat{\theta}_1,\hat{\theta}_2 ;\boldsymbol{X}_1,\boldsymbol{X}_2)}=\frac{\hat{\theta}_0^{n_1\bar{X}_1+n_2\bar{X}_2}}{\hat{\theta}_1^{n_1\bar{X}_1}\hat{\theta}_2^{n_2\bar{X}_2}}e^{-(n_1+n_2)\hat{\theta}_0+(n_1\hat{\theta}_1+n_2\hat{\theta}_2)} \end{align*}

and the log-likelihood ratio statistic is

\begin{align*} \log\lambda_n=&\;(n_1\bar{X}_1+n_2\bar{X}_2)\log(\hat{\theta}_0)-(n_1\bar{X}_1)\log(\hat{\theta}_1)-(n_2\bar{X}_2)\log(\hat{\theta}_2)\\ &-(n_1+n_2)\hat{\theta}_0+n_1\hat{\theta}_1+n_2\hat{\theta}_2. \end{align*}

We use R to compute -2\log\lambda_n:

# Data
xbar_1 <- 20
xbar_2 <- 22
n_1 <- 87
n_2 <- 61

# MLEs under H0 and H1
theta_hat_0 <- (n_1 * xbar_1 + n_2 * xbar_2) / (n_1 + n_2)
theta_hat_1 <- xbar_1
theta_hat_2 <- xbar_2

# Log-likelihood ratio statistic
log_lambda_n <- (n_1 * xbar_1 + n_2 * xbar_2) * log(theta_hat_0) -
  (n_1 * xbar_1) * log(theta_hat_1) - (n_2 * xbar_2) * log(theta_hat_2) -
  (n_1 + n_2) * theta_hat_0 + (n_1 * theta_hat_1 + n_2 * theta_hat_2)
-2 * log_lambda_n
## [1] 6.851838

Then, -2\log\lambda_n\approx6.85>\chi_{1;0.01}^2\approx6.635. Therefore, H_0:\theta_1=\theta_2 is rejected, that is, the data indicate that the average numbers of complaints of the two shifts differ significantly. In addition, since the p-value of the test is

# p-value of the test
pchisq(-2 * log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.008855011

we will also reject for any significance level \alpha>0.0089.
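
As an informal cross-check (not the LRT itself), the function poisson.test() in base R performs an exact comparison of two Poisson rates given the total counts and the corresponding numbers of weeks; its p-value should be of the same order of magnitude as the asymptotic one obtained above.

# Exact comparison of the two Poisson rates from the total counts and the
# numbers of weeks (an exact test, not the LRT; informal cross-check)
poisson.test(x = c(n_1 * xbar_1, n_2 * xbar_2), T = c(n_1, n_2))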

Example 6.21 Derive the LRT for H_0:\theta=\theta_0 vs. H_1:\theta\neq\theta_0 for a srs (X_1,\ldots,X_n)\sim\mathrm{Exp}(1/\theta). For \theta_0=1.5, compute the p-value of the test for the sample realizations (1.69, 1.15, 2.66, 0.06, 0.11) and (1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39) using R. What are the rejection decisions for \alpha=0.05?

First, from Exercise 4.11 we know that

\begin{align*} \ell(\theta;X_1,\ldots,X_n)=-n\log\theta-\frac{n\bar{X}}{\theta} \end{align*}

and that the unrestricted MLE of \theta is \hat{\theta}_\mathrm{MLE}=\bar{X}.

We now obtain -2\log\lambda_n. Since \Theta_0=\{\theta_0\} and \Theta_1=\{\theta\in\mathbb{R}_+: \theta\neq\theta_0\}, then \hat{\theta}_0=\theta_0, i.e., there is no estimation involved under H_0. Therefore: \begin{align*} -2\log\lambda_n=-2\log\frac{\mathcal{L}(\theta_0;X_1,\ldots,X_n)}{\mathcal{L}(\hat{\theta};X_1,\ldots,X_n)}=-2(\ell(\theta_0;X_1,\ldots,X_n)-\ell(\hat{\theta};X_1,\ldots,X_n)), \end{align*}

where \hat{\theta} is obtained by maximizing the log-likelihood under \Theta, giving \hat{\theta}=\hat{\theta}_\mathrm{MLE}=\bar{X}.
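
Although the R code below works directly with log-likelihood differences, substituting \ell(\theta;X_1,\ldots,X_n) and simplifying gives a closed form for the statistic (a short derivation from the formulas above):

\begin{align*} -2\log\lambda_n=2n\left(\log\frac{\theta_0}{\bar{X}}+\frac{\bar{X}}{\theta_0}-1\right), \end{align*}

which is nonnegative and equals zero only if \bar{X}=\theta_0.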

We are now ready to compute -2\log\lambda_n with the help of R:

# Log-likelihood
log_lik <- function(x, theta) {
  n <- length(x)
  -n * log(theta) - n * mean(x) / theta
}

# Estimator function
theta_hat <- function(x) {
  mean(x)
}

# theta_0
theta_0 <- 1.5

# Statistic -2 * log(lambda_n) and p-value for the first sample
x <- c(1.69, 1.15, 2.66, 0.06, 0.11)
(minus_2_log_lambda_n <- -2 * (log_lik(x = x, theta = theta_0) -
                                 log_lik(x = x, theta = theta_hat(x = x))))
## [1] 0.357139
pchisq(q = minus_2_log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.5500995

# Statistic -2 * log(lambda_n) and p-value for the second sample
x <- c(1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39)
(minus_2_log_lambda_n <- -2 * (log_lik(x = x, theta = theta_0) -
                                 log_lik(x = x, theta = theta_hat(x = x))))
## [1] 4.174641
pchisq(q = minus_2_log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.04103325

In conclusion, for \alpha=0.05 we do not reject H_0 with the first sample, while we reject it with the second sample.
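
Since the sample sizes are small (n=5 and n=7), the \chi^2_1 calibration given by Theorem 6.2 is only approximate. The following minimal sketch (reusing log_lik, theta_hat, and theta_0 from above) approximates the exact p-value for the first sample by simulating the null distribution of -2\log\lambda_n:

# Monte Carlo approximation of the exact p-value of the first sample:
# simulate the null distribution of -2 * log(lambda_n) under theta = theta_0
set.seed(42)
x <- c(1.69, 1.15, 2.66, 0.06, 0.11)
obs_stat <- -2 * (log_lik(x = x, theta = theta_0) -
                    log_lik(x = x, theta = theta_hat(x = x)))
M <- 1e4
null_stat <- replicate(M, {
  x_star <- rexp(n = length(x), rate = 1 / theta_0)  # srs under H0
  -2 * (log_lik(x = x_star, theta = theta_0) -
          log_lik(x = x_star, theta = theta_hat(x = x_star)))
})
mean(null_stat >= obs_stat)  # Monte Carlo p-value

The resulting value should be broadly comparable to the asymptotic p-value 0.55 obtained above, although the two need not coincide for such a small n.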

References

Lehmann, E. L., and J. P. Romano. 2005. Testing Statistical Hypotheses. Third. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/0-387-27605-X.

  80. Note that this setup rules out one-sided tests where \bar{\Theta}_0\neq\Theta_1.↩︎

  81. See the list at Theorem 12.4.2 in Lehmann and Romano (2005).↩︎

  82. An explanation for this different behavior of two-sided tests is that here we are working with test statistics based on differences of log-likelihoods, rather than with differences (or ratios) of parameters and estimators, as done with normal-, CLT-, and MLE-based two-sided tests.↩︎