6.7 The likelihood ratio test

Definition 6.7 (Likelihood ratio test) Let

\[\begin{align*} H_0:\boldsymbol{\theta}\in\Theta_0\quad \text{vs.}\quad H_1:\boldsymbol{\theta}\in\Theta_1, \end{align*}\]

be a testing problem where \(\Theta_0\) and \(\Theta_1\) are complementary subsets of the parameter space, i.e., such that \(\Theta_1=\bar{\Theta}_0.\)80 For a srs \((X_1,\ldots,X_n)\) from \(F(\cdot;\theta),\) let \(\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)\) be the maximized likelihood attained inside \(\Theta_0,\) that is,

\[\begin{align*} \mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)=\sup_{\boldsymbol{\theta}\in\Theta_0}\mathcal{L}(\boldsymbol{\theta};X_1,\ldots,X_n), \end{align*}\]

and, with the same notation, let \(\mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n)\) be the maximum likelihood attained inside \(\Theta.\) The Likelihood Ratio Test (LRT) is defined as the test with critical region of the form

\[\begin{align*} C=\left\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:\frac{\mathcal{L}(\hat{\boldsymbol{\theta}}_0;x_1,\ldots,x_n)}{\mathcal{L}(\hat{\boldsymbol{\theta}};x_1,\ldots,x_n)}\leq k\right\}. \end{align*}\]

The quantity

\[\begin{align*} \lambda_n:=\frac{\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)}{\mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n)} \end{align*}\]

is referred to as the likelihood ratio statistic.

Remark. Note that the rejection region lies in the left tail of \(\lambda_n\) because a small \(\lambda_n\) indicates that \(\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)\ll \mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n),\) i.e., that the null hypothesis \(H_0\) is unlikely given the data.

The likelihood ratio test is a generalization of the test provided by the Neyman–Pearson Lemma. In fact, if \(\Theta=\{\theta_0,\theta_1\},\) \(\Theta_0=\{\theta_0\},\) and \(\Theta_1=\{\theta_1\},\) then the denominator of \(\lambda_n\) is \(\max\{\mathcal{L}(\theta_0;X_1,\ldots,X_n),\mathcal{L}(\theta_1;X_1,\ldots,X_n)\},\) hence \(\lambda_n=\min\{1,\mathcal{L}(\theta_0;X_1,\ldots,X_n)/\mathcal{L}(\theta_1;X_1,\ldots,X_n)\}\) and, for \(k<1,\) the LRT is equivalent to the UMP test for (6.5). In the following theorem, however, \(\Theta_1=\{\theta_1\}\) is not allowed.

Theorem 6.2 (Asymptotic distribution of the likelihood ratio statistic) Under certain regularity conditions81 on \(F(\cdot;\theta),\) the distribution of \(X,\) and under \(H_0,\) it holds that

\[\begin{align} -2\log\lambda_n\stackrel{d}{\longrightarrow} \chi_p^2, \tag{6.7} \end{align}\]

where \(p=\dim\Theta-\dim\Theta_0\geq 1\) is the number of parameters specified under \(H_0.\)

Remark. Theorem 6.2 is also known as Wilks’ Theorem and result (6.7) is often referred to as a Wilks phenomenon (the null distribution of the log-likelihood ratio does not depend on \(\theta\) asymptotically).

The LRT based on \(-2\log\lambda_n\) rejects for large values of the test statistic, as these correspond to small values of \(\lambda_n\) (see the remark after Definition 6.7). Therefore, the asymptotic \(p\)-value for the LRT using the test statistic \(-2\log\lambda_n\) is the upper-tail probability of the \(\chi^2_p\) distribution: \(\mathbb{P}[\chi^2_p>-2\log\lambda_n].\) Note that this is the \(p\)-value for a two-sided test, yet still an upper tail is involved!82
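
As a quick illustration of Theorem 6.2, the following minimal simulation sketch (not part of the original text; it assumes a srs from \(\mathrm{Exp}(1/\theta)\) and \(H_0:\theta=\theta_0,\) the setting of Example 6.21 below) compares the simulated null distribution of \(-2\log\lambda_n\) with the \(\chi^2_1\) distribution:

# Simulation sketch of the Wilks phenomenon (assumed setup: srs from
# Exp(1 / theta) and H0: theta = theta_0, as in Example 6.21 below)
set.seed(42)
theta_0 <- 2
n <- 200
M <- 5000 # Monte Carlo replicates

# -2 * log(lambda_n) for samples generated under H0
stats <- replicate(M, {
  x_bar <- mean(rexp(n, rate = 1 / theta_0))
  -2 * ((-n * log(theta_0) - n * x_bar / theta_0) - (-n * log(x_bar) - n))
})

# Empirical vs. chi^2_1 upper-tail probabilities at some cutoffs
cutoffs <- c(1, 2.71, 3.84, 6.63)
rbind(empirical = sapply(cutoffs, function(q) mean(stats > q)),
      chi2_1 = pchisq(cutoffs, df = 1, lower.tail = FALSE))

The two rows of the output should be close to each other, in agreement with (6.7).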

Example 6.20 A labor union registers the number of complaints that are filed per week by the workers of two different shifts in the production line of a factory. The union gathers \(n_1=87\) and \(n_2=61\) independent observations on the numbers of complaints \(X_1\) and \(X_2\) of the two shifts, with resulting sample means \(\bar{X}_1=20\) and \(\bar{X}_2=22.\) Assume that the number of complaints per week of the \(k\)-th shift follows a \(\mathrm{Pois}(\theta_k)\) distribution, \(k=1,2.\) The labor union wants to test, at the significance level \(\alpha=0.01,\) whether the average numbers of complaints per week of the two shifts differ significantly.

The hypothesis to test is

\[\begin{align*} H_0:\theta_1=\theta_2\quad \text{vs.} \quad H_1:\theta_1\neq \theta_2. \end{align*}\]

Recall that the parameter space is

\[\begin{align*} \Theta=\{(\theta_1,\theta_2)'\in\mathbb{R}^2: \theta_1\geq 0,\theta_2\geq 0\}, \end{align*}\]

and that the space \(\Theta_0\) that determines the null hypothesis is

\[\begin{align*} \Theta_0=\{(\theta_1,\theta_2)'\in\mathbb{R}^2: \theta_1=\theta_2\geq 0\}, \end{align*}\]

so the number of specified parameters is \(p=2-1=1.\)

The pmf of the Poisson of mean \(\theta_k\) is

\[\begin{align*} p(x;\theta_k)=\frac{\theta_k^x e^{-\theta_k}}{x!},\quad x=0,1,2,\ldots \end{align*}\]

Two samples of sizes \(n_1=87\) and \(n_2=61\) were observed, one for each shift, and we denote them by

\[\begin{align*} \boldsymbol{X}_1=(X_{11},\ldots,X_{1n_1})',\quad \boldsymbol{X}_2=(X_{21},\ldots,X_{2n_2})'. \end{align*}\]

The joint likelihood is

\[\begin{align*} \mathcal{L}(\theta_1,\theta_2;\boldsymbol{X}_1,\boldsymbol{X}_2)=\frac{\theta_1^{\sum_{i=1}^{n_1}X_{1i}}\theta_2^{\sum_{j=1}^{n_2}X_{2j}}e^{-(n_1\theta_1+n_2\theta_2)}}{c}, \end{align*}\]

where \(c=X_{11}!\cdots X_{1n_1}!\,X_{21}!\cdots X_{2n_2}!\) does not depend on the parameters.

We first compute the maximum likelihood in \(\Theta_0,\) which is attained at the MLE restricted to \(\Theta_0.\) Under \(H_0:\theta_1=\theta_2=:\theta,\) the likelihood is

\[\begin{align*} \mathcal{L}(\theta;\boldsymbol{X}_1,\boldsymbol{X}_2)=\frac{\theta^{\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}}e^{-(n_1+n_2)\theta}}{c}. \end{align*}\]

Computing the maximum of this likelihood, we obtain the MLE restricted to \(\Theta_0.\) Taking the log-likelihood

\[\begin{align*} \ell(\theta;\boldsymbol{X}_1,\boldsymbol{X}_2)=\left(\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}\right)\log\theta -(n_1+n_2)\theta-\log c \end{align*}\]

and differentiating and equating to zero, we have

\[\begin{align*} \hat{\theta}_0=\frac{\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}}{n_1+n_2}=\frac{n_1 \bar{X}_{1}+n_2 \bar{X}_{2}}{n_1+n_2}. \end{align*}\]

The unrestricted MLEs of \(\theta_1\) and \(\theta_2\) are obtained from the unrestricted log-likelihood, which is

\[\begin{align*} \ell(\theta_1,\theta_2;\boldsymbol{X}_1,\boldsymbol{X}_2)=\left(\sum_{i=1}^{n_1} X_{1i}\right) \log\theta_1+\left(\sum_{j=1}^{n_2} X_{2j}\right) \log\theta_2-(n_1\theta_1+n_2\theta_2)-\log c. \end{align*}\]

Differentiating and equating to zero, we have

\[\begin{align*} \hat{\theta}_1=\bar{X}_1,\quad \hat{\theta}_2=\bar{X}_2. \end{align*}\]
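
As a numerical sanity check (a sketch, not part of the original solution), these closed-form MLEs can also be obtained by directly maximizing the log-likelihoods above, which depend on the data only through \(n_1,\) \(n_2,\) \(\bar{X}_1,\) and \(\bar{X}_2\):

# Numerical check of the restricted and unrestricted MLEs (the additive
# constant -log(c) is dropped, as it does not affect the maximizers)
n_1 <- 87; n_2 <- 61; xbar_1 <- 20; xbar_2 <- 22

# Restricted log-likelihood (theta_1 = theta_2 = theta)
ell_0 <- function(theta) {
  (n_1 * xbar_1 + n_2 * xbar_2) * log(theta) - (n_1 + n_2) * theta
}
optimize(f = ell_0, interval = c(0.01, 100), maximum = TRUE)$maximum
# Close to (n_1 * xbar_1 + n_2 * xbar_2) / (n_1 + n_2) = 20.82432

# Unrestricted log-likelihood
ell <- function(theta) {
  n_1 * xbar_1 * log(theta[1]) + n_2 * xbar_2 * log(theta[2]) -
    (n_1 * theta[1] + n_2 * theta[2])
}
optim(par = c(10, 10), fn = ell, method = "L-BFGS-B", lower = c(0.01, 0.01),
      control = list(fnscale = -1))$par
# Close to (xbar_1, xbar_2) = (20, 22)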

Then, the maximum likelihoods in \(\Theta_0\) and \(\Theta\) are, respectively,

\[\begin{align*} \mathcal{L}(\hat{\theta}_0;\boldsymbol{X}_1,\boldsymbol{X}_2) &=\frac{\hat{\theta}_0^{n_1\bar{X}_1+n_2\bar{X}_2}e^{-(n_1+n_2)\hat{\theta}_0}}{c},\\ \mathcal{L}(\hat{\theta}_1,\hat{\theta}_2;\boldsymbol{X}_1,\boldsymbol{X}_2)&=\frac{\hat{\theta}_1^{n_1\bar{X}_1}\hat{\theta}_2^{n_2\bar{X}_2}e^{-(n_1\hat{\theta}_1+n_2\hat{\theta}_2)}}{c}. \end{align*}\]

Therefore, the likelihood ratio statistic is

\[\begin{align*} \lambda_n=\frac{\mathcal{L}(\hat{\theta}_0;\boldsymbol{X}_1,\boldsymbol{X}_2)}{\mathcal{L}(\hat{\theta}_1,\hat{\theta}_2 ;\boldsymbol{X}_1,\boldsymbol{X}_2)}=\frac{\hat{\theta}_0^{n_1\bar{X}_1+n_2\bar{X}_2}}{\hat{\theta}_1^{n_1\bar{X}_1}\hat{\theta}_2^{n_2\bar{X}_2}}e^{-(n_1+n_2)\hat{\theta}_0+(n_1\hat{\theta}_1+n_2\hat{\theta}_2)} \end{align*}\]

and the log-likelihood ratio statistic is

\[\begin{align*} \log\lambda_n=&\;(n_1\bar{X}_1+n_2\bar{X}_2)\log(\hat{\theta}_0)-(n_1\bar{X}_1)\log(\hat{\theta}_1)-(n_2\bar{X}_2)\log(\hat{\theta}_2)\\ &-(n_1+n_2)\hat{\theta}_0+n_1\hat{\theta}_1+n_2\hat{\theta}_2. \end{align*}\]

We use R to compute \(-2\log\lambda_n\):

# Data
xbar_1 <- 20
xbar_2 <- 22
n_1 <- 87
n_2 <- 61

# MLEs under H0 and under H1
theta_hat_0 <- (n_1 * xbar_1 + n_2 * xbar_2) / (n_1 + n_2)
theta_hat_1 <- xbar_1
theta_hat_2 <- xbar_2

# Log-likelihood ratio statistic
log_lambda_n <- (n_1 * xbar_1 + n_2 * xbar_2) * log(theta_hat_0) -
  (n_1 * xbar_1) * log(theta_hat_1) - (n_2 * xbar_2) * log(theta_hat_2) -
  (n_1 + n_2) * theta_hat_0 + (n_1 * theta_hat_1 + n_2 * theta_hat_2)
-2 * log_lambda_n
## [1] 6.851838

Then, \(-2\log\lambda_n\approx6.85>\chi_{1;0.01}^2\approx6.635.\) Therefore, \(H_0:\theta_1=\theta_2\) is rejected, that is, the data indicate that the average numbers of complaints per week of the two shifts differ significantly. In addition, since the \(p\)-value of the test is

# p-value of the test
pchisq(-2 * log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.008855011

we will also reject \(H_0\) for any significance level \(\alpha>0.0089\) (approximately).
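
For reference, the critical value \(\chi_{1;0.01}^2\approx6.635\) used above can be obtained directly in R:

# Critical value of the chi^2_1 distribution for alpha = 0.01
qchisq(p = 0.01, df = 1, lower.tail = FALSE)
## [1] 6.634897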

Example 6.21 Derive the LRT for \(H_0:\theta=\theta_0\) vs. \(H_1:\theta\neq\theta_0\) for a srs \((X_1,\ldots,X_n)\sim\mathrm{Exp}(1/\theta).\) For \(\theta_0=1.5\), compute the \(p\)-value of the test for the sample realizations (1.69, 1.15, 2.66, 0.06, 0.11) and (1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39) using R. What are the rejection decisions for \(\alpha=0.05\)?

First, from Exercise 4.11 we know that

\[\begin{align*} \ell(\theta;X_1,\ldots,X_n)=-n\log\theta-\frac{n\bar{X}}{\theta} \end{align*}\]

and that the unrestricted MLE of \(\theta\) is \(\hat{\theta}_\mathrm{MLE}=\bar{X}.\)

We now obtain \(-2\log\lambda_n.\) Since \(\Theta_0=\{\theta_0\}\) and \(\Theta_1=\{\theta\in\mathbb{R}_+: \theta\neq\theta_0\},\) then \(\hat{\theta}_0=\theta_0,\) i.e., there is no estimation involved under \(H_0.\) Therefore: \[\begin{align*} -2\log\lambda_n=-2\log\frac{\mathcal{L}(\theta_0;X_1,\ldots,X_n)}{\mathcal{L}(\hat{\theta};X_1,\ldots,X_n)}=-2(\ell(\theta_0;X_1,\ldots,X_n)-\ell(\hat{\theta};X_1,\ldots,X_n)), \end{align*}\]

where \(\hat{\theta}\) is obtained by maximizing the log-likelihood under \(\Theta,\) giving \(\hat{\theta}=\hat{\theta}_\mathrm{MLE}=\bar{X}.\)

We are now ready to compute \(-2\log\lambda_n\) with the help of R:

# Log-likelihood of the Exp(1 / theta) sample
log_lik <- function(x, theta) {
  n <- length(x)
  -n * log(theta) - n * mean(x) / theta
}

# Unrestricted MLE of theta
theta_hat <- function(x) {
  mean(x)
}

# theta_0
theta_0 <- 1.5

# Statistic and p-value for the first sample
x <- c(1.69, 1.15, 2.66, 0.06, 0.11)
log_lambda_n <- log_lik(x = x, theta = theta_0) -
  log_lik(x = x, theta = theta_hat(x = x))
-2 * log_lambda_n
## [1] 0.357139
pchisq(q = -2 * log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.5500995

# Statistic and p-value for the second sample
x <- c(1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39)
log_lambda_n <- log_lik(x = x, theta = theta_0) -
  log_lik(x = x, theta = theta_hat(x = x))
-2 * log_lambda_n
## [1] 4.174641
pchisq(q = -2 * log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.04103325

In conclusion, at \(\alpha=0.05,\) we do not reject \(H_0\) for the first sample and we reject it for the second sample.
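
Given the small sample sizes (\(n=5\) and \(n=7\)), the \(\chi^2_1\) approximation of Theorem 6.2 may be somewhat crude. The following minimal Monte Carlo sketch (not part of the original exercise) approximates the exact null distribution of \(-2\log\lambda_n\) by simulating samples from \(\mathrm{Exp}(1/\theta_0),\) and can be used to double-check the asymptotic \(p\)-value for the second sample:

# Monte Carlo approximation of the exact p-value under H0 (sketch); reuses
# log_lik(), theta_hat(), and theta_0 defined above
set.seed(42)
x <- c(1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39)
n <- length(x)
stat_obs <- -2 * (log_lik(x = x, theta = theta_0) -
                    log_lik(x = x, theta = theta_hat(x = x)))

M <- 1e4
stat_sim <- replicate(M, {
  x_sim <- rexp(n, rate = 1 / theta_0) # sample generated under H0
  -2 * (log_lik(x = x_sim, theta = theta_0) -
          log_lik(x = x_sim, theta = theta_hat(x = x_sim)))
})

# Monte Carlo p-value; compare with the asymptotic p-value 0.0410
mean(stat_sim >= stat_obs)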

References

Lehmann, E. L., and J. P. Romano. 2005. Testing Statistical Hypotheses. Third. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/0-387-27605-X.

  80. Note that this setup rules out one-sided tests where \(\bar{\Theta}_0\neq\Theta_1.\)

  81. See the list at Theorem 12.4.2 in Lehmann and Romano (2005).

  82. An explanation for this different behavior of two-sided tests is that here we are working with test statistics based on differences of log-likelihoods, rather than with differences (or ratios) of parameters and estimators, as done with normal-, CLT-, and MLE-based two-sided tests.