## 6.7 The likelihood ratio test

Definition 6.7 (Likelihood ratio test) Let

\begin{align*} H_0:\boldsymbol{\theta}\in\Theta_0\quad \text{vs.}\quad H_1:\boldsymbol{\theta}\in\Theta_1, \end{align*}

be a testing problem where $$\Theta_0$$ and $$\Theta_1$$ are complementary subsets of the parameter space, i.e., such that $$\Theta_1=\bar{\Theta}_0.$$80 For a srs $$(X_1,\ldots,X_n)$$ from $$F(\cdot;\theta),$$ let $$\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)$$ be the maximized likelihood attained inside $$\Theta_0,$$ that is,

\begin{align*} \mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)=\sup_{\boldsymbol{\theta}\in\Theta_0}\mathcal{L}(\boldsymbol{\theta};X_1,\ldots,X_n), \end{align*}

and, with the same notation, let $$\mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n)$$ be the maximized likelihood attained inside $$\Theta.$$ The Likelihood Ratio Test (LRT) is defined as the test with critical region of the form

\begin{align*} C=\left\{(x_1,\ldots,x_n)'\in\mathbb{R}^n:\frac{\mathcal{L}(\hat{\boldsymbol{\theta}}_0;x_1,\ldots,x_n)}{\mathcal{L}(\hat{\boldsymbol{\theta}};x_1,\ldots,x_n)}\leq k\right\}. \end{align*}

The quantity

\begin{align*} \lambda_n:=\frac{\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)}{\mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n)} \end{align*}

is referred to as the likelihood ratio statistic.

Remark. Note that the rejection region is at the left because a small $$\lambda_n$$ indicates that $$\mathcal{L}(\hat{\boldsymbol{\theta}}_0;X_1,\ldots,X_n)\ll \mathcal{L}(\hat{\boldsymbol{\theta}};X_1,\ldots,X_n),$$ i.e., that the null hypothesis $$H_0$$ is unlikely given the data.

The likelihood ratio test is a generalization of the Neyman–Pearson Lemma. In fact, if $$\Theta=\{\theta_0,\theta_1\},$$ $$\Theta_0=\{\theta_0\},$$ and $$\Theta_1=\{\theta_1\},$$ then the LRT is equivalent to the UMP test for (6.5). In the following theorem, however, $$\Theta_1=\{\theta_1\}$$ is not allowed.

Theorem 6.2 (Asymptotic distribution of the likelihood ratio statistic) Under certain regularity conditions81 on $$F(\cdot;\theta),$$ the distribution of $$X,$$ and under $$H_0,$$ it holds that

\begin{align} -2\log\lambda_n\stackrel{d}{\longrightarrow} \chi_p^2, \tag{6.7} \end{align}

where $$p=\dim\Theta-\dim\Theta_0\geq 1$$ is the number of specified parameters under $$H_0.$$

Remark. Theorem 6.2 is also known as Wilks’ Theorem and result (6.7) is often referred to as the Wilks phenomenon (the null distribution of the log-likelihood ratio does not depend on $$\theta$$ asymptotically).

The LRT based on $$-2\log\lambda_n$$ rejects for large values of the test statistic, as these correspond to small values of $$\lambda_n$$ (see the remark after Definition 6.7). Therefore, the asymptotic $$p$$-value for the LRT using the test statistic $$-2\log\lambda_n$$ is the upper-tail probability of the $$\chi^2_p$$ distribution: $$\mathbb{P}[\chi^2_p>-2\log\lambda_n].$$ Note that this is the $$p$$-value for a two-sided test, yet still an upper tail is involved!82
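Result (6.7) can be illustrated by simulation. The sketch below (not from the text) uses a $$\mathcal{N}(\mu,1)$$ model with $$H_0:\mu=0,$$ for which $$-2\log\lambda_n=n\bar{X}^2$$ in closed form, and checks that the rejection rate of the asymptotic LRT at level $$\alpha=0.05$$ is close to $$0.05$$ under $$H_0$$:

```r
# Monte Carlo illustration of the Wilks phenomenon (6.7). Model: srs from
# N(mu, 1) with H0: mu = 0, so p = 1 and, in closed form,
# -2 * log(lambda_n) = n * xbar^2
set.seed(42)
n <- 50
M <- 10000  # Monte Carlo replicates
stat <- replicate(M, {
  x <- rnorm(n)  # data generated under H0
  n * mean(x)^2
})

# Empirical rejection rate at alpha = 0.05; should be close to 0.05
mean(stat > qchisq(0.95, df = 1))
```

A histogram of `stat` overlaid with `dchisq(x, df = 1)` gives a visual version of the same check.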

Example 6.20 A labor union registers the number of complaints that are filed per week by the workers of two different shifts in the production line of a factory. The union gathers $$n_1=87$$ and $$n_2=61$$ independent observations on the number of complaints for the two shifts, $$X_1$$ and $$X_2.$$ The resulting sample means are $$\bar{X}_1=20$$ and $$\bar{X}_2=22.$$ Assume that the number of complaints per week of the $$k$$-th shift has a $$\mathrm{Pois}(\theta_k)$$ distribution, $$k=1,2.$$ The labor union wants to test whether the average numbers of complaints per week of the two shifts are significantly different or not at a significance level $$\alpha=0.01.$$

The hypothesis to test is

\begin{align*} H_0:\theta_1=\theta_2\quad \text{vs.} \quad H_1:\theta_1\neq \theta_2. \end{align*}

Recall that the parameter space is

\begin{align*} \Theta=\{(\theta_1,\theta_2)'\in\mathbb{R}^2: \theta_1\geq 0,\theta_2\geq 0\}, \end{align*}

and that the space $$\Theta_0$$ that determines the null hypothesis is

\begin{align*} \Theta_0=\{(\theta_1,\theta_2)'\in\mathbb{R}^2: \theta_1=\theta_2\geq 0\}, \end{align*}

so the number of specified parameters is $$p=2-1=1.$$

The pmf of the Poisson of mean $$\theta_k$$ is

\begin{align*} p(x;\theta_k)=\frac{\theta_k^x e^{-\theta_k}}{x!},\quad x=0,1,2,\ldots \end{align*}

Two samples of sizes $$n_1=87$$ and $$n_2=61$$ were observed, one for each shift, and we denote them by

\begin{align*} \boldsymbol{X}_1=(X_{11},\ldots,X_{1n_1})',\quad \boldsymbol{X}_2=(X_{21},\ldots,X_{2n_2})'. \end{align*}

The joint likelihood is

\begin{align*} \mathcal{L}(\theta_1,\theta_2;\boldsymbol{X}_1,\boldsymbol{X}_2)=\frac{\theta_1^{\sum_{i=1}^{n_1}X_{1i}}\theta_2^{\sum_{j=1}^{n_2}X_{2j}}e^{-(n_1\theta_1+n_2\theta_2)}}{c}, \end{align*}

where $$c=X_{11}!\cdots X_{1n_1}!X_{21}!\cdots X_{2n_2}!$$ does not depend on the parameters.

We first compute the maximized likelihood in $$\Theta_0,$$ which is attained at the MLE restricted to $$\Theta_0.$$ Under $$H_0:\theta_1=\theta_2=:\theta,$$ the likelihood is

\begin{align*} \mathcal{L}(\theta;\boldsymbol{X}_1,\boldsymbol{X}_2)=\frac{\theta^{\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}}e^{-(n_1+n_2)\theta}}{c}. \end{align*}

Computing the maximum of this likelihood, we obtain the MLE restricted to $$\Theta_0.$$ Taking the log-likelihood

\begin{align*} \ell(\theta;\boldsymbol{X}_1,\boldsymbol{X}_2)=\left(\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}\right)\log\theta -(n_1+n_2)\theta-\log c \end{align*}

and differentiating and equating to zero, we have

\begin{align*} \hat{\theta}_0=\frac{\sum_{i=1}^{n_1}X_{1i}+\sum_{j=1}^{n_2}X_{2j}}{n_1+n_2}=\frac{n_1 \bar{X}_{1}+n_2 \bar{X}_{2}}{n_1+n_2}. \end{align*}
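The closed form of $$\hat{\theta}_0$$ can be double-checked numerically. A minimal sketch (the function `ell_0` below is illustrative and drops the constant $$-\log c,$$ which does not affect the maximizer):

```r
# Numerical check of the restricted MLE with the example's figures:
# maximizing the pooled log-likelihood over a single theta should recover
# the pooled mean (n_1 * xbar_1 + n_2 * xbar_2) / (n_1 + n_2)
n_1 <- 87; n_2 <- 61; xbar_1 <- 20; xbar_2 <- 22
ell_0 <- function(theta) {
  (n_1 * xbar_1 + n_2 * xbar_2) * log(theta) - (n_1 + n_2) * theta
}
(theta_opt <- optimize(ell_0, interval = c(0.01, 100), maximum = TRUE)$maximum)
(n_1 * xbar_1 + n_2 * xbar_2) / (n_1 + n_2)  # pooled mean, 20.82432...
```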

The unrestricted MLE’s of $$\theta_1$$ and $$\theta_2$$ are obtained from the unrestricted log-likelihood, which is

\begin{align*} \ell(\theta_1,\theta_2;\boldsymbol{X}_1,\boldsymbol{X}_2)=\left(\sum_{i=1}^{n_1} X_{1i}\right) \log\theta_1+\left(\sum_{j=1}^{n_2} X_{2j}\right) \log\theta_2-(n_1\theta_1+n_2\theta_2)-\log c. \end{align*}

Differentiating and equating to zero, we have

\begin{align*} \hat{\theta}_1=\bar{X}_1,\quad \hat{\theta}_2=\bar{X}_2. \end{align*}

Then, the maximum likelihoods in $$\Theta_0$$ and $$\Theta$$ are, respectively,

\begin{align*} \mathcal{L}(\hat{\theta}_0;\boldsymbol{X}_1,\boldsymbol{X}_2) &=\frac{\hat{\theta}_0^{n_1\bar{X}_1+n_2\bar{X}_2}e^{-(n_1+n_2)\hat{\theta}_0}}{c},\\ \mathcal{L}(\hat{\theta}_1,\hat{\theta}_2;\boldsymbol{X}_1,\boldsymbol{X}_2)&=\frac{\hat{\theta}_1^{n_1\bar{X}_1}\hat{\theta}_2^{n_2\bar{X}_2}e^{-(n_1\hat{\theta}_1+n_2\hat{\theta}_2)}}{c}. \end{align*}

Therefore, the likelihood ratio statistic is

\begin{align*} \lambda_n=\frac{\mathcal{L}(\hat{\theta}_0;\boldsymbol{X}_1,\boldsymbol{X}_2)}{\mathcal{L}(\hat{\theta}_1,\hat{\theta}_2 ;\boldsymbol{X}_1,\boldsymbol{X}_2)}=\frac{\hat{\theta}_0^{n_1\bar{X}_1+n_2\bar{X}_2}}{\hat{\theta}_1^{n_1\bar{X}_1}\hat{\theta}_2^{n_2\bar{X}_2}}e^{-(n_1+n_2)\hat{\theta}_0+(n_1\hat{\theta}_1+n_2\hat{\theta}_2)} \end{align*}

and the log-likelihood ratio statistic is

\begin{align*} \log\lambda_n=&\;(n_1\bar{X}_1+n_2\bar{X}_2)\log(\hat{\theta}_0)-(n_1\bar{X}_1)\log(\hat{\theta}_1)-(n_2\bar{X}_2)\log(\hat{\theta}_2)\\ &-(n_1+n_2)\hat{\theta}_0+n_1\hat{\theta}_1+n_2\hat{\theta}_2. \end{align*}

We use R to compute $$-2\log\lambda_n$$:

# Data
xbar_1 <- 20
xbar_2 <- 22
n_1 <- 87
n_2 <- 61

# MLE's under H0 and H1
theta_hat_0 <- (n_1 * xbar_1 + n_2 * xbar_2) / (n_1 + n_2)
theta_hat_1 <- xbar_1
theta_hat_2 <- xbar_2

# Log-likelihood ratio statistic
log_lambda_n <- (n_1 * xbar_1 + n_2 * xbar_2) * log(theta_hat_0) -
  (n_1 * xbar_1) * log(theta_hat_1) - (n_2 * xbar_2) * log(theta_hat_2) -
  (n_1 + n_2) * theta_hat_0 + (n_1 * theta_hat_1 + n_2 * theta_hat_2)
-2 * log_lambda_n
## [1] 6.851838

Then, $$-2\log\lambda_n\approx6.85>\chi_{1;0.01}^2\approx6.635.$$ Therefore, $$H_0:\theta_1=\theta_2$$ is rejected, that is, the data indicate that the average numbers of complaints in the two shifts differ significantly. In addition, since the $$p$$-value of the test is

# p-value of the test
pchisq(-2 * log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.008855011

we will also reject for any significance level $$\alpha>0.0089.$$
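As a sanity check (not part of the original derivation), base R's `poisson.test` compares the two Poisson rates with an exact conditional test; since it is not the LRT, its $$p$$-value only agrees approximately with the asymptotic one:

```r
# Exact two-sample comparison of Poisson rates from the observed total
# counts n_k * xbar_k and the exposures n_k (figures from the example)
n_1 <- 87; n_2 <- 61; xbar_1 <- 20; xbar_2 <- 22
exact <- poisson.test(x = c(n_1 * xbar_1, n_2 * xbar_2), T = c(n_1, n_2))
exact$p.value  # comparable to the asymptotic LRT p-value
```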

Example 6.21 Derive the LRT for $$H_0:\theta=\theta_0$$ vs. $$H_1:\theta\neq\theta_0$$ for a srs $$(X_1,\ldots,X_n)\sim\mathrm{Exp}(1/\theta).$$ For $$\theta_0=1.5$$, compute the $$p$$-value of the test for the sample realizations (1.69, 1.15, 2.66, 0.06, 0.11) and (1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39) using R. What are the rejection decisions for $$\alpha=0.05$$?

First, from Exercise 4.11 we know that

\begin{align*} \ell(\theta;X_1,\ldots,X_n)=-n\log\theta-\frac{n\bar{X}}{\theta} \end{align*}

and that the unrestricted MLE of $$\theta$$ is $$\hat{\theta}_\mathrm{MLE}=\bar{X}.$$

We now obtain $$-2\log\lambda_n.$$ Since $$\Theta_0=\{\theta_0\}$$ and $$\Theta_1=\{\theta\in\mathbb{R}_+: \theta\neq\theta_0\},$$ then $$\hat{\theta}_0=\theta_0,$$ i.e., there is no estimation involved under $$H_0.$$ Therefore: \begin{align*} -2\log\lambda_n=-2\log\frac{\mathcal{L}(\theta_0;X_1,\ldots,X_n)}{\mathcal{L}(\hat{\theta};X_1,\ldots,X_n)}=-2(\ell(\theta_0;X_1,\ldots,X_n)-\ell(\hat{\theta};X_1,\ldots,X_n)), \end{align*}

where $$\hat{\theta}$$ is obtained by maximizing the log-likelihood under $$\Theta,$$ giving $$\hat{\theta}=\hat{\theta}_\mathrm{MLE}=\bar{X}.$$

We are now ready to compute $$-2\log\lambda_n$$ with the help of R:

# Log-likelihood
log_lik <- function(x, theta) {
  n <- length(x)
  -n * log(theta) - n * mean(x) / theta
}

# Estimator function
theta_hat <- function(x) {
  mean(x)
}

# theta_0
theta_0 <- 1.5

# Statistic and p-value for first sample
x <- c(1.69, 1.15, 2.66, 0.06, 0.11)
(log_lambda_n <- -2 * (log_lik(x = x, theta = theta_0) -
                         log_lik(x = x, theta = theta_hat(x = x))))
## [1] 0.357139
pchisq(q = log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.5500995

# Statistic and p-value for second sample
x <- c(1.05, 0.72, 1.66, 0.04, 0.07, 0.4, 0.39)
(log_lambda_n <- -2 * (log_lik(x = x, theta = theta_0) -
                         log_lik(x = x, theta = theta_hat(x = x))))
## [1] 4.174641
pchisq(q = log_lambda_n, df = 1, lower.tail = FALSE)
## [1] 0.04103325

In conclusion, we do not reject $$H_0$$ in the first sample and we reject in the second sample for $$\alpha=0.05.$$
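The two computations above follow the same pattern, which can be wrapped in a small reusable helper (illustrative; the names `lrt` and `log_lik` are not from the text): given the two maximized log-likelihoods and the number $$p$$ of specified parameters, it returns $$-2\log\lambda_n$$ and the asymptotic $$p$$-value from Theorem 6.2.

```r
# Generic asymptotic LRT from maximized log-likelihoods under H0 and
# under the full model, with p specified parameters
lrt <- function(log_lik_0, log_lik_1, p) {
  stat <- -2 * (log_lik_0 - log_lik_1)
  c("-2 * log_lambda_n" = stat,
    "p-value" = pchisq(stat, df = p, lower.tail = FALSE))
}

# Reproduces the first sample of Example 6.21
x <- c(1.69, 1.15, 2.66, 0.06, 0.11)
log_lik <- function(theta) -length(x) * log(theta) - sum(x) / theta
lrt(log_lik(1.5), log_lik(mean(x)), p = 1)
```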

### References

Lehmann, E. L., and J. P. Romano. 2005. Testing Statistical Hypotheses. Third. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/0-387-27605-X.

80. Note that this setup rules out one-sided tests where $$\bar{\Theta}_0\neq\Theta_1.$$↩︎

81. See the list at Theorem 12.4.2 in Lehmann and Romano (2005).↩︎

82. An explanation for this different behavior of two-sided tests is that here we are working with test statistics based on differences of log-likelihoods, rather than with differences (or ratios) of parameters and estimators, as done with normal-, CLT-, and MLE-based two-sided tests.↩︎