Chapter 15 Methods of Finding Tests (Lecture on 02/13/2020)

Definition 15.1 (Hypothesis) A hypothesis is a statement about a population parameter.

  • A hypothesis makes a statement about the population.

  • The goal of a hypothesis test is to decide, based on a sample from the population, which of the two complementary hypotheses is true.
Definition 15.2 The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. They are denoted by \(H_0\) and \(H_1\), respectively.
If \(\theta\) denotes a population parameter, the general format of the null and alternative hypotheses is \(H_0:\theta\in\Theta_0\) and \(H_1:\theta\in\Theta_0^c\), where \(\Theta_0\) is some subset of the parameter space and \(\Theta_0^c\) is its complement.

Definition 15.3 (Hypothesis Testing) A hypothesis testing procedure or hypothesis test is a rule that specifies:

  1. For which sample values the decision is made to accept \(H_0\) as true.

  2. For which sample values \(H_0\) is rejected and \(H_1\) is accepted as true.

The subset of the sample space for which \(H_0\) will be rejected is called the rejection region or critical region. The complement of the rejection region is called the acceptance region.
  • A hypothesis testing problem is a problem in which one of two actions is going to be taken: asserting \(H_0\) or asserting \(H_1\).

  • Typically, a hypothesis test is specified in terms of a test statistic \(W(X_1,\cdots,X_n)=W(\mathbf{X})\), a function of the sample.
Definition 15.4 (Likelihood Ratio Test Statistic) The likelihood ratio test statistic for testing \(H_0:\theta\in\Theta_0\) versus \(H_1:\theta\in\Theta_0^c\) is \[\begin{equation} \lambda(\mathbf{x})=\frac{\sup_{\Theta_0}L(\theta|\mathbf{x})}{\sup_{\Theta}L(\theta|\mathbf{x})} \tag{15.1} \end{equation}\] A likelihood ratio test (LRT) is any test that has a rejection region of the form \(\{\mathbf{x}:\lambda(\mathbf{x})\leq c\}\) where \(c\) is any number satisfying \(0\leq c\leq1\).
  • The rationale behind LRTs is as follows. The numerator of \(\lambda(\mathbf{x})\) is the maximum probability of the observed sample, computed over parameters in the null hypothesis. The denominator, however, is the maximum probability of the observed sample over all possible parameters. The ratio is small if there are parameter points in the alternative hypothesis for which the observed sample is much more likely than for any parameter point in the null hypothesis. In this situation, the LRT criterion says \(H_0\) should be rejected and \(H_1\) accepted as true.

  • The LRT can be viewed as carrying out maximizations over both the entire parameter space and a subset of it. Suppose the MLE \(\hat{\theta}\) of \(\theta\) exists; \(\hat{\theta}\) is obtained by an unrestricted maximization of \(L(\theta|\mathbf{x})\). We can also consider the MLE of \(\theta\) over the restricted parameter space \(\Theta_0\), denoted by \(\hat{\theta}_0\). Then the LRT statistic is \[\begin{equation} \lambda(\mathbf{x})=\frac{L(\hat{\theta}_0|\mathbf{x})}{L(\hat{\theta}|\mathbf{x})} \tag{15.2} \end{equation}\]
Example 15.1 (Normal LRT) Let \(X_1,\cdots,X_n\) be a random sample from a \(N(\theta,1)\) population. Consider testing \(H_0:\theta=\theta_0\) versus \(H_1:\theta\neq\theta_0\). Here \(\theta_0\) is a number fixed by the experimenter prior to the experiment. Since there is only one value of \(\theta\) specified by \(H_0\), the numerator of \(\lambda(\mathbf{x})\) is \(L(\theta_0|\mathbf{x})\). The unrestricted MLE of \(\theta\) is \(\bar{X}\). Thus, the denominator of \(\lambda(\mathbf{x})\) is \(L(\bar{x}|\mathbf{x})\). So the LRT statistic is \[\begin{equation} \begin{split} \lambda(\mathbf{x})&=\frac{(2\pi)^{-n/2}\exp[-\sum_{i=1}^n(x_i-\theta_0)^2/2]}{(2\pi)^{-n/2}\exp[-\sum_{i=1}^n(x_i-\bar{x})^2/2]}\\ &=\exp\left[\left(-\sum_{i=1}^n(x_i-\theta_0)^2+\sum_{i=1}^n(x_i-\bar{x})^2\right)/2\right] \end{split} \tag{15.3} \end{equation}\] The expression for \(\lambda(\mathbf{x})\) can be simplified by noting that \[\begin{equation} \sum_{i=1}^n(x_i-\theta_0)^2=\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\theta_0)^2 \tag{15.4} \end{equation}\] Thus, the LRT statistic is \[\begin{equation} \lambda(\mathbf{x})=\exp\left[-\frac{n(\bar{x}-\theta_0)^2}{2}\right] \tag{15.5} \end{equation}\] An LRT is a test that rejects \(H_0\) for small values of \(\lambda(\mathbf{x})\). From (15.5), the rejection region, \(\{\mathbf{x}:\lambda(\mathbf{x})\leq c\}\), can be written as \[\begin{equation} \left\{\mathbf{x}:|\bar{x}-\theta_0|\geq\sqrt{\frac{-2\log(c)}{n}}\right\} \tag{15.6} \end{equation}\] As \(c\) ranges between 0 and 1, \(\sqrt{\frac{-2\log(c)}{n}}\) ranges between 0 and \(\infty\). Thus, the LRTs are just those tests that reject \(H_0:\theta=\theta_0\) if the sample mean differs from the hypothesized value \(\theta_0\) by more than a specified amount.
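The computation in (15.3)–(15.6) is easy to reproduce numerically. Below is a minimal sketch in Python (assuming NumPy is available; the sample, seed, and cutoff \(c\) are illustrative choices, not part of the example) that evaluates \(\lambda(\mathbf{x})\) both as a ratio of maximized likelihoods and through the simplified form (15.5).

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 0.0                                   # value specified by H0
x = rng.normal(loc=0.5, scale=1.0, size=25)    # illustrative sample (true theta = 0.5)
n, xbar = x.size, x.mean()

def log_lik(theta):
    """Log-likelihood of a N(theta, 1) sample."""
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((x - theta) ** 2)

# Ratio of maximized likelihoods (15.3): restricted max at theta0, unrestricted at xbar
lam = np.exp(log_lik(theta0) - log_lik(xbar))

# The simplified form (15.5) agrees
assert np.isclose(lam, np.exp(-n * (xbar - theta0) ** 2 / 2))

# Rejection region (15.6) for an illustrative cutoff c
c = 0.1
reject = abs(xbar - theta0) >= np.sqrt(-2 * np.log(c) / n)
print(f"lambda = {lam:.4f}, reject H0: {reject}")
```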
Always try to simplify the rejection region to an expression involving a simpler statistic.

Example 15.2 (Exponential LRT) Let \(X_1,\cdots,X_n\) be a random sample from an exponential population with p.d.f. \[\begin{equation} f(x|\theta)=\left\{\begin{aligned} &e^{-(x-\theta)} &\quad x\geq\theta\\ &0 &\quad x<\theta \end{aligned}\right. \tag{15.7} \end{equation}\]

where \(-\infty<\theta<\infty\). The likelihood function is \[\begin{equation} L(\theta|\mathbf{x})=\left\{\begin{aligned} &e^{-\sum_{i=1}^nx_i+n\theta} &\quad \theta\leq\min_ix_i\\ &0 &\quad \theta>\min_ix_i \end{aligned}\right. \tag{15.8} \end{equation}\]

Consider testing \(H_0:\theta\leq\theta_0\) versus \(H_1:\theta>\theta_0\), where \(\theta_0\) is a value specified by the experimenter. Clearly \(L(\theta|\mathbf{x})\) is an increasing function of \(\theta\) on \(-\infty<\theta\leq\min_ix_i\). Thus, the denominator of \(\lambda(\mathbf{x})\), the unrestricted maximum of \(L(\theta|\mathbf{x})\), is \[\begin{equation} L(\min_{i}x_i|\mathbf{x})=e^{-\sum_{i=1}^nx_i+n\min_ix_i} \tag{15.9} \end{equation}\] If \(\min_ix_i\leq\theta_0\), the numerator of \(\lambda(\mathbf{x})\) is also \(L(\min_{i}x_i|\mathbf{x})\). If \(\theta_0<\min_ix_i\), the numerator of \(\lambda(\mathbf{x})\) is \(L(\theta_0|\mathbf{x})\). Therefore, the likelihood ratio test statistic is \[\begin{equation} \lambda(\mathbf{x})=\left\{\begin{aligned} &1 &\quad \min_ix_i\leq\theta_0\\ &e^{-n(\min_ix_i-\theta_0)} & \quad \min_ix_i>\theta_0 \end{aligned}\right. \tag{15.10} \end{equation}\] Note that \(\lambda(\mathbf{x})\) decreases exponentially as \(\min_ix_i\) moves away from \(\theta_0\). An LRT, a test that rejects \(H_0\) if \(\lambda(\mathbf{X})\leq c\), is a test with rejection region \(\{\mathbf{x}:\min_ix_i\geq\theta_0-\frac{\log(c)}{n}\}\).
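A quick numerical check of (15.10) and of the equivalent rejection rule in terms of \(\min_ix_i\) is sketched below in Python (NumPy assumed; the sample, seed, and cutoff are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
theta0 = 1.0                                        # boundary value specified by H0
x = theta0 + rng.exponential(scale=1.0, size=20)    # illustrative shifted-exponential sample
n, xmin = x.size, x.min()

# LRT statistic (15.10)
lam = 1.0 if xmin <= theta0 else np.exp(-n * (xmin - theta0))

# The rule lambda(x) <= c is the same as min_i x_i >= theta0 - log(c)/n for 0 < c < 1
c = 0.05
assert (lam <= c) == (xmin >= theta0 - np.log(c) / n)
print(f"lambda = {lam:.4g}, reject H0: {lam <= c}")
```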

If \(T(\mathbf{X})\) is a sufficient statistic for \(\theta\) with p.d.f. or p.m.f. \(g(t|\theta)\), then we might consider constructing an LRT based on \(T\) and its likelihood function \(L^*(\theta|t)=g(t|\theta)\) rather than on the sample \(\mathbf{X}\) and its likelihood function \(L(\theta|\mathbf{x})\). Let \(\lambda^*(t)\) denote the likelihood ratio test statistic based on \(T\). We shall prove that these two tests are equivalent.

The intuition behind this idea is that all the information about \(\theta\) in \(\mathbf{x}\) is contained in \(T(\mathbf{x})\).
Theorem 15.1 If \(T(\mathbf{X})\) is a sufficient statistic for \(\theta\) and \(\lambda^*(t)\) and \(\lambda(\mathbf{x})\) are the LRT statistics based on \(T\) and \(\mathbf{X}\), respectively, then \(\lambda^*(T(\mathbf{x}))=\lambda(\mathbf{x})\) for every \(\mathbf{x}\) in the sample space.
Proof. From the Factorization Theorem, the p.d.f. or p.m.f. of \(\mathbf{X}\) can be written as \(f(\mathbf{x}|\theta)=g(T(\mathbf{x})|\theta)h(\mathbf{x})\), where \(g(t|\theta)\) is the p.d.f. or p.m.f. of \(T\) and \(h(\mathbf{x})\) does not depend on \(\theta\). Thus, \[\begin{equation} \begin{split} \lambda(\mathbf{x})&=\frac{\sup_{\Theta_0}L(\theta|\mathbf{x})}{\sup_{\Theta}L(\theta|\mathbf{x})}= \frac{\sup_{\Theta_0}f(\mathbf{x}|\theta)}{\sup_{\Theta}f(\mathbf{x}|\theta)}\\ &=\frac{\sup_{\Theta_0}g(T(\mathbf{x})|\theta)h(\mathbf{x})}{\sup_{\Theta}g(T(\mathbf{x})|\theta)h(\mathbf{x})}\\ &=\frac{\sup_{\Theta_0}g(T(\mathbf{x})|\theta)}{\sup_{\Theta}g(T(\mathbf{x})|\theta)} =\frac{\sup_{\Theta_0}L^*(\theta|T(\mathbf{x}))}{\sup_{\Theta}L^*(\theta|T(\mathbf{x}))}=\lambda^*(T(\mathbf{x})) \end{split} \tag{15.11} \end{equation}\]
The simplified expression for \(\lambda(\mathbf{x})\) should depend on \(\mathbf{x}\) only through \(T(\mathbf{x})\) if \(T(\mathbf{X})\) is a sufficient statistic for \(\theta\).

Example 15.3 (LRT and Sufficiency) In Example 15.1, \(\bar{X}\) is a sufficient statistic for \(\theta\) and \(\bar{X}\sim N(\theta,\frac{1}{n})\). Thus, using the likelihood function associated with \(\bar{X}\), the likelihood ratio test of \(H_0:\theta=\theta_0\) versus \(H_1:\theta\neq\theta_0\) rejects \(H_0\) for large values of \(|\bar{X}-\theta_0|\).

In Example 15.2, \(\min_iX_i\) is a sufficient statistic for \(\theta\). The likelihood function of \(\min_iX_i\) is \[\begin{equation} L^*(\theta|\min_ix_i)=\left\{\begin{aligned} & ne^{-n(\min_ix_i-\theta)} & \quad \theta\leq \min_ix_i\\ & 0 & \quad \theta>\min_ix_i \end{aligned} \right. \tag{15.12} \end{equation}\] Using (15.12), it is easy to see that a likelihood ratio test of \(H_0:\theta\leq\theta_0\) versus \(H_1:\theta>\theta_0\) rejects \(H_0\) for large values of \(\min_iX_i\).
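Theorem 15.1 can be verified numerically in this example. The sketch below (Python with NumPy; the sample and seed are illustrative) computes \(\lambda(\mathbf{x})\) from (15.10) and \(\lambda^*(\min_ix_i)\) from the likelihood (15.12) of the sufficient statistic, and confirms that they agree.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n = 1.0, 20
x = theta0 + rng.exponential(scale=1.0, size=n)    # illustrative sample
t = x.min()                                        # sufficient statistic

# lambda(x) from (15.10), based on the full-sample likelihood
lam_full = 1.0 if t <= theta0 else np.exp(-n * (t - theta0))

def lstar(theta):
    """Likelihood (15.12) of the sufficient statistic min_i X_i."""
    return n * np.exp(-n * (t - theta)) if theta <= t else 0.0

# L* is increasing in theta up to t, so the restricted max over {theta <= theta0}
# is attained at min(t, theta0) and the unrestricted max at t
lam_star = lstar(min(t, theta0)) / lstar(t)
assert np.isclose(lam_full, lam_star)
```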

Example 15.4 (Normal LRT with unknown variance) Suppose \(X_1,\cdots,X_n\) are a random sample from a \(N(\mu,\sigma^2)\) population, and an experimenter is interested only in inference about \(\mu\), such as testing \(H_0:\mu\leq\mu_0\) versus \(H_1:\mu>\mu_0\). Then the parameter \(\sigma^2\) is a nuisance parameter. The LRT statistic is \[\begin{equation} \begin{split} \lambda(\mathbf{x})&=\frac{\max_{\{\mu,\sigma^2:\mu\leq\mu_0,\sigma^2\geq0\}}L(\mu,\sigma^2|\mathbf{x})} {\max_{\{\mu,\sigma^2:-\infty<\mu<\infty,\sigma^2\geq0\}}L(\mu,\sigma^2|\mathbf{x})}\\ &=\frac{\max_{\{\mu,\sigma^2:\mu\leq\mu_0,\sigma^2\geq0\}}L(\mu,\sigma^2|\mathbf{x})}{L(\hat{\mu},\hat{\sigma}^2|\mathbf{x})} \end{split} \tag{15.13} \end{equation}\] where \(\hat{\mu}\) and \(\hat{\sigma}^2\) are the MLEs of \(\mu\) and \(\sigma^2\). Furthermore, if \(\hat{\mu}\leq\mu_0\), then the restricted maximum is the same as the unrestricted maximum, while if \(\hat{\mu}>\mu_0\), the restricted maximum is \(L(\mu_0,\hat{\sigma}_0^2|\mathbf{x})\), where \(\hat{\sigma}_0^2=\sum_{i=1}^n(x_i-\mu_0)^2/n\). Thus, \[\begin{equation} \lambda(\mathbf{x})=\left\{\begin{aligned} & 1 & \quad \hat{\mu}\leq\mu_0 \\ & \frac{L(\mu_0,\hat{\sigma}^2_0|\mathbf{x})}{L(\hat{\mu},\hat{\sigma}^2|\mathbf{x})} & \quad \hat{\mu}>\mu_0 \end{aligned} \right. \tag{15.14} \end{equation}\] The test based on \(\lambda(\mathbf{x})\) is equivalent to a test based on the \(t\) statistic \((\bar{x}-\mu_0)/(s/\sqrt{n})\), the one-sided \(t\) test.
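To make the \(t\)-test connection concrete, the sketch below (Python with NumPy; the data and seed are illustrative, and the closed form \(\lambda=(1+t^2/(n-1))^{-n/2}\) for \(\hat{\mu}>\mu_0\) is a standard simplification stated here without derivation) profiles \(\sigma^2\) out of the likelihood and compares the two expressions.

```python
import numpy as np

rng = np.random.default_rng(3)
mu0 = 0.0
x = rng.normal(loc=1.0, scale=2.0, size=15)   # illustrative sample
n, xbar = x.size, x.mean()

def profile_log_lik(mu):
    """Normal log-likelihood with sigma^2 replaced by its MLE for the given mu."""
    s2 = np.mean((x - mu) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * s2) + 1)

if xbar > mu0:
    # Restricted max at (mu0, sigma0_hat^2); unrestricted max at (xbar, sigma_hat^2)
    lam = np.exp(profile_log_lik(mu0) - profile_log_lik(xbar))
    # lambda is a decreasing function of the t statistic: (1 + t^2/(n-1))^(-n/2)
    t = (xbar - mu0) / (x.std(ddof=1) / np.sqrt(n))
    assert np.isclose(lam, (1 + t ** 2 / (n - 1)) ** (-n / 2))
else:
    lam = 1.0
```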

Nuisance parameters (parameters that are present in a model but are not of direct inferential interest) do not affect the LRT construction method, but they may lead to a different test.

Definition 15.5 (Bayesian Tests) A test that uses the posterior distribution to calculate the probabilities that \(H_0\) and \(H_1\) are true is referred to as a Bayesian test. The posterior probabilities \(P(\theta\in\Theta_0|\mathbf{x})=P(H_0\text{ is true}|\mathbf{x})\) and \(P(\theta\in\Theta_0^c|\mathbf{x})=P(H_1\text{ is true}|\mathbf{x})\) can be computed. The tester may choose to accept \(H_0\) as true if \(P(\theta\in\Theta_0|\mathbf{x})\geq P(\theta\in\Theta_0^c|\mathbf{x})\). In this case, the test statistic is \(P(\theta\in\Theta_0^c|\mathbf{x})\) and the rejection region is \(\{\mathbf{x}:P(\theta\in\Theta_0^c|\mathbf{x})>\frac{1}{2}\}\). Alternatively, if the tester wishes to guard against falsely rejecting \(H_0\), he may decide to reject \(H_0\) only if \(P(\theta\in\Theta_0^c|\mathbf{X})\) is greater than some large number, 0.99 for example.

Example 15.5 (Normal Bayesian Test) Let \(X_1,\cdots,X_n\) be i.i.d. \(N(\theta,\sigma^2)\) and let the prior distribution on \(\theta\) be \(N(\mu,\tau^2)\), where \(\sigma^2\), \(\mu\) and \(\tau^2\) are known. Consider testing \(H_0: \theta\leq\theta_0\) versus \(H_1: \theta>\theta_0\). The posterior \(\pi(\theta|\bar{x})\) is normal with mean \((n\tau^2\bar{x}+\sigma^2\mu)/(n\tau^2+\sigma^2)\) and variance \(\sigma^2\tau^2/(n\tau^2+\sigma^2)\).

If we decide to accept \(H_0\) if and only if \(P(\theta\in\Theta_0|\mathbf{X})\geq P(\theta\in\Theta_0^c|\mathbf{X})\), then we will accept \(H_0\) if and only if \[\begin{equation} \frac{1}{2}\leq P(\theta\in\Theta_0|\mathbf{X})=P(\theta\leq\theta_0|\mathbf{X}) \tag{15.15} \end{equation}\] Since \(\pi(\theta|\mathbf{x})\) is symmetric, this is true if and only if the mean of \(\pi(\theta|\mathbf{x})\) is less than or equal to \(\theta_0\), that is, if and only if \((n\tau^2\bar{x}+\sigma^2\mu)/(n\tau^2+\sigma^2)\leq\theta_0\). Solving this inequality for \(\bar{x}\) shows that \(H_0\) will be accepted as true if \[\begin{equation} \bar{X}\leq \theta_0+\frac{\sigma^2(\theta_0-\mu)}{n\tau^2} \tag{15.16} \end{equation}\] and \(H_1\) will be accepted as true otherwise. In particular, if \(\mu=\theta_0\), so that prior to experimentation probability \(\frac{1}{2}\) is assigned to both \(H_0\) and \(H_1\), then \(H_0\) will be accepted as true if \(\bar{x}\leq \theta_0\) and \(H_1\) accepted otherwise.
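The decision rule is straightforward to compute. A minimal sketch in Python (assuming NumPy and SciPy; the prior parameters, data, and seed are illustrative choices) evaluates \(P(\theta\leq\theta_0|\mathbf{x})\) from the normal posterior and checks that the accept-\(H_0\) rule matches the threshold in (15.16).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
sigma2, mu, tau2, theta0 = 1.0, 0.0, 4.0, 0.0   # illustrative known values
x = rng.normal(loc=0.3, scale=np.sqrt(sigma2), size=30)
n, xbar = x.size, x.mean()

# Posterior mean and variance as given in Example 15.5
post_mean = (n * tau2 * xbar + sigma2 * mu) / (n * tau2 + sigma2)
post_var = sigma2 * tau2 / (n * tau2 + sigma2)

# P(theta <= theta0 | x); accept H0 when it is at least 1/2
p_h0 = norm.cdf(theta0, loc=post_mean, scale=np.sqrt(post_var))
accept_h0 = p_h0 >= 0.5

# Equivalent threshold rule (15.16)
assert accept_h0 == (xbar <= theta0 + sigma2 * (theta0 - mu) / (n * tau2))
print(f"P(H0 | x) = {p_h0:.4f}, accept H0: {accept_h0}")
```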
Tests for complicated null hypotheses can be developed from tests for simpler null hypotheses. The two related methods are Union-Intersection and Intersection-Union tests.
Definition 15.6 (Union-Intersection Method) The union-intersection method of test construction might be useful when the null hypothesis is conveniently expressed as an intersection, \[\begin{equation} H_0: \theta\in\bigcap_{\gamma\in\Gamma}\Theta_{\gamma} \tag{15.17} \end{equation}\] Here \(\Gamma\) is an arbitrary index set that may be finite or infinite, depending on the problem. Suppose that tests are available for each of the problems of testing \(H_{0_{\gamma}}:\theta\in\Theta_{\gamma}\) versus \(H_{1_{\gamma}}:\theta\in\Theta_{\gamma}^c\). Say the rejection region for the test of \(H_{0_{\gamma}}\) is \(\{\mathbf{x}:T_{\gamma}(\mathbf{x})\in R_{\gamma}\}\). Then the rejection region for the union-intersection test is \[\begin{equation} \bigcup_{\gamma\in\Gamma}\{\mathbf{x}:T_{\gamma}(\mathbf{x})\in R_{\gamma}\} \tag{15.18} \end{equation}\]
  • The rationale behind this construction is that if any one of the hypotheses \(H_{0_{\gamma}}\) is rejected, then \(H_0\) must also be rejected, because, by formulation, \(H_0\) is true only if \(H_{0_{\gamma}}\) is true for every \(\gamma\).

  • This method is useful when the rejection region of the union-intersection test has a simple expression. For example, suppose each of the individual tests has a rejection region of the form \(\{\mathbf{x}:T_{\gamma}(\mathbf{x})>c\}\), where \(c\) does not depend on \(\gamma\). The rejection region for the union-intersection test can then be expressed as \[\begin{equation} \bigcup_{\gamma\in\Gamma}\{\mathbf{x}:T_{\gamma}(\mathbf{x})>c\}=\{\mathbf{x}:\sup_{\gamma\in\Gamma}T_{\gamma}(\mathbf{x})>c\} \tag{15.19} \end{equation}\] Thus, the test statistic for testing \(H_0\) is \(T(\mathbf{x})=\sup_{\gamma\in\Gamma}T_{\gamma}(\mathbf{x})\).

Example 15.6 (Normal Union-Intersection Test) Let \(X_1,\cdots,X_n\) be a random sample from \(N(\mu,\sigma^2)\) population. Consider testing \(H_0:\mu=\mu_0\) versus \(H_1:\mu\neq\mu_0\), where \(\mu_0\) is a specified number. We can write \(H_0\) as the intersection of two sets, \[\begin{equation} H_0: \{\mu:\mu\leq\mu_0\}\cap\{\mu:\mu\geq\mu_0\} \tag{15.20} \end{equation}\]

The LRT of \(H_{0L}:\mu\leq\mu_0\) versus \(H_{1L}:\mu>\mu_0\) rejects \(H_{0L}\) in favor of \(H_{1L}\) if \(\frac{\bar{X}-\mu_0}{S/\sqrt{n}}\geq t_L\). Similarly, the LRT of \(H_{0U}:\mu\geq\mu_0\) versus \(H_{1U}:\mu<\mu_0\) rejects \(H_{0U}\) in favor of \(H_{1U}\) if \(\frac{\bar{X}-\mu_0}{S/\sqrt{n}}\leq t_U\). Thus, the union-intersection test of \(H_0:\mu=\mu_0\) versus \(H_1:\mu\neq\mu_0\) formed from these two LRTs rejects \(H_0\) if \(\frac{\bar{X}-\mu_0}{S/\sqrt{n}}\geq t_L\) or \(\frac{\bar{X}-\mu_0}{S/\sqrt{n}}\leq t_U\). If \(t_L=-t_U\geq 0\), the union-intersection test can be expressed more simply as: reject \(H_0\) if \[\begin{equation} \frac{|\bar{X}-\mu_0|}{S/\sqrt{n}}\geq t_L \tag{15.21} \end{equation}\]

This is also the LRT for this problem and is called the two-sided \(t\) test.
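The union-intersection construction can be traced step by step in code. The sketch below (Python with NumPy; the sample, seed, and cutoff \(t_L\) are illustrative) forms the two one-sided \(t\) rejection regions and confirms that their union is the two-sided \(t\) test (15.21).

```python
import numpy as np

rng = np.random.default_rng(5)
mu0, t_L = 0.0, 2.0                           # hypothesized mean; illustrative cutoff (t_L = -t_U)
x = rng.normal(loc=0.4, scale=1.0, size=20)   # illustrative sample
n = x.size

T = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))

# Union of the two one-sided rejection regions: T >= t_L or T <= t_U = -t_L ...
reject_union = (T >= t_L) or (T <= -t_L)

# ... which is exactly the two-sided t test (15.21)
assert reject_union == (abs(T) >= t_L)
print(f"T = {T:.3f}, reject H0: {reject_union}")
```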

Definition 15.7 (Intersection-Union Method) The intersection-union method may be useful if the null hypothesis is conveniently expressed as a union. Suppose we wish to test the null hypothesis \[\begin{equation} H_0: \theta\in\bigcup_{\gamma\in\Gamma}\Theta_{\gamma} \tag{15.22} \end{equation}\] Suppose that for each \(\gamma\in\Gamma\), \(\{\mathbf{x}:T_{\gamma}(\mathbf{x})\in R_{\gamma}\}\) is the rejection region for a test of \(H_{0_{\gamma}}:\theta\in\Theta_{\gamma}\) versus \(H_{1_{\gamma}}:\theta\in\Theta_{\gamma}^c\). Then the rejection region for the intersection-union test of \(H_0\) versus \(H_1\) is \[\begin{equation} \bigcap_{\gamma\in\Gamma}\{\mathbf{x}:T_{\gamma}(\mathbf{x})\in R_{\gamma}\} \tag{15.23} \end{equation}\]

  • The rationale behind this construction is that \(H_0\) is false if and only if all of the \(H_{0_{\gamma}}\) are false, so \(H_0\) can be rejected if and only if each of the individual hypotheses \(H_{0_{\gamma}}\) can be rejected.

  • This method is useful when the rejection regions for the individual hypotheses are all of the form \(\{\mathbf{x}:T_{\gamma}(\mathbf{x})\geq c\}\) where \(c\) does not depend on \(\gamma\). The rejection region is then \[\begin{equation} \bigcap_{\gamma\in\Gamma}\{\mathbf{x}:T_{\gamma}(\mathbf{x})\geq c\}=\{\mathbf{x}:\inf_{\gamma\in\Gamma}T_{\gamma}(\mathbf{x})\geq c\} \tag{15.24} \end{equation}\] Thus, the test statistic for testing \(H_0\) is \(T(\mathbf{x})=\inf_{\gamma\in\Gamma}T_{\gamma}(\mathbf{x})\), and the test rejects \(H_0\) for large values of this statistic.

Example 15.7 (Acceptance Sampling) Two parameters that are important in assessing the quality of upholstery fabric are \(\theta_1\), the mean breaking strength, and \(\theta_2\), the probability of passing a flammability test. Standards may dictate that \(\theta_1\) should be over 50 pounds and \(\theta_2\) should be over 0.95, and the fabric is acceptable only if it meets both of these standards. This can be modeled with \(H_0:\{\theta_1\leq 50\text{ or }\theta_2\leq0.95\}\) versus \(H_1:\{\theta_1>50\text{ and }\theta_2>0.95\}\), where a batch of material is acceptable only if \(H_1\) is accepted.

Suppose \(X_1,\cdots,X_n\) are measurements of breaking strength for \(n\) samples and are assumed to be i.i.d. \(N(\theta_1,\sigma^2)\). The LRT rejects \(H_{01}:\theta_1\leq 50\) if \(\frac{\bar{X}-50}{S/\sqrt{n}}>t\). Suppose that we also have the results of \(m\) flammability tests, denoted by \(Y_1,\cdots,Y_m\), where \(Y_i=1\) if the \(i\)th sample passes the test and \(Y_i=0\) otherwise. If \(Y_1,\cdots,Y_m\) are modeled as i.i.d. \(Bernoulli(\theta_2)\) random variables, the LRT rejects \(H_{02}:\theta_2\leq 0.95\) if \(\sum_{i=1}^mY_i>b\). Putting all of this together, the rejection region for the intersection-union test is \[\begin{equation} \{(\mathbf{x},\mathbf{y}):\frac{\bar{x}-50}{s/\sqrt{n}}>t\text{ and }\sum_{i=1}^my_i>b\} \tag{15.25} \end{equation}\]

Thus the intersection-union test decides the product is acceptable, that is, \(H_1\) is true, if and only if it decides that each of the individual parameters meets its standard, that is, \(H_{1i}\) is true. If more than two parameters define a product’s quality, individual tests for each parameter can be combined, by means of the intersection-union method, to yield an overall test of the product’s quality.
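A sketch of the combined decision rule follows (Python with NumPy; the sample sizes, data, seed, and the cutoffs standing in for \(t\) and \(b\) are illustrative placeholders). The batch is declared acceptable only when both component tests reject their nulls, as in (15.25).

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 25, 40                      # illustrative sample sizes
t_cut, b_cut = 2.0, 39             # illustrative cutoffs standing in for t and b

x = rng.normal(loc=53.0, scale=4.0, size=n)   # breaking-strength measurements
y = rng.binomial(1, 0.98, size=m)             # flammability pass (1) / fail (0) indicators

t_stat = (x.mean() - 50) / (x.std(ddof=1) / np.sqrt(n))

# Intersection-union rejection region (15.25): both component tests must reject
batch_acceptable = (t_stat > t_cut) and (y.sum() > b_cut)
print(f"t = {t_stat:.3f}, passes = {y.sum()}, batch acceptable: {batch_acceptable}")
```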