Chapter 15 Methods of Finding Tests (Lecture on 02/13/2020)

Definition 15.1 (Hypothesis) A hypothesis is a statement about a population parameter.

  • A hypothesis makes a statement about the population.

  • The goal of a hypothesis test is to decide, based on a sample from the population, which of two complementary hypotheses is true.
Definition 15.2 The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. They are denoted by H_0 and H_1, respectively.
If \theta denotes a population parameter, the general format of the null and alternative hypotheses is H_0:\theta\in\Theta_0 and H_1:\theta\in\Theta_0^c, where \Theta_0 is some subset of the parameter space and \Theta_0^c is its complement.

Definition 15.3 (Hypothesis Testing) A hypothesis testing procedure or hypothesis test is a rule that specifies:

  1. For which sample values the decision is made to accept H_0 as true.

  2. For which sample values H_0 is rejected and H_1 is accepted as true.

The subset of the sample space for which H_0 will be rejected is called the rejection region or critical region. The complement of the rejection region is called the acceptance region.
  • A hypothesis testing problem is a problem in which one of two actions is going to be taken: the assertion of H_0 or the assertion of H_1.

  • Typically, a hypothesis test is specified in terms of a test statistic W(X_1,\cdots,X_n)=W(\mathbf{X}), a function of the sample.
Definition 15.4 (Likelihood Ratio Test Statistic) The likelihood ratio test statistic for testing H_0:\theta\in\Theta_0 versus H_1:\theta\in\Theta_0^c is \begin{equation} \lambda(\mathbf{x})=\frac{\sup_{\Theta_0}L(\theta|\mathbf{x})}{\sup_{\Theta}L(\theta|\mathbf{x})} \tag{15.1} \end{equation} A likelihood ratio test (LRT) is any test that has a rejection region of the form \{\mathbf{x}:\lambda(\mathbf{x})\leq c\}, where c is any number satisfying 0\leq c\leq1.
  • The rationale behind LRTs is as follows. The numerator of \lambda(\mathbf{x}) is the maximum probability of the observed sample, computed over parameters in the null hypothesis. The denominator, however, is the maximum probability of the observed sample, computed over all possible parameters. This ratio is small if there are parameter points in the alternative hypothesis for which the observed sample is much more likely than for any parameter point in the null hypothesis. In this situation, the LRT criterion says H_0 should be rejected and H_1 accepted as true.

  • An LRT can be viewed as comparing maximizations over the entire parameter space and over a subset of it. Suppose the MLE \hat{\theta} of \theta exists; \hat{\theta} is obtained by an unrestricted maximization of L(\theta|\mathbf{x}). We can also consider the MLE of \theta over the restricted parameter space \Theta_0, denoted \hat{\theta}_0. Then the LRT statistic is \begin{equation} \lambda(\mathbf{x})=\frac{L(\hat{\theta}_0|\mathbf{x})}{L(\hat{\theta}|\mathbf{x})} \tag{15.2} \end{equation} A numerical sketch of this recipe is given below.
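As a concrete illustration of (15.2), the following Python sketch computes \lambda(\mathbf{x}) by numerically maximizing a log-likelihood over the full parameter space and over a restricted set \Theta_0. The N(\theta,1) model, the choice \Theta_0=\{\theta\leq\theta_0\}, and names such as `lrt_statistic` are illustrative assumptions, not part of the lecture.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical illustration of (15.2): compute lambda(x) by maximizing the
# log-likelihood over the full parameter space and over the restricted set
# Theta_0 = {theta <= theta0}.  Model assumed here: X_i ~ N(theta, 1).

def negloglik(theta, x):
    # negative log-likelihood of an N(theta, 1) sample (additive constants dropped)
    return 0.5 * np.sum((x - theta) ** 2)

def lrt_statistic(x, theta0):
    # unrestricted maximization; for N(theta, 1) the maximizer is xbar,
    # but a numerical optimizer mirrors the general recipe in (15.2)
    unres = minimize_scalar(negloglik, args=(x,))
    # restricted maximization over Theta_0; a wide finite lower bound
    # stands in for -infinity
    res = minimize_scalar(negloglik, args=(x,), method="bounded",
                          bounds=(theta0 - 50.0, theta0))
    # lambda(x) = sup_{Theta_0} L / sup_{Theta} L, computed on the log scale
    return np.exp(unres.fun - res.fun)

rng = np.random.default_rng(0)
x = rng.normal(loc=0.4, scale=1.0, size=25)
print(lrt_statistic(x, theta0=0.0))  # small values are evidence against H_0
```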
Example 15.1 (Normal LRT) Let X_1,\cdots,X_n be a random sample from a N(\theta,1) population. Consider testing H_0:\theta=\theta_0 versus H_1:\theta\neq\theta_0. Here \theta_0 is a number fixed by the experimenter prior to the experiment. Since there is only one value of \theta specified by H_0, the numerator of \lambda(\mathbf{x}) is L(\theta_0|\mathbf{x}). The unrestricted MLE of \theta is \bar{X}. Thus, the denominator of \lambda(\mathbf{x}) is L(\bar{x}|\mathbf{x}). So the LRT statistic is \begin{equation} \begin{split} \lambda(\mathbf{x})&=\frac{(2\pi)^{-n/2}\exp[-\sum_{i=1}^n(x_i-\theta_0)^2/2]}{(2\pi)^{-n/2}\exp[-\sum_{i=1}^n(x_i-\bar{x})^2/2]}\\ &=\exp[(-\sum_{i=1}^n(x_i-\theta_0)^2+\sum_{i=1}^n(x_i-\bar{x})^2)/2] \end{split} \tag{15.3} \end{equation} The expression for \lambda(\mathbf{x}) can be simplified by noting that \begin{equation} \sum_{i=1}^n(x_i-\theta_0)^2=\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\theta_0)^2 \tag{15.4} \end{equation} Thus, the LRT statistic is \begin{equation} \lambda(\mathbf{x})=\exp[-\frac{n(\bar{x}-\theta_0)^2}{2}] \tag{15.5} \end{equation} An LRT is a test that rejects H_0 for small values of \lambda(\mathbf{x}). From (15.5), the rejection region, \{\mathbf{x}:\lambda(\mathbf{x})\leq c\}, can be written as \begin{equation} \{\mathbf{x}:|\bar{x}-\theta_0|\geq\sqrt{\frac{-2\log(c)}{n}}\} \tag{15.6} \end{equation} As c ranges between 0 and 1, \sqrt{\frac{-2\log(c)}{n}} ranges between 0 and \infty. Thus, the LRTs are just those tests that reject H_0:\theta=\theta_0 if the sample mean differs from the hypothesized value \theta_0 by more than a specified amount.
Always try to simplify the rejection region to an expression involving a simpler statistic.
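A minimal Python sketch of Example 15.1, assuming an arbitrary cutoff c, evaluates (15.5) and checks the equivalent rejection rule (15.6); the function names are illustrative.

```python
import numpy as np

# Sketch of Example 15.1: the LRT statistic (15.5) and the equivalent
# rejection rule (15.6) for H_0: theta = theta0 under an N(theta, 1) model.
# The cutoff c below is an arbitrary illustrative choice.

def normal_lrt(x, theta0):
    n, xbar = len(x), np.mean(x)
    return np.exp(-n * (xbar - theta0) ** 2 / 2.0)

def lrt_rejects(x, theta0, c=0.1):
    n, xbar = len(x), np.mean(x)
    # lambda(x) <= c   <=>   |xbar - theta0| >= sqrt(-2 log(c) / n)
    return abs(xbar - theta0) >= np.sqrt(-2.0 * np.log(c) / n)

rng = np.random.default_rng(1)
x = rng.normal(loc=0.5, scale=1.0, size=20)
print(normal_lrt(x, theta0=0.0), lrt_rejects(x, theta0=0.0))
```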

Example 15.2 (Exponential LRT) Let X_1,\cdots,X_n be a random sample from an exponential population with p.d.f. \begin{equation} f(x|\theta)=\left\{\begin{aligned} &e^{-(x-\theta)} &\quad x\geq\theta\\ &0 &\quad x<\theta \end{aligned}\right. \tag{15.7} \end{equation}

where -\infty<\theta<\infty. The likelihood function is \begin{equation} L(\theta|\mathbf{x})=\left\{\begin{aligned} &e^{-\sum_{i=1}^nx_i+n\theta} &\quad \theta\leq\min_ix_i\\ &0 &\quad \theta>\min_ix_i \end{aligned}\right. \tag{15.8} \end{equation}

Consider testing H_0:\theta\leq\theta_0 versus H_1:\theta>\theta_0, where \theta_0 is a value specified by the experimenter. Clearly L(\theta|\mathbf{x}) is an increasing function of \theta on -\infty<\theta\leq\min_ix_i. Thus, the denominator of \lambda(\mathbf{x}), the unrestricted maximum of L(\theta|\mathbf{x}), is \begin{equation} L(\min_{i}x_i|\mathbf{x})=e^{-\sum_{i=1}^nx_i+n\min_ix_i} \tag{15.9} \end{equation} If \min_ix_i\leq\theta_0, the numerator of \lambda(\mathbf{x}) is also L(\min_{i}x_i|\mathbf{x}). If \theta_0<\min_ix_i, the numerator of \lambda(\mathbf{x}) is L(\theta_0|\mathbf{x}). Therefore, the likelihood ratio test statistic is \begin{equation} \lambda(\mathbf{x})=\left\{\begin{aligned} &1 &\quad \min_ix_i\leq\theta_0\\ &e^{-n(\min_ix_i-\theta_0)} & \quad \min_ix_i>\theta_0 \end{aligned}\right. \tag{15.10} \end{equation} The statistic \lambda(\mathbf{x}) decreases exponentially as \min_ix_i moves above \theta_0. An LRT, a test that rejects H_0 if \lambda(\mathbf{X})\leq c, is a test with rejection region \{\mathbf{x}:\min_ix_i\geq\theta_0-\frac{\log(c)}{n}\}.
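The following Python sketch of Example 15.2, again with an arbitrary illustrative cutoff c, evaluates (15.10) and the equivalent rejection rule based on \min_ix_i.

```python
import numpy as np

# Sketch of Example 15.2: the LRT statistic (15.10) for the shifted
# exponential model and the equivalent rejection rule based on min_i x_i.
# The cutoff c is an arbitrary illustrative choice.

def exponential_lrt(x, theta0):
    n, m = len(x), np.min(x)
    return 1.0 if m <= theta0 else np.exp(-n * (m - theta0))

def lrt_rejects(x, theta0, c=0.05):
    n, m = len(x), np.min(x)
    # lambda(x) <= c   <=>   min_i x_i >= theta0 - log(c)/n
    return m >= theta0 - np.log(c) / n

rng = np.random.default_rng(2)
x = 0.3 + rng.exponential(scale=1.0, size=15)  # sample with true theta = 0.3
print(exponential_lrt(x, theta0=0.0), lrt_rejects(x, theta0=0.0))
```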

If T(\mathbf{X}) is a sufficient statistic for \theta with p.d.f. or p.m.f. g(t|\theta), then we might consider constructing an LRT based on T and its likelihood function L^*(\theta|t)=g(t|\theta) rather than on the sample \mathbf{X} and its likelihood function L(\theta|\mathbf{x}). Let \lambda^*(t) denote the likelihood ratio test statistic based on T. We shall prove that these two tests are equivalent.

The intuition behind this idea is that all the information about \theta in \mathbf{x} is contained in T(\mathbf{x}).
Theorem 15.1 If T(\mathbf{X}) is a sufficient statistic for \theta and \lambda^*(t) and \lambda(\mathbf{x}) are the LRT statistics based on T and \mathbf{X}, respectively, then \lambda^*(T(\mathbf{x}))=\lambda(\mathbf{x}) for every \mathbf{x} in the sample space.
Proof. From the Factorization Theorem, the p.d.f. or p.m.f. of \mathbf{X} can be written as f(\mathbf{x}|\theta)=g(T(\mathbf{x})|\theta)h(\mathbf{x}), where g(t|\theta) is the p.d.f. or p.m.f. of T and h(\mathbf{x}) does not depend on \theta. Thus, \begin{equation} \begin{split} \lambda(\mathbf{x})&=\frac{\sup_{\Theta_0}L(\theta|\mathbf{x})}{\sup_{\Theta}L(\theta|\mathbf{x})}= \frac{\sup_{\Theta_0}f(\mathbf{x}|\theta)}{\sup_{\Theta}f(\mathbf{x}|\theta)}\\ &=\frac{\sup_{\Theta_0}g(T(\mathbf{x})|\theta)h(\mathbf{x})}{\sup_{\Theta}g(T(\mathbf{x})|\theta)h(\mathbf{x})}\\ &=\frac{\sup_{\Theta_0}g(T(\mathbf{x})|\theta)}{\sup_{\Theta}g(T(\mathbf{x})|\theta)} =\frac{\sup_{\Theta_0}L^*(\theta|T(\mathbf{x}))}{\sup_{\Theta}L^*(\theta|T(\mathbf{x}))}=\lambda^*(T(\mathbf{x})) \end{split} \tag{15.11} \end{equation}
The simplified expression for \lambda(\mathbf{x}) should depend on \mathbf{x} only through T(\mathbf{x}) if T(\mathbf{X}) is a sufficient statistic for \theta.
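A small numerical check of Theorem 15.1 in the setting of Example 15.1 follows; the function names and the comparison are illustrative, assuming the N(\theta,1) model with \bar{X}\sim N(\theta,1/n).

```python
import numpy as np

# Numerical illustration of Theorem 15.1 in the setting of Example 15.1:
# the LRT statistic from the full sample equals the one based on the
# sufficient statistic Xbar ~ N(theta, 1/n).  Function names are illustrative.

def lam_full(x, theta0):
    # lambda(x) from (15.5)
    n, xbar = len(x), np.mean(x)
    return np.exp(-n * (xbar - theta0) ** 2 / 2.0)

def lam_sufficient(xbar, n, theta0):
    # lambda*(t) = g(t|theta0) / sup_theta g(t|theta) for t = xbar, where g is
    # the N(theta, 1/n) density; the normalizing constants cancel and the
    # supremum in the denominator is attained at theta = xbar
    return np.exp(-n * (xbar - theta0) ** 2 / 2.0)

rng = np.random.default_rng(3)
x = rng.normal(loc=0.2, scale=1.0, size=30)
print(np.isclose(lam_full(x, 0.0), lam_sufficient(np.mean(x), len(x), 0.0)))
```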

Example 15.3 (LRT and Sufficiency) In Example 15.1, \bar{X} is a sufficient statistic for \theta and \bar{X}\sim N(\theta,\frac{1}{n}). Thus, using the likelihood function associated with \bar{X}, the likelihood ratio test of H_0:\theta=\theta_0 versus H_1:\theta\neq\theta_0 rejects H_0 for large values of |\bar{X}-\theta_0|.

In Example 15.2, \min_iX_i is a sufficient statistic for \theta. The likelihood function of \min_iX_i is \begin{equation} L^*(\theta|\min_ix_i)=\left\{\begin{aligned} & ne^{-n(\min_ix_i-\theta)} & \quad \theta\leq \min_ix_i\\ & 0 & \quad \theta>\min_ix_i \end{aligned} \right. \tag{15.12} \end{equation} Using (15.12), it is easy to see that a likelihood ratio test of H_0:\theta\leq\theta_0 versus H_1:\theta>\theta_0 rejects H_0 for large values of \min_iX_i.

Example 15.4 (Normal LRT with unknown variance) Suppose X_1,\cdots,X_n are a random sample from a N(\mu,\sigma^2) population, and an experimenter is interested only in inference about \mu, such as testing H_0:\mu\leq\mu_0 versus H_1:\mu>\mu_0. Then the parameter \sigma^2 is a nuisance parameter. The LRT statistic is \begin{equation} \begin{split} \lambda(\mathbf{x})&=\frac{\max_{\{\mu,\sigma^2:\mu\leq\mu_0,\sigma^2\geq0\}}L(\mu,\sigma^2|\mathbf{x})} {\max_{\{\mu,\sigma^2:-\infty<\mu<\infty,\sigma^2\geq0\}}L(\mu,\sigma^2|\mathbf{x})}\\ &=\frac{\max_{\{\mu,\sigma^2:\mu\leq\mu_0,\sigma^2\geq0\}}L(\mu,\sigma^2|\mathbf{x})}{L(\hat{\mu},\hat{\sigma}^2|\mathbf{x})} \end{split} \tag{15.13} \end{equation} where \hat{\mu} and \hat{\sigma}^2 are the MLEs of \mu and \sigma^2. Furthermore, if \hat{\mu}\leq\mu_0, then the restricted maximum is the same as the unrestricted maximum, while if \hat{\mu}>\mu_0, the restricted maximum is L(\mu_0,\hat{\sigma}_0^2|\mathbf{x}), where \hat{\sigma}_0^2=\sum_{i=1}^n(x_i-\mu_0)^2/n. Thus, \begin{equation} \lambda(\mathbf{x})=\left\{\begin{aligned} & 1 & \quad \hat{\mu}\leq\mu_0 \\ & \frac{L(\mu_0,\hat{\sigma}^2_0|\mathbf{x})}{L(\hat{\mu},\hat{\sigma}^2|\mathbf{x})} & \quad \hat{\mu}>\mu_0 \end{aligned} \right. \tag{15.14} \end{equation} The test based on \lambda(\mathbf{x}) is equivalent to a test based on the t distribution.
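A Python sketch of Example 15.4 follows, computing \lambda(\mathbf{x}) from (15.14) by plugging the restricted and unrestricted MLEs into the normal likelihood and comparing it with the one-sided t statistic; the function names are illustrative, and the equivalence is that when \bar{x}>\mu_0, \lambda(\mathbf{x}) is a decreasing function of the t statistic.

```python
import numpy as np

# Sketch of Example 15.4: lambda(x) from (15.14), obtained by plugging the
# restricted and unrestricted MLEs into the normal likelihood, alongside the
# one-sided t statistic.  When xbar > mu0, lambda(x) decreases as t increases.

def loglik(mu, sigma2, x):
    n = len(x)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

def lrt_statistic(x, mu0):
    n, xbar = len(x), np.mean(x)
    sig2_hat = np.sum((x - xbar) ** 2) / n       # unrestricted MLE of sigma^2
    if xbar <= mu0:                              # restricted max equals unrestricted max
        return 1.0
    sig2_0 = np.sum((x - mu0) ** 2) / n          # restricted MLE of sigma^2 when mu = mu0
    return np.exp(loglik(mu0, sig2_0, x) - loglik(xbar, sig2_hat, x))

def t_statistic(x, mu0):
    n, xbar, s = len(x), np.mean(x), np.std(x, ddof=1)
    return (xbar - mu0) / (s / np.sqrt(n))

rng = np.random.default_rng(4)
x = rng.normal(loc=1.0, scale=2.0, size=12)
print(lrt_statistic(x, mu0=0.0), t_statistic(x, mu0=0.0))
```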

Nuisance parameters (parameters that are present in the model but are not of direct inferential interest) do not change the LRT construction method, but their presence may lead to a different test.

Definition 15.5 (Bayesian Tests) When the posterior distribution is used to calculate the probabilities that H_0 and H_1 are true, the procedure is referred to as a Bayesian test. The posterior probabilities P(\theta\in\Theta_0|\mathbf{x})=P(H_0\,is\,True|\mathbf{x}) and P(\theta\in\Theta_0^c|\mathbf{x})=P(H_1\,is\,True|\mathbf{x}) can be computed. The tester may choose to accept H_0 as true if P(\theta\in\Theta_0|\mathbf{x})\geq P(\theta\in\Theta_0^c|\mathbf{x}). In this case, the test statistic is P(\theta\in\Theta_0^c|\mathbf{x}) and the rejection region is \{\mathbf{x}:P(\theta\in\Theta_0^c|\mathbf{x})>\frac{1}{2}\}. Alternatively, if the tester wishes to guard against falsely rejecting H_0, he may decide to reject H_0 only if P(\theta\in\Theta_0^c|\mathbf{X}) is greater than some large number, 0.99 for example.

Example 15.5 (Normal Bayesian Test) Let X_1,\cdots,X_n be i.i.d. N(\theta,\sigma^2) and let the prior distribution on \theta be N(\mu,\tau^2), where \sigma^2, \mu and \tau^2 are known. Consider testing H_0: \theta\leq\theta_0 versus H_1: \theta>\theta_0. The posterior \pi(\theta|\bar{x}) is normal with mean (n\tau^2\bar{x}+\sigma^2\mu)/(n\tau^2+\sigma^2) and variance \sigma^2\tau^2/(n\tau^2+\sigma^2).

If we decide to accept H_0 if and only if P(\theta\in\Theta_0|\mathbf{X})\geq P(\theta\in\Theta_0^c|\mathbf{X}), then we will accept H_0 if and only if \begin{equation} \frac{1}{2}\leq P(\theta\in\Theta_0|\mathbf{X})=P(\theta\leq\theta_0|\mathbf{X}) \tag{15.15} \end{equation} Since \pi(\theta|\mathbf{x}) is symmetric, this is true if and only if the mean of \pi(\theta|\mathbf{x}) is less than or equal to \theta_0. Therefore, H_0 will be accepted as true if \begin{equation} \bar{X}\leq \theta_0+\frac{\sigma^2(\theta_0-\mu)}{n\tau^2} \tag{15.16} \end{equation} and H_1 will be accepted as true otherwise. In particular, if \mu=\theta_0 so that prior to experimentation probability \frac{1}{2} is assigned to both H_0 and H_1, then H_0 will be accepted as true if \bar{x}\leq \theta_0 and H_1 accepted otherwise.
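A short Python sketch of Example 15.5 follows, assuming arbitrary hyperparameter values; it computes the posterior probability of H_0 and checks that the resulting decision agrees with the closed-form rule (15.16).

```python
import numpy as np
from scipy.stats import norm

# Sketch of Example 15.5: accept H_0: theta <= theta0 iff the posterior
# probability P(theta <= theta0 | x) is at least 1/2, and check that this
# agrees with the closed-form rule (15.16).  The hyperparameter values are
# arbitrary illustrative choices.

def posterior_prob_H0(xbar, n, theta0, sigma2, mu, tau2):
    # posterior of theta given xbar is normal with the mean and variance
    # stated in Example 15.5
    post_mean = (n * tau2 * xbar + sigma2 * mu) / (n * tau2 + sigma2)
    post_var = sigma2 * tau2 / (n * tau2 + sigma2)
    return norm.cdf(theta0, loc=post_mean, scale=np.sqrt(post_var))

theta0, sigma2, mu, tau2, n = 0.0, 1.0, 0.0, 1.0, 25
rng = np.random.default_rng(5)
x = rng.normal(loc=0.1, scale=np.sqrt(sigma2), size=n)
xbar = np.mean(x)

accept_by_probability = posterior_prob_H0(xbar, n, theta0, sigma2, mu, tau2) >= 0.5
accept_by_rule = xbar <= theta0 + sigma2 * (theta0 - mu) / (n * tau2)
print(accept_by_probability, accept_by_rule)  # the two decisions coincide
```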
Tests for complicated null hypotheses can be developed from tests for simpler null hypotheses. The two related methods are Union-Intersection and Intersection-Union tests.
Definition 15.6 (Union-Intersection Method) The union-intersection method of test construction might be useful when the null hypothesis is conveniently expressed as an intersection, \begin{equation} H_0: \theta\in\bigcap_{\gamma\in\Gamma}\Theta_{\gamma} \tag{15.17} \end{equation} Here \Gamma is an arbitrary index set that may be finite or infinite, depending on the problem. Suppose that tests are available for each of the problems of testing H_{0_{\gamma}}:\theta\in\Theta_{\gamma} versus H_{1_{\gamma}}:\theta\in\Theta_{\gamma}^c. Say the rejection region for the test of H_{0_{\gamma}} is \{\mathbf{x}:T_{\gamma}(\mathbf{x})\in R_{\gamma}\}. Then the rejection region for the union-intersection test is \begin{equation} \bigcup_{\gamma\in\Gamma}\{\mathbf{x}:T_{\gamma}(\mathbf{x})\in R_{\gamma}\} \tag{15.18} \end{equation}
  • The rationale behind this is that if any one of the hypotheses H_{0_{\gamma}} is rejected, then H_0 must also be rejected, because, by formulation, H_0 is true only if H_{0_{\gamma}} is true for every \gamma.

  • This method is useful when the rejection region of a union-intersection test has a simple expression. For example, suppose each of the individual tests has a rejection region of the form \{\mathbf{x}:T_{\gamma}(\mathbf{x})>c\}, where c does not depend on \gamma. The rejection region for the union-intersection test can then be expressed as \begin{equation} \bigcup_{\gamma\in\Gamma}\{\mathbf{x}:T_{\gamma}(\mathbf{x})>c\}=\{\mathbf{x}:\sup_{\gamma\in\Gamma}T_{\gamma}(\mathbf{x})>c\} \tag{15.19} \end{equation} Thus, the test statistic for testing H_0 is T(\mathbf{x})=\sup_{\gamma\in\Gamma}T_{\gamma}(\mathbf{x}).

Example 15.6 (Normal Union-Intersection Test) Let X_1,\cdots,X_n be a random sample from N(\mu,\sigma^2) population. Consider testing H_0:\mu=\mu_0 versus H_1:\mu\neq\mu_0, where \mu_0 is a specified number. We can write H_0 as the intersection of two sets, \begin{equation} H_0: \{\mu:\mu\leq\mu_0\}\cap\{\mu:\mu\geq\mu_0\} \tag{15.20} \end{equation}

The LRT of H_{0L}:\mu\leq\mu_0 versus H_{1L}:\mu>\mu_0 rejects H_{0L} in favor of H_{1L} if \frac{\bar{X}-\mu_0}{S/\sqrt{n}}\geq t_L. Similarly, the LRT of H_{0U}:\mu\geq\mu_0 versus H_{1U}:\mu<\mu_0 rejects H_{0U} in favor of H_{1U} if \frac{\bar{X}-\mu_0}{S/\sqrt{n}}\leq t_U. Thus, the union-intersection test of H_0:\mu=\mu_0 versus H_1:\mu\neq\mu_0 formed from these two LRTs rejects H_0 if \frac{\bar{X}-\mu_0}{S/\sqrt{n}}\geq t_L or \frac{\bar{X}-\mu_0}{S/\sqrt{n}}\leq t_U. If t_L=-t_U\geq 0, the union-intersection test can be more simply expressed as: reject H_0 if \begin{equation} \frac{|\bar{X}-\mu_0|}{S/\sqrt{n}}\geq t_L \tag{15.21} \end{equation}

This is also the LRT for this problem and is called the two-sided t test.
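A Python sketch of Example 15.6 follows, combining the two one-sided t tests into the two-sided rule (15.21); setting the cutoff to a t quantile is an illustrative choice, not something fixed by the example itself.

```python
import numpy as np
from scipy.stats import t as t_dist

# Sketch of Example 15.6: the union-intersection test of H_0: mu = mu0
# combines two one-sided t tests; with t_L = -t_U >= 0 it reduces to the
# two-sided t test (15.21).

def union_intersection_t_test(x, mu0, alpha=0.05):
    n, xbar, s = len(x), np.mean(x), np.std(x, ddof=1)
    t_stat = (xbar - mu0) / (s / np.sqrt(n))
    t_L = t_dist.ppf(1 - alpha / 2, df=n - 1)   # t_L = -t_U >= 0
    reject_lower = t_stat >= t_L                # rejects H_{0L}: mu <= mu0
    reject_upper = t_stat <= -t_L               # rejects H_{0U}: mu >= mu0
    # union of the two one-sided rejection regions, i.e. |t_stat| >= t_L
    return reject_lower or reject_upper

rng = np.random.default_rng(6)
x = rng.normal(loc=0.8, scale=1.0, size=15)
print(union_intersection_t_test(x, mu0=0.0))
```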

Definition 15.7 (Intersection-Union Method) The intersection-union method may be useful if the null hypothesis is conveniently expressed as a union. Suppose we wish to test the null hypothesis \begin{equation} H_0: \theta\in\bigcup_{\gamma\in\Gamma}\Theta_{\gamma} \tag{15.22} \end{equation} Suppose that for each \gamma\in\Gamma, \{\mathbf{x}:T_{\gamma}(\mathbf{x})\in R_{\gamma}\} is the rejection region for a test of H_{0_{\gamma}}:\theta\in\Theta_{\gamma} versus H_{1_{\gamma}}:\theta\in\Theta_{\gamma}^c. Then the rejection region for the intersection-union test of H_0 versus H_1 is \begin{equation} \bigcap_{\gamma\in\Gamma}\{\mathbf{x}:T_{\gamma}(\mathbf{x})\in R_{\gamma}\} \tag{15.23} \end{equation}

  • The rationale behind this is that H_0 is false if and only if all of the H_{0_{\gamma}} are false, so H_0 can be rejected if and only if each of the individual hypotheses H_{0_{\gamma}} can be rejected.

  • This method is useful when the rejection regions for the individual hypotheses are all of the form \{\mathbf{x}:T_{\gamma}(\mathbf{x})\geq c\} where c does not depend on \gamma. The rejection region is then \begin{equation} \bigcap_{\gamma\in\Gamma}\{\mathbf{x}:T_{\gamma}(\mathbf{x})\geq c\}=\{\mathbf{x}:\inf_{\gamma\in\Gamma}T_{\gamma}(\mathbf{x})\geq c\} \tag{15.24} \end{equation} Thus, the test statistic for testing H_0 is T(\mathbf{x})=\inf_{\gamma\in\Gamma}T_{\gamma}(\mathbf{x}), and the test rejects H_0 for large values of this statistic.

Example 15.7 (Acceptance Sampling) Two parameters that are important in assessing the quality of upholstery fabric are \theta_1, the mean breaking strength, and \theta_2, the probability of passing a flammability test. Standards may dictate that \theta_1 should be over 50 pounds and \theta_2 should be over 0.95, and the fabric is acceptable only if it meets both of these standards. This can be modeled with H_0:\{\theta_1\leq 50\,or\,\theta_2\leq0.95\} versus H_1:\{\theta_1>50\,and\,\theta_2>0.95\}, where a batch of material is acceptable only if H_1 is accepted.

Suppose X_1,\cdots,X_n are measurements of breaking strength for n samples and are assumed to be i.i.d. N(\theta_1,\sigma^2). The LRT of H_{01}:\theta_1\leq 50 rejects H_{01} if \frac{\bar{X}-50}{S/\sqrt{n}}>t. Suppose that we also have the results of m flammability tests, denoted by Y_1,\cdots,Y_m, where Y_i=1 if the ith sample passes the test and Y_i=0 otherwise. If Y_1,\cdots,Y_m are modeled as i.i.d. Bernoulli(\theta_2) random variables, the LRT rejects H_{02}:\theta_2\leq 0.95 if \sum_{i=1}^mY_i>b. Putting all of this together, the rejection region for the intersection-union test is \begin{equation} \{(\mathbf{x},\mathbf{y}):\frac{\bar{x}-50}{s/\sqrt{n}}>t\,and\,\sum_{i=1}^my_i>b\} \tag{15.25} \end{equation}
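A Python sketch of this intersection-union test follows; the specific cutoffs t and b, taken here from t and binomial quantiles, and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import t as t_dist, binom

# Sketch of Example 15.7: an intersection-union test that accepts the batch
# (i.e. accepts H_1) only if BOTH component tests reject their individual
# null hypotheses.

def acceptance_sampling_test(x, y, alpha=0.05):
    # component 1: t test of H_01: theta_1 <= 50 using the breaking strengths x
    n, xbar, s = len(x), np.mean(x), np.std(x, ddof=1)
    reject_strength = (xbar - 50.0) / (s / np.sqrt(n)) > t_dist.ppf(1 - alpha, df=n - 1)

    # component 2: binomial test of H_02: theta_2 <= 0.95 using the pass/fail outcomes y
    m, passes = len(y), int(np.sum(y))
    reject_flammability = passes > binom.ppf(1 - alpha, m, 0.95)

    # intersection of the component rejection regions, as in (15.25)
    return reject_strength and reject_flammability

rng = np.random.default_rng(7)
x = rng.normal(loc=53.0, scale=2.0, size=20)   # breaking strength measurements
y = rng.binomial(1, 0.98, size=60)             # flammability outcomes (1 = pass)
print(acceptance_sampling_test(x, y))
```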

Thus the intersection-union test decides the product is acceptable, that is, H_1 is true, if and only if it decides that each of the individual parameters meets its standard, that is, H_{1i} is true. If more than two parameters define a product’s quality, individual tests for each parameter can be combined, by means of the intersection-union method, to yield an overall test of the product’s quality.