Chapter 17 Most Powerful Tests, Size of Union-Intersection and Intersection-Union Tests, p-Values (Lecture on 02/20/2020)
Example 17.1 (UMP Binomial Test) Let X\sim Bin(2,\theta). We want to test H_0:\theta=\frac{1}{2} versus H_1:\theta=\frac{3}{4}. Calculating the ratios of the p.m.f.s gives \begin{equation} \frac{f(0|\theta=3/4)}{f(0|\theta=1/2)}=\frac{1}{4},\quad \frac{f(1|\theta=3/4)}{f(1|\theta=1/2)}=\frac{3}{4},\quad \frac{f(2|\theta=3/4)}{f(2|\theta=1/2)}=\frac{9}{4} \end{equation} If we choose \frac{3}{4}<k<\frac{9}{4}, the Neyman-Pearson Lemma says that the test that rejects H_0 if X=2 is the UMP level \alpha=P(X=2|\theta=\frac{1}{2})=\frac{1}{4} test. If we choose \frac{1}{4}<k<\frac{3}{4}, the Neyman-Pearson Lemma says that the test that rejects H_0 if X=1 or X=2 is the UMP level \alpha=P(X=1\,or\,2|\theta=\frac{1}{2})=\frac{3}{4} test. Choosing k<\frac{1}{4} or k>\frac{9}{4} yields the UMP level \alpha=1 or level \alpha=0 test, respectively.
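These ratios and sizes are easy to verify numerically. A minimal sketch using scipy.stats.binom (the numbers come from the example above; only the code itself is new):

```python
from scipy.stats import binom

# Likelihood ratios f(x | theta = 3/4) / f(x | theta = 1/2) for X ~ Bin(2, theta)
for x in range(3):
    print(x, binom.pmf(x, 2, 0.75) / binom.pmf(x, 2, 0.5))  # 0.25, 0.75, 2.25

# Size of the test rejecting when X = 2 (any 3/4 < k < 9/4)
print(binom.pmf(2, 2, 0.5))                                 # alpha = 0.25
# Size of the test rejecting when X = 1 or X = 2 (any 1/4 < k < 3/4)
print(binom.pmf(1, 2, 0.5) + binom.pmf(2, 2, 0.5))          # alpha = 0.75
```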
Note that if k=\frac{3}{4}, then (16.11) says we must reject H_0 for the sample point x=2 and accept H_0 for x=0, but leaves our action for x=1 undetermined. If we accept H_0 for x=1, we get the UMP level \alpha=\frac{1}{4} test as above. If we reject H_0 for x=1, we get the UMP level \alpha=\frac{3}{4} test as above.

Theorem 17.1 (Karlin-Rubin) Consider testing H_0:\theta\leq\theta_0 versus H_1:\theta>\theta_0. Suppose that T is a sufficient statistic for \theta and the family of p.d.f.s or p.m.f.s \{g(t|\theta):\theta\in\Theta\} of T has an MLR. Then for any t_0, the test that rejects H_0 if and only if T>t_0 is a UMP level \alpha test, where \alpha=P_{\theta_0}(T>t_0).

Proof. Let \beta(\theta)=P_{\theta}(T>t_0) be the power function of the test. Fix \theta^{\prime}>\theta_0 and consider testing H_0^{\prime}:\theta=\theta_0 versus H_1^{\prime}:\theta=\theta^{\prime}. Since the family of p.d.f.s or p.m.f.s of T has an MLR, \beta(\theta) is nondecreasing, so
- (i) \sup_{\theta\leq\theta_0}\beta(\theta)=\beta(\theta_0)=\alpha, and this is a level \alpha test.
- (ii) If we define k^{\prime}=\inf_{t\in\mathcal{T}}\frac{g(t|\theta^{\prime})}{g(t|\theta_0)}, where \mathcal{T}=\{t:t>t_0\,and\,either\,g(t|\theta^{\prime})>0\,or\,g(t|\theta_0)>0\}, it follows that \begin{equation} T>t_0 \Leftrightarrow \frac{g(t|\theta^{\prime})}{g(t|\theta_0)}>k^{\prime} \tag{17.3} \end{equation} Together with Corollary 16.1, (i) and (ii) imply that \beta(\theta^{\prime})\geq\beta^*(\theta^{\prime}), where \beta^*(\theta) is the power function for any other level \alpha test of H_0^{\prime}, that is, any test satisfying \beta^*(\theta_0)\leq\alpha. However, any level \alpha test of H_0 satisfies \beta^*(\theta_0)\leq\sup_{\theta\in\Theta_0}\beta^*(\theta)\leq\alpha. Thus, \beta(\theta^{\prime})\geq\beta^*(\theta^{\prime}) for any level \alpha test of H_0. Since \theta^{\prime} was arbitrary, the test is a UMP level \alpha test.
By an analogous argument, it can be shown that under the conditions of Theorem 17.1, the test that rejects H_0:\theta\geq\theta_0 in favor of H_1:\theta<\theta_0 if and only if T<t_0 is a UMP level \alpha=P_{\theta_0}(T<t_0) test.
Example 17.3 Let X_1,\cdots,X_n be i.i.d. N(\theta,\sigma^2), \sigma^2 known. Consider testing H_0^{\prime}:\theta\geq\theta_0 versus H_1^{\prime}:\theta<\theta_0 using the test that rejects H_0^{\prime} if \bar{X}<-\frac{\sigma z_{\alpha}}{\sqrt{n}}+\theta_0, where z_{\alpha} satisfies P(Z>z_{\alpha})=\alpha for a standard normal Z. As \bar{X} is sufficient and its distribution has an MLR, it follows from Theorem 17.1 that the test is a UMP level \alpha test in this problem.
As the power function of this test, \begin{equation} \beta(\theta)=P_{\theta}(\bar{X}<-\frac{\sigma z_{\alpha}}{\sqrt{n}}+\theta_0) \tag{17.4} \end{equation} is a decreasing function of \theta (since \theta is a location parameter in the distribution of \bar{X}), the value of \alpha is given by \sup_{\theta\geq\theta_0}\beta(\theta)=\beta(\theta_0)=\alpha.

Example 17.4 (Nonexistence of UMP Test) Let X_1,\cdots,X_n be i.i.d. N(\theta,\sigma^2), \sigma^2 known. Consider testing H_0:\theta=\theta_0 versus H_1:\theta\neq\theta_0. For a specified value \alpha, a level \alpha test in this problem is any test that satisfies \begin{equation} P_{\theta_0}(reject\,H_0)\leq \alpha \tag{17.5} \end{equation} Consider an alternative parameter point \theta_1<\theta_0. The analysis in Example 17.3 shows that, among all tests that satisfy (17.5), the test that rejects H_0 if \bar{X}<-\frac{\sigma z_{\alpha}}{\sqrt{n}}+\theta_0 has the highest possible power at \theta_1. Call this Test 1. Furthermore, by the necessity part of the Neyman-Pearson Lemma, any other level \alpha test that has as high a power as Test 1 at \theta_1 must have the same rejection region as Test 1 except possibly for a set A satisfying \int_{A}f(\mathbf{x}|\theta_i)d\mathbf{x}=0 for i=0,1. Thus, if a UMP level \alpha test exists for this problem, it must be Test 1, because no other test has as high a power as Test 1 at \theta_1.
Now consider Test 2, which rejects H_0 if \bar{X}>\frac{\sigma z_{\alpha}}{\sqrt{n}}+\theta_0. Test 2 is also a level \alpha test. Let \beta_i(\theta) denote the power function of Test i. For any \theta_2>\theta_0, \begin{equation} \begin{split} \beta_2(\theta_2)&=P_{\theta_2}(\bar{X}>\frac{\sigma z_{\alpha}}{\sqrt{n}}+\theta_0)\\ &=P_{\theta_2}(\frac{\bar{X}-\theta_2}{\sigma/\sqrt{n}}> z_{\alpha}+\frac{\theta_0-\theta_2}{\sigma/\sqrt{n}})\\ &>P(Z>z_{\alpha})=P(Z<-z_{\alpha})\\ &>P_{\theta_2}(\frac{\bar{X}-\theta_2}{\sigma/\sqrt{n}}< -z_{\alpha}+\frac{\theta_0-\theta_2}{\sigma/\sqrt{n}})\\ &=P_{\theta_2}(\bar{X}<-\frac{\sigma z_{\alpha}}{\sqrt{n}}+\theta_0)\\ &=\beta_1(\theta_2) \end{split} \tag{17.6} \end{equation}
Thus, Test 1 is not a UMP level \alpha test because Test 2 has a higher power than Test 1 at \theta_2. Earlier we showed that if there were a UMP level \alpha test, it would have to be Test 1. Therefore, no UMP level \alpha test exists in this problem.

Example 17.5 (Unbiased Test) When no UMP level \alpha test exists within the class of all tests, we might try to find a UMP level \alpha test within the class of unbiased tests. The power function \beta_3(\theta) of Test 3, which rejects H_0:\theta=\theta_0 in favor of H_1:\theta\neq\theta_0 if and only if \bar{X}>\frac{\sigma z_{\alpha/2}}{\sqrt{n}}+\theta_0 or \bar{X}<-\frac{\sigma z_{\alpha/2}}{\sqrt{n}}+\theta_0, as well as \beta_1(\theta) and \beta_2(\theta) from Example 17.4, is shown in Figure 17.1. Test 3 is actually a UMP unbiased level \alpha test.
Note that although Test 1 and Test 2 have slightly higher powers than Test 3 for some parameter points, Test 3 has much higher power than Test 1 and Test 2 at other parameter points. For example, \beta_3(\theta_2) is near 1, whereas \beta_1(\theta_2) is near 0. If the interest is in rejecting H_0 for both large and small values of \theta, Figure 17.1 shows that Test 3 is better overall than either Test 1 or Test 2.
FIGURE 17.1: Power functions for three tests
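The qualitative behavior in Figure 17.1 can be checked numerically from the three power functions derived above. A sketch under illustrative values \theta_0=0, \sigma=1, n=9, \alpha=0.05 (these numbers are chosen for the illustration, not taken from the text):

```python
import numpy as np
from scipy.stats import norm

theta0, sigma, n, alpha = 0.0, 1.0, 9, 0.05     # illustrative values
z_a, z_a2 = norm.ppf(1 - alpha), norm.ppf(1 - alpha / 2)

def beta1(theta):  # Test 1: reject if xbar < -sigma*z_a/sqrt(n) + theta0
    return norm.cdf(-z_a + (theta0 - theta) * np.sqrt(n) / sigma)

def beta2(theta):  # Test 2: reject if xbar > sigma*z_a/sqrt(n) + theta0
    return norm.sf(z_a + (theta0 - theta) * np.sqrt(n) / sigma)

def beta3(theta):  # Test 3: two-sided level-alpha test
    shift = (theta0 - theta) * np.sqrt(n) / sigma
    return norm.cdf(-z_a2 + shift) + norm.sf(z_a2 + shift)

# All three equal alpha at theta0; Test 3 has power near 1 on both sides.
for theta in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    print(theta, beta1(theta), beta2(theta), beta3(theta))
```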
Because of the simple way in which they are constructed, the sizes of union-intersection tests (UITs) and intersection-union tests (IUTs) can often be bounded above by the sizes of other tests. Such bounds are useful if a level \alpha test is wanted but the size of the UIT or IUT is too difficult to evaluate.
Let \lambda_{\gamma}(\mathbf{x}) be the LRT statistic for testing H_{0_{\gamma}}:\theta\in\Theta_{\gamma} versus H_{1_{\gamma}}:\theta\in\Theta_{\gamma}^c, and let \lambda(\mathbf{x}) be the LRT statistic for testing H_0:\theta\in\Theta_0 versus H_1:\theta\in\Theta_0^c.
Theorem 17.2 Consider testing H_0:\theta\in\Theta_0 versus H_1:\theta\in\Theta_0^c, where \Theta_0=\bigcap_{\gamma\in\Gamma}\Theta_{\gamma} and \lambda_{\gamma}(\mathbf{x}) is defined as above. Define T(\mathbf{x})=\inf_{\gamma\in\Gamma}\lambda_{\gamma}(\mathbf{x}), and form the UIT with rejection region \{\mathbf{x}:\lambda_{\gamma}(\mathbf{x})<c\,for\,some\,\gamma\in\Gamma\}=\{\mathbf{x}:T(\mathbf{x})<c\}. Also consider the usual LRT with rejection region \{\mathbf{x}:\lambda(\mathbf{x})<c\}. Then
- T(\mathbf{x})\geq \lambda(\mathbf{x}) for every \mathbf{x};
- If \beta_T(\theta) and \beta_{\lambda}(\theta) are the power functions for the tests based on T and \lambda, respectively, then \beta_T(\theta)\leq\beta_{\lambda}(\theta) for every \theta\in\Theta;
- If the LRT is a level \alpha test, then the UIT is a level \alpha test.
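As a concrete check of the first claim, consider one instance (our construction, not from the text): X_1,\cdots,X_n i.i.d. N(\theta,\sigma^2) with \sigma^2 known, and H_0:\theta=\theta_0 written as the intersection of \theta\leq\theta_0 and \theta\geq\theta_0. In this instance T(\mathbf{x}) and \lambda(\mathbf{x}) actually coincide, so the inequality holds with equality:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma, theta0 = 10, 1.0, 0.0

def lam(xbar):        # LRT statistic for H0: theta = theta0
    return np.exp(-n * (xbar - theta0) ** 2 / (2 * sigma ** 2))

def lam_lower(xbar):  # LRT statistic for H0: theta <= theta0
    return lam(xbar) if xbar > theta0 else 1.0

def lam_upper(xbar):  # LRT statistic for H0: theta >= theta0
    return lam(xbar) if xbar < theta0 else 1.0

for _ in range(5):
    xbar = rng.normal(0.3, sigma / np.sqrt(n))
    T = min(lam_lower(xbar), lam_upper(xbar))  # T(x) = inf of the component LRTs
    assert T >= lam(xbar) - 1e-12              # Theorem 17.2: T(x) >= lambda(x)
    print(xbar, T, lam(xbar))                  # here the two coincide
```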
Theorem 17.2 says that the LRT is uniformly more powerful than the UIT. Why, then, should we consider the UIT?
- The UIT has a smaller Type I Error probability for every \theta\in\Theta_0.
- If H_0 is rejected, we may wish to look at the individual tests of H_{0_{\gamma}} to see why.
Theorem 17.3 Consider testing H_0:\theta\in\bigcup_{\gamma\in\Gamma}\Theta_{\gamma}, where for each \gamma the test of H_{0_{\gamma}}:\theta\in\Theta_{\gamma} has rejection region R_{\gamma} and size \alpha_{\gamma}. Then the IUT with rejection region R=\bigcap_{\gamma\in\Gamma}R_{\gamma} is a level \alpha=\sup_{\gamma\in\Gamma}\alpha_{\gamma} test.

Typically the individual rejection regions R_{\gamma} are chosen so that \alpha_{\gamma}=\alpha for all \gamma. In that case, the resulting IUT is a level \alpha test.
- Theorem 17.2 applies only to UITs constructed from likelihood ratio tests, whereas Theorem 17.3 applies to any IUT.
- The bound in Theorem 17.2 is the size of the LRT, which may be difficult to compute. In Theorem 17.3, any test of H_{0_{\gamma}} with known size \alpha_{\gamma} can be used, and the upper bound on the size of the IUT is then given in terms of the known sizes \alpha_{\gamma},\gamma\in\Gamma (see the simulation sketch below).
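A small simulation illustrates the bound for a toy IUT (our construction, not from the text): X_1\sim N(\theta_1,1) and X_2\sim N(\theta_2,1) independently, H_0:\theta_1\leq 0 or \theta_2\leq 0, with each component test rejecting when X_j>z_{\alpha}. At a null point with one component on its boundary and the other deep in its alternative, the rejection probability approaches \alpha (as Theorem 17.4 below makes precise); at other null points the test is conservative:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
alpha = 0.05
z_a = norm.ppf(1 - alpha)
reps = 200_000

# theta = (0, 5): theta_1 on its boundary, theta_2 far in its alternative.
x1 = rng.normal(0.0, 1.0, reps)
x2 = rng.normal(5.0, 1.0, reps)
print(np.mean((x1 > z_a) & (x2 > z_a)))   # ~ alpha = 0.05

# theta = (0, 0): both components at the boundary; the IUT is conservative.
x1 = rng.normal(0.0, 1.0, reps)
x2 = rng.normal(0.0, 1.0, reps)
print(np.mean((x1 > z_a) & (x2 > z_a)))   # ~ alpha^2 = 0.0025
```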
Theorem 17.4 Consider testing H_0:\theta\in\bigcup_{j=1}^k\Theta_j, where k is a finite positive integer. For each j=1,\cdots,k, let R_j be the rejection region of a level \alpha test of H_{0_j}:\theta\in\Theta_j. Suppose that for some i=1,\cdots,k, there exists a sequence of parameter points \theta_l\in\Theta_i, l=1,2,\cdots, such that
- \lim_{l\to\infty}P_{\theta_l}(\mathbf{X}\in R_i)=\alpha;
- for each j=1,\cdots,k with j\neq i, \lim_{l\to\infty}P_{\theta_l}(\mathbf{X}\in R_j)=1.

Then the IUT with rejection region R=\bigcap_{j=1}^kR_j is a size \alpha test.
Proof. By Theorem 17.3, R is a level \alpha test, that is \begin{equation} \sup_{\theta\in\Theta_0}P_{\theta}(\mathbf{X}\in R)\leq\alpha \tag{17.10} \end{equation}
But, because all the parameter points \theta_l satisfy \theta_l\in\Theta_i\subset\Theta_0, \begin{equation} \begin{split} \sup_{\theta\in\Theta_0}P_{\theta}(\mathbf{X}\in R)&\geq \lim_{l\to\infty}P_{\theta_l}(\mathbf{X}\in R)\\ &=\lim_{l\to\infty}P_{\theta_l}(\mathbf{X}\in\bigcap_{j=1}^kR_j)\\ &\geq \lim_{l\to\infty}\left[\sum_{j=1}^kP_{\theta_l}(\mathbf{X}\in R_j)-(k-1)\right] \quad (Bonferroni\, Inequality)\\ &=(k-1)+\alpha-(k-1)=\alpha \end{split} \tag{17.11} \end{equation}
This and (17.10) imply the test has size exactly equal to \alpha.

A p-value p(\mathbf{X}) is valid if, for every \theta\in\Theta_0 and every 0\leq\alpha\leq 1, \begin{equation} P_{\theta}(p(\mathbf{X})\leq\alpha)\leq\alpha \tag{17.12} \end{equation} If p(\mathbf{X}) is a valid p-value, it is easy to construct a level \alpha test based on p(\mathbf{X}). The test that rejects H_0 if and only if p(\mathbf{X})\leq \alpha is a level \alpha test because of (17.12).
An advantage to reporting a test result via a p-value is that each reader can choose the \alpha he or she considers appropriate and then can compare the reported p(x) to \alpha and know whether these data lead to acceptance or rejection of H_0.
- The smaller the p-value, the stronger the evidence for rejecting H_0. Hence, a p-value reports the results of a test on a more continuous scale, rather than just the dichotomous decision “Accept H_0” or “Reject H_0”.
The inequality in the last line is true because \mu_0\geq\mu and \frac{\mu_0-\mu}{S/\sqrt{n}} is a nonnegative random variable. The subscript on P is dropped because the probability does not depend on (\mu,\sigma). Furthermore, \begin{equation} P(T_{n-1}\geq W(\mathbf{x}))=P_{\mu_0,\sigma}(\frac{\bar{X}-\mu_0}{S/\sqrt{n}}\geq W(\mathbf{x}))=P_{\mu_0,\sigma}(W(\mathbf{X})\geq W(\mathbf{x})) \tag{17.17} \end{equation} because \frac{\bar{X}-\mu_0}{S/\sqrt{n}} has a Student's t_{n-1} distribution when \mu=\mu_0, and this probability is one of those considered in the calculation of the supremum in (17.13) because (\mu_0,\sigma)\in\Theta_0. Thus, the p-value from (17.13) for this one-sided t test is p(\mathbf{x})=P(T_{n-1}\geq W(\mathbf{x}))=P(T_{n-1}\geq \frac{\bar{x}-\mu_0}{s/\sqrt{n}}).
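In practice this p-value is just the upper tail of the t_{n-1} distribution beyond the observed statistic. A minimal sketch with made-up data (the sample and \mu_0 below are illustrative only):

```python
import numpy as np
from scipy.stats import t

# p-value for the one-sided t test of H0: mu <= mu0 (made-up sample, mu0 = 0)
x = np.array([0.3, 1.2, -0.4, 0.8, 1.5, 0.1, 0.9])
mu0, n = 0.0, len(x)
w = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))  # W(x) = (xbar - mu0)/(s/sqrt(n))
print(t.sf(w, df=n - 1))                             # p(x) = P(T_{n-1} >= W(x))

# Validity check: at the boundary mu = mu0 the p-value is Uniform(0,1),
# so P(p(X) <= alpha) = alpha; simulate to confirm.
rng = np.random.default_rng(2)
sims = rng.normal(mu0, 1.0, size=(50_000, n))
ws = (sims.mean(axis=1) - mu0) / (sims.std(axis=1, ddof=1) / np.sqrt(n))
print(np.mean(t.sf(ws, df=n - 1) <= 0.05))           # ~ 0.05
```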
Corollary 17.1 (p-Value Conditioning on Sufficient Statistic) Another method for defining a valid p-value involves conditioning on a sufficient statistic. Suppose S(\mathbf{X}) is a sufficient statistic for the model \{f(\mathbf{x}|\theta):\theta\in\Theta_0\}. If the null hypothesis is true, the conditional distribution of \mathbf{X} given S=s does not depend on \theta. Again, let W(\mathbf{X}) denote a test statistic for which large values give evidence that H_1 is true. Then, for each sample point \mathbf{x} define \begin{equation} p(\mathbf{x})=P(W(\mathbf{X})\geq W(\mathbf{x})|S=S(\mathbf{x})) \tag{17.18} \end{equation} Considering only the single distribution that is the conditional distribution of \mathbf{X} given S=s, we see that, for any 0\leq\alpha\leq 1, P(p(\mathbf{X})\leq\alpha|S=s)\leq\alpha. Then, for any \theta\in\Theta_0, unconditionally we have \begin{equation} P_{\theta}(p(\mathbf{X})\leq\alpha)=\sum_{s}P(p(\mathbf{X})\leq\alpha|S=s)P_{\theta}(S=s)\leq\sum_{s}\alpha P_{\theta}(S=s)\leq\alpha \tag{17.19} \end{equation}
Thus, p(\mathbf{X}) defined by (17.18) is a valid p-value.

Example 17.9 (Fisher's Exact Test) Let S_1 and S_2 be independent observations with S_1\sim Bin(n_1,p_1) and S_2\sim Bin(n_2,p_2). Consider testing H_0:p_1=p_2 versus H_1:p_1>p_2. Under H_0, if we let p denote the common value of p_1=p_2, the joint p.m.f. of (S_1,S_2) is
\begin{equation} \begin{split} f(s_1,s_2|p)&={{n_1} \choose {s_1}}p^{s_1}(1-p)^{n_1-s_1}{{n_2} \choose {s_2}}p^{s_2}(1-p)^{n_2-s_2}\\ &={{n_1} \choose {s_1}}{{n_2} \choose {s_2}}p^{s_1+s_2}(1-p)^{n_1+n_2-(s_1+s_2)} \end{split} \tag{17.20} \end{equation}
Thus, S=S_1+S_2 is a sufficient statistic under H_0. Given the value of S=s, it is reasonable to use S_1 as a test statistic and reject H_0 in favor of H_1 for large values of S_1, because large values of S_1 correspond to small values of S_2=s-S_1. The conditional distribution of S_1 given S=s is HyperGeo(n_1+n_2,n_1,s). Thus, the conditional p-value in (17.18) is \begin{equation} p(s_1,s_2)=\sum_{j=s_1}^{\min\{n_1,s\}}f(j|s) \tag{17.21} \end{equation} where f(j|s)=P(S_1=j|S=s) is the hypergeometric p.m.f., so p(s_1,s_2) is a sum of hypergeometric probabilities. The test defined by this p-value is called Fisher's Exact Test.
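Since (17.21) is a hypergeometric tail probability, it can be computed directly. A sketch with made-up counts (the values of n_1, n_2, s_1, s_2 are illustrative only); as a cross-check, scipy.stats.fisher_exact with alternative="greater" implements the same one-sided conditional test on the 2x2 table:

```python
from scipy.stats import hypergeom, fisher_exact

# Made-up counts: s1 successes out of n1 trials, s2 out of n2
n1, n2, s1, s2 = 10, 12, 7, 3
s = s1 + s2

# Conditional p-value (17.21): P(S_1 >= s1 | S = s), S_1 | S=s ~ HyperGeo(n1+n2, n1, s)
p = hypergeom.sf(s1 - 1, M=n1 + n2, n=n1, N=s)
print(p)

# scipy's built-in Fisher exact test on the 2x2 table should agree
table = [[s1, n1 - s1], [s2, n2 - s2]]
print(fisher_exact(table, alternative="greater")[1])
```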