6.1 Introduction
The following example serves to illustrate the main philosophy in hypothesis testing.
Example 6.1 A pharmaceutical company suspects that a given drug in testing produces an increase of the ocular tension as a secondary effect. The baseline tension level has mean 15 units and the suspected increment is towards a mean of 18 units. This increment may increase the risk of suffering glaucoma. Since the drug has a positive primary effect, it is important to check whether this suspicion is true before moving forward with the commercialization of the drug.
Medical trials have established that, at a population level, the ocular tension tends to follow a N(15,1) distribution. The suspicion about the secondary effect then means that the ocular tension among the takers of the drug would be N(18,1). We denote by X∼N(μ,1) the rv “ocular tension of the takers of the drug”. Then, the question the pharmaceutical company faces is to decide whether μ=15 (the drug has no secondary effect) or μ=18 (the drug has a secondary effect) based on empirical evidence. To formalize this, we define:
\begin{align*}
\begin{cases}
H_0: \mu = 15 & \text{(null hypothesis)}\\
H_1: \mu = 18 & \text{(alternative hypothesis)}
\end{cases}
\end{align*}
Then, we want to find evidence in favor of H0 or of H1. But, since the drug has been proved effective, the pharmaceutical company is only willing to stop its commercialization if there is enough evidence against H0 (or in favor of H1), that is, if there is enough evidence pointing to the presence of a secondary effect. Therefore, the roles of H0 and H1 are not symmetric. H0 is a “stickier” belief than H1 for the pharmaceutical company.
In order to look for evidence against H0 or in favor of H1, a sample of the ocular tension level of four drug takers is measured, and its sample mean ˉX is computed. Then, the following holds:
- If H0 is true, then ˉX∼N(15,1/4).
- If H1 is true, then ˉX∼N(18,1/4).
Then, if we obtain a small value of ˉX, we will have little or no evidence in favor of H1 and believe H0. If we obtain a large value of ˉX, then the sample supports H1. But, up to which value of ˉX=k1 are we willing to believe in H0?
This question can be answered in the following way: we can bound the probability of incorrectly stopping the commercialization of the drug by a small value α, i.e.,
\begin{align*}
\mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})\leq\alpha,
\end{align*}
and choose the constant k1 that satisfies this, that is, choose the smallest k1 that verifies
\begin{align*}
\mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})=\mathbb{P}(\bar{X}>k_1|\mu=15)\leq\alpha.
\end{align*}
Standardizing ˉX yields a standard normal distribution, so k1 follows directly from its quantiles:
\begin{align*}
\mathbb{P}(\bar{X}>k_1|\mu=15)=\mathbb{P}(2(\bar{X}-15)>2(k_1-15))=\mathbb{P}(Z>2(k_1-15))\leq\alpha.
\end{align*}
From here, we have that 2(k1−15)=zα, so k1=15+zα/2. For example, if we take α=0.05, then zα≈1.645 and k1≈15+1.645/2=15.8225. Therefore, if ˉX>15.8225 were obtained while H0 was true, the sample would belong to an “extreme” set that only has probability α=0.05. Hence, if the event {ˉX>k1} happens, one of the following two possibilities holds:
- either H0 is true but the obtained sample was “extreme” and is not very representative of its distribution;
- or H0 is not true, since an event that would have low probability under H0 just happened.
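The computation of k1 from Example 6.1 can be sketched with Python's standard library (the numbers follow the text; `NormalDist` provides the normal cdf and quantile function):

```python
from statistics import NormalDist  # standard library (Python >= 3.8)

# Under H0, the sample mean of n = 4 observations follows N(15, 1/4),
# i.e., it has standard deviation 1/2.
alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-alpha quantile, ~1.645
k1 = 15 + z_alpha / 2                      # cutoff, ~15.8225

# Sanity check: P(X-bar > k1 | mu = 15) equals alpha by construction.
p_reject_given_h0 = 1 - NormalDist(mu=15, sigma=0.5).cdf(k1)
print(k1, p_reject_given_h0)
```

The check confirms that the rejection rule ˉX > k1 has Type I error probability exactly α.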
Following the logic of believing H0 or not based on how likely the realization of ˉX is under the assumption that H0 is true, a rule to make a decision on the veracity of H0 is the following:
- “Reject H0” if ˉX>k1, as it is unlikely that the event {ˉX>k1} happened if H0 is true;
- “Do not reject H0” if ˉX≤k1, as it is likely that the event {ˉX≤k1} happened if H0 is true.
The level α determines the probability of the data-dependent extreme event that is inconsistent with the veracity of H0 and that triggers its rejection. A choice of α=0.05 by the pharmaceutical company implies that the drug commercialization will only be stopped if the outcome of the ocular tension level test is only 5% likely under the assumption that the drug has no secondary effect.
Example 6.2 We could make the same reasoning that we made in Example 6.1, now with respect to H1. For small values of ˉX, we would think that H1 is not true. But, up to which value ˉX=k2 are we willing to believe H1?
If we fix again a bound for the probability of committing an error (in this case, allowing the commercialization of the drug while it has secondary effects), β, that is,
\begin{align*}
\mathbb{P}(\text{Reject $H_1$}|\text{$H_1$ true})\leq\beta,
\end{align*}
then we will choose the largest constant k2 that verifies that relation, that is, that verifies
\begin{align*}
\mathbb{P}(\text{Reject $H_1$}|\text{$H_1$ true})=\mathbb{P}(\bar{X}\leq k_2|\mu=18)\leq\beta.
\end{align*}
Standardizing ˉX in the previous probability, we obtain
\begin{align*}
\mathbb{P}(\bar{X}\leq k_2|\mu=18)=\mathbb{P}(2(\bar{X}-18)\leq 2(k_2-18))=\mathbb{P}(Z\leq 2(k_2-18))\leq\beta,
\end{align*}
in such a way that 2(k2−18)=−zβ, so k2=18−zβ/2. Taking β=0.05, we have zβ≈1.645, and k2≈18−1.645/2=17.1775.
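Analogously to k1, the cutoff k2 of Example 6.2 follows from a lower-tail normal quantile; a minimal sketch with the standard library:

```python
from statistics import NormalDist  # standard library (Python >= 3.8)

beta = 0.05
z_beta = NormalDist().inv_cdf(1 - beta)  # ~1.645
k2 = 18 - z_beta / 2                     # cutoff, ~17.1775

# Sanity check: P(X-bar <= k2 | mu = 18) equals beta for X-bar ~ N(18, 1/4).
p_reject_given_h1 = NormalDist(mu=18, sigma=0.5).cdf(k2)
print(k2, p_reject_given_h1)
```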
Then, following this argument and joining it with that for H0 in Example 6.1, the decision would be:
- If ˉX≤k1≈15.8225, then we accept H0.
- If ˉX≥k2≈17.1775, then we accept H1.
The following question arises immediately: what shall we do if 15.8225<ˉX<17.1775? Also, imagine that, instead of 15 units, the baseline level of ocular tension was 16.5 units. Then 16.5+1.645/2=17.3225>k2, and we would be accepting H0 and H1 at the same time! These inconsistencies point towards choosing just a single value k from which to make a decision. But in this case, only one of the probabilities of the two types of error, α and β, can be controlled.
If we decrease α too much, β will increase. In addition, it may happen that ˉX>k, so we would have evidence against H0, yet the sample is not representative of H1 either. It may also happen that ˉX≤k, so we would not have evidence against H0, yet the sample is more representative of H1 than of H0. Therefore, if we want to control α, in the first place we have to fix the null hypothesis H0 as the most conservative statement, that is, the statement that will be assumed true unless there is enough evidence against it. As a consequence, the decision to take is going to be one of the following:
- “Reject H0” if ˉX>k1 (without commitment to accept H1);
- “Do not reject H0” if ˉX≤k1 (without commitment to reject H1, which could be valid).
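The trade-off between α and β for a single cutoff k can be illustrated numerically (a sketch using only the standard library; the three cutoffs are the ones discussed above):

```python
from statistics import NormalDist  # standard library (Python >= 3.8)

h0 = NormalDist(mu=15, sigma=0.5)  # distribution of the sample mean under H0 (n = 4)
h1 = NormalDist(mu=18, sigma=0.5)  # distribution of the sample mean under H1

cutoffs = [15.8225, 16.5, 17.1775]
alphas = [1 - h0.cdf(k) for k in cutoffs]  # P(reject H0 | H0 true)
betas = [h1.cdf(k) for k in cutoffs]       # P(do not reject H0 | H1 true)

for k, a, b in zip(cutoffs, alphas, betas):
    print(f"k = {k}: alpha = {a:.4f}, beta = {b:.4f}")
```

As k grows, α decreases while β increases: both error probabilities cannot be made small simultaneously with a fixed sample size.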
In general, throughout this section we assume a rv X with distribution within the family of distributions {F(⋅;θ):θ∈Θ} for which we want to determine the validity of a statement H0 about the parameter θ against an alternative statement H1. Splitting the parameter space as Θ=Θ0∪Θ1∪Θ2, where typically74 Θ1=ˉΘ0 and Θ2=∅, the hypotheses to test are of the form
\begin{align*}
H_0:\theta\in\Theta_0 \quad \text{vs.} \quad H_1:\theta\in\Theta_1.
\end{align*}
Recall that a statement about the unknown parameter θ is equivalent to a statement about the distribution F(⋅;θ).
Definition 6.1 (Null and alternative hypotheses) The null hypothesis (denoted by H0) is the statement that is assumed true unless there is enough evidence against it. The confronting statement is the alternative hypothesis (denoted by H1).
Definition 6.2 (Simple and composite hypotheses) If the set Θ0⊂Θ that determines the hypothesis H0 contains a single element,75 then H0 is said to be a simple hypothesis. Otherwise, H0 is referred to as a composite hypothesis.
The decision in favor or against H0 is made from the information available in the realization of a srs (X1,…,Xn) of X.
Definition 6.3 (Test) A test or a hypothesis test of H0 vs. H1 is a function φ:Rn→{0,1}, where 1 stands for “reject H0 in favor of H1” and 0 for “do not reject H0”, of the form
\begin{align*}
\varphi(x_1,\ldots,x_n)=\begin{cases}
1 & \text{if } (x_1,\ldots,x_n)'\in C,\\
0 & \text{if } (x_1,\ldots,x_n)'\in\bar{C},
\end{cases}
\end{align*}
where C and ˉC provide a partition of the sample space Rn. The set C is called the critical region or rejection region, and ˉC is the acceptance region.
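For the setting of Example 6.1, the test function φ can be sketched as a small Python function (the cutoff 15.8225 is the k1 derived there):

```python
from statistics import mean  # standard library

def phi(sample, k1=15.8225):
    """Test function: 1 = 'reject H0 in favor of H1', 0 = 'do not reject H0'.

    The critical region is C = {(x1, ..., xn) : sample mean > k1}.
    """
    return 1 if mean(sample) > k1 else 0

print(phi([15.1, 14.8, 15.6, 15.2]))  # mean 15.175 <= k1 -> 0
print(phi([17.9, 18.3, 17.5, 18.1]))  # mean 17.95  >  k1 -> 1
```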
A hypothesis test is entirely determined by the critical region C. In principle, there exist infinitely many tests for testing a hypothesis at hand. The selection of a particular test is done according to the test reliability, that is, according to its “success rate”. The possible consequences — with respect to the reality about H0, which is unknown — of a test decision are given in the following table:
Test decision ∖ Reality | H0 true | H0 false |
---|---|---|
Do not reject H0 | Correct decision | Type II error |
Reject H0 | Type I error | Correct decision |
Then, the reliability of a hypothesis test is quantified and assessed in terms of the two possible types of errors:
Error | Interpretation |
---|---|
Type I error | Reject H0 if it is true |
Type II error | Do not reject H0 if it is false |
As illustrated in Examples 6.1 and 6.2, the classical procedure for selecting a test among all the available ones is the following:
1. Fix a bound, α, for the probability of committing a Type I error. This bound is the significance level of the hypothesis test.
2. Exclude all the tests with critical regions C that do not respect the bound for the Type I error, that is, that do not satisfy the condition P(Reject H0|H0 true)=P((X1,…,Xn)′∈C|H0 true)≤α.
3. Among the remaining tests, choose the one whose critical region C minimizes the probability of a Type II error, that is, that minimizes β=P(Do not reject H0|H1 true)=P((X1,…,Xn)′∈ˉC|H1 true).
Instead of determining the critical region C directly as a subset of Rn, it is simpler to determine it through a statistic of the sample and express the critical region as a subset of the range of a statistic. Then, a test statistic will determine the rejection region of the test.
Definition 6.4 (Test statistic) A test statistic of a hypothesis H0 vs. H1 is a measurable function of the sample that involves H0 and under H0 has a known or approximable distribution.
A test statistic can usually be obtained by taking an estimator of the unknown parameter involved in H0 and transforming it in such a way that it has a usable distribution under H0.
Summarizing what has been presented until now, the key elements of a hypothesis test are:
- a null hypothesis H0 and an alternative H1;
- a significance level α;
- a test statistic;
- and a critical region C.
The definitions of H0 and H1 imply the following practical takeaways. When deciding how to assign a research question to H0 or H1 in a real application, remember:
- H1 is used for the statement that the researcher is “interested” in proving through the sample.
- H0 is the de facto statement that is assumed true unless there is enough evidence against it in the sample. It cannot be proved using a hypothesis test – at most it is not rejected (but not accepted either).
If we are interested in testing the veracity of “Statement”, it is stronger to
- reject H0:¯Statement in favor of H1:Statement than to
- not reject H0:Statement vs. H1:¯Statement.
Example 6.3 A political poll is made in order to know the voting intentions of the electorate regarding two candidates, A and B, and, specifically, if candidate A will win the elections. For that purpose, the number of voters Y who will vote for candidate A within a sample of n=15 voters was recorded. The associated hypothesis test for this problem is
H0:p=0.5vs.H1:p>0.5,
where p denotes the proportion of voters in favor of A. If Y is the test statistic and the rejection region is set as C={y≥12}, compute the probability of the Type I error for the test, α.
The probability of the Type I error is
α=P(Type I error)=P(Reject H0|H0 true)=P(Y≥12|p=0.5).
Since Y∼Bin(n,0.5), the previous probability is
\begin{align*} \alpha=\sum_{y=12}^{15} \binom{15}{y}(0.5)^{15}\approx0.0176. \end{align*}
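This sum is easy to reproduce with the standard library's `math.comb` (a quick numerical check of Example 6.3):

```python
from math import comb  # standard library: binomial coefficients

# Y ~ Bin(15, 0.5) under H0; rejection region C = {Y >= 12}.
n, p0 = 15, 0.5
alpha = sum(comb(n, y) * p0**y * (1 - p0)**(n - y) for y in range(12, n + 1))
print(round(alpha, 4))  # -> 0.0176
```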
Example 6.4 Assume that the real proportion of voters for the candidate A is p=0.6 (so H_0 is false). What is the probability that in the test of Example 6.3 we obtain that candidate A will not win (H_0 is not rejected)?
The probability of Type II error is
\begin{align*} \beta=\mathbb{P}(\text{Type II error})=\mathbb{P}(\text{Do not reject $H_0$}|\text{$H_0$ false}). \end{align*}
In this case, we want to compute the value of \beta for p=0.6, that is,
\begin{align*} \beta&=\mathbb{P}(Y<12|p=0.6)=\sum_{y=0}^{11} \binom{15}{y}(0.6)^y(0.4)^{15-y}\approx0.9095. \end{align*}
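The Type II error probability of Example 6.4 can be checked the same way:

```python
from math import comb  # standard library: binomial coefficients

# Y ~ Bin(15, 0.6) under the assumed reality p = 0.6; H0 is not rejected when Y < 12.
n, p1 = 15, 0.6
beta = sum(comb(n, y) * p1**y * (1 - p1)**(n - y) for y in range(0, 12))
print(round(beta, 4))  # -> 0.9095
```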
Then, if we employ that rejection region, the test would rarely conclude that candidate A will win, regardless of whether the candidate is actually going to win or not. The test is conservative due to the small sample size n and the small α=0.0176.76
For the so-called two-sided tests, to be introduced in Section 6.2, \Theta_0=\{\theta_0\}, \Theta_1=\bar{\Theta}_0, and \Theta_2=\emptyset. However, for the so-called one-sided tests, \Theta_0=\{\theta_0\}, \Theta_1=(\theta_0,+\infty), and \Theta_2=(-\infty, \theta_0) (or \Theta_1=(-\infty,\theta_0) and \Theta_2=(\theta_0,+\infty)).↩︎
Therefore, under H_0 the distribution of X is completely known.↩︎
What will be \alpha and \beta for the rejection region C=\{Y\geq 10\}?↩︎