6.1 Introduction

The following example serves to illustrate the main philosophy in hypothesis testing.

Example 6.1 A pharmaceutical company suspects that a drug in testing increases ocular tension as a secondary effect. The baseline tension level has mean \(15\) and the suspected increase is towards a mean of \(18\) units. This increase may raise the risk of suffering glaucoma. Since the drug has a positive primary effect, it is important to check whether this suspicion is true before moving forward with the commercialization of the drug.

Medical trials have established that, at a population level, the ocular tension tends to follow a \(\mathcal{N}(15,1)\) distribution. The suspicion about the secondary effect then means that the ocular tension among the takers of the drug would be \(\mathcal{N}(18,1).\) We denote by \(X\sim\mathcal{N}(\mu,1)\) the rv “ocular tension of the takers of the drug”. Then, the question the pharmaceutical company faces is to decide whether \(\mu=15\) (the drug has no secondary effect) or \(\mu=18\) (the drug has a secondary effect) based on empirical evidence. To formalize this, we define:

\[\begin{align*} \begin{cases} H_0:\mu=15 & \text{(null hypothesis)} \\ H_1:\mu=18 & \text{(alternative hypothesis)} \end{cases} \end{align*}\]

Then, we want to find evidence in favor of \(H_0\) or of \(H_1.\) But, since the drug has been proved effective, the pharmaceutical company is only willing to stop its commercialization if there is enough evidence against \(H_0\) (or in favor of \(H_1\)), that is, if there is enough evidence pointing to the presence of a secondary effect. Therefore, the roles of \(H_0\) and \(H_1\) are not symmetric. \(H_0\) is a “stickier” belief than \(H_1\) for the pharmaceutical company.

In order to look for evidence against \(H_0\) or in favor of \(H_1,\) the ocular tension level of a sample of four drug takers is measured, and its sample mean \(\bar{X}\) is computed. Then, the following holds:

  • If \(H_0\) is true, then \(\bar{X}\sim \mathcal{N}(15,1/4).\)
  • If \(H_1\) is true, then \(\bar{X}\sim \mathcal{N}(18,1/4).\)

Then, if we obtain a small value of \(\bar{X},\) we will have little or no evidence in favor of \(H_1\) and believe \(H_0.\) If we obtain a large value of \(\bar{X},\) then the sample supports \(H_1.\) But, up to which value \(k_1\) of \(\bar{X}\) are we willing to believe in \(H_0\)?

This question can be answered in the following way: we can limit the possibility of incorrectly stopping the commercialization of the drug to a small value \(\alpha,\) i.e.,

\[\begin{align*} \mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})\leq \alpha, \end{align*}\]

and choose the constant \(k_1\) that satisfies this, that is, choose the smallest \(k_1\) that verifies

\[\begin{align*} \mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})=\mathbb{P}(\bar{X}>k_1|\mu=15)\leq \alpha. \end{align*}\]

Standardizing \(\bar{X},\) we obtain the standard normal distribution. Then \(k_1\) directly follows from the quantile of such a distribution:

\[\begin{align*} \mathbb{P}(\bar{X}> k_1|\mu=15)=\mathbb{P}(2(\bar{X}-15)> 2(k_1-15))=\mathbb{P}(Z> 2(k_1-15))\leq \alpha. \end{align*}\]

From here, we have that \(2(k_1-15)=z_{\alpha},\) where \(z_{\alpha}\) denotes the upper \(\alpha\)-quantile of the \(\mathcal{N}(0,1)\) distribution, so \(k_1=15+z_{\alpha}/2.\) For example, if we take \(\alpha=0.05,\) then \(z_{\alpha}\approx1.645\) and \(k_1\approx15+1.645/2=15.8225.\) Therefore, if we obtain \(\bar{X}>15.8225\) and \(H_0\) were true, the obtained sample would belong to this “extreme” set that has probability only \(\alpha=0.05.\) This implies that one of the following two possibilities is happening if the event \(\{\bar{X}>k_1\}\) happens:

  1. either \(H_0\) is true but the obtained sample was “extreme” and is not very representative of its distribution;
  2. or \(H_0\) is not true because an event that would have low probability if \(H_0\) were true just happened.

Following the logic of believing \(H_0\) or not based on how likely the realization of \(\bar{X}\) is under the assumption that \(H_0\) is true, a rule to make a decision on the veracity of \(H_0\) is the following:

  • “Reject \(H_0\)” if \(\bar{X}>k_1,\) as it is unlikely that the event \(\{\bar{X}>k_1\}\) happened if \(H_0\) is true;
  • “Do not reject \(H_0\)” if \(\bar{X}\leq k_1,\) as it is likely that the event \(\{\bar{X}\leq k_1\}\) happened if \(H_0\) is true.

The level \(\alpha\) determines the probability of the data-dependent extreme event that is inconsistent with the veracity of \(H_0\) and that triggers its rejection. A choice of \(\alpha=0.05\) by the pharmaceutical company implies that the drug commercialization will only be stopped if the outcome of the ocular tension level test falls in a region that has probability at most \(5\%\) under the assumption that the drug has no secondary effect.
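The threshold computation in Example 6.1 can be reproduced numerically. The following is a minimal sketch using Python's standard library (`statistics.NormalDist`); the function names `critical_value` and `decide`, as well as the sample values, are hypothetical choices made for illustration and are not part of the example.

```python
from math import sqrt
from statistics import NormalDist, mean

def critical_value(mu0, sigma, n, alpha):
    """Smallest k with P(Xbar > k | mu = mu0) <= alpha, where Xbar ~ N(mu0, sigma^2 / n)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper alpha-quantile of N(0, 1)
    return mu0 + z_alpha * sigma / sqrt(n)

def decide(sample, k):
    """Decision rule of Example 6.1: reject H0 when the sample mean exceeds k."""
    return "Reject H0" if mean(sample) > k else "Do not reject H0"

k1 = critical_value(mu0=15, sigma=1, n=4, alpha=0.05)
print(k1)

# Made-up measurements of four drug takers (illustration only).
sample = [16.2, 15.1, 17.0, 16.4]
print(decide(sample, k1))
```

The computed `k1` differs slightly from \(15.8225\) in the fourth decimal because the text uses the rounded quantile \(1.645.\)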

Example 6.2 We could apply the same reasoning that we used in Example 6.1, now with respect to \(H_1.\) For small values of \(\bar{X},\) we would think that \(H_1\) is not true. But, up to which value \(k_2\) of \(\bar{X}\) are we willing to believe \(H_1\)?

If we fix again a bound for the probability of committing an error (in this case, allowing the commercialization of the drug while it has secondary effects), \(\beta,\) that is,

\[\begin{align*} \mathbb{P}(\text{Reject $H_1$}|\text{$H_1$ true})\leq \beta, \end{align*}\]

then we will choose the largest constant \(k_2\) that verifies that relation, that is, that verifies

\[\begin{align*} \mathbb{P}(\text{Reject $H_1$}|\text{$H_1$ true})=\mathbb{P}(\bar{X}\leq k_2|\mu=18)\leq \beta. \end{align*}\]

Standardizing \(\bar{X}\) in the previous probability, we obtain

\[\begin{align*} \mathbb{P}(\bar{X}\leq k_2|\mu=18)&=\mathbb{P}(2(\bar{X}-18)\leq 2(k_2-18)|\mu=18)\\ &=\mathbb{P}(Z\leq 2(k_2-18))\leq \beta, \end{align*}\]

in such a way that \(2(k_2-18)=-z_{\beta},\) so \(k_2=18-z_{\beta}/2.\) Taking \(\beta=0.05,\) we have \(z_{\beta}\approx1.645,\) and \(k_2\approx18-1.645/2=17.1775.\)

Then, following this argument and joining it with that for \(H_0\) done in Example 6.1, the decision would be:

  • If \(\bar{X}\leq k_1\approx 15.8225,\) then we accept \(H_0.\)
  • If \(\bar{X}\geq k_2\approx 17.1775,\) then we accept \(H_1.\)

The following question arises immediately: what shall we do if \(15.8225<\bar{X}<17.1775\)? Also, imagine that instead of \(15\) units, the baseline level of ocular tension was \(16.5\) units. Then \(k_1\approx16.5+1.645/2=17.3225>k_2\) and, for \(k_2\leq\bar{X}\leq k_1,\) we would be accepting \(H_0\) and \(H_1\) at the same time! These inconsistencies point towards choosing just a single value \(k\) from which to make a decision. But in this case, only one of the probabilities for the two types of error, \(\alpha\) and \(\beta,\) can be controlled.
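The gap and the overlap described above can be checked with a short computation. This is a sketch under the assumptions of Examples 6.1 and 6.2 (\(n=4,\) \(\sigma=1,\) \(\alpha=\beta=0.05\)); the helper `thresholds` is a hypothetical name introduced only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def thresholds(mu0, mu1, sigma, n, alpha, beta):
    """Return (k1, k2): k1 bounds P(Reject H0 | H0 true) by alpha,
    k2 bounds P(Reject H1 | H1 true) by beta."""
    std_err = sigma / sqrt(n)
    k1 = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    k2 = mu1 - NormalDist().inv_cdf(1 - beta) * std_err
    return k1, k2

for mu0 in (15, 16.5):
    k1, k2 = thresholds(mu0, 18, 1, 4, 0.05, 0.05)
    if k1 < k2:
        status = "gap: no decision for k1 < Xbar < k2"
    else:
        status = "overlap: both hypotheses accepted for k2 <= Xbar <= k1"
    print(mu0, round(k1, 4), round(k2, 4), status)
```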

If we decrease \(\alpha\) too much, \(\beta\) will increase. In addition, it may happen that \(\bar{X}>k,\) so we would have evidence against \(H_0,\) but that the sample is not representative of \(H_1\) either. It may also happen that \(\bar{X}\leq k,\) so we would not have evidence against \(H_0,\) but that the sample is nevertheless more representative of \(H_1\) than of \(H_0.\) Therefore, if we want to control \(\alpha,\) in the first place we have to fix the null hypothesis \(H_0\) as the most conservative statement, that is, the statement that will be assumed as true unless there is enough evidence against it. As a consequence, the decision to take is going to be one of the following:

  • “Reject \(H_0\)” if \(\bar{X}>k_1\) (without commitment to accept \(H_1\));
  • “Do not reject \(H_0\)” if \(\bar{X}\leq k_1\) (without commitment to reject \(H_1,\) which could be valid).
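Once a single threshold is fixed through \(\alpha,\) the Type II error probability is no longer chosen but follows from it. The sketch below, which assumes the setting of Example 6.1 (\(\mu=15\) vs. \(\mu=18,\) \(n=4,\) \(\sigma=1\)), evaluates \(\beta=\mathbb{P}(\bar{X}\leq k_1|\mu=18)\) for several significance levels; `type_ii_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, mu0=15.0, mu1=18.0, sigma=1.0, n=4):
    """beta = P(Xbar <= k | mu = mu1) for the test that rejects H0 when Xbar > k,
    with k determined by the significance level alpha."""
    std_err = sigma / sqrt(n)
    k = mu0 + NormalDist().inv_cdf(1 - alpha) * std_err
    return NormalDist(mu1, std_err).cdf(k)

# Decreasing alpha moves k to the right and therefore increases beta.
for alpha in (0.05, 0.01, 0.001):
    print(alpha, type_ii_error(alpha))
```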

In general, throughout this section we assume a rv \(X\) with distribution within the family of distributions \(\{F(\cdot;\theta):\theta\in\Theta\}\) for which we want to determine the validity of a statement \(H_0\) about the parameter \(\theta\) against an alternative statement \(H_1.\) Splitting the parameter space as \(\Theta=\Theta_0\cup\Theta_1\cup\Theta_2,\) where typically[1] \(\Theta_1=\bar{\Theta}_0\) and \(\Theta_2=\emptyset,\) the hypotheses to test are of the form

\[\begin{align*} H_0:\theta\in\Theta_0 \quad \text{vs.}\quad H_1:\theta\in\Theta_1. \end{align*}\]

Recall that a statement about the unknown parameter \(\theta\) is equivalent to a statement about the distribution \(F(\cdot;\theta).\)

Definition 6.1 (Null and alternative hypotheses) The null hypothesis (denoted by \(H_0\)) is the statement that is assumed true unless there is enough evidence against it. The confronting statement is the alternative hypothesis (denoted by \(H_1\)).

Definition 6.2 (Simple and composite hypotheses) If the set \(\Theta_0\subset \Theta\) that determines the hypothesis \(H_0\) contains a single element,[2] then \(H_0\) is said to be a simple hypothesis. Otherwise, \(H_0\) is referred to as a composite hypothesis.

The decision in favor of or against \(H_0\) is made from the information available in the realization of a srs \((X_1,\ldots,X_n)\) of \(X.\)

Definition 6.3 (Test) A test or a hypothesis test of \(H_0\) vs. \(H_1\) is a function \(\varphi:\mathbb{R}^n\to\{0,1\},\) where \(1\) stands for “reject \(H_0\) in favor of \(H_1\)” and \(0\) for “do not reject \(H_0\)”, of the form

\[\begin{align*} \varphi(x_1,\ldots,x_n)=\begin{cases} 1 & \text{if}\ (x_1,\ldots,x_n)'\in C,\\ 0 & \text{if}\ (x_1,\ldots,x_n)'\in \bar{C}, \end{cases} \end{align*}\]

where \(C\) and \(\bar{C}\) provide a partition of the sample space \(\mathbb{R}^n.\) The set \(C\) is called the critical region or rejection region and \(\bar{C}\) is the acceptance region.
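Definition 6.3 translates directly into code: a test is an indicator function of the critical region. The following sketch is one possible way to express it; `make_test` is a hypothetical helper, and the critical region used is that of Example 6.1 with the rounded threshold \(15.8225.\)

```python
from statistics import mean

def make_test(in_critical_region):
    """Build phi: sample -> {0, 1} from a function that says whether a sample lies in C."""
    def phi(sample):
        return 1 if in_critical_region(sample) else 0
    return phi

# Critical region of Example 6.1 as a subset of R^4:
# C = {(x1, ..., x4) : (x1 + ... + x4) / 4 > 15.8225}.
phi = make_test(lambda x: mean(x) > 15.8225)

print(phi([16.0, 16.0, 16.0, 16.5]))  # sample in C: reject H0
print(phi([15.0, 15.0, 15.0, 15.0]))  # sample in the complement of C: do not reject H0
```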

A hypothesis test is entirely determined by the critical region \(C.\) In principle, there exist infinitely many tests for testing a hypothesis at hand. The selection of a particular test is done according to the test reliability, that is, according to its “success rate”. The possible consequences — with respect to the reality about \(H_0,\) which is unknown — of a test decision are given in the following table:

| Test decision \(\backslash\) Reality | \(H_0\) true | \(H_0\) false |
|---|---|---|
| Do not reject \(H_0\) | Correct decision | Type II error |
| Reject \(H_0\) | Type I error | Correct decision |

Then, the reliability of a hypothesis test is quantified and assessed in terms of the two possible types of errors:

| Error | Interpretation |
|---|---|
| Type I error | Reject \(H_0\) if it is true |
| Type II error | Do not reject \(H_0\) if it is false |

As illustrated in Examples 6.1 and 6.2, the classical procedure for selecting a test among all the available ones is the following:

  1. Fix a bound, \(\alpha,\) for the probability of committing the Type I error. This bound is the significance level of the hypothesis test.

  2. Exclude all the tests with critical regions \(C\) that do not respect the bound for the Type I error, that is, that do not satisfy the condition \[\begin{align*} \mathbb{P}(\text{Reject $H_0$}|H_0\ \text{true})=\mathbb{P}((X_1,\ldots,X_n)'\in C|\text{$H_0$ true})\leq \alpha. \end{align*}\]

  3. Among the selected tests, choose the one that has a critical region \(C\) that minimizes the Type II error, that is, that minimizes \[\begin{align*} \beta=\mathbb{P}(\text{Do not reject $H_0$}|\text{$H_1$ true})=\mathbb{P}((X_1,\ldots,X_n)'\in \bar{C}|H_1 \ \text{true}). \end{align*}\]
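The three steps above can be sketched on the setting of Example 6.1. As a simplifying assumption made only for this illustration, the search is restricted to the family of critical regions \(C_k=\{\bar{X}>k\}\) with \(k\) on a grid, whereas the procedure in the text ranges over all possible tests.

```python
from statistics import NormalDist

# Step 1: fix the significance level.
ALPHA = 0.05

xbar_h0 = NormalDist(15, 0.5)  # distribution of Xbar under H0 (n = 4, sigma = 1)
xbar_h1 = NormalDist(18, 0.5)  # distribution of Xbar under H1

# Hypothetical family of candidate tests: C_k = {Xbar > k} for k in 15.00, 15.01, ..., 18.00.
candidates = [15 + i / 100 for i in range(301)]

# Step 2: exclude the tests whose Type I error probability exceeds ALPHA.
admissible = [k for k in candidates if 1 - xbar_h0.cdf(k) <= ALPHA]

# Step 3: among the remaining tests, choose the one minimizing beta = P(Xbar <= k | H1).
best_k = min(admissible, key=lambda k: xbar_h1.cdf(k))

print(best_k, 1 - xbar_h0.cdf(best_k), xbar_h1.cdf(best_k))
```

Within this family, the selected threshold is the smallest admissible grid value, which lies just above the \(k_1\) obtained in Example 6.1.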

Instead of determining the critical region \(C\) directly as a subset of \(\mathbb{R}^n,\) it is simpler to determine it through a statistic of the sample and express the critical region as a subset of the range of a statistic. Then, a test statistic will determine the rejection region of the test.

Definition 6.4 (Test statistic) A test statistic of a hypothesis \(H_0\) vs. \(H_1\) is a measurable function of the sample that involves \(H_0\) and under \(H_0\) has a known or approximable distribution.

A test statistic can usually be obtained by taking an estimator of the unknown parameter that is involved in \(H_0\) and transforming it in such a way that it has a usable distribution under \(H_0.\)
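For instance, in Example 6.1 the estimator \(\bar{X}\) of \(\mu\) is transformed into \(Z=2(\bar{X}-15),\) which is \(\mathcal{N}(0,1)\) under \(H_0.\) The sketch below computes this statistic and expresses the critical region in terms of it; the function name `z_statistic` and the sample values are hypothetical and serve only as an illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_statistic(sample, mu0, sigma):
    """Standardized sample mean sqrt(n) * (Xbar - mu0) / sigma, which is N(0, 1) under H0: mu = mu0."""
    n = len(sample)
    return sqrt(n) * (mean(sample) - mu0) / sigma

alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper alpha-quantile of N(0, 1)

# Made-up measurements of four drug takers (illustration only).
sample = [16.2, 15.1, 17.0, 16.4]
z = z_statistic(sample, mu0=15, sigma=1)

# Critical region in terms of the statistic: {Z > z_alpha}, equivalent to {Xbar > k1}.
reject = z > z_alpha
print(z, z_alpha, reject)
```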

Summarizing what has been presented until now, the key elements of a hypothesis test are:

  • a null hypothesis \(H_0\) and an alternative \(H_1;\)
  • a significance level \(\alpha;\)
  • a test statistic;
  • and a critical region \(C.\)

The following practical takeaways are implied by the definitions of \(H_0\) and \(H_1\) and are relevant for deciding how to assign \(H_0\) and \(H_1\) when carrying out a hypothesis test.

When deciding how to assign a research question to \(H_0\) or \(H_1\) in a real application, remember:

  • \(H_1\) is used for the statement that the researcher is “interested” in proving through the sample.
  • \(H_0\) is the de facto statement that is assumed true unless there is enough evidence against it in the sample. It cannot be proved using a hypothesis test – at most it is not rejected (but not accepted either).

If we are interested in testing the veracity of “Statement”, it is stronger to

  1. reject \(H_0: \overline{\mathrm{Statement}}\) in favor of \(H_1: \mathrm{Statement}\) than to
  2. not reject \(H_0: \mathrm{Statement}\) vs. \(H_1: \overline{\mathrm{Statement}}.\)

In the first case we do find evidence in favor of “Statement”; in the second case we merely do not find evidence against “Statement”. Above, \(\overline{\mathrm{Statement}}\) is the negation of \(\mathrm{Statement}.\)

Example 6.3 A political poll is made in order to know the voting intentions of the electorate regarding two candidates, A and B, and, specifically, if candidate A will win the elections. For that purpose, the number of voters \(Y\) who will vote for candidate A within a sample of \(n=15\) voters was recorded. The associated hypothesis test for this problem is

\[\begin{align*} H_0:p=0.5\quad \text{vs.}\quad H_1:p>0.5, \end{align*}\]

where \(p\) denotes the proportion of voters in favor of \(A.\) If \(Y\) is the test statistic and the rejection region is set as \(C=\{y\geq 12\},\) compute the probability of the Type I error for the test, \(\alpha.\)

The probability of the Type I error is

\[\begin{align*} \alpha=\mathbb{P}(\text{Type I error})=\mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})=\mathbb{P}(Y\geq 12|p=0.5). \end{align*}\]

Since \(Y\sim \mathrm{Bin}(n,0.5),\) the previous probability is

\[\begin{align*} \alpha=\sum_{y=12}^{15} \binom{15}{y}(0.5)^{15}\approx0.0176. \end{align*}\]
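The sum above is easy to verify numerically. A minimal sketch with Python's standard library follows; `binom_upper_tail` is a hypothetical helper name.

```python
from math import comb

def binom_upper_tail(n, p, c):
    """P(Y >= c) for Y ~ Bin(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(c, n + 1))

# Type I error probability of the test with rejection region {y >= 12} under H0: p = 0.5.
alpha = binom_upper_tail(n=15, p=0.5, c=12)
print(round(alpha, 4))  # 0.0176
```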

Example 6.4 Assume that the real proportion of voters for the candidate A is \(p=0.6\) (so \(H_0\) is false). What is the probability that in the test of Example 6.3 we obtain that candidate A will not win (\(H_0\) is not rejected)?

The probability of Type II error is

\[\begin{align*} \beta=\mathbb{P}(\text{Type II error})=\mathbb{P}(\text{Do not reject $H_0$}|\text{$H_0$ false}). \end{align*}\]

In this case, we want to compute the value of \(\beta\) for \(p=0.6,\) that is,

\[\begin{align*} \beta&=\mathbb{P}(Y<12|p=0.6)=\sum_{y=0}^{11} \binom{15}{y}(0.6)^y(0.4)^{15-y}\approx0.9095. \end{align*}\]

Then, if we employ that rejection region, the test would most likely not reject \(H_0,\) that is, it would not conclude that candidate A will win, whether or not the candidate is actually going to win. The test is conservative due to the small sample size \(n\) and the small \(\alpha=0.0176.\)[3]
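The value of \(\beta\) in Example 6.4 depends on the true proportion \(p.\) The sketch below evaluates it for \(p=0.6\) and, as an additional illustration not taken from the text, for a few other values of \(p>0.5\); `type_ii_error` is a hypothetical helper name.

```python
from math import comb

def type_ii_error(n, p, c):
    """beta = P(Y < c) for Y ~ Bin(n, p): probability of not rejecting H0 when the true proportion is p."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(c))

beta = type_ii_error(n=15, p=0.6, c=12)
print(round(beta, 4))  # 0.9095

# beta decreases as the true p moves away from the null value 0.5.
for p in (0.6, 0.7, 0.8, 0.9):
    print(p, type_ii_error(15, p, 12))
```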

  1. For the so-called two-sided tests, to be introduced in Section 6.2, \(\Theta_0=\{\theta_0\},\) \(\Theta_1=\bar{\Theta}_0,\) and \(\Theta_2=\emptyset.\) However, for the so-called one-sided tests, \(\Theta_0=\{\theta_0\},\) \(\Theta_1=(\theta_0,+\infty),\) and \(\Theta_2=(-\infty, \theta_0)\) (or \(\Theta_1=(-\infty,\theta_0)\) and \(\Theta_2=(\theta_0,+\infty)\)).

  2. Therefore, under \(H_0\) the distribution of \(X\) is completely known.

  3. What will be \(\alpha\) and \(\beta\) for the rejection region \(C=\{Y\geq 10\}\)?