6.1 Introduction

The following example serves to illustrate the main philosophy in hypothesis testing.

Example 6.1 A pharmaceutical company suspects that a given drug in testing produces an increment of the ocular tension as a secondary effect. The baseline tension level has mean $15$ units and the suspected increment is towards a mean of $18$ units. This increment may increase the risk of suffering glaucoma. Since the drug has a positive primary effect, it is important to check if this suspicion is true before moving forward with the commercialization of the drug.

Medical trials have established that, at a population level, the ocular tension tends to follow a $\mathcal{N}(15,1)$ distribution. The suspicion about the secondary effect then means that the ocular tension among the takers of the drug would be $\mathcal{N}(18,1).$ We denote by $X\sim\mathcal{N}(\mu,1)$ the rv “ocular tension of the takers of the drug”. Then, the question the pharmaceutical company faces is to decide whether $\mu=15$ (the drug has no secondary effect) or $\mu=18$ (the drug has a secondary effect) based on empirical evidence. To formalize this, we define:

$\begin{align*} \begin{cases} H_0:\mu=15 & \text{(null hypothesis)} \\ H_1:\mu=18 & \text{(alternative hypothesis)} \end{cases} \end{align*}$

Then, we want to find evidence in favor of $H_0$ or of $H_1.$ But, since the drug has been proved effective, the pharmaceutical company is only willing to stop its commercialization if there is enough evidence against $H_0$ (or in favor of $H_1$ ), that is, if there is enough evidence pointing to the presence of a secondary effect. Therefore, the roles of $H_0$ and $H_1$ are not symmetric. $H_0$ is a “stickier” belief than $H_1$ for the pharmaceutical company.

In order to look for evidence against $H_0$ or in favor of $H_1,$ the ocular tension level of a sample of four drug takers is measured, and its sample mean $\bar{X}$ is computed. Then, the following holds:

If $H_0$ is true, then $\bar{X}\sim \mathcal{N}(15,1/4).$
If $H_1$ is true, then $\bar{X}\sim \mathcal{N}(18,1/4).$

Then, if we obtain a small value of $\bar{X},$ we will have little or no evidence in favor of $H_1$ and believe $H_0.$ If we obtain a large value of $\bar{X},$ then the sample supports $H_1.$ But, up to which value of $\bar{X}=k_1$ are we willing to believe in $H_0$ ?

This question can be answered in the following way: we can limit the possibility of incorrectly stopping the commercialization of the drug to a small value $\alpha,$ i.e.,

$\begin{align*} \mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})\leq \alpha, \end{align*}$

and choose the constant $k_1$ that satisfies this, that is, choose the smallest $k_1$ that verifies

$\begin{align*} \mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})=\mathbb{P}(\bar{X}>k_1|\mu=15)\leq \alpha. \end{align*}$

Standardizing $\bar{X},$ we obtain the standard normal distribution. Then $k_1$ directly follows from the quantile of such a distribution:

$\begin{align*} \mathbb{P}(\bar{X}> k_1|\mu=15)=\mathbb{P}(2(\bar{X}-15)> 2(k_1-15))=\mathbb{P}(Z> 2(k_1-15))\leq \alpha. \end{align*}$

From here, we have that $2(k_1-15)=z_{\alpha},$ so $k_1=15+z_{\alpha}/2.$ For example, if we take $\alpha=0.05,$ then $z_{\alpha}\approx1.645$ and $k_1\approx15+1.645/2=15.8225.$ Therefore, if we obtain $\bar{X}>15.8225$ and $H_0$ was true, the obtained sample would belong to this “extreme” set that only has probability $\alpha=0.05.$ This implies that one of the following two possibilities is happening if the event $\{\bar{X}>k_1\}$ happens:

either $H_0$ is true but the obtained sample was “extreme” and is not very representative of its distribution;
or $H_0$ is not true because an event with low probability if $H_0$ was true just happened.

Following the logic of believing $H_0$ or not based on how likely the realization of $\bar{X}$ is under the assumption that $H_0$ is true, a rule to make a decision on the veracity of $H_0$ is the following:

“Reject $H_0$ ” if $\bar{X}>k_1,$ as it is unlikely that the event $\{\bar{X}>k_1\}$ happened if $H_0$ is true;
“Do not reject $H_0$ ” if $\bar{X}\leq k_1,$ as it is likely that the event $\{\bar{X}\leq k_1\}$ happened if $H_0$ is true.

The level $\alpha$ determines the probability of the data-dependent extreme event that is inconsistent with the veracity of $H_0$ and that triggers its rejection. A choice of $\alpha=0.05$ by the pharmaceutical company implies that the drug commercialization will only be stopped if the outcome of the ocular tension level test falls in an extreme region that has only $5\%$ probability under the assumption that the drug has no secondary effect.
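The computation of $k_1$ in Example 6.1 can be reproduced numerically. The following is a minimal sketch in Python that assumes only the standard library's `statistics.NormalDist`; the variable names (`mu0`, `sigma`, `n`, `alpha`, `k1`) are illustrative choices and not part of the original example.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 15.0, 1.0, 4   # mean under H0, known sd, sample size (Example 6.1)
alpha = 0.05                   # bound for P(Reject H0 | H0 true)

# Upper alpha-quantile of N(0, 1): P(Z > z_alpha) = alpha
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Critical value: k1 = mu0 + z_alpha * sigma / sqrt(n) = 15 + z_alpha / 2
k1 = mu0 + z_alpha * sigma / sqrt(n)

# Check: under H0 the sample mean is N(15, 1/4), that is, its sd is 1/2
xbar_h0 = NormalDist(mu=mu0, sigma=sigma / sqrt(n))
type_i = 1 - xbar_h0.cdf(k1)

print(f"z_alpha = {z_alpha:.4f}, k1 = {k1:.4f}, P(Xbar > k1 | H0) = {type_i:.4f}")
```

Since the sketch uses the unrounded quantile instead of $1.645,$ the resulting $k_1$ differs from $15.8225$ only beyond the third decimal.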

Example 6.2 We could repeat the reasoning made in Example 6.1, now with respect to $H_1.$ For small values of $\bar{X},$ we would think that $H_1$ is not true. But, up to which value $\bar{X}=k_2$ are we willing to believe $H_1$ ?

If we fix again a bound for the probability of committing an error (in this case, allowing the commercialization of the drug while it has secondary effects), $\beta,$ that is,

$\begin{align*} \mathbb{P}(\text{Reject $H_1$}|\text{$H_1$ true})\leq \beta, \end{align*}$

then we will choose the largest constant $k_2$ that verifies that relation, that is, that verifies

$\begin{align*} \mathbb{P}(\text{Reject $H_1$}|\text{$H_1$ true})=\mathbb{P}(\bar{X}\leq k_2|\mu=18)\leq \beta. \end{align*}$

Standardizing $\bar{X}$ in the previous probability, we obtain

$\begin{align*} \mathbb{P}(\bar{X}\leq k_2|\mu=18)&=\mathbb{P}(2(\bar{X}-18)\leq 2(k_2-18)|\mu=18)\\ &=\mathbb{P}(Z\leq 2(k_2-18))\leq \beta, \end{align*}$

in such a way that $2(k_2-18)=-z_{\beta},$ so $k_2=18-z_{\beta}/2.$ Taking $\beta=0.05,$ we have $z_{\beta}\approx1.645,$ and $k_2\approx18-1.645/2=17.1775.$

Then, following this argument and combining it with the one made for $H_0$ in Example 6.1, the decision would be:

If $\bar{X}\leq k_1\approx 15.8225,$ then we accept $H_0.$
If $\bar{X}\geq k_2\approx 17.1775,$ then we accept $H_1.$

The following question arises immediately: what shall we do if $15.8225<\bar{X}<17.1775$ ? Also, imagine that instead of $15$ units, the baseline level of ocular tension was $16.5$ units. Then $k_1\approx16.5+1.645/2=17.3225>k_2$ and, for any $\bar{X}$ between $k_2$ and $k_1,$ we would be accepting $H_0$ and $H_1$ at the same time! These inconsistencies point towards choosing just a single value $k$ from which to make a decision. But in this case, only one of the probabilities for the two types of error, $\alpha$ and $\beta,$ can be controlled.
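The two inconsistencies just described (an undecided gap and a contradictory overlap) can be checked numerically. The sketch below is only illustrative: the helper `thresholds` and its argument names are hypothetical, and it assumes the normal model with $\sigma=1$ and $n=4$ used in Examples 6.1 and 6.2.

```python
from statistics import NormalDist

def thresholds(mu0, mu1, alpha, beta, sigma=1.0, n=4):
    """Return (k1, k2), where k1 gives P(Xbar > k1 | mu0) = alpha
    and k2 gives P(Xbar <= k2 | mu1) = beta."""
    se = sigma / n ** 0.5                      # sd of the sample mean
    z = NormalDist()                           # standard normal
    k1 = mu0 + z.inv_cdf(1 - alpha) * se
    k2 = mu1 - z.inv_cdf(1 - beta) * se
    return k1, k2

for mu0 in (15.0, 16.5):
    k1, k2 = thresholds(mu0, 18.0, alpha=0.05, beta=0.05)
    if k1 < k2:
        # Values of Xbar in (k1, k2) lead to no decision
        status = "gap: no decision for k1 < Xbar < k2"
    else:
        # Values of Xbar in [k2, k1] lead to accepting both hypotheses
        status = "overlap: H0 and H1 both accepted for k2 <= Xbar <= k1"
    print(f"mu0 = {mu0}: k1 = {k1:.4f}, k2 = {k2:.4f} -> {status}")
```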

If we decrease $\alpha$ too much, $\beta$ will increase. In addition, it may happen that $\bar{X}>k,$ so we would have evidence against $H_0,$ but that the sample is not representative of $H_1$ either. It may also happen that $\bar{X}\leq k,$ so we would not have evidence against $H_0,$ but that the sample is nevertheless more representative of $H_1$ than of $H_0.$ Therefore, if we want to control $\alpha,$ in the first place we have to fix the null hypothesis $H_0$ as the most conservative statement, that is, the statement that will be assumed as true unless there is enough evidence against it. As a consequence, the decision to take is going to be one of the following:

“Reject $H_0$ ” if $\bar{X}>k_1$ (without commitment to accept $H_1$ );
“Do not reject $H_0$ ” if $\bar{X}\leq k_1$ (without commitment to reject $H_1,$ which could be valid).
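The trade-off between $\alpha$ and $\beta$ for a single threshold can be made concrete with a short computation. This is a minimal sketch under the setting of Example 6.1 ($\mu=15$ under $H_0,$ $\mu=18$ under $H_1,$ and $\bar{X}$ with standard deviation $1/2$); the function name `beta_for_alpha` and the grid of $\alpha$ values are assumptions made for illustration.

```python
from statistics import NormalDist

mu0, mu1, se = 15.0, 18.0, 0.5   # means under H0 and H1; sd of Xbar for n = 4
z = NormalDist()                 # standard normal

def beta_for_alpha(alpha):
    """Type II error probability of the rule 'reject H0 if Xbar > k',
    where k is the critical value fixed by the significance level alpha."""
    k = mu0 + z.inv_cdf(1 - alpha) * se
    # beta = P(Xbar <= k | mu = mu1)
    return NormalDist(mu=mu1, sigma=se).cdf(k)

for alpha in (0.1, 0.05, 0.01, 0.001, 1e-6):
    print(f"alpha = {alpha:g} -> beta = {beta_for_alpha(alpha):.6f}")
```

In this particular example the two means are far apart relative to the standard deviation of $\bar{X},$ so $\beta$ stays small for usual values of $\alpha$ and only grows noticeably when $\alpha$ is taken very close to zero.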

In general, throughout this section we assume a rv $X$ with distribution within the family of distributions $\{F(\cdot;\theta):\theta\in\Theta\}$ for which we want to determine the validity of a statement $H_0$ about the parameter $\theta$ against an alternative statement $H_1.$ Splitting the parametric space as $\Theta=\Theta_0\cup\Theta_1\cup\Theta_2,$ where typically⁷⁴ $\Theta_1=\bar{\Theta}_0$ and $\Theta_2=\emptyset,$ the hypotheses to test are of the form

$\begin{align*} H_0:\theta\in\Theta_0 \quad \text{vs.}\quad H_1:\theta\in\Theta_1. \end{align*}$

Recall that a statement about the unknown parameter $\theta$ is equivalent to a statement about the distribution $F(\cdot;\theta).$

Definition 6.1 (Null and alternative hypotheses) The null hypothesis (denoted by $H_0$ ) is the statement that is assumed true unless there is enough evidence against it. The confronting statement is the alternative hypothesis (denoted by $H_1$ ).

Definition 6.2 (Simple and composite hypotheses) If the set $\Theta_0\subset \Theta$ that determines the hypothesis $H_0$ contains a single element,⁷⁵ then $H_0$ is said to be a simple hypothesis. Otherwise, $H_0$ is referred to as a composite hypothesis.

The decision in favor of or against $H_0$ is made from the information available in the realization of a srs $(X_1,\ldots,X_n)$ of $X.$

Definition 6.3 (Test) A test or a hypothesis test of $H_0$ vs. $H_1$ is a function $\varphi:\mathbb{R}^n\to\{0,1\},$ where $1$ stands for “reject $H_0$ in favor of $H_1$ ” and $0$ for “do not reject $H_0$ ”, of the form

$\begin{align*} \varphi(x_1,\ldots,x_n)=\begin{cases} 1 & \text{if}\ (x_1,\ldots,x_n)'\in C,\\ 0 & \text{if}\ (x_1,\ldots,x_n)'\in \bar{C}, \end{cases} \end{align*}$

where $C$ and $\bar{C}$ provide a partition of the sample space $\mathbb{R}^n.$ The set $C$ is called the critical region or rejection region and $\bar{C}$ is the acceptance region.
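To make Definition 6.3 concrete, the test of Example 6.1 can be written as a function that maps a sample to $\{0,1\}.$ The following Python sketch is an illustration only: the name `phi`, the constant `K1`, and the sample values are hypothetical, and the critical region is the one derived in Example 6.1 for $\alpha=0.05$ and $n=4.$

```python
from statistics import NormalDist, fmean

# Critical value from Example 6.1 for alpha = 0.05 and n = 4: 15 + z_alpha / 2
K1 = 15 + NormalDist().inv_cdf(0.95) / 2

def phi(sample):
    """Test of H0: mu = 15 vs. H1: mu = 18.
    Returns 1 ('reject H0') or 0 ('do not reject H0')."""
    if len(sample) != 4:
        raise ValueError("K1 was computed for samples of size n = 4")
    # Critical region C = {(x1, ..., x4): sample mean > K1}
    return 1 if fmean(sample) > K1 else 0

print(phi([15.2, 14.7, 15.9, 15.1]))  # hypothetical measurements, mean 15.225
print(phi([17.8, 18.4, 16.9, 18.1]))  # hypothetical measurements, mean 17.8
```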

A hypothesis test is entirely determined by the critical region $C.$ In principle, there exist infinitely many tests for testing a hypothesis at hand. The selection of a particular test is done according to the test reliability, that is, according to its “success rate”. The possible consequences — with respect to the reality about $H_0,$ which is unknown — of a test decision are given in the following table:

Test decision $\backslash$ Reality	$H_0$ true	$H_0$ false
Do not reject $H_0$	Correct decision	Type II error
Reject $H_0$	Type I error	Correct decision

Then, the reliability of a hypothesis test is quantified and assessed in terms of the two possible types of errors:

Error	Interpretation
Type I error	Reject $H_0$ if it is true
Type II error	Do not reject $H_0$ if it is false

As illustrated in Examples 6.1 and 6.2, the classical procedure for selecting a test among all the available ones is the following:

Fix a bound, $\alpha,$ for the probability of committing the Type I error. This bound is the significance level of the hypothesis test.
Exclude all the tests with critical regions $C$ that do not respect the bound for the Type I error, that is, that do not satisfy the condition $\begin{align*} \mathbb{P}(\text{Reject $H_0$}|H_0\ \text{true})=\mathbb{P}((X_1,\ldots,X_n)'\in C|\text{$H_0$ true})\leq \alpha. \end{align*}$
Among the selected tests, choose the one that has a critical region $C$ that minimizes the Type II error, that is, that minimizes $\begin{align*} \beta=\mathbb{P}(\text{Do not reject $H_0$}|\text{$H_1$ true})=\mathbb{P}((X_1,\ldots,X_n)'\in \bar{C}|H_1 \ \text{true}). \end{align*}$

Instead of determining the critical region $C$ directly as a subset of $\mathbb{R}^n,$ it is simpler to determine it through a statistic of the sample and express the critical region as a subset of the range of a statistic. Then, a test statistic will determine the rejection region of the test.

Definition 6.4 (Test statistic) A test statistic of a hypothesis $H_0$ vs. $H_1$ is a measurable function of the sample that involves $H_0$ and under $H_0$ has a known or approximable distribution.

A test statistic can usually be obtained by taking an estimator of the unknown parameter that is involved in $H_0$ and transforming it in such a way that it has a usable distribution under $H_0.$
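As an illustration of this construction, in Example 6.1 the estimator $\bar{X}$ of $\mu$ can be standardized into $Z=2(\bar{X}-15),$ which is $\mathcal{N}(0,1)$ under $H_0,$ and the critical region becomes $\{z>z_{\alpha}\}.$ The sketch below assumes a normal population with known $\sigma$; the function names `z_statistic` and `reject_h0` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_statistic(sample, mu0, sigma):
    """Standardized sample mean; N(0, 1) under H0: mu = mu0 when the
    population is normal with known standard deviation sigma."""
    n = len(sample)
    return sqrt(n) * (fmean(sample) - mu0) / sigma

def reject_h0(sample, mu0=15.0, sigma=1.0, alpha=0.05):
    """Critical region expressed through the statistic: {z > z_alpha}."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return z_statistic(sample, mu0, sigma) > z_alpha

sample = [16.1, 15.4, 16.8, 15.9]  # hypothetical measurements
print(z_statistic(sample, 15.0, 1.0), reject_h0(sample))
```

Expressing the rule through $Z$ instead of $\bar{X}$ does not change the test: $\{z>z_{\alpha}\}$ and $\{\bar{x}>k_1\}$ describe the same set of samples.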

Summarizing what has been presented until now, the key elements of a hypothesis test are:

a null hypothesis $H_0$ and an alternative $H_1;$
a significance level $\alpha;$
a test statistic;
and a critical region $C.$

The following practical takeaways are implied by the definitions of $H_0$ and $H_1$ and are relevant for deciding how to assign $H_0$ and $H_1$ when carrying out a hypothesis test.

When deciding how to assign a research question to $H_0$ or $H_1$ in a real application, remember:

$H_1$ is used for the statement that the researcher is “interested” in proving through the sample.
$H_0$ is the de facto statement that is assumed true unless there is enough evidence against it in the sample. It cannot be proved using a hypothesis test – at most it is not rejected (but not accepted either).

If we are interested in testing the veracity of “Statement”, it is stronger to

reject $H_0: \overline{\mathrm{Statement}}$ in favor of $H_1: \mathrm{Statement}$ than
do not reject $H_0: \mathrm{Statement}$ vs. $H_1: \overline{\mathrm{Statement}}.$

In the first case we do find evidence in favor of “Statement”; in the second case we do not find evidence against “Statement”. Above, $\overline{\mathrm{Statement}}$ is the negation of $\mathrm{Statement}.$

Example 6.3 A political poll is made in order to know the voting intentions of the electorate regarding two candidates, A and B, and, specifically, if candidate A will win the elections. For that purpose, the number of voters $Y$ who will vote for candidate A within a sample of $n=15$ voters was recorded. The associated hypothesis test for this problem is

$\begin{align*} H_0:p=0.5\quad \text{vs.}\quad H_1:p>0.5, \end{align*}$

where $p$ denotes the proportion of voters in favor of $A.$ If $Y$ is the test statistic and the rejection region is set as $C=\{y\geq 12\},$ compute the probability of the Type I error for the test, $\alpha.$

The probability of the Type I error is

$\begin{align*} \alpha=\mathbb{P}(\text{Type I error})=\mathbb{P}(\text{Reject $H_0$}|\text{$H_0$ true})=\mathbb{P}(Y\geq 12|p=0.5). \end{align*}$

Since $Y\sim \mathrm{Bin}(n,0.5),$ the previous probability is

$\begin{align*} \alpha=\sum_{y=12}^{15} \binom{15}{y}(0.5)^{15}\approx0.0176. \end{align*}$
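The sum above can be verified directly. This is a minimal Python sketch using only `math.comb` from the standard library; the variable names (`n`, `p0`, `cutoff`) are illustrative.

```python
from math import comb

n, p0, cutoff = 15, 0.5, 12   # sample size, p under H0, rejection region {y >= 12}

# alpha = P(Y >= 12 | p = 0.5) with Y ~ Bin(15, 0.5)
alpha = sum(comb(n, y) * p0**y * (1 - p0)**(n - y) for y in range(cutoff, n + 1))

print(f"alpha = {alpha:.4f}")
```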

Example 6.4 Assume that the real proportion of voters for candidate A is $p=0.6$ (so $H_0$ is false). What is the probability that the test of Example 6.3 fails to conclude that candidate A will win (that is, that $H_0$ is not rejected)?

The probability of Type II error is

$\begin{align*} \beta=\mathbb{P}(\text{Type II error})=\mathbb{P}(\text{Do not reject $H_0$}|\text{$H_0$ false}). \end{align*}$

In this case, we want to compute the value of $\beta$ for $p=0.6,$ that is,

$\begin{align*} \beta&=\mathbb{P}(Y<12|p=0.6)=\sum_{y=0}^{11} \binom{15}{y}(0.6)^y(0.4)^{15-y}\approx0.9095. \end{align*}$

Then, if we employ that rejection region, the test would most likely not conclude that candidate A will win, regardless of whether the candidate is indeed going to win or not. The test is conservative due to the small sample size $n$ and $\alpha=0.0176.$ ⁷⁶
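The value of $\beta$ in Example 6.4 can be checked numerically, and the same computation shows how $\beta$ depends on the true value of $p.$ In the sketch below, the helper names `binom_pmf` and `type_ii_error` are hypothetical, and the values of $p$ other than $0.6$ are added only for illustration.

```python
from math import comb

def binom_pmf(y, n, p):
    """P(Y = y) for Y ~ Bin(n, p)."""
    return comb(n, y) * p**y * (1 - p)**(n - y)

def type_ii_error(p, n=15, cutoff=12):
    """beta = P(Y < cutoff | p): probability of not rejecting H0 when the
    true proportion is p, for the rejection region {y >= cutoff}."""
    return sum(binom_pmf(y, n, p) for y in range(cutoff))

for p in (0.6, 0.7, 0.8, 0.9):
    print(f"p = {p}: beta = {type_ii_error(p):.4f}")
```

The further the true $p$ is from $0.5,$ the smaller $\beta$ becomes, since samples with $Y\geq 12$ are then more likely.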

⁷⁴ For the so-called two-sided tests, to be introduced in Section 6.2, $\Theta_0=\{\theta_0\},$ $\Theta_1=\bar{\Theta}_0,$ and $\Theta_2=\emptyset.$ However, for the so-called one-sided tests, $\Theta_0=\{\theta_0\},$ $\Theta_1=(\theta_0,+\infty),$ and $\Theta_2=(-\infty, \theta_0)$ (or $\Theta_1=(-\infty,\theta_0)$ and $\Theta_2=(\theta_0,+\infty)$ ).
⁷⁵ Therefore, under $H_0$ the distribution of $X$ is completely known.
⁷⁶ What will be $\alpha$ and $\beta$ for the rejection region $C=\{Y\geq 10\}$ ?