## 6.1 Introduction

The following example serves to illustrate the main philosophy in hypothesis testing.

Example 6.1 A pharmaceutical company suspects that a given drug under testing increases the ocular tension as a secondary effect. The baseline tension level has mean $$15$$ units and the suspected increment is towards a mean of $$18$$ units. This increment may increase the risk of suffering glaucoma. Since the drug has a positive primary effect, it is important to check whether this suspicion is true before moving forward with the commercialization of the drug.

Medical trials have established that, at a population level, the ocular tension tends to follow a $$\mathcal{N}(15,1)$$ distribution. The suspicion about the secondary effect then means that the ocular tension among the takers of the drug would be $$\mathcal{N}(18,1).$$ We denote by $$X\sim\mathcal{N}(\mu,1)$$ the rv “ocular tension of the takers of the drug”. Then, the question the pharmaceutical company faces is to decide whether $$\mu=15$$ (the drug has no secondary effect) or $$\mu=18$$ (the drug has a secondary effect) based on empirical evidence. To formalize this, we define:

\begin{align*} \begin{cases} H_0:\mu=15 & \text{(null hypothesis)} \\ H_1:\mu=18 & \text{(alternative hypothesis)} \end{cases} \end{align*}

Then, we want to find evidence in favor of $$H_0$$ or of $$H_1.$$ But, since the drug has been proved effective, the pharmaceutical company is only willing to stop its commercialization if there is enough evidence against $$H_0$$ (or in favor of $$H_1$$), that is, if there is enough evidence pointing to the presence of a secondary effect. Therefore, the roles of $$H_0$$ and $$H_1$$ are not symmetric. $$H_0$$ is a “stickier” belief than $$H_1$$ for the pharmaceutical company.

In order to look for evidence against $$H_0$$ or in favor of $$H_1,$$ a sample of the ocular tension levels of four drug takers is measured, and its sample mean $$\bar{X}$$ is computed. Then, the following holds:

• If $$H_0$$ is true, then $$\bar{X}\sim \mathcal{N}(15,1/4).$$
• If $$H_1$$ is true, then $$\bar{X}\sim \mathcal{N}(18,1/4).$$

Then, if we obtain a small value of $$\bar{X},$$ we will have little or no evidence in favor of $$H_1$$ and will keep believing $$H_0.$$ If we obtain a large value of $$\bar{X},$$ then the sample supports $$H_1.$$ But, up to which value $$k_1$$ of $$\bar{X}$$ are we willing to believe $$H_0$$?

This question can be answered in the following way: we can limit the possibility of incorrectly stopping the commercialization of the drug to a small value $$\alpha,$$ i.e.,

\begin{align*} \mathbb{P}(\text{Reject}\ H_0\,|\,H_0\ \text{true})\leq \alpha, \end{align*}

and choose the constant $$k_1$$ that satisfies this, that is, choose the smallest $$k_1$$ that verifies

\begin{align*} \mathbb{P}(\text{Reject}\ H_0\,|\,H_0\ \text{true})=\mathbb{P}(\bar{X}>k_1|\mu=15)\leq \alpha. \end{align*}

Standardizing $$\bar{X},$$ we obtain the standard normal distribution. Then $$k_1$$ directly follows from the quantile of such a distribution:

\begin{align*} \mathbb{P}(\bar{X}> k_1|\mu=15)=\mathbb{P}(2(\bar{X}-15)> 2(k_1-15))=\mathbb{P}(Z> 2(k_1-15))\leq \alpha. \end{align*}

From here, we have that $$2(k_1-15)=z_{\alpha},$$ so $$k_1=15+z_{\alpha}/2.$$ For example, if we take $$\alpha=0.05,$$ then $$z_{\alpha}\approx1.645$$ and $$k_1\approx15+1.645/2=15.8225.$$ Therefore, if we obtain $$\bar{X}>15.8225$$ and $$H_0$$ was true, the obtained sample would belong to this “extreme” set that only has probability $$\alpha=0.05.$$ This implies that one of the following two possibilities is happening if the event $$\{\bar{X}>k_1\}$$ happens:

1. either $$H_0$$ is true, but the obtained sample was “extreme” and is not very representative of its distribution;
2. or $$H_0$$ is not true, since an event that would have low probability under $$H_0$$ has just happened.
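
The computation of $$k_1$$ above can be checked numerically. The sketch below uses only Python’s standard library (`statistics.NormalDist`, available from Python 3.8) to recover the upper quantile and the critical value:

```python
from statistics import NormalDist

# Upper-alpha quantile of the standard normal: P(Z > z_alpha) = alpha
alpha = 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)  # approx 1.645

# Under H0, Xbar ~ N(15, 1/4), so sd(Xbar) = 1/2 and k1 = 15 + z_alpha / 2
k1 = 15 + z_alpha / 2
print(round(k1, 4))  # approx 15.8224 (the text rounds z_alpha to 1.645)
```

The small discrepancy with $$15.8225$$ comes from rounding $$z_{0.05}\approx1.645$$ in the text.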

Following the logic of believing $$H_0$$ or not based on how likely the realization of $$\bar{X}$$ is under the assumption that $$H_0$$ is true, a rule to make a decision on the veracity of $$H_0$$ is the following:

• “Reject $$H_0$$” if $$\bar{X}>k_1,$$ as it is unlikely that the event $$\{\bar{X}>k_1\}$$ happened if $$H_0$$ is true;
• “Do not reject $$H_0$$” if $$\bar{X}\leq k_1,$$ as it is likely that the event $$\{\bar{X}\leq k_1\}$$ happened if $$H_0$$ is true.

The level $$\alpha$$ determines the probability of the data-dependent extreme event that is inconsistent with the veracity of $$H_0$$ and that triggers its rejection. A choice of $$\alpha=0.05$$ by the pharmaceutical company implies that the drug commercialization will only be stopped if the outcome of the ocular tension level test is only $$5\%$$ likely under the assumption that the drug has no secondary effect.
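
This interpretation of $$\alpha$$ can also be illustrated by simulation: under $$H_0,$$ the rejection event $$\{\bar{X}>k_1\}$$ should occur in roughly $$5\%$$ of samples. A minimal sketch using only the standard library (the seed is arbitrary):

```python
import random
from statistics import NormalDist

random.seed(42)
k1 = 15 + NormalDist().inv_cdf(0.95) / 2  # critical value from Example 6.1

# Simulate many samples of n = 4 ocular tensions under H0: mu = 15
M = 100_000
rejections = 0
for _ in range(M):
    sample = [random.gauss(15, 1) for _ in range(4)]
    xbar = sum(sample) / 4
    if xbar > k1:
        rejections += 1

print(rejections / M)  # close to alpha = 0.05
```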

Example 6.2 We can repeat the reasoning of Example 6.1, now with respect to $$H_1.$$ For small values of $$\bar{X},$$ we would think that $$H_1$$ is not true. But, down to which value $$k_2$$ of $$\bar{X}$$ are we willing to believe $$H_1$$?

If we fix again a bound $$\beta$$ for the probability of committing an error (in this case, allowing the commercialization of the drug while it has a secondary effect), that is,

\begin{align*} \mathbb{P}(\text{Reject}\ H_1\,|\,H_1\ \text{true})\leq \beta, \end{align*}

then we will choose the largest constant $$k_2$$ that verifies that relation, that is, that verifies

\begin{align*} \mathbb{P}(\text{Reject}\ H_1\,|\,H_1\ \text{true})=\mathbb{P}(\bar{X}\leq k_2|\mu=18)\leq \beta. \end{align*}

Standardizing $$\bar{X}$$ in the previous probability, we obtain

\begin{align*} \mathbb{P}(\bar{X}\leq k_2|\mu=18)&=\mathbb{P}(2(\bar{X}-18)\leq 2(k_2-18)|\mu=18)\\ &=\mathbb{P}(Z\leq 2(k_2-18))\leq \beta, \end{align*}

in such a way that $$2(k_2-18)=-z_{\beta},$$ so $$k_2=18-z_{\beta}/2.$$ Taking $$\beta=0.05,$$ we have $$z_{\beta}\approx1.645,$$ and $$k_2\approx18-1.645/2=17.1775.$$
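
As before, $$k_2$$ can be verified numerically with the standard library (a sketch mirroring the computation of $$k_1$$):

```python
from statistics import NormalDist

beta = 0.05
z_beta = NormalDist().inv_cdf(1 - beta)  # approx 1.645

# Under H1, Xbar ~ N(18, 1/4); reject H1 if Xbar <= k2 = 18 - z_beta / 2
k2 = 18 - z_beta / 2
print(round(k2, 4))  # approx 17.1776 (the text rounds z_beta to 1.645)
```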

Then, following this argument and combining it with the argument for $$H_0$$ in Example 6.1, the decision would be:

• If $$\bar{X}\leq k_1\approx 15.8225,$$ then we accept $$H_0.$$
• If $$\bar{X}\geq k_2\approx 17.1775,$$ then we accept $$H_1.$$

The following question arises immediately: what shall we do if $$15.8225<\bar{X}<17.1775$$? Also, imagine that, instead of $$15$$ units, the baseline level of ocular tension were $$16.5$$ units. Then $$16.5+1.645/2=17.3225>k_2$$ and we would be accepting $$H_0$$ and $$H_1$$ at the same time! These inconsistencies point towards choosing just a single value $$k$$ from which to make a decision. But in this case, only one of the probabilities of the two types of error, $$\alpha$$ and $$\beta,$$ can be controlled.

If we decrease $$\alpha$$ too much, $$\beta$$ will increase. In addition, it may happen that $$\bar{X}>k,$$ so we would have evidence against $$H_0,$$ yet the sample is not representative of $$H_1$$ either. It may also happen that $$\bar{X}\leq k,$$ so we would not have evidence against $$H_0,$$ yet the sample is more representative of $$H_1$$ than of $$H_0.$$ Therefore, if we want to control $$\alpha,$$ in the first place we have to fix the null hypothesis $$H_0$$ as the most conservative statement, that is, the statement that will be assumed true unless there is enough evidence against it. As a consequence, the decision to be made is going to be one of the following:

• “Reject $$H_0$$” if $$\bar{X}>k_1$$ (without commitment to accept $$H_1$$);
• “Do not reject $$H_0$$” if $$\bar{X}\leq k_1$$ (without commitment to reject $$H_1,$$ which could be valid).

In general, throughout this section we assume a rv $$X$$ with distribution within the family of distributions $$\{F(\cdot;\theta):\theta\in\Theta\}$$ for which we want to determine the validity of a statement $$H_0$$ about the parameter $$\theta$$ against an alternative statement $$H_1.$$ Splitting the parametric space as $$\Theta=\Theta_0\cup\Theta_1\cup\Theta_2,$$ where typically74 $$\Theta_1=\bar{\Theta}_0$$ and $$\Theta_2=\emptyset,$$ the hypotheses to test are of the form

\begin{align*} H_0:\theta\in\Theta_0 \quad \text{vs.}\quad H_1:\theta\in\Theta_1. \end{align*}

Recall that a statement about the unknown parameter $$\theta$$ is equivalent to a statement about the distribution $$F(\cdot;\theta).$$

Definition 6.1 (Null and alternative hypotheses) The null hypothesis (denoted by $$H_0$$) is the statement that is assumed true unless there is enough evidence against it. The confronting statement is the alternative hypothesis (denoted by $$H_1$$).

Definition 6.2 (Simple and composite hypotheses) If the set $$\Theta_0\subset \Theta$$ that determines the hypothesis $$H_0$$ contains a single element,75 then $$H_0$$ is said to be a simple hypothesis. Otherwise, $$H_0$$ is referred to as a composite hypothesis.

The decision in favor or against $$H_0$$ is made from the information available in the realization of a srs $$(X_1,\ldots,X_n)$$ of $$X.$$

Definition 6.3 (Test) A test or a hypothesis test of $$H_0$$ vs. $$H_1$$ is a function $$\varphi:\mathbb{R}^n\to\{0,1\},$$ where $$1$$ stands for “reject $$H_0$$ in favor of $$H_1$$” and $$0$$ for “do not reject $$H_0$$”, of the form

\begin{align*} \varphi(x_1,\ldots,x_n)=\begin{cases} 1 & \text{if}\ (x_1,\ldots,x_n)'\in C,\\ 0 & \text{if}\ (x_1,\ldots,x_n)'\in \bar{C}, \end{cases} \end{align*}

where $$C$$ and $$\bar{C}$$ provide a partition of the sample space $$\mathbb{R}^n.$$ The set $$C$$ is denoted as the critical region or rejection region and $$\bar{C}$$ is the acceptance region.

A hypothesis test is entirely determined by the critical region $$C.$$ In principle, there exist infinitely many tests for testing a hypothesis at hand. The selection of a particular test is done according to the test reliability, that is, according to its “success rate”. The possible consequences — with respect to the reality about $$H_0,$$ which is unknown — of a test decision are given in the following table:

| Test decision $$\backslash$$ Reality | $$H_0$$ true | $$H_0$$ false |
|---|---|---|
| Do not reject $$H_0$$ | Correct decision | Type II error |
| Reject $$H_0$$ | Type I error | Correct decision |

Then, the reliability of a hypothesis test is quantified and assessed in terms of the two possible types of errors:

| Error | Interpretation |
|---|---|
| Type I error | Reject $$H_0$$ when it is true |
| Type II error | Do not reject $$H_0$$ when it is false |

As illustrated in Examples 6.1 and 6.2, the classical procedure for selecting a test among all the available ones is the following:

1. Fix a bound, $$\alpha,$$ for the probability of committing the Type I error. This bound is the significance level of the hypothesis test.

2. Exclude all the tests with critical regions $$C$$ that do not respect the bound for the Type I error, that is, that do not satisfy the condition \begin{align*} \mathbb{P}(\text{Reject}\ H_0\,|\,H_0\ \text{true})=\mathbb{P}((X_1,\ldots,X_n)'\in C\,|\,H_0\ \text{true})\leq \alpha. \end{align*}

3. Among the selected tests, choose the one that has a critical region $$C$$ that minimizes the Type II error, that is, that minimizes \begin{align*} \beta=\mathbb{P}(\text{Do not reject}\ H_0\,|\,H_1\ \text{true})=\mathbb{P}((X_1,\ldots,X_n)'\in \bar{C}\,|\,H_1\ \text{true}). \end{align*}
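
For the test of Example 6.1, both error probabilities can be computed explicitly, which illustrates step 3: with $$C=\{\bar{X}>k_1\}$$ and the alternative $$\mu=18,$$ the Type II error is $$\beta=\mathbb{P}(\bar{X}\leq k_1\,|\,\mu=18)=\Phi(2(k_1-18)).$$ A quick check with the standard library:

```python
from statistics import NormalDist

Z = NormalDist()
k1 = 15 + Z.inv_cdf(0.95) / 2  # critical value for alpha = 0.05 (Example 6.1)

# Type II error of "reject H0 if xbar > k1" when H1: mu = 18 holds:
# beta = P(Xbar <= k1 | mu = 18), with Xbar ~ N(18, 1/4), i.e., sd = 1/2
beta = Z.cdf(2 * (k1 - 18))
print(beta)  # tiny: both error probabilities are small in this example
```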

Instead of determining the critical region $$C$$ directly as a subset of $$\mathbb{R}^n,$$ it is simpler to determine it through a statistic of the sample and express the critical region as a subset of the range of a statistic. Then, a test statistic will determine the rejection region of the test.

Definition 6.4 (Test statistic) A test statistic of a hypothesis $$H_0$$ vs. $$H_1$$ is a measurable function of the sample that involves $$H_0$$ and under $$H_0$$ has a known or approximable distribution.

A test statistic can usually be obtained by taking an estimator of the unknown parameter involved in $$H_0$$ and transforming it in such a way that it has a usable distribution under $$H_0.$$

Summarizing what has been presented until now, the key elements of a hypothesis test are:

• a null hypothesis $$H_0$$ and an alternative $$H_1;$$
• a significance level $$\alpha;$$
• a test statistic;
• and a critical region $$C.$$

The following practical takeaways are implied by the definitions of $$H_0$$ and $$H_1$$ and are relevant for deciding how to assign $$H_0$$ and $$H_1$$ when carrying out a hypothesis test.

When deciding how to assign the research question to $$H_0$$ or $$H_1$$ in a real application, remember:

• $$H_1$$ is used for the statement that the researcher is “interested” in proving through the sample.
• $$H_0$$ is the de facto statement that is assumed true unless there is enough evidence against it in the sample. It cannot be proved using a hypothesis test – at most it is not rejected (but not accepted either).

If we are interested in testing the veracity of “Statement”, it is stronger to

1. reject $$H_0: \overline{\mathrm{Statement}}$$ in favor of $$H_1: \mathrm{Statement}$$ than
2. not reject $$H_0: \mathrm{Statement}$$ vs. $$H_1: \overline{\mathrm{Statement}}.$$
In the first case we do find evidence in favor of “Statement”; in the second case we do not find evidence against “Statement”. Above, $$\overline{\mathrm{Statement}}$$ is the negation of $$\mathrm{Statement}.$$

Example 6.3 A political poll is made in order to know the voting intentions of the electorate regarding two candidates, A and B, and, specifically, if candidate A will win the elections. For that purpose, the number of voters $$Y$$ who will vote for candidate A within a sample of $$n=15$$ voters was recorded. The associated hypothesis test for this problem is

\begin{align*} H_0:p=0.5\quad \text{vs.}\quad H_1:p>0.5, \end{align*}

where $$p$$ denotes the proportion of voters in favor of $$A.$$ If $$Y$$ is the test statistic and the rejection region is set as $$C=\{y\geq 12\},$$ compute the probability of the Type I error for the test, $$\alpha.$$

The probability of the Type I error is

\begin{align*} \alpha=\mathbb{P}(\text{Type I error})=\mathbb{P}(\text{Reject}\ H_0\,|\,H_0\ \text{true})=\mathbb{P}(Y\geq 12\,|\,p=0.5). \end{align*}

Since $$Y\sim \mathrm{Bin}(n,0.5),$$ the previous probability is

\begin{align*} \alpha=\sum_{y=12}^{15} \binom{15}{y}(0.5)^{15}\approx0.0176. \end{align*}
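
This sum can be checked directly with the standard library (`math.comb` requires Python 3.8):

```python
from math import comb

# alpha = P(Y >= 12 | p = 0.5), Y ~ Bin(15, 0.5); each outcome has prob 0.5^15
alpha = sum(comb(15, y) * 0.5**15 for y in range(12, 16))
print(round(alpha, 4))  # 0.0176
```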

Example 6.4 Assume that the real proportion of voters for the candidate A is $$p=0.6$$ (so $$H_0$$ is false). What is the probability that in the test of Example 6.3 we obtain that candidate A will not win ($$H_0$$ is not rejected)?

The probability of Type II error is

\begin{align*} \beta=\mathbb{P}(\text{Type II error})=\mathbb{P}(\text{Do not reject}\ H_0\,|\,H_0\ \text{false}). \end{align*}

In this case, we want to compute the value of $$\beta$$ for $$p=0.6,$$ that is,

\begin{align*} \beta&=\mathbb{P}(Y<12|p=0.6)=\sum_{y=0}^{11} \binom{15}{y}(0.6)^y(0.4)^{15-y}\approx0.9095. \end{align*}
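
Again, the sum can be verified numerically (a standard-library sketch):

```python
from math import comb

# beta = P(Y < 12 | p = 0.6) = P(Y <= 11), with Y ~ Bin(15, 0.6)
beta = sum(comb(15, y) * 0.6**y * 0.4**(15 - y) for y in range(12))
print(round(beta, 4))  # 0.9095
```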

Then, if we employ that rejection region, the test would most likely conclude that candidate A will not win, irrespective of whether the candidate is actually going to win or not. The test is conservative due to the small sample size $$n$$ and the small significance level $$\alpha=0.0176.$$76

1. For the so-called two-sided tests, to be introduced in Section 6.2, $$\Theta_0=\{\theta_0\},$$ $$\Theta_1=\bar{\Theta}_0,$$ and $$\Theta_2=\emptyset.$$ However, for the so-called one-sided tests, $$\Theta_0=\{\theta_0\},$$ $$\Theta_1=(\theta_0,+\infty),$$ and $$\Theta_2=(-\infty, \theta_0)$$ (or $$\Theta_1=(-\infty,\theta_0)$$ and $$\Theta_2=(\theta_0,+\infty)$$).↩︎

2. Therefore, under $$H_0$$ the distribution of $$X$$ is completely known.↩︎

3. What will be $$\alpha$$ and $$\beta$$ for the rejection region $$C=\{Y\geq 10\}$$?↩︎