9.1 Review of General Principles

Classical hypothesis testing is discussed in almost all introductory textbooks on statistics, and the reader is assumed to be familiar with the basic concepts.⁴⁶ In this section we provide a brief review of the general principles of classical hypothesis testing. In the following sections we apply these principles to answer questions about the GWN model parameters and assumptions.

9.1.1 Steps for hypothesis testing

The main steps for conducting a classical hypothesis test are as follows:

1. Specify the hypotheses to be tested: \[ H_{0}:\textrm{null hypothesis vs. }H_{1}:\textrm{alternative hypothesis.} \] The null hypothesis is the maintained hypothesis (what is assumed to be true) and is the hypothesis tested against the data. The alternative hypothesis specifies what is assumed to be true if the null hypothesis is false. The alternative can be specific or vague depending on the context.
2. Specify the significance level of the test: \[ \alpha=\textrm{significance level}=\Pr(\textrm{Reject }H_{0}|H_{0}\,\textrm{is true}). \] The significance level of a test specifies the probability of making a certain type of decision error: rejecting the null hypothesis when the null hypothesis is, in fact, true. In practice, the significance level \(\alpha\) is chosen to be a small number like \(0.01\) or \(0.05\).
3. Construct a test statistic, \(S\), from the observed data whose probability distribution is known when \(H_{0}\) is true.
4. Use the test statistic \(S\) to evaluate the evidence in the data regarding the validity of \(H_{0}\). Typically, if \(S\) is large then there is strong evidence in the data against \(H_{0}\) and \(H_{0}\) should be rejected; otherwise, the evidence against \(H_{0}\) is not strong and \(H_{0}\) should not be rejected.
- Decide to reject \(H_{0}\) at the specified significance level \(\alpha\) if the value of the test statistic \(S\) falls in the rejection region of the test. Usually, the rejection region for \(S\) is determined by a critical value \(cv_{\alpha}\) such that: \[\begin{eqnarray*} S & > & cv_{\alpha}\Rightarrow\mathrm{reject}\,H_{0},\\ S & \leq & cv_{\alpha}\Rightarrow\mathrm{do\,not\,reject}\,H_{0}. \end{eqnarray*}\] Smaller values of \(\alpha\) typically make \(cv_{\alpha}\) larger and require more data evidence (i.e., larger \(S\)) to reject \(H_{0}.\)
- Alternatively, decide to reject \(H_{0}\) at the specified significance level \(\alpha\) if the p-value of the test statistic \(S\) is less than \(\alpha\). The p-value of the statistic \(S\) is defined as the smallest significance level at which \(H_{0}\) is just rejected given the observed value of \(S\).
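The steps above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not part of the text: the data are normally distributed with a known standard deviation `sigma`, the hypotheses are \(H_{0}:\mu=\mu_{0}\) vs. \(H_{1}:\mu>\mu_{0}\), and the function name `one_sided_z_test` and the sample values are hypothetical. Under these assumptions the test statistic is standard normal when \(H_{0}\) is true, so both the critical value and the p-value come from the standard normal distribution.

```python
from statistics import NormalDist


def one_sided_z_test(sample, mu0, sigma, alpha=0.05):
    """Test H0: mu = mu0 against H1: mu > mu0, assuming sigma is known."""
    n = len(sample)
    xbar = sum(sample) / n

    # Test statistic: S ~ N(0, 1) when H0 is true (illustrative assumption).
    s = (xbar - mu0) / (sigma / n ** 0.5)

    std_normal = NormalDist()  # mean 0, standard deviation 1

    # Critical value cv_alpha chosen so that Pr(S > cv_alpha | H0 true) = alpha.
    cv = std_normal.inv_cdf(1 - alpha)

    # p-value: probability under H0 of a statistic at least as large as s.
    p_value = 1 - std_normal.cdf(s)

    return {"S": s, "cv": cv, "p_value": p_value, "reject": s > cv}


# Hypothetical data, used only to exercise the function.
sample = [0.8, 1.4, -0.2, 1.1, 0.9, 1.6, 0.3, 1.2, 0.7]
result = one_sided_z_test(sample, mu0=0.0, sigma=1.0, alpha=0.05)
print(result)
```

For this one-sided test the two decision rules agree: `S > cv` holds exactly when `p_value < alpha`, so either can be used to decide whether to reject \(H_{0}\).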

9.1.2 Hypothesis tests and decisions

Hypothesis testing involves making a decision: reject \(H_{0}\) or do not reject \(H_{0}.\) Notice that if the data evidence strongly favors \(H_{0}\) then we say “we do not reject \(H_{0}\)” instead of “accept \(H_{0}\)”. With a finite amount of data we can rarely know the truth with absolute certainty, so it is more appropriate to say “do not reject \(H_{0}\)” than to say “accept \(H_{0}\).”⁴⁷

Table 9.1: Decision table for hypothesis testing
Decision	Reality: \(H_{0}\) is true	Reality: \(H_{0}\) is false
Reject \(H_{0}\)	Type I error	No error
Do not reject \(H_{0}\)	No error	Type II error

Table 9.1 shows the \(2\times2\) decision table associated with hypothesis testing. In reality there are two states of the world: either \(H_{0}\) is true or it is not.⁴⁸ There are two decisions to make: reject \(H_{0}\) or do not reject \(H_{0}.\) If the decision corresponds with reality then the correct decision is made and there is no decision error. This occurs in the off-diagonal elements of the table. If the decision and reality disagree then there is a decision error. These errors are in the diagonal elements of the table. Type I error results when the decision to reject \(H_{0}\) occurs when, in fact, \(H_{0}\) is true. Type II error happens when the decision not to reject \(H_{0}\) occurs when \(H_{0}\) is false. To put these types of errors in context, consider a jury in a capital murder trial in the US.⁴⁹ In the US, a defendant is considered innocent until proven guilty. Here, the null hypothesis to be tested is: \[ H_{0}:\textrm{defendant is innocent}. \] The alternative is: \[ H_{1}:\textrm{defendant is guilty}. \]

There is a true state of the world here: the defendant is either innocent or guilty. The jury must decide what the state of the world is based on the evidence presented to them. The decision table for the jury is shown in Table 9.2. Type I error occurs when the jury convicts an innocent defendant and puts that person on “death row”. Clearly, this is a terrible mistake. To avoid this type of mistake the jury should have a very small (close to zero) significance level for evaluating evidence so that Type I error very rarely occurs. This is why typical jury instructions are to convict only if evidence of guilt is presented beyond a reasonable doubt. Type II error happens when the jury does not convict (acquits) a guilty defendant and sets that person free. This is also a terrible mistake, but perhaps it is not as terrible as convicting an innocent person. Notice that there is a conflict between Type I error and Type II error. In order to completely eliminate Type I error, you can never reject \(H_{0}\). That is, to avoid ever sending innocent people to “death row” you must never convict anyone. But if you never convict anyone then you never convict guilty people either, and this increases the occurrence of Type II errors.
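The conflict between the two error types can be made concrete with a small numerical sketch. The setup below is an illustrative assumption, not something specified in the text: the test statistic \(S\) is taken to be \(N(0,1)\) when \(H_{0}\) is true and \(N(\textrm{shift},1)\) when \(H_{0}\) is false, and \(H_{0}\) is rejected when \(S\) exceeds a critical value. The function name `error_probabilities` and the value `shift=2.0` are hypothetical.

```python
from statistics import NormalDist


def error_probabilities(cv, shift):
    """Error probabilities for the rule 'reject H0 if S > cv'.

    Assumes S ~ N(0, 1) under H0 and S ~ N(shift, 1) under H1
    (an illustrative setup, not taken from the text).
    """
    # Type I error: rejecting H0 (S > cv) when H0 is true.
    type_1 = 1 - NormalDist(0, 1).cdf(cv)
    # Type II error: not rejecting H0 (S <= cv) when H0 is false.
    type_2 = NormalDist(shift, 1).cdf(cv)
    return type_1, type_2


# Raising the critical value demands more evidence before rejecting H0.
for cv in [1.0, 1.645, 2.326, 3.0]:
    type_1, type_2 = error_probabilities(cv, shift=2.0)
    print(f"cv={cv:5.3f}  Pr(Type I)={type_1:.4f}  Pr(Type II)={type_2:.4f}")
```

As the critical value increases, the probability of Type I error falls toward zero while the probability of Type II error rises toward one, which mirrors the jury that never convicts anyone.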

9.1.3 Significance level and power

The significance level of a test is the probability of Type I error: \[ \alpha=\Pr(\textrm{Type I error})=\Pr(\textrm{Reject}\,H_{0}|H_{0}\,\textrm{is true}). \] The power of a test is one minus the probability of Type II error: \[ \pi=1-\Pr(\textrm{Type II error})=\Pr(\textrm{Reject }H_{0}|H_{0}\,\textrm{is false}). \] In classical hypothesis tests, the goal is to construct a test that has a small significance level (\(\alpha\) close to zero) that you can control and that has high power (\(\pi\) close to one). That is, a good test is one that has a small and controllable probability of rejecting the null when it is true and, at the same time, has a very high probability of rejecting the null when it is false. In general, an optimal test is one that has the highest possible power for a given significance level. In the jury example, an optimal test would be one in which the jury has the highest probability of convicting a guilty defendant while at the same time having a very low probability of convicting an innocent defendant. A difficult problem indeed!
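To illustrate how significance level and power are computed, the following sketch evaluates both for a one-sided z-test of \(H_{0}:\mu=\mu_{0}\) vs. \(H_{1}:\mu>\mu_{0}\). The normal data, the known `sigma`, the parameter values, and the function names `z_test_power` and `simulated_rejection_rate` are all assumptions made for illustration. The first function uses the closed-form rejection probability; the second estimates the same probability by Monte Carlo simulation as a rough check.

```python
import random
from statistics import NormalDist


def z_test_power(mu_true, mu0, sigma, n, alpha):
    """Pr(reject H0) for the one-sided z-test when the true mean is mu_true."""
    cv = NormalDist().inv_cdf(1 - alpha)
    # Under mu_true, the statistic S is N(shift, 1).
    shift = (mu_true - mu0) / (sigma / n ** 0.5)
    return 1 - NormalDist().cdf(cv - shift)


def simulated_rejection_rate(mu_true, mu0, sigma, n, alpha, reps=10000, seed=1):
    """Monte Carlo estimate of Pr(reject H0) under the same assumptions."""
    rng = random.Random(seed)
    cv = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu_true, sigma) for _ in range(n)) / n
        s = (xbar - mu0) / (sigma / n ** 0.5)
        if s > cv:
            rejections += 1
    return rejections / reps


# When H0 is true (mu_true == mu0) the rejection probability is alpha.
size = z_test_power(mu_true=0.0, mu0=0.0, sigma=1.0, n=25, alpha=0.05)
# When H0 is false the rejection probability is the power.
power = z_test_power(mu_true=0.5, mu0=0.0, sigma=1.0, n=25, alpha=0.05)
print(f"analytic:  size={size:.3f}  power={power:.3f}")
print("simulated power:", simulated_rejection_rate(0.5, 0.0, 1.0, 25, 0.05))
```

With these hypothetical values the test rejects a true null about 5% of the time and a false null roughly 80% of the time. Increasing `n` or the distance between `mu_true` and `mu0` raises the power while the significance level stays fixed at `alpha`.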

Table 9.2: Decision table for jury on capital murder trial
Decision	Reality: Defendant is innocent	Reality: Defendant is guilty
Convict	Type I error	No error
Acquit	No error	Type II error

⁴⁶ In this book, we do not consider evaluating hypotheses from a Bayesian perspective.
⁴⁷ As an extreme example, consider testing \(H_{0}:\textrm{"all swans are white"}\). To logically accept this hypothesis you would have to show that every swan that ever existed is white. In contrast, if you have a data set that only contains white swans then saying you cannot reject the null hypothesis that all swans are white is consistent with your data set.
⁴⁸ It is important to emphasize that there is a “true state of the world” in this context. That is, the null hypothesis \(H_{0}\) is either true or it is not. As a result, \(\Pr\left\{H_{0}\,\textrm{is true}\right\} =1\) or \(0\) depending on whether \(H_{0}\) is true or not.
⁴⁹ A capital murder trial is one in which the defendant is eligible for the death penalty.