## 9.1 Review of General Principles

Classical hypothesis testing is discussed in almost all introductory textbooks
on statistics, and the reader is assumed to be familiar with the basic
concepts.^{46} In this section we provide a brief review of
the general principles of classical hypothesis testing. In the following
sections we apply these principles to answer questions about the GWN
model parameters and assumptions.

### 9.1.1 Steps for hypothesis testing

The main steps for conducting a classical hypothesis test are as follows:

1.  Specify the hypotheses to be tested:
    \[
    H_{0}:\textrm{null hypothesis vs. }H_{1}:\textrm{alternative hypothesis.}
    \]
    The *null hypothesis* is the maintained hypothesis (what is assumed to be true) and is the hypothesis to be tested against the data. The *alternative hypothesis* specifies what is assumed to be true if the null hypothesis is false. This can be specific or vague depending on the context.

2.  Specify the significance level of the test:
    \[
    \alpha=\textrm{significance level}=\Pr(\textrm{Reject }H_{0}|H_{0}\,\textrm{is true}).
    \]
    The *significance level* of a test specifies the probability of making a certain type of decision error: rejecting the null hypothesis when the null hypothesis is, in fact, true. In practice, the significance level \(\alpha\) is chosen to be a small number like \(0.01\) or \(0.05\).

3.  Construct a *test statistic*, \(S\), from the observed data whose probability distribution is known when \(H_{0}\) is true.

4.  Use the test statistic \(S\) to evaluate the data evidence regarding the validity of \(H_{0}\). Typically, if \(S\) is a big number then there is strong data evidence against \(H_{0}\) and \(H_{0}\) should be rejected; otherwise, there is not strong data evidence against \(H_{0}\) and \(H_{0}\) should not be rejected.

5.  Decide to reject \(H_{0}\) at the specified significance level \(\alpha\) if the value of the test statistic \(S\) falls in the *rejection region* of the test. Usually, the rejection region for \(S\) is determined by a critical value \(cv_{\alpha}\) such that:
    \[\begin{eqnarray*}
    S & > & cv_{\alpha}\Rightarrow\mathrm{reject}\,H_{0},\\
    S & \leq & cv_{\alpha}\Rightarrow\mathrm{do\,not\,reject}\,H_{0}.
    \end{eqnarray*}\]
    Smaller values of \(\alpha\) typically make \(cv_{\alpha}\) larger and require more data evidence (i.e., larger \(S\)) to reject \(H_{0}\).

6.  Alternatively, decide to reject \(H_{0}\) at the specified significance level \(\alpha\) if the p-value of the test statistic \(S\) is less than \(\alpha\). The *p-value* of the statistic \(S\) is defined as the significance level at which the test is just rejected.
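As a concrete illustration of these steps, here is a minimal sketch in Python that tests \(H_{0}:\mu=0\) against \(H_{1}:\mu\neq0\) using a one-sample t-statistic. The return data are simulated and purely hypothetical; the distribution functions come from `scipy.stats`:

```python
# A minimal sketch of the testing steps above, applied to a one-sample
# test of H0: mu = 0 vs. H1: mu != 0. The data are simulated and purely
# illustrative; distribution functions come from scipy.stats.
import numpy as np
from scipy import stats

rng = np.random.default_rng(123)
r = rng.normal(loc=0.01, scale=0.05, size=60)  # hypothetical monthly returns

# Step 2: choose the significance level.
alpha = 0.05

# Step 3: test statistic S = sqrt(n) * rbar / s, distributed t(n-1) under H0.
n = len(r)
S = np.sqrt(n) * r.mean() / r.std(ddof=1)

# Step 5: rejection region from the two-sided critical value cv_alpha.
cv = stats.t.ppf(1 - alpha / 2, df=n - 1)
reject = abs(S) > cv

# Step 6 (equivalent): reject H0 if the p-value is less than alpha.
p_value = 2 * (1 - stats.t.cdf(abs(S), df=n - 1))
print(f"S = {S:.3f}, cv = {cv:.3f}, p-value = {p_value:.4f}, reject = {reject}")
```

The rejection-region rule and the p-value rule always agree here: \(|S| > cv_{\alpha}\) holds exactly when the p-value falls below \(\alpha\).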

### 9.1.2 Hypothesis tests and decisions

Hypothesis testing involves making a decision: reject \(H_{0}\) or
do not reject \(H_{0}\). Notice that if the data evidence strongly
favors \(H_{0}\), then we say “we do not reject \(H_{0}\)” instead
of “accept \(H_{0}\)”. We can rarely know the truth with absolute
certainty from a finite amount of data, so it is more appropriate to
say “do not reject \(H_{0}\)” than to say “accept \(H_{0}\).”^{47}

| Decision / Reality | \(H_{0}\) is true | \(H_{0}\) is false |
|---|---|---|
| Reject \(H_{0}\) | Type I error | No error |
| Do not reject \(H_{0}\) | No error | Type II error |

Table 9.2 shows the \(2\times2\) decision
table associated with hypothesis testing. In reality there are two
states of the world: either \(H_{0}\) is true or it is not.^{48} There are two decisions to make: reject \(H_{0}\) or don’t reject
\(H_{0}.\) If the decision corresponds with reality then the correct
decision is made and there is no decision error. This occurs in the
off-diagonal elements of the table. If the decision and reality disagree
then there is a decision error. These errors are in the diagonal elements
of the table. Type I error results when the decision to reject \(H_{0}\)
occurs when, in fact, \(H_{0}\) is true. Type II error happens when
the decision not to reject \(H_{0}\) occurs when \(H_{0}\) is false.
To put these types of errors in context, consider a jury in a capital
murder trial in the US.^{49} In the US, a defendant is considered innocent until proven guilty.
Here, the null hypothesis to be tested is:
\[
H_{0}:\textrm{defendant is innocent}.
\]
The alternative is:
\[
H_{1}:\textrm{defendant is guilty}.
\]

There is a true state of the world here: the defendant is either innocent or guilty. The jury must decide what the state of the world is based on the evidence presented to them. The decision table for the jury is shown in Table 9.2.

Type I error occurs when the jury convicts an innocent defendant and puts that person on “death row”. Clearly, this is a terrible mistake. To avoid this type of mistake the jury should have a very small (close to zero) significance level for evaluating evidence so that Type I error very rarely occurs. This is why typical jury instructions are to convict only if evidence of guilt is presented beyond reasonable doubt.

Type II error happens when the jury does not convict (acquits) a guilty defendant and sets that person free. This is also a terrible mistake, but perhaps it is not as terrible as convicting an innocent person. Notice that there is a conflict between Type I error and Type II error. In order to completely eliminate Type I error, you can never reject \(H_{0}\). That is, to avoid ever sending innocent people to “death row” you must never convict anyone. But if you never convict anyone then you never convict guilty people either, and this increases the occurrence of Type II errors.
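The trade-off between the two error types can be made concrete with a small Monte Carlo sketch. The sample size, number of simulations, and effect size below are hypothetical choices for illustration: as the significance level \(\alpha\) shrinks, rejections of a true null become rarer, but failures to reject a false null become more common.

```python
# Monte Carlo sketch of the Type I / Type II trade-off for a two-sided
# one-sample t-test of H0: mu = 0. All parameters are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, n_sim = 30, 5000

def reject_rate(true_mean, alpha):
    """Fraction of simulated samples in which H0: mu = 0 is rejected
    by a two-sided t-test at significance level alpha."""
    cv = stats.t.ppf(1 - alpha / 2, df=n - 1)
    x = rng.normal(loc=true_mean, scale=1.0, size=(n_sim, n))
    S = np.sqrt(n) * x.mean(axis=1) / x.std(axis=1, ddof=1)
    return np.mean(np.abs(S) > cv)

results = {}
for alpha in (0.10, 0.05, 0.01):
    type1 = reject_rate(true_mean=0.0, alpha=alpha)      # H0 is true
    type2 = 1 - reject_rate(true_mean=0.5, alpha=alpha)  # H0 is false
    results[alpha] = (type1, type2)
    print(f"alpha={alpha:.2f}: Type I rate={type1:.3f}, Type II rate={type2:.3f}")
```

The simulated Type I rate tracks \(\alpha\) closely, while the Type II rate grows as \(\alpha\) is pushed toward zero, mirroring the jury's dilemma: never convicting eliminates false convictions but frees every guilty defendant.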

### 9.1.3 Significance level and power

The *significance level* of a test is the probability of Type
I error:
\[
\alpha=\Pr(\textrm{Type I error})=\Pr(\textrm{Reject}\,H_{0}|H_{0}\,\textrm{is true}).
\]
The *power* of a test is one minus the probability of Type II
error:
\[
\pi=1-\Pr(\textrm{Type II error})=\Pr(\textrm{Reject }H_{0}|H_{0}\,\textrm{is false}).
\]
In classical hypothesis tests, the goal is to construct a test that
has a small significance level (\(\alpha\) close to zero) that you
can control and has high power (\(\pi\) close to one). That is, a good
test is one that has a small and controllable probability of rejecting
the null when it is true and, at the same time, has a very high probability
of rejecting the null when it is false. In general, an *optimal test*
is one that has the highest possible power for a given significance
level. In the jury example, an optimal test would be the one in which
the jury has the highest probability of convicting a guilty defendant
while at the same time has a very low probability of convicting an
innocent defendant. A difficult problem indeed!
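For the one-sample t-test, the power \(\pi\) can be computed exactly from the noncentral t distribution. The sketch below uses a hypothetical sample size and effect sizes; it shows that the rejection probability equals \(\alpha\) when the null is true and rises toward one as the true mean moves away from the null value.

```python
# Power function pi(mu) of the two-sided one-sample t-test of H0: mu = 0.
# The sample size and effect sizes are hypothetical; scipy.stats.nct is
# the noncentral t distribution.
import numpy as np
from scipy import stats

def power(mu, sigma, n, alpha=0.05):
    """Pr(reject H0 | true mean is mu) for the two-sided t-test."""
    df = n - 1
    cv = stats.t.ppf(1 - alpha / 2, df)  # critical value
    nc = np.sqrt(n) * mu / sigma         # noncentrality parameter
    # Under a true mean mu, the t statistic is noncentral t with
    # parameter nc, so power = Pr(|T| > cv).
    return stats.nct.sf(cv, df, nc) + stats.nct.cdf(-cv, df, nc)

# When H0 is true (mu = 0), the rejection probability is just alpha.
print(power(0.0, 1.0, n=30))
# Power increases toward one as the true mean moves away from zero.
print(power(0.5, 1.0, n=30))
print(power(1.0, 1.0, n=30))
```

In this framing, an optimal test is one whose power function rises as steeply as possible away from the null while staying pinned at \(\alpha\) under the null.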

| Decision / Reality | Defendant is innocent | Defendant is guilty |
|---|---|---|
| Convict | Type I error | No error |
| Acquit | No error | Type II error |

^{46} In this book, we do not consider evaluating hypotheses from a Bayesian perspective.↩︎

^{47} As an extreme example, consider testing \(H_{0}:\textrm{"all swans are white"}\). To logically accept this hypothesis you would have to show that every swan that ever existed is white. In contrast, if you have a data set that only contains white swans, then saying you cannot reject the null hypothesis that all swans are white is consistent with your data set.↩︎

^{48} It is important to emphasize that there is a “true state of the world” in this context. That is, the null hypothesis \(H_{0}\) is either true or it is not. As a result, \(\Pr\left\{ H_{0}\,\textrm{is true}\right\} =1\) or \(0\) depending on whether \(H_{0}\) is true or not.↩︎

^{49} A capital murder trial is one in which the defendant is eligible for the death penalty.↩︎