3.1 Explaining the one-sample \(t\)-test results
If we begin by assuming \(H_0\) is true, then we assume \(\mu = \mu_0 = 5\). To carry out the \(t\)-test, we use the \(t\)-distribution with \(n - 1 = 72 - 1 = 71\) degrees of freedom, \(t_{71}\), which is pictured below:
This distribution is called the distribution under \(H_0\). We can see that its mean is at \(t = 0\), which, after standardisation, corresponds to the value \(\mu_0 = 5\).
The test statistic can be thought of as a standardised version of the sample mean, and is calculated as
\[t = \displaystyle \frac{\bar{x} - \mu_0}{\text{SE}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \]
where:
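- \(t\) denotes the test statistic
- \(\bar{x}\) is the sample mean, \(s\) is the sample standard deviation, \(n\) is the sample size and \(\mu_0\) is the hypothesised mean
- \(\text{SE}\) refers to the standard error. The standard error is an estimate of the standard deviation of the sample mean, which, as we know from the previous topic, is \(\frac{s}{\sqrt{n}}\).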
In our example, our test statistic can be calculated as
\[t = \displaystyle \frac{5.13 - 5}{0.5/\sqrt{72}} = 2.2062.\]
Now comes the all-important question: if it were true that \(\mu = 5\), what are the chances that, when we took our sample of 72 patients, we would have seen a sample mean of \(\bar{x} = 5.13\) (which translates to a test statistic of \(t = 2.2062\)) or one more extreme? In other words, is our test statistic extreme in the context of the above \(t\)-distribution, which assumes \(H_0\) is true? Let's have a look:
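The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch that works from the summary statistics quoted in the text (\(\bar{x} = 5.13\), \(s = 0.5\), \(n = 72\), \(\mu_0 = 5\)); `t_statistic` is a hypothetical helper name, not a function from any library.

```python
import math


def t_statistic(x_bar: float, mu_0: float, s: float, n: int) -> float:
    """One-sample t statistic: (x_bar - mu_0) / (s / sqrt(n))."""
    se = s / math.sqrt(n)           # standard error of the sample mean
    return (x_bar - mu_0) / se


# Summary statistics from the worked example
t_obs = t_statistic(x_bar=5.13, mu_0=5.0, s=0.5, n=72)
df = 72 - 1                         # degrees of freedom, n - 1
print(round(t_obs, 4), df)          # 2.2062 71
```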
As it turns out, our test statistic is fairly extreme in the context of this distribution: if \(H_0\) is true, the probability of observing a test statistic at least as extreme as this one is only \(p = 0.0306\). That is:
- \(P(T \leq -2.2062) + P(T \geq 2.2062) = 0.0306\)
This probability is our \(p\)-value.
In the type of hypothesis test we have done here, we were only interested in whether \(\mu\) was different from 5, which is why we have included the probability of seeing a test statistic at least as extreme as ours in either direction, that is, greater than or equal to 2.2062 or less than or equal to -2.2062. This is called a two-sided test. This point will be explained further in the following sections.
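The two tail probabilities that make up the two-sided \(p\)-value can be computed as below. This is a sketch that uses only the Python standard library: `t_pdf` and `t_cdf` are hypothetical helper names, and the CDF is approximated by numerically integrating the \(t\) density with Simpson's rule rather than by calling a statistics package. In practice, `scipy.stats.t.sf(abs(t), df)` gives the upper-tail probability directly.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): 0.5 plus the integral of the density from 0 to x,
    approximated with composite Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        weight = 4 if i % 2 == 1 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 + total * h / 3


t_obs, df = 2.2062, 71                    # test statistic and degrees of freedom from the text
lower_tail = t_cdf(-abs(t_obs), df)       # P(T <= -2.2062)
upper_tail = 1 - t_cdf(abs(t_obs), df)    # P(T >= 2.2062)
p_value = lower_tail + upper_tail         # two-sided p-value
print(round(p_value, 3))                  # 0.031
```

Because the \(t\)-distribution is symmetric about 0, the two tails are equal, so the two-sided \(p\)-value is simply twice the upper-tail probability.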
Because our \(p\)-value is small, we have enough evidence to reject \(H_0\). Therefore, we have evidence to support the alternative hypothesis that \(\mu \neq 5\); that is, this result is statistically significant.
How small does our \(p\)-value need to be for us to decide that the test statistic is extreme enough to reject \(H_0\)? The answer lies in the level of significance, \(\alpha\) (Greek letter, 'alpha'). The conventional level of significance is \(\alpha = 0.05\), although other levels of \(\alpha\) can be chosen. That is,
- if \(p < \alpha\), reject \(H_0\)
- if \(p \geq \alpha\), do not reject \(H_0\)
Note the wording above for when \(p \geq \alpha\): do not reject \(H_0\). Not having enough evidence to reject \(H_0\) does not mean we have proven that \(H_0\) is true, so it is best to avoid phrases such as 'accept \(H_0\)'.
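The decision rule can be written as a small function. This is a hypothetical helper (`decide` is not from any library) that assumes the convention of rejecting only when \(p < \alpha\), with \(\alpha = 0.05\) as the default; the second call shows that the same \(p\)-value can lead to a different decision at a stricter level.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the significance-level decision rule to a p-value."""
    if p_value < alpha:
        return "reject H0"
    # Not rejecting H0 is not the same as accepting it.
    return "do not reject H0"


print(decide(0.0306))               # reject H0
print(decide(0.0306, alpha=0.01))   # do not reject H0
```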
The method we have used above to carry out the hypothesis test is called the \(p\)-value approach.
There is another method we could use, called the critical region approach. To understand it, consider the question: if \(\alpha = 0.05\), how extreme would our test statistic need to be for us to reject \(H_0\)? To answer this, we find the critical value \(t^*\) such that \(P(T \leq -t^*) + P(T \geq t^*) = 0.05\), as represented below:
As we can see, \(P(T \leq -1.99) + P(T \geq 1.99) = 0.05\). This means that if our test statistic were greater than 1.99 or less than -1.99, it would fall in the critical region and we would reject \(H_0\), because any such value of \(t\) would give \(p < 0.05\). Our test statistic, \(t = 2.2062\), is greater than 1.99, so it falls in the critical region and we reach the same conclusion as with the \(p\)-value approach.
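The critical value can also be found numerically. The sketch below again uses only the standard library: it approximates the \(t_{71}\) CDF with Simpson's rule and searches by bisection for the value \(c\) at which \(P(T \leq c) = 1 - \alpha/2\). The names `t_pdf`, `t_cdf` and `t_critical_two_sided` are hypothetical, and the search bracket \([0, 10]\) is an assumption that holds here but would need widening for very small \(\alpha\) or very few degrees of freedom. With SciPy, `scipy.stats.t.ppf(1 - alpha / 2, df)` returns the same quantile.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) via composite Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 == 1 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical_two_sided(alpha: float, df: int, tol: float = 1e-6) -> float:
    """Return c such that P(T <= -c) + P(T >= c) = alpha, found by bisection."""
    target = 1 - alpha / 2          # c is the (1 - alpha/2) quantile of T
    lo, hi = 0.0, 10.0              # assumed bracket for the critical value
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid                # CDF still too small: quantile lies further right
        else:
            hi = mid
    return (lo + hi) / 2


c = t_critical_two_sided(alpha=0.05, df=71)
print(round(c, 2))                  # 1.99
t_obs = 2.2062
print(abs(t_obs) > c)               # True: t_obs falls in the critical region
```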