3.1 Explaining the one-sample $$t$$-test results

We begin by assuming $$H_0$$ is true, that is, $$\mu = \mu_0 = 5$$. To carry out the $$t$$-test, we use the $$t$$-distribution with $$n - 1 = 71$$ degrees of freedom, written $$t_{71}$$, which is pictured below:

The above distribution is called the distribution under $$H_0$$. We can see that the mean of the above distribution is at $$t = 0$$, which, due to standardisation, represents the value $$\mu_0 = 5$$.

The test statistic can be thought of as a standardised version of the sample mean. It is calculated as

$t = \displaystyle \frac{\bar{x} - \mu_0}{\text{SE}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$

where:

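• $$t$$ denotes the test statistic
• $$\bar{x}$$ is the sample mean, $$\mu_0$$ is the value of $$\mu$$ under $$H_0$$, $$s$$ is the sample standard deviation and $$n$$ is the sample size
• $$\text{SE}$$ refers to the standard error, an estimate of the standard deviation of the sample mean, which, as we know from the previous topic, is calculated as $$\frac{s}{\sqrt{n}}$$.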
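The formula above can be sketched in Python using only the standard library. The function name `one_sample_t_statistic` and the measurements are hypothetical, chosen purely for illustration; they are not the data behind this section's example.

```python
import math
import statistics


def one_sample_t_statistic(sample, mu0):
    """Return (t, standard_error) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)      # sample mean
    s = statistics.stdev(sample)         # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)    # SE = s / sqrt(n)
    t = (x_bar - mu0) / standard_error   # standardised distance of x_bar from mu0
    return t, standard_error


# Hypothetical measurements, for illustration only (not the data from this section).
hypothetical_sample = [5.1, 4.9, 5.6, 5.3, 4.8, 5.4, 5.2, 5.0]
t_stat, se = one_sample_t_statistic(hypothetical_sample, mu0=5)
print(f"SE = {se:.4f}, t = {t_stat:.4f}")
```

Working from raw data makes it clear that $$s$$, and therefore the standard error, is itself estimated from the sample.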

In our example, our test statistic can be calculated as

$t = \displaystyle \frac{5.13 - 5}{0.5/\sqrt{72}} = 2.2062.$

Now comes the all-important question: If it were true that $$\mu = 5$$, what are the chances that, when we took our sample of 72 patients, we would have seen this sample mean of $$\bar{x} = 5.13$$ (which translates to a test statistic of $$t = 2.2062$$), or more extreme? Is our test statistic extreme in the context of the above $$t$$-distribution which assumes $$H_0$$ is true? Let's have a look:

As it turns out, our test statistic is fairly extreme in the context of this distribution, because the probability of observing a test statistic at least this extreme, if $$H_0$$ is true, is only $$p = 0.0306$$. That is:

• $$P(T \leq -2.2062 \text{ or } T \geq 2.2062) = 0.0306$$

This probability is our $$p$$-value.
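As a rough check on this $$p$$-value, the two tail areas of the $$t_{71}$$ distribution can be approximated numerically. The sketch below is one possible approach, not the method used to produce the figures in this section: it integrates the $$t$$ density with Simpson's rule using only the standard library, and the helper names `t_pdf` and `two_sided_p_value` are hypothetical. In practice a statistics library such as SciPy would normally supply this tail probability directly.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_value(t_stat, df, steps=2000):
    """Approximate P(T <= -|t| or T >= |t|) with Simpson's rule on [0, |t|]."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps                     # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2        # Simpson weights: 4 for odd, 2 for even nodes
        total += weight * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total    # P(-|t| < T < |t|), using symmetry about 0
    return 1 - central_area               # what remains is the two tails


x_bar, mu0, s, n = 5.13, 5, 0.5, 72       # values from the example above
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_value = two_sided_p_value(t_stat, df=n - 1)
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
```

The result should agree closely with the $$p$$-value quoted above, up to rounding and the accuracy of the numerical integration.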

In the type of hypothesis test we have done here, we were only interested in whether $$\mu$$ was different from 5, which is why we have included the probability of seeing a test statistic at least as extreme as the one we observed in either direction, that is, greater than 2.2062 or less than -2.2062. This is called a two-sided test. This point will be further explained in the following sections.

Because our $$p$$-value is small, we have enough evidence to reject $$H_0$$. We therefore have evidence to support the alternative hypothesis that $$\mu \neq 5$$; in other words, this result is statistically significant.

How small does our $$p$$-value need to be for us to decide that the test statistic is extreme enough to reject $$H_0$$? The answer lies in the level of significance, $$\alpha$$ (the Greek letter 'alpha'). The most commonly used level of significance is $$\alpha = 0.05$$, although other values of $$\alpha$$ can be chosen. The decision rule is:

• if $$p < \alpha$$, reject $$H_0$$
• if $$p > \alpha$$, do not reject $$H_0$$

Note the wording above for when $$p > \alpha$$: do not reject $$H_0$$. Not having enough evidence to reject $$H_0$$ does not mean we have proven that $$H_0$$ is true, so it is best to avoid terms like 'accept $$H_0$$'.
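The decision rule can be written as a small function. This is a minimal sketch with a hypothetical name, `decide`; the rule as stated does not cover the case $$p = \alpha$$ exactly, so treating it as "do not reject" here is an assumption.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p < alpha."""
    if p_value < alpha:
        return "reject H0"
    # Assumption: p == alpha is treated as "do not reject".
    # Not rejecting H0 is not the same as accepting it.
    return "do not reject H0"


print(decide(0.0306))         # p-value from the example, alpha = 0.05
print(decide(0.0306, 0.01))   # same p-value, stricter significance level
```

With the $$p$$-value from our example, the first call rejects $$H_0$$ at $$\alpha = 0.05$$, while the second does not reject it at $$\alpha = 0.01$$, which shows how the conclusion depends on the chosen level of significance.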

The method we have used above to carry out the hypothesis test is called the $$p$$-value approach.

There is another method we could use, called the critical region approach. To understand this, let's consider the question: if $$\alpha = 0.05$$, how extreme would our test statistic need to be in order to reject $$H_0$$? To answer this question, we can find the value $$t$$ such that $$P(T \leq -t \text{ or } T \geq t) = 0.05$$, as represented below:

As we can see, $$P(T \leq -1.99 \text{ or } T \geq 1.99) = 0.05$$. This means that if our test statistic is greater than 1.99 or less than -1.99, we say it falls in the critical region and we reject $$H_0$$, because any value of $$t$$ in this region results in $$p < 0.05$$.
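The critical value can also be approximated numerically. The sketch below is only one way to do this, under the assumption that no statistics library is available: it searches by bisection for the point at which the two tails of the $$t_{71}$$ distribution together have area $$\alpha$$. The helper names `t_pdf`, `central_area` and `critical_value` are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def central_area(c, df, steps=2000):
    """Approximate P(-c < T < c) with Simpson's rule (steps must be even)."""
    h = c / steps
    total = t_pdf(0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * (h / 3) * total            # doubled because of symmetry about 0


def critical_value(alpha, df, tol=1e-8):
    """Find c such that P(T <= -c or T >= c) = alpha, by bisection."""
    lo, hi = 0.0, 50.0                    # assumed bracket; wide enough for common alpha
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - central_area(mid, df) > alpha:
            lo = mid                      # tails still too large: move outwards
        else:
            hi = mid                      # tails small enough: move inwards
    return (lo + hi) / 2


crit = critical_value(alpha=0.05, df=71)
t_stat = 2.2062                           # test statistic from the example
in_critical_region = abs(t_stat) > crit
print(f"critical value = {crit:.2f}, in critical region: {in_critical_region}")
```

Because the test statistic from our example lies beyond the critical value, the critical region approach leads to the same decision as the $$p$$-value approach: reject $$H_0$$.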