3.1 Explaining the one-sample \(t\)-test results


If we begin by assuming \(H_0\) is true, then we take \(\mu = \mu_0 = 5\). To carry out the \(t\)-test, we use the \(t_{71}\) distribution, which is pictured below:

The above distribution is called the distribution under \(H_0\). That is, it is the distribution of the test statistic (a standardised version of the sample mean, defined below) assuming \(H_0\) is true. We can see that the mean of the above distribution is at \(t = 0\), which, due to the standardisation, corresponds to the value \(\mu_0 = 5\).

The test statistic can be thought of as a standardised version of the sample mean. In general terms, the test statistic is defined as follows:

\[T = \displaystyle \frac{\overline{X} - \mu_0}{\text{SE}} = \frac{\overline{X} - \mu_0}{S/\sqrt{n}}, \]

where:

  • \(T\) denotes the test statistic
  • \(\overline{X}\) denotes the sample mean
  • \(\text{SE}\) refers to the standard error. The standard error is an estimator of the standard deviation of the sample mean, and is equal to \(\frac{S}{\sqrt{n}}\)
  • \(S\) denotes the sample standard deviation
  • \(n\) denotes the sample size.

Note that the test statistic as defined above is random; that is, it is a random variable. Under \(H_0\), we have \(T \sim t_{n - 1}\); in general terms, \(T\) follows a \(t\)-distribution with \(n - 1\) degrees of freedom.
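
To see this behaviour in practice, here is a minimal simulation sketch in Python (numpy and scipy are assumed tools, not part of the original text; the sample size, hypothesised mean and population standard deviation are illustrative values). It draws many samples under \(H_0\), computes the test statistic for each, and checks that the statistics behave like draws from a \(t_{n-1}\) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu_0, sigma = 72, 5, 0.5   # assumed values for illustration only

# Draw many samples under H0 (true mean equal to mu_0) and compute the
# test statistic T = (X_bar - mu_0) / (S / sqrt(n)) for each sample.
t_stats = np.empty(10_000)
for i in range(t_stats.size):
    x = rng.normal(mu_0, sigma, size=n)
    t_stats[i] = (x.mean() - mu_0) / (x.std(ddof=1) / np.sqrt(n))

# If T ~ t_{n-1}, about 5% of simulated statistics should fall beyond the
# 97.5% quantile of the t distribution with n - 1 degrees of freedom.
crit = stats.t.ppf(0.975, df=n - 1)
print(np.mean(np.abs(t_stats) > crit))   # should be close to 0.05
```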

Once we have data, we can calculate the observed test statistic and then see where it lies in the context of the \(t\)-distribution. The observed test statistic can be calculated as

\[t = \displaystyle \frac{\bar{x} - \mu_0}{\text{se}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \]

where:

  • \(t\) denotes the observed test statistic
  • \(\bar{x}\) denotes the observed sample mean
  • \(\text{se}\) refers to the observed standard error. The observed standard error is an estimate of the standard deviation of the sample mean, and is equal to \(\frac{s}{\sqrt{n}}\)
  • \(s\) denotes the observed sample standard deviation.

Note the difference in notation between the random and observed test statistic definitions where, for example, \(T\) is the random test statistic, and \(t\) is the observed test statistic.
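
As a quick illustration with raw data (a small, made-up data set, not the study discussed in this section), the following Python sketch computes the observed test statistic using the formula above and confirms it against scipy.stats.ttest_1samp:

```python
import numpy as np
from scipy import stats

# A small, made-up data set for illustration; mu_0 is the hypothesised mean.
x = np.array([5.3, 4.9, 5.6, 5.1, 4.8, 5.4, 5.2, 5.0])
mu_0 = 5

x_bar = x.mean()               # observed sample mean
s = x.std(ddof=1)              # observed sample standard deviation
n = x.size                     # sample size
se = s / np.sqrt(n)            # observed standard error

t_obs = (x_bar - mu_0) / se    # observed test statistic
print(t_obs)

# The same statistic from scipy's one-sample t-test.
res = stats.ttest_1samp(x, popmean=mu_0)
print(res.statistic)           # matches t_obs
```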

In our example, the observed test statistic can be calculated as

\[t = \displaystyle \frac{5.13 - 5}{0.5/\sqrt{72}} = 2.2062.\]

Now comes the all-important question: if it were true that \(\mu = 5\), what are the chances that, when we took our sample of 72 patients, we would have seen a sample mean of \(\bar{x} = 5.13\) (which translates to a test statistic of \(t = 2.2062\)), or one even more extreme? Is our test statistic extreme in the context of the above \(t\)-distribution, which assumes \(H_0\) is true? Let's have a look:

As it turns out, our test statistic is fairly extreme in the context of this distribution, because the probability of observing a test statistic at least this extreme if \(H_0\) is true is only \(p = 0.0306\). That is:

  • \(P(T \leq -2.2062) + P(T \geq 2.2062) = 0.0306\)

This probability is our \(p\)-value.
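
To check these figures, below is a minimal Python sketch (numpy and scipy are an assumed tool choice, not part of the original text) that reproduces the observed test statistic and the two-sided \(p\)-value from the summary values \(\bar{x} = 5.13\), \(s = 0.5\) and \(n = 72\):

```python
import numpy as np
from scipy import stats

x_bar, s, n, mu_0 = 5.13, 0.5, 72, 5

t_obs = (x_bar - mu_0) / (s / np.sqrt(n))
print(round(t_obs, 4))                      # 2.2062

# Two-sided p-value: P(T <= -t_obs) + P(T >= t_obs) for T ~ t_{71}.
p = 2 * stats.t.sf(abs(t_obs), df=n - 1)
print(round(p, 4))                          # approximately 0.0306
```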

In the type of hypothesis test we have done here, we were only interested in whether \(\mu\) was different from 5 (not in the direction of any difference), which is why we have included the probability of seeing a test statistic at least as extreme as the one we observed in either direction: that is, greater than 2.2062 or less than -2.2062. This is called a two-sided test. This point will be further explained in the following sections.

Because our \(p\)-value was small, we have enough evidence to reject \(H_0\). We therefore have evidence to support the alternative hypothesis that \(\mu \neq 5\), i.e. this result is statistically significant.

How small does our \(p\)-value need to be before we decide that the test statistic is extreme enough to reject \(H_0\)? The answer lies in the level of significance, \(\alpha\) (the Greek letter 'alpha'). The standard level of significance is \(\alpha = 0.05\), although other levels of \(\alpha\) can be chosen. That is,

  • if \(p < \alpha\), reject \(H_0\)
  • if \(p > \alpha\), do not reject \(H_0\)

Note the wording above for when \(p > \alpha\): do not reject \(H_0\). Just because we do not have enough evidence to reject \(H_0\) does not mean we have proven that \(H_0\) is true, so it is best to avoid terms like 'accept \(H_0\)'.

The method we have used above to carry out the hypothesis test is called the \(p\)-value approach.

There is another method we could use called the critical region approach. To understand this, let's consider the question: if \(\alpha = 0.05\), how extreme would our test statistic need to be in order to reject \(H_0\)? To answer this question, we can find the value \(t^*\) such that \(P(T \leq -t^*) + P(T \geq t^*) = 0.05\), as represented below:

As we can see, \(P(T \leq -1.99) + P(T \geq 1.99) = 0.05\). This means that if our test statistic were either greater than 1.99 or less than -1.99, it would fall in the critical region and we would reject \(H_0\), because any such value of \(t\) would result in \(p < 0.05\).
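
If it helps, the critical value can be obtained from the quantile function of the \(t_{71}\) distribution; here is a minimal sketch in Python with scipy (an assumed tool choice, not part of the original text):

```python
from scipy import stats

alpha, n = 0.05, 72

# Two-sided critical value: the quantile t* with P(T >= t*) = alpha / 2
# for T ~ t_{n-1}.
t_star = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(round(t_star, 2))   # 1.99

# Check: the two tail probabilities beyond -t* and t* add up to alpha.
print(stats.t.cdf(-t_star, df=n - 1) + stats.t.sf(t_star, df=n - 1))  # 0.05
```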