3.1 Explaining the one-sample t-test results

If we begin by assuming H0 is true, then we are assuming that μ=μ0=5. To carry out the t-test, we use the t distribution with n-1=71 degrees of freedom (written t71), which is pictured below:

The above distribution is called the distribution under H0. We can see that the mean of the above distribution is at t=0, which, due to standardisation, represents the value μ0=5.

The test statistic can be thought of as a standardised version of the sample mean. It is calculated as

$$t = \frac{\bar{x} - \mu_0}{SE} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}},$$

where:

  • t denotes the test statistic
  • SE refers to the standard error. The standard error is an estimate of the standard deviation of the sample mean, which, as we know from the previous topic, is $s/\sqrt{n}$, where s is the sample standard deviation and n is the sample size.
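The formula above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed method: the function name `t_statistic` and its argument names are our own, and the numbers passed in are the summary values from this section's example (sample mean 5.13, s = 0.5, n = 72, μ0 = 5).

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """Standardise the sample mean: (x-bar - mu0) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)  # estimated SD of the sample mean
    return (sample_mean - mu0) / standard_error

# Summary values from this section's example (hypothetical variable names).
t_stat = t_statistic(sample_mean=5.13, mu0=5, s=0.5, n=72)
print(round(t_stat, 4))  # prints 2.2062
```

Working from summary statistics keeps the sketch short. With raw data, the sample mean and standard deviation would be computed first and then passed in the same way.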

In our example, our test statistic can be calculated as

$$t = \frac{5.13 - 5}{0.5/\sqrt{72}} = 2.2062.$$

Now comes the all-important question: If it were true that μ=5, what are the chances that, when we took our sample of 72 patients, we would have seen this sample mean of ˉx=5.13 (which translates to a test statistic of t=2.2062), or more extreme? Is our test statistic extreme in the context of the above t-distribution which assumes H0 is true? Let's have a look:

As it turns out, our test statistic is fairly extreme in the context of this distribution, because the probability of observing a test statistic at least this extreme if H0 is true is only p=0.0306. That is:

  • $P(T \le -2.2062 \text{ or } T \ge 2.2062) = 0.0306$

This probability is our p-value.
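To see where a number like this comes from, the two-sided tail probability can be approximated numerically. The sketch below uses only the Python standard library and integrates the t density with Simpson's rule. The helper names (`t_pdf`, `upper_tail`, `two_sided_p`) are our own, and the numerical integration is a stand-in for what statistical software does internally, so the result is an approximation that should land close to the 0.0306 quoted above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The distribution is symmetric about 0, so P(T >= 0) = 0.5.
    return 0.5 - total * h / 3

def two_sided_p(t, df):
    """P(T <= -|t| or T >= |t|): both tails, as in a two-sided test."""
    return 2 * upper_tail(abs(t), df)

n = 72
t_stat = (5.13 - 5) / (0.5 / math.sqrt(n))
p_value = two_sided_p(t_stat, df=n - 1)
print(round(t_stat, 4), round(p_value, 4))
```

Doubling the upper tail works because the t distribution is symmetric, which is exactly why the probability statement above covers both directions.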

In the type of hypothesis test we have done here, we were only interested in whether μ was different from 5, which is why we have included the probability of seeing a test statistic at least as extreme as what we have seen in either direction. That is, greater than 2.2062 or less than -2.2062. This is called a two-sided test. This point will be further explained in the following sections.

Because our p-value is small, we have enough evidence to reject H0. Therefore, we have evidence to support the alternative hypothesis that μ≠5, i.e. this result is statistically significant.

How small does our p-value need to be for us to decide that the test statistic is extreme enough for us to reject H0? The answer comes in the level of significance, α (Greek letter, 'alpha'). In general, the standard level of significance is α=0.05, although other levels of α can be chosen. That is,

  • if p<α, reject H0
  • if p>α, do not reject H0

Note the wording above for when p>α: do not reject H0. Just because we do not have enough evidence to reject H0 does not mean we have proven H0 is true. So it is best to avoid using terms like 'accept H0'.
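The decision rule can be written as a small Python function. This is only an illustrative sketch: the name `decide` is our own, and because the rule above does not say what to do when p equals α exactly, grouping that case with "do not reject H0" is an assumption made here.

```python
def decide(p_value, alpha=0.05):
    """Return the wording used in this section for a given p-value."""
    if p_value < alpha:
        return "reject H0"
    # Assumption: p == alpha is treated as "do not reject H0".
    # Note that we never return "accept H0".
    return "do not reject H0"

decision = decide(0.0306)  # p-value from this section's example
print(decision)  # prints reject H0
```

With a stricter level such as α=0.01, the same p-value would give "do not reject H0", which shows that the conclusion depends on the level of significance chosen.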

The method we have used above to carry out the hypothesis test is called the p-value approach.

There is another method we could use called the critical region approach. To understand this, let's consider the question: if α=0.05, how extreme would our test statistic need to be in order to reject H0? To answer this question, we can find the quantiles -t and t such that $P(T \le -t \text{ or } T \ge t) = 0.05$, as represented below:

As we can see, $P(T \le -1.99 \text{ or } T \ge 1.99) = 0.05$. This means that if our test statistic were greater than 1.99 or less than -1.99, then we would say it falls in the critical region and we would reject H0, because any value of t in this region would result in p<0.05.
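The critical value can also be found numerically. The following Python sketch, using only the standard library, searches by bisection for the value whose two-sided tail probability equals α. The helper names (`t_pdf`, `upper_tail`, `critical_value`, `in_critical_region`) and the search interval of 0 to 50 are our own choices for illustration. Statistical software would use a quantile function instead, so treat this as an approximation.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) for t >= 0 using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def critical_value(alpha, df):
    """Bisection search for t with P(T <= -t or T >= t) = alpha."""
    lo, hi = 0.0, 50.0  # assumed bracket; wide enough for typical alpha
    for _ in range(60):
        mid = (lo + hi) / 2
        if 2 * upper_tail(mid, df) > alpha:
            lo = mid  # tail area still too large: move outward
        else:
            hi = mid
    return (lo + hi) / 2

def in_critical_region(t_stat, alpha, df):
    """True if the test statistic is at least as extreme as the critical value."""
    return abs(t_stat) >= critical_value(alpha, df)

t_crit = critical_value(alpha=0.05, df=71)
reject = in_critical_region(2.2062, alpha=0.05, df=71)
print(round(t_crit, 2), reject)
```

For this section's example the test statistic 2.2062 lies beyond the critical value, so the critical region approach leads to the same decision as the p-value approach.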