## 3.1 Explaining the one-sample \(t\)-test results

If we begin by assuming \(H_0\) is true, then we assume we have \(\mu = \mu_0 = 5\). To carry out the \(t\)-test, we use the \(t_{71}\) distribution, which is pictured below:

The above distribution is called the **distribution under \(H_0\)**. We can see that the mean of the above distribution is at \(t = 0\), which, due to *standardisation*, represents the value \(\mu_0 = 5\).

The **test statistic** can be thought of as a *standardised* version of the sample mean. The test statistic can be calculated as

\[t = \displaystyle \frac{\bar{x} - \mu_0}{\text{SE}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \]

where:

- \(t\) denotes the *test statistic*
- \(\text{SE}\) refers to the *Standard Error*. The standard error is an *estimate* of the *standard deviation of the mean*, which, as we know from the previous topic, is \(\frac{s}{\sqrt{n}}\).

In our example, our test statistic can be calculated as

\[t = \displaystyle \frac{5.13 - 5}{0.5/\sqrt{72}} = 2.2062.\]
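This calculation can be reproduced from the summary statistics alone. As a minimal sketch in Python (the variable names are ours, not part of the original example):

```python
import math

x_bar = 5.13  # sample mean
mu_0 = 5      # hypothesised mean under H0
s = 0.5       # sample standard deviation
n = 72        # sample size

se = s / math.sqrt(n)          # standard error of the mean
t_stat = (x_bar - mu_0) / se   # standardised test statistic

print(round(t_stat, 4))  # 2.2062
```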
Now comes the all-important question: If it were true that \(\mu = 5\), what are the chances that, when we took our sample of 72 patients, we would have seen this sample mean of \(\bar{x} = 5.13\) (which translates to a test statistic of \(t = 2.2062\)), **or more extreme**? Is our test statistic *extreme* in the context of the above \(t\)-distribution, which assumes \(H_0\) is true? Let's have a look:
As it turns out, our test statistic **is** fairly *extreme* in the context of this distribution, because the probability of observing this test statistic, or one more extreme, *if \(H_0\) is true* is only \(p = 0.0306\). That is:

- \(P(T \leq -2.2062) + P(T \geq 2.2062) = 0.0306\)

This probability is our **\(p\)-value**.
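If SciPy is available, this two-sided \(p\)-value can be recovered from the upper tail of the \(t_{71}\) distribution. A sketch (SciPy is our assumption here, not something used in the original notes):

```python
from scipy import stats

t_stat = 2.2062  # test statistic calculated above
df = 71          # degrees of freedom: n - 1 = 72 - 1

# Two-sided p-value: probability mass in both tails beyond |t|,
# assuming H0 is true. stats.t.sf gives the upper-tail probability.
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(round(p_value, 4))  # approximately 0.0306
```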

In the type of hypothesis test we have done here, we were only interested in whether \(\mu\) was **different from** 5, which is why we have included the probability of seeing a test statistic at least as extreme as what we have seen *in either direction*, that is, greater than 2.2062 or less than -2.2062. This is called a **two-sided test**. This point will be further explained in the following sections.

Because our \(p\)-value was small, this means we have enough evidence to **reject \(H_0\)**. Therefore, we have evidence to support the alternative hypothesis that \(\mu \neq 5\), i.e. this result is *statistically significant*.

How small does our \(p\)-value need to be for us to decide that the test statistic is extreme enough for us to reject \(H_0\)? The answer comes in the **level of significance, \(\alpha\)** (Greek letter, 'alpha'). In general, the standard level of significance is \(\alpha = 0.05\), although other levels of \(\alpha\) can be chosen. That is,

- if \(p < \alpha\), reject \(H_0\)
- if \(p > \alpha\), do not reject \(H_0\)

Note the wording above for when \(p > \alpha\): *do not reject \(H_0\)*. Just because we do not have enough evidence to reject \(H_0\) does not mean we have proven \(H_0\) is true. So it is best to avoid using terms like *accept \(H_0\)*.
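The decision rule above can be sketched directly, using the values from our worked example (a minimal illustration only):

```python
alpha = 0.05      # level of significance
p_value = 0.0306  # p-value from the test above

# Compare the p-value against the level of significance.
# Note: we say "do not reject H0" rather than "accept H0".
if p_value < alpha:
    decision = "reject H0"
else:
    decision = "do not reject H0"

print(decision)  # reject H0
```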

The method we have used above to carry out the hypothesis test is called the **\(p\)-value approach**.

There is another method we could use called the **critical region approach**. To understand this, let's consider the question, *if \(\alpha = 0.05\), how extreme would our test statistic need to be in order to reject \(H_0\)?*

To answer this question, we can find the quantiles \(\pm t\) such that \(P(T \leq -t) + P(T \geq t) = 0.05\), as represented below:

As we can see, \(P(T \leq -1.99) + P(T \geq 1.99) = 0.05\). This means that if our test statistic fell outside the range \((-1.99, 1.99)\) (i.e., was greater than 1.99 or less than -1.99), then we would say it falls in the **critical region** and we would reject \(H_0\), because any such value of \(t\) would result in \(p < 0.05\).
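The critical value itself can be found from the quantiles of the \(t_{71}\) distribution. A sketch using SciPy (again, our choice of tool, not the original's):

```python
from scipy import stats

alpha = 0.05  # level of significance
df = 71       # degrees of freedom

# Critical value t* such that P(T <= -t*) + P(T >= t*) = alpha,
# i.e. the upper (1 - alpha/2) quantile of the t distribution.
t_crit = stats.t.ppf(1 - alpha / 2, df)

print(round(t_crit, 2))  # 1.99
```

Since our test statistic \(t = 2.2062\) exceeds this critical value, it falls in the critical region, agreeing with the \(p\)-value approach.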