3.1 Explaining the one-sample \(t\)-test results


If we begin by assuming \(H_0\) is true, then we take \(\mu = \mu_0 = 5\). To carry out the \(t\)-test, we use the \(t_{71}\) distribution, which is pictured below:

The above distribution is called the distribution under \(H_0\). That is, it is the distribution of the test statistic (a standardised version of the sample mean, defined below) assuming \(H_0\) is true. We can see that the mean of the above distribution is at \(t = 0\), which, due to the standardisation, corresponds to the value \(\mu_0 = 5\).

The test statistic can be thought of as a standardised version of the sample mean. In general terms, the test statistic is defined as follows:

\[T = \displaystyle \frac{\overline{X} - \mu_0}{\text{SE}} = \frac{\overline{X} - \mu_0}{S/\sqrt{n}}, \]

where:

  • \(T\) denotes the test statistic
  • \(\overline{X}\) denotes the sample mean
  • \(\text{SE}\) refers to the standard error. The standard error is an estimator of the standard deviation of the sample mean, and is equal to \(\frac{S}{\sqrt{n}}\)
  • \(S\) denotes the sample standard deviation
  • \(n\) denotes the sample size.

Note that the test statistic as defined above is random; that is, it is a random variable. Under \(H_0\), we have \(T \sim t_{n - 1}\); in general terms, \(T\) follows a \(t\)-distribution with \(n - 1\) degrees of freedom.
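
To see this behaviour in practice, here is a minimal simulation sketch in Python (numpy and scipy are assumed tools, not part of the original text; the sample size, hypothesised mean and population standard deviation are illustrative values). It draws many samples under \(H_0\), computes the test statistic for each, and checks that the statistics behave like draws from a \(t_{n-1}\) distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, mu_0, sigma = 72, 5, 0.5   # assumed values for illustration only

# Draw many samples under H0 (true mean equal to mu_0) and compute the
# test statistic T = (X_bar - mu_0) / (S / sqrt(n)) for each sample.
t_stats = np.empty(10_000)
for i in range(t_stats.size):
    x = rng.normal(mu_0, sigma, size=n)
    t_stats[i] = (x.mean() - mu_0) / (x.std(ddof=1) / np.sqrt(n))

# If T ~ t_{n-1}, about 5% of simulated statistics should fall beyond the
# 97.5% quantile of the t distribution with n - 1 degrees of freedom.
crit = stats.t.ppf(0.975, df=n - 1)
print(np.mean(np.abs(t_stats) > crit))   # should be close to 0.05
```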

Once we have data, we can calculate the observed test statistic and then see where it lies in the context of the \(t\)-distribution. The observed test statistic can be calculated as

\[t = \displaystyle \frac{\bar{x} - \mu_0}{\text{se}} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \]

where:

  • \(t\) denotes the observed test statistic
  • \(\bar{x}\) denotes the observed sample mean
  • \(\text{se}\) refers to the observed standard error. The observed standard error is an estimate of the standard deviation of the sample mean, and is equal to \(\frac{s}{\sqrt{n}}\)
  • \(s\) denotes the observed sample standard deviation.

Note the difference in notation between the random and observed test statistic definitions where, for example, \(T\) is the random test statistic, and \(t\) is the observed test statistic.
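
As a quick illustration with raw data (a small, made-up data set, not the study discussed in this section), the following Python sketch computes the observed test statistic using the formula above and confirms it against scipy.stats.ttest_1samp:

```python
import numpy as np
from scipy import stats

# A small, made-up data set for illustration; mu_0 is the hypothesised mean.
x = np.array([5.3, 4.9, 5.6, 5.1, 4.8, 5.4, 5.2, 5.0])
mu_0 = 5

x_bar = x.mean()               # observed sample mean
s = x.std(ddof=1)              # observed sample standard deviation
n = x.size                     # sample size
se = s / np.sqrt(n)            # observed standard error

t_obs = (x_bar - mu_0) / se    # observed test statistic
print(t_obs)

# The same statistic from scipy's one-sample t-test.
res = stats.ttest_1samp(x, popmean=mu_0)
print(res.statistic)           # matches t_obs
```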

In our example, the observed test statistic can be calculated as

\[t = \displaystyle \frac{5.13 - 5}{0.5/\sqrt{72}} = 2.2062.\]

Now comes the all-important question: if it were true that \(\mu = 5\), what are the chances that, when we took our sample of 72 patients, we would have seen a sample mean of \(\bar{x} = 5.13\) (which translates to a test statistic of \(t = 2.2062\)), or one even more extreme? Is our test statistic extreme in the context of the above \(t\)-distribution, which assumes \(H_0\) is true? Let's have a look:

As it turns out, our test statistic is fairly extreme in the context of this distribution, because the probability of observing a test statistic at least this extreme if \(H_0\) is true is only \(p = 0.0306\). That is:

  • \(P(T \leq -2.2062) + P(T \geq 2.2062) = 0.0306\)

This probability is our \(p\)-value.
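
To check these figures, below is a minimal Python sketch (numpy and scipy are an assumed tool choice, not part of the original text) that reproduces the observed test statistic and the two-sided \(p\)-value from the summary values \(\bar{x} = 5.13\), \(s = 0.5\) and \(n = 72\):

```python
import numpy as np
from scipy import stats

x_bar, s, n, mu_0 = 5.13, 0.5, 72, 5

t_obs = (x_bar - mu_0) / (s / np.sqrt(n))
print(round(t_obs, 4))                      # 2.2062

# Two-sided p-value: P(T <= -t_obs) + P(T >= t_obs) for T ~ t_{71}.
p = 2 * stats.t.sf(abs(t_obs), df=n - 1)
print(round(p, 4))                          # approximately 0.0306
```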

In the type of hypothesis test we have done here, we were only interested in whether \(\mu\) was different from 5 (not in the direction of any difference), which is why we have included the probability of seeing a test statistic at least as extreme as the one we observed in either direction: that is, greater than 2.2062 or less than -2.2062. This is called a two-sided test. This point will be further explained in the following sections.

Because our \(p\)-value was small, we have enough evidence to reject \(H_0\). We therefore have evidence to support the alternative hypothesis that \(\mu \neq 5\), i.e. this result is statistically significant.

How small does our \(p\)-value need to be before we decide that the test statistic is extreme enough to reject \(H_0\)? The answer lies in the level of significance, \(\alpha\) (the Greek letter 'alpha'). The standard level of significance is \(\alpha = 0.05\), although other levels of \(\alpha\) can be chosen. That is,

  • if \(p < \alpha\), reject \(H_0\)
  • if \(p > \alpha\), do not reject \(H_0\)

Note the wording above for when \(p > \alpha\): do not reject \(H_0\). Just because we do not have enough evidence to reject \(H_0\) does not mean we have proven that \(H_0\) is true, so it is best to avoid terms like 'accept \(H_0\)'.

The method we have used above to carry out the hypothesis test is called the \(p\)-value approach.

There is another method we could use called the critical region approach. To understand this, let's consider the question: if \(\alpha = 0.05\), how extreme would our test statistic need to be in order to reject \(H_0\)? To answer this question, we can find the value \(t^*\) such that \(P(T \leq -t^*) + P(T \geq t^*) = 0.05\), as represented below:

As we can see, \(P(T \leq -1.99) + P(T \geq 1.99) = 0.05\). This means that if our test statistic were either greater than 1.99 or less than -1.99, it would fall in the critical region and we would reject \(H_0\), because any such value of \(t\) would result in \(p < 0.05\).
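
If it helps, the critical value can be obtained from the quantile function of the \(t_{71}\) distribution; here is a minimal sketch in Python with scipy (an assumed tool choice, not part of the original text):

```python
from scipy import stats

alpha, n = 0.05, 72

# Two-sided critical value: the quantile t* with P(T >= t*) = alpha / 2
# for T ~ t_{n-1}.
t_star = stats.t.ppf(1 - alpha / 2, df=n - 1)
print(round(t_star, 2))   # 1.99

# Check: the two tail probabilities beyond -t* and t* add up to alpha.
print(stats.t.cdf(-t_star, df=n - 1) + stats.t.sf(t_star, df=n - 1))  # 0.05
```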