4.2 How the Central Limit Theorem applies

Recall the third assumption for the \(t\)-test:

The sample mean, \(\overline{X}\), is normally distributed.

This does not mean that the data themselves need to be normally distributed; only the distribution of the sample mean needs to be normal.

Recall from the previous topic that, by the Central Limit Theorem, as long as the sample size is 30 or greater we can generally assume that the distribution of the sample mean is approximately normal, regardless of whether the underlying distribution of the data is normal. In the cholesterol example, inspecting the data alone left the decision somewhat unclear. Applying the Central Limit Theorem resolves this: because \(n = 72 > 30\), we can conclude that \(\overline{X}\) is approximately normally distributed. In other words, the normality assumption has been met.
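The following Python sketch illustrates the idea behind this rule of thumb. It does not use the cholesterol data, which are not reproduced here. Instead, as an assumption, it draws repeated samples of size \(n = 72\) from a right-skewed exponential distribution. It then compares the skewness of the raw data with the skewness of the resulting sample means. The helper names `simulate_sample_means` and `skewness` are hypothetical and were written only for this illustration.

```python
import random
import statistics


def simulate_sample_means(n, num_samples=5000, seed=1):
    """Return the means of num_samples samples, each of size n.

    Assumption: every observation is drawn from an exponential distribution
    with rate 1 (mean 1), used only as a stand-in for right-skewed data.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


def skewness(values):
    """Skewness as the mean of the cubed standardised deviations (0 for symmetric data)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


# Raw skewed data: 5000 individual exponential observations.
rng = random.Random(2)
raw_data = [rng.expovariate(1.0) for _ in range(5000)]

# Simulated sampling distribution of the mean for samples of size 72.
means_72 = simulate_sample_means(72)

print(f"skewness of raw data:        {skewness(raw_data):.2f}")
print(f"skewness of means (n = 72):  {skewness(means_72):.2f}")
```

The skewness of an exponential distribution is 2. The skewness of the mean of \(n\) independent observations shrinks by a factor of \(\sqrt{n}\), so theory predicts roughly \(2/\sqrt{72} \approx 0.24\) for the sample means. This is much closer to the value of 0 for a normal distribution, which is the Central Limit Theorem at work.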

However, the Central Limit Theorem should be applied in this manner with caution. A sample size of \(n \geq 30\) means that, in theory, the normality assumption has not been violated. Even so, the \(t\)-test may still not be the most appropriate choice. Consider the following example:

Suppose we wish to carry out a one-sample \(t\)-test using a sample of \(n = 200\) observations of highly skewed data, such as the data shown in the histogram below:

Recall that the \(t\)-test is a test about the mean. Now consider the following question:

When we have skewed data, which measure of location is preferred?

Answer: the median.

Therefore, because \(n = 200 > 30\), the Central Limit Theorem means that the normality assumption is not technically violated here. Even so, we may wish to consider a different type of hypothesis test, one that tests a measure of location other than the mean, such as the median. Such tests do exist, but they are beyond the scope of this subject.
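To see why the median is the preferred measure of location for skewed data, the Python sketch below draws \(n = 200\) observations from a log-normal distribution. This distribution is an assumed stand-in for the skewed data in the histogram, since the actual data are not reproduced here. The sketch compares the sample mean with the sample median and reports the share of observations lying below the mean.

```python
import random
import statistics

# Assumption: a log-normal distribution (mu = 0, sigma = 1.5) stands in for
# the highly skewed data in the histogram; the real data are not available.
rng = random.Random(42)
sample = [rng.lognormvariate(0.0, 1.5) for _ in range(200)]

sample_mean = statistics.fmean(sample)
sample_median = statistics.median(sample)

# Proportion of observations that fall below the sample mean.
share_below_mean = sum(x < sample_mean for x in sample) / len(sample)

print(f"mean   = {sample_mean:.2f}")
print(f"median = {sample_median:.2f}")
print(f"share of observations below the mean = {share_below_mean:.0%}")
```

For right-skewed data like these, the long right tail pulls the mean above the median. As a result, the mean can sit above most of the observations and describe a "typical" value poorly. The median is much less sensitive to the tail, which is why a test about the median can be the more natural choice here.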