3.1 CLT simulated example with normally distributed population

For this simulation, we generated 100 observations from the standard normal distribution (recall that the standard normal distribution has \(\mu = 0\) and \(\sigma^2 = 1\)). This random sample of 100 observations is represented in the green histogram below. The blue line is the standard normal probability density function.

The red histograms represent sample means estimated as follows. Consider, for example, the first histogram of means with \(n = 5\). For that example, we generated \(n = 5\) observations from the standard normal distribution and computed the sample mean, \(\bar{x}\), from that sample. We then repeated this a further 9,999 times to obtain 10,000 estimates. These 10,000 estimates are represented in the first red histogram below. The blue line is the normal density curve we obtain via the Central Limit Theorem. That is, it is the normal density curve with \(\mu = 0\) and \(\sigma^2 = 1/n = 1/5\). The same procedure was followed for the other two red histograms, but with \(n = 30\) and \(n = 60\) respectively.
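The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to produce the histograms: the function name `simulate_sample_means`, the seed, and the use of the standard library's `random` and `statistics` modules are our own choices.

```python
import random
import statistics


def simulate_sample_means(n, reps=10_000, seed=1):
    """Return `reps` sample means, each computed from n standard normal draws.

    The name, the default seed, and the default `reps` are illustrative
    assumptions; only n = 5, 30, 60 and 10,000 repetitions come from the text.
    """
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    means = []
    for _ in range(reps):
        # One sample of size n from the standard normal distribution
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


if __name__ == "__main__":
    for n in (5, 30, 60):
        means = simulate_sample_means(n)
        # Compare the centre and spread of the simulated means with 0 and 1/n
        print(
            f"n = {n:2d}: mean of sample means = {statistics.fmean(means):.4f}, "
            f"variance = {statistics.variance(means):.4f}, 1/n = {1 / n:.4f}"
        )
```

Plotting a histogram of each returned list should give pictures comparable to the red histograms, and the printed variances should lie close to \(1/n\).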

Of interest here is the following:

  • Since the underlying distribution is normal, it is no surprise that the green histogram resembles normally distributed data.
  • It is likewise no surprise that the red histograms of sample means appear normally distributed: the mean of independent normal observations is itself exactly normally distributed for any \(n\).
  • All four histograms are centered around 0.
  • The red histograms display less variability than the green histogram, and this variability decreases as \(n\) increases.

As noted above, the variability of \(\overline{X}\) decreases as the sample size \(n\) increases. This is because the variance of \(\overline{X}\) is equal to \(\displaystyle \frac{\sigma^2}{n}\). For a fixed \(\sigma^2\), a small \(n\) makes \(\displaystyle \frac{\sigma^2}{n}\) large, while a large \(n\) makes it small. Here \(\sigma^2 = 1\), so the variance of \(\overline{X}\) is \(1/5 = 0.2\) for \(n = 5\), \(1/30 \approx 0.033\) for \(n = 30\), and \(1/60 \approx 0.017\) for \(n = 60\).
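For readers who want to see where the formula for the variance of \(\overline{X}\) comes from, a short derivation follows. It assumes, as in the simulation, that the observations \(X_1, \dots, X_n\) are independent and share the same variance \(\sigma^2\); the labels \(X_i\) are introduced here for illustration.

\[
\operatorname{Var}(\overline{X})
= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(X_i)
= \frac{n\sigma^2}{n^2}
= \frac{\sigma^2}{n}.
\]

The second equality uses independence (the variance of a sum of independent variables is the sum of their variances) together with the rule \(\operatorname{Var}(aX) = a^2\operatorname{Var}(X)\).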