3.2 CLT simulated example with exponentially distributed population
You may recall that in the previous topic, we briefly introduced some other continuous distributions, one of which was the exponential distribution. The exponential distribution is strongly skewed to the right.
For this simulation, we have generated 100 observations from the \(\text{EXP}(10)\) distribution, i.e. with the rate parameter \(\lambda = 10\). For the exponential distribution, this means we have \(\mu = \frac{1}{\lambda} = \frac{1}{10}\) and \(\sigma^2 = \frac{1}{\lambda^2} = \frac{1}{10^2} = \frac{1}{100}\). This random sample of 100 observations is represented in the green histogram below and, as expected, the data are clearly skewed to the right. The blue line is a normal density curve. Not surprisingly, this normal density curve does not fit the histogram well at all.
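As a minimal sketch of how such a sample could be generated, the Python snippet below draws 100 observations with NumPy. The choice of NumPy, the seed, and the variable names are assumptions for illustration and not necessarily what was used to produce the histograms. Note that NumPy parameterises the exponential distribution by the scale \(\frac{1}{\lambda}\) rather than the rate \(\lambda\).

```python
import numpy as np

# Assumption: the seed is arbitrary and only makes the run reproducible.
rng = np.random.default_rng(seed=1)

lam = 10                                  # rate parameter lambda of EXP(10)
# NumPy expects the scale 1/lambda, not the rate lambda.
sample = rng.exponential(scale=1 / lam, size=100)

# Compare the sample statistics with mu = 1/10 and sigma^2 = 1/100.
print("sample mean:    ", sample.mean())
print("sample variance:", sample.var(ddof=1))
# For right-skewed data the mean typically exceeds the median.
print("sample median:  ", np.median(sample))
```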
The red histograms represent sample means obtained as follows. Consider, for example, the first histogram of means with \(n = 5\). For that particular example, we generated \(n = 5\) observations from the \(\text{EXP}(10)\) distribution, and calculated the sample mean, \(\bar{x}\), from that sample. We then repeated this a further 9,999 times so that we obtained 10,000 values of \(\bar{x}\). These 10,000 sample means are represented in the first red histogram below. The blue line is the normal density curve we obtain via the Central Limit Theorem. That is, it is the normal density curve with mean \(\mu = \frac{1}{10}\) and variance \(\frac{\sigma^2}{n} = \frac{1}{10^2}\div 5 = \frac{1}{500}\). The same procedure has been followed for the other two red histograms, but with \(n = 30\) and \(n = 60\) respectively.
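The procedure just described can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions: the helper name `simulate_means`, the seed, and the use of NumPy are hypothetical choices, not the code that produced the histograms. It compares the mean and variance of the 10,000 simulated sample means with the values \(\frac{1}{10}\) and \(\frac{1}{10^2 n}\) given by the Central Limit Theorem.

```python
import numpy as np

# Assumption: the seed is arbitrary and only makes the run reproducible.
rng = np.random.default_rng(seed=1)

lam = 10        # rate parameter of EXP(10)
reps = 10_000   # number of sample means per histogram


def simulate_means(n, reps, lam, rng):
    """Draw `reps` samples of size `n` from EXP(lam); return their means."""
    # Each row is one sample of n observations (NumPy uses scale = 1/lam).
    draws = rng.exponential(scale=1 / lam, size=(reps, n))
    return draws.mean(axis=1)


results = {}
for n in (5, 30, 60):
    means = simulate_means(n, reps, lam, rng)
    clt_var = 1 / (lam**2 * n)            # sigma^2 / n from the CLT
    results[n] = (means.mean(), means.var(ddof=1))
    print(f"n = {n:2d}: mean of means = {means.mean():.4f}, "
          f"variance of means = {means.var(ddof=1):.6f}, "
          f"CLT variance = {clt_var:.6f}")
```

Plotting a histogram of `means` for each \(n\) (for instance with matplotlib) and overlaying the corresponding normal density would give pictures comparable to the red histograms.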
Of interest here is the following:
- Since the underlying distribution is exponential, it is no surprise that the green histogram appears highly skewed
- All four histograms are centered around \(\frac{1}{10} = 0.1\)
- The red histogram of means with \(n = 5\) is more symmetric than the green histogram, but it still shows some skew to the right and does not fit the normal curve as well as the other red histograms do
- The red histograms with \(n = 30\) and \(n = 60\) appear to be approximately normally distributed, and the normal density curves fit the data very well. This is remarkable, considering how much skew we observe in the underlying distribution
- The red histograms display less variability as \(n\) increases.
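The last point can be checked against the theory. As a short worked calculation, using only \(\sigma^2 = \frac{1}{10^2}\) from above, the Central Limit Theorem gives the following variance and standard deviation of \(\bar{x}\) for the three sample sizes (decimal values are rounded):

\[
\begin{aligned}
n = 5:&\quad \frac{\sigma^2}{n} = \frac{1}{500} = 0.002, &\quad \frac{\sigma}{\sqrt{n}} &\approx 0.0447 \\
n = 30:&\quad \frac{\sigma^2}{n} = \frac{1}{3000} \approx 0.000333, &\quad \frac{\sigma}{\sqrt{n}} &\approx 0.0183 \\
n = 60:&\quad \frac{\sigma^2}{n} = \frac{1}{6000} \approx 0.000167, &\quad \frac{\sigma}{\sqrt{n}} &\approx 0.0129
\end{aligned}
\]

The standard deviation of the sample mean shrinks in proportion to \(\frac{1}{\sqrt{n}}\), which is consistent with the red histograms becoming narrower as \(n\) increases.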