Chapter 2 The Sample Mean

In this section, we will demonstrate how different samples taken from the same population can lead to different results. Let's suppose that for a given population, cholesterol levels are normally distributed with a mean of \(\mu = 5\) and a standard deviation of \(\sigma = 1\). Usually we do not know the true values of \(\mu\) and \(\sigma\), but for the sake of this example, let's assume that we do. Now let's further suppose five people were randomly selected from this population, and their cholesterol levels were as follows:

\[5.59, 5.71, 4.89, 4.55, 5.61\]

The resulting sample mean is \(\bar{x} = 5.27\). Do you think our sample mean did a good job of estimating \(\mu\)?
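The sample mean is the sum of the observations divided by the sample size, \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\). As a minimal sketch in Python (the language is our choice, not the original text's), the calculation for the first sample looks like this:

```python
import math
import statistics

# First sample of n = 5 cholesterol readings, as listed in the text
sample_1 = [5.59, 5.71, 4.89, 4.55, 5.61]

# Sample mean: sum of the readings divided by the number of readings
n = len(sample_1)
x_bar = sum(sample_1) / n

# statistics.mean computes the same quantity; check that the two agree
assert math.isclose(x_bar, statistics.mean(sample_1))

print(f"x_bar = {x_bar:.2f}")  # x_bar = 5.27
```

Running the same calculation on the second sample's readings gives 4.466, which rounds to the 4.47 quoted for that sample.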

Now suppose that a different sample of five people was randomly selected from the same population, with cholesterol levels as follows:

\[3.18, 5.63, 4.72, 4.72, 4.08\]

This time, the resulting sample mean is \(\bar{x} = 4.47\). How well do you think our sample mean did at estimating \(\mu\) this time?

Suppose now that we carry out this experiment three more times, with the results from all five samples shown in the table below:

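Table 2.1: Sample means and standard deviations resulting from five different samples with \(n = 5\)

|                           | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 |
|---------------------------|----------|----------|----------|----------|----------|
| Sample mean               | 5.27     | 4.47     | 5.37     | 5.20     | 4.69     |
| Sample standard deviation | 0.52     | 0.91     | 0.95     | 0.82     | 1.38     |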
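The standard deviations in Table 2.1 can be checked for the two samples listed in full above. This small sketch uses Python's standard `statistics` module. Note that `statistics.stdev` divides by \(n - 1\); that convention reproduces the tabulated 0.52 and 0.91, whereas dividing by \(n\) would give roughly 0.46 and 0.81.

```python
import statistics

# The two samples listed in the text (n = 5 each)
sample_1 = [5.59, 5.71, 4.89, 4.55, 5.61]
sample_2 = [3.18, 5.63, 4.72, 4.72, 4.08]

for name, sample in [("Sample 1", sample_1), ("Sample 2", sample_2)]:
    mean = statistics.mean(sample)
    # statistics.stdev is the sample standard deviation (n - 1 denominator)
    sd = statistics.stdev(sample)
    print(f"{name}: mean = {mean:.2f}, sd = {sd:.2f}")
```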

Would you be more confident that \(\bar{x}\) will be close to \(\mu\) if the sample size \(n\) were bigger? Suppose now that, instead of taking a sample of \(n = 5\), we took a sample of \(n = 200\). If we repeated this experiment five times and recorded the sample mean and standard deviation each time, one possible set of results would be as follows:

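Table 2.2: Sample means and standard deviations resulting from five different samples with \(n = 200\)

|                           | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 |
|---------------------------|----------|----------|----------|----------|----------|
| Sample mean               | 5.13     | 5.08     | 5.01     | 5.14     | 4.83     |
| Sample standard deviation | 1.09     | 0.91     | 0.99     | 1.03     | 0.94     |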
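Tables 2.1 and 2.2 come from random draws, so their exact numbers depend on chance. The following sketch shows one way such tables could be generated, assuming the normal population with \(\mu = 5\) and \(\sigma = 1\) described in the text. The helper `run_experiment` is hypothetical and the seed is arbitrary, so the printed values will not match the tables above.

```python
import random
import statistics

MU, SIGMA = 5.0, 1.0  # population parameters assumed in the text
random.seed(1)        # arbitrary seed (assumption); the original draws are unknown


def run_experiment(n, repeats=5):
    """Draw `repeats` independent samples of size n from N(MU, SIGMA)
    and return a list of (sample mean, sample SD) pairs."""
    results = []
    for _ in range(repeats):
        sample = [random.gauss(MU, SIGMA) for _ in range(n)]
        results.append((statistics.mean(sample), statistics.stdev(sample)))
    return results


for n in (5, 200):
    print(f"n = {n}")
    for i, (m, s) in enumerate(run_experiment(n), start=1):
        print(f"  Sample {i}: mean = {m:.2f}, sd = {s:.2f}")
```

With \(n = 200\), the five printed means typically sit much closer to 5 than the five means for \(n = 5\), while the sample standard deviations hover around 1 for both sample sizes.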

How do you think our estimates with \(n = 200\) did compared to our estimates with \(n = 5\)? Let's take a look at how this experiment can be displayed graphically:

We could, in fact, carry out this experiment many more times, and we could let \(n\) get bigger and bigger. If we did, we would see that as the sample size \(n\) increases, our sample means cluster more and more tightly around \(\mu\). This example demonstrates that, all other things being equal, the larger our sample, the more confident we can be in our estimates.
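The claim that estimates improve as \(n\) grows can be checked by simulation. This sketch draws many samples for each sample size, computes each sample's mean, and reports the standard deviation of those means as a measure of how widely they scatter around \(\mu\). The number of repetitions (1,000), the list of sample sizes, the seed, and the helper `spread_of_sample_means` are all illustrative assumptions.

```python
import random
import statistics

MU, SIGMA = 5.0, 1.0  # population parameters assumed in the text
random.seed(2)        # arbitrary seed, for reproducibility only
REPEATS = 1000        # simulated samples per sample size (an assumption)


def spread_of_sample_means(n, repeats=REPEATS):
    """Standard deviation of `repeats` sample means, each from a fresh sample of size n."""
    means = [
        statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


# The spread of the sample means should shrink as n grows
for n in (5, 50, 200, 800):
    print(f"n = {n:3d}: SD of sample means = {spread_of_sample_means(n):.3f}")
```

The printed spreads should decrease steadily from the top row to the bottom row.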

Now, one more very important concept to demonstrate here is the difference between a sample of values for individuals (or units, or subjects) and a sample of means. Each of the two plots above shows five sample means. Let's compare these with a sample of values for five individuals from the population:

The picture above demonstrates that when we look at a sample of values for individuals, shown by the green dots, we should expect more variation: these are simply the cholesterol readings of \(n = 5\) individuals from the population, and each reading can fall anywhere in the population distribution. When we look at a sample of means instead, there is less variability. This is because the red dots do not represent individuals; each one is a sample mean, calculated from a whole sample of individuals, so unusually high and low readings within that sample tend to offset each other.
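The contrast between individuals and means can also be simulated. In this sketch, the five "individuals" are single draws from the population, while each of the five "means" averages its own sample of \(n = 200\) draws. The choice of \(n = 200\) mirrors Table 2.2; the seed is arbitrary.

```python
import random
import statistics

MU, SIGMA, N = 5.0, 1.0, 200  # population parameters from the text; N mirrors Table 2.2
random.seed(3)                # arbitrary seed, for reproducibility only

# Five individual cholesterol readings (analogous to the green dots)
individuals = [random.gauss(MU, SIGMA) for _ in range(5)]

# Five sample means, each from its own sample of N individuals (analogous to the red dots)
sample_means = [
    statistics.fmean(random.gauss(MU, SIGMA) for _ in range(N))
    for _ in range(5)
]

print("individuals: ", [round(x, 2) for x in individuals])
print("sample means:", [round(m, 2) for m in sample_means])
print(f"SD of individuals:  {statistics.stdev(individuals):.2f}")
print(f"SD of sample means: {statistics.stdev(sample_means):.2f}")
```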

As the chart above suggests, when we compare a distribution of individuals with a distribution of means, we find the following:

  • Both types of distribution have the same mean (in the example above, \(\mu = 5\))
  • The variability of a distribution of individuals is greater than the variability of a distribution of means
  • As the sample size \(n\) increases, the variability of a distribution of means decreases
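For reference, these three points correspond to standard results for the mean \(\bar{X}\) of \(n\) independent observations from a population with mean \(\mu\) and standard deviation \(\sigma\), stated here without proof:

\[E(\bar{X}) = \mu, \qquad \operatorname{SD}(\bar{X}) = \frac{\sigma}{\sqrt{n}}\]

With \(\sigma = 1\), this gives \(1/\sqrt{5} \approx 0.45\) for samples of size 5 and \(1/\sqrt{200} \approx 0.07\) for samples of size 200. Both are smaller than the standard deviation \(\sigma = 1\) of individual readings, and the value shrinks as \(n\) grows.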

Of course, the example above does not prove these facts; it simply demonstrates them. In the next section, however, we will consider these concepts more formally.