Chapter 1 Sampling

In previous weeks, we discussed the concept of sampling, which involves randomly selecting a sample of \(n\) units from a given population. Why do we do this? Often, we want to learn something about a population. For example, for a given population of people, what is the average cholesterol level? The problem is that, especially for very large populations, it would be very difficult, if not impossible, to measure the cholesterol level of every person and then calculate the population average, \(\mu\). A much more feasible approach is to measure the cholesterol level for a sample of people from the population. From this sample, we can calculate the average cholesterol level, which we call the sample mean and denote \(\bar{x}\). We can then use \(\bar{x}\) to estimate the population mean \(\mu\).
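The idea of estimating \(\mu\) with \(\bar{x}\) can be illustrated with a small simulation. The following is a minimal sketch only: the population of cholesterol levels is invented for illustration (drawn from a normal distribution with an assumed mean of 5.2 mmol/L and standard deviation of 1.0), and the helper name `sample_mean` and the sample size of 50 are assumptions, not values from these notes.

```python
import random
import statistics


def sample_mean(population, n, rng):
    """Draw a simple random sample of n units (without replacement) and return its mean."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample)


rng = random.Random(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 cholesterol levels in mmol/L (invented values)
population = [rng.gauss(5.2, 1.0) for _ in range(100_000)]

mu = statistics.fmean(population)         # population mean, unknown in practice
x_bar = sample_mean(population, 50, rng)  # sample mean from n = 50 units

print(f"population mean mu = {mu:.3f}")
print(f"sample mean x_bar  = {x_bar:.3f}")
```

In a real study we would only ever see `x_bar`; the simulation computes `mu` as well purely so that the two can be compared.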

Our hope is that the sample mean is close to the population mean, because it is the population we really want to learn about. Since the sample mean is only an estimate, we can never be sure how close it is to the true value. Statistics can, however, help us to quantify how confident we can be in a given estimate. We can factor in things like variability, sample size, and sample design to help us judge how far we can go in drawing inferences about a population from our sample estimates.

In order to do this, we need to know the sampling distribution of our estimate, that is, how the estimate varies from one random sample to the next. Once we know this, it will be much easier to draw conclusions about our estimates.
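One way to get a feel for a sampling distribution is to simulate it: draw many samples from the same population and record the sample mean each time. The sketch below does this under stated assumptions: the population is again invented (normal, mean 5.2 mmol/L, standard deviation 1.0), and the function name `simulate_sample_means`, the sample size of 50, and the 2,000 repeats are illustrative choices rather than anything prescribed by these notes.

```python
import random
import statistics


def simulate_sample_means(population, n, repeats, rng):
    """Return the means of `repeats` independent simple random samples of size n."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]


rng = random.Random(2)  # fixed seed so the sketch is reproducible

# Hypothetical population of cholesterol levels in mmol/L (invented values)
population = [rng.gauss(5.2, 1.0) for _ in range(20_000)]
mu = statistics.fmean(population)

# Each entry of `means` is the sample mean from one sample of n = 50 units
means = simulate_sample_means(population, n=50, repeats=2_000, rng=rng)

print(f"population mean:           {mu:.3f}")
print(f"average of sample means:   {statistics.fmean(means):.3f}")
print(f"std. dev. of sample means: {statistics.stdev(means):.3f}")
print(f"std. dev. of population:   {statistics.pstdev(population):.3f}")
```

The list `means` is an empirical approximation of the sampling distribution of \(\bar{x}\). In this simulation the sample means cluster around the population mean and vary much less than the individual values in the population do.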

In this Topic, we will discuss these concepts with a focus on the sample mean as an estimate of the population mean. We will look at the distribution of the sample mean and how we can use this distribution to draw conclusions about our estimates.