22 CIs for one mean

So far, you have learnt to ask a RQ, design a study, and describe and summarise the data. You have also been introduced to confidence intervals, and have constructed confidence intervals for one proportion. In this chapter, you will learn to construct confidence intervals for one mean. You will learn to:

  • produce confidence intervals for one mean.
  • determine whether the conditions for using the confidence intervals apply in a given situation.

22.1 Describing the sampling distribution: \(\sigma\) known

In this chapter, we study the situation where a population mean \(\mu\) (the parameter) is estimated by a sample mean \(\bar{x}\) (the statistic). The sample mean comes from just one of the many possible samples, and each possible sample is likely to produce a different value of \(\bar{x}\). That is, the value of the sample mean varies from sample to sample; this is called sampling variation (which can be quantified using the standard error).

Remember: Studying a sample leads to the following observations:

  • Each sample is likely to be different.
  • Our sample is just one of countless possible samples from the population.
  • Each sample is likely to produce a different value for the sample mean.
  • Hence we only observe one of the many possible values for the sample mean.

Since many values for the sample mean are possible, the possible values of the sample mean vary (called sampling variation) and have a distribution (called a sampling distribution).

Consider rolling dice again. Suppose a die is rolled \(n = 25\) times, and the mean of the 25 numbers that are rolled is recorded. Since every face of the die is equally likely to appear on any one roll, the population mean of all possible rolls is \(\mu = 3.5\) (in the middle of the numbers on the faces of the die, which is also the median).

What will be the sample mean of the numbers in the 25 rolls? We cannot be sure, as the sample mean will vary from sample to sample (sampling variation).

Suppose we roll a die 25 times and record the sample mean, and then repeat this for 10 sets of 25 rolls, as shown in the animation below. The mean of the 25 rolls clearly varies from set to set, as expected. In the simulation, the sample mean of 25 rolls was as low as 3.08 and as high as 3.76.

The mean for any single sample of \(n = 25\) rolls will sometimes be higher than \(\mu = 3.5\), and sometimes lower than \(\mu = 3.5\), but most of the time the mean should be close to 3.5. If thousands of people made one set of 25 rolls each, and computed the mean for their set, every person would have a sample mean for their set of 25 rolls, and we could produce a histogram of all these sample means; see the animation below.

From the animation above, the sample means vary with an approximate normal distribution (as we saw with the sample proportions). This normal distribution is not describing the data; it is describing how the values of sample means vary across all possible samples. Under certain conditions, the values of the sample means can vary with a normal distribution, and this normal distribution has a mean and a standard deviation.
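The idea behind the animation can be sketched in code. The following is a minimal simulation, not the code used to produce the animation; the seed, the number of repetitions (10 000) and the function name `mean_of_rolls` are illustrative assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed (an arbitrary choice) so the run is repeatable

def mean_of_rolls(n=25):
    """Roll a fair six-sided die n times and return the sample mean."""
    rolls = [random.randint(1, 6) for _ in range(n)]
    return statistics.mean(rolls)

# One sample mean from each of 10 000 hypothetical people
sample_means = [mean_of_rolls(25) for _ in range(10_000)]

centre = statistics.mean(sample_means)   # expected to be close to mu = 3.5
spread = statistics.stdev(sample_means)  # how much the sample means vary

print(f"Mean of the sample means:    {centre:.3f}")
print(f"Std dev of the sample means: {spread:.3f}")
```

A histogram of `sample_means` should look roughly bell-shaped and centred near 3.5, even though the individual rolls are equally likely to be any of the six faces.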

This distribution describes how sample means vary. The mean of this distribution---the sampling mean---has the value \(\mu\). The standard deviation of this distribution is called the standard error of the sample means, denoted \(\text{s.e.}(\bar{x})\). When the population standard deviation \(\sigma\) is known, the standard error happens to be

\[ \text{s.e.}(\bar{x}) = \frac{\sigma}{\sqrt{n}}. \] So the possible values of the sample means have a sampling distribution described by:

  • an approximate normal distribution,
  • with a sampling mean whose value is \(\mu\), and
  • a standard deviation, called the standard error, of \(\text{s.e.}(\bar{x}) = \sigma/\sqrt{n}\).

However, almost always the population mean, and the population standard deviation, are unknown (if they were known, we wouldn't need to take a sample to estimate them). Since the sampling distribution has an approximate normal distribution, the 68--95--99.7 rule can be applied: approximately 95% of the sample means are expected to be within two standard errors of \(\mu\).
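The die example is one of the rare cases where \(\mu\) and \(\sigma\) are known, so it can be used as a worked illustration. (The value of \(\sigma\) below is computed from the six equally likely faces; it is not quoted elsewhere in this chapter.) Since each face has probability \(1/6\):

\[ \sigma = \sqrt{\frac{(1 - 3.5)^2 + (2 - 3.5)^2 + \cdots + (6 - 3.5)^2}{6}} = \sqrt{\frac{17.5}{6}} \approx 1.708. \]

For samples of \(n = 25\) rolls, the standard error is then

\[ \text{s.e.}(\bar{x}) = \frac{\sigma}{\sqrt{n}} = \frac{1.708}{\sqrt{25}} \approx 0.342. \]

So approximately 95% of the sample means from 25 rolls are expected to be between \(3.5 - (2\times 0.342) = 2.82\) and \(3.5 + (2\times 0.342) = 4.18\). This is consistent with the ten simulated sample means described earlier, which ranged from 3.08 to 3.76.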

22.2 Describing the sampling distribution: \(\sigma\) unknown

When a sample mean is used to estimate a population mean, the sample mean varies from sample to sample: sampling variation exists, as we saw in the previous section.

When the population standard deviation \(\sigma\) is unknown (which is almost always the case), it is estimated using the sample standard deviation \(s\). Then, the best we can do is use the estimate of the standard error of the sample mean: \(\displaystyle\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}\). With this information, we can describe the sampling distribution of the sample mean (see Table 22.1).

Definition 22.1 (Sampling distribution of a sample mean) When the population standard deviation is unknown, the sampling distribution of the sample mean is described by:

  • an approximate normal distribution,
  • centred around a sampling mean whose value is \(\mu\),
  • with a standard deviation (called the standard error of the mean) \(\text{s.e.}(\bar{x})\), whose value is \[ \text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}, \tag{22.1} \] when certain conditions are met (Sect. 22.4), where \(n\) is the size of the sample, and \(s\) is the standard deviation describing the variation in the individual observations in the sample (that is, the sample standard deviation).

TABLE 22.1: The notation used for describing means, and the sampling distribution of the sample means

| Quantity | Description |
|---|---|
| Individual values in the population | Vary with mean \(\mu\) and standard deviation \(\sigma\) |
| Individual values in a sample | Vary with mean \(\bar{x}\) and standard deviation \(s\) |
| Sample means (\(\bar{x}\)) across all possible samples | Vary with approx. normal distribution (under certain conditions): sampling mean \(\mu\); standard deviation \(\text{s.e.}(\bar{x})\) |

22.3 Computing confidence intervals

We don't know the value of \(\mu\) (the parameter), the population mean, but we have an estimate: the value of \(\bar{x}\), the sample mean (the statistic). The actual value of \(\mu\) might be a bit larger than \(\bar{x}\), or a bit smaller than \(\bar{x}\); that is, \(\mu\) is probably about \(\bar{x}\), give-or-take a bit.

Furthermore, the values of \(\bar{x}\) vary from sample to sample (sampling variation), and they vary with an approximate normal distribution. So, using the 68--95--99.7 rule, an approximate 95% interval could be constructed for the plausible values of \(\mu\) that may have given the observed values of the sample mean. This is a confidence interval.

A confidence interval (CI) for the population mean is an interval surrounding a sample mean. An approximate 95% confidence interval (CI) for \(\mu\) is \(\bar{x}\) give-or-take about two standard errors. In general, a confidence interval (CI) for \(\mu\) is

\[ \bar{x} \pm \overbrace{(\text{Multiplier}\times\text{s.e.}(\bar{x}))}^{\text{The `margin of error'}}. \] For an approximate 95% CI, the multiplier is, as usual, about \(2\) (since about 95% of values are within two standard deviations of the mean from the 68--95--99.7 rule).

The most common CIs are 95% CIs, but any level of confidence can be used, in which case a different multiplier is needed. In this book, a multiplier of \(2\) is used to create CIs manually (to find approximate 95% CIs), and otherwise software is used. Commonly, CIs are computed at 90%, 95% and 99% confidence levels.

The multiplier of 2 is not a \(z\)-score here. It would be a \(z\)-score if the value of the population standard deviation was known. Since we don't know the population standard deviation, and use the sample standard deviation instead, the multiplier is actually a \(t\)-score.

However, \(t\)- and \(z\)-multipliers are very similar, and (except for small sample sizes) using an approximate multiplier of 2 is reasonable for computing approximate 95% CIs in either case.
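To get a feel for how the multiplier changes with the confidence level, the \(z\)-multipliers can be computed from the standard normal distribution. This is only a rough sketch: it gives \(z\)-multipliers rather than the \(t\)-multipliers that software would use (which depend on the sample size and are slightly larger), and the function name `z_multiplier` is an illustrative assumption.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Return the z-multiplier for a two-sided CI at the given confidence level."""
    tail = (1 - confidence) / 2            # area left over in each tail
    return NormalDist().inv_cdf(1 - tail)  # z-score with that area above it

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} CI: multiplier is about {z_multiplier(level):.3f}")
```

The multipliers are about 1.645, 1.960 and 2.576, so using a multiplier of 2 for an approximate 95% CI is a reasonable simplification, and higher confidence levels need larger multipliers.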

If we collected many samples of a specific size, \(\bar{x}\) and \(s\) would be different for each sample, so the calculated CI would be different for each. Some CIs would straddle the population mean \(\mu\), and some would not. We never know if the CI computed from our single sample straddles \(\mu\) or not.

Loosely speaking, there is a 95% chance that our 95% CI straddles \(\mu\). For a CI computed from a single sample, we don't know if our CI includes the value of \(\mu\) or not. The CI could also be interpreted as the range of plausible values of \(\mu\) that could have produced the observed value of \(\bar{x}\).
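The "95%" can be checked by simulation in a situation where \(\mu\) is known. The sketch below reuses the die example (where \(\mu = 3.5\)), forms an approximate 95% CI from each of many samples of 25 rolls, and counts how often the CI straddles \(\mu\). The seed, the number of samples and the function name `approx_ci` are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(2)  # arbitrary seed so the run is repeatable

MU = 3.5             # population mean for a fair die (known only because we simulate)
N = 25               # rolls per sample
NUM_SAMPLES = 5_000  # number of simulated samples

def approx_ci(sample, multiplier=2):
    """Approximate CI: sample mean give-or-take multiplier * s / sqrt(n)."""
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return xbar - multiplier * se, xbar + multiplier * se

hits = 0
for _ in range(NUM_SAMPLES):
    sample = [random.randint(1, 6) for _ in range(N)]
    lower, upper = approx_ci(sample)
    if lower <= MU <= upper:
        hits += 1  # this sample's CI straddles mu

coverage = hits / NUM_SAMPLES
print(f"Proportion of CIs that straddle mu: {coverage:.3f}")
```

The proportion should be close to 0.95. Any single CI either straddles \(\mu\) or does not; the 95% describes how the procedure behaves over many samples.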

FIGURE 22.1: A CI gives a range of possible values of \(\mu\) for which it is reasonable to produce the observed value of \(\bar{x}\). The shaded regions represent the regions containing 95% of the values of \(\bar{x}\) for each value of \(\mu\).

Example 22.1 (School bags) A study of the school bags that 586 children (in Grades 6--8 in Tabriz, Iran) take to school found the mean weight was \(\bar{x} = 2.8\) kg with a standard deviation of \(s = 0.94\) kg (Dianat et al. 2014). The parameter \(\mu\) is the population mean weight of school bags for Iranian children in Grades 6--8.

Another sample of 586 children would produce a different sample mean, so the sample mean varies from sample to sample; sampling variation exists. The standard error of the sample mean is
\[ \text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{0.94}{\sqrt{586}} = 0.03883; \] see Fig. 22.2. The approximate 95% CI for the population mean school-bag weight is
\[ 2.8\pm(2 \times 0.03883), \] or \(2.8\pm0.07766\). (The margin of error is 0.07766.) This is an interval from 2.72 kg to 2.88 kg. This CI has a 95% chance of straddling the population mean bag weight.
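The same arithmetic can be checked with a few lines of code. This is a minimal sketch using only the summary statistics quoted in the example; the function name `approx_ci_from_summary` is an illustrative assumption.

```python
import math

def approx_ci_from_summary(xbar, s, n, multiplier=2):
    """Return (standard error, margin of error, lower limit, upper limit)."""
    se = s / math.sqrt(n)     # standard error of the sample mean
    margin = multiplier * se  # margin of error
    return se, margin, xbar - margin, xbar + margin

se, margin, lower, upper = approx_ci_from_summary(xbar=2.8, s=0.94, n=586)
print(f"s.e. = {se:.5f} kg, margin of error = {margin:.5f} kg")
print(f"Approximate 95% CI: {lower:.2f} kg to {upper:.2f} kg")
```

Changing `multiplier` shows how the width of the interval depends on the confidence level.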

Would a 99% CI for \(\mu\) be wider or narrower than the 95% CI? Why?

A wider interval is needed to be more confident that the interval contains the population mean.

FIGURE 22.2: The sampling distribution is a normal distribution; it shows how the sample mean bag weight varies in samples of size \(n = 586\)

22.4 Statistical validity conditions

As with any inference procedure, the underlying mathematics requires certain conditions to be met so that the results are statistically valid. The CI for one mean will be statistically valid if one of these is true:

  1. The sample size is at least 25, or
  2. The sample size is smaller than 25 and the population data has an approximate normal distribution.

The sample size of 25 is a rough figure here, and some books give other (similar) values (such as 30). This condition ensures that the sampling distribution of the sample means has an approximate normal distribution (so that, for example, the 68--95--99.7 rule can be used).

Provided the sample size is larger than about 25, this will be approximately true even if the distribution of the individuals in the population does not have a normal distribution. That is, when \(n > 25\) the sample means generally have an approximate normal distribution, even if the data themselves don't have a normal distribution.
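This claim can be explored by simulation. The sketch below draws individual observations from a right-skewed population (an exponential distribution, chosen purely for illustration), and compares the skewness of the individual observations with the skewness of the sample means from samples of size \(n = 50\). The seed, the sample sizes and the helper names are assumptions made for this example.

```python
import random
import statistics

random.seed(3)  # arbitrary seed so the run is repeatable

def skewness(values):
    """Sample skewness: about 0 for symmetric (e.g. normal) distributions."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

def draw_sample(n):
    """n observations from a right-skewed (exponential) population with mean 1."""
    return [random.expovariate(1.0) for _ in range(n)]

individuals = draw_sample(20_000)
sample_means = [statistics.fmean(draw_sample(50)) for _ in range(5_000)]

skew_individuals = skewness(individuals)
skew_means = skewness(sample_means)

print(f"Skewness of individual observations: {skew_individuals:.2f}")
print(f"Skewness of sample means (n = 50):   {skew_means:.2f}")
```

The individual observations are strongly skewed, while the sample means are much closer to symmetric, as expected if the sample means have an approximate normal distribution.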

A mean or a median may be appropriate for describing the data. However, the CI is about the mean since the sampling distribution for the sample mean (under certain conditions) has a normal distribution and so the mean is appropriate for describing the sampling distribution.

When \(n > 25\) approximately, we do not require that the data has a normal distribution. The sample means need to have a normal distribution, which is approximately true if the statistical validity condition is true.

This is one reason why means are used to describe samples: under certain conditions, sample means have an approximate normal distribution (so the 68--95--99.7 rule applies). In contrast, the distribution of sample medians is far more complicated to describe.

To judge whether it is reasonable to assume that the population has an approximate normal distribution (as required by the statistical validity condition for small samples), a histogram of the sample can be constructed. However, we can't really be sure about the distribution of the population from the distribution of the sample. All we can reasonably do is identify (from the sample) populations that are likely to be very non-normal (in which case the CI would not be valid).

Example 22.2 (Assumptions) A study (Silverman et al. 1999; Zou, Tuncali, and Silverman 2003) to examine exposure to radiation for CT scans in the abdomen assessed \(n = 17\) patients. A CI for the mean radiation dose received could be formed. However, as the sample size is 'small' (less than 25), the population data must have a normal distribution for the CI to be statistically valid.

A histogram of the total radiation dose received using the sample data (Fig. 22.3) suggests this is very unlikely. Even though the histogram is from sample data, it seems improbable that the data in the sample would have come from a population with a normal distribution.

Computing a CI for the mean of these data will probably be statistically invalid. Other methods (beyond the scope of this course) are possible for computing a CI for the mean.

FIGURE 22.3: The radiation doses from CT scans for 17 people

Example 22.3 (School bags) In Example 22.1, an approximate 95% CI was formed for the mean weight of school bags for Iranian children. Since the sample size was \(n = 586\), the CI is statistically valid. We do not have to assume that the distribution of school bag weights has a normal distribution in the population, as the sample size is (much) larger than 25.

22.5 Example: NHANES

Previously, this RQ was asked about the NHANES data:

Among Americans, is the mean direct HDL cholesterol different for current smokers and non-smokers?

The response variable is direct HDL cholesterol concentration. The parameter is \(\mu\), the population mean HDL cholesterol concentration. What is the population mean direct HDL cholesterol concentration?

From the data (using jamovi or SPSS), the sample mean is \(\bar{x} = 1.3649\) mmol/L; the standard deviation is \(s =0.39926\) mmol/L; and the sample size is \(n = 8474\). The value of \(\bar{x}\) will vary from sample to sample; sampling variation exists. The standard error is:

\[ \text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{ 0.39926} {\sqrt{8474}} = 0.00434\text{ mmol/L}. \] The approximate 95% CI uses a multiplier of \(2\), so (using the standard error before rounding) the margin of error is \(2\times 0.004337 = 0.00867\). The approximate 95% CI is \(1.365\), give-or-take \(0.00867\); or from \(1.356\) to \(1.374\) mmol/L:

Based on the sample of size \(n = 8474\), a 95% CI for the population mean direct HDL cholesterol levels of Americans is between \(1.356\) and \(1.374\) mmol/L.
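These values can be reproduced from the summary statistics alone. The following is a small sketch (not output from jamovi or SPSS), which keeps the standard error unrounded until the final step so that rounding does not affect the margin of error.

```python
import math

xbar, s, n = 1.3649, 0.39926, 8474  # summary statistics quoted above

se = s / math.sqrt(n)               # standard error, kept unrounded
margin = 2 * se                     # approximate 95% CI uses a multiplier of 2
lower, upper = xbar - margin, xbar + margin

print(f"s.e. = {se:.5f} mmol/L")
print(f"Margin of error = {margin:.5f} mmol/L")
print(f"Approximate 95% CI: {lower:.3f} to {upper:.3f} mmol/L")
```

Because the sample is very large, the standard error is small and the CI is narrow.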

If many samples of the same size were collected in the same way, and a CI computed from each, about 95% of the CIs would contain \(\mu\) (but this particular CI may or may not contain the value of \(\mu\)). Alternatively, the CI gives a range of plausible values for \(\mu\); or, we are about 95% confident that this CI straddles the value of \(\mu\).

Since the sample size is much larger than 25, this CI for mean direct HDL cholesterol is statistically valid, even though the histogram of direct HDL cholesterol for individuals is skewed right (Fig. 22.4). It is the sample means, not the individual observations, that need to have an approximate normal distribution.

FIGURE 22.4: Histogram of direct HDL cholesterol concentration

22.6 Example: cadmium in peanuts

A study of peanuts from the United States (Blair and Lamb 2017) found the sample mean cadmium concentration was 0.0768 ppm with a standard deviation of 0.0460 ppm, from a sample of 290 peanuts gathered from a variety of regions at various times (attempting to find a representative sample). The parameter is \(\mu\), the population mean cadmium concentration in peanuts.

Every sample of \(n = 290\) peanuts is likely to produce a different sample mean, so sampling variation exists and can be measured using the standard error:

\[ \text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}} = \frac{0.0460}{\sqrt{290}} = 0.002701\text{ ppm}. \] The approximate 95% CI is \(0.0768 \pm (2 \times 0.002701)\), or \(0.0768 \pm 0.00540\), which is from 0.0714 to 0.0822 ppm. (The margin of error is 0.00540.)

If we repeatedly took samples of size 290 from this population, about 95% of the 95% CIs would contain the population mean (but this CI may or may not contain the value of \(\mu\)). The plausible values of \(\mu\) that could have produced \(\bar{x} = 0.0768\) are between 0.0714 and 0.0822 ppm. Alternatively, we are about 95% confident that the CI of 0.0714 to 0.0822 ppm straddles the population mean.

Since the sample size is larger than \(25\), the CI is statistically valid.

22.7 Quick review questions

  1. True or false: The value of \(\bar{x}\) varies from sample to sample.

  2. True or false: A CI for \(\mu\) is statistically valid only if the histogram of the data has an approximate normal distribution.

  3. Suppose \(s = 8\) and \(n=20\). Which one of the following is true?

22.8 Exercises

Selected answers are available in Sect. D.21.

Exercise 22.1 A study of American black bears (Bartareau 2017) found the mean weight of the \(n = 185\) male bears was \(\bar{x} = 84.9\) kg, with a standard deviation of \(s = 51.1\) kg.

  1. Write down the parameter of interest.
  2. Compute the standard error of the mean.
  3. Compute the approximate 95% CI.
  4. Write a conclusion.
  5. Is the CI likely to be statistically valid?

Exercise 22.2 A study of the lung capacity of children in East Boston (Tager et al. 1979; Kahn 2005) measured the forced expiratory volume (FEV) of children in the area. The sample contained \(n = 45\) eleven-year-old girls. For these children, the mean lung capacity was \(\bar{x} = 2.85\) litres and the standard deviation was \(s = 0.43\) litres.

Find an approximate 95% CI for the population mean lung capacity of eleven-year-old females from East Boston.

Exercise 22.3 A study of lead smelter emissions near children's public playgrounds (Taylor et al. 2013) found the mean lead concentration at one playground (Memorial Park, Port Pirie, in South Australia) to be 6956.41 micrograms per square metre, with a standard deviation of 7571.74 micrograms of lead per square metre, from a sample of \(n = 58\) wipes taken over a seven-day period. (As a reference, the Western Australian Government recommends a maximum of 400 micrograms of lead per square metre.)

Find an approximate 95% CI for the mean lead concentration at this playground. Would these results apply to other playgrounds?

Exercise 22.4 A study (Ian D. M. Macgregor and Rugg-Gunn 1985) of the brushing time for 60 young adults (aged 18--22 years old) found the mean brushing time was 33.0 seconds, with a standard deviation of 12.0 seconds. Find an approximate 95% CI for the mean brushing time for young adults.

Exercise 22.5 A study of paramedics (B. Williams and Boyle 2007) asked participants (\(n = 199\)) to estimate the amount of blood loss on four different surfaces. When the actual amount of blood spill on concrete was 1000 ml, the mean guess was 846.4 ml (with a standard deviation of 651.1 ml).

  1. What is the approximate 95% CI for the mean guess of blood loss?
  2. Are the participants good at estimating the amount of blood loss on concrete?
  3. Is this CI likely to be valid?

Exercise 22.6 In Sect. 22.5, the approximate 95% CI for the mean direct HDL cholesterol was given as \(1.356\) to \(1.374\) mmol/L. Which (if any) of these interpretations are acceptable? Explain why the other interpretations are incorrect.

  1. In the sample, about 95% of individuals have a direct HDL concentration between \(1.356\) and \(1.374\) mmol/L.
  2. In the population, about 95% of individuals have a direct HDL concentration between \(1.356\) and \(1.374\) mmol/L.
  3. About 95% of the samples are between \(1.356\) and \(1.374\) mmol/L.
  4. About 95% of the populations are between \(1.356\) and \(1.374\) mmol/L.
  5. The population mean varies so that it is between \(1.356\) and \(1.374\) mmol/L about 95% of the time.
  6. We are about 95% sure that the sample mean is between \(1.356\) and \(1.374\) mmol/L.
  7. It is plausible that the sample mean is between \(1.356\) and \(1.374\) mmol/L.

Exercise 22.7 An article (Grabosky and Bassuk 2016) describes the diameter of Quercus bicolor trees planted in a lawn as having a mean of 25.8 cm, with a standard error of 0.64 cm, from a sample of 19 trees. Which (if any) of the following is correct?

  1. About 95% of the trees in the sample will have a diameter between \(25.8 - (2\times 0.64)\) and \(25.8 + (2\times 0.64)\) (based on using the 68--95--99.7 rule).
  2. About 95% of these types of trees in the population will have a diameter between \(25.8 - (2\times 0.64)\) and \(25.8 + (2\times 0.64)\) (based on using the 68--95--99.7 rule).

Exercise 22.8 In a study of \(n = 30\) five-year-old children (Watanabe et al. 1995), the mean time for the children to eat a cookie was 61.3 s, with a standard deviation of 29.4 s.

  1. What is an approximate 95% CI for the population mean time for a five-year-old child to eat a cookie?
  2. Is the CI likely to be statistically valid?