24 Confidence intervals for one proportion

So far, you have learnt to ask a RQ, design a study, describe and summarise the data, and understand how sample statistics vary from sample to sample. In this chapter, you will learn to construct confidence intervals for one proportion. You will learn to:

  • identify situations where computing a sample proportion is appropriate.
  • form confidence intervals for single proportions.

24.1 Sampling distribution for \(\hat{p}\): known proportion

Suppose a fair, six-sided die is rolled \(25\) times. What proportion of the rolls will produce an even number? That is, what will be the value of the sample proportion of numbers that are even? Of course, no-one knows what proportion will be even for any sample of \(25\) rolls. In addition, the proportion of the \(25\) rolls that will be even will not be the same for every sample of \(25\) rolls. The sample proportion varies from sample to sample: sampling variation exists.

We have seen that the value of the sample statistic often varies from sample to sample, approximately following a normal distribution (whose standard deviation is called the standard error). However, being more specific when describing the sampling distribution is useful.

Remember: studying a sample leads to the following observations:

  • Every sample is likely to be different.
  • We observe just one of the many possible samples.
  • Every sample is likely to yield a different value for the sample statistic.
  • We observe just one of the many possible values for the statistic.

Since many values for the sample proportion are possible, the possible values of the sample proportion vary (called sampling variation) and have a distribution (called a sampling distribution).

To better understand the sampling distribution for the proportion of even numbers in \(25\) rolls of a die, statistical theory could be used... or thousands of repetitions of a sample of \(25\) rolls could be performed... or a computer could simulate many samples of \(25\) rolls. Let's simulate rolling a die \(25\) times, using just ten samples of \(25\) rolls, and find the value of \(\hat{p}\) for each; see the animation below.
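For readers who like to experiment, a minimal simulation sketch (in Python with numpy; the seed and the layout of the output are arbitrary choices, not part of the text) reproduces the idea behind the animation: ten samples of \(25\) rolls, and the proportion of even numbers in each.

```python
import numpy as np

rng = np.random.default_rng(seed=1)   # seed chosen arbitrarily, for reproducibility

n_rolls = 25      # rolls per sample
n_samples = 10    # number of samples of 25 rolls

for i in range(n_samples):
    rolls = rng.integers(1, 7, size=n_rolls)   # faces 1 to 6, inclusive
    p_hat = np.mean(rolls % 2 == 0)            # proportion of even faces in this sample
    print(f"Sample {i + 1:2d}: p-hat = {p_hat:.2f}")
```

Running this a few times shows the same thing as the animation: every set of ten samples gives a different collection of \(\hat{p}\) values.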

In this example, the population proportion of even rolls is \(p = 0.5\) (using the classical approach to probability: three of the six faces of the die are even). Each sample of \(n = 25\) rolls produces a sample proportion, denoted by \(\hat{p}\), which varies from sample to sample. For these ten samples, the proportion of even rolls ranged from \(\hat{p} = 0.32\) to \(\hat{p} = 0.60\).

The sample proportions would be expected to vary around \(p = 0.5\) (the population proportion). Of course, the sample proportion in \(25\) rolls could be very small or very high by chance, but we wouldn't expect to see that very often. The sample proportions exhibit sampling variation, and the amount of sampling variation is quantified using a standard error.

\(p\) refers to the population proportion, and \(\hat{p}\) refers to the sample proportion.

The symbol \(\hat{p}\) is pronounced 'pee-hat'. (The \(\hat{\null}\) is called a 'hat'.)

Suppose a fair die was rolled \(25\) times, and this was repeated thousands of times (not just for ten samples as in the animation above), and the proportion of even rolls was recorded for every one of those thousands of samples. These thousands of sample proportions \(\hat{p}\), one from every sample of rolls, could be graphed using a histogram; see the animation below.

The shape of the histogram is roughly a normal distribution. This is no accident: statistical theory says this will happen (when certain conditions are met: see Sect. 24.6). The mean of this distribution is called the sampling mean, and the standard deviation for this distribution is called the standard error, denoted \(\text{s.e.}(\hat{p})\) (see Fig. 24.1).

More specifically, the values of the mean and standard deviation of the normal distribution in the animation above can be determined using statistical theory. The sampling distribution for \(\hat{p}\) has:

  • an approximate normal distribution,
  • centred around a sampling mean whose value is \(p = 0.5\),
  • with a standard deviation, called the standard error \(\text{s.e.}(\hat{p})\), whose value is \(0.1\) (where this number comes from will be revealed later, in Eq. (24.1)).

This distribution is called a sampling distribution (Sect. 21.1), whose standard deviation is called a standard error. A picture of this normal distribution can be drawn (Fig. 24.1). While we still don't know exactly what value of \(\hat{p}\) the next sample of \(25\) rolls will produce, we have some idea of how the sample proportion varies in samples of \(25\) rolls. For instance, values of \(\hat{p}\) less than \(0.2\), or greater than \(0.8\), are unlikely to be observed.


FIGURE 24.1: The normal distribution, showing a model of how the proportion of even rolls varies when a die is rolled \(25\) times

The parameter \(p\) and the statistic \(\hat{p}\) are both proportions.

While the value of \(\hat{p}\) varies from sample to sample, the average value of the sample proportions can be described by a sampling mean, whose value is \(p\), and the amount of variation in the sample proportions can be described by a standard deviation (called a standard error in this context), \(\text{s.e.}(\hat{p})\).

That is, the sampling mean of the sampling distribution is the 'average' value of all possible sample proportions \(\hat{p}\).

The value of the standard error for a sample proportion, when the value of \(p\) is known, is
\[ \text{s.e.}(\hat{p}) = \sqrt{\frac{p \times (1 - p)}{n}}, \] where \(n\) is the sample size used to compute \(\hat{p}\), and \(p\) is the population proportion. For the die example, where \(n = 25\) rolls and the population proportion of even rolls is \(p = 0.5\), the standard error of the sample proportion is
\[\begin{equation} \text{s.e.} (\hat{p}) = \sqrt{\frac{0.5 \times (1 - 0.5)}{25}} = 0.1. \tag{24.1} \end{equation}\] This standard error is the standard deviation of the normal distribution in Fig. 24.1.
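As an empirical check on Eq. (24.1), a simulation sketch (again in Python with numpy; the seed and the number of repetitions are arbitrary) can generate thousands of samples of \(25\) rolls and compare the mean and standard deviation of the simulated \(\hat{p}\) values with the theoretical values of \(0.5\) and \(0.1\).

```python
import numpy as np

rng = np.random.default_rng(seed=2)
n, reps = 25, 10_000                        # 25 rolls per sample; 10 000 samples

rolls = rng.integers(1, 7, size=(reps, n))  # each row is one sample of 25 rolls
p_hats = (rolls % 2 == 0).mean(axis=1)      # one p-hat per sample

print(p_hats.mean())   # close to p = 0.5 (the sampling mean)
print(p_hats.std())    # close to s.e.(p-hat) = 0.1, as in Eq. (24.1)
```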

However, almost always, the value of \(p\) is unknown. This situation is studied from Sect. 24.3 onwards.

Definition 24.1 (Sampling distribution of a sample proportion with $p$ known) When the value of \(p\) is known, the sampling distribution of the sample proportion is (when certain conditions are met; Sect. 24.6) described by

  • an approximate normal distribution,
  • centred around the sampling mean whose value is \(p\),
  • with a standard deviation (called the standard error of \(\hat{p}\)) \(\text{s.e.}(\hat{p})\), whose value is
    \[\begin{equation} \text{s.e.}(\hat{p}) = \sqrt{\frac{ p \times (1 - p)}{n}}, \tag{24.2} \end{equation}\] where \(n\) is the size of the sample, and \(p\) is the population proportion. In general, the approximation gets better as the sample size gets larger.

Based on the sampling distribution in Fig. 24.1, how often would a value of \(\hat{p}\) larger than \(0.80\) be expected?

Figure 24.1 suggests that, while not impossible, a value of \(\hat{p}\) of \(0.80\) or greater would rarely be observed.

24.2 Sampling intervals: known proportion

Since the possible values of the sample proportions \(\hat{p}\) can be described by an approximate normal distribution, the \(68\)--\(95\)--\(99.7\) rule (Def. 22.1) applies. For example (Fig. 24.1), about \(68\)% of the time, a sample of \(25\) rolls will have a value of \(\hat{p}\) of \(0.5\), give-or-take one standard deviation (that is, give-or-take \(0.1\)). So, about \(68\)% of the time, the proportion of even rolls in a sample of \(25\) rolls will be between \(0.5 - 0.1 = 0.4\) and \(0.5 + 0.1 = 0.6\). Similarly, about \(95\)% of the time, the proportion of even rolls will be \(0.5\) give-or-take \(2\times 0.1\), or between \(0.3\) and \(0.7\).

These intervals tell us what values of \(\hat{p}\) are likely to be observed in samples of size \(25\). Most of the time (i.e., approximately \(95\)% of the time), the value of \(\hat{p}\) is expected to be between \(0.30\) and \(0.70\). (In the animation above, all ten samples of \(25\) rolls (i.e., \(100\)%) had a sample proportion between \(0.30\) and \(0.70\).)

Formally, the sample proportion \(\hat{p}\) is likely to lie within the interval
\[ p \pm (\text{multiplier} \times \text{s.e.}(\hat{p})), \] where \(\text{s.e.}(\hat{p})\) is the standard error of the sample proportion (calculated using Eq. (24.2)), and the multiplier comes from the \(68\)--\(95\)--\(99.7\) rule, depending on the level of coverage being sought. This is called a sampling interval.


FIGURE 24.2: A known value of \(p\) produces a range of \(\hat{p}\) values.

The symbol '\(\pm\)' means 'plus or minus', or (colloquially) 'give-or-take'.

The multiplier depends on how confident we wish to be that the interval contains the value of \(\hat{p}\). For a \(95\)% interval, the multiplier is approximately \(2\), based on the \(68\)--\(95\)--\(99.7\) rule: Approximately \(95\)% of observations are within two standard deviations of the value of \(p\) (the mean of the normal distribution in Fig. 24.1). That is, the approximate \(95\)% sampling interval is:
\[\begin{equation} p \pm (2 \times \text{s.e.}(\hat{p}) ). \tag{24.3} \end{equation}\] For a \(90\)% sampling interval, for example, either tables or a computer would be used to find the correct multiplier, since the \(68\)--\(95\)--\(99.7\) rule isn't helpful.
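A small sketch of Eq. (24.3) in Python shows the arithmetic; scipy is assumed only to supply the exact multiplier for a \(90\)% interval, which the \(68\)--\(95\)--\(99.7\) rule cannot provide.

```python
import numpy as np
from scipy import stats

p, n = 0.5, 25
se = np.sqrt(p * (1 - p) / n)      # 0.1, as in Eq. (24.1)

lo, hi = p - 2 * se, p + 2 * se    # approximate 95% sampling interval
print(lo, hi)                      # 0.3 and 0.7

mult_90 = stats.norm.ppf(0.95)     # multiplier for a 90% interval (about 1.645)
print(p - mult_90 * se, p + mult_90 * se)
```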

24.3 Sampling distribution for \(\hat{p}\): unknown proportion

In the die example (Sects. 24.1 and 24.2), the value of \(p\) was known. However, usually the value of \(p\) (the parameter) is unknown; after all, the reason for taking a sample is to estimate the unknown value of \(p\).

When \(p\) is unknown, the best available estimate of \(p\) (which is \(\hat{p}\)) is used to compute the standard error. When the value of \(p\) is unknown, the standard error of the sample proportion (written \(\text{s.e.}(\hat{p})\)) is approximately
\[ \text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1 - \hat{p})}{n}}. \]

Definition 24.2 (Sampling distribution of a sample proportion with $p$ unknown) When the value of \(p\) is unknown, the sampling distribution of the sample proportion is (when certain conditions are met; Sect. 24.6) described by

  • an approximate normal distribution,
  • centred around the sampling mean, whose value is \(p\),
  • with a standard deviation (called the standard error of \(\hat{p}\)) whose value is
    \[\begin{equation} \text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1-\hat{p})}{n}}, \tag{24.4} \end{equation}\] where \(n\) is the size of the sample, and \(\hat{p}\) is the sample proportion. In general, the approximation gets better as the sample size gets larger.

When computing the standard error for a proportion, take care!

Make sure you use a proportion in the formula, not a percentage (e.g., \(0.5\) rather than \(50\)%). Also, don't forget to take the square root!

Let's pretend for the moment that the proportion of even rolls on a die is unknown (to demonstrate ideas). An estimate of the proportion of even rolls can be found by rolling a die \(n = 25\) times, and computing \(\hat{p}\) (an estimate of \(p\)).

Suppose \(11\) of the \(n = 25\) rolls produced an even number, so that \(\hat{p} = 11/25 = 0.44\). The unknown value of \(p\) could be a bit larger than \(\hat{p} = 0.44\), or a bit smaller than \(\hat{p} = 0.44\). In other words, the unknown value of \(p\) is likely to be \(\hat{p}\), give-or-take a bit.

Since the sampling distribution has an approximate normal distribution (Def. 24.2), the \(68\)--\(95\)--\(99.7\) rule can be used to compute the approximate give-or-take amount (using the ideas in Sect. 24.2): the give-or-take amount, called the margin of error, is \(\left(\text{multiplier}\times\text{s.e.}(\hat{p})\right)\). This means that the interval is
\[ \hat{p} \pm \left(\text{multiplier}\times\text{s.e.}(\hat{p})\right), \] for a suitable multiplier. This interval for \(p\) is called a confidence interval.

Using \(\hat{p} = 0.44\) and \(n = 25\), then
\[ \text{s.e.}(\hat{p}) = \sqrt{ \frac{ 0.44 \times (1 - 0.44)}{25}} = 0.099277. \] So, the (approximate) \(95\)% CI is \(0.44 \pm (2 \times 0.099277)\), or from \(0.241\) to \(0.639\). This confidence interval is an interval containing values of \(p\) that could have reasonably produced the observed value of \(\hat{p}\) (Fig. 24.3), with \(95\)% confidence.
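The same arithmetic can be checked with a few lines (a sketch in Python; the numbers match the working above up to rounding).

```python
import numpy as np

p_hat, n = 0.44, 25                      # 11 even rolls out of 25
se = np.sqrt(p_hat * (1 - p_hat) / n)    # about 0.099277
print(se)

lo, hi = p_hat - 2 * se, p_hat + 2 * se  # approximate 95% CI
print(round(lo, 3), round(hi, 3))        # about 0.241 and 0.639
```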


FIGURE 24.3: Various possible values for \(p\), and the corresponding \(95\)% sampling intervals (i.e., the values of \(\hat{p}\) these are likely to produce).

In general, we do not know if the computed interval straddles the value of \(p\), since the value of \(p\) is usually unknown. However, in this contrived example, the CI does straddle the known value of \(p = 0.5\).

In this case, we know the value of the population parameter: \(p = 0.5\). Usually we do not know the value of the parameter. After all, that's why we take a sample: to estimate the value of the population proportion.

24.4 Confidence intervals for \(p\): unknown proportion

Suppose thousands of people rolled a die \(25\) times, and each person found \(\hat{p}\) for their sample, and hence computed the CI for their sample of \(25\) rolls. Every sample of \(25\) rolls could produce a different estimate \(\hat{p}\), and so a different value for \(\text{s.e.}(\hat{p})\), and hence a different \(95\)% CI. However, about \(95\)% of these thousands of confidence intervals from those thousands of samples would straddle the true proportion \(p\).

Since we usually don't know the value of \(p\), and we usually only have one sample (and hence one CI), in general we never know whether any single CI computed from our single sample straddles \(p\) or not.

Again, consider letting the computer simulate the situation. Suppose the process of recording the sample proportion of even numbers in \(n = 25\) rolls is repeated fifty times, and for each of those fifty sets of 25 rolls a CI is produced (see the animation below). About \(95\)% of those \(95\)% CIs straddle the value \(p = 0.5\) (shown as solid lines)... but some do not (shown as dashed lines). Of course, since the value of \(p\) is usually unknown, we never know if our CI from a single sample contains \(p\) or not.
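This coverage idea can also be simulated directly. The sketch below (Python with numpy; the seed and the number of repetitions are arbitrary choices) forms an approximate \(95\)% CI from each of many simulated samples of \(25\) rolls, and counts how often the CI straddles \(p = 0.5\); the resulting proportion should be close to \(95\)%.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
n, reps, p = 25, 10_000, 0.5

covered = 0
for _ in range(reps):
    rolls = rng.integers(1, 7, size=n)
    p_hat = np.mean(rolls % 2 == 0)
    se = np.sqrt(p_hat * (1 - p_hat) / n)    # s.e. using p-hat, as in Eq. (24.4)
    lo, hi = p_hat - 2 * se, p_hat + 2 * se  # approximate 95% CI
    if lo <= p <= hi:
        covered += 1

print(covered / reps)   # proportion of CIs straddling p = 0.5; close to 0.95
```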

Definition 24.3 (Confidence interval) A confidence interval (CI) is an interval in which the population parameter is likely to lie. If a \(95\)% CI were computed from each of many samples, found in the same way, about \(95\)% of those CIs would straddle the parameter of interest. Almost always, we only have one sample.

A confidence interval is an interval containing values of \(p\) that could have reasonably produced the observed value of \(\hat{p}\) (Fig. 24.3). In general, a CI for the population proportion \(p\) is found using
\[ \hat{p} \pm ( \text{multiplier} \times \text{s.e.}(\hat{p})), \] where the multiplier is \(2\) for an approximate \(95\)% CI (based on the \(68\)--\(95\)--\(99.7\) rule).

Definition 24.4 (Confidence interval for $p$) A confidence interval (CI) for the unknown value of the population proportion \(p\) is
\[\begin{equation} \hat{p} \pm ( \text{multiplier} \times \text{s.e.}(\hat{p})), \tag{24.5} \end{equation}\] where \(( \text{multiplier} \times \text{s.e.}(\hat{p}))\) is the margin of error, and \[ \text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1 - \hat{p}) }{n}} \] is the standard error of \(\hat{p}\), where \(\hat{p}\) is the sample proportion, and \(n\) is the sample size. For an approximate \(95\)% CI, the multiplier is \(2\). The quantity \(( \text{multiplier} \times \text{s.e.}(\hat{p}))\) is called the margin of error.
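Definition 24.4 can be packaged as a small helper (a minimal sketch in Python; the name prop_ci is an invented convenience for this chapter, not a standard library routine).

```python
import numpy as np

def prop_ci(p_hat: float, n: int, multiplier: float = 2.0):
    """Approximate CI for a population proportion, following Eq. (24.5)."""
    se = np.sqrt(p_hat * (1 - p_hat) / n)   # standard error, Eq. (24.4)
    moe = multiplier * se                   # margin of error
    return p_hat - moe, p_hat + moe

# The die example from Sect. 24.3: 11 even rolls in 25.
print(prop_ci(11 / 25, 25))   # roughly (0.241, 0.639)
```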

In general, higher confidence means wider intervals (Fig. 24.4), since wider intervals are needed to be more certain that the interval contains \(p\). Try changing the confidence level for the CI in the interaction below.

FIGURE 24.4: Changing the confidence level of the CI changes the width, for any given sample size

Using the \(68\)--\(95\)--\(99.7\) rule produces approximate multipliers and hence approximate CIs. In reality, finding the exact multipliers (and hence exact CIs) is more involved.

In this book, we use multipliers from the \(68\)--\(95\)--\(99.7\) rule and create approximate CIs. Except for small sample sizes, the approximations are generally very good. To form exact CIs, software would be used.
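For instance, one widely used option (assuming the Python statsmodels package is available; its result differs slightly from the hand calculation because it uses the exact multiplier \(1.96\) rather than \(2\)) is sketched below.

```python
# A sketch assuming statsmodels is installed; proportion_confint is its routine
# for proportion CIs (method='normal' matches the approach in this chapter, but
# with multiplier 1.96; other methods, such as 'wilson', are also offered).
from statsmodels.stats.proportion import proportion_confint

lo, hi = proportion_confint(count=11, nobs=25, alpha=0.05, method="normal")
print(lo, hi)   # close to the hand-computed 0.241 to 0.639
```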

24.5 Interpretation of a CI

The correct interpretation (Def. 24.3) of a \(95\)% CI is the following:

If the same size samples were repeatedly taken many times, and the \(95\)% confidence interval computed for each sample, \(95\)% of these confidence intervals formed would contain the population parameter.

In Sect. 24.4, the CI was interpreted as giving a range of values of \(p\) that could reasonably be expected to produce the observed value of \(\hat{p}\). The CI can also be seen as having a \(95\)% chance of straddling the value of the parameter. These are close to the correct interpretation.

Commonly, the CI is interpreted as having a \(95\)% chance of containing the value of the population parameter \(p\). This is not strictly correct (the CI either does or does not contain the value of \(p\)), but it is a convenient shorthand that captures the essence of the correct interpretation. More details on interpreting a CI are given in Sect. 26.3.

24.6 Statistical validity conditions

The histogram in Sect. 24.1 shows the proportion of \(n = 25\) rolls that were even for many samples; it has an approximate normal distribution. Because of this, the \(68\)--\(95\)--\(99.7\) rule could be used to determine the multipliers and form the approximate \(95\)% CIs. However, certain conditions must be met for the sampling distribution to have an approximate normal distribution, and hence for the CI to be statistically valid. Whenever a confidence interval is formed, the relevant statistical validity conditions need to be checked.

Definition 24.5 (Statistical validity) A CI is statistically valid if the conditions for the underlying mathematical calculations and assumptions to be approximately correct are met.

Example 24.1 (Statistical validity analogy) Suppose your doctor asks you to get a blood test, after fasting (refraining from eating) for \(12\) hours before your test.

The next day, you have a big breakfast, lunch at a cafe, and then have your blood test. Your blood is analysed, and your doctor is emailed the results of the blood test.

Since you did not fast, the results may or may not be valid. The doctor can learn something... but not as much as if you had followed instructions. Similarly, if the conditions for computing the confidence interval are not met, we can still perform the mathematical calculations and obtain values... but the results may be suspect.

The statistical validity conditions for creating CI for a single proportion are that:

  • the number of individuals in the group of interest must exceed \(5\), and
  • the number of individuals in the group not of interest must exceed \(5\).

These conditions ensure that the sampling distribution of \(\hat{p}\) has an approximate normal distribution, so that the \(68\)--\(95\)--\(99.7\) rule (approximately) applies. If these conditions are not met, the normal distribution may not approximate the sampling distribution well, so the \(68\)--\(95\)--\(99.7\) rule may be inappropriate, and the CI may be inaccurate.
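The two conditions are easy to check in code (a sketch; the helper name is invented for illustration). The two checks mirror Examples 24.2 and 24.3 below.

```python
def valid_for_prop_ci(count_in_group: int, n: int) -> bool:
    """Check the statistical validity conditions for a CI for one proportion."""
    in_group = count_in_group           # number in the group of interest
    not_in_group = n - count_in_group   # number not in the group of interest
    return in_group > 5 and not_in_group > 5

print(valid_for_prop_ci(11, 25))   # True: 11 even and 14 odd rolls both exceed 5
print(valid_for_prop_ci(1, 10))    # False: only 1 individual in the group of interest
```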

Example 24.2 (Statistical validity) For the die-throwing example in Sect. 24.3, \(n = 25\) and \(\hat{p} = 0.44\). This means there were \(11\) even rolls, and \(14\) odd rolls.

Both these values exceed \(5\), so the CI is statistically valid.

Example 24.3 (Statistical validity conditions) Consider a situation where \(p = 0.1\).

A sample of size \(n = 10\) is taken, giving \(\hat{p} = 0.1\). The statistical validity conditions are not satisfied: the sampling distribution is not well modelled by a normal distribution (Fig. 24.5, left panel). Using a normal distribution to model the sampling distribution would be silly.

In contrast, assume a sample of size \(n = 100\) is taken, giving \(\hat{p} = 0.1\). The statistical validity conditions are satisfied, and the sampling distribution is well modelled by a normal distribution (Fig. 24.5, right panel).


FIGURE 24.5: Two proposed sampling distributions. Left: When the statistical validity conditions are not met. Right: when the statistical validity conditions are met.

24.7 Example: female coffee drinkers

A study of \(360\) female college students in the United States (Kelpin et al. 2018) found that \(61\) drank coffee daily. The unknown parameter is \(p\), the population proportion of female college students in the United States that drink coffee daily.

The sample size is \(n = 360\), and the sample proportion of daily coffee drinkers is \(\hat{p} = 61/360 = 0.16944\). The sample proportion has sampling variation, measured by the standard error:
\[ \text{s.e.}(\hat{p}) = \sqrt{ \frac{ 0.16944 \times (1 - 0.16944)}{360}} = 0.01977. \] An approximate \(95\)% CI is \(0.16944 \pm (2 \times 0.01977)\), or \(0.16944 \pm 0.03954\) (i.e., the margin of error is \(0.03954\)). Equivalently, the approximate \(95\)% CI is from \(0.130\) to \(0.209\), after rounding appropriately. We write:

The sample proportion of female US college students who drank coffee daily is \(\hat{p} = 0.169\) (\(n = 360\)), with an approximate \(95\)% CI from \(0.130\) to \(0.209\).

That is, the plausible values for \(p\) that may have led to this value of \(\hat{p} = 0.169\) are between \(0.130\) and \(0.209\). (This CI may or may not contain the true proportion \(p\).) This CI is statistically valid, since both the \(61\) students who drink coffee daily and the \(299\) who do not exceed \(5\).

Many decimal places are used in the working, but final answers are rounded.
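As a check, the coffee-drinking CI can be reproduced with the prop_ci() and valid_for_prop_ci() sketches from earlier in the chapter (again, these are illustrative helpers, not standard functions).

```python
# Assumes prop_ci() and valid_for_prop_ci() from the earlier sketches are defined.
p_hat = 61 / 360                    # 0.16944...
lo, hi = prop_ci(p_hat, 360)        # approximate 95% CI
print(round(lo, 3), round(hi, 3))   # about 0.130 and 0.209

print(valid_for_prop_ci(61, 360))   # True: 61 drinkers and 299 non-drinkers both exceed 5
```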

24.8 Chapter summary

To compute a confidence interval (CI) for a proportion, compute the sample proportion, \(\hat{p}\), and identify the sample size \(n\). Then compute the standard error, which quantifies how much the value of \(\hat{p}\) varies across all possible samples:
\[ \text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1-\hat{p})}{n}}. \] The margin of error is (Multiplier\(\times\)standard error), where the multiplier is \(2\) for an approximate \(95\)% CI (using the \(68\)--\(95\)--\(99.7\) rule). Then the CI is:
\[ \hat{p} \pm \left( \text{Multiplier}\times\text{standard error} \right). \] The statistical validity conditions should also be checked.

You must use proportions in these formulas, not percentages; that is, use values between \(0\) and \(1\) (like \(0.169\) rather than \(16.9\)%).

24.9 Quick review questions

  1. True or false: \(p\) is called a parameter.
  2. True or false: The value of \(p\) will vary from sample to sample.
  3. True or false: The standard error refers to the sampling variation in \(p\).
  4. Suppose \(n = 50\) and \(\hat{p} = 0.4\). What is the standard error of \(\hat{p}\)?

24.10 Exercises

Answers to odd-numbered exercises are available in App. E.

Exercise 24.1 A study of hiccups (G.-W. Lee et al. 2016) found that, of \(864\) patients examined (across different studies) who had hiccups, \(708\) were male.

  1. Compute the sample proportion of people with hiccups who are male.
  2. Find an approximate \(95\)% CI for the proportion of people with hiccups who are male.
  3. Check if the statistical validity conditions are met or not.
  4. Draw a sketch of how the sample proportion varies for samples of size \(864\).

Exercise 24.2 A study of how paramedics administer pain medication (Lord, Cui, and Kelly 2009) found that \(791\) of the \(1766\) patients in the study who reported pain did not receive pain relief.

  1. Compute the sample proportion of patients who received pain medication.
  2. Find an approximate \(95\)% CI for the proportion of patients who receive pain medication.
  3. Check if the statistical validity conditions are met or not.
  4. Draw a sketch of how the sample proportion varies for samples of size \(1766\).

Exercise 24.3 A study of the eating habits of university students in Canada (Mann and Blotnicky 2017) found that \(8\) students out of \(154\) met the recommendation for eating a sufficient number of servings of grains each day.

  1. Find an approximate \(95\)% CI for the population proportion of Canadian students that meet the recommendation for eating a sufficient number of servings of grains each day.
  2. Would these results be likely to apply to US university students? Explain.

Exercise 24.4 A study of koalas (C. E. Dexter et al. 2018) found that \(18\) of the \(n = 51\) koalas studied in a certain area over \(30\) months had crossed at least one road during that time. The unknown parameter is \(p\), the population proportion of koalas that had crossed at least one road over the \(30\) months.

  1. Find an approximate \(95\)% CI for the proportion of koalas that had crossed the road at least once.
  2. Check if the statistical validity conditions are met or not.
  3. Draw a sketch of how the sample proportion varies for samples of size \(51\).

Exercise 24.5 A study of salt intake in the United Kingdom (Sutherland et al. 2012) found that \(2\ 182\) out of the \(6\ 882\) people sampled in 2007 'generally added salt at the table'. Find an approximate \(95\)% CI for the population proportion of Britons that generally add salt at the table.

Exercise 24.6 A study of turbine failures (Myers, Montgomery, and Vining 2002; Nelson 1982) ran \(42\) turbines for around \(3\ 000\) hours, and found that nine developed fissures (small cracks). Find a \(95\)% CI for the true proportion of turbines that would develop fissures after \(3\ 000\) hours of use. Are the statistical validity conditions satisfied?

The study also ran \(39\) turbines for around \(400\) hours, and found that zero developed fissures. Find a \(95\)% CI for the true proportion of turbines that would develop fissures after \(400\) hours of use. Are the statistical validity conditions satisfied?

Exercise 24.7 A study of young Canadians aged \(12\)--\(24\) (Hammond, Reid, and Zukowski 2018) found \(365\) of the \(1516\) respondents reported sleeping difficulties after consuming energy drinks. Find a \(95\)% CI for the true proportion of young Canadians who report sleeping difficulties after consuming energy drinks. Are the statistical validity conditions satisfied?