# 20 Confidence intervals for one proportion

So far, you have learnt to ask a RQ, design a study, describe and summarise the data, and understand how sample statistics vary from sample to sample.
**In this chapter**, you will learn to construct *confidence intervals* for one proportion.
You will learn to:

- identify situations where the analysis of one sample proportion is appropriate.
- form confidence intervals for single proportions.

## 20.1 Sampling distribution: known proportion

Suppose a fair, six-sided die is rolled 25 times (Fig. 18.1, top-right panel).
What proportion of the rolls will produce an even number?
That is, what will be the *sample proportion* of even numbers?

Of course, since no-one knows exactly what will happen for *any* individual roll, no-one knows what proportion will be even for any sample of 25 rolls.
In addition, the proportion of the 25 rolls that will be even will not be the same for every sample of 25 rolls.
The sample proportion will *vary* from sample to sample: *sampling variation* exists.

Remember: Studying a sample leads to the following observations:

- Each sample is likely to be different.
- Our sample is just one of countless possible samples from the population.
- Each sample is likely to produce a different value for the sample proportion.
- Hence we only observe one of the many possible values for the sample proportion.

Since many values for the sample proportion are possible, the possible values of the sample proportion vary (called *sampling variation*) and have a *distribution* (called a *sampling distribution*).

We have seen that the sample statistic often varies with a normal distribution (whose standard deviation is called the *standard error*).
However, being more specific when describing the sampling distribution is useful.

To better understand the sampling distribution, statistical theory could be used... or thousands of repetitions of a sample of 25 rolls could be performed... or a computer could *simulate* many sets of 25 rolls.
Let's simulate rolling a die 25 times, using just 10 sets of 25 rolls;
see the animation below.

The proportion of even rolls varies from set to set. For these \(10\) sets of \(n = 25\) rolls, the proportion of even rolls ranged from \(\hat{p} = 0.32\) to \(\hat{p} = 0.60\).
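This simulation can be sketched in Python (an illustrative sketch of our own; the code and variable names are not from the text):

```python
import random

random.seed(1)  # any seed; used only for a reproducible illustration

# Simulate 10 sets of n = 25 rolls of a fair six-sided die, recording
# the sample proportion of even rolls in each set.
n, num_sets = 25, 10
phats = []
for _ in range(num_sets):
    rolls = [random.randint(1, 6) for _ in range(n)]
    phats.append(sum(1 for r in rolls if r % 2 == 0) / n)

print(phats)  # ten different sample proportions: sampling variation
```

Each run (or each seed) gives a different list of ten sample proportions, which is exactly the sampling variation being described.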

The sample proportion of even rolls would be expected to vary around \(p = 0.5\), since three of the six faces of the die are even numbers (the *population proportion*), using the classical approach to probability.
Of course, the sample proportion could be very small or very large by chance, but we wouldn't expect to see that very often (Fig. 18.1, top-right panel).

In this example, the *population proportion* of even rolls is known to be \(p = 0.5\).
Each sample of \(n = 25\) rolls is a *sample* of all possible sets of \(n = 25\) rolls, and the *sample* proportion of even rolls is denoted by \(\hat{p}\).

For any sample of \(25\) rolls, the value of \(\hat{p}\) will be unknown until we roll the die.
The proportion of even rolls is likely to vary from sample to sample; that is, the sample proportions exhibit sampling variation, and the *amount* of sampling variation is quantified using a *standard error*.

\(p\) refers to the *population* proportion, and \(\hat{p}\) refers to the *sample* proportion.

The symbol \(\hat{p}\) is pronounced 'pee-hat'.

Suppose a fair die was rolled \(25\) times, and this was repeated *thousands* of times (not just \(10\) times as in
the animation above),
and the proportion of even rolls was recorded for every one of those thousands of sets of \(25\) rolls.
These thousands of sample proportions \(\hat{p}\), one from every sample of \(25\) rolls, could be graphed using a histogram;
see the animation below.

The shape of the histogram is roughly a normal distribution.
This is no accident: statistical theory says this will happen (when certain conditions are met: see Sect. 20.6).
The mean of this distribution is the *sampling mean*, and its value is \(p\); the standard deviation for this distribution is the *standard error*, denoted \(\text{s.e.}(\hat{p})\).
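The thousands of repetitions can also be simulated directly; a Python sketch (our own code, for illustration) confirms the centre and spread of this sampling distribution:

```python
import random
from statistics import mean, stdev

random.seed(2)  # any seed; used only for a reproducible illustration

# Simulate many sets of n = 25 rolls, recording the sample proportion
# of even rolls for each set.
n, reps = 25, 10_000
phats = [sum(random.randint(1, 6) % 2 == 0 for _ in range(n)) / n
         for _ in range(reps)]

print(round(mean(phats), 2))   # close to the sampling mean, p = 0.5
print(round(stdev(phats), 2))  # close to the standard error, 0.1
```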

The *values* of the mean and standard deviation of the normal distribution in the animation above can even be determined (Fig. 18.2).
The sampling distribution for \(\hat{p}\) is:

- an approximate normal distribution;
- centred around a *sampling mean* whose value is \(p = 0.5\) (the population proportion);
- with a standard deviation (called the *standard error*, \(\text{s.e.}(\hat{p})\)) whose value is \(0.1\) (where this number comes from will be revealed soon).

This distribution is called a *sampling distribution*, as discussed in Sect. 18.1.
The standard deviation of the sampling distribution is called a *standard error*, since it measures how much a sample statistic (in this case, a sample proportion \(\hat{p}\)) varies from sample to sample.

Since the variation in the sample proportions can be described, a picture of this normal distribution can be drawn (Fig. 20.1).
We still don't know *exactly* what we'll find next roll... but we have some idea of *how* the sample proportion is likely to vary in sets of \(25\) rolls.

The parameter \(p\) and the statistic \(\hat{p}\) are both *proportions*.

However, the *average value* of the sample proportion can be described by a *sampling mean*, whose value is \(p\), and the amount of variation in the sample proportions can be described by a *standard deviation* (called a *standard error* in this context) \(\text{s.e.}(\hat{p})\).

For example, the sampling mean of the sampling distribution is the 'average' value of all possible sample proportions, \(\hat{p}\).

The value of \(p\) (the *population* proportion: the proportion of even numbers on the die) remains the same, but the value of \(\hat{p}\) (the *sample* proportion: the proportion of even numbers in the sample of 25 rolls) is not the same in every sample of 25 rolls.
That is, \(\hat{p}\) varies, and exhibits *sampling variation*.
The variation in \(\hat{p}\) from sample to sample is measured by the *standard error of the sample proportion*, written as \(\text{s.e.}(\hat{p})\).

The value of the sampling mean is \(p\).
The value of the **standard error for a sample proportion**, when the value of \(p\) is known, is
\[\begin{equation}
\text{s.e.}(\hat{p}) = \sqrt{\frac{p \times (1 - p)}{n}},
\tag{20.1}
\end{equation}\]
where \(n\) is the number of rolls, and \(p\) is the population proportion.
For this example, there are \(n = 25\) rolls of a die, and the population proportion of even rolls is \(p = 0.5\).
Then, the standard error of the sample proportion is

\[\begin{equation}
\text{s.e.} (\hat{p}) = \sqrt{\frac{0.5 \times (1-0.5)}{25}} = 0.1.
\tag{20.2}
\end{equation}\]
This standard error is the standard deviation of the normal distribution in Fig. 20.1.
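Equation (20.1) is straightforward to compute; a small Python sketch (our own code, not from the text):

```python
from math import sqrt

# Standard error of the sample proportion when the value of p is
# known (Eq. (20.1)).
def se_prop_known(p, n):
    return sqrt(p * (1 - p) / n)

# The die example: p = 0.5 and n = 25 rolls, as in Eq. (20.2).
print(se_prop_known(0.5, 25))  # 0.1
```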

Recall that *the standard error is the standard deviation of the distribution of a sample statistic*; it measures how much a sample estimate varies across all possible samples.
In that sense, the standard error of the proportion measures how precisely \(\hat{p}\) estimates the population proportion \(p\) in samples of size \(n\).

Almost always, the value of \(p\) is unknown. This situation is studied from Sect. 20.3 onwards.

**Definition 20.1 (Sampling distribution of a sample proportion when $p$ is known) **When the value of \(p\) is *known*, the *sampling distribution of the sample proportion* is described by

- an approximate normal distribution,
- centred around the sampling mean whose value is \(p\),
- with a standard deviation (called the *standard error* of \(\hat{p}\)), \(\text{s.e.}(\hat{p})\), whose value is

\[ \text{s.e.}(\hat{p}) = \sqrt{\frac{ p \times (1-p)}{n}}, \] when certain conditions are met (Sect. 20.6), where \(n\) is the size of the sample, and \(p\) is the population proportion. In general, the approximation gets better as the sample size gets larger.

From the die example, the values of \(\hat{p}\) will vary with an approximate normal distribution, centred around \(p = 0.5\), and with a standard error of \(\text{s.e.}({\hat{p}}) = 0.1\). This distribution is shown in Fig. 20.1. Based on this picture, how often would a value of \(\hat{p}\) larger than 0.80 be expected?

Figure 20.1 suggests that, while not impossible, 0.80 or greater will be observed rarely.

## 20.2 Sampling intervals: known proportion

Since the possible values of the sample proportions \(\hat{p}\) can be described by an approximate *normal distribution*, the 68--95--99.7 rule can be applied.
For example (see Fig. 20.1), about 68% of the time, a set of 25 rolls will have a proportion of even rolls within \(0.5\) give-or-take *one* standard deviation (that is, give-or-take 0.1).
So, about 68% of the time, the proportion of even rolls in a set of 25 rolls will be between \(0.5 - 0.1 = 0.4\) and \(0.5 + 0.1 = 0.6\).

Similarly, about 95% of the time, the proportion of even rolls will be between \(0.5\) give-or-take *two* standard deviations, or between \(0.5 - (2\times0.1) = 0.3\) and \(0.5 + (2\times0.1) = 0.7\).
This interval tells us what values of \(\hat{p}\) are likely to be observed in samples of size 25.
Most of the time (i.e., approximately 95% of the time), the value of \(\hat{p}\) is expected to be between 0.30 and 0.70.
(In the animation above,
all ten sets of 25 rolls (or 100%) had a sample proportion between 0.30 and 0.70.)
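These give-or-take intervals follow directly from the 68--95--99.7 rule; a Python sketch (our own code):

```python
# Apply the 68--95--99.7 rule to the sampling distribution of the
# sample proportion: p give-or-take 1, 2 or 3 standard errors.
p, se = 0.5, 0.1  # the die example: known p, with s.e. from Eq. (20.2)
intervals = {level: (p - m * se, p + m * se)
             for m, level in [(1, "68%"), (2, "95%"), (3, "99.7%")]}

print(intervals["95%"])  # about 95% of sample proportions: 0.3 to 0.7
```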

Formally, the sample proportion \(\hat{p}\) is likely to lie within the interval

\[
p \pm (\text{multiplier} \times \text{s.e.}(\hat{p})),
\]
where \(\text{s.e.}(\hat{p})\) is the *standard error of the sample proportion* (calculated using Eq. (20.1)).

The symbol '\(\pm\)' means 'plus or minus', or (colloquially) 'give-or-take'.

The *multiplier* depends on how confident we wish to be that the interval contains the value of \(\hat{p}\).
For a 95% interval---the most common *level of confidence*---the multiplier is *approximately* 2, based on the 68--95--99.7 rule:
Approximately 95% of observations are within *two* standard deviations of the value of \(p\) (the mean of the normal distribution in Fig. 20.1).
That is, the *approximate* 95% interval is:

\[ p \pm (2 \times \text{s.e.}(\hat{p}) ). \] For a 90% interval, either tables or a computer would be used to find the correct multiplier, since the 68--95--99.7 rule isn't helpful.

In practice, 95% intervals are the most common, and we'll use a multiplier of \(2\) to find an *approximate* 95% interval when computing the interval without using software.
Software can be used for any other percentage interval (or for an *exact* 95% interval).
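For other confidence levels, the multiplier comes from the standard normal distribution; a Python sketch using only the standard library (our own code, not the text's):

```python
from statistics import NormalDist

# Exact multipliers for various confidence levels, from the standard
# normal distribution. The 68--95--99.7 rule gives roughly 2 for 95%.
for level in (0.90, 0.95, 0.99):
    mult = NormalDist().inv_cdf((1 + level) / 2)
    print(f"{level:.0%} interval: multiplier = {mult:.3f}")
```

The exact 95% multiplier is about 1.96, which is why using 2 gives a good *approximate* 95% interval.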

## 20.3 Sampling distribution: unknown proportion

In the die example (Sect. 20.1), the sampling distribution for the sample proportion was given, including an equation for computing the standard error for the sample proportion for samples of size \(n\), *when the value of \(p\) was known*.

However, usually the value of \(p\) (the *parameter*) is unknown; after all, the reason for taking a sample is to *estimate* the unknown value of \(p\).
When \(p\) is unknown, the best available estimate can be used, which is \(\hat{p}\).
*When the value of \(p\) is unknown*, the standard error of the sample proportion (written \(\text{s.e.}(\hat{p})\)) is approximately

\[\begin{equation}
\text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1 - \hat{p})}{n}}.
\tag{20.3}
\end{equation}\]

**Definition 20.2 (Sampling distribution of a sample proportion when $p$ is unknown) **When the value of \(p\) is *unknown*, the *sampling distribution of the sample proportion* is described by

- an approximate normal distribution,
- centred around the sampling mean, whose value is \(p\),
- with a standard deviation (called the *standard error* of \(\hat{p}\)) of

\[\begin{equation} \text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1-\hat{p})}{n}}, \tag{20.4} \end{equation}\] when certain conditions are met (Sect. 20.6), where \(n\) is the size of the sample, and \(\hat{p}\) is the sample proportion. In general, the approximation gets better as the sample size gets larger.

Quantity | Description
---|---
Describing the population | Proportion of successes \(p\)
Describing a sample | Proportion of successes \(\hat{p}\)
Describing sample proportions (\(\hat{p}\)) across all possible samples | Vary with an approx. normal distribution (under certain conditions): sampling mean \(p\); standard deviation \(\text{s.e.}(\hat{p})\)

Let's *pretend* for the moment that the proportion of even rolls of a fair die is *unknown* (to demonstrate ideas).
In this case, an *estimate* of the proportion of even rolls can be found by rolling a die \(n = 25\) times and computing \(\hat{p}\).

Suppose 11 of the \(n = 25\) rolls produced an even number, so that \(\hat{p} = 11/25 = 0.44\).
Then (from Definition 20.2),

\[
\text{s.e.}(\hat{p}) = \sqrt{ \frac{ 0.44 \times (1 - 0.44)}{25}} = 0.099277.
\]
(This is very similar to the value of 0.1, the value of the standard error when the value of \(p\) was known; see Sect. 20.1.)

Hence, the sample proportions will vary with an approximate normal distribution (Fig. 20.2), centred around the unknown value of \(p\) with a standard deviation of \(\text{s.e.}(\hat{p}) = 0.099277\).
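Equation (20.3) and this calculation can be sketched in Python (our own code):

```python
from math import sqrt

# Standard error of the sample proportion when p is unknown
# (Eq. (20.3)): the estimate p-hat is used in place of p.
def se_prop(phat, n):
    return sqrt(phat * (1 - phat) / n)

# The pretend-unknown die example: 11 even rolls in n = 25.
print(round(se_prop(11 / 25, 25), 6))  # 0.099277, close to 0.1
```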

Using the 68--95--99.7 rule again:
about 95% of the values of \(\hat{p}\) are expected to be between \(p - 0.199\) and \(p + 0.199\).
That is, if the value of \(p\) were known, the value of \(\hat{p}\) would be expected (about 95% of the time) to lie between \(p - 0.199\) and \(p + 0.199\) (Fig. 20.3).

However, we *know* the value of \(\hat{p}\), but the value of \(p\) is *unknown*.
So we could ask: What values of \(p\) could reasonably be expected to produce our observed value of \(\hat{p}\)?
See Fig. 20.1, which shows that values of \(p\) between roughly \(0.3\) and \(0.6\) could be reasonably expected to produce the value \(\hat{p} = 0.44\).
More precisely, a value of \(p\) between \(0.44 - 0.199\) and \(0.44 + 0.199\) could reasonably be expected to produce the observed value of \(\hat{p}\) (Fig. 20.4).

This is equivalent to saying that we are reasonably sure that a population with a value of \(p\) between \(0.24\) and \(0.64\) could have produced the observed value of \(\hat{p} = 0.44\).
This interval is called a *confidence interval* (or CI), based on ideas from Sect. 20.2.

In summary, using \(\hat{p} = 0.44\) and \(\text{s.e.}(\hat{p}) = 0.0993\), the (approximate) 95% CI is \(0.44 \pm (2 \times 0.0993)\), or from \(0.241\) to \(0.639\). This CI straddles the known population proportion of \(p = 0.5\), though we would not know this if \(p\) was unknown (which is usually the case).

In this case, we know the value of the population parameter: \(p = 0.5\).
Usually we do *not* know the value of the parameter.
After all, that's why we take a sample: to *estimate* the value of the population proportion.

## 20.4 Confidence intervals: unknown proportion

Suppose *thousands* of people rolled a die 25 times, and *each* person found \(\hat{p}\) for their sample, and hence computed the CI for their sample of 25 rolls.
Every sample of 25 rolls could produce a different estimate \(\hat{p}\), and so a different value for \(\text{s.e.}(\hat{p})\), and hence a different 95% CI.
However, *about 95% of these thousands of confidence intervals from those thousands of repetitions would straddle the true proportion \(p\)*.

Since we usually don't know the value of \(p\), and we usually only have one sample (and hence one CI), in general *we never know whether the single CI computed from our single sample straddles \(p\) or not*.

Again, consider letting the computer *simulate* the situation.
Suppose the process of recording the sample proportion of even numbers in \(n = 25\) rolls is repeated 50 times, and for each of those 50 sets of 25 rolls a CI is produced
(see the animation below).

Most of those CIs straddle the population proportion of \(p = 0.5\) (shown as solid lines)... but some do not (shown as dashed lines). Of course, since the value of \(p\) is usually unknown, we never know if our CI contains \(p\) or not.
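This repeated-sampling behaviour can be simulated directly; a Python sketch (our own code, mirroring the idea in the animation):

```python
import random
from math import sqrt

random.seed(3)  # any seed; used only for a reproducible illustration

# Form an approximate 95% CI from each of 50 sets of n = 25 rolls,
# and count how many CIs straddle the true proportion p = 0.5.
n, reps, p = 25, 50, 0.5
straddle = 0
for _ in range(reps):
    phat = sum(random.randint(1, 6) % 2 == 0 for _ in range(n)) / n
    se = sqrt(phat * (1 - phat) / n)
    if phat - 2 * se <= p <= phat + 2 * se:
        straddle += 1

print(straddle, "of", reps, "CIs straddle p = 0.5")  # usually about 95%
```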

**Definition 20.3 (Confidence interval) **A *confidence interval* (CI) is an interval that is likely to contain the population *parameter*, if many samples were taken in the same way.
If a 95% confidence interval is computed from each of those samples, about 95% of the CIs would straddle the *parameter* of interest.

In general, a CI for the population proportion \(p\) is found using

\[
\hat{p} \pm ( \text{multiplier} \times \text{s.e.}(\hat{p})),
\]
where the multiplier is 2 for an *approximate* 95% CI (based on the 68--95--99.7 rule).

**Definition 20.4 (Confidence interval for p) **A *confidence interval* (CI) for the unknown value of the population proportion \(p\) is
\[\begin{equation}
\hat{p} \pm ( \text{multiplier} \times \text{s.e.}(\hat{p})),
\tag{20.5}
\end{equation}\]
where

\[
\text{s.e.}(\hat{p})
=
\sqrt{\frac{ \hat{p} \times (1 - \hat{p}) }{n}}
\]
is the *standard error* of \(\hat{p}\), where \(\hat{p}\) is the sample proportion, and \(n\) is the sample size.
For an *approximate* 95% CI, the multiplier is 2.
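Definition 20.4 translates directly into a short function; a Python sketch (our own code, not from the text):

```python
from math import sqrt

# Approximate 95% CI for a population proportion (Definition 20.4),
# using the multiplier 2 from the 68--95--99.7 rule.
def prop_ci(phat, n, multiplier=2):
    se = sqrt(phat * (1 - phat) / n)
    return phat - multiplier * se, phat + multiplier * se

# The die example: 11 even rolls in n = 25.
lo, hi = prop_ci(11 / 25, 25)
print(round(lo, 3), round(hi, 3))  # about 0.241 to 0.639
```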

In general, higher confidence means wider intervals (Fig. 20.5), since wider intervals are needed to be more certain that the interval contains \(p\). Try changing the confidence level for the CI in the interaction below.

**Example 20.1 (Energy drinks in Canadian youth) **A study of young Canadians aged 12--24 (Hammond, Reid, and Zukowski 2018) found that 365 of the 1516 respondents reported sleeping difficulties after consuming energy drinks.
The unknown parameter is \(p\), the *population* proportion of young Canadians reporting sleeping difficulties.

The sample proportion reporting sleeping difficulties after consuming energy drinks is \(\hat{p} = 365/1516 = 0.241\).
As usual, the sample proportion would vary from one sample of size \(n = 1516\) to another; *sampling variation* exists.
The *standard error* (Definition 20.4) quantifies how much the sample proportion is likely to vary from sample to sample:
\[\begin{align*}
\text{s.e.}(\hat{p})
&= \sqrt{\frac{\hat{p}\times(1 - \hat{p})}{n}}\\
&= \sqrt{\frac{0.241 \times (1 - 0.241)}{1516}} = 0.01098449,
\end{align*}\]
or about \(0.011\).
So, in samples of size 1516, the approximate 95% CI (Definition 20.4) is between \(0.241 - (2\times 0.01098449) = 0.2190\) and \(0.241 + (2\times 0.01098449) = 0.2627\).
The *approximate* 95% CI is from 0.219 to 0.263.

This CI may or may not straddle the population proportion \(p\); it is *likely* that the interval straddles the value of \(p\).
In other words, it is plausible that the sample proportion of \(\hat{p} = 0.241\) may have come from a population with a proportion somewhere between 0.219 and 0.263.

Notice that many decimal places are used in the working, but final answers are rounded.

**Example 20.2 (Koalas crossing roads) **A study of koalas (C. E. Dexter et al. 2018) found that 18 of the \(n = 51\) koalas studied in a certain area over 30 months had crossed at least one road during that time.
The unknown parameter is \(p\), the *population* proportion of koalas that had crossed at least one road over the 30 months.

The sample proportion having crossed a road is \(\hat{p} = 18/51 = 0.3529\).
The standard error (Definition 20.4) is
\[
\text{s.e.}(\hat{p})
= \sqrt{ \frac{0.3529 \times (1 - 0.3529)}{51} } = 0.06692.
\]
An approximate 95% CI, then, is \(0.3529 \pm (2 \times 0.06692)\), or \(0.3529 \pm 0.1338\).
The *margin of error* is \(0.1338\).
Computing the 'plus' and the 'minus' bits, the approximate 95% CI is from 0.219 to 0.487 (after rounding appropriately).

The approximate 95% CI for the population proportion of koalas that crossed at least one road in the last 30 months is from 0.219 to 0.487. That is, it is plausible that the sample proportion of \(\hat{p} = 0.3529\) may have come from a population with a proportion somewhere between 0.219 and 0.487.

The research article reports the exact 95% CI as from 22% to 48%, very close to our approximate 95% CI.

**Example 20.3 (CI) **A study of how paramedics administer pain medication (Lord, Cui, and Kelly 2009) found that, of the 1766 patients studied who reported pain, 791 did *not* receive pain relief.
That is, \(\hat{p} = 791/1766 = 0.4479049\) and \(n = 1766\).
Hence, \(\text{s.e.}(\hat{p}) = 0.01183326\), so the *approximate* 95% CI is from

\[
0.4479049 - (2\times 0.0118332) = 0.424238 \text{\ \ to\ \ }
0.4479049 + (2\times 0.0118332) = 0.471571,
\]
or from \(0.424\) to \(0.472\).
This agrees with the article.
(Notice that many decimal places were kept in the working, but final answers were rounded.)

## 20.5 Interpretation of a CI

The *correct* interpretation (Definition 20.3) of a 95% CI is the following:

If samples were repeatedly taken many times, and the 95% confidence interval computed for each sample, 95% of these confidence intervals would contain the population *parameter*.

In Sect. 20.4, the CI was interpreted as giving a range of values of \(p\) that could reasonably be expected to produce the observed value of \(\hat{p}\). This is close to the correct interpretation.

However, the CI is commonly interpreted as having a 95% chance of containing the value of the population parameter \(p\).
This is not strictly correct (since the CI either *does* or *does not* contain the value of \(p\)), but is common.
More details on interpreting a CI are given in Sect. 21.2.

## 20.6 Statistical validity conditions

The histogram in Sect. 20.1 shows the proportion of \(n = 25\) rolls that were even for many samples; it has an approximate normal distribution.
Because of this, the 68--95--99.7 rule could be used to form the approximate 95% CIs.
However, the distribution of the sample proportions only looks like a normal distribution under certain conditions.
Certain conditions must be true for the calculations to be sensible, or **statistically valid**.

**Definition 20.5 (Statistical validity) **A result is *statistically valid* if the conditions for the underlying mathematical calculations and assumptions to be approximately correct are met.
Every confidence interval has statistical validity conditions.

**Example 20.4 (Statistical validity analogy) **Suppose your doctor asks you to get a blood test, after fasting (refraining from eating) for 12 hours before your test.

After leaving the doctor, you go to a restaurant for dinner. You start the next day with a big breakfast, have lunch at a cafe, and then go for your blood test. Your blood is extracted, analysed in the pathology lab, and your doctor is emailed the results of the blood test.

Since you did not fast as required, the results may or may not be valid.
The doctor can still learn *something*... but not as much as if you had followed the instructions.
Similarly, if the conditions for computing the confidence interval are not met, the results may be suspect.

The *statistical validity conditions* for creating a CI for a single proportion are that:

- the number of individuals in the group of interest must exceed 5, **and**
- the number of individuals in the group *not* of interest must exceed 5.

These conditions ensure that the sampling distribution of \(\hat{p}\) has an approximate normal distribution, so that the 68--95--99.7 rule (approximately) applies. If this condition is not met, the sampling distribution may not have normal distribution, so the 68--95--99.7 rule (used to create the CI) may be inappropriate, and so the CI may also be inappropriate.
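This check is simple to automate; a Python sketch (our own helper, not from the text):

```python
# Statistical validity check for a CI for a single proportion: both
# the number in the group of interest and the number not in that
# group must exceed 5.
def prop_ci_valid(in_group, n):
    return in_group > 5 and (n - in_group) > 5

print(prop_ci_valid(365, 1516))  # True  (the energy-drinks example)
print(prop_ci_valid(3, 20))      # False (only 3 in the group of interest)
```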

**Example 20.5 (Energy drinks in Canadian youth) **In Example 20.1, the approximate 95% CI was from 0.219 to 0.263.
This CI for the proportion will be *statistically* valid if:

- the number of youth in the sample who experienced sleeping difficulties exceeds 5, which it does (there are 365); **and**
- the number of youth in the sample who *did not* experience sleeping difficulties exceeds 5, which it does (there are \(1516 - 365 = 1151\)).

The CI is *statistically* valid.
In addition, the CI will be *internally* valid if the study was well designed, and will be *externally* valid if the sample is a simple random sample from the population and is internally valid.

Consider Example 20.2, about koalas crossing roads. Is the CI likely to be statistically, internally and externally valid?

Statistically: The number of koalas that had crossed a road is 18, and the number that had *not* crossed a road is \(51 - 18 = 33\).
Both of these exceed 5, so the CI is statistically valid.

Internally: We do not have enough information about *how* the study was conducted to know, but presumably the scientists conducted the study well.
Externally: A random sample of koalas is unlikely to be obtained (though it may be reasonably representative), so probably not.

**Example 20.6 (Statistical validity) **Consider a situation to estimate the proportion of die rolls that show a one (clearly, this is contrived, as we know the population proportion).
The population proportion (using the classical approach to probability) is \(1/6\), or about \(0.167\).

If we repeatedly rolled a die in sets of \(n = 20\) rolls thousands of times, the proportion of rolls that showed a one could be recorded for each sample of 20 rolls.
Then, a histogram of the sample proportions could be produced.
Simulating this on a computer, a histogram of the sample proportions (Fig. 20.6, left panel) shows that the normal distribution does a poor job of describing the sampling distribution (the distribution is not even symmetric).
The statistical validity conditions do *not* seem satisfied.

Alternatively, we could repeatedly roll a die in sets of \(n = 60\) rolls thousands of times, and record the proportion of rolls that show a one for each sample of 60 rolls.

Then, a histogram of the proportion of ones for those sets of 60 rolls could be produced. Again simulating on a computer, a histogram of these sample proportions (Fig. 20.6, right panel) shows that the normal distribution does a reasonable job of describing the sampling distribution. The statistical validity conditions seem satisfied.
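The two simulations can be sketched in Python (our own code, mimicking the simulations described above):

```python
import random
from statistics import mean

random.seed(4)  # any seed; used only for a reproducible illustration

# Sample proportions of rolls showing a one, for sets of n rolls.
# With p = 1/6: for n = 20, the expected count in the group of
# interest is 20/6 = 3.3 (under 5), so the sampling distribution is
# noticeably skewed; for n = 60, it is 10 (over 5), and the sampling
# distribution is much closer to a normal distribution.
def sample_phats(n, reps=10_000):
    return [sum(random.randint(1, 6) == 1 for _ in range(n)) / n
            for _ in range(reps)]

for n in (20, 60):
    # Both means are near p = 1/6, or about 0.167.
    print(f"n = {n}: mean of sample proportions = {mean(sample_phats(n)):.3f}")
```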

## 20.7 Summary: finding a CI for \(p\)

To compute a confidence interval (CI) for a proportion, compute the sample proportion, \(\hat{p}\), and identify the sample size \(n\). Then compute the standard error, which quantifies how much the value of \(\hat{p}\) varies across all possible samples:

\[
\text{s.e.}(\hat{p})
=
\sqrt{\frac{ \hat{p} \times (1-\hat{p})}{n}}.
\]
The *margin of error* is (Multiplier\(\times\)standard error), where the multiplier is \(2\) for an approximate 95% CI (using the 68--95--99.7 rule).
Then find the CI:

\[ \hat{p} \pm \left( \text{Multiplier}\times\text{standard error} \right). \]

Always check whether the statistical validity conditions are satisfied.

You must use *proportions* in these formulas, **not** *percentages*; that is, use values between 0 and 1 (like 0.23 rather than 23%).

## 20.8 Example: female coffee drinkers

A study of 360 female college students in the United States (Kelpin et al. 2018) found that 61 drank coffee daily.
The unknown parameter is \(p\), the *population* proportion of female college students in the United States that drink coffee daily.

The sample size is \(n = 360\), and the *sample* proportion of daily coffee drinkers is \(\hat{p} = 61/360 = 0.16944\).
Another sample of 360 students from the same population is likely to produce a different sample proportion \(\hat{p}\) of daily coffee drinkers: the sample proportion has *sampling variation*.
The *standard error* is, using (20.4):

\[ \text{s.e.}(\hat{p}) = \sqrt{ \frac{ 0.16944 \times (1 - 0.16944)}{360}} = 0.01977. \]

An *approximate* 95% CI is \(0.1694 \pm (2 \times 0.01977)\), or \(0.1694 \pm 0.03954\).
(That is, the *margin of error* is \(0.03954\).)
Equivalently, the approximate 95% CI is from \(0.1694 - 0.03954 = 0.12986\) to \(0.1694 + 0.03954 = 0.20894\).
Rounding appropriately, the approximate 95% CI is from \(0.130\) to \(0.209\).

The plausible values for \(p\) that may have led to this value of \(\hat{p} = 0.1694\) are between 0.130 and 0.209. (This CI may or may not contain the true proportion \(p\).)

This CI is *statistically* valid.
We cannot comment on the internal validity: we would need details of how the study was conducted.

The CI is *externally* valid if the sample is a simple random sample of some population, and the study is internally valid.
The CI is approximately *externally* valid if the sample is somewhat representative of some population, and the study is internally valid.

## 20.9 Quick review questions

- True or false: \(p\) is called a *parameter*.
- True or false: The value of \(p\) will vary from sample to sample.
- True or false: The *standard error* refers to the sampling variation in \(p\).

- Suppose \(n = 50\) and \(\hat{p} = 0.4\). What is the standard error?

## 20.10 Exercises

Selected answers are available in Sect. D.19.

**Exercise 20.1 **A study of salt intake in the United Kingdom (Sutherland et al. 2012) found that 2,182 out of the 6,882 people sampled in 2007 'generally added salt at the table'.
Find an approximate 95% CI for the population proportion of Britons that generally add salt at the table.

**Exercise 20.2 **A study of the eating habits of university students in Canada (Mann and Blotnicky 2017) found that 8 students out of 154 met the recommendation for eating a sufficient number of servings of grains each day.

- Find an approximate 95% CI for the population proportion of Canadian students that meet the recommendation for eating a sufficient number of servings of grains each day.
- Would these results be likely to apply to Australian university students? Explain.

**Exercise 20.3 **A study of hiccups (G.-W. Lee et al. 2016) found that, of 864 patients examined (across different studies) who had hiccups, 708 were male.

- Find an approximate 95% CI for the true proportion of people with hiccups who are male.
- Check if the statistical validity conditions are met or not.
- Draw a sketch of how the sample proportion varies from sample to sample for samples of size 864.

**Exercise 20.4 **A study of turbine failures (Myers, Montgomery, and Vining 2002; Nelson 1982) ran 42 turbines for around 3000 hours, and found that nine developed fissures (small cracks).
Find a 95% CI for the true proportion of turbines that would develop fissures after 3000 hours of use.
Are the statistical validity conditions satisfied?

The study also ran 39 turbines for around 400 hours, and found that zero developed fissures. Find a 95% CI for the true proportion of turbines that would develop fissures after 400 hours of use. Are the statistical validity conditions satisfied?