## 20.2 Sampling intervals: Known proportion

The possible values of the sample proportions \(\hat{p}\)
can be described by an
approximate *normal distribution*,
as just discussed.
This enables the 68–95–99.7 rule
to be applied;
for example,
in about 68% of sets of 25 rolls,
the sample proportion of even rolls will be within
*one* standard deviation of \(0.5\)
(that is, \(0.5\) give-or-take 0.1).
So, about 68% of the time,
the proportion of even rolls in a set of 25 rolls will be between:

- \(0.5 - 0.1 = 0.4\) and
- \(0.5 + 0.1 = 0.6\).

Similarly,
about 95% of the time,
the proportion of even rolls will be within
*two* standard deviations of \(0.5\),
or between:

- \(0.5 - (2\times0.1) = 0.3\) and
- \(0.5 + (2\times0.1) = 0.7\).

This interval tells us what values of \(\hat{p}\) are likely to be observed in samples of size 25. Most of the time (i.e., approximately 95% of the time), the value of \(\hat{p}\) is expected to be between 0.30 and 0.70. For instance, in the animation above, all ten sets of 25 rolls (100%) had a sample proportion between 0.30 and 0.70.
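The claim can be checked informally by simulation. The following is a minimal sketch, not part of the original analysis: it assumes a fair six-sided die, and the seed, the number of simulated sets, and the function name `proportion_even` are arbitrary choices made for illustration.

```python
import random

def proportion_even(n_rolls, rng):
    """Roll a fair die n_rolls times; return the proportion of even rolls."""
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return sum(1 for r in rolls if r % 2 == 0) / n_rolls

rng = random.Random(2024)   # fixed seed (arbitrary) so the run is repeatable
n_sets = 10_000             # number of simulated sets of 25 rolls (assumption)

p_hats = [proportion_even(25, rng) for _ in range(n_sets)]

# Count the sets whose sample proportion falls between 0.30 and 0.70
inside = sum(1 for p_hat in p_hats if 0.30 <= p_hat <= 0.70)
fraction_inside = inside / n_sets

print(f"Fraction of sets with p-hat between 0.30 and 0.70: {fraction_inside:.3f}")
```

The printed fraction should be close to, though not exactly, 0.95: the normal distribution is only an approximation, and with 25 rolls the sample proportion can only take values that are multiples of 0.04.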

More formally, the sample proportion \(\hat{p}\) is likely to lie within the interval

\[
p \pm (\text{multiplier} \times \text{s.e.}(\hat{p})),
\]
where \(\text{s.e.}(\hat{p})\) is the
*standard error of the sample proportion*
(calculated using Eq. (20.1)).
The symbol ‘\(\pm\)’ means ‘plus or minus,’ or ‘give-or-take.’
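The formula can be sketched in a few lines of code. This sketch assumes that Eq. (20.1) is the usual standard error of a sample proportion, \(\sqrt{p(1 - p)/n}\), which agrees with the standard deviation of 0.1 used above for \(p = 0.5\) and \(n = 25\). The function names are hypothetical.

```python
import math

def standard_error(p, n):
    """Standard error of the sample proportion.

    Assumes Eq. (20.1) has the form sqrt(p * (1 - p) / n).
    """
    return math.sqrt(p * (1 - p) / n)

def sampling_interval(p, n, multiplier):
    """Return (lower, upper) for p plus-or-minus multiplier * s.e."""
    se = standard_error(p, n)
    return (p - multiplier * se, p + multiplier * se)

p, n = 0.5, 25                                 # even rolls, sets of 25 rolls
se = standard_error(p, n)
low68, high68 = sampling_interval(p, n, 1)     # one standard error
low95, high95 = sampling_interval(p, n, 2)     # two standard errors

print(f"s.e. = {se:.2f}")
print(f"Approx. 68% interval: {low68:.2f} to {high68:.2f}")
print(f"Approx. 95% interval: {low95:.2f} to {high95:.2f}")
```

With these inputs, the sketch reproduces the intervals found earlier: 0.40 to 0.60, and 0.30 to 0.70.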

The *multiplier* depends on how confident
we wish to be that the
interval contains the value of \(\hat{p}\).

For a 95% interval—the most common *level of confidence*—the multiplier is *approximately* 2,
based on the 68–95–99.7 rule:
Approximately 95% of observations are within
*two* standard deviations of the value of \(p\)
(the mean of the normal distribution in
Fig. 20.1).

That is,
the *approximate* 95% interval is:

\[
p \pm (2 \times \text{s.e.}(\hat{p}) ).
\]

For a 90% interval, either tables or a computer would be used to find the correct multiplier, since the 68–95–99.7 rule isn’t helpful.

In practice,
95% intervals are the most common,
and we’ll use a multiplier of \(2\) to find an
*approximate* 95% interval when computing the interval
without using software.
Software can be used for any other percentage interval
(or for an *exact* 95% interval).
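As one illustration of how software might find the multiplier, the sketch below uses the standard normal distribution from Python's `statistics` module. The function name `multiplier_for` is hypothetical, and other software may report the multiplier differently.

```python
from statistics import NormalDist

def multiplier_for(level):
    """Multiplier such that the central area under the standard
    normal curve between -multiplier and +multiplier equals `level`."""
    tail = (1 - level) / 2              # area left in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} interval: multiplier = {multiplier_for(level):.3f}")
```

The exact 95% multiplier is about 1.96, which is why a multiplier of 2 gives a reasonable approximation.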

In general, higher confidence means wider intervals (Fig. 20.2), since wider intervals are needed to be more certain that the interval contains \(\hat{p}\).
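A short numerical check of this pattern, using the die-rolling example (\(p = 0.5\) and a standard error of 0.1), is sketched below. The confidence levels chosen are illustrative only, and the multipliers are taken from the standard normal distribution rather than from the 68–95–99.7 rule.

```python
from statistics import NormalDist

p, se = 0.5, 0.1                      # values from the 25-roll example
levels = (0.80, 0.90, 0.95, 0.99)     # illustrative confidence levels

widths = []
for level in levels:
    # Multiplier leaving (1 - level) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(0.5 + level / 2)
    low, high = p - z * se, p + z * se
    widths.append(high - low)
    print(f"{level:.0%}: {low:.3f} to {high:.3f} (width {high - low:.3f})")
```

Each step up in the confidence level gives a larger multiplier, and so a wider interval.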