## 20.7 Summary: Finding a CI for $$p$$

The procedure for computing a confidence interval (CI) for a proportion is:

• Compute the sample proportion, $$\hat{p}$$, and identify the sample size $$n$$.
• Compute the standard error, which quantifies how much the value of $$\hat{p}$$ varies from one sample to the next:

$\text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1-\hat{p})}{n}}.$

• Find the multiplier: this is $$2$$ for an approximate 95% CI using the 68–95–99.7 rule. (Note: The product multiplier$$\times$$standard error is called the margin of error.)
• Compute:

$\hat{p} \pm \left( \text{Multiplier}\times\text{standard error} \right).$
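The four steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `proportion_ci` and the counts in the usage line (23 out of 100) are made up for the example and are not part of the text.

```python
import math


def proportion_ci(successes, n, multiplier=2.0):
    """Return an approximate CI (lower, upper) for a population proportion.

    `successes` is the count with the characteristic of interest, `n` is the
    sample size, and `multiplier` defaults to 2 (approximate 95% CI).
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n                          # sample proportion (not a percentage)
    std_err = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of p-hat
    margin = multiplier * std_err                  # margin of error
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 23 of 100 have the characteristic.
lower, upper = proportion_ci(23, 100)
print(f"Approximate 95% CI: {lower:.3f} to {upper:.3f}")
```

Working from counts rather than a typed-in proportion avoids accidentally entering a percentage such as 23 in place of 0.23.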

You must use proportions in this formula, not percentages (that is, values like 0.23, not 23%).

Example 20.7 (NHANES data) For the NHANES data, first seen in Sect. 12.10, the unknown parameter is $$p$$, the population proportion of Americans who currently smoke.

In the study, 1466 out of the 3211 respondents who reported their smoking status said they currently smoked: $$\hat{p}= 1466\div 3211 = 0.4566$$.

What is the population proportion $$p$$ of current smokers? We don’t know, and the estimate $$\hat{p}$$ is likely to differ from one sample to the next. The standard error is $$\text{s.e.}(\hat{p}) = 0.00879$$, so the approximate 95% CI for $$p$$ is $$0.4566\pm 0.01758$$, or from 0.439 to 0.474. (Check the calculations!)
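For readers checking the calculations, the arithmetic can be laid out step by step, working from the rounded value $$\hat{p} = 0.4566$$ and the multiplier of $$2$$ (intermediate values are rounded, so a calculator may differ slightly in the last digit):

$\text{s.e.}(\hat{p}) = \sqrt{\frac{0.4566 \times (1 - 0.4566)}{3211}} = \sqrt{\frac{0.24812}{3211}} \approx 0.00879,$

$\text{margin of error} = 2 \times 0.00879 = 0.01758,$

$0.4566 - 0.01758 = 0.43902 \quad\text{and}\quad 0.4566 + 0.01758 = 0.47418.$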

For the conclusions to be statistically valid, the number of smokers must exceed 5, and the number of non-smokers must exceed 5. Both are true. The CI appears to be statistically valid.
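The validity condition can be checked directly from the counts in the example. The snippet below is a small sketch; the variable names are chosen here for illustration.

```python
# Counts from the NHANES example in the text.
smokers = 1466
respondents = 3211
non_smokers = respondents - smokers  # respondents who do not currently smoke

# The condition stated above: both counts must exceed 5.
ci_is_valid = smokers > 5 and non_smokers > 5
print(f"smokers = {smokers}, non-smokers = {non_smokers}, valid = {ci_is_valid}")
```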

We write:

Based on the sample, we are approximately 95% confident that the interval from 0.439 to 0.474 straddles the population proportion of smokers in the USA.