4.8 One-Sample Proportion z Test

The z-test uses the sample proportion of group \(j\), \(p_j\), as an estimate of the population proportion \(\pi_j\) to evaluate a hypothesized population proportion \(\pi_{0j}\) and/or construct a \(100(1-\alpha)\%\) confidence interval around \(p_j\) to estimate \(\pi_j\) within a margin of error \(\epsilon\).

The z-test is intuitive to learn, but it applies only when the central limit theorem conditions hold (a quick checker is sketched after this list):

  • the sample is independently drawn, meaning random assignment (experiments) or random sampling without replacement from <10% of the population (observational studies),
  • there are at least 5 successes and 5 failures,
  • the sample size is at least 30, and
  • the expected probability of success is not extreme, i.e., between 0.2 and 0.8.
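
Here is a minimal sketch of a checker for the countable conditions (the helper name clt_conditions_ok is my own, not from any package; independence has to be judged from the study design):

clt_conditions_ok <- function(x, n, pi_0) {
  x >= 5 & (n - x) >= 5 &        # at least 5 successes and 5 failures
    n >= 30 &                    # sample size at least 30
    pi_0 >= 0.2 & pi_0 <= 0.8    # expected success probability not extreme
}
clt_conditions_ok(x = 20, n = 40, pi_0 = 0.6)  # values from the example below
## [1] TRUE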

If these conditions hold, the sampling distribution of \(p\) is approximately normal around \(\pi\) with standard error \(se_p = \frac{s_p}{\sqrt{n}} = \frac{\sqrt{p(1-p)}}{\sqrt{n}}\). The measured values \(p\) and \(s_p\) approximate the population values \(\pi\) and \(\sigma_\pi\). You can define a \(100(1-\alpha)\%\) confidence interval as \(p \pm z_{\alpha/2} se_p\). Test the hypothesis \(\pi = \pi_0\) with test statistic \(z = \frac{p - \pi_0}{se_{\pi_0}}\), where \(se_{\pi_0} = \frac{s_{\pi_0}}{\sqrt{n}} = \frac{\sqrt{\pi_0(1-\pi_0)}}{\sqrt{n}}\). Note that the confidence interval uses the sample proportion \(p\) in its standard error, while the hypothesis test uses the hypothesized \(\pi_0\).
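
Translated directly into R, the two formulas look like this (a sketch; the function names are my own, and the example below applies the same arithmetic by hand):

prop_ci <- function(p, n, alpha = .05) {
  se_p <- sqrt(p * (1 - p)) / sqrt(n)  # standard error from the sample proportion
  p + c(-1, 1) * qnorm(1 - alpha / 2) * se_p
}

prop_z <- function(p, n, pi_0) {
  se_pi0 <- sqrt(pi_0 * (1 - pi_0)) / sqrt(n)  # standard error from the hypothesized proportion
  (p - pi_0) / se_pi0
}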

Example

A machine is supposed to randomly churn out prizes in 60% of boxes. In a random sample of n = 40 boxes there are prizes in 20 boxes. Is the machine flawed?

prop.test(x = 20, n = 40, p = 0.6, alternative = "two.sided", correct = FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  20 out of 40, null probability 0.6
## X-squared = 1.6667, df = 1, p-value = 0.1967
## alternative hypothesis: true p is not equal to 0.6
## 95 percent confidence interval:
##  0.3519953 0.6480047
## sample estimates:
##   p 
## 0.5

The first thing you’ll notice is that prop.test() performs a chi-squared goodness-of-fit test, not a one-proportion z-test!

chisq.test(c(20, 40-20), p = c(.6, .4), correct = FALSE)
## 
##  Chi-squared test for given probabilities
## 
## data:  c(20, 40 - 20)
## X-squared = 1.6667, df = 1, p-value = 0.1967

It turns out \(P(\chi^2 > X^2)\) equals \(2 \cdot P(Z > |z|)\) because \(X^2 = z^2\), and the square of a standard normal random variable follows a chi-squared distribution with 1 degree of freedom. Here is the manual calculation of the chi-squared test statistic \(X^2\) and the resulting p-value on 1 degree of freedom.

pi_0 <- .6    # hypothesized population proportion
p <- 20 / 40  # sample proportion

observed <- c(p, 1-p) * 40        # observed success/failure counts
expected <- c(pi_0, 1-pi_0) * 40  # expected counts under the null

X2 <- sum((observed - expected)^2 / expected)
pchisq(X2, 1, lower.tail = FALSE)
## [1] 0.1967056

And here is the manual calculation of the z-test statistic \(z\) and the resulting p-value.

se <- sqrt(pi_0*(1-pi_0)) / sqrt(40)  # standard error under the null
z <- (p - pi_0) / se
pnorm(-abs(z)) * 2                    # two-sided p-value
## [1] 0.1967056
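
You can confirm the equivalence directly: the chi-squared statistic is the square of the z statistic, so the two p-values are identical.

all.equal(X2, z^2)
## [1] TRUE
pchisq(z^2, 1, lower.tail = FALSE)
## [1] 0.1967056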

The 95% CI presented by prop.test() is also not the \(p \pm z_{\alpha / 2}se_p\) Wald interval; it is the Wilson interval!

DescTools::BinomCI(20, 40, method = "wilson")
##      est    lwr.ci    upr.ci
## [1,] 0.5 0.3519953 0.6480047
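
To verify, here is a manual calculation of the Wilson score interval, \(\left(p + \frac{z_{\alpha/2}^2}{2n} \pm z_{\alpha/2}\sqrt{\frac{p(1-p)}{n} + \frac{z_{\alpha/2}^2}{4n^2}}\right) \bigg/ \left(1 + \frac{z_{\alpha/2}^2}{n}\right)\).

z_crit <- qnorm(1 - .05/2)
center <- (p + z_crit^2 / (2*40)) / (1 + z_crit^2 / 40)
moe    <- z_crit * sqrt(p*(1-p)/40 + z_crit^2 / (4*40^2)) / (1 + z_crit^2 / 40)
c(center - moe, center + moe)
## [1] 0.3519953 0.6480047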

There are a lot of methods (see ?DescTools::BinomCI), and Wilson is the one Agresti and Coull recommend. If you want the Wald interval, use DescTools::BinomCI() with method = "wald".

DescTools::BinomCI(20, 40, method = "wald")
##      est    lwr.ci    upr.ci
## [1,] 0.5 0.3450512 0.6549488

This matches the manual calculation below.

z_crit <- qnorm(1 - .05/2)      # critical value for 95% confidence
se <- sqrt(p*(1-p)) / sqrt(40)  # standard error from the sample proportion

(CI <- c(p - z_crit*se, p + z_crit*se))
## [1] 0.3450512 0.6549488

prop.test() (and chisq.test()) reported a p-value of 0.1967056, so you cannot reject the null hypothesis that \(\pi = 0.6\). It’s good practice to plot this out to make sure your head is on straight.
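
For example, here is a minimal base-R sketch of the null sampling distribution with the observed proportion and the two-sided 95% cutoffs marked; the observed \(p = 0.5\) falls inside the cutoffs, consistent with the p-value above.

se_0 <- sqrt(.6 * (1 - .6) / 40)  # standard error under the null
curve(dnorm(x, mean = .6, sd = se_0), from = .3, to = .9,
      xlab = "sample proportion p", ylab = "density")
abline(v = .5, lty = 2)                                  # observed p = 0.5
abline(v = .6 + c(-1, 1) * qnorm(.975) * se_0, lty = 3)  # 95% cutoffs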

Incidentally, if you have a margin of error requirement, you can back into the sample size required to achieve it. Just solve the margin of error equation \(\epsilon = z_{\alpha/2} \sqrt{\frac{\pi_0(1-\pi_0)}{n}}\) for \(n = \frac{z_{\alpha/2}^2 \pi_0(1-\pi_0)}{\epsilon^2}.\)
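
For instance, to estimate a proportion near \(\pi_0 = 0.6\) to within \(\epsilon = 0.05\) at 95% confidence (illustrative values, not from the example above):

ceiling(qnorm(1 - .05/2)^2 * .6 * (1 - .6) / .05^2)
## [1] 369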