4.7 z-Test of Two Proportions
The z-test uses the difference in sample proportions \(\hat{d} = p_1 - p_2\) as an estimate of the difference in population proportions \(\delta = \pi_1 - \pi_2\). With it you can test a hypothesized value \(d_0\) of \(\delta\), construct a \(100(1 - \alpha)\%\) confidence interval around \(\hat{d}\) that estimates \(\delta\) within a margin of error \(\epsilon\), or do both.
The z-test applies when the central limit theorem conditions hold, so that the normal distribution approximates the binomial distribution:
- each sample is drawn independently, meaning random assignment (experiments) or random sampling without replacement with \(n_i\) less than 10% of the population (observational studies),
- each group \(i\) has at least 5 successes (\(n_i p_i \geq 5\)) and at least 5 failures (\(n_i (1 - p_i) \geq 5\)),
- both sample sizes satisfy \(n_i \geq 30\), and
- the probability of success for each group is not extreme, \(0.2 < \pi_i < 0.8\).
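The count-based conditions above can be checked mechanically. This is a minimal Python sketch, assuming you have the success counts \(x_i\) and sample sizes \(n_i\) for each group. The function name `check_z_test_conditions` and its parameters are hypothetical. Because \(\pi_i\) is unknown, the sketch substitutes the sample proportion \(p_i\) in the "not extreme" check. The independence condition cannot be verified from counts and is left to the analyst.

```python
def check_z_test_conditions(x1, n1, x2, n2, min_count=5, min_n=30):
    """Check the count-based conditions for a two-proportion z-test.

    x1, x2: number of successes in groups 1 and 2 (hypothetical inputs)
    n1, n2: sample sizes of groups 1 and 2
    Independence (random sampling or assignment, n_i < 10% of the population)
    cannot be verified from counts and is NOT checked here.
    """
    checks = {}
    for i, (x, n) in enumerate([(x1, n1), (x2, n2)], start=1):
        p = x / n  # sample proportion p_i, used as a stand-in for the unknown pi_i
        checks[f"successes_{i}"] = x >= min_count      # n_i * p_i = x_i >= 5
        checks[f"failures_{i}"] = n - x >= min_count   # n_i * (1 - p_i) >= 5
        checks[f"sample_size_{i}"] = n >= min_n        # n_i >= 30
        checks[f"not_extreme_{i}"] = 0.2 < p < 0.8     # proxy for 0.2 < pi_i < 0.8
    return checks


# Example with made-up counts: 45 of 100 successes vs. 30 of 90 successes.
result = check_z_test_conditions(45, 100, 30, 90)
print(all(result.values()))  # True
```

Returning a dictionary rather than a single boolean shows which condition fails, which is usually more useful than a bare pass/fail.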
If these conditions hold, the sampling distribution of \(\hat{d}\) is approximately normal, centered at \(\delta\), with standard error \(se_{\hat{d}} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}\). The measured values \(\hat{d}\) and \(se_{\hat{d}}\) approximate the population values \(\delta\) and \(se_\delta\).

Define a \(100(1 - \alpha)\%\) confidence interval as \(\hat{d} \pm z_{\alpha/2} se_{\hat{d}}\), so the margin of error is \(\epsilon = z_{\alpha/2} se_{\hat{d}}\). To test the hypothesis \(\delta = d_0\), use the test statistic \(z = \frac{\hat{d} - d_0}{se_{d_0}}\), where \(se_{d_0} = \sqrt{p^*(1 - p^*) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\) and \(p^* = \frac{x_1 + x_2}{n_1 + n_2}\) is the pooled (overall) sample proportion of successes, with \(x_i\) the number of successes in group \(i\). Pooling assumes \(\pi_1 = \pi_2\), so this standard error is appropriate for the usual null hypothesis \(d_0 = 0\).
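Both the confidence interval and the test statistic can be computed from four counts. This is a minimal Python sketch, not a reference implementation. It assumes:

- the inputs are success counts \(x_i\) and sample sizes \(n_i\),
- \(p^*\) is the pooled proportion \((x_1 + x_2)/(n_1 + n_2)\), and
- the alternative is two-sided.

The function name `two_proportion_z_test` is hypothetical. It uses `statistics.NormalDist` from the Python standard library (3.8+) for \(z_{\alpha/2}\) and the normal CDF.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, d0=0.0, alpha=0.05):
    """Confidence interval and z-test for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    d_hat = p1 - p2  # estimate of delta = pi_1 - pi_2

    # Unpooled standard error se_{d_hat}, used for the confidence interval
    se_d_hat = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z_crit * se_d_hat                    # margin of error epsilon
    ci = (d_hat - margin, d_hat + margin)

    # Pooled proportion p* and standard error se_{d0}, used for the test
    # (pooling assumes pi_1 = pi_2, i.e. the usual null d0 = 0)
    p_star = (x1 + x2) / (n1 + n2)
    se_d0 = math.sqrt(p_star * (1 - p_star) * (1 / n1 + 1 / n2))
    z = (d_hat - d0) / se_d0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided alternative

    return {"d_hat": d_hat, "se": se_d_hat, "ci": ci, "z": z, "p_value": p_value}


# Example with made-up counts: 45 of 100 successes vs. 30 of 90 successes.
res = two_proportion_z_test(45, 100, 30, 90)
print(round(res["z"], 2), [round(b, 3) for b in res["ci"]])
```

The interval uses the unpooled \(se_{\hat{d}}\) and the test uses the pooled \(se_{d_0}\), matching the two formulas in the text. For these counts the interval contains 0 while \(z\) is about 1.64. Both results indicate no significant difference at \(\alpha = 0.05\).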