4.7 z-Test of Two Proportions

The z-test uses the difference in sample proportions \(\hat{d} = p_1 - p_2\) as an estimate of the difference in population proportions \(\delta = \pi_1 - \pi_2\). It can evaluate a hypothesized value \(d_0\) of the difference \(\pi_1 - \pi_2\), construct a \(100(1 - \alpha)\%\) confidence interval around \(\hat{d}\) to estimate \(\delta\) within a margin of error \(\epsilon\), or both.

The z-test applies when the central limit theorem conditions hold, so that the normal distribution approximates the binomial distribution. The conditions are:

the sample is independently drawn, meaning random assignment (experiments) or random sampling without replacement of fewer than 10% of the population (observational studies),
each group \(i\) has at least 5 successes and 5 failures, \(n_i p_i \ge 5\) and \(n_i (1 - p_i) \ge 5\),
the sample sizes both satisfy \(n_i \ge 30\), and
the probability of success for each group is not extreme, \(0.2 < \pi_i < 0.8\).
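
The numeric conditions above can be screened mechanically. The following is a minimal sketch, not a definitive implementation: the function name `check_z_test_conditions` and its arguments (`x1`, `x2` as success counts, `n1`, `n2` as sample sizes) are hypothetical, and the sample proportion \(p_i\) stands in for the unknown \(\pi_i\) in the last check. Independence depends on the study design and cannot be verified from the counts.

```python
def check_z_test_conditions(x1, n1, x2, n2):
    """Return {condition: bool} for the numeric two-proportion z-test checks.

    x1, x2 are success counts and n1, n2 are sample sizes (illustrative names).
    Independence is a design question and is not checked here.
    """
    checks = {}
    for label, x, n in (("group 1", x1, n1), ("group 2", x2, n2)):
        p = x / n  # sample proportion, used as a stand-in for the unknown pi_i
        checks[f"{label}: at least 5 successes"] = x >= 5
        checks[f"{label}: at least 5 failures"] = (n - x) >= 5
        checks[f"{label}: n >= 30"] = n >= 30
        checks[f"{label}: 0.2 < p < 0.8"] = 0.2 < p < 0.8
    return checks


if __name__ == "__main__":
    # Made-up counts: 45 of 100 successes in group 1, 30 of 100 in group 2.
    for condition, ok in check_z_test_conditions(45, 100, 30, 100).items():
        print(f"{condition}: {'ok' if ok else 'FAILED'}")
```

Returning one flag per condition, rather than a single pass/fail value, makes it easier to see which assumption is in doubt.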

If these conditions hold, the sampling distribution of \(\hat{d}\) is approximately normal, centered on \(\delta\), with standard error estimated by \(se_{\hat{d}} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}\). The measured values \(\hat{d}\) and \(se_{\hat{d}}\) approximate the population values \(\delta\) and \(se_\delta\). Define a \(100(1 - \alpha)\%\) confidence interval as \(\hat{d} \pm z_{\alpha / 2} se_{\hat{d}}\), or test the hypothesis \(\delta = d_0\) with test statistic \(z = \frac{\hat{d} - d_0}{se_{d_0}}\), where \(se_{d_0} = \sqrt{p^*(1 - p^*) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}\) and \(p^* = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}\) is the pooled (overall) sample proportion of successes. The pooled standard error assumes the two population proportions are equal, so it fits the usual case \(d_0 = 0\).
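
The interval and test statistic can be computed directly from the success counts. The sketch below uses only the Python standard library (`statistics.NormalDist`, Python 3.8+). The function name `two_proportion_z` and the example counts are assumptions made for illustration, and the two-sided p-value is one common choice of alternative rather than something the text prescribes.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, d0=0.0, alpha=0.05):
    """Confidence interval and z statistic for a difference in proportions.

    x1, x2 are success counts; n1, n2 are sample sizes (illustrative names).
    """
    p1, p2 = x1 / n1, x2 / n2
    d_hat = p1 - p2

    # Unpooled standard error, used for the confidence interval.
    se_d_hat = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    ci = (d_hat - z_crit * se_d_hat, d_hat + z_crit * se_d_hat)

    # Pooled standard error, used for the test statistic.
    # Pooling assumes equal proportions, i.e. the usual d0 = 0 case.
    p_star = (x1 + x2) / (n1 + n2)
    se_d0 = sqrt(p_star * (1 - p_star) * (1 / n1 + 1 / n2))
    z = (d_hat - d0) / se_d0

    # Two-sided p-value (an assumption about the alternative hypothesis).
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {"d_hat": d_hat, "ci": ci, "z": z, "p_value": p_value}


if __name__ == "__main__":
    # Made-up counts: 45 of 100 successes versus 30 of 100.
    print(two_proportion_z(45, 100, 30, 100))
```

With these made-up counts, \(\hat{d} = 0.15\) and \(z \approx 2.19\), so the 95% interval excludes zero and the two-sided test rejects \(\delta = 0\) at \(\alpha = 0.05\).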