3.4 Two population proportions

  • Comparing two proportions, like comparing two means, is common

  • A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the two population proportions

  • The null hypothesis states that there is no difference between two population proportions \(p_1\) and \(p_2\)

\[\begin{equation} H_0:~~p_1=p_2 \tag{3.12} \end{equation}\]

  • The alternative hypothesis can take one of three forms:

\[\begin{align} H_1:&~~p_1 \ne p_2 &\text{two-tailed test} \\ \\ H_1:&~~p_1 < p_2 &\text{left-tailed test} \\ \\ H_1:&~~p_1 > p_2 &\text{right-tailed test} \\ \tag{3.13} \end{align}\]

  • For large samples, the difference between the two sample proportions is approximately normally distributed (by the central limit theorem, CLT)

  • Therefore, under the null hypothesis, the test statistic approximately follows a standard normal distribution, with zero mean and unit variance

\[\begin{align} Z&=\frac{\hat{p}_1-\hat{p}_2}{\sqrt{S_p^2 \bigg(\frac{1}{n_1}+\frac{1}{n_2} \bigg)}} \sim N(0,~1) \\ \\ S_p^2 &=\bar{p}(1-\bar{p}) \\ \tag{3.14} \end{align}\]

  • Here is what each term means:

\[\begin{align} \hat{p}_1&=\frac{x_1}{n_1}~\text{is the proportion of the first sample (chosen from population 1)} \\ \\ \hat{p}_2&=\frac{x_2}{n_2}~\text{is the proportion of the second sample (chosen from population 2)} \\ \\ x_1&~\text{is the number of successes in the first sample} \\ \\ x_2&~\text{is the number of successes in the second sample} \\ \\ n_1&~\text{is the size of the first sample} \\ \\ n_2&~\text{is the size of the second sample} \\ \\ S_p^2&~\text{is the pooled (common) variance from both samples} \\ \\ \bar{p}&=\frac{x_1+x_2}{n_1+n_2}~\text{is the pooled (common) proportion from both samples} \end{align}\]
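The test statistic in (3.14) and the three forms of the alternative in (3.13) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `two_proportion_z` and `p_value` and the `tail` argument are assumptions made here, not part of the text, and the normal approximation is assumed to be adequate (large samples).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2, following equation (3.14)."""
    p1_hat = x1 / n1                     # proportion in the first sample
    p2_hat = x2 / n2                     # proportion in the second sample
    p_bar = (x1 + x2) / (n1 + n2)        # pooled (common) proportion
    s2_p = p_bar * (1 - p_bar)           # pooled (common) variance S_p^2
    return (p1_hat - p2_hat) / sqrt(s2_p * (1 / n1 + 1 / n2))


def p_value(z, tail="two"):
    """p-value under N(0, 1); tail is 'two', 'left' or 'right' (hypothetical argument)."""
    phi = NormalDist().cdf               # standard normal CDF
    if tail == "two":
        return 2 * (1 - phi(abs(z)))     # H1: p1 != p2
    if tail == "left":
        return phi(z)                    # H1: p1 < p2
    if tail == "right":
        return 1 - phi(z)                # H1: p1 > p2
    raise ValueError("tail must be 'two', 'left' or 'right'")


# Illustrative counts only (not taken from the text)
z = two_proportion_z(45, 150, 30, 150)
print(z, p_value(z, tail="two"))
```

The one-tailed branches use the signed value of z, so the direction of the alternative must match the order in which the two samples are passed to `two_proportion_z`.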

Example 3.7 A bank has recently acquired a new branch in location B. The bank wishes to test the hypothesis that the default rate of new customers in location B is different from the default rate of current customers. It samples 200 current customers and finds that 20 have defaulted. In location B, a sample of 100 new customers shows that 6 have defaulted on their loans. The significance level is \(5\%\). Compute the p-value in Excel using the function =2*(1-NORM.S.DIST(ABS(z);TRUE)).
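One possible way to work through Example 3.7 by hand is shown below. The labelling is an assumption made here: current customers are taken as sample 1 and new customers in location B as sample 2. \(\Phi\) denotes the standard normal cumulative distribution function (what NORM.S.DIST(z;TRUE) returns), and all values are rounded.

\[\begin{align} \hat{p}_1&=\frac{20}{200}=0.10, \qquad \hat{p}_2=\frac{6}{100}=0.06 \\ \\ \bar{p}&=\frac{20+6}{200+100}\approx 0.0867 \\ \\ S_p^2&=\bar{p}(1-\bar{p})\approx 0.0792 \\ \\ Z&=\frac{0.10-0.06}{\sqrt{0.0792 \bigg(\frac{1}{200}+\frac{1}{100} \bigg)}}\approx 1.16 \\ \\ p\text{-value}&=2\big(1-\Phi(1.16)\big)\approx 0.246 \end{align}\]

Under these assumptions the p-value of about 0.246 exceeds the \(5\%\) significance level, so the null hypothesis of equal default rates would not be rejected.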

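Example 3.8 Use the same information as in Example 3.7 to test the hypothesis that the default rate of new customers in location B is lower than the default rate of current customers. Compute the p-value in Excel using the function =1-NORM.S.DIST(ABS(z);TRUE). Taking ABS(z) gives the correct one-tailed p-value here only because the observed difference lies in the direction of the alternative hypothesis.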
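As a cross-check on the Excel formula, the one-tailed p-value for Example 3.8 can be sketched in Python. The labelling is an assumption made here: new customers in location B are taken as sample 1 and current customers as sample 2, so the alternative \(p_1 < p_2\) is a left-tailed test. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Assumed labelling: sample 1 = new customers (location B), sample 2 = current customers
x1, n1 = 6, 100
x2, n2 = 20, 200

p_bar = (x1 + x2) / (n1 + n2)                     # pooled proportion
z = (x1 / n1 - x2 / n2) / sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))

phi = NormalDist().cdf                            # standard normal CDF
p_left = phi(z)                                   # left-tailed p-value, P(Z <= z)
p_abs = 1 - phi(abs(z))                           # mirrors =1-NORM.S.DIST(ABS(z);TRUE)

print(f"z = {z:.4f}, left-tailed p-value = {p_left:.4f}, ABS-based = {p_abs:.4f}")
```

Both expressions give a p-value of roughly 0.12 with this labelling, which is above the \(5\%\) level. The two agree because z is negative, as the alternative predicts; had the sample difference pointed the other way, the signed form `phi(z)` would be the appropriate one.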