2.1 Visualising the data and checking assumptions

We will now visualise the data to get a sense of how the proportions differ between the groups. The plot below is again a stacked bar chart, showing the breakdown of those who do and do not use Facebook within each age group.

To carry out the hypothesis test, we will again use the Normal distribution, which is justified by the Central Limit Theorem. (Note: some statistical software packages apply a small 'continuity correction' to the estimates, which provides slightly improved confidence intervals.) For this Normal approximation to be reasonable, however, the following conditions must be met:

Two-sample test of proportion conditions:

  • \(n_1p_1 \geq 5\) and \(n_1(1 - p_1) \geq 5\)
  • \(n_2p_2 \geq 5\) and \(n_2(1 - p_2) \geq 5\).

Let's now check whether the conditions have been met. Since, for this test, we do not know the true values of \(p_1\) and \(p_2\), we will instead use the sample estimates \(\hat{p}_1\) and \(\hat{p}_2\):

  • \(n_1\hat{p}_1 = 220\times 0.7 = 154\), which is greater than 5
  • \(n_1(1 - \hat{p}_1) = 220\times (1 - 0.7) = 220\times 0.3 = 66\), which is greater than 5
  • \(n_2\hat{p}_2 = 416\times 0.77 = 320.32\), which is greater than 5
  • \(n_2(1 - \hat{p}_2) = 416\times (1 - 0.77) = 416\times 0.23 = 95.68\), which is greater than 5.

Therefore, the conditions have been met and we are now ready to carry out the hypothesis test.
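
The same checks can be reproduced in a few lines of code. The following is a minimal sketch in Python, using the sample sizes and estimated proportions quoted above. The function name `check_two_sample_conditions` and its `threshold` parameter are hypothetical names chosen for illustration, not part of any statistical package.

```python
def check_two_sample_conditions(n1, p1_hat, n2, p2_hat, threshold=5):
    """Compute the four expected counts for a two-sample test of proportions
    and report whether every one of them is at least `threshold`.

    Hypothetical helper for illustration only.
    """
    counts = {
        "n1 * p1_hat": n1 * p1_hat,
        "n1 * (1 - p1_hat)": n1 * (1 - p1_hat),
        "n2 * p2_hat": n2 * p2_hat,
        "n2 * (1 - p2_hat)": n2 * (1 - p2_hat),
    }
    # The conditions hold only if all four expected counts reach the threshold.
    all_met = all(value >= threshold for value in counts.values())
    return counts, all_met


# Sample sizes and estimated proportions taken from the checks above.
counts, all_met = check_two_sample_conditions(n1=220, p1_hat=0.70, n2=416, p2_hat=0.77)

for label, value in counts.items():
    print(f"{label} = {value:.2f}")
print("All conditions met:", all_met)
```

Returning the individual counts alongside the overall result makes it easy to see which condition fails if a sample turns out to be too small.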