2.1 Visualising the data and checking assumptions
We will now visualise the data to gain an appreciation of the difference in proportions between the groups. The plot below is again a stacked bar chart, showing for each age group the breakdown between those who do and do not use Facebook.
To carry out the hypothesis test, we will again use the Normal distribution, which the Central Limit Theorem justifies when the samples are large enough. (Note: some statistical software packages apply a small 'continuity correction' to the estimates, which provides slightly improved confidence intervals.) The Normal approximation is only appropriate, however, when the following conditions are satisfied:
Two-sample test of proportion conditions:
- $n_1 p_1 \geq 5$ and $n_1 (1 - p_1) \geq 5$
- $n_2 p_2 \geq 5$ and $n_2 (1 - p_2) \geq 5$.
Let's now check whether the conditions have been met. Since, for this test, we do not know the true values of $p_1$ and $p_2$, we will instead use the sample proportions $\hat{p}_1$ and $\hat{p}_2$:
- $n_1 \hat{p}_1 = 220 \times 0.7 = 154$, which is greater than 5
- $n_1 (1 - \hat{p}_1) = 220 \times (1 - 0.7) = 220 \times 0.3 = 66$, which is greater than 5
- $n_2 \hat{p}_2 = 416 \times 0.77 = 320.32$, which is greater than 5
- $n_2 (1 - \hat{p}_2) = 416 \times (1 - 0.77) = 416 \times 0.23 = 95.68$, which is greater than 5.
Therefore, the conditions have been met and we are now ready to carry out the hypothesis test.
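The four checks above can also be reproduced in a few lines of code. The following is a minimal Python sketch, not part of the original analysis: the function name `check_two_sample_conditions` and its `threshold` parameter are illustrative assumptions, and the inputs are the sample sizes and sample proportions quoted above.

```python
def check_two_sample_conditions(n1, p1_hat, n2, p2_hat, threshold=5):
    """Compute the four expected counts and flag whether each meets the threshold.

    Hypothetical helper for illustration: returns a dict mapping a label
    to a (value, condition_met) pair.
    """
    counts = {
        "n1 * p1_hat": n1 * p1_hat,
        "n1 * (1 - p1_hat)": n1 * (1 - p1_hat),
        "n2 * p2_hat": n2 * p2_hat,
        "n2 * (1 - p2_hat)": n2 * (1 - p2_hat),
    }
    return {name: (value, value >= threshold) for name, value in counts.items()}


# Sample sizes and sample proportions taken from the text above.
results = check_two_sample_conditions(n1=220, p1_hat=0.70, n2=416, p2_hat=0.77)

for name, (value, met) in results.items():
    status = "met" if met else "not met"
    print(f"{name} = {value:.2f} ({status})")

# True only if every one of the four conditions holds.
all_met = all(met for _, met in results.values())
print("All conditions met:", all_met)
```

Keeping the threshold as a parameter makes it easy to rerun the same check under a stricter rule of thumb, if a different one is preferred.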