Chapter 3 Chi-squared Test of Independence
In the previous topic, we compared proportions between two independent populations via the two-sample test of proportions. The chi-squared test of independence also allows such comparisons, but it places no restriction on the number of categories (or populations). For example, we could use the chi-squared test of independence to test whether there is a significant difference in the proportion of US adults who say they use Facebook between two groups: those aged 18-29 and those aged 30-49. We could also test whether there is a difference among more than two groups: those aged 18-29, 30-49, 50-64, and 65 and older.
Another way to describe the chi-squared test of independence is to say that it allows us to test whether there is an association between two categorical variables. Consider the hypotheses
\(H_0:\) There is no association between Facebook usage and age
versus
\(H_1:\) There is an association between Facebook usage and age.
These hypotheses can be tested via a chi-squared test of independence.
Because a chi-squared test of independence involves two categorical variables, it is useful to arrange the data in a two-way table showing the observed count for each combination of categories. A survey was carried out (Auxier and Anderson 2021) to better understand Americans' use of social media, online platforms, and messaging apps. The two-way table below shows, for each age group, the number of respondents who say they use Facebook and the number who say they do not:
Age | Use Facebook | Do not use Facebook |
---|---|---|
18-29 | 154 | 66 |
30-49 | 320 | 96 |
50-64 | 279 | 103 |
65+ | 215 | 214 |
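To connect the table back to the idea of comparing proportions, the sample proportion of Facebook users in each age group can be computed directly from the observed counts. The following is a minimal Python sketch; the names `observed`, `proportion_using`, and `proportions` are illustrative assumptions and do not come from the survey or the original text.

```python
# Observed counts from the two-way table above,
# stored as (use Facebook, do not use Facebook) for each age group.
observed = {
    "18-29": (154, 66),
    "30-49": (320, 96),
    "50-64": (279, 103),
    "65+": (215, 214),
}


def proportion_using(counts):
    """Return the share of respondents in one age group who use Facebook."""
    use, do_not_use = counts
    return use / (use + do_not_use)


# Sample proportion of Facebook users for every age group.
proportions = {age: proportion_using(counts) for age, counts in observed.items()}

for age, p in proportions.items():
    print(f"{age}: {p:.3f}")
```

The sample proportions range from about 0.50 in the 65+ group to about 0.77 in the 30-49 group. Whether differences of this size are statistically significant is the question the chi-squared test of independence addresses.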
For the chi-squared test of independence, the degrees of freedom are determined by the dimensions of the two-way table:
Degrees of freedom for chi-squared test of independence:
\(\text{df} = (r - 1)(c - 1),\)
where:
- \(r\) is the number of rows (i.e. the number of categories in the first variable)
- \(c\) is the number of columns (i.e. the number of categories in the second variable).
In our example, there are four rows and two columns, so we have that
\[\text{df} = (4 - 1)(2 - 1) = 3\times 1 = 3.\]
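The same calculation can be sketched in Python by reading the number of rows and columns from the table of observed counts. This is an illustrative sketch only: the function name `degrees_of_freedom` and the list-of-rows layout are assumptions made for this example, and the table is assumed to contain counts only (no header or totals).

```python
def degrees_of_freedom(table):
    """Return (r - 1)(c - 1) for a two-way table given as a list of rows.

    Assumes every row has the same number of columns and that the table
    holds observed counts only (no row or column totals).
    """
    r = len(table)       # number of rows (categories of the first variable)
    c = len(table[0])    # number of columns (categories of the second variable)
    return (r - 1) * (c - 1)


# Facebook usage by age group: four rows (age groups), two columns (use / do not use).
facebook_table = [
    [154, 66],
    [320, 96],
    [279, 103],
    [215, 214],
]

df = degrees_of_freedom(facebook_table)
print(df)
```

Note that the degrees of freedom depend only on the shape of the table, not on the counts inside it.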
Your turn
Suppose a group of university students have been asked how often they smoke, and possible answers are:
- Never
- Sometimes
- Often.
Suppose the students have also been asked how frequently they exercise, with possible answers:
- Never
- Occasionally
- Regularly
- At least once a day.
If we were to carry out a chi-squared test of independence to test for an association between smoking and exercise, what would the degrees of freedom be for this test?
\[\text{df} = (3 - 1)(4 - 1) = 2\times 3 = 6.\]