# Chapter 3 Chi-squared Test of Independence

In the previous topic, we compared proportions between two independent populations using the two-sample test of proportions. The chi-squared test of independence allows the same kind of comparison, but without any restriction on the number of categories (or populations). For example, we could use it to test whether the proportion of US adults who say they use Facebook differs significantly between two groups: those aged 18-29 and those aged 30-49. We could also compare more than two groups: those aged 18-29, 30-49, 50-64, and 65 and older. Another way to describe the chi-squared test of independence is that it tests whether there is an association between two categorical variables. Consider the hypotheses

\(H_0:\) There is no association between Facebook usage and age

versus

\(H_1:\) There is an association between Facebook usage and age.

These hypotheses can be tested via a chi-squared test of independence.

Since a chi-squared test of independence involves two categorical variables, it is useful to look at a two-way table showing the observed count in each combination of categories. A survey was carried out (Auxier and Anderson 2021) to better understand Americans' use of social media, online platforms, and messaging apps. The two-way table below shows the number of respondents in each age group who say they do or do not use Facebook:

| Age | Use Facebook | Do not use Facebook |
|---|---|---|
| 18-29 | 154 | 66 |
| 30-49 | 320 | 96 |
| 50-64 | 279 | 103 |
| 65+ | 215 | 214 |
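
To see why these counts might suggest an association, the minimal sketch below stores the table in Python and computes the proportion of each age group that says it uses Facebook. The names `counts` and `proportion_using` are illustrative assumptions, not part of the survey or its analysis.

```python
# Observed counts from the two-way table: (use Facebook, do not use Facebook).
counts = {
    "18-29": (154, 66),
    "30-49": (320, 96),
    "50-64": (279, 103),
    "65+": (215, 214),
}

# Proportion of each age group that says it uses Facebook.
proportion_using = {
    age: use / (use + do_not_use)
    for age, (use, do_not_use) in counts.items()
}

for age, p in proportion_using.items():
    print(f"{age}: {p:.3f}")
```

The proportions range from about 0.50 (65+) to about 0.77 (30-49). The chi-squared test of independence assesses whether differences like these are larger than we would expect by chance alone.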

For the chi-squared test of independence, the **degrees of freedom** are calculated as follows:

**Degrees of freedom for chi-squared test of independence:**

\(\text{df} = (r - 1)(c - 1),\)

where:

- \(r\) is the number of rows (i.e. the number of categories in the first variable)
- \(c\) is the number of columns (i.e. the number of categories in the second variable).

In our example, there are four rows and two columns, so we have that

\[\text{df} = (4 - 1)(2 - 1) = 3\times 1 = 3.\]
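
The same calculation can be sketched in Python. The helper `degrees_of_freedom` is a hypothetical name introduced here for illustration. It assumes the table is supplied as a list of equal-length rows of observed counts, with no row or column totals included.

```python
def degrees_of_freedom(table):
    """Return (r - 1)(c - 1) for a two-way table given as a list of rows."""
    r = len(table)       # number of rows
    c = len(table[0])    # number of columns
    # Every row must have the same number of columns.
    if any(len(row) != c for row in table):
        raise ValueError("all rows must have the same number of columns")
    return (r - 1) * (c - 1)


# Facebook usage by age group: four rows (age groups),
# two columns (use / do not use Facebook).
facebook_table = [
    [154, 66],
    [320, 96],
    [279, 103],
    [215, 214],
]

df = degrees_of_freedom(facebook_table)
print(df)  # 3
```

Counting rows and columns from the data itself, rather than typing them in by hand, avoids the common slip of including a totals row or column in \(r\) or \(c\).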

**Your turn**

Suppose a group of university students has been asked how often they smoke, with possible answers:

- Never
- Sometimes
- Often.

Suppose the students have also been asked how frequently they exercise, with possible answers:

- Never
- Occasionally
- Regularly
- At least once a day.

If we were to carry out a chi-squared test of independence to test for an association between smoking and exercise, what would the degrees of freedom be for this test?

With three smoking categories and four exercise categories, we have

\[\text{df} = (3 - 1)(4 - 1) = 2\times 3 = 6.\]