# Chapter 3 Chi-squared Test of Independence

In the previous topic, we compared proportions between two independent populations using the two-sample test of proportions. The chi-squared test of independence also allows such comparisons, and the number of categories (or populations) is not restricted to two. For example, we could use the chi-squared test of independence to test whether there is a significant difference in the proportion of US adults who say they use Facebook between two groups: those aged 18-29 and those aged 30-49. We could also test whether there is a difference among more than two groups: those aged 18-29, those aged 30-49, those aged 50-64, and those aged 65 and older.

Another way to describe the chi-squared test of independence is to say that it allows us to test whether there is an association between two categorical variables. Consider the hypotheses

$H_0$: There is no association between Facebook usage and age

versus

$H_1$: There is an association between Facebook usage and age.

These hypotheses can be tested via a chi-squared test of independence.

Since a chi-squared test of independence involves two categorical variables, it is useful to look at a two-way table showing the observed count for each combination of categories. A survey was carried out to better understand Americans' use of social media, online platforms, and messaging apps (Auxier and Anderson 2021). The two-way table below shows, for each age group, the number of people who say they do and do not use Facebook:

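| Age group | Uses Facebook | Does not use Facebook |
|-----------|---------------|-----------------------|
| 18-29     | 154           | 66                    |
| 30-49     | 320           | 96                    |
| 50-64     | 279           | 103                   |
| 65+       | 215           | 214                   |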
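To see the kind of difference the test is designed to assess, here is a minimal Python sketch that computes the sample proportion of Facebook users in each age group. It assumes the first count in each row is the number who say they use Facebook and the second is the number who say they do not; the names `observed` and `proportion_using` are illustrative and not part of any library.

```python
# Observed counts from the two-way table of Facebook usage by age group.
# Assumption: each tuple is (uses Facebook, does not use Facebook).
observed = {
    "18-29": (154, 66),
    "30-49": (320, 96),
    "50-64": (279, 103),
    "65+": (215, 214),
}


def proportion_using(table):
    """Return the sample proportion who use Facebook in each age group."""
    return {group: yes / (yes + no) for group, (yes, no) in table.items()}


props = proportion_using(observed)
for group, p in props.items():
    # Print each age group's proportion to three decimal places.
    print(f"{group}: {p:.3f}")
```

Running it prints 0.700, 0.769, 0.730 and 0.501, so in this sample the 65+ group stands out. Whether differences like these are larger than chance alone would explain is the question the chi-squared test of independence addresses.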

For the chi-squared test of independence, the degrees of freedom are given by

$$\text{df} = (r - 1)(c - 1),$$

where:

• $r$ is the number of rows (i.e. the number of categories in the first variable)
• $c$ is the number of columns (i.e. the number of categories in the second variable).

In our example, there are four rows and two columns, so we have that

$$\text{df} = (4 - 1)(2 - 1) = 3 \times 1 = 3.$$
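The formula can be checked with a short Python sketch. The helper `chi_squared_df` is hypothetical, written only for this illustration. It reads $r$ and $c$ off the shape of a table stored as a list of rows and applies $(r - 1)(c - 1)$.

```python
def chi_squared_df(table):
    """Degrees of freedom (r - 1)(c - 1) for an r x c two-way table.

    `table` is a list of rows; every row must have the same number of columns.
    """
    r = len(table)     # number of categories of the first variable
    c = len(table[0])  # number of categories of the second variable
    if any(len(row) != c for row in table):
        raise ValueError("all rows must have the same number of columns")
    return (r - 1) * (c - 1)


# Facebook usage by age group: 4 rows (age groups) x 2 columns.
facebook = [
    [154, 66],
    [320, 96],
    [279, 103],
    [215, 214],
]
print(chi_squared_df(facebook))  # 3
```

The counts themselves never enter the calculation; only the number of rows and columns matters.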

Suppose a group of university students has been asked how often they smoke, and the possible answers are:

• Never
• Sometimes
• Often

Suppose the students have also been asked how frequently they exercise, with possible answers:

• Never
• Occasionally
• Regularly
• At least once a day

If we were to carry out a chi-squared test of independence to test for an association between smoking and exercise, what would the degrees of freedom be for this test?

Answer: smoking has three categories and exercise has four, so

$$\text{df} = (3 - 1)(4 - 1) = 2 \times 3 = 6.$$
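Because the degrees of freedom depend only on how many categories each variable has, this question can be answered before any data are collected. A minimal Python sketch, using a hypothetical helper `df_from_categories` and the answer options listed above:

```python
# Answer options for each question, as listed in the exercise.
smoking = ["Never", "Sometimes", "Often"]
exercise = ["Never", "Occasionally", "Regularly", "At least once a day"]


def df_from_categories(first, second):
    """(r - 1)(c - 1), where r and c are the numbers of categories of each variable."""
    return (len(first) - 1) * (len(second) - 1)


print(df_from_categories(smoking, exercise))  # 6
```

Swapping which variable labels the rows gives $(4 - 1)(3 - 1)$, the same value, so it does not matter which variable is placed in the rows of the two-way table.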

### References

Auxier, Brooke, and Monica Anderson. 2021. “Social Media Use in 2021.” Pew Research Center. 2021. https://www.pewresearch.org/internet/2021/04/07/social-media-use-in-2021/.