Chapter 3 Chi-squared Test of Independence

In the previous topic, we compared proportions between two independent populations via the two-sample test of proportions. The chi-squared test of independence also allows such comparisons, but places no restriction on the number of categories (or populations). For example, we could use it to test whether the proportion of US adults who say they use Facebook differs significantly between two groups: those aged 18-29 and those aged 30-49. We could also test for a difference among more than two groups: those aged 18-29, 30-49, 50-64, and 65 and older. Another way to describe the chi-squared test of independence is to say that it tests whether there is an association between two categorical variables. Consider the hypotheses

\(H_0:\) There is no association between Facebook usage and age

versus

\(H_1:\) There is an association between Facebook usage and age.

These hypotheses can be tested via a chi-squared test of independence.

Since a chi-squared test of independence involves two categorical variables, it is useful to look at a two-way table showing the observed count for each combination of categories. A survey was carried out (Auxier and Anderson 2021) to better understand Americans' use of social media, online platforms, and messaging apps. The two-way table below shows, for each age group, the number of respondents who say they use Facebook and the number who say they do not:

| Age   | Use Facebook | Do not use Facebook |
|-------|--------------|---------------------|
| 18-29 | 154          | 66                  |
| 30-49 | 320          | 96                  |
| 50-64 | 279          | 103                 |
| 65+   | 215          | 214                 |
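To connect the table back to the comparison of proportions, the following minimal sketch computes the proportion of respondents in each age group who say they use Facebook. The counts are taken from the table above; the names `observed` and `row_proportions` are hypothetical and chosen only for illustration.

```python
# Observed counts from the two-way table (Auxier and Anderson 2021).
# Each entry maps an age group to (use Facebook, do not use Facebook).
observed = {
    "18-29": (154, 66),
    "30-49": (320, 96),
    "50-64": (279, 103),
    "65+": (215, 214),
}


def row_proportions(table):
    """Return, for each row, the proportion of the row total in the first column."""
    return {group: yes / (yes + no) for group, (yes, no) in table.items()}


proportions = row_proportions(observed)
for group, p in proportions.items():
    # Print each sample proportion rounded to three decimal places.
    print(f"{group}: {p:.3f}")
```

The sample proportions are roughly 70%, 77%, 73%, and 50% for the four age groups. Whether differences of this size amount to a significant association is the question the chi-squared test of independence addresses.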

For the chi-squared test of independence, the degrees of freedom are calculated as follows.

Degrees of freedom for chi-squared test of independence:

\(\text{df} = (r - 1)(c - 1),\)

where:

  • \(r\) is the number of rows (i.e. the number of categories in the first variable)
  • \(c\) is the number of columns (i.e. the number of categories in the second variable).

In our example, there are four rows and two columns, so we have that

\[\text{df} = (4 - 1)(2 - 1) = 3\times 1 = 3.\]
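The same calculation can be sketched in code. The helper below is a hypothetical illustration (the name `chi2_independence_df` is not from any library); it assumes the two-way table is supplied as a list of rows of observed counts, without totals.

```python
def chi2_independence_df(table):
    """Degrees of freedom (r - 1)(c - 1) for a two-way table given as a list of rows."""
    n_rows = len(table)
    n_cols = len(table[0])
    # Guard against ragged input: every row needs the same number of columns.
    if any(len(row) != n_cols for row in table):
        raise ValueError("every row must have the same number of columns")
    return (n_rows - 1) * (n_cols - 1)


# Facebook usage by age group: four rows (age groups), two columns (use / do not use).
facebook_table = [
    [154, 66],
    [320, 96],
    [279, 103],
    [215, 214],
]

df = chi2_independence_df(facebook_table)
print(df)  # 3
```

Note that only the dimensions of the table matter here; the counts themselves do not affect the degrees of freedom.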

Your turn

Suppose a group of university students have been asked how often they smoke, and possible answers are:

  • Never
  • Sometimes
  • Often.

Suppose the students have also been asked how frequently they exercise, with possible answers:

  • Never
  • Occasionally
  • Regularly
  • At least once a day.

If we were to carry out a chi-squared test of independence to test for an association between smoking and exercise, what would the degrees of freedom be for this test?

\[\text{df} = (3 - 1) \times (4 - 1) = 2 \times 3 = 6.\]

References

Auxier, Brooke, and Monica Anderson. 2021. “Social Media Use in 2021.” Pew Research Center. https://www.pewresearch.org/internet/2021/04/07/social-media-use-in-2021/.