Chapter 2 Chi-squared Goodness of Fit Test
You may recall the research question "Is the proportion of social media users who use Facebook more than once per day different from 73%?" that we considered in the previous topic. We tested the associated hypotheses using the one-sample test of proportions. In that test, each person could be placed into only one of two categories:
- Use Facebook more than once per day
- Use Facebook once per day or less
What if there were more than two categories? For example, people might use Facebook with:
- High frequency (> 10 times per day)
- Medium frequency (2-10 times per day)
- Low frequency (once per day)
- Never
In such a case, we could use the Chi-squared goodness of fit test.
For example, suppose a claim has been made that the frequency with which social media users use Facebook is as follows:
1. High frequency: 20%
2. Medium frequency: 65%
3. Low frequency (once per day): 10%
4. Never: 5%
A survey was carried out (Raymond 2019) to study the social media habits of regular social media users from around the world. The table below shows the results of the survey, based on the \(n = 484\) respondents, together with the expected percentages under the above claim:
Frequency | Observed frequency | Observed percentage | Expected percentage |
---|---|---|---|
High | 95 | 19.63% | 20% |
Medium | 273 | 56.40% | 65% |
Low | 94 | 19.42% | 10% |
Never | 22 | 4.55% | 5% |
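To make the comparison in the table concrete, the expected percentages can be converted into expected counts by multiplying each claimed proportion by \(n = 484\). The following is a minimal Python sketch of that arithmetic; the variable names are illustrative and are not part of the original survey analysis.

```python
# Observed counts from the survey table (n = 484 respondents).
observed = {"High": 95, "Medium": 273, "Low": 94, "Never": 22}

# Claimed (expected) proportions for each usage category.
claimed = {"High": 0.20, "Medium": 0.65, "Low": 0.10, "Never": 0.05}

n = sum(observed.values())  # total number of respondents

# Expected count = n * claimed proportion, assuming the claim is true.
expected_counts = {cat: n * p for cat, p in claimed.items()}

for cat in observed:
    observed_pct = 100 * observed[cat] / n
    print(f"{cat:<7} observed = {observed[cat]:>3} ({observed_pct:.2f}%), "
          f"expected = {expected_counts[cat]:.1f}")
```

The largest gaps between observed and expected counts are in the Medium and Low categories, which matches what the percentages in the table suggest.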
To test whether the observed distribution of percentages is significantly different from what was expected (or claimed), we can test the following hypotheses via the chi-squared goodness of fit test:
\(H_0:\) There is no significant difference between the observed and expected distribution of proportions of Facebook usage frequency of social media users.

\(H_1:\) There is a significant difference between the observed and expected distribution of proportions of Facebook usage frequency of social media users.
For the chi-squared goodness of fit test, the degrees of freedom are calculated as follows:
Degrees of freedom for chi-squared goodness of fit test:
\(\text{df} = \text{Number of categories} - 1\).
In our example, there are four categories, so we have that
\[\text{df} = 4 - 1 = 3.\]
Therefore, in this example, the test statistic \(X^2\) follows a chi-squared distribution with 3 degrees of freedom under \(H_0\), that is, \(X^2 \sim \chi^2_3\).
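As a rough illustration of where the degrees of freedom fit into the test, the sketch below computes the test statistic and the degrees of freedom for the Facebook example. It assumes that \(X^2\) is the usual Pearson statistic, \(\sum (O - E)^2 / E\), where \(O\) and \(E\) are the observed and expected counts; that formula is not stated in the section above, and the variable names are illustrative.

```python
# Observed counts, in the order High, Medium, Low, Never.
observed = [95, 273, 94, 22]

# Claimed proportions for the same categories.
claimed = [0.20, 0.65, 0.10, 0.05]

n = sum(observed)                    # 484 respondents
expected = [n * p for p in claimed]  # expected counts if the claim holds

# Assumed form of the test statistic: Pearson's sum of (O - E)^2 / E.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom = number of categories - 1.
df = len(observed) - 1

print(f"X^2 = {x2:.2f} on {df} degrees of freedom")
```

If SciPy is available, `scipy.stats.chisquare(observed, f_exp=expected)` should return the same statistic, together with a p-value based on the \(\chi^2_3\) distribution.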
Your turn
Suppose a group of university students have been asked how often they smoke, and possible answers are:
- Never
- Sometimes
- Often.
Further suppose we wish to test whether the distribution of proportions for this group of university students is the same or different from a set of expected proportions. What would be the degrees of freedom for this test?
Answer: 2. There are three categories, so \(\text{df} = 3 - 1 = 2\).