# Chapter 2 Chi-squared Goodness of Fit Test

You may recall the research question from the previous topic: "Is the proportion of social media users who use Facebook more than once per day different from 73%?" We tested the associated hypotheses using the one-sample test of proportions. In that test, there were only two possible categories into which people could be classified:

1. Use Facebook more than once per day
2. Use Facebook once per day or less

What if there were more than two categories? For example, using Facebook with:

1. High frequency (> 10 times per day)
2. Medium frequency (2-10 times per day)
3. Low frequency (once per day)
4. Never

In such a case, we could use the chi-squared goodness of fit test.

For example, suppose a claim has been made that the frequency with which social media users use Facebook is as follows:

1. High frequency: 20%
2. Medium frequency: 65%
3. Low frequency (once per day): 10%
4. Never: 5%

A survey was carried out to study the social media habits of regular social media users from around the world. The table below shows the results of the survey, based on the $$n = 484$$ respondents, as well as the expected percentages based on the above claim:

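| Frequency | Observed frequency | Observed percentage | Expected percentage |
|-----------|--------------------|---------------------|---------------------|
| High      | 95                 | 19.63%              | 20%                 |
| Medium    | 273                | 56.40%              | 65%                 |
| Low       | 94                 | 19.42%              | 10%                 |
| Never     | 22                 | 4.55%               | 5%                  |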
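
The observed percentages in the table, and the counts the claim would imply for a sample of this size, can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic: the variable names (`observed`, `claimed`, `expected_count`) are illustrative and not part of the original material, and the expected counts are assumed to be the sample size multiplied by each claimed proportion.

```python
# Observed counts from the survey table (n = 484 respondents).
observed = {"High": 95, "Medium": 273, "Low": 94, "Never": 22}

# Claimed (expected) proportions for each usage category.
claimed = {"High": 0.20, "Medium": 0.65, "Low": 0.10, "Never": 0.05}

# Total sample size is the sum of the observed counts.
n = sum(observed.values())

# Observed percentage = 100 * count / n.
observed_pct = {k: 100 * v / n for k, v in observed.items()}

# Assumption: expected count = n * claimed proportion.
expected_count = {k: n * p for k, p in claimed.items()}

for k in observed:
    print(
        f"{k:<7} observed {observed[k]:>3} ({observed_pct[k]:.2f}%)  "
        f"expected {expected_count[k]:.1f} ({100 * claimed[k]:.0f}%)"
    )
```

Comparing the two columns of counts makes the largest discrepancies (the Medium and Low categories) easy to spot before any formal test is run.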

To test whether the observed distribution of percentages is significantly different from what was expected (or claimed), we can test the following hypotheses via the chi-squared goodness of fit test:

$$H_0:$$ There is no significant difference between the observed and expected distribution of proportions of Facebook usage frequency of social media users.

versus

$$H_1:$$ There is a significant difference between the observed and expected distribution of proportions of Facebook usage frequency of social media users.

For the chi-squared goodness of fit test, the degrees of freedom are given by:

$$\text{df} = \text{Number of categories} - 1.$$

In our example, there are four categories, so we have that

$\text{df} = 4 - 1 = 3.$
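
As a rough illustration, the degrees of freedom rule can be written as a one-line function. The sketch below also computes a test statistic for the Facebook data, but note that this section does not define the statistic: the code assumes the standard Pearson form, the sum over categories of (observed - expected)² / expected. The function names `degrees_of_freedom` and `pearson_statistic` are hypothetical helpers, not taken from any library.

```python
def degrees_of_freedom(num_categories: int) -> int:
    """Degrees of freedom for a goodness of fit test: categories - 1."""
    if num_categories < 2:
        raise ValueError("A goodness of fit test needs at least two categories.")
    return num_categories - 1


def pearson_statistic(observed, proportions):
    """Assumed Pearson form: sum of (observed - expected)^2 / expected."""
    n = sum(observed)
    total = 0.0
    for count, p in zip(observed, proportions):
        expected = n * p  # expected count under the claimed proportions
        total += (count - expected) ** 2 / expected
    return total


# Facebook usage example: High, Medium, Low, Never.
observed_counts = [95, 273, 94, 22]
claimed_proportions = [0.20, 0.65, 0.10, 0.05]

df = degrees_of_freedom(len(observed_counts))
statistic = pearson_statistic(observed_counts, claimed_proportions)

print(f"df = {df}")
print(f"chi-squared statistic = {statistic:.2f}")
```

The same `degrees_of_freedom` helper applies to any number of categories, for example `degrees_of_freedom(3)` for a variable with three possible answers.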

Suppose a group of university students have been asked how often they smoke, and possible answers are:

• Never
• Sometimes
• Often.

Further suppose we wish to test whether the distribution of proportions for this group of university students is the same as, or different from, a set of expected proportions. What would be the degrees of freedom for this test?

Answer: 2. There are three categories, so $\text{df} = 3 - 1 = 2.$

### References

Raymond, Mark. 2019. “Social Media Usage Report 2019: User Habits You Need to Know.” 2019. https://www.goodfirms.co/resources/social-media-usage-user-habits-to-know.