Chapter 2 Chi-squared Goodness of Fit Test

You may recall the research question, is the proportion of social media users who use Facebook more than once per day different from 73%? that we considered in the previous topic. We tested the associated hypotheses using the one-sample test of proportions. In this test, there were only two possible categories people could be categorised into:

Use Facebook more than once per day
Use Facebook once per day or less

What if there were more than two categories? For example, using Facebook with:

High frequency (> 10 times per day)
Medium frequency (2-10 times per day)
Low frequency (once per day)
Never

In such a case, we could use the Chi-squared goodness of fit test.

For example, suppose a claim has been made that the frequency with which social media users use Facebook is as follows: 1. High frequency: 20% 1. Medium frequency: 65% 1. Low frequency (once per day): 10% 1. Never: 5%

A survey was carried out (Raymond 2019) to study the social media habits of regular social media users from around the world. The below table shows the results the survey based on the \(n = 484\) respondents, as well as the expected percentages based on the above claim:

Frequency	Observed frequency	Observed percentage	Expected percentage
High	95	19.63%	20%
Medium	273	56.40%	65%
Low	94	19.42%	10%
Never	22	4.55%	5%

To test whether the observed distribution of percentages is significantly different from what was expected (or claimed), we can test the following hypotheses via the chi-squared goodness of fit test:

\(H_0:\) There is no significant difference between the observed and expected distribution of proportions of Facebook usage frequency of social media users. versus

\(H_1:\) There is a significant difference between the observed and expected distribution of proportions of Facebook usage frequency of social media users.

For the chi-squared goodness of fit test, the degrees of freedom is:

Degrees of freedom for chi-squared goodness of fit test:

\(\text{df} = \text{Number of categories} - 1\).

In our example, there are four categories, so we have that

\[\text{df} = 4 - 1 = 3.\] Therefore, in this example, we have that \(X^2 \sim \chi^2_3\) under \(H_0\).

Your turn

Suppose a group of university students have been asked how often they smoke, and possible answers are:

Never
Sometimes
Often.

Further suppose we wish to test whether the distribution of proportions for this group of university students is the same or different from a set of expected proportions. What would be the degrees of freedom for this test?

References

Raymond, Mark. 2019. “Social Media Usage Report 2019: User Habits You Need to Know.” 2019. https://www.goodfirms.co/resources/social-media-usage-user-habits-to-know.