2.2 Carrying out the test: Chi-squared goodness of fit test

We are now ready to carry out the chi-squared goodness of fit test, to test whether the frequency of Facebook usage among social media users follows the expected distribution. The results of the test are as follows:


    Chi-squared test for given probabilities

data:  obs.freq
X-squared = 48.696, df = 3, p-value = 1.514e-10

We note the following:

The test statistic is equal to 48.696.
The \(p\)-value is \(1.514 \times 10^{-10}\), so \(p < 0.001\). Since this is less than \(\alpha = 0.05\), we reject \(H_0\): there is sufficient evidence that the distribution of Facebook usage frequency differs from what was expected.
The degrees of freedom are \(\text{df} = 3\), one fewer than the number of categories (\(4 - 1 = 3\)).

Having carried out the test, we can now check its assumptions. Each expected count is the expected proportion for that category multiplied by the sample size, although most software packages will calculate these for us. The expected counts are:

[1]  96.8 314.6  48.4  24.2

All of these expected counts are greater than 5. This means that:

No more than 20% of categories have an expected count of less than 5.
There are no expected counts of zero.

The assumptions have therefore been met.
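The expected counts and the two assumption checks above can be reproduced with a short calculation. The Python sketch below is illustrative only and is not the software that produced the output shown earlier. The expected proportions (0.20, 0.65, 0.10, 0.05) and the sample size (484) are not stated in this section; they are inferred from the expected counts listed above, which sum to 484, so treat them as assumptions.

```python
# Assumed inputs: inferred from the expected counts shown above,
# not stated explicitly in this section.
expected_props = [0.20, 0.65, 0.10, 0.05]  # proportions under H0 (assumed)
n = 484                                     # sample size (assumed)

# Expected count for each category = proportion under H0 * sample size
expected = [p * n for p in expected_props]

# Assumption checks for the chi-squared goodness of fit test:
# at most 20% of categories with an expected count below 5, and none equal to zero
share_below_5 = sum(e < 5 for e in expected) / len(expected)
no_zero_counts = all(e > 0 for e in expected)
assumptions_met = share_below_5 <= 0.20 and no_zero_counts

print([round(e, 1) for e in expected])
print(assumptions_met)
```

Under these assumed inputs, the rounded values agree with the expected counts reported above, and both checks pass.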

The formula for the test statistic is

\[X^2 = \sum_{i = 1}^k \frac{(O_i - E_i)^2}{E_i}, \] where:

\(X^2\) is a random variable, with \(X^2 \sim \chi^2_{\text{df}}\) (approximately) under \(H_0\)
\(O_i\) is the observed frequency for the \(i\)th category
\(E_i\) is the expected frequency for the \(i\)th category
\(k\) is the number of categories.

The formula for the observed test statistic is

\[\chi^2 = \sum_{i = 1}^k \frac{(O_i - E_i)^2}{E_i}, \] where:

\(O_i\) is the observed frequency for the \(i\)th category
\(E_i\) is the expected frequency for the \(i\)th category, that is, the proportion in the \(i\)th category under \(H_0\) multiplied by the sample size
\(k\) is the number of categories.

See if you can use this formula to calculate the observed test statistic \((\chi^2 = 48.696)\) yourself.
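As a guide for that calculation, here is a minimal Python sketch of the formula. The function name `chi_squared_statistic` is hypothetical, and because the observed frequencies are not repeated in this section, the example uses made-up toy counts rather than the Facebook data.

```python
def chi_squared_statistic(observed, expected_props):
    """Return the sum over categories of (O_i - E_i)^2 / E_i."""
    n = sum(observed)                           # sample size
    expected = [p * n for p in expected_props]  # E_i = proportion under H0 * n
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical toy data (NOT the Facebook data): three categories
toy_observed = [30, 40, 30]
toy_props = [0.25, 0.50, 0.25]

# Expected counts are 25, 50, 25, so the statistic is 25/25 + 100/50 + 25/25
toy_stat = chi_squared_statistic(toy_observed, toy_props)
print(toy_stat)
```

Replacing the toy inputs with the observed frequencies and the expected proportions under \(H_0\) should reproduce the observed test statistic.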

The \(p\)-value is calculated as \(P(X^2 \geq \chi^2)\), where this probability is calculated under \(H_0\). As usual, if the \(p\)-value is less than \(\alpha\) (typically 0.05), we reject \(H_0\).
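To illustrate this tail probability, the Python sketch below evaluates \(P(X^2 \geq 48.696)\) for \(\text{df} = 3\). It relies on a closed-form expression that holds only for three degrees of freedom, so it is a special-case illustration rather than a general method; the function name `chi2_sf_df3` is hypothetical.

```python
import math


def chi2_sf_df3(x):
    """P(X^2 >= x) for a chi-squared distribution with 3 degrees of freedom.

    Closed form valid for df = 3 only:
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


observed_stat = 48.696  # observed test statistic reported in the output above
alpha = 0.05            # significance level

p_value = chi2_sf_df3(observed_stat)
reject_h0 = p_value < alpha
print(p_value, reject_h0)
```

The result is on the order of \(1.5 \times 10^{-10}\), in line with the reported \(p\)-value, so \(H_0\) is rejected. For other degrees of freedom, a library routine such as `scipy.stats.chi2.sf` would be the more usual choice.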