3.2 Carrying out the test: Chi-squared test of independence
We are now ready to carry out the chi-squared test of independence, to test whether Facebook usage is associated with age. The results of the test are as follows:
Pearson's Chi-squared test
data: table
X-squared = 80.892, df = 3, p-value < 2.2e-16
We note the following:
- The test statistic is equal to 80.892
- The \(p\)-value is \(p < 0.001\). Since this is less than \(\alpha = 0.05\), we reject \(H_0\): there is sufficient evidence of an association between Facebook usage and age.
- The degrees of freedom are \(\text{df} = 3\). For a two-way table with 4 rows and 2 columns, this is \((4 - 1) \times (2 - 1) = 3\).
Having carried out the test, we can now also check the assumptions. The expected count for a given cell in the two-way table can be found by multiplying the row total by the column total, and dividing by the grand total. However, most software packages will do this for us. The expected counts are:
[,1] [,2]
group1 147.1735 72.82654
group2 278.2916 137.70836
group3 255.5466 126.45335
group4 286.9883 142.01175
All of these numbers are greater than five. This means that:
- No more than 20% of cells have an expected count of less than 5
- There are no expected counts of zero,
and the assumptions have therefore been met.
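As a rough check of the expected counts, the rule "row total times column total, divided by grand total" can be sketched in Python. The row and column totals below are not stated in this section; they are an assumption, inferred by summing the rows and columns of the expected counts shown above and rounding to whole numbers.

```python
# Row and column totals inferred from the expected counts printed above
# (an assumption: each total is the rounded sum of a row or column).
row_totals = [220, 416, 382, 429]   # group1 .. group4
col_totals = [968, 479]             # columns [,1] and [,2]
grand_total = sum(row_totals)       # total number of observations

# Expected count for each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(["group1", "group2", "group3", "group4"], expected):
    print(label, [round(e, 4) for e in row])

# The two assumption checks described in this section.
cells = [e for row in expected for e in row]
share_below_5 = sum(e < 5 for e in cells) / len(cells)
print("Share of cells with expected count < 5:", share_below_5)
print("Any expected count of zero:", any(e == 0 for e in cells))
```

If the inferred totals are right, the printed values should agree with the software output above to the displayed precision.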
The formula for the test statistic is
\[X^2 = \sum_{i = 1}^r \sum_{j = 1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}},\] where, referring to the two-way table:
- \(O_{ij}\) is the observed frequency in the \(i\)th row and the \(j\)th column
- \(E_{ij}\) is the expected frequency of the cell in the \(i\)th row and the \(j\)th column
- \(r\) is the number of rows
- \(c\) is the number of columns
- \(X^2\) is a random variable; under \(H_0\), it approximately follows a chi-squared distribution, \(X^2 \sim \chi^2_{\text{df}}\), with \(\text{df} = (r - 1)(c - 1)\).
The formula for the observed test statistic is
\[\chi^2 = \sum_{i = 1}^r \sum_{j = 1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}},\] where, referring to the two-way table:
- \(O_{ij}\) is the observed frequency in the \(i\)th row and the \(j\)th column
- \(r\) is the number of rows
- \(c\) is the number of columns
- \(E_{ij} = \displaystyle \frac{\text{row}\_\text{total}_i \times \text{column}\_\text{total}_j}{\text{grand}\_\text{total}}\) is the expected frequency of the cell in the \(i\)th row and the \(j\)th column
- \(\text{row}\_\text{total}_i\) is the number of observations in the \(i\)th row
- \(\text{column}\_\text{total}_j\) is the number of observations in the \(j\)th column
- \(\text{grand}\_\text{total}\) is the total number of observations, often denoted \(n\).
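The definitions above can be turned into a short Python sketch. The function name `chi_squared_statistic` and the small table used to try it out are hypothetical; the observed Facebook counts are not reproduced in this section, so the toy table is for illustration only and is not the data behind \(\chi^2 = 80.892\).

```python
def chi_squared_statistic(observed):
    """Return (statistic, expected counts, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # E_ij = row_total_i * column_total_j / grand_total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum (O_ij - E_ij)^2 / E_ij over every cell of the table.
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, expected, df


# Hypothetical toy table (NOT the Facebook data), used only to exercise the code.
toy_table = [[10, 20], [30, 40]]
statistic, expected, df = chi_squared_statistic(toy_table)
print(round(statistic, 4), expected, df)
```

Passing the observed two-way table from the example as a list of rows should give a value close to the reported statistic. Note that this sketch applies no continuity correction, whereas R's `chisq.test` applies one by default to 2 × 2 tables, so the two would differ on the toy table.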
See if you can use this formula to calculate the test statistic \((\chi^2 = 80.892)\) yourself.
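The \(p\)-value can be calculated as \(P(X^2 \geq \chi^2)\), the probability under \(H_0\) that the random statistic \(X^2\) is at least as large as the observed value \(\chi^2\). As usual, if the \(p\)-value is less than \(\alpha\) (where \(\alpha\) is commonly 0.05), we reject \(H_0\).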
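To illustrate the upper-tail probability, here is a minimal Python sketch for the case \(\text{df} = 3\) only, using the closed form of the chi-squared tail that holds for three degrees of freedom. The function name `chi2_sf_df3` is hypothetical. For general degrees of freedom a library routine such as `scipy.stats.chi2.sf` would be the usual choice.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X^2 >= x) for a chi-squared distribution with 3 df.

    Closed form valid for df = 3 only:
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


observed_statistic = 80.892   # test statistic reported above
alpha = 0.05                  # significance level used in this section

p_value = chi2_sf_df3(observed_statistic)
print(p_value)
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The resulting probability should be far smaller than \(\alpha\), in line with the `p-value < 2.2e-16` reported by the software.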