3.2 Carrying out the test: Chi-squared test of independence
We are now ready to carry out the chi-squared test of independence, to test whether Facebook usage is associated with age. The results of the test are as follows:
Pearson's Chi-squared test
data: table
X-squared = 80.892, df = 3, p-value < 2.2e-16
We note the following:
- The test statistic is equal to 80.892.
- The p-value is $p < 0.001$ (the software reports p-value < 2.2e-16). Since this is less than $\alpha = 0.05$, we reject $H_0$: there is sufficient evidence of an association between Facebook usage and age.
- The degrees of freedom are $df = 3$, since the table has 4 rows and 2 columns: $(4 - 1) \times (2 - 1) = 3$.
Having carried out the test, we can now also check the assumptions. The expected count for a given cell in the two-way table can be found by multiplying the row total by the column total, and dividing by the grand total. However, most software packages will do this for us. The expected counts are:
           [,1]      [,2]
group1 147.1735  72.82654
group2 278.2916 137.70836
group3 255.5466 126.45335
group4 286.9883 142.01175
All of these expected counts are greater than 5. This means that:
- no more than 20% of cells have an expected count of less than 5
- there are no expected counts of zero,
and the assumptions have therefore been met.
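The expected-count rule and the two assumption checks can be sketched in a few lines of Python. This is a minimal illustration rather than the software's own routine. The marginal totals below are an assumption: they were recovered by summing the expected counts shown above, which works because expected counts preserve the row and column totals of the observed table. The names `expected_counts`, `share_below_5` and `assumptions_met` are made up for this example.

```python
# Marginal totals are an ASSUMPTION: they were recovered by summing the
# expected counts printed above (expected counts keep the table's margins).
row_totals = [220, 416, 382, 429]   # group1 .. group4
col_totals = [968, 479]             # columns [,1] and [,2]


def expected_counts(row_totals, col_totals):
    """Return E[i][j] = row_total_i * column_total_j / grand_total."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(row_totals, col_totals)

# Flatten the table and check the two assumptions stated in the text.
cells = [e for row in expected for e in row]
share_below_5 = sum(e < 5 for e in cells) / len(cells)
assumptions_met = share_below_5 <= 0.20 and min(cells) > 0

for name, row in zip(["group1", "group2", "group3", "group4"], expected):
    print(name, [round(e, 4) for e in row])
print("assumptions met:", assumptions_met)
```

For instance, the first cell is 220 × 968 / 1447, which matches the 147.1735 reported for group1.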
The formula for the test statistic is

$$X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

where, referring to the two-way table:
- $O_{ij}$ is the observed frequency in the $i$th row and the $j$th column
- $E_{ij}$ is the expected frequency of the cell in the $i$th row and the $j$th column
- $r$ is the number of rows
- $c$ is the number of columns
- $X^2$ is random, with $X^2 \sim \chi^2_{df}$ (approximately) under $H_0$, where $df = (r - 1)(c - 1)$.
The formula for the observed test statistic is

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

where, referring to the two-way table:
- $O_{ij}$ is the observed frequency in the $i$th row and the $j$th column
- $r$ is the number of rows
- $c$ is the number of columns
- $E_{ij} = \dfrac{\text{row\_total}_i \times \text{column\_total}_j}{\text{grand\_total}}$ is the expected frequency of the cell in the $i$th row and the $j$th column
- $\text{row\_total}_i$ is the number of observations in the $i$th row
- $\text{column\_total}_j$ is the number of observations in the $j$th column
- $\text{grand\_total}$ is the total number of observations, often denoted $n$.
See if you can use this formula to calculate the test statistic ($\chi^2 = 80.892$) yourself.
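If you would like to check your hand calculation, the formula translates directly into a short function. The sketch below is one possible implementation, not the routine used by any particular package; the function name `chi_squared_statistic` and the small table `toy` are hypothetical. The toy counts are invented purely to show the call and are not the Facebook data. No continuity correction is applied.

```python
def chi_squared_statistic(observed):
    """Pearson's chi-squared statistic and df for a two-way table of counts.

    `observed` is a list of rows, each row a list of observed frequencies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            # Expected frequency: row total * column total / grand total
            e_ij = row_totals[i] * col_totals[j] / grand_total
            statistic += (o_ij - e_ij) ** 2 / e_ij

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


# HYPOTHETICAL 2x3 table, used only to demonstrate the function call.
toy = [[10, 20, 30],
       [20, 20, 20]]
toy_statistic, toy_df = chi_squared_statistic(toy)
print(round(toy_statistic, 4), toy_df)
```

Passing the observed two-way table of Facebook usage by age group to the same function should reproduce the statistic and degrees of freedom reported by the software.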
The p-value can be calculated as $P(X^2 \geq \chi^2)$. As usual, if the p-value is less than $\alpha$ (where $\alpha$ is normally 0.05), we reject $H_0$.
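As a rough check on the reported p-value, the upper-tail probability can be computed with the Python standard library alone. The sketch below relies on a closed form that holds only for 3 degrees of freedom, $P(X^2 \geq x) = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}$, so it is not a general chi-squared routine. The function name `chi2_sf_df3` is made up for this example.

```python
import math


def chi2_sf_df3(x):
    """P(X >= x) for X ~ chi-squared with 3 degrees of freedom.

    Uses the closed form available for df = 3 only:
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


observed_statistic = 80.892   # test statistic reported in the output above
alpha = 0.05

p_value = chi2_sf_df3(observed_statistic)
reject_h0 = p_value < alpha

print(f"p-value = {p_value:.3g}")
print("reject H0:", reject_h0)
```

The result falls below the 2.2e-16 threshold printed by the software, so the decision to reject $H_0$ is the same.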