6.5 Chi-Square Test of Independence
The chi-square test of independence tests whether two categorical variables are associated or are instead independent.¹³ It tests whether the observed joint frequency counts \(O_{ij}\) differ from the expected frequency counts \(E_{ij}\) under the independence model (the model in which the two variables are independent, \(\pi_{ij} = \pi_{i+} \pi_{+j}\)). The null hypothesis is independence, \(H_0: \pi_{ij} = \pi_{i+} \pi_{+j}\), under which \(E_{ij}\) is the count you would expect in cell \(ij\). The test assumes the observations are independent¹⁴ and that all expected cell counts are at least 5.
Choose from two test statistics: the Pearson \(X^2\) (or its continuity-adjusted version) and the deviance \(G\). As \(n \rightarrow \infty\), their sampling distributions approach \(\chi^2(df)\) with degrees of freedom (df) equal to the saturated model df, \(IJ - 1\), minus the independence model df, \((I - 1) + (J - 1)\), which simplifies algebraically to \(df = (I - 1)(J - 1)\).
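Written out, the degrees-of-freedom simplification is

\[df = (IJ - 1) - \left[(I - 1) + (J - 1)\right] = IJ - I - J + 1 = (I - 1)(J - 1).\]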
The Pearson goodness-of-fit statistic is
\[X^2 = \sum_{ij} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}\]
where \(O_{ij}\) is the observed count and \(E_{ij} = n \, p_{i+} \, p_{+j}\) is the expected count: the total sample size times the product of the row and column marginal proportions. The deviance statistic is
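As a minimal sketch of the arithmetic, assuming a made-up \(2 \times 3\) table (the counts below are purely illustrative), the expected counts and the Pearson statistic follow directly from the marginal totals:

```python
import numpy as np

# Hypothetical 2x3 table of observed counts (illustrative values only).
O = np.array([[20, 30, 25],
              [30, 20, 25]])

n = O.sum()
row_tot = O.sum(axis=1, keepdims=True)   # row marginal totals, shape (I, 1)
col_tot = O.sum(axis=0, keepdims=True)   # column marginal totals, shape (1, J)

# Expected counts under independence:
# E_ij = n * p_i+ * p_+j = (row i total) * (column j total) / n
E = row_tot * col_tot / n

X2 = ((O - E) ** 2 / E).sum()                 # Pearson chi-square statistic
df = (O.shape[0] - 1) * (O.shape[1] - 1)      # (I - 1)(J - 1) degrees of freedom
print(f"X2 = {X2:.3f} on df = {df}")
```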
\[G = 2 \sum_{ij} O_{ij} \log \left( \frac{O_{ij}}{E_{ij}} \right)\]
\(X^2\) and \(G\) increase with the disagreement between the observed (saturated model) proportions \(p_{ij}\) and the fitted independence model proportions \(\hat{\pi}_{ij} = p_{i+} \, p_{+j}\).
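In practice both statistics and their large-sample p-values can be computed with scipy.stats.chi2_contingency; passing lambda_="log-likelihood" switches from the Pearson \(X^2\) to the likelihood-ratio statistic \(G\). A brief sketch, reusing the made-up table from the previous example:

```python
import numpy as np
from scipy import stats

# Same hypothetical table as above (illustrative values only).
O = np.array([[20, 30, 25],
              [30, 20, 25]])

# Pearson X^2; correction=False skips the continuity adjustment,
# which scipy applies only when df = 1 (i.e., 2x2 tables).
X2, p_x2, df, E = stats.chi2_contingency(O, correction=False)

# Deviance (likelihood-ratio) statistic G via lambda_="log-likelihood".
G, p_g, _, _ = stats.chi2_contingency(O, correction=False,
                                      lambda_="log-likelihood")

print(f"X2 = {X2:.3f} (p = {p_x2:.3f}), G = {G:.3f} (p = {p_g:.3f}), df = {df}")
```

With a large enough sample the two statistics usually lead to the same conclusion, since both converge to the same \(\chi^2(df)\) reference distribution.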
13. The test is sometimes called the chi-square test for association.
14. Independence of observations usually holds by assumption and/or by construction. It can be violated by, for example, a sample that includes spouses, where one spouse's status (e.g., preference for a vacation destination) may be related to the other spouse's status.