2.1 Chi-Square Test

These notes rely on PSU STAT 500, Wikipedia, and Disha M.

The chi-square test compares the observed frequency counts \(O\) of a categorical variable with their expected values \(E\). Under \(H_0\), the test statistic \(X^2 = \sum (O - E)^2 / E\) is approximately distributed \(\chi^2\). \(H_0\) is that the observed counts are consistent with the expected counts (informally, \(O = E\)), and \(H_a\) is that at least one pair of frequency counts differs. The \(\chi^2\) approximation relies on the central limit theorem, so the test is valid for independent observations and samples large enough for the cell counts to be approximately normally distributed, a condition usually checked by requiring an expected count of at least 5 in each cell. There are small variations in the chi-square test for its various applications.

  • The chi-square goodness-of-fit test tests whether the observed frequency counts \(O_j\) of the \(j \in \{1, 2, \ldots, k\}\) levels of a single categorical variable differ from the expected frequency counts \(E_j\). \(H_0\) is \(O_j = E_j\) for every level \(j\).

  • The chi-square independence test tests whether the observed joint frequency counts \(O_{ij}\) of the \(i \in \{1, 2, \ldots, I\}\) levels of categorical variable \(Y\) and the \(j \in \{1, 2, \ldots, J\}\) levels of categorical variable \(Z\) differ from the expected frequency counts \(E_{ij}\) under the independence model \(\pi_{ij} = \pi_{i+} \pi_{+j}\), in which each joint probability is the product of the corresponding marginal probabilities. \(H_0\) is \(O_{ij} = E_{ij}\) for every cell.

  • The chi-square homogeneity test tests whether frequency counts of the \(R\) levels of a categorical variable are distributed identically across \(C\) different populations.
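
The arithmetic behind these variations can be sketched in a few lines of Python. This is a minimal illustration, not a full test: it computes only the statistic \(X^2\) and the degrees of freedom, and leaves the p-value to a \(\chi^2\) table or a statistics library. The function names and all counts below are hypothetical, made up for the example. The homogeneity test uses the same expected-count calculation as the independence test, so the second example covers both.

```python
def chi_square_statistic(observed, expected):
    """Return X^2 = sum((O - E)^2 / E) over paired observed/expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def independence_expected(table):
    """Expected counts under independence: E_ij = (row i total * column j total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Goodness of fit: hypothetical counts from 60 rolls of a die assumed fair.
observed = [8, 12, 9, 11, 13, 7]
expected = [10] * 6                      # 60 rolls / 6 faces
gof_stat = chi_square_statistic(observed, expected)
gof_df = len(observed) - 1               # k - 1 degrees of freedom

# Independence (or homogeneity): hypothetical 2 x 2 table of joint counts.
table = [[20, 30],
         [30, 20]]
exp_table = independence_expected(table)
ind_stat = chi_square_statistic(
    [o for row in table for o in row],       # flatten observed cells
    [e for row in exp_table for e in row],   # flatten expected cells
)
ind_df = (len(table) - 1) * (len(table[0]) - 1)  # (I - 1)(J - 1)

print(f"goodness of fit: X^2 = {gof_stat:.2f}, df = {gof_df}")
print(f"independence:    X^2 = {ind_stat:.2f}, df = {ind_df}")
```

Every expected count here is at least 5, so the large-sample condition described above holds for both examples. In practice, library routines such as `scipy.stats.chisquare` and `scipy.stats.chi2_contingency` perform comparable calculations and also return a p-value.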