## 2.1 Chi-Square Test

These notes rely on PSU STAT 500, Wikipedia, and Disha M.

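The chi-square test compares the observed frequency counts $$O$$ of a categorical variable with their expected values $$E$$ under the null hypothesis. The test statistic $$X^2 = \sum (O - E)^2 / E$$ is approximately $$\chi^2$$ distributed when $$H_0$$ is true, with degrees of freedom that depend on the application. $$H_0$$ states that the population frequencies equal the expected frequencies (loosely, $$O = E$$), and $$H_a$$ states that at least one pair of frequency counts differs. The $$\chi^2$$ approximation relies on the central limit theorem, so the test is valid for independent observations and samples large enough that the cell counts are approximately normally distributed, a condition typically checked by requiring an expected count of at least 5 in each cell. There are small variations in the chi-square test for its various applications.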

• The chi-square goodness-of-fit test tests whether observed frequency counts $$O_j$$ of the $$j \in \{1, 2, \ldots, k\}$$ levels of a single categorical variable differ from expected frequency counts $$E_j$$. $$H_0$$ is $$O_j = E_j$$ for every level $$j$$.

• The chi-square independence test tests whether observed joint frequency counts $$O_{ij}$$ of the $$i \in \{1, 2, \ldots, I\}$$ levels of categorical variable $$Y$$ and the $$j \in \{1, 2, \ldots, J\}$$ levels of categorical variable $$Z$$ differ from expected frequency counts $$E_{ij}$$ under the independence model $$\pi_{ij} = \pi_{i+} \pi_{+j}$$, in which each joint probability is the product of the corresponding marginal probabilities. $$H_0$$ is $$O_{ij} = E_{ij}$$ for every cell.

• The chi-square homogeneity test tests whether frequency counts of the $$R$$ levels of a categorical variable are distributed identically across $$C$$ different populations.
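
The statistic and the expected counts described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the counts are made up, the helper names `chi_square_statistic` and `expected_under_independence` are hypothetical, and the p-value formulas are closed forms that hold only for 1 and 2 degrees of freedom. It assumes the usual degrees of freedom: $$k - 1$$ for a goodness-of-fit test with fully specified expected proportions, and $$(I - 1)(J - 1)$$ for an independence test.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson statistic X^2 = sum((O - E)^2 / E) over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_under_independence(table):
    """Expected counts E_ij = (row i total) * (column j total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Goodness of fit: made-up counts for k = 3 levels, H0 of equal proportions.
observed_gof = [20, 30, 40]
expected_gof = [sum(observed_gof) / 3] * 3  # 30 per level, so every E_j >= 5
x2_gof = chi_square_statistic(observed_gof, expected_gof)
df_gof = len(observed_gof) - 1  # k - 1
p_gof = math.exp(-x2_gof / 2)  # chi-square upper tail, valid only for df = 2

# Independence: made-up 2 x 2 table of joint counts O_ij.
observed_ind = [[30, 20], [20, 30]]
expected_ind = expected_under_independence(observed_ind)
x2_ind = chi_square_statistic(
    [o for row in observed_ind for o in row],
    [e for row in expected_ind for e in row],
)
df_ind = (len(observed_ind) - 1) * (len(observed_ind[0]) - 1)  # (I - 1)(J - 1)
p_ind = math.erfc(math.sqrt(x2_ind / 2))  # chi-square upper tail, valid only for df = 1

print(f"goodness of fit: X^2 = {x2_gof:.3f}, df = {df_gof}, p = {p_gof:.4f}")
print(f"independence:    X^2 = {x2_ind:.3f}, df = {df_ind}, p = {p_ind:.4f}")
```

The homogeneity test uses the same expected-count calculation as the independence test, with one dimension of the table indexing the populations. For other degrees of freedom, a statistics library is the safer choice: SciPy provides `scipy.stats.chisquare` and `scipy.stats.chi2_contingency`. Note that `chi2_contingency` applies a continuity correction to 2 x 2 tables by default, so its result would differ from this uncorrected sketch.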