# Chapter 6 Analysis of Variance (ANOVA)

• Notation: $$y = X\beta + \epsilon$$, $$\epsilon \sim N(0, \sigma^2 I)$$. Let $$X_1 = 1$$ (the $$n \times 1$$ vector of ones), $$X_m = X$$ and $$X_{m+1} = I$$. Suppose $$C(X_1)\subset C(X_2) \subset \cdots \subset C(X_{m-1}) \subset C(X_m)$$. Let $$P_j = P_{X_j}$$ and $$r_j = rank(X_j), \forall j = 1, \ldots, m+1$$
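
As a concrete illustration (a hypothetical example, not taken from these notes), consider polynomial regression on a single covariate vector $$x = (x_1, \ldots, x_n)'$$ with $$m = 3$$, where $$x^2$$ denotes the vector of elementwise squares:

$$X_1 = \mathbf{1}, \quad X_2 = [\mathbf{1}, x], \quad X_3 = [\mathbf{1}, x, x^2] = X, \quad X_4 = I$$

Here $$C(X_1) \subset C(X_2) \subset C(X_3)$$ because each matrix adds one column to the previous one, and $$(r_1, r_2, r_3, r_4) = (1, 2, 3, n)$$ provided the $$x_i$$ take at least three distinct values.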

• Total sum of squares: $$SSTo = \sum_{i=1}^n (y_i - \bar y_.)^2 = y'(I-P_1)y = \sum_{j=1}^m y'(P_{j+1} - P_j)y$$
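• Error sum of squares: $$SSE = y'(I- P_X)y = y'(P_{m+1} - P_m)y$$
• Sequential sums of squares: $$SS(2|1) = y'(P_2 - P_1)y, \ldots, SS(m|m-1) = y'(P_m - P_{m-1})y$$; in general $$SS(j+1\mid j) = y'(P_{j+1} - P_j)y$$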
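• $$P_{j+1} - P_j$$ is symmetric and idempotent (because $$P_{j+1}P_j = P_jP_{j+1} = P_j$$), so $$rank(P_{j+1} - P_{j}) = tr(P_{j+1}) - tr(P_j) = r_{j+1} - r_j$$
• Zero cross-products: $$(P_{j+1} - P_j)(P_{l+1} - P_{l}) = 0$$ for $$j \neq l$$
• Because $$y \sim N(X\beta, \sigma^2 I)$$ and $$(\frac{P_{j+1} - P_j}{\sigma^2})(\sigma^2I)$$ is idempotent, $$\frac{y'(P_{j+1}-P_j)y}{\sigma^2} \sim \chi_{r_{j+1}-r_j}^2\left(\frac{\beta'X'(P_{j+1}-P_j)X\beta}{2\sigma^2} \right)$$ for all $$j = 1, \ldots, m$$. For $$j = m$$ the noncentrality parameter is zero because $$(I - P_X)X = 0$$.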
• Mean squares: $$MS(j+1\mid j) = \frac{SS(j+1\mid j)}{r_{j+1}-r_j}$$
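• ANOVA F statistics: for $$j = 1, \ldots, m-1$$, $$F_j = \frac{MS(j+1\mid j)}{MSE} = \frac{y'(P_{j+1} - P_j)y/(r_{j+1} - r_j)}{y'(I-P_X)y/(n-r)} \sim F_{r_{j+1} - r_j,\, n-r}\left( \frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2}\right)$$, where $$r = r_m = rank(X)$$ and $$MSE = SSE/(n-r)$$. The numerator and denominator are independent because of the zero cross-products.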
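
The decomposition in the list above can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a hypothetical quadratic-regression design (intercept, linear and quadratic columns, so $$m = 3$$) and simulated data. The helper `proj`, the coefficient values and the random seed are illustrative choices, not part of these notes.

```python
import numpy as np

def proj(X):
    """Orthogonal projection matrix onto the column space C(X)."""
    return X @ np.linalg.pinv(X)

rng = np.random.default_rng(0)
n = 12
x = np.linspace(-1.0, 1.0, n)

# Nested design matrices X_1, X_2, X_3 = X, plus X_{m+1} = I (here m = 3).
X1 = np.ones((n, 1))
X2 = np.column_stack([np.ones(n), x])
X3 = np.column_stack([np.ones(n), x, x**2])
designs = [X1, X2, X3, np.eye(n)]

P = [proj(X) for X in designs]
r = [int(np.linalg.matrix_rank(X)) for X in designs]

# Simulated response; beta and the noise scale are hypothetical values.
beta = np.array([1.0, 2.0, -1.5])
y = X3 @ beta + rng.normal(scale=0.5, size=n)

# D[j] holds P_{j+1} - P_j; the last difference is I - P_X.
D = [P[j + 1] - P[j] for j in range(len(P) - 1)]
ss = [float(y @ Dj @ y) for Dj in D]               # SS(2|1), SS(3|2), SSE
df = [r[j + 1] - r[j] for j in range(len(r) - 1)]  # r_{j+1} - r_j

ssto = float(np.sum((y - y.mean()) ** 2))          # total sum of squares
mse = ss[-1] / df[-1]                              # SSE / (n - r)
F = [(ss[j] / df[j]) / mse for j in range(len(ss) - 1)]

print("degrees of freedom:", df)
print("sums of squares add up to SSTo:", np.isclose(sum(ss), ssto))
print("F statistics:", F)
```

Using `X @ pinv(X)` for the projection keeps the sketch valid even when a design matrix is rank deficient, at the cost of forming $$n \times n$$ matrices explicitly, which is only reasonable for small $$n$$.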

• $$F_j$$ can be used to test $$H_{0j}: \frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2} = 0$$ vs. $$H_{Aj}: \frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2} \neq 0$$
• $$\frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2} = 0$$ $$\Leftrightarrow$$ $$(P_{j+1}-P_j)X\beta = 0$$ $$\Leftrightarrow$$ $$P_j E(y) = P_{j+1}E(y)$$ $$\Leftrightarrow$$ $$P_{j+1}E(y) \in C(X_j)$$
• $$C^*_j = (P_{j+1} - P_j)X$$ does not have full row rank (it has $$n$$ rows but rank $$r_{j+1} - r_j$$), so $$C^*_j\beta = 0$$ is not a testable hypothesis as written. We can write $$H_{0j}$$ as a testable hypothesis $$C_j\beta = 0$$ by replacing $$C^*_j$$ with any matrix $$C_j$$ whose $$q = r_{j+1} - r_j$$ rows form a basis for the row space of $$C^*_j$$.
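
One way to obtain such a $$C_j$$ is sketched below in Python with NumPy. It rests on several assumptions that are not in these notes: $$C^*_j$$ is taken to be the matrix $$(P_{j+1} - P_j)X$$ that multiplies $$\beta$$ in the hypothesis, the basis is read off a singular value decomposition (any other basis of the row space would do), and the result is compared against the usual general-linear-hypothesis sum of squares $$(C_j b)'[C_j (X'X)^{-} C_j']^{-1}(C_j b)$$ with $$b = (X'X)^{-}X'y$$, a formula this section does not state. The design, coefficients and seed are hypothetical.

```python
import numpy as np

def proj(X):
    """Orthogonal projection matrix onto the column space C(X)."""
    return X @ np.linalg.pinv(X)

rng = np.random.default_rng(1)
n = 12
x = np.linspace(-1.0, 1.0, n)

# Hypothetical nested pair: X_j (linear) inside X_{j+1} = X (quadratic).
Xj = np.column_stack([np.ones(n), x])
X = np.column_stack([np.ones(n), x, x**2])
y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(scale=0.5, size=n)

D = proj(X) - proj(Xj)   # P_{j+1} - P_j
C_star = D @ X           # n rows, but rank only q
q = int(np.linalg.matrix_rank(X) - np.linalg.matrix_rank(Xj))

# Right singular vectors for the q nonzero singular values give an
# orthonormal basis for the row space of C_star.
_, s, Vt = np.linalg.svd(C_star)
C = Vt[:q]               # q x p, full row rank

# Sum of squares for H0: C beta = 0 (assumed general-linear-hypothesis form).
XtX_inv = np.linalg.pinv(X.T @ X)
b = XtX_inv @ X.T @ y
Cb = C @ b
ss_glh = float(Cb @ np.linalg.solve(C @ XtX_inv @ C.T, Cb))

# Sequential ANOVA sum of squares SS(j+1 | j) for comparison.
ss_anova = float(y @ D @ y)

print("C_j =", C)
print("sums of squares agree:", np.isclose(ss_glh, ss_anova))
```

In this design the two leading columns of $$X$$ lie in $$C(X_j)$$, so the row space of $$C^*_j$$ is spanned by $$(0, 0, 1)$$ and the hypothesis reduces to a zero quadratic coefficient.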