Chapter 6 Analysis of Variance (ANOVA)

Notation: $y = X\beta + \epsilon$, $\epsilon \sim N(0, \sigma^2 I)$. Let $X_1 = 1$ (the $n$-vector of ones), $X_m = X$ and $X_{m+1} = I$. Suppose $C(X_1) \subset C(X_2) \subset \cdots \subset C(X_{m-1}) \subset C(X_m)$. Let $P_j = P_{X_j}$ and $r_j = rank(X_j), \forall j = 1, \ldots, m+1$.
- Total sum of squares: $SSTo = \sum_{i=1}^n (y_i - \bar y_.)^2 = y'(I-P_1)y = \sum_{j=1}^m y'(P_{j+1} - P_j)y$
- $SSE = y'(I- P_X)y$
- Sums of squares: $SS(2|1) = y'(P_2 - P_1)y, \ldots, SS(m|m-1) = y'(P_m - P_{m-1})y$; the final term $y'(P_{m+1} - P_m)y$ is $SSE$
- $rank(P_{j+1} - P_{j}) = tr(P_{j+1}) - tr(P_j) = r_{j+1} - r_j$, since $P_{j+1} - P_j$ is symmetric and idempotent, so its rank equals its trace
- zero cross-products: $(P_{j+1} - P_j)(P_{l+1} - P_{l}) = 0$ for $j \neq l$
- because $(\frac{P_{j+1} - P_j}{\sigma^2})(\sigma^2I)$ is idempotent, $\frac{y'(P_{j+1}-P_j)y}{\sigma^2} \sim \chi_{r_{j+1}-r_j}^2\left(\frac{\beta'X'(P_{j+1}-P_j)X\beta}{2\sigma^2} \right)$ for all $j = 1, \ldots, m$
- Mean squares: $MS(j+1\mid j) = \frac{SS(j+1\mid j)}{r_{j+1}-r_j}$
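
The decomposition above can be checked numerically. The following is a minimal sketch, assuming a hypothetical nested sequence (intercept, linear, quadratic, identity, so $m = 3$) and simulated data; the helper `proj`, the design, and the seed are illustrative choices, not part of the notes.

```python
import numpy as np

def proj(X, tol=1e-10):
    """Orthogonal projection matrix onto the column space C(X)."""
    # Build the projector from an orthonormal basis of C(X) obtained by SVD.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    U = U[:, s > tol * s.max()]
    return U @ U.T

rng = np.random.default_rng(0)
n = 12
x = np.linspace(-1.0, 1.0, n)

# Hypothetical nested design matrices with C(X1) in C(X2) in C(X3).
X1 = np.ones((n, 1))                # X_1 = 1
X2 = np.column_stack([X1, x])       # adds a linear term
X3 = np.column_stack([X2, x**2])    # X_m = X with m = 3
X4 = np.eye(n)                      # X_{m+1} = I

P = [proj(Xj) for Xj in (X1, X2, X3, X4)]
r = [int(round(np.trace(Pj))) for Pj in P]   # rank = trace for a projector

y = rng.normal(size=n)              # simulated response

# D[j] = P_{j+2} - P_{j+1} in the notes' 1-based indexing.
D = [P[j + 1] - P[j] for j in range(3)]
ss = [float(y @ Dj @ y) for Dj in D]         # SS(2|1), SS(3|2), SSE
ssto = float(np.sum((y - y.mean()) ** 2))    # total sum of squares

print("ranks r_j:", r)
print("sum of pieces:", sum(ss), "SSTo:", ssto)
print("cross-product is zero:", np.allclose(D[0] @ D[1], 0, atol=1e-10))
```

Building the projector from an orthonormal basis rather than from $X(X'X)^{-}X'$ directly is only a numerical convenience; both give the same matrix $P_X$.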
ANOVA F statistics: for $j = 1, \ldots, m-1$, $F_j = \frac{MS(j+1\mid j)}{MSE} = \frac{y'(P_{j+1} - P_j)y/(r_{j+1} - r_j)}{y'(I-P_X)y/(n-r)} \sim F_{r_{j+1} - r_j, n-r}\left( \frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2}\right)$, where $r = r_m = rank(X)$
- $F_j$ can be used to test $H_{0j}: \frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2} = 0$ vs. $H_{Aj}: \frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2} \neq 0$
- $\frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2} = 0$ $\Leftrightarrow$ $(P_{j+1}-P_j)X\beta = 0$ $\Leftrightarrow$ $P_j E(y) = P_{j+1}E(y)$ $\Leftrightarrow$ $P_{j+1}E(y) \in C(X_j)$
- $C^*_j = (P_{j+1} - P_j)X$ has $n$ rows but rank only $r_{j+1} - r_j$, so it does not have full row rank and $C^*_j\beta = (P_{j+1}-P_j)X\beta = 0$ is not in the form of a testable hypothesis. We can write $H_{0j}$ as a testable hypothesis $C_j\beta = 0$ by replacing $C^*_j$ with any matrix $C_j$ whose $q = r_{j+1} - r_j$ rows form a basis for the row space of $C^*_j$.
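
As an illustration of the last point, the sketch below computes each $F_j$ from the projection matrices and again from a full-row-rank $C_j$ using the usual quadratic form $\hat\beta' C_j'[C_j (X'X)^{-} C_j']^{-1} C_j \hat\beta / (q \cdot MSE)$. It assumes a hypothetical polynomial design ($m = 3$) with simulated data, and takes $C_j$ from the SVD of $(P_{j+1} - P_j)X$; this is one convenient choice of basis among many, and `proj` is an illustrative helper.

```python
import numpy as np

def proj(X, tol=1e-10):
    """Orthogonal projection matrix onto the column space C(X)."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    U = U[:, s > tol * s.max()]
    return U @ U.T

rng = np.random.default_rng(1)
n = 12
x = np.linspace(-1.0, 1.0, n)

# Hypothetical nested designs: intercept, + linear, + quadratic (= X).
X1 = np.ones((n, 1))
X2 = np.column_stack([X1, x])
X = np.column_stack([X2, x**2])
P = [proj(X1), proj(X2), proj(X), np.eye(n)]
r = [int(round(np.trace(Pj))) for Pj in P]

# Simulated response (assumption: the true mean is linear in x).
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

mse = float(y @ (P[3] - P[2]) @ y) / (n - r[2])   # SSE / (n - r)
beta_hat = np.linalg.pinv(X) @ y                  # least squares solution
XtX_ginv = np.linalg.pinv(X.T @ X)                # a generalized inverse of X'X

F_proj, F_glh = [], []
for j in range(2):                 # j = 1, ..., m-1 in the notes' indexing
    D = P[j + 1] - P[j]
    q = r[j + 1] - r[j]
    # F statistic from the projection form y'(P_{j+1} - P_j)y.
    F_proj.append(float(y @ D @ y) / q / mse)

    # C*_j = (P_{j+1} - P_j) X has n rows but rank q.
    C_star = D @ X
    _, _, Vt = np.linalg.svd(C_star)
    C = Vt[:q]                     # q rows spanning the row space of C*_j
    Cb = C @ beta_hat
    ss = float(Cb @ np.linalg.solve(C @ XtX_ginv @ C.T, Cb))
    F_glh.append(ss / q / mse)

print("F from projections:", F_proj)
print("F from C_j beta = 0:", F_glh)
```

The two lists agree up to rounding, which reflects that the choice of basis $C_j$ does not change the test.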