Chapter 6 Analysis of Variance (ANOVA)

  • Notation: \(y = X\beta + \epsilon\), \(\epsilon \sim N(0, \sigma^2 I)\). Let \(X_1 = 1\) (the \(n \times 1\) vector of ones), \(X_m = X\) and \(X_{m+1} = I\). Suppose \(C(X_1)\subset C(X_2) \subset \cdots \subset C(X_{m-1}) \subset C(X_m)\). Let \(P_j = P_{X_j}\) and \(r_j = rank(X_j)\) for all \(j = 1, \ldots, m+1\); in particular \(r = r_m = rank(X)\) and \(r_{m+1} = n\).

    • Total sum of squares: \(SSTo = \sum_{i=1}^n (y_i - \bar y_{\cdot})^2 = y'(I-P_1)y = \sum_{j=1}^m y'(P_{j+1} - P_j)y\), where the last sum telescopes because \(P_{m+1} = I\)
    • \(SSE = y'(I- P_X)y\)
    • Sequential sums of squares: \(SS(2\mid 1) = y'(P_2 - P_1)y, \ldots, SS(m\mid m-1) = y'(P_m - P_{m-1})y\); the remaining (\(j = m\)) term of SSTo is \(y'(P_{m+1} - P_m)y = SSE\)
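    • \(rank(P_{j+1} - P_{j}) = tr(P_{j+1} - P_j) = tr(P_{j+1}) - tr(P_j) = r_{j+1} - r_j\), because \(P_{j+1} - P_j\) is symmetric and idempotent (\(P_{j+1}P_j = P_jP_{j+1} = P_j\) since \(C(X_j) \subset C(X_{j+1})\))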
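    • Zero cross-products: \((P_{j+1} - P_j)(P_{l+1} - P_{l}) = 0\) for \(j \neq l\); since \(Var(y) = \sigma^2 I\), the quadratic forms \(y'(P_{j+1} - P_j)y\) are therefore mutually independent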
    • because \((\frac{P_{j+1} - P_j}{\sigma^2})(\sigma^2I)\) is idempotent, \(\frac{y'(P_{j+1}-P_j)y}{\sigma^2} \sim \chi_{r_{j+1}-r_j}^2\left(\frac{\beta'X'(P_{j+1}-P_j)X\beta}{2\sigma^2} \right)\) for all \(j = 1, \ldots, m\)
    • Mean squares: \(MS(j+1\mid j) = \frac{SS(j+1\mid j)}{r_{j+1}-r_j}\); for \(j = m\) this is \(MSE = \frac{SSE}{n-r}\)
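The decomposition above can be checked numerically. The block below is a minimal sketch, assuming NumPy and a hypothetical quadratic-regression example with \(m = 3\): \(X_1 = 1\), \(X_2 = [1, x]\), \(X_3 = X = [1, x, x^2]\). The helper names `proj` and `sequential_ss` are hypothetical. Each \(P_j\) is formed as \(X_j X_j^{+}\) (Moore-Penrose pseudoinverse), which is the orthogonal projector onto \(C(X_j)\) even when \(X_j\) is rank deficient; the script then checks the telescoping SSTo identity, the rank/trace identity, and the zero cross-products.

```python
import numpy as np

def proj(A):
    """Orthogonal projector onto C(A); A @ pinv(A) is valid even if A is rank deficient."""
    return A @ np.linalg.pinv(A)

def sequential_ss(y, designs):
    """Sums of squares y'(P_{j+1} - P_j)y for nested designs [X_1, ..., X_m].

    X_{m+1} = I is appended internally, so the last entry is (SSE, n - r).
    Returns the projectors, the ranks r_1..r_{m+1}, and a list of (SS, df) pairs.
    """
    n = len(y)
    Ps = [proj(Xj) for Xj in designs] + [np.eye(n)]
    ranks = [int(np.linalg.matrix_rank(Xj)) for Xj in designs] + [n]
    table = []
    for j in range(len(designs)):  # 0-based j here corresponds to j + 1 in the notes
        D = Ps[j + 1] - Ps[j]
        table.append((float(y @ D @ y), ranks[j + 1] - ranks[j]))
    return Ps, ranks, table

# Hypothetical data for a quadratic regression with m = 3 nested designs.
rng = np.random.default_rng(0)
n = 12
x = np.arange(n, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(size=n)
one = np.ones((n, 1))
designs = [one, np.column_stack([one, x]), np.column_stack([one, x, x**2])]

Ps, ranks, table = sequential_ss(y, designs)
ssto = float(np.sum((y - y.mean()) ** 2))  # y'(I - P_1)y

# SSTo = sum_j y'(P_{j+1} - P_j)y (the sum telescopes because P_{m+1} = I)
print(np.isclose(ssto, sum(ss for ss, _ in table)))

# rank(P_{j+1} - P_j) = tr(P_{j+1} - P_j) = r_{j+1} - r_j
# (explicit tolerance: the zero eigenvalues are only zero up to rounding)
diffs = [Ps[j + 1] - Ps[j] for j in range(len(designs))]
print([int(np.linalg.matrix_rank(D, tol=1e-8)) for D in diffs],
      [round(float(np.trace(D)), 8) for D in diffs])

# Zero cross-products for j != l
print(all(np.allclose(diffs[j] @ diffs[l], 0.0, atol=1e-8)
          for j in range(len(diffs)) for l in range(len(diffs)) if j != l))
```

With \(n = 12\) the degrees of freedom are 1, 1 and 9, so the rank and trace lists should both read 1, 1, 9, and the other two checks should print True.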
  • ANOVA F statistics: for \(j = 1, \ldots, m-1\), \[ F_j = \frac{MS(j+1\mid j)}{MSE} = \frac{y'(P_{j+1} - P_j)y/(r_{j+1} - r_j)}{y'(I-P_X)y/(n-r)} \sim F_{r_{j+1} - r_j, n-r}\left( \frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2}\right) \]

    • \(F_j\) can be used to test \(H_{0j}: \frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2} = 0\) vs. \(H_{Aj}: \frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2} \neq 0\)
    • \(\frac{\beta'X'(P_{j+1} - P_{j})X\beta}{2\sigma^2} = 0\) \(\Leftrightarrow\) \((P_{j+1}-P_j)X\beta = 0\) (because \(\beta'X'(P_{j+1}-P_j)X\beta = \lVert (P_{j+1}-P_j)X\beta \rVert^2\)) \(\Leftrightarrow\) \(P_j E(y) = P_{j+1}E(y)\) \(\Leftrightarrow\) \(P_{j+1}E(y) \in C(X_j)\)
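    • \(C^*_j = P_{j+1} - P_j\) is \(n \times n\) but has rank only \(r_{j+1} - r_j\), so \(C^*_jX\) does not have full row rank and \((P_{j+1}-P_j)X\beta = 0\) is not a testable hypothesis as written. We can restate \(H_{0j}\) as a testable hypothesis by replacing \(C^*_j\) with any matrix \(C_j\) whose \(q = r_{j+1} - r_j\) rows form a basis for the row space of \(C^*_j\): then \(C_jX\beta = 0 \Leftrightarrow C^*_jX\beta = 0\), and \(C_jX\) has full row rank \(q\).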
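The F tests and the full-row-rank restatement can be sketched numerically as well. This is a minimal illustration, assuming NumPy and hypothetical quadratic-regression data with \(m = 3\) (so \(F_1\) and \(F_2\) are computed); the helper `anova_f_tests` is hypothetical. For each \(j\) it computes \(F_j = MS(j+1\mid j)/MSE\) and builds one valid choice of \(C_j\): the eigenvectors of the symmetric matrix \(C^*_j\) with eigenvalue 1, which form an orthonormal basis of its row space. It then checks that \(K = C_jX\) has rank \(q\), and, as a consistency check, evaluates the usual general linear test statistic \((Kb)'[K(X'X)^-K']^{-1}(Kb)/(q \cdot MSE)\) for \(H_{0j}: K\beta = 0\), with \(b\) a least-squares solution.

```python
import numpy as np

def proj(A):
    """Orthogonal projector onto C(A)."""
    return A @ np.linalg.pinv(A)

def anova_f_tests(y, designs):
    """F_j = MS(j+1|j) / MSE for j = 1, ..., m-1, plus the equivalent full-row-rank test.

    designs = [X_1, ..., X_m] with nested column spaces and X_m = X.
    """
    n = len(y)
    X = designs[-1]
    Ps = [proj(Xj) for Xj in designs]
    ranks = [int(np.linalg.matrix_rank(Xj)) for Xj in designs]
    r = ranks[-1]
    mse = float(y @ (np.eye(n) - Ps[-1]) @ y) / (n - r)  # SSE / (n - r)
    XtX_ginv = np.linalg.pinv(X.T @ X)                     # a generalized inverse of X'X
    b = XtX_ginv @ X.T @ y                                  # a least-squares solution
    results = []
    for j in range(len(designs) - 1):  # 0-based j here corresponds to j + 1 in the notes
        D = Ps[j + 1] - Ps[j]          # C*_j = P_{j+1} - P_j
        q = ranks[j + 1] - ranks[j]
        F = float(y @ D @ y) / q / mse
        # C_j: rows are the eigenvectors of the symmetric C*_j with eigenvalue 1
        # (all other eigenvalues are 0), i.e. an orthonormal basis of its row space.
        w, V = np.linalg.eigh((D + D.T) / 2)
        C = V[:, w > 0.5].T            # q x n
        K = C @ X                      # H_0j restated as K beta = 0
        Kb = K @ b
        F_glt = float(Kb @ np.linalg.solve(K @ XtX_ginv @ K.T, Kb)) / q / mse
        results.append({"q": q, "F": F,
                        "rank_K": int(np.linalg.matrix_rank(K)), "F_glt": F_glt})
    return results

# Hypothetical data for a quadratic regression with m = 3 nested designs.
rng = np.random.default_rng(0)
n = 12
x = np.arange(n, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(size=n)
one = np.ones((n, 1))
designs = [one, np.column_stack([one, x]), np.column_stack([one, x, x**2])]

results = anova_f_tests(y, designs)
for res in results:
    print(res)
```

Because the rows of this \(C_j\) are orthonormal and lie in \(C(X)\), \(K(X'X)^-K' = C_jP_XC_j'\) reduces to \(I_q\), which is why `F_glt` should agree with `F`. Any other basis for the row space of \(C^*_j\) describes the same hypothesis. A p-value would come from the upper tail of \(F_{q, n-r}\), for example via `scipy.stats.f.sf(F, q, n - r)` if SciPy is available.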