Analysis Of Variance (ANOVA)
The formulas that we constructed to estimate the parameters and the residual sum of squares for a simple linear regression model are used in the Analysis of Variance (ANOVA) table to investigate the variability explained by the model and to test the null hypothesis that the parameters in the model, that is, the regression coefficients, are zero.
Analysis Of Variance (ANOVA) Table Construction
The sum of squares formulas and their relationship provide the elements of the ANOVA table.
Component | Degrees of freedom (df) | Sum of squares (SS) | Mean squares (MS) | F value |
--- | --- | --- | --- | --- |
Model | \(p-1\) | \(MSS\) | \(\frac{MSS}{p-1}\) | \(\frac{\frac{MSS}{p-1}}{\frac{RSS}{n-p}}\) |
Residual | \(n-p\) | \(RSS\) | \(\frac{RSS}{n-p}\) | |
Total | \(n-1\) | \(TSS\) | | |
where \(p\) is the total number of estimated regression coefficients.
Recall that \[TSS = MSS + RSS.\] What we really want to know is whether the amount of variation explained by our model (MSS) is greater than the amount of unexplained variation (RSS).
If \(MSS > RSS\) then our model is doing a good job.
If \(MSS < RSS\) then our model is not doing a good job.
This is the general idea behind the ANOVA table. However, we need to consider how much larger \(MSS\) should be in comparison to \(RSS\), and how we can quantify 'good'. Both of these questions will be addressed in later weeks.
The F value is roughly the ratio \(\frac{MSS}{RSS}\). Therefore higher values imply \(MSS > RSS\). Notice however that \(MSS\) and \(RSS\) are each scaled by their degrees of freedom. The degrees of freedom refers to the number of values involved in a calculation that have the freedom to vary. In other words, the degrees of freedom can be defined as the total number of observations minus the number of independent constraints imposed on the observations. Therefore both \(MSS\) and \(RSS\) are scaled, or penalised, according to \(p\), the number of estimated regression coefficients in the model.
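As a small numerical illustration of how these quantities combine, the sketch below forms the mean squares and the F value from assumed values of \(MSS\), \(RSS\), \(n\) and \(p\); the numbers are placeholders chosen purely for illustration, not taken from any data set in these notes.

```python
# Minimal sketch: combining ANOVA components into mean squares and an F value.
# MSS, RSS, n and p below are illustrative placeholders, not real data.
n, p = 30, 3              # n observations, p estimated regression coefficients
MSS, RSS = 120.0, 45.0    # model and residual sums of squares (MSS + RSS = TSS)

MS_model = MSS / (p - 1)  # mean square for the model
MS_resid = RSS / (n - p)  # mean square for the residuals
F = MS_model / MS_resid   # F value, as in the table above

print(MS_model, MS_resid, F)
```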
ANOVA For The Simple Linear Regression
The F statistic, \(MS_{model}/MS_{residual}\), where the subscripts indicate that these are the respective mean squares of the model and the residuals, provides a test statistic that allows us to test whether there is any evidence that the model parameters are not all zero.
Consider a simple linear regression with response variable \(y\) and explanatory variable \(x\) such that
\[y_{i}=\alpha + \beta x_{i}+\epsilon_i \quad \quad \mbox{for } i=1,\ldots n\] Then we want to test the null hypothesis
\[\text{H}_0: \beta = 0\]
which will be tested against the alternative that the parameter is not zero; in other words, \(\beta \neq 0\).
We have already derived expressions for \(RSS\) and \(TSS\) under this model, that is
\[\begin{aligned} RSS &= S_{yy} - \frac{(S_{xy})^2}{S_{xx}} \\ TSS &= \sum_i(y_i-\bar{y})^2 \\ &= S_{yy} \\ MSS &= TSS-RSS\\ &= S_{yy} - \left(S_{yy} - \frac{(S_{xy})^2}{S_{xx}}\right)\\ &=\frac{(S_{xy})^2}{S_{xx}} \end{aligned}\]
Therefore, the ANOVA table for a simple linear regression model, with 2 estimated regression coefficients, \(\alpha\) and \(\beta\), is given by:
Component | Degrees of freedom (df) | Sum of squares (SS) | Mean squares (MS) | F value |
--- | --- | --- | --- | --- |
Model | 1 | \((S_{xy})^2/S_{xx}\) | \(\frac{(S_{xy})^2/S_{xx}}{1}\) | \(\frac{\frac{(S_{xy})^2/S_{xx}}{1}}{\frac{S_{yy}-(S_{xy})^2/S_{xx}}{n-2}}\) |
Residual | \(n-2\) | \(S_{yy}-(S_{xy})^2/S_{xx}\) | \(\frac{S_{yy}-(S_{xy})^2/S_{xx}}{n-2}\) | |
Total | \(n-1\) | \(\sum_i(y_i-\bar{y})^2 = S_{yy}\) | | |
The total (corrected) sum of squares has \(n - 1\) degrees of freedom because the intercept term has been implicitly fitted. Notice that we are in the same situation as when comparing two nested models:
Model 0: \(E(y_i) = \alpha\)
Model 1: \(E(y_i) = \alpha+\beta x_{i}\)
and we want to know if we should prefer our fitted model, Model 1, to the null model, Model 0.
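A minimal Python sketch of this comparison, using a small made-up data set, is given below: it computes \(S_{xx}\), \(S_{yy}\) and \(S_{xy}\), forms \(MSS\), \(RSS\) and the F value exactly as in the table above, and converts the F value to a p-value using the F distribution with \((1, n-2)\) degrees of freedom, the usual reference distribution for this test.

```python
# A minimal sketch of the simple linear regression ANOVA table, using
# made-up x and y values purely for illustration.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.3])
n = len(y)

# Corrected sums of squares and cross-products
Sxx = np.sum((x - x.mean()) ** 2)
Syy = np.sum((y - y.mean()) ** 2)              # TSS
Sxy = np.sum((x - x.mean()) * (y - y.mean()))

MSS = Sxy**2 / Sxx                             # model sum of squares
RSS = Syy - Sxy**2 / Sxx                       # residual sum of squares
F = (MSS / 1) / (RSS / (n - 2))                # F value with (1, n - 2) df

# Large F favours Model 1 (E(y_i) = alpha + beta*x_i) over Model 0 (E(y_i) = alpha);
# the p-value uses the F distribution with (1, n - 2) degrees of freedom.
p_value = stats.f.sf(F, 1, n - 2)
print(MSS, RSS, F, p_value)
```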
ANOVA For The Multiple Linear Regression
Consider a multiple linear regression with response variable \(y\) and \(p\) explanatory variables \(x_1, \ldots, x_p\) such that \[y_{i}=\alpha + \sum_{j=1}^{p}{\beta_jx_{ij}}+\epsilon_i \quad \quad \mbox{for } i=1,\ldots n.\] Then we are testing the null hypothesis
\(\text{H}_0\): all \(p\) parameters are zero, that is \(\beta_1 = \cdots = \beta_p = 0\),
which will be tested against the alternative that the parameters are not all zero; in other words, at least one of \(\beta_1, \ldots,\beta_p\) does not equal zero.
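As before, the overall test can be illustrated with a short Python sketch on simulated data; note that \(p\) here counts the explanatory variables, so the fitted model has \(p+1\) estimated coefficients and, following the general table above, the F value has \((p, n-p-1)\) degrees of freedom.

```python
# A minimal sketch of the overall F test for multiple linear regression,
# using simulated data for illustration. Here p is the number of explanatory
# variables, so the fitted model has p + 1 coefficients (intercept plus p slopes).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 40, 3
X = rng.normal(size=(n, p))                        # explanatory variables x_1, ..., x_p
y = 1.0 + X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.3, size=n)

# Fit y_i = alpha + sum_j beta_j x_ij + eps_i by least squares
design = np.column_stack([np.ones(n), X])          # add the intercept column
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

fitted = design @ coef
RSS = np.sum((y - fitted) ** 2)                    # residual sum of squares
TSS = np.sum((y - y.mean()) ** 2)                  # total (corrected) sum of squares
MSS = TSS - RSS                                    # model sum of squares

# F value with (p, n - p - 1) degrees of freedom for H0: beta_1 = ... = beta_p = 0
F = (MSS / p) / (RSS / (n - p - 1))
p_value = stats.f.sf(F, p, n - p - 1)
print(F, p_value)
```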