2.6 ANOVA

The variance of the error, \(\sigma^2\), plays a fundamental role in the inference for the model coefficients and in prediction. In this section we will see how the variance of \(Y\) is decomposed into two parts, one corresponding to the regression and the other to the error. This decomposition is called the ANalysis Of VAriance (ANOVA).

An important fact to highlight prior to introducing the ANOVA decomposition is that \(\bar{Y} = \bar{\hat{Y}}\).³⁸ The ANOVA decomposition considers the following measures of variation related to the response:

  • \(\text{SST} := \sum_{i=1}^n (Y_i - \bar{Y})^2\), the Total Sum of Squares. This is the total variation of \(Y_1, \ldots, Y_n\), since \(\text{SST} = n s_y^2\), where \(s_y^2\) is the sample variance of \(Y_1, \ldots, Y_n\).
  • \(\text{SSR} := \sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2\), the Regression Sum of Squares. This is the variation explained by the regression plane, that is, the variation from \(\bar{Y}\) that is explained by the estimated conditional mean \(\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_{i1} + \cdots + \hat\beta_p X_{ip}\). Also, \(\text{SSR} = n s_{\hat{y}}^2\), where \(s_{\hat{y}}^2\) is the sample variance of \(\hat{Y}_1, \ldots, \hat{Y}_n\).
  • \(\text{SSE} := \sum_{i=1}^n (Y_i - \hat{Y}_i)^2\), the Sum of Squared Errors.³⁹ This is the variation around the conditional mean. Recall that \(\text{SSE} = \sum_{i=1}^n \hat\varepsilon_i^2 = (n - p - 1)\hat\sigma^2\), where \(\hat\sigma^2\) is the rescaled sample variance of \(\hat\varepsilon_1, \ldots, \hat\varepsilon_n\).

The ANOVA decomposition states that:

\[
\underbrace{\text{SST}}_{\text{Variation of } Y_i\text{'s}} = \underbrace{\text{SSR}}_{\text{Variation of } \hat{Y}_i\text{'s}} + \underbrace{\text{SSE}}_{\text{Variation of } \hat\varepsilon_i\text{'s}} \tag{2.20}
\]

or, equivalently (dividing by \(n\) in (2.20)),

\[
\underbrace{s_y^2}_{\text{Variance of } Y_i\text{'s}} = \underbrace{s_{\hat{y}}^2}_{\text{Variance of } \hat{Y}_i\text{'s}} + \underbrace{\frac{n - p - 1}{n}\,\hat\sigma^2}_{\text{Variance of } \hat\varepsilon_i\text{'s}}.
\]
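As a quick numerical check, the decomposition in (2.20) can be verified directly from a fitted `lm`. The following sketch uses R's built-in `mtcars` dataset and an arbitrary choice of predictors for illustration (the `wine` data of the case study would work analogously):

```r
# Verify the ANOVA decomposition SST = SSR + SSE on a fitted linear model
# (built-in mtcars dataset and predictors chosen for illustration)
mod <- lm(mpg ~ cyl + hp + wt, data = mtcars)

Y <- mtcars$mpg
Yhat <- fitted(mod)

SST <- sum((Y - mean(Y))^2)
SSR <- sum((Yhat - mean(Y))^2)
SSE <- sum(residuals(mod)^2)

# SST equals SSR + SSE, up to numerical error
all.equal(SST, SSR + SSE)  # TRUE

# The fact underpinning the decomposition: the mean of the fitted values
# equals the mean of the responses
all.equal(mean(Yhat), mean(Y))  # TRUE
```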

The graphical interpretation of (2.20) when \(p = 1\) is shown in Figure 2.16. Figure 2.17 dynamically shows how the ANOVA decomposition places more weight on SSR or SSE according to \(\hat\sigma^2\) (which is driven by the value of \(\sigma^2\)).

The ANOVA table summarizes the decomposition of the variance:

|            | Degrees of freedom | Sum of squares | Mean squares | \(F\)-value | \(p\)-value |
|------------|--------------------|----------------|--------------|-------------|-------------|
| Predictors | \(p\)              | SSR            | \(\frac{\text{SSR}}{p}\) | \(\frac{\text{SSR}/p}{\text{SSE}/(n-p-1)}\) | \(p\)-value |
| Residuals  | \(n - p - 1\)      | SSE            | \(\frac{\text{SSE}}{n-p-1}\) | | |

The F-value of the ANOVA table is the value of the \(F\)-statistic \(\frac{\text{SSR}/p}{\text{SSE}/(n-p-1)}\). This statistic is employed to test

\[
H_0: \beta_1 = \cdots = \beta_p = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0 \text{ for some } j \geq 1,
\]

that is, the hypothesis of no linear dependence of \(Y\) on \(X_1, \ldots, X_p\).⁴⁰ This is the so-called F-test and, if \(H_0\) is rejected, it allows us to conclude that at least one \(\beta_j\) is significantly different from zero.⁴¹ It happens that

\[
F = \frac{\text{SSR}/p}{\text{SSE}/(n-p-1)} \stackrel{H_0}{\sim} F_{p,\,n-p-1},
\]

where \(F_{p,n-p-1}\) represents Snedecor's \(F\) distribution with \(p\) and \(n-p-1\) degrees of freedom. If \(H_0\) is true, then \(F\) is expected to be small, since SSR will be close to zero.⁴² The F-test rejects at significance level \(\alpha\) for large values of the \(F\)-statistic, precisely for those above the \(\alpha\)-upper quantile of the \(F_{p,n-p-1}\) distribution, denoted by \(F_{p,n-p-1;\alpha}\).⁴³ That is, \(H_0\) is rejected if \(F > F_{p,n-p-1;\alpha}\).
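The \(F\)-statistic and its \(p\)-value can be computed by hand from SSR and SSE and checked against `summary`. A sketch using the built-in `mtcars` dataset (dataset and predictors chosen for illustration, not from the case study):

```r
# Manual computation of the F-statistic of the F-test
# (built-in mtcars dataset and predictors chosen for illustration)
mod <- lm(mpg ~ cyl + hp + wt, data = mtcars)

n <- nrow(mtcars)
p <- 3  # number of predictors

SSR <- sum((fitted(mod) - mean(mtcars$mpg))^2)
SSE <- sum(residuals(mod)^2)

Fval <- (SSR / p) / (SSE / (n - p - 1))
pval <- pf(Fval, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

# Matches the "F-statistic" reported by summary(mod)
all.equal(Fval, unname(summary(mod)$fstatistic["value"]))  # TRUE
```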


Figure 2.16: Visualization of the ANOVA decomposition. SST measures the variation of \(Y_1,\ldots,Y_n\) with respect to \(\bar{Y}\). SSR measures the variation of \(\hat{Y}_1,\ldots,\hat{Y}_n\) with respect to \(\bar{\hat{Y}} = \bar{Y}\). SSE collects the variation between \(Y_1,\ldots,Y_n\) and \(\hat{Y}_1,\ldots,\hat{Y}_n\), that is, the variation of the residuals.

Figure 2.17: Illustration of the ANOVA decomposition and its dependence on \(\sigma^2\) and \(\hat\sigma^2\). Larger (respectively, smaller) \(\hat\sigma^2\) results in more weight placed on the SSE (SSR) term. Application available here.

The “ANOVA table” is a broad concept in statistics, with different variants. Here we are only covering the basic ANOVA table that follows from the relation \(\text{SST} = \text{SSR} + \text{SSE}\). However, further sophistication is possible when SSR is decomposed into the variations contributed by each predictor. In particular, for multiple linear regression R's `anova` implements a sequential (type I) ANOVA table, which is not the previous table!
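A quick way to see the sequential nature of `anova` is to fit the same model with the predictors entered in two different orders; the rows of the table change. A sketch with the built-in `mtcars` dataset (dataset and predictors chosen for illustration):

```r
# The type I (sequential) ANOVA table depends on the order of the predictors
# (built-in mtcars dataset used for illustration)
tab1 <- anova(lm(mpg ~ cyl + wt, data = mtcars))
tab2 <- anova(lm(mpg ~ wt + cyl, data = mtcars))

# The sum of squares attributed to cyl differs between the two orderings...
tab1["cyl", "Sum Sq"]
tab2["cyl", "Sum Sq"]

# ...although the residual sum of squares (the fitted model) is the same
all.equal(tab1["Residuals", "Sum Sq"], tab2["Residuals", "Sum Sq"])  # TRUE
```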

The `anova` function takes a model as input and returns the following sequential ANOVA table:⁴⁴

|                 | Degrees of freedom | Sum of squares | Mean squares | \(F\)-value | \(p\)-value |
|-----------------|--------------------|----------------|--------------|-------------|-------------|
| Predictor 1     | \(1\) | \(\text{SSR}_1\) | \(\frac{\text{SSR}_1}{1}\) | \(\frac{\text{SSR}_1/1}{\text{SSE}/(n-p-1)}\) | \(p_1\) |
| Predictor 2     | \(1\) | \(\text{SSR}_2\) | \(\frac{\text{SSR}_2}{1}\) | \(\frac{\text{SSR}_2/1}{\text{SSE}/(n-p-1)}\) | \(p_2\) |
| ⋮               | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| Predictor \(p\) | \(1\) | \(\text{SSR}_p\) | \(\frac{\text{SSR}_p}{1}\) | \(\frac{\text{SSR}_p/1}{\text{SSE}/(n-p-1)}\) | \(p_p\) |
| Residuals       | \(n-p-1\) | SSE | \(\frac{\text{SSE}}{n-p-1}\) | | |

Here \(\text{SSR}_j\) represents the regression sum of squares associated with the inclusion of \(X_j\) in the model that already contains the predictors \(X_1, \ldots, X_{j-1}\), that is:

\[
\text{SSR}_j = \text{SSR}(X_1, \ldots, X_j) - \text{SSR}(X_1, \ldots, X_{j-1}).
\]

The \(p\)-values \(p_1, \ldots, p_p\) correspond to testing the hypotheses

\[
H_0: \beta_j = 0 \quad \text{vs.} \quad H_1: \beta_j \neq 0,
\]

carried out inside the linear model \(Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_j X_j + \varepsilon\). This is like the t-test for \(\beta_j\) in the model with predictors \(X_1, \ldots, X_j\). Recall that there is no F-test in this version of the ANOVA table.
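The identity defining \(\text{SSR}_j\) can be checked empirically by fitting the sequence of nested models and differencing their SSRs. A sketch with the built-in `mtcars` dataset (dataset and predictors chosen for illustration):

```r
# Check that the Sum Sq entries of the sequential ANOVA table are the
# increments SSR(X1, ..., Xj) - SSR(X1, ..., X_{j-1})
# (built-in mtcars dataset used for illustration)
SSR <- function(mod) sum((fitted(mod) - mean(mtcars$mpg))^2)

mod0 <- lm(mpg ~ 1, data = mtcars)  # intercept-only model: SSR = 0
mod1 <- lm(mpg ~ cyl, data = mtcars)
mod2 <- lm(mpg ~ cyl + hp, data = mtcars)

# Increments in SSR when adding cyl, then hp
increments <- c(SSR(mod1) - SSR(mod0), SSR(mod2) - SSR(mod1))

# They coincide with the "Sum Sq" column of the sequential ANOVA table
all.equal(increments, anova(mod2)[c("cyl", "hp"), "Sum Sq"])  # TRUE
```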

In order to exactly⁴⁵ compute the simplified ANOVA table seen before, we can rely on the following ad hoc function, which takes a fitted `lm` object as input:

# This function computes the simplified anova from a linear model
simpleAnova <- function(object, ...) {

  # Compute anova table
  tab <- anova(object, ...)

  # Obtain number of predictors
  p <- nrow(tab) - 1

  # Add predictors row
  predictorsRow <- colSums(tab[1:p, 1:2])
  predictorsRow <- c(predictorsRow, predictorsRow[2] / predictorsRow[1])

  # F-quantities
  Fval <- predictorsRow[3] / tab[p + 1, 3]
  pval <- pf(Fval, df1 = p, df2 = tab$Df[p + 1], lower.tail = FALSE)
  predictorsRow <- c(predictorsRow, Fval, pval)

  # Simplified table
  tab <- rbind(predictorsRow, tab[p + 1, ])
  row.names(tab)[1] <- "Predictors"
  return(tab)

}

2.6.1 Case study application

Let's compute the ANOVA decomposition of `modWine1` and `modWine2` to test for the existence of linear dependence.

# Models
modWine1 <- lm(Price ~ ., data = wine)
modWine2 <- lm(Price ~ . - FrancePop, data = wine)

# Simplified table
simpleAnova(modWine1)
## Analysis of Variance Table
## 
## Response: Price
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Predictors  5 8.6671 1.73343  20.188 2.232e-07 ***
## Residuals  21 1.8032 0.08587                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
simpleAnova(modWine2)
## Analysis of Variance Table
## 
## Response: Price
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Predictors  4 8.6645 2.16613   26.39 4.057e-08 ***
## Residuals  22 1.8058 0.08208                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# The null hypothesis of no linear dependence is emphatically rejected in
# both models

# R's ANOVA table -- warning this is not what we saw in lessons
anova(modWine1)
## Analysis of Variance Table
## 
## Response: Price
##             Df Sum Sq Mean Sq F value    Pr(>F)    
## WinterRain   1 0.1905  0.1905  2.2184 0.1512427    
## AGST         1 5.8989  5.8989 68.6990 4.645e-08 ***
## HarvestRain  1 1.6662  1.6662 19.4051 0.0002466 ***
## Age          1 0.9089  0.9089 10.5852 0.0038004 ** 
## FrancePop    1 0.0026  0.0026  0.0305 0.8631279    
## Residuals   21 1.8032  0.0859                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Compute the ANOVA table for the regression `Price ~ WinterRain + AGST + HarvestRain + Age` in the `wine` dataset. Check that the \(p\)-values for the F-test given by `summary` and by `simpleAnova` are the same.

For the regressions `y6 ~ x6` and `y7 ~ x7` in the `assumptions` dataset, compute their ANOVA tables. Check that the \(p\)-values of the t-test for \(\beta_1\) and of the F-test are the same (can you explain why this is so?).


  38. This is an important result that can be checked using the matrix notation introduced in Section 2.2.3.↩︎

  39. Recall that SSE and RSS (of the least squares estimator \(\hat{\boldsymbol\beta}\)) are two names for the same quantity (that appears in different contexts): \(\text{SSE} = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^n (Y_i - \hat\beta_0 - \hat\beta_1 X_{i1} - \cdots - \hat\beta_p X_{ip})^2 = \text{RSS}(\hat{\boldsymbol\beta})\).↩︎

  40. Geometrically: the plane is completely flat, it does not have any inclination in the \(Y\) direction.↩︎

  41. And therefore, there is a statistically meaningful (i.e., not constant) linear trend to model.↩︎

  42. Little variation is explained by the regression model, since \(\hat\beta_j \approx 0\), \(j = 1, \ldots, p\).↩︎

  43. In R, `qf(p = 1 - alpha, df1 = p, df2 = n - p - 1)` or `qf(p = alpha, df1 = p, df2 = n - p - 1, lower.tail = FALSE)`.↩︎

  44. More complex – included here just for clarification of `anova`'s output.↩︎

  45. Note that, if `mod <- lm(resp ~ preds, data)` represents a model with response `resp` and predictors `preds`, and `mod0 <- lm(resp ~ 1, data)` is the intercept-only model that does not contain predictors, then `anova(mod0, mod)` gives a similar output to the seen ANOVA table. Precisely, the first row of the outputted table stands for the SST and the second row for the SSE (so we call it the SST–SSE table). The SSR row is not present. The seen ANOVA table (which contains SSR and SSE) and the SST–SSE table encode the same information due to the ANOVA decomposition, so it is a matter of taste and tradition to employ one or the other. In particular, both have the F-test and its associated \(p\)-value (in the SSE row for the SST–SSE table).↩︎