15.3 Comparing regression models with anova()

A good model not only needs to fit data well, it also needs to be parsimonious. That is, a good model should be only be as complex as necessary to describe a dataset. If you are choosing between a very simple model with 1 IV, and a very complex model with, say, 10 IVs, the very complex model needs to provide a much better fit to the data in order to justify its increased complexity. If it can’t, then the more simpler model should be preferred.

To compare the fits of two models, you can use the anova() function with the regression objects as two separate arguments. The anova() function will take the model objects as arguments, and return an ANOVA testing whether the more complex model is significantly better at capturing the data than the simpler model. If the resulting p-value is sufficiently low (usually less than 0.05), we conclude that the more complex model is significantly better than the simpler model, and thus favor the more complex model. If the p-value is not sufficiently low (usually greater than 0.05), we should favor the simpler model.

Let’s do an example with the diamonds dataset. I’ll create three regression models that each predict a diamond’s value. The models will differ in their complexity – that is, the number of independent variables they use. diamonds.mod1 will be the simplest model with just one IV (weight), diamonds.mod2 will include 2 IVs (weight and clarity) while diamonds.mod3 will include three IVs (weight, clarity, and color).

# model 1: 1 IV (only weight)
 diamonds.mod1 <- lm(value ~ weight, data = diamonds)

 # Model 2: 2 IVs (weight AND clarity)
 diamonds.mod2 <- lm(value ~ weight + clarity, data = diamonds)

 # Model 3: 3 IVs (weight AND clarity AND color)
 diamonds.mod3 <- lm(value ~ weight + clarity + color, data = diamonds)

Now let’s use the anova() function to compare these models and see which one provides the best parsimonious fit of the data. First, we’ll compare the two simplest models: model 1 with model 2. Because these models differ in the use of the clarity IV (both models use weight), this ANVOA will test whether or not including the clarity IV leads to a significant improvement over using just the weight IV:

 # Compare model 1 to model 2
 anova(diamonds.mod1, diamonds.mod2)
## Analysis of Variance Table
## 
## Model 1: value ~ weight
## Model 2: value ~ weight + clarity
##   Res.Df  RSS Df Sum of Sq   F Pr(>F)    
## 1    148 5569                            
## 2    147 3221  1      2347 107 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As you can see, the result shows a Df of 1 (indicating that the more complex model has one additional parameter), and a very small p-value (< .001). This means that adding the clarity IV to the model did lead to a significantly improved fit over the model 1.

Next, let’s use anova() to compare model 2 and model 3. This will tell us whether adding color (on top of weight and clarity) further improves the model:

# Compare model 2 to model 3
anova(diamonds.mod2, diamonds.mod3)
## Analysis of Variance Table
## 
## Model 1: value ~ weight + clarity
## Model 2: value ~ weight + clarity + color
##   Res.Df  RSS Df Sum of Sq    F Pr(>F)
## 1    147 3221                         
## 2    146 3187  1        34 1.56   0.21

The result shows a non-significant result (p = 0.21). Thus, we should reject model 3 and stick with model 2 with only 2 IVs.

You don’t need to compare models that only differ in one IV – you can also compare models that differ in multiple DVs. For example, here is a comparison of model 1 (with 1 IV) to model 3 (with 3 IVs):

# Compare model 1 to model 3
anova(diamonds.mod1, diamonds.mod3)
## Analysis of Variance Table
## 
## Model 1: value ~ weight
## Model 2: value ~ weight + clarity + color
##   Res.Df  RSS Df Sum of Sq    F Pr(>F)    
## 1    148 5569                             
## 2    146 3187  2      2381 54.5 <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The result shows that model 3 did indeed provide a significantly better fit to the data compared to model 1. However, as we know from our previous analysis, model 3 is not significantly better than model 2.