# 9 Review: Comparing Models (R)

## 9.1 Lab Overview

This lab reviews a fundamental from linear regression we’ll be adding onto: comparing the fit of models!

Research Question: What characteristics are associated with separtist movements in ethnocultural groups?

Lab Files

Packages for this lab: You may need to install packages that you have never used before with install.packages("packagename")

# Basic data cleaning and visualization packages
library(tidyverse)
# Package for Wald Test
library(car)
# Package for Liklihood Ratio Test
library(lmtest)

Setting factor variables
Some quick data cleaning to set these variables as factors

# Set categorical variables as factors
df_MAR <- df_MAR %>%
mutate(
trsntlkin = factor(trsntlkin, labels = c("No Close Kindred",
"Close Kindred, More than 1 Country")),
kinmatsup = factor(kinmatsup, labels = c("No material kinship support",
"Material kinship support")),
kinmilsup = factor(kinmilsup, labels = c("No military kinship support",
"Military kinship support")),
grplang = factor(grplang, labels = c("<10% Speak Native Lang",
"Some Speak Native Lang",
"Most Speak Native Lang"))
)

## 9.2 Wald Test

### Test overview

The Wald test helps you see whether adding a set of variables collectively improves the fit of the model. Said another way, we can see if removing the two or more variables from the model would substantially harm the fit of the model. This would mean it removes something statistically significant to the fit of the model.

Test hypotheses:
* Null hypothesis: the coefficients are all equal to 0 ($$H_0: X1 = X2 = 0$$)
* Alt hypothesis: the coefficients are not all equal to zero ($$H_A: X1 \ or \ X2 \neq 0$$)

Interpretation: If the p value for the test is < .05, then we reject the null hypothesis and the variables significantly explain some piece of the model. It is like the p-value for our coefficients, except for multiple coefficients at the same time.

Even though we are looking at the Wald Test first, it is actually just an approximation of the likelihood ratio test coming next. If that is the case (you might be asking), why would we even learn the Wald test? Well, the advantage of the Wald test is that it only requires that we run a single model, rather than several. If you have a huge, computationally intensive model, the Wald test might be the better choice. Otherwise, many recommend the likelihood ratio test over the Wald Test.

### R code

#### STEP 1: Run your regression

fit1 <- glm(sepkin ~ trsntlkin + kinmatsup + kinmilsup + grplang + pct_ctry,
data = df_MAR, family = binomial(link = "logit"))
summary(fit1)

Call:
glm(formula = sepkin ~ trsntlkin + kinmatsup + kinmilsup + grplang +
pct_ctry, family = binomial(link = "logit"), data = df_MAR)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-1.7868  -0.8022  -0.7235   1.2534   2.4011

Coefficients:
Estimate Std. Error z value
(Intercept)                                 -3.0587629  0.4824304  -6.340
trsntlkinClose Kindred/Not Adjoining Base    1.8558380  0.4394281   4.223
trsntlkinClose Kindred, Adjoining Base       1.6448396  0.4544475   3.619
trsntlkinClose Kindred, More than 1 Country  2.4308115  0.4495357   5.407
kinmatsupMaterial kinship support           -0.2695913  0.3839679  -0.702
kinmilsupMilitary kinship support            1.5463083  0.4640465   3.332
grplangSome Speak Native Lang                0.2350165  0.2380001   0.987
grplangMost Speak Native Lang                1.0265800  0.2753856   3.728
pct_ctry                                    -0.0002825  0.0061351  -0.046
Pr(>|z|)
(Intercept)                                 2.29e-10 ***
trsntlkinClose Kindred/Not Adjoining Base   2.41e-05 ***
trsntlkinClose Kindred, Adjoining Base      0.000295 ***
trsntlkinClose Kindred, More than 1 Country 6.40e-08 ***
kinmatsupMaterial kinship support           0.482605
kinmilsupMilitary kinship support           0.000862 ***
grplangSome Speak Native Lang               0.323415
grplangMost Speak Native Lang               0.000193 ***
pct_ctry                                    0.963279
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1013.80  on 836  degrees of freedom
Residual deviance:  926.81  on 828  degrees of freedom
(15 observations deleted due to missingness)
AIC: 944.81

Number of Fisher Scoring iterations: 5

#### STEP 2: STEP 2: Use the linearHypothesis command

The R command you will be using is the linearHypothesis command from the car package. You simply have to list the saved model results two or more variables you want to jointly test.

The only tricky part of this command is working with categorical variables. When you are comparing categorical variables you have to specify the category value before the name of the coefficient. I’m copying these directly from the regression results above. We’ll run two examples so you can see what I mean.

Example 1: Material and Military Support from Kin
In this example because both kinmatsup and kinmilsup are binary variables, we want to test whether the coefficients for “Yes” are equal to 0. We can refer to the coefficients using what appears in the regression results: kinmatsupMaterial kinship support and kinmilsupMilitary kinship support.

linearHypothesis(fit1, c("kinmatsupMaterial kinship support = 0",
"kinmilsupMilitary kinship support = 0"))
Linear hypothesis test

Hypothesis:
kinmatsupMaterial kinship support = 0
kinmilsupMilitary kinship support = 0

Model 1: restricted model
Model 2: sepkin ~ trsntlkin + kinmatsup + kinmilsup + grplang + pct_ctry

Res.Df Df  Chisq Pr(>Chisq)
1    830
2    828  2 11.108   0.003871 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example 2: All the categories of Language Distinctiveness (grplang)
In this example, we’re only looking at one variable, but because it is a categorical variable with multiple levels we can test to see whether each category, jointly, adds to the model. We will list all the categories of grplang except for the base category. Again, we pull this from the regression results.

linearHypothesis(fit1, c("grplangSome Speak Native Lang",
"grplangMost Speak Native Lang"))
Linear hypothesis test

Hypothesis:
grplangSome Speak Native Lang = 0
grplangMost Speak Native Lang = 0

Model 1: restricted model
Model 2: sepkin ~ trsntlkin + kinmatsup + kinmilsup + grplang + pct_ctry

Res.Df Df  Chisq Pr(>Chisq)
1    830
2    828  2 18.999  7.489e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### STEP 3: Interpret the test

If the p-value for a test is LESS THAN .05 we can reject the null hypothesis that the coefficients are both equal to zero, meaning adding these two variables improves the fit of the model.

For example 1, we can reject the null ($$p = 0.0039$$), adding the two variables improves fit.

For example 2, we can also reject the null ($$p < 0.0001$$), adding the categories of grplang improves fit.

## 9.3 Likelihood Ratio Test

This is one of the most recommended tests to use when comparing models in the generalized linear model family, which contains linear regression, logistic regression, probit regression, ordinal models, poisson, and negative binomial models.

### Test Overview

What is likelihood:
The likelihood of a model refers to the probability that relationship between variables described in our model equation is the true, but unknown real-life relationship between those variables. Said another way, what are the chances that the coefficients from our model could predict the actual observations in our dataset. This is exact likelihood doesn’t tell us much on it’s own. However, it helps regressions choose the set of coefficients that has the highest probability of describing our data. Maximum likelihood estimation, which is what the regressions we are using in this class rely on, calculates coefficients have the greatest likelihood of producing the observed data. Maximum likelihood estimation actually uses the log of the likelihood, because it has some mathematical properties that make it easier to use, which is why you will see the term “log likelihood.” The higher the log-likelihood, the better the model fits the data.

If you really want to dive into the math of this test, the first 14 minutes of this video provide a good, relatively straightforward, explanation of likelihood and log-likelihood. Though beware, there is lots of math notation: https://www.youtube.com/watch?v=ScduwntrMzc

How does the likelihood ratio test work?
Each model that you run will have a log-likelihood. The likelihood ratio test looks to see if the difference between the log-likelihood from two models is statistically significant. If the difference is statistically significant, then one model has a better fit.

Test hypotheses:
* Null hypothesis: the log-likelihoods from one model is equal to the other ($$H_0$$: log-likelihood model 1 - log-likelihood model 2 = 0)
* Alt hypothesis: the log-likelihoods are not equal to one another ($$H_A$$: log-likelihood model 1 - log-likelihood model 2 $$\neq$$ 0)

Interpretation: If the p value for the test is < .05, then we reject the null hypothesis and one model improves the fit of the model.

This is a really math-heavy concept. I wanted to provide an overview for anyone who gets frustrated using these tests without seeing what is in side the black box. If that’s not you…

Here is the takeaway: The likelihood ratio test compares the fit statistic (some number somebody theorized describes how well the model fits the data) from two nested models and tells you if the fit from one is better.

What is a “nested model”:
Two models are “nested” when they have the same dependent variable and the independent variables from one model are a subset of the independent variables in the larger model. Said another way, each model has the same variables, but the second one adds at least one additional independent variable. To be nested the models must also be run on the same sample. This can trip you up when running a likelihood ratio test because R automatically drops missing values during regressions. When you add a new variable that has missing values in your dataset, the larger model might have less cases than the smaller model. You have to make sure you run the smaller model without the cases that the larger model drops.

### R code

#### STEP 1: Drop observations with missing values

There are many ways to do this in R. My preference is to use drop_na() from the tidyverse.

df_MAR <- df_MAR %>%
drop_na(sepkin, trsntlkin, kinmatsup, kinmilsup, grplang, pct_ctry)

#### STEP 2: Run two or more “nested” models and save the results

Now we need to run two or more nested models and save the model results.

Basic model with only key explanatory variable: trsntlkin

fit_a <- glm(sepkin ~ trsntlkin,
data = df_MAR, family = binomial(link = "logit"))
summary(fit_a)

Call:
glm(formula = sepkin ~ trsntlkin, family = binomial(link = "logit"),
data = df_MAR)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-1.0780  -0.8360  -0.8216   1.2802   2.3926

Coefficients:
Estimate Std. Error z value
(Intercept)                                  -2.8034     0.4204  -6.668
trsntlkinClose Kindred/Not Adjoining Base     1.9319     0.4362   4.429
trsntlkinClose Kindred, Adjoining Base        1.8906     0.4487   4.213
trsntlkinClose Kindred, More than 1 Country   2.5649     0.4469   5.740
Pr(>|z|)
(Intercept)                                 2.60e-11 ***
trsntlkinClose Kindred/Not Adjoining Base   9.47e-06 ***
trsntlkinClose Kindred, Adjoining Base      2.52e-05 ***
trsntlkinClose Kindred, More than 1 Country 9.47e-09 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1013.80  on 836  degrees of freedom
Residual deviance:  959.09  on 833  degrees of freedom
AIC: 967.09

Number of Fisher Scoring iterations: 5

Second model adding grplang and pct_countrykinmatsup

fit_b <- glm(sepkin ~ trsntlkin + grplang + pct_ctry,
data = df_MAR, family = binomial(link = "logit"))
summary(fit_b)

Call:
glm(formula = sepkin ~ trsntlkin + grplang + pct_ctry, family = binomial(link = "logit"),
data = df_MAR)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-1.3814  -0.8019  -0.7257   1.2542   2.4015

Coefficients:
Estimate Std. Error z value
(Intercept)                                 -3.053297   0.480804  -6.350
trsntlkinClose Kindred/Not Adjoining Base    1.854847   0.438831   4.227
trsntlkinClose Kindred, Adjoining Base       1.763777   0.451477   3.907
trsntlkinClose Kindred, More than 1 Country  2.491353   0.448909   5.550
grplangSome Speak Native Lang                0.231829   0.235364   0.985
grplangMost Speak Native Lang                1.030939   0.272657   3.781
pct_ctry                                    -0.001023   0.006064  -0.169
Pr(>|z|)
(Intercept)                                 2.15e-10 ***
trsntlkinClose Kindred/Not Adjoining Base   2.37e-05 ***
trsntlkinClose Kindred, Adjoining Base      9.36e-05 ***
trsntlkinClose Kindred, More than 1 Country 2.86e-08 ***
grplangSome Speak Native Lang               0.324634
grplangMost Speak Native Lang               0.000156 ***
pct_ctry                                    0.865976
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1013.80  on 836  degrees of freedom
Residual deviance:  938.83  on 830  degrees of freedom
AIC: 952.83

Number of Fisher Scoring iterations: 5

Third model adding kinmatsup and kinmilsup

fit_c <- glm(sepkin ~ trsntlkin + kinmatsup + kinmilsup + grplang + pct_ctry,
data = df_MAR, family = binomial(link = "logit"))
summary(fit_c)

Call:
glm(formula = sepkin ~ trsntlkin + kinmatsup + kinmilsup + grplang +
pct_ctry, family = binomial(link = "logit"), data = df_MAR)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-1.7868  -0.8022  -0.7235   1.2534   2.4011

Coefficients:
Estimate Std. Error z value
(Intercept)                                 -3.0587629  0.4824304  -6.340
trsntlkinClose Kindred/Not Adjoining Base    1.8558380  0.4394281   4.223
trsntlkinClose Kindred, Adjoining Base       1.6448396  0.4544475   3.619
trsntlkinClose Kindred, More than 1 Country  2.4308115  0.4495357   5.407
kinmatsupMaterial kinship support           -0.2695913  0.3839679  -0.702
kinmilsupMilitary kinship support            1.5463083  0.4640465   3.332
grplangSome Speak Native Lang                0.2350165  0.2380001   0.987
grplangMost Speak Native Lang                1.0265800  0.2753856   3.728
pct_ctry                                    -0.0002825  0.0061351  -0.046
Pr(>|z|)
(Intercept)                                 2.29e-10 ***
trsntlkinClose Kindred/Not Adjoining Base   2.41e-05 ***
trsntlkinClose Kindred, Adjoining Base      0.000295 ***
trsntlkinClose Kindred, More than 1 Country 6.40e-08 ***
kinmatsupMaterial kinship support           0.482605
kinmilsupMilitary kinship support           0.000862 ***
grplangSome Speak Native Lang               0.323415
grplangMost Speak Native Lang               0.000193 ***
pct_ctry                                    0.963279
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1013.80  on 836  degrees of freedom
Residual deviance:  926.81  on 828  degrees of freedom
AIC: 944.81

Number of Fisher Scoring iterations: 5

NOTE: I always give the smaller model names to models with less variables. You might include the number of variables or degrees of freedom as well to help you tell them apart.

#### STEP 3: Run the liklihood ratio test

Here we will use the lrest function from the lmtest package.

The hypothesis we are testing is: Is difference in fit between one model and another model a statistically significant improvement?

• Null hypothesis: The two models have equal fit.
• Alternate hypothesis: One model has a better fit.

If p < .05 then we reject the null!

Is model 2 an improvement upon model 1?
Log-likelihood model 1: -479.54 Log-likelihood model 2: -469.41

lrtest(fit_b, fit_a)
Likelihood ratio test

Model 1: sepkin ~ trsntlkin + grplang + pct_ctry
Model 2: sepkin ~ trsntlkin
#Df  LogLik Df  Chisq Pr(>Chisq)
1   7 -469.41
2   4 -479.54 -3 20.259    0.00015 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrtest(fit_a, fit_b) # so you can see that order of models doesn't matter
Likelihood ratio test

Model 1: sepkin ~ trsntlkin
Model 2: sepkin ~ trsntlkin + grplang + pct_ctry
#Df  LogLik Df  Chisq Pr(>Chisq)
1   4 -479.54
2   7 -469.41  3 20.259    0.00015 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Reject the null - model 2 is a better fit

Is model 3 an improvement upon model 2?
Log-likelihood model 2: -469.41
Log-likelihood model 3: -463.40

lrtest(fit_c, fit_b)
Likelihood ratio test

Model 1: sepkin ~ trsntlkin + kinmatsup + kinmilsup + grplang + pct_ctry
Model 2: sepkin ~ trsntlkin + grplang + pct_ctry
#Df  LogLik Df  Chisq Pr(>Chisq)
1   9 -463.40
2   7 -469.41 -2 12.021   0.002452 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Reject the null - model 3 is a better fit

Almost always the larger model will have the larger log-likelihood, so you are testing if that larger log-likelihood is a statistically significant improvement. But remember the test is telling you whether the difference is significant, so check to see which log-likelihood is larger, and that is the one with the better fit if you have a significant p-value.

## 9.4 AIC and BIC

### Test Overview

AIC stands for Akaike Information criteria. BIC stands for Bayesian Information Criteria. Both are forms of probabilistic model selection. Probabilistic model selection or “information criteria” is a type of analysis that creates a score for your model to help you compare model fit and parsimony. You may have seen the AIC at the bottom of your glm() summary.

What is each test doing?
Both of these tests also rely on log-likelihood statistics and Maximum Likelihood Estimation (MLE). Without going into a full mathematical explanation of MLE and the log liklihood functions it uses, I will simply state that both AIC and BIC come up with a formula using these underlying statistics to create a score for your model. That score says how well your model fits the data.

What is the difference between AIC and BIC
There is a really technical answer to this about how AIC is looking for a hypothetical ideal model that doesn’t actually exist and BIC is looking for the best model among actual options. To be honest, I do not know the full theory behind it. The most meaningful difference is that BIC penalizes complicated models with more covariates more than AIC does. So if you are looking for the simplest model, BIC is usually recommended. Most people look at both, but if you really want to choose, people probably rely on BIC more. They will almost always lead to the same conclusion.

AIC and BIC are extremely easy tests to run after a regression, which is why I find them so helpful to compare models. The lower the AIC or BIC, the better the model!

### R code

#### STEP 1: Run a couple regressions

These models do not need to be nested, but they MUST be run on the same set of observations (follow the same procedure in the lrtest code to drop any observations with missing values before you run your regression).

First model:

fit_b <- glm(sepkin ~ trsntlkin + grplang + pct_ctry,
data = df_MAR, family = binomial(link = "logit"))
summary(fit_b)

Call:
glm(formula = sepkin ~ trsntlkin + grplang + pct_ctry, family = binomial(link = "logit"),
data = df_MAR)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-1.3814  -0.8019  -0.7257   1.2542   2.4015

Coefficients:
Estimate Std. Error z value
(Intercept)                                 -3.053297   0.480804  -6.350
trsntlkinClose Kindred/Not Adjoining Base    1.854847   0.438831   4.227
trsntlkinClose Kindred, Adjoining Base       1.763777   0.451477   3.907
trsntlkinClose Kindred, More than 1 Country  2.491353   0.448909   5.550
grplangSome Speak Native Lang                0.231829   0.235364   0.985
grplangMost Speak Native Lang                1.030939   0.272657   3.781
pct_ctry                                    -0.001023   0.006064  -0.169
Pr(>|z|)
(Intercept)                                 2.15e-10 ***
trsntlkinClose Kindred/Not Adjoining Base   2.37e-05 ***
trsntlkinClose Kindred, Adjoining Base      9.36e-05 ***
trsntlkinClose Kindred, More than 1 Country 2.86e-08 ***
grplangSome Speak Native Lang               0.324634
grplangMost Speak Native Lang               0.000156 ***
pct_ctry                                    0.865976
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1013.80  on 836  degrees of freedom
Residual deviance:  938.83  on 830  degrees of freedom
AIC: 952.83

Number of Fisher Scoring iterations: 5

Second model:

fit_d <- glm(sepkin ~ kinmatsup + kinmilsup,
data = df_MAR, family = binomial(link = "logit"))
summary(fit_d)

Call:
glm(formula = sepkin ~ kinmatsup + kinmilsup, family = binomial(link = "logit"),
data = df_MAR)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-1.5197  -0.8202  -0.8202   1.5830   1.7271

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)                       -0.91655    0.07924 -11.566  < 2e-16 ***
kinmatsupMaterial kinship support -0.31986    0.37158  -0.861 0.389330
kinmilsupMilitary kinship support  1.69269    0.44906   3.769 0.000164 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1013.80  on 836  degrees of freedom
Residual deviance:  998.35  on 834  degrees of freedom
AIC: 1004.3

Number of Fisher Scoring iterations: 4

#### STEP 2: Note the AIC at the bottom of the regression results

Remember: smaller = better fit.

Model 1
AIC: 952.83

Model 2
AIC: 1004.3

Interpretation: Model 1 is the better fit

#### Calculate the BIC using the BIC() funciton

This function is built into R’s base statistics. It does not require loading a package. Remember: smaller = better fit

Model 1

BIC(fit_b)
[1] 985.9374

Model 2

BIC(fit_d)
[1] 1018.535

Model 1 has a better overall fit!

## 9.5 Class Activity

For lab today you will be put into two groups. In the first group, you will be assigned one of the three comparison procedures described in this lab (Wald, Likelihood Ratio, or AIC/BIC). You will download the activity script file and compare two models using the method you were assigned. You will see a base model and it is your group’s job to create a second model and compare the goodness of fit between the base model and your model.

After your group finishes your analysis you will move to your second group. In this group, each person will have worked with a different comparison test. Each member of the group should describe the fit test that you ran and the results. You should also explain any part of the coding that you found tricky to your group members.