Chapter 7 Multi-level Models Part A

7.1 Introduction

7.1.1 Hierarchical Data

Data structures for multile-level models are often hierarchical: Some variables are clustered or nested within other variables.

Examples:

– Children nested within classrooms

– Data points nested within people

– Employees nested within organisations

– Patients nested within hospitals

– Patients nested within doctors nested within clinics

A Two-Level Hierarchy such as children (or case) are organized by classroom. In here, children are said to be nested within classes. The Level 1 variable at the bottom of the hierarchy.

A Three-Level Hierarchy: There are three levels in the hierarchy: the child (level 1), the class to which the child belongs (level 2) and the school within which that class exists (level 3). Hierarchical data structures need not apply only to between-participant situations. We can also think of data as being nested within people. In this situation the case, or person, is not at the bottom of the hierarchy (level 1), but is further up.

7.1.2 Multilevel Model

The models appear in diverse literature under a variety of titles, for example:

  • In R&B, they adopt the term hierarchical linear models.

  • In sociological research, they are often referred to as multilevel linear models (e.g., Goldstein, 1995; Mason et al., 1983).

  • In biometric applications, the terms mixed-effects models and random effects models are common (e.g., Elston & Grizzle, 1962; Laird & Ware, 1982; Singer, 1998).

  • In the econometric literature, they are also called random-coefficient regression models (e.g., Rosenberg, 1973; Longford, 1993)

  • In the statistical literature, they have been referred to as covariance components models (e.g., Dempster, Rubin, & Tsutakawa, 1981; Longford,1987).

7.1.3 Intraclass Correlation (ICC)

Data from the same context will be more similar than data from different contexts:

– E.g. Children in the same class will perform more similarly than children from different classes.

Multilevel regression is not always a superior alternative to OLS regression. How do we decide when random coefficients, which imply the need for contextual variables and cross-level interaction terms, should be used? A simpler way of asking the same question goes like this: Are observations dependent? A good way to find out is to calculate the intraclass correlation coefficient.

The ICC measures the proportion of the total variability in the outcome that is attributable to the classes. ICC is a good gauge of whether a contextual variable has an effect on the outcomes.

How to calculate and interpret?

\[\mathrm{ICC}=\frac{\text { between-group variability }}{\text { between-group variability+within-group variability }}=\frac{\operatorname{Var}\left(u_{0 j}\right)}{\operatorname{Var}\left(u_{0 j}\right)+\operatorname{Var}\left(r_{i j}\right)}=\frac{\tau_{00}}{\tau_{00}+\sigma^{2}}\] If ICC value is large (r = 0.371), this tells us that about 37.1% of the variability in outcome difference occurs between level-2 units, with the remaining 62.9%occurring within level-2. By most standards, this represents a high level of dependence.

7.1.4 Benefits of Multilevel Models

  • Homogeneity of regression slopes
–	Model the variability in regression slopes
  • Assumption of independence
–	Regression for repeated observations: there are situations in which you might want to measure someone on more than one occasion (i.e. over time)
–	Lack of independence creates problems in ANOVA/Regression (correlated residuals)
–	Responses are not independent from each other when they are nested within same group.
  • Missing Data
–	Multilevel models can cope with missing data. We can keep more data than ANOVA/Regression

7.2 R Lab: Running Multilevel models in R

7.2.1 Prepare the data & R packages

## Prepare the dataset
library(haven)
Cosmetic_Surgery <- read_sav("Cosmetic_Surgery.sav")
head(Cosmetic_Surgery)
## # A tibble: 6 x 9
##      ID Post_QoL Base_QoL      Surgery Clinic   Age   BDI        Reason   Gender
##   <dbl>    <dbl>    <dbl>    <dbl+lbl>  <dbl> <dbl> <dbl>     <dbl+lbl> <dbl+lb>
## 1     1     71.3       73 0 [Waiting …      1    31    12 0 [Change Ap… 0 [Fema…
## 2     2     77         74 0 [Waiting …      1    32    16 0 [Change Ap… 0 [Fema…
## 3     3     73         80 0 [Waiting …      1    33    13 0 [Change Ap… 0 [Fema…
## 4     4     68.9       76 0 [Waiting …      1    59    11 0 [Change Ap… 1 [Male]
## 5     5     69         71 0 [Waiting …      1    61    11 0 [Change Ap… 1 [Male]
## 6     6     68.5       72 0 [Waiting …      1    32    10 1 [Physical … 0 [Fema…
## Take a glance of the data
summary(Cosmetic_Surgery)
##        ID            Post_QoL        Base_QoL        Surgery      
##  Min.   :  1.00   Min.   :40.00   Min.   :43.00   Min.   :0.0000  
##  1st Qu.: 69.75   1st Qu.:52.30   1st Qu.:57.00   1st Qu.:0.0000  
##  Median :138.50   Median :58.00   Median :63.00   Median :0.0000  
##  Mean   :138.50   Mean   :59.61   Mean   :63.56   Mean   :0.4746  
##  3rd Qu.:207.25   3rd Qu.:67.00   3rd Qu.:71.00   3rd Qu.:1.0000  
##  Max.   :276.00   Max.   :88.20   Max.   :91.00   Max.   :1.0000  
##      Clinic            Age             BDI            Reason      
##  Min.   : 1.000   Min.   :18.00   Min.   : 0.00   Min.   :0.0000  
##  1st Qu.: 3.000   1st Qu.:31.00   1st Qu.:10.00   1st Qu.:0.0000  
##  Median : 6.000   Median :38.00   Median :20.00   Median :1.0000  
##  Mean   : 5.754   Mean   :39.17   Mean   :23.05   Mean   :0.6449  
##  3rd Qu.: 8.000   3rd Qu.:48.00   3rd Qu.:36.00   3rd Qu.:1.0000  
##  Max.   :10.000   Max.   :65.00   Max.   :63.00   Max.   :1.0000  
##      Gender      
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.4312  
##  3rd Qu.:1.0000  
##  Max.   :1.0000
## Or you can use the glimpse function
library(tibble)
glimpse(Cosmetic_Surgery)
## Rows: 276
## Columns: 9
## $ ID       <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1…
## $ Post_QoL <dbl> 71.3, 77.0, 73.0, 68.9, 69.0, 68.5, 70.0, 75.0, 61.5, 68.0, …
## $ Base_QoL <dbl> 73, 74, 80, 76, 71, 72, 71, 73, 80, 64, 71, 72, 68, 65, 66, …
## $ Surgery  <dbl+lbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,…
## $ Clinic   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ Age      <dbl> 31, 32, 33, 59, 61, 32, 33, 35, 25, 55, 57, 29, 31, 32, 43, …
## $ BDI      <dbl> 12, 16, 13, 11, 11, 10, 11, 15, 30, 36, 37, 34, 30, 31, 41, …
## $ Reason   <dbl+lbl> 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1,…
## $ Gender   <dbl+lbl> 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,…

Good news, we don’t have any missing data in our data set!

# Packages for multilevel modeling in R
## There are two famous packages to do the multilevel modeling. One is "nlme", and the other is the "lme4".
## In this class, we will focus on the previous package, which is the "nlme" package.
library(nlme)
## Also, for the graphic display, we need "ggplot2" package
library(ggplot2)
## And for manage the data structure or reshape the data, we need "car" and "reshape" package.
library(car)
## For reshape the data
library(reshape)

7.2.2 Setting up the simple linear model

## At frist, we need to know whether there is a significant difference between clinics in their patient's life quality after the cosmetic surgery.
## We could use a simple regression model here. 
SimpleLinearModel<- lm(Post_QoL~Clinic,data=Cosmetic_Surgery)
summary(SimpleLinearModel)
## 
## Call:
## lm(formula = Post_QoL ~ Clinic, data = Cosmetic_Surgery)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.6125  -6.1622  -0.4786   5.1049  22.0049 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  70.0628     1.0701   65.47   <2e-16 ***
## Clinic       -1.8168     0.1672  -10.87   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.787 on 274 degrees of freedom
## Multiple R-squared:  0.3011,	Adjusted R-squared:  0.2986 
## F-statistic: 118.1 on 1 and 274 DF,  p-value: < 2.2e-16

The output indicates that there is a significant difference between clinics in their patient’s life quality after the cosmetic surgery. F(1,274)=118.1, p<.01.

7.2.3 Setting up an Unconditional Model

Firstly, we need to fit the unconditional model, allowing the intercepts to vary across contexts, in this case we want them to vary across clinics.

# Setting up an Unconditional Model
UnconditionalModel <- lme(Post_QoL ~ 1, data = Cosmetic_Surgery, random =~1|Clinic, method = "ML")
summary(UnconditionalModel)
## Linear mixed-effects model fit by maximum likelihood
##  Data: Cosmetic_Surgery 
##        AIC      BIC    logLik
##   1911.473 1922.334 -952.7364
## 
## Random effects:
##  Formula: ~1 | Clinic
##         (Intercept) Residual
## StdDev:    5.909691 7.238677
## 
## Fixed effects: Post_QoL ~ 1 
##                Value Std.Error  DF  t-value p-value
## (Intercept) 60.08377  1.923283 266 31.24022       0
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.8828507 -0.7606631 -0.1378732  0.7075242  2.8607949 
## 
## Number of Observations: 276
## Number of Groups: 10
## To obtain the -2LL the same with our SPSS section, we just need to simply use the function logLik() and multiple "-2" to achieve the -2LL.
logLik(UnconditionalModel)*-2 # 1905.473 (df=3)
## 'log Lik.' 1905.473 (df=3)

7.2.4 Random intercepts model

# Set up a random intercept model using lme() function
RandomInterceptModel <-lme(Post_QoL ~ Surgery + Base_QoL, data = Cosmetic_Surgery, random = ~1|Clinic, method = "ML")
summary(RandomInterceptModel)
## Linear mixed-effects model fit by maximum likelihood
##  Data: Cosmetic_Surgery 
##       AIC      BIC   logLik
##   1847.49 1865.592 -918.745
## 
## Random effects:
##  Formula: ~1 | Clinic
##         (Intercept) Residual
## StdDev:    3.039264 6.518986
## 
## Fixed effects: Post_QoL ~ Surgery + Base_QoL 
##                 Value Std.Error  DF   t-value p-value
## (Intercept) 29.563601  3.471879 264  8.515160  0.0000
## Surgery     -0.312999  0.843145 264 -0.371228  0.7108
## Base_QoL     0.478630  0.052774 264  9.069465  0.0000
##  Correlation: 
##          (Intr) Surgry
## Surgery   0.102       
## Base_QoL -0.947 -0.222
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -1.8872666 -0.7537675 -0.0954987  0.5657241  3.0020852 
## 
## Number of Observations: 276
## Number of Groups: 10
# Calculate the -2LL 
logLik(RandomInterceptModel)*-2 #1837.49
## 'log Lik.' 1837.49 (df=5)

Do you still remember the -2LL from our unconditional model? Now we need use it to make the comparison between two models. χ2 Difference = 1905.47 − 1837.49 = 67.98, df_change =5−3=1.

If we look at the critical values for the chi-square statistic with 2 degree of freedom in the Appendix, they are 5.99 (p < .05); therefore, this change is highly significant.We choose random intercepts model.

# A simpler way to do such thing is to use the anova() function. This function compares the change in log likelihood without computing the chi-square change as well as the df change.
anova(RandomInterceptModel,UnconditionalModel)
##                      Model df      AIC      BIC    logLik   Test  L.Ratio
## RandomInterceptModel     1  5 1847.490 1865.592 -918.7450                
## UnconditionalModel       2  3 1911.473 1922.334 -952.7364 1 vs 2 67.98286
##                      p-value
## RandomInterceptModel        
## UnconditionalModel    <.0001

Then, we can use the StdDev in the Model summary to calculate the ICC

RandomInterceptModel_ICC <- 3.04^2/(3.04^2+6.52^2)
RandomInterceptModel_ICC # 17.85
## [1] 0.1785747
# Comparing to our SPSS section, where is the estimates? No worry, use VarCorr() to check these parameters.
VarCorr(RandomInterceptModel)
## Clinic = pdLogChol(1) 
##             Variance  StdDev  
## (Intercept)  9.237126 3.039264
## Residual    42.497179 6.518986
# Not cool enough? No worry, let's build a function to calculate the ICC automatically.
varests <- as.numeric(VarCorr(RandomInterceptModel))
ICClme <- function(out) {
  varests <- as.numeric(VarCorr(out)[1:nrow(VarCorr(out))])
  return(paste("ICC=",varests[1]/ (varests[1]+varests[-1])))
}
# Try the function
ICClme(RandomInterceptModel) # Note, this function could be used to different models. 
## [1] "ICC= 0.178549339746615"
# Check the ICC for our unconditional model
ICClme(UnconditionalModel)
## [1] "ICC= 0.39994610805941"

Base on the anova outcome, χ2(1) = 107.65, p < .0001. We can conclude then that the intercepts vary significantly across the different clinics. Multilevel madness must ensue.

7.2.5 Random intercepts and slopes model

RandomInterceptSlopes <-lme(Post_QoL ~ Surgery + Base_QoL, data = Cosmetic_Surgery, random = ~Surgery|Clinic, method = "ML")
summary(RandomInterceptSlopes)
## Linear mixed-effects model fit by maximum likelihood
##  Data: Cosmetic_Surgery 
##        AIC      BIC    logLik
##   1812.624 1837.967 -899.3119
## 
## Random effects:
##  Formula: ~Surgery | Clinic
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr  
## (Intercept) 6.132655 (Intr)
## Surgery     6.197489 -0.965
## Residual    5.912335       
## 
## Fixed effects: Post_QoL ~ Surgery + Base_QoL 
##                Value Std.Error  DF   t-value p-value
## (Intercept) 40.10253  3.892945 264 10.301334  0.0000
## Surgery     -0.65453  2.110917 264 -0.310069  0.7568
## Base_QoL     0.31022  0.053506 264  5.797812  0.0000
##  Correlation: 
##          (Intr) Surgry
## Surgery  -0.430       
## Base_QoL -0.855 -0.063
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.4114778 -0.6628574 -0.1138411  0.6833110  2.8334730 
## 
## Number of Observations: 276
## Number of Groups: 10
logLik(RandomInterceptSlopes)*-2 # 1798.624
## 'log Lik.' 1798.624 (df=7)

Compare the model

# Compare this model with the previous model 
anova(RandomInterceptSlopes,RandomInterceptModel,UnconditionalModel)
##                       Model df      AIC      BIC    logLik   Test  L.Ratio
## RandomInterceptSlopes     1  7 1812.624 1837.966 -899.3119                
## RandomInterceptModel      2  5 1847.490 1865.592 -918.7450 1 vs 2 38.86626
## UnconditionalModel        3  3 1911.473 1922.334 -952.7364 2 vs 3 67.98286
##                       p-value
## RandomInterceptSlopes        
## RandomInterceptModel   <.0001
## UnconditionalModel     <.0001
# Check the ICC for this model
VarCorr(RandomInterceptSlopes)
## Clinic = pdLogChol(Surgery) 
##             Variance StdDev   Corr  
## (Intercept) 37.60946 6.132655 (Intr)
## Surgery     38.40887 6.197489 -0.965
## Residual    34.95570 5.912335
ICClme(RandomInterceptSlopes)
## [1] "ICC= 0.494741991832759" "ICC= 0.518285358979433"

Because we have two independent variables for this model, so we have two estimates for this model, leading to two ICC values.

7.2.6 Adding an interaction term to the model

# Watch carefully, we can still use the lme() function. Just add an interaction term in our formula.
InteractionModel <- lme(Post_QoL ~ Surgery + Base_QoL + Reason + Surgery:Reason, data = Cosmetic_Surgery, random = ~Surgery|Clinic, method = "ML")
summary(InteractionModel)
## Linear mixed-effects model fit by maximum likelihood
##  Data: Cosmetic_Surgery 
##        AIC      BIC    logLik
##   1807.045 1839.629 -894.5226
## 
## Random effects:
##  Formula: ~Surgery | Clinic
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev   Corr  
## (Intercept) 5.482364 (Intr)
## Surgery     5.417499 -0.946
## Residual    5.818911       
## 
## Fixed effects: Post_QoL ~ Surgery + Base_QoL + Reason + Surgery:Reason 
##                   Value Std.Error  DF   t-value p-value
## (Intercept)    42.51782  3.875318 262 10.971440  0.0000
## Surgery        -3.18768  2.185368 262 -1.458645  0.1459
## Base_QoL        0.30536  0.053125 262  5.747834  0.0000
## Reason         -3.51515  1.140934 262 -3.080939  0.0023
## Surgery:Reason  4.22129  1.700269 262  2.482718  0.0137
##  Correlation: 
##                (Intr) Surgry Bas_QL Reason
## Surgery        -0.356                     
## Base_QoL       -0.865 -0.078              
## Reason         -0.233  0.306  0.065       
## Surgery:Reason  0.096 -0.505  0.024 -0.661
## 
## Standardized Within-Group Residuals:
##        Min         Q1        Med         Q3        Max 
## -2.2331484 -0.6972193 -0.1541074  0.6326387  3.1641798 
## 
## Number of Observations: 276
## Number of Groups: 10
# Compare this model with the previous model 
anova(InteractionModel,RandomInterceptSlopes,RandomInterceptModel,UnconditionalModel)
##                       Model df      AIC      BIC    logLik   Test  L.Ratio
## InteractionModel          1  9 1807.045 1839.629 -894.5226                
## RandomInterceptSlopes     2  7 1812.624 1837.966 -899.3119 1 vs 2  9.57852
## RandomInterceptModel      3  5 1847.490 1865.592 -918.7450 2 vs 3 38.86626
## UnconditionalModel        4  3 1911.473 1922.334 -952.7364 3 vs 4 67.98286
##                       p-value
## InteractionModel             
## RandomInterceptSlopes  0.0083
## RandomInterceptModel   <.0001
## UnconditionalModel     <.0001
# Calculate the -2LL 
VarCorr(InteractionModel)
## Clinic = pdLogChol(Surgery) 
##             Variance StdDev   Corr  
## (Intercept) 30.05631 5.482364 (Intr)
## Surgery     29.34930 5.417499 -0.946
## Residual    33.85972 5.818911
# Calculate the ICC
ICClme(InteractionModel)
## [1] "ICC= 0.505950700615649" "ICC= 0.470246822276039"

The output shows that adding Reason to the model reduces the -2LL by 9.5782, so we choose the Interaction model.

The ICC for the Interaction model is 47.02%. This tells us that about 47.02% of the variability in outcome difference occurs between level-2 units (clinic), with the remaining 52.98% occurring within level-2.

7.3 Supplementary Learning Materials

  • Field, A (2013). Discovering statistics using IBM SPSS statistics (4th ed.). Los Angeles, CA: Sage Publications

  • Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s just regression. New York: Guilford.

  • Introduction to Multilevel Modeling in R: Link

  • Another Example for Multilevel modeling in R: Link