Chapter 8 Multi-level Models Part B
8.1 Understanding the data set
This data set we are going to use for this lab is (hsbdataset.sav).
This is a cross-sectional data set, no time involved. Data from a nationally representative sample of U.S. public and Catholic high schools. These data are a sub-sample from the 1982 High School and Beyond (HS&B) Survey, and include information on 7185 students nested within 160 schools: 90 public and 70 catholic. Sample sizes averaged about 45 students per school (7185/160 = 44.90625).
8.2 Prepare the data set and review
# Import the dataset into R
library(haven)
hsb <- read_sav("hsbdataset.sav")
# Take a glance at the first 10 lines of the data
head(hsb,10)
## # A tibble: 10 x 9
## school student ses meanses cses mathach sector himinty female
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl+lb> <dbl+lbl> <dbl+l>
## 1 1224 1 -1.53 -0.428 -1.10 5.88 0 [publ… 0 [less than… 1 [fem…
## 2 1224 2 -0.588 -0.428 -0.160 19.7 0 [publ… 0 [less than… 1 [fem…
## 3 1224 3 -0.528 -0.428 -0.100 20.3 0 [publ… 0 [less than… 0 [mal…
## 4 1224 4 -0.668 -0.428 -0.240 8.78 0 [publ… 0 [less than… 0 [mal…
## 5 1224 5 -0.158 -0.428 0.270 17.9 0 [publ… 0 [less than… 0 [mal…
## 6 1224 6 0.0220 -0.428 0.450 4.58 0 [publ… 0 [less than… 0 [mal…
## 7 1224 7 -0.618 -0.428 -0.190 -2.83 0 [publ… 0 [less than… 1 [fem…
## 8 1224 8 -0.998 -0.428 -0.570 0.523 0 [publ… 0 [less than… 0 [mal…
## 9 1224 9 -0.888 -0.428 -0.460 1.53 0 [publ… 0 [less than… 1 [fem…
## 10 1224 10 -0.458 -0.428 -0.0300 21.5 0 [publ… 0 [less than… 0 [mal…
## tibble [7,185 × 9] (S3: tbl_df/tbl/data.frame)
## $ school : num [1:7185] 1224 1224 1224 1224 1224 ...
## ..- attr(*, "format.spss")= chr "F10.0"
## $ student: num [1:7185] 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "format.spss")= chr "F9.2"
## $ ses : num [1:7185] -1.528 -0.588 -0.528 -0.668 -0.158 ...
## ..- attr(*, "format.spss")= chr "F9.2"
## $ meanses: num [1:7185] -0.428 -0.428 -0.428 -0.428 -0.428 ...
## ..- attr(*, "format.spss")= chr "F9.2"
## $ cses : num [1:7185] -1.1 -0.16 -0.1 -0.24 0.27 ...
## ..- attr(*, "format.spss")= chr "F9.2"
## $ mathach: num [1:7185] 5.88 19.71 20.35 8.78 17.9 ...
## ..- attr(*, "format.spss")= chr "F9.2"
## $ sector : dbl+lbl [1:7185] 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## ..@ format.spss: chr "F9.0"
## ..@ labels : Named num [1:2] 0 1
## .. ..- attr(*, "names")= chr [1:2] "public" "catholic"
## $ himinty: dbl+lbl [1:7185] 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
## ..@ format.spss: chr "F9.0"
## ..@ labels : Named num [1:2] 0 1
## .. ..- attr(*, "names")= chr [1:2] "less than 40% minority enrollment" "more than 40% minority enrollment"
## $ female : dbl+lbl [1:7185] 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0,...
## ..@ format.spss: chr "F9.0"
## ..@ labels : Named num [1:2] 0 1
## .. ..- attr(*, "names")= chr [1:2] "male" "female"
## school student ses meanses
## Min. :1224 Min. : 1.00 Min. :-3.758000 Min. :-1.188000
## 1st Qu.:3020 1st Qu.:12.00 1st Qu.:-0.538000 1st Qu.:-0.317000
## Median :5192 Median :23.00 Median : 0.002000 Median : 0.038000
## Mean :5278 Mean :24.51 Mean : 0.000143 Mean : 0.006138
## 3rd Qu.:7342 3rd Qu.:36.00 3rd Qu.: 0.602000 3rd Qu.: 0.333000
## Max. :9586 Max. :67.00 Max. : 2.692000 Max. : 0.831000
## cses mathach sector himinty
## Min. :-3.657000 Min. :-2.832 Min. :0.0000 Min. :0.00
## 1st Qu.:-0.454000 1st Qu.: 7.275 1st Qu.:0.0000 1st Qu.:0.00
## Median : 0.010000 Median :13.131 Median :0.0000 Median :0.00
## Mean :-0.005995 Mean :12.748 Mean :0.4931 Mean :0.28
## 3rd Qu.: 0.463000 3rd Qu.:18.317 3rd Qu.:1.0000 3rd Qu.:1.00
## Max. : 2.850000 Max. :24.993 Max. :1.0000 Max. :1.00
## female
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.5282
## 3rd Qu.:1.0000
## Max. :1.0000
# Students nested within schools
# Three student-level variabels (Level-1): mathach, ses, gender.
# Three school-level variables (Level-2): sector, meanses, himinty.
# Load the statistical package that we needed for the multilevel regression
library(nlme)
A quick Review of the Research Questions:
How much do U.S. high schools vary in their mean math achievement?
Do schools with high MEAN SES also have high math achievement?
Is the strength of association between student CSES and math achievement similar across schools? Or is CSES a better predictor of student math achievement in some schools than others?
How do public and Catholic schools compare in terms of mean math achievement and in terms of the strength of the SES-math achievement relationship, after we control for MEAN SES?
8.3 Centering for continuous X
Basically, we need two centered variable. One is the centered grand mean, the other is the centered group mean.
8.4 Question 1 - How much do U.S. high schools vary in their mean math achievement?
# This is our unconditional model
newhsb$school <-as.factor(newhsb$school)
UnconditionalModel<- lme(mathach~1,random=~1|school,data=newhsb,method="ML")
summary(UnconditionalModel)
## Linear mixed-effects model fit by maximum likelihood
## Data: newhsb
## AIC BIC logLik
## 47121.81 47142.45 -23557.91
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 2.924631 6.256868
##
## Fixed effects: mathach ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 12.63707 0.2436341 7025 51.86905 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.06261752 -0.75364531 0.02675562 0.76070102 2.74183820
##
## Number of Observations: 7185
## Number of Groups: 160
## school = pdLogChol(1)
## Variance StdDev
## (Intercept) 8.553464 2.924631
## Residual 39.148400 6.256868
## 'log Lik.' 47115.81 (df=3)
# Load our ICC calculation function
ICC <- function(out) {
varests <- as.numeric(VarCorr(out)[1:nrow(VarCorr(out))])
return(paste("ICC=",varests[1]/ (varests[1]+varests[-1])))
}
Grand_mean <- mean(newhsb$mathach) # The grand mean is 12.64.
# Calculate the ICC for unconditional model
ICC(UnconditionalModel) # ICC is 17.93%
## [1] "ICC= 0.179310896530165"
8.4.1 Answering Question 1:
A parameter estimate of τ_00 is 8.55. The population variance among the level-2 unit means is 8.55.
The average achievement across all school averages (i.e, the grand mean) is γ00 = 12.74.
ICC is equal to 17.92%. This tells us that about 18% of the variability in math achievement difference occurs between schools, with the remaining 82% occuring within level-2. It’s good to use a multilevel model.
8.5 Question 2 - Do schools with high MEAN SES also have high math achievement?
# To address this question, we need to build a fixed intercept model.
Q2_model <- lme(mathach~meanses,random=~1|school,data=newhsb,method="ML")
summary(Q2_model)
## Linear mixed-effects model fit by maximum likelihood
## Data: newhsb
## AIC BIC logLik
## 46967.11 46994.63 -23479.55
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 1.610337 6.257581
##
## Fixed effects: mathach ~ meanses
## Value Std.Error DF t-value p-value
## (Intercept) 12.649741 0.1483385 7025 85.27619 0
## meanses 5.862922 0.3591754 158 16.32328 0
## Correlation:
## (Intr)
## meanses -0.004
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.13467044 -0.75260775 0.02335172 0.76835668 2.78398911
##
## Number of Observations: 7185
## Number of Groups: 160
## school = pdLogChol(1)
## Variance StdDev
## (Intercept) 2.593186 1.610337
## Residual 39.157321 6.257581
## 'log Lik.' 46959.11 (df=4)
## [1] "ICC= 0.0621114852569335"
## Model df AIC BIC logLik Test L.Ratio p-value
## UnconditionalModel 1 3 47121.81 47142.45 -23557.90
## Q2_model 2 4 46967.11 46994.63 -23479.55 1 vs 2 156.7017 <.0001
8.5.1 Answering Question 2:
- The estimate of \(γ_{00}\) is 12.65, indicating the average math achievement when \(MEAN SES_{j}\) = 0.
The Estimate of \(γ_{01}\) is 5.86, indicating the average math achievement change when \(MEAN SES_{j}\) increases by one unit.
The t value of the MEANSES slope \(γ_{01}\) = 16.32, with df=158, p<0.01. The MEAN SES effect is significant, which means the schools with high MEAN SES also had high math achievement at alpha= .05/.01.
Estimate of \(γ_{00}\) = 2.59. The population variance among the level-2 unit means is less than the one in unconditional model (which is 8.55).Estimate of \(σ^2\) = 39.16, which is the variance of Level-1.
The ICC is 6.21%, which measures the degree of dependence among observations within schools that are of the same MEANSES.
We can also compute the proportion of variance that can be explained by MEANSES. (See details on the class slides)
8.6 Question 3 - Is the strength of association between student CSES and math achievement similar across schools? Or is CSES a better predictor of student math achievement in some schools than others?
# To address this question, we need to build a random intercept and slope model.
Q3_model <- lme(mathach~cses,random=~cses|school,data=newhsb,method="ML")
summary(Q3_model)
## Linear mixed-effects model fit by maximum likelihood
## Data: newhsb
## AIC BIC logLik
## 46722.98 46764.26 -23355.49
##
## Random effects:
## Formula: ~cses | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 2.9361649 (Intr)
## cses 0.8235601 0.021
## Residual 6.0580558
##
## Fixed effects: mathach ~ cses
## Value Std.Error DF t-value p-value
## (Intercept) 12.649430 0.2437710 7024 51.89063 0
## cses 2.193147 0.1278654 7024 17.15200 0
## Correlation:
## (Intr)
## cses 0.012
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.09688018 -0.73200190 0.01796151 0.75444958 2.89904875
##
## Number of Observations: 7185
## Number of Groups: 160
# Estimate of γ_00 = 12.65, which is the average math achievement across the population of schools.
# Estimate of γ_10 = 2.19, which is the average SES-math slope across those schools.
VarCorr(Q3_model)
## school = pdLogChol(cses)
## Variance StdDev Corr
## (Intercept) 8.6210644 2.9361649 (Intr)
## cses 0.6782512 0.8235601 0.021
## Residual 36.7000403 6.0580558
# Estimate of τ_00 is 8.62. This is the population variance among the Level-2 unit means.
# Estimate of τ_11 is 0.68. This is the population variance among the slope.
# Estimate of σ^2 is 36.70. This is the variance among the Level-1.
# In R, we can simply use the anova function to our model. It provides us the F-value and P value of the intercept which is the same with the wald Z test.
anova(Q3_model)
## numDF denDF F-value p-value
## (Intercept) 1 7024 2670.8719 <.0001
## cses 1 7024 294.1913 <.0001
From the result, we could tell the estimated variance among the mean is \(τ_{00}\) is 8.62, F=2670.83, p<.0001.
## 'log Lik.' 46710.98 (df=6)
## [1] "ICC= 0.927064396007809" "ICC= 0.190221850439581"
# Compare the Q3_model with unconditional model and Q2 Model
anova(Q3_model,Q2_model,UnconditionalModel)
## Model df AIC BIC logLik Test L.Ratio p-value
## Q3_model 1 6 46722.98 46764.26 -23355.49
## Q2_model 2 4 46967.11 46994.63 -23479.55 1 vs 2 248.1283 <.0001
## UnconditionalModel 3 3 47121.81 47142.45 -23557.90 2 vs 3 156.7017 <.0001
8.6.1 Answering Question 3:
The t-value of the CSES slope \(γ_{10}\) is 17.15, and significant. We find that, on average, student SES is significantly related (p<.001) to math achievement within schools.
The estimated variamce among the means is \(τ_{00}\) = 8.62 with a F = 2670.831. We infer that highly significant differences exist among the 160 school means (p<.001).
The estimated variamce among the means is \(τ_{11}\) = 0.68 with a F = 294.198. We reject the null hypothesis, in this case that \(τ_{11}\) = 0.68, and infer that the relationship between SES and math achievement within schools does indeed vary significantly across the population of schools.
The ICC is 19.02%. We can also compute the proportion of variance that can be explained by CSES. (Check Slides for details)
8.7 Question 4 - How do public and Catholic schools compare in terms of mean math achievement and in terms of the strength of the SES-math achievement relationship, after we control for MEAN SES?
To address this question, we need to build a random intercept and slope model with two extra interaction terms.
# The difference between Q4 model and Q3 model is, we use group mean centered SES(CSES) variable instead of "SES" for easier interpretation.
Q4_model <- lme(mathach~meanses + sector + Group_CSES + meanses:Group_CSES + sector:Group_CSES, random=~cses|school,data=newhsb,method="ML")
summary(Q4_model)
## Linear mixed-effects model fit by maximum likelihood
## Data: newhsb
## AIC BIC logLik
## 46516.43 46585.23 -23248.21
##
## Random effects:
## Formula: ~cses | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 1.5227318 (Intr)
## cses 0.2552082 0.485
## Residual 6.0598049
##
## Fixed effects: mathach ~ meanses + sector + Group_CSES + meanses:Group_CSES + sector:Group_CSES
## Value Std.Error DF t-value p-value
## (Intercept) 12.096013 0.1969189 7022 61.42638 0e+00
## meanses 5.331709 0.3656829 157 14.58014 0e+00
## sector 1.226729 0.3033724 157 4.04364 1e-04
## Group_CSES 2.939385 0.1535540 7022 19.14236 0e+00
## meanses:Group_CSES 1.042427 0.2961512 7022 3.51992 4e-04
## sector:Group_CSES -1.643907 0.2374503 7022 -6.92317 0e+00
## Correlation:
## (Intr) meanss sector G_CSES m:G_CS
## meanses 0.245
## sector -0.697 -0.356
## Group_CSES 0.075 0.019 -0.053
## meanses:Group_CSES 0.018 0.074 -0.026 0.283
## sector:Group_CSES -0.052 -0.027 0.077 -0.694 -0.351
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.16092420 -0.72444406 0.01678307 0.75488528 2.95805753
##
## Number of Observations: 7185
## Number of Groups: 160
Estimate of \(γ_{00}\) = 12.11,represents the average true intercept for the public school(SECTOR = 0) has MEANSES = 0; It is significantly different from zero.
Estimate of \(γ_{01}\) = 5.33, represents the group performance difference in the intercepts for public school (or MEAN SES effects on school means after controlling for sector effect). It is significantly different from zero; We estimate \(γ_{01}\) to study whether high-SES schools differ from low-SES schools in mean math achievement after controlling for SECTOR.
Estimate of \(γ_{02}\) = 1.23, represents the group performance difference in the intercepts when MEAN SES = 0. It is significantly different from zero; We estimate \(γ_{02}\) to learn whether Catholic schools differ from public schools in terms of the mean achievement once MEAN SES is controlled.
\(γ_{00}\) + \(γ_{02}\) = 12.11+1.21 represents the average true intercept for the private school(SECTOR=1) has MEAN SES = 0; It is significantly different from zero.
## school = pdLogChol(cses)
## Variance StdDev Corr
## (Intercept) 2.31871202 1.5227318 (Intr)
## cses 0.06513123 0.2552082 0.485
## Residual 36.72123598 6.0598049
Estimate of \(τ_{00}\) is 2.31. This is the population variance among the Level-2 unit means. The significant variation in the intercepts remains unexplained even after controlling for SECTOR and MEAN SES.
Estimate of \(τ_{11}\) is 0.07. This is the population variance among the slope. No significant variation in the slopes remains unexplained even after controlling for SECTOR and MEAN SES.
Estimate of \(σ^2\) is 36.72. This is the variance among the Level-1.
In R, we can simply use the anova function to our model. It provides us the F-value and P value of the intercept which is the same with the Wald Z test.
## numDF denDF F-value p-value
## (Intercept) 1 7022 7825.594 <.0001
## meanses 1 157 295.516 <.0001
## sector 1 157 20.809 <.0001
## Group_CSES 1 7022 395.388 <.0001
## meanses:Group_CSES 1 7022 1.353 0.2448
## sector:Group_CSES 1 7022 47.930 <.0001
From the result, we could tell the esitmated variance among the mean is \(τ_{00}\) is 8.62, F=2670.83, p<.0001.
## 'log Lik.' 46496.43 (df=10)
## [1] "ICC= 0.97267805674723" "ICC= 0.0593933173271645"
The CC is 5.94%, this tells us that about 6% of the variability in math achievement difference occurs between schools. It is okay to use a multilevel model.
Compare the Q4_model with Q3_model, unconditional model and Q2 Model.
## Model df AIC BIC logLik Test L.Ratio p-value
## Q4_model 1 10 46516.43 46585.23 -23248.22
## Q3_model 2 6 46722.98 46764.26 -23355.49 1 vs 2 214.5510 <.0001
## Q2_model 3 4 46967.11 46994.63 -23479.55 2 vs 3 248.1283 <.0001
## UnconditionalModel 4 3 47121.81 47142.45 -23557.90 3 vs 4 156.7017 <.0001
8.7.1 Answering Q4:
MEAN SES is positively related to school mean math achievement.
Catholic schools have significantly higher mean achievement than do public schools, controlling for the effect of MEAN SES.
8.8 Supplementary Learning Materials
Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). Los Angeles, CA: Sage Publications. Chapter 20
Bickel, R. (2007). Multilevel Analysis for Applied Research: It’s just regression. NY: Guilford. Chapters 3 & 4
Raudenbush, S. W. & Bryk, A. S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods (2nd ed.). CA: Sage Publications. Chapter 4