| | Reaction `<dbl>` | Days `<dbl>` | Subject `<fct>` |
|---|---|---|---|
| 1 | 249.5600 | 0 | 308 |
| 2 | 258.7047 | 1 | 308 |
| 3 | 250.8006 | 2 | 308 |
| 4 | 321.4398 | 3 | 308 |
| 5 | 356.8519 | 4 | 308 |
| 6 | 414.6901 | 5 | 308 |
4 When $\beta$s vary among groups: multilevel linear model
The multilevel linear model is also known as the random coefficients model (RCM), the linear mixed model (LMM), and the hierarchical linear model (HLM).
4.1 Multilevel data and heterogeneity (non-independence)
The following are two real-data examples.
- Repeated-measures design: the sleepstudy data set in the lme4 package, containing the average reaction time per day (in milliseconds) for 18 subjects in a sleep-deprivation study over 10 days (day 0 to day 9):
Overall, the average reaction time increases with days. However, we observe great heterogeneity: if we fitted a separate linear regression to each individual's data, the resulting regression lines would vary markedly, with different subjects having different intercepts and slopes. More specifically, we observe wider variation in slopes than in intercepts.
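One way to see this heterogeneity is to fit a separate ordinary regression per subject; a minimal sketch, assuming the lme4 package (which ships the sleepstudy data) is installed:

```r
library(lme4)  # for the sleepstudy data

# one OLS fit per subject: Reaction ~ Days
fits  <- by(sleepstudy, sleepstudy$Subject,
            function(d) coef(lm(Reaction ~ Days, data = d)))
coefs <- do.call(rbind, fits)   # 18 x 2 matrix of intercepts and slopes

apply(coefs, 2, sd)   # spread of the per-subject intercepts and slopes
```

Each row of `coefs` is one subject's own regression line, so the column standard deviations quantify how much the intercepts and slopes vary across subjects.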
- Hierarchical design: individuals nested in clusters. Data are from Barcikowski (1981) and the High School & Beyond survey, a nationally representative sample of U.S. public and Catholic high schools: 7185 students nested within 160 schools.
- DV: mathach, math achievement.
- IV:
  - ses, socioeconomic status;
  - sector, public school vs. Catholic school.
- ID: school identifier.
| | ID `<int>` | mathach `<dbl>` | ses `<dbl>` | sector `<chr>` |
|---|---|---|---|---|
| 1 | 1224 | 5.876 | -1.528 | Public |
| 2 | 1224 | 19.708 | -0.588 | Public |
| 3 | 1224 | 20.349 | -0.528 | Public |
| 4 | 1224 | 8.781 | -0.668 | Public |
| 5 | 1224 | 17.898 | -0.158 | Public |
| 6 | 1224 | 4.583 | 0.022 | Public |
Different schools clearly have different slopes and intercepts, implying school-level heterogeneity that is completely masked by an ordinary multiple linear regression.
Heterogeneity is sometimes described as non-independence. For example, students in the same school share a similar environment (same math textbooks, same math teachers, same student body, etc.), so a student's math achievement tends to be more similar to that of peers from the same school than to that of students from other schools. Put another way, data are dependent within schools but independent across schools.
So why not fit a separate multiple linear regression to each group? Because we would wind up with a smaller sample size, and consequently lower statistical power, for each regression. The multilevel linear model instead uses all the data in one model, pooling information across groups while still allowing the groups to differ.
4.2 The basic multilevel linear model
Depending on which coefficients vary across groups (e.g., across Subject in the sleepstudy data, and across school ID in Barcikowski's data, where sector is a school-level predictor), there are three basic variants:

- Random intercept and random slopes
- Random intercept only
- Random slopes only

Suppose we only have one within-level predictor $x$, and write $y_{ij}$ for the outcome of observation $i$ in group $j$. The three variants are:

- Random intercept and random slope

$$y_{ij} = \beta_{0j} + \beta_{1j}x_{ij} + e_{ij}, \qquad \beta_{0j} = \gamma_{00} + u_{0j}, \qquad \beta_{1j} = \gamma_{10} + u_{1j},$$

with $e_{ij} \sim N(0, \sigma^2)$ and $(u_{0j}, u_{1j})^\top \sim N(\mathbf{0}, \mathbf{T})$, where

$$\mathbf{T} = \begin{pmatrix} \tau_{00} & \tau_{01} \\ \tau_{01} & \tau_{11} \end{pmatrix}.$$

- Random intercept only

$$y_{ij} = \beta_{0j} + \gamma_{10}x_{ij} + e_{ij}, \qquad \beta_{0j} = \gamma_{00} + u_{0j}, \qquad u_{0j} \sim N(0, \tau_{00}).$$

- Random slope only

$$y_{ij} = \gamma_{00} + \beta_{1j}x_{ij} + e_{ij}, \qquad \beta_{1j} = \gamma_{10} + u_{1j}, \qquad u_{1j} \sim N(0, \tau_{11}).$$

Quiz: why do we allow the off-diagonal (lower-triangular) part of $\mathbf{T}$, the intercept-slope covariance $\tau_{01}$, to be nonzero?
4.3 Syntax style
- Mplus frames it as a multilevel model, with separate within-level and between-level equations.
- lme4 frames it as a linear mixed model, a single equation combining fixed effects and random effects.
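For example, the three variants from the previous section map directly onto lme4's random-effects formula syntax; a sketch using the sleepstudy data (assuming lme4 is installed; the outputs shown later in this chapter come from lmerTest, which wraps lmer and adds Satterthwaite p-values):

```r
library(lme4)

# random intercept and random slope; (Days | Subject) estimates the
# intercept-slope covariance freely, while (Days || Subject) fixes it at 0
m_both <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# random intercept only
m_int <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)

# random slope only
m_slope <- lmer(Reaction ~ Days + (0 + Days | Subject), data = sleepstudy)
```

The `||` double-bar shorthand is what produces the uncorrelated `Subject` and `Subject.1` random-effect blocks in the printouts below.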
4.4 Variance decomposition and intra-class correlation (ICC)
4.4.1 Variance decomposition
For each variant, the total variance of $y_{ij}$ (given $x_{ij}$) decomposes into a between-group part and a within-group part:

- Random intercept and random slope: $\operatorname{Var}(y_{ij}) = \tau_{00} + 2\tau_{01}x_{ij} + \tau_{11}x_{ij}^2 + \sigma^2$
- Random intercept only: $\operatorname{Var}(y_{ij}) = \tau_{00} + \sigma^2$
- Random slope only: $\operatorname{Var}(y_{ij}) = \tau_{11}x_{ij}^2 + \sigma^2$
4.4.2 Intra-class correlation (ICC)
A convenient summary of the “importance” of a grouping variable is the proportion of the total variance it accounts for, also known as the variance partition coefficient (VPC). Fitting a random-intercept-and-random-slope model to the sleepstudy data, while fixing the covariance between the random intercept and the random slope at 0, we have
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: Reaction ~ Days + (Days || Subject)
#> Data: sleepstudy
#>
#> REML criterion at convergence: 1743.7
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.9626 -0.4625 0.0204 0.4653 5.1860
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> Subject (Intercept) 627.57 25.051
#> Subject.1 Days 35.86 5.988
#> Residual 653.58 25.565
#> Number of obs: 180, groups: Subject, 18
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 251.405 6.885 18.156 36.513 < 2e-16 ***
#> Days 10.467 1.560 18.156 6.712 2.59e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> Days -0.184
#> Groups Name Std.Dev.
#> Subject (Intercept) 25.0513
#> Subject.1 Days 5.9882
#> Residual 25.5653
The intra-class correlation (ICC) is defined as

$$\mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^2},$$

the proportion of the total variance attributable to between-group differences. For the model above (ignoring the random-slope variance, i.e., at Days = 0), $\mathrm{ICC} = 627.57 / (627.57 + 653.58) \approx 0.49$. Based on the ICC we can also calculate the reliability of the group means.
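Plugging the variance components printed above into this definition:

```r
tau00  <- 627.57   # between-subject (random intercept) variance
sigma2 <- 653.58   # residual (within-subject) variance

icc <- tau00 / (tau00 + sigma2)
round(icc, 2)   # 0.49: about half the outcome variance lies between subjects
```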
4.5 Effects decomposition and centering
4.5.1 Within effect, between effect, contextual effect
Within effect describes the relationship between within-level background variables and within-level outcome variable. For example, how does student’s ses impact their math achievement within a given school?
Contextual effect describes the impact of the context on the within-level outcome variable, i.e., the relationship between the between-level background variable (the group mean ses) and the within-level outcome variable (math achievement). In Barcikowski's data, the contextual effect asks whether two students with the same ses, but from schools with different average ses, have different levels of math achievement. Another classic example of a contextual effect is the Big-Fish-Little-Pond effect (Marsh & Parker, 1984): high-achieving students in a school that is low-achieving on average feel better about their abilities than high-achieving students in a school with higher average achievement.
| Y | X | Group | Within effect | Contextual effect |
|---|---|---|---|---|
| Health | Vaccination (1 or 0) | Community | Positive: within a specific community, getting vaccinated decreases an individual's risk of contracting the disease | Positive: a community with a higher vaccination rate (average vaccination) decreases its dwellers' risk of contracting the disease |
| Profits | Overfishing | Lake | Positive: within a specific lake, a fisherman who exceeds his fishing quota increases his profits because of the larger catch | Negative: in a lake with a higher level of overfishing (average overfishing), the profits of all fishermen decrease because of the smaller catches |
| Competitive advantage | Innovativeness | Industry | Positive: within an industry, an innovative firm can develop valuable capabilities that lead to competitive advantage | Negative: an industry with a higher level of innovativeness (average innovativeness) has more innovations, but innovativeness is then less likely to yield a competitive advantage |
| Performance | Gender (1 or 0) | Team | Zero: within a specific team, one's gender has no effect on one's performance | Inverted U-shaped: a team with half men and half women (medium level of average gender) works best |
Between effect = within effect + contextual effect. Between effect describes the relationship between the group-level average background variables and the group-level average outcome variable. For example, how does a school’s average ses affect the average math achievement of the students in this school?
Multiple linear regression mixes the above effects together. In the multilevel linear model, we can extract the three effects and use them to explain our findings. However, the way these three effects are decomposed in a multilevel linear model depends on whether and how we center our data.
4.5.2 Centering or not? That is the question.
4.5.2.1 Raw data
Let's use Barcikowski's data as an example. Fitting a multilevel linear model with both a random slope and a random intercept, with the covariance between them fixed at 0, we have
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ ses + (ses || ID)
#> Data: Barcikowsk
#>
#> REML criterion at convergence: 46640.7
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.12410 -0.73160 0.02253 0.75467 2.93201
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> ID (Intercept) 4.853 2.2029
#> ID.1 ses 0.424 0.6511
#> Residual 36.822 6.0681
#> Number of obs: 7185, groups: ID, 160
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 12.6527 0.1903 146.2881 66.50 <2e-16 ***
#> ses 2.3955 0.1184 158.8487 20.23 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> ses 0.000
In Barcikowski's data, the coefficient of ses (2.3955) is therefore a blend of the within effect and the between effect. To properly decompose them, we need to add the school mean of ses (mean_group_ses) as a between-level predictor:
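The construction of the group-mean variable is not shown above; one way to build it, sketched with base R's ave() on toy numbers (hypothetical values, for illustration only):

```r
# six students in two schools (made-up values)
ID  <- c(1, 1, 1, 2, 2, 2)
ses <- c(-1, 0, 1, 1, 2, 3)

mean_group_ses <- ave(ses, ID)      # each student's school-mean ses: 0 0 0 2 2 2
ses_cmc <- ses - mean_group_ses     # cluster-mean centered: -1 0 1 -1 0 1
ses_gmc <- ses - mean(ses)          # grand-mean centered:   -2 -1 0 0 1 2
```

Note that ses decomposes exactly as mean_group_ses + ses_cmc, which is what makes the effect decompositions below possible.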
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ ses + mean_group_ses + (ses || ID)
#> Data: Barcikowsk
#>
#> REML criterion at convergence: 46562.4
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.15390 -0.72254 0.01774 0.75562 2.95378
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> ID (Intercept) 2.7129 1.6471
#> ID.1 ses 0.4835 0.6953
#> Residual 36.7793 6.0646
#> Number of obs: 7185, groups: ID, 160
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 12.6759 0.1510 153.1039 83.926 <2e-16 ***
#> ses 2.1923 0.1227 182.5178 17.867 <2e-16 ***
#> mean_group_ses 3.7762 0.3839 182.9834 9.836 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr) ses
#> ses -0.003
#> mean_grp_ss 0.007 -0.258
- The intercept, 12.6759, is the predicted mathach when ses = 0 and mean_group_ses = 0. However, it rarely makes sense for ses and mean_group_ses to both equal 0, so this intercept lacks practical meaning; centering can lend a helping hand here.
- The coefficient of ses, 2.1923, is clearly the within effect: a student's predicted mathach increases by 2.1923 with a 1-unit increase in ses, controlling for mean_group_ses. That is, for two students from the same school (so mean_group_ses is held constant), the difference in their predicted mathach depends only on their ses values.
- The coefficient of mean_group_ses, 3.7762, represents the contextual effect: a student's predicted mathach increases by 3.7762 with a 1-unit increase in the school's mean ses, controlling for ses. That is, for two students with identical ses, the difference in their predicted mathach depends only on the mean ses of their schools.
- Between effect = within effect + contextual effect = 2.1923 + 3.7762 = 5.9685.
4.5.2.2 Centering in multiple linear regression
Centering changes the scale of an independent variable and facilitates the interpretation of the intercept.
Note that we only center background (independent) variables and leave the dependent variable in its original scale.
For example, suppose we fit a simple linear regression. With raw data we have $y_i = b_0 + b_1 x_i + e_i$, where $b_0$ is the predicted $y$ when $x = 0$. After grand-mean centering $x$, the model becomes $y_i = b_0' + b_1 (x_i - \bar{x}) + e_i$: the slope $b_1$ is unchanged, but $b_0'$ is now the predicted $y$ for an observation at the mean of $x$.
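A quick numerical check with made-up data (the values are arbitrary):

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

fit_raw <- lm(y ~ x)               # intercept: predicted y at x = 0
fit_cen <- lm(y ~ I(x - mean(x)))  # intercept: predicted y at x = mean(x)

unname(coef(fit_raw)[2] - coef(fit_cen)[2])  # ~0: slope unchanged
unname(coef(fit_cen)[1] - mean(y))           # ~0: centered intercept = mean(y)
```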
In the multilevel linear model, there are basically two centering strategies for within-level background variables: grand-mean centering (GMC) and group-mean centering, a.k.a. cluster-mean centering (CMC). The latter is called CMC throughout this chapter to avoid identical initials.
4.5.2.3 Grand-mean centering (GMC)
Let’s fit a multilevel linear model with both random slope and random intercept to Barcikowsk’s data, and fix the covariance between random intercept and random slope at 0.
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ I(ses - mean_grand_ses) + (ses || ID)
#> Data: Barcikowsk
#>
#> REML criterion at convergence: 46640.7
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.12410 -0.73160 0.02253 0.75467 2.93201
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> ID (Intercept) 4.853 2.2029
#> ID.1 ses 0.424 0.6511
#> Residual 36.822 6.0681
#> Number of obs: 7185, groups: ID, 160
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 12.6531 0.1903 146.2881 66.50 <2e-16 ***
#> I(ses - mean_grand_ses) 2.3955 0.1184 158.8487 20.23 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> I(-mn_grn_) 0.000
Similarly, without the group mean at the between level, the coefficient of grand-mean-centered ses (2.3955) is still a blend of the within and between effects: GMC shifts the intercept but does not change the slope. Note that it is usually recommended to grand-mean center the group-mean variable before adding it at the between level, for better interpretation.
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ I(ses - mean_grand_ses) + mean_group_ses_cen + (ses ||
#> ID)
#> Data: Barcikowsk
#>
#> REML criterion at convergence: 46562.4
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.15390 -0.72254 0.01774 0.75562 2.95378
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> ID (Intercept) 2.7129 1.6471
#> ID.1 ses 0.4835 0.6953
#> Residual 36.7793 6.0646
#> Number of obs: 7185, groups: ID, 160
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 12.6767 0.1510 153.1025 83.931 <2e-16 ***
#> I(ses - mean_grand_ses) 2.1923 0.1227 182.5178 17.867 <2e-16 ***
#> mean_group_ses_cen 3.7762 0.3839 182.9834 9.836 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr) I(-m__
#> I(-mn_grn_) -0.003
#> mn_grp_ss_c 0.007 -0.258
- The intercept, 12.6767, is the predicted mathach when ses equals the grand mean (the centered ses is 0) and mean_group_ses_cen = 0 (the school's mean ses equals the average of all schools).
- Within effect: the coefficient of grand-mean-centered ses, 2.1923, meaning that a student's predicted mathach increases by 2.1923 with a 1-unit increase in ses, controlling for mean_group_ses_cen; for two students from the same school, the difference in their predicted mathach depends only on their ses values.
- Contextual effect: the coefficient of mean_group_ses_cen, 3.7762, meaning that a student's predicted mathach increases by 3.7762 with a 1-unit increase in the school's mean ses, controlling for ses; for two students with identical ses, the difference in their predicted mathach depends only on the mean ses of their schools.
- Between effect = 2.1923 + 3.7762 = 5.9685.
4.5.2.4 Cluster-mean centering (CMC)
Centering ses at its school (cluster) mean, without yet adding the group mean at the between level, we have
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ I(ses - mean_group_ses) + (ses || ID)
#> Data: Barcikowsk
#>
#> REML criterion at convergence: 46718.2
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.09777 -0.73211 0.01331 0.75450 2.92142
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> ID (Intercept) 8.7395 2.9563
#> ID.1 ses 0.5101 0.7142
#> Residual 36.7703 6.0638
#> Number of obs: 7185, groups: ID, 160
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 12.6320 0.2461 153.8997 51.32 <2e-16 ***
#> I(ses - mean_group_ses) 2.1417 0.1233 162.9470 17.37 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr)
#> I(-mn_grp_) -0.002
- Within effect: the coefficient of cluster-mean-centered ses, 2.1417, meaning that with everything else held constant, a student's predicted mathach increases by 2.1417 with a 1-unit increase in ses relative to the school mean; this effect is constant across schools.

Note that with cluster-mean centering, the centered predictor carries only within-group variation, so its coefficient is a clean within effect even though no group mean has been added at the between level.
Adding the group mean (grand-mean centered) as a between-level predictor, we have
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ I(ses - mean_group_ses) + mean_group_ses_cen + (ses ||
#> ID)
#> Data: Barcikowsk
#>
#> REML criterion at convergence: 46562.4
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -3.15390 -0.72254 0.01774 0.75562 2.95378
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> ID (Intercept) 2.7129 1.6471
#> ID.1 ses 0.4835 0.6953
#> Residual 36.7793 6.0646
#> Number of obs: 7185, groups: ID, 160
#>
#> Fixed effects:
#> Estimate Std. Error df t value Pr(>|t|)
#> (Intercept) 12.6767 0.1510 153.1025 83.93 <2e-16 ***
#> I(ses - mean_group_ses) 2.1923 0.1227 182.5178 17.87 <2e-16 ***
#> mean_group_ses_cen 5.9685 0.3717 156.1926 16.06 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation of Fixed Effects:
#> (Intr) I(-m__
#> I(-mn_grp_) -0.003
#> mn_grp_ss_c 0.006 0.064
- The intercept, 12.6767, is the predicted mathach for a student whose ses equals the mean of their school (the centered ses is 0) and whose school's mean ses equals the average of all schools (mean_group_ses_cen = 0).
- Within effect: the coefficient of cluster-mean-centered ses, 2.1923, meaning that with everything else held constant, a student's predicted mathach increases by 2.1923 with a 1-unit increase in ses relative to the school mean; this is the individual-level effect of ses on mathach.
- Between effect: the coefficient of mean_group_ses_cen, 5.9685, now becomes the between effect. It describes how a school's mean ses impacts mathach controlling for cluster-mean-centered ses: for two students from different schools with identical cluster-mean-centered ses, the difference in their predicted mathach lies in the mean ses of their schools. But why is this the between effect rather than the contextual effect?
- Contextual effect: because identical cluster-mean-centered ses values only imply that the two students occupy the same relative position within their schools (e.g., both 1 SD above their school's mean ses), while the schools themselves can have different mean ses. Identical centered values therefore mask the absolute difference between school means, which is what the contextual effect captures. No worry, though: the contextual effect is already contained in the between effect, so in this example contextual effect = 5.9685 − 2.1923 = 3.7762.
If you add the group mean as a between-level predictor, you will obtain exactly the same within-effect estimate (and p-value) for your within-level predictor regardless of which method of centering you use.
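This invariance can be checked on simulated two-level data (the data-generating values below are hypothetical; assumes lme4 is installed):

```r
library(lme4)
set.seed(42)

# simulate 50 groups of 20 observations, with true within effect 2
# and true between effect 5 (so the contextual effect is 5 - 2 = 3)
G <- 50; n <- 20
id <- rep(seq_len(G), each = n)
xb <- rnorm(G)                      # group-level component of x
x  <- xb[id] + rnorm(G * n)         # x = between part + within part
u  <- rnorm(G)                      # random intercepts
y  <- 2 * (x - xb[id]) + 5 * xb[id] + u[id] + rnorm(G * n)
d  <- data.frame(id, x, y)
d$xm <- ave(d$x, d$id)              # observed group means of x

f_raw <- fixef(lmer(y ~ x + xm + (1 | id), data = d))          # raw x
f_cmc <- fixef(lmer(y ~ I(x - xm) + xm + (1 | id), data = d))  # CMC x

f_raw["x"] - f_cmc["I(x - xm)"]            # ~0: identical within effect
f_cmc["xm"] - (f_raw["xm"] + f_raw["x"])   # ~0: between = within + contextual
```

The two specifications are exact reparameterizations of each other, which is why the within-effect estimate is identical and the two group-mean coefficients differ by exactly the within effect.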
Now let's explain the two puzzles:
- Why is the coefficient of raw (or grand-mean-centered) ses a smashed-together version of the within and between effects when the group mean is not included at the between level? Because ses = mean_group_ses + (ses − mean_group_ses) carries both between-group and within-group variation; with no group mean in the model, a single coefficient has to absorb both sources, yielding a blend of the two effects.
- Why is the coefficient of cluster-mean-centered ses an unambiguous estimate of the within effect, with or without the group mean at the between level? Because ses − mean_group_ses contains only within-group variation and is uncorrelated with the group means, so its coefficient can only reflect the within effect.
4.5.3 Summary
| Within-level predictor | Group mean as between-level predictor? | Coefficient of within-level predictor | Coefficient of group mean |
|---|---|---|---|
| Raw | No | Confounds within and between effects | NA |
| GMC | No | Confounds within and between effects | NA |
| CMC | No | Within effect | NA |
| Raw | Yes | Within effect | Contextual effect |
| GMC | Yes | Within effect | Contextual effect |
| CMC | Yes | Within effect | Between effect |
- If 0 is a meaningful value of the raw predictor, there is no need for centering;
- In a two-level model, if the group means contain a considerable amount of heterogeneity, the group mean should be included as a between-level predictor.
Reference:
- My advisor told me I should group-mean center my predictors in my multilevel model because it might “make my effects significant” but this doesn’t seem right to me. What exactly is involved in centering predictors within the multilevel model?
- Chapter 8 Centering Options and Interpretations
- https://www.bristol.ac.uk/cmm/learning/support/datasets/