4  When βs vary among groups: multilevel linear model

The multilevel linear model is also known as the random coefficients model (RCM), the linear mixed model (LMM), and the hierarchical linear model (HLM).

4.1 Multilevel data and heterogeneity (non-independence)

The following are two real-data examples.

  1. Repeated-measures design.

The sleepstudy data set in the lme4 package contains the average reaction time per day (in milliseconds) for 18 subjects in a sleep deprivation study over 10 days (day 0-day 9):

  Reaction  Days  Subject
     <dbl> <dbl>    <fct>
1 249.5600     0      308
2 258.7047     1      308
3 250.8006     2      308
4 321.4398     3      308
5 356.8519     4      308
6 414.6901     5      308

Overall, the average reaction time increases over days. However, we observe great heterogeneity. If we fitted a separate linear regression to each individual's data, the resulting regression lines would vary markedly: each subject has a different intercept and a different slope. More specifically, we observe wider variation in slopes than in intercepts.
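The per-subject fits described above can be sketched in base R. This is an illustration, not the chapter's code: the data are simulated here so the example is self-contained (with lme4 installed, the same loop could be run on sleepstudy instead), and the simulation parameters are made up for demonstration.

```r
# Simulate 18 subjects, each with their own intercept and slope,
# then fit one lm() per subject to see the between-subject spread.
set.seed(1)
m <- 18; days <- 0:9
id <- rep(seq_len(m), each = length(days))
b0 <- rnorm(m, 250, 25)   # subject-specific intercepts
b1 <- rnorm(m, 10, 6)     # subject-specific slopes
y  <- b0[id] + b1[id] * rep(days, m) + rnorm(length(id), 0, 25)
dat <- data.frame(id = factor(id), day = rep(days, m), y = y)

# One regression per subject; rows = subjects, cols = (intercept, slope)
coefs <- t(sapply(split(dat, dat$id),
                  function(d) coef(lm(y ~ day, data = d))))
colnames(coefs) <- c("intercept", "slope")
apply(coefs, 2, sd)  # spread of per-subject intercepts and slopes
```

The standard deviations of the per-subject coefficients quantify the heterogeneity that a single pooled regression would hide.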

  2. Hierarchical design: observations nested in clusters.

Data are from Barcikowski (), the High School & Beyond Survey, a nationally representative sample of U.S. public and Catholic high schools: 7,185 students nested within 160 schools.

  • DV: mathach, math achievement.
  • IVs:
    • ses, socioeconomic status;
    • sector, public school vs. Catholic school.
  • Grouping variable: ID, school identifier.
     ID  mathach     ses  sector
  <int>    <dbl>   <dbl>   <chr>
1  1224    5.876  -1.528  Public
2  1224   19.708  -0.588  Public
3  1224   20.349  -0.528  Public
4  1224    8.781  -0.668  Public
5  1224   17.898  -0.158  Public
6  1224    4.583   0.022  Public

Fitting a separate regression line for each school clearly shows that different schools have different slopes and intercepts, implying school-level heterogeneity that is completely masked by a single multiple linear regression.

Heterogeneity sometimes implies non-independence. For example, students in the same school share a similar environment (same math textbook, same math teacher, same student body, etc.), so students tend to have math achievement more similar to their peers from the same school than to students from other schools. Put another way, data are dependent within schools but independent across schools.

So why not fit a separate multiple linear regression to each group? If we did, we would wind up with a smaller sample size, and consequently lower statistical power, for each regression model. The multilevel linear model uses all n observations and takes heterogeneity into account by modeling the regression coefficients (intercept and slopes) as random variables. Not surprisingly, the multilevel linear model enjoys better model fit, because it incorporates the underlying heterogeneity of the data.

4.2 The basic multilevel linear model

  • Random intercept and random slopes

Level 1: $y_{ij} = \beta_{0j} + \beta_{1j} x_{1ij} + \cdots + \beta_{pj} x_{pij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
$\vdots$
Level 2: $\beta_{pj} = \gamma_{p0} + u_{pj}$
Mixed: $y_{ij} = \gamma_{00} + (\gamma_{10} x_{1ij} + \cdots + \gamma_{p0} x_{pij}) + (u_{1j} x_{1ij} + \cdots + u_{pj} x_{pij}) + u_{0j} + \epsilon_{ij}$,

where the $\gamma$ terms are fixed effects and the $u$ terms are random effects; $j = 1, \ldots, m$ indexes the grouping variable (Subject in the sleepstudy data, school ID in Barcikowski's data); $i = 1, \ldots, n_j$ indexes observations (i.e., subjects) within the $j$th group; the total sample size is $n = n_1 + \cdots + n_m$; $x_1, \ldots, x_p$ are the $p$ independent variables; $y$ is the dependent variable; $\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2_\epsilon)$; and

$$\begin{pmatrix} u_{0j} \\ u_{1j} \\ \vdots \\ u_{pj} \end{pmatrix} \sim MVN(0, \Sigma_u), \quad \Sigma_u = \begin{bmatrix} \sigma^2_{u_{0j}} & & & \\ \sigma_{u_{1j},u_{0j}} & \sigma^2_{u_{1j}} & & \\ \vdots & & \ddots & \\ \sigma_{u_{pj},u_{0j}} & \sigma_{u_{pj},u_{1j}} & \cdots & \sigma^2_{u_{pj}} \end{bmatrix}.$$

  • Random intercept only

Level 1: $y_{ij} = \beta_{0j} + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$
Mixed: $y_{ij} = \gamma_{00} + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij} + u_{0j} + \epsilon_{ij}$,

where $u_{0j} \sim N(0, \sigma^2_{u_{0j}})$.

  • Random slopes only

Level 1: $y_{ij} = \beta_0 + \beta_{1j} x_{1ij} + \cdots + \beta_{pj} x_{pij} + \epsilon_{ij}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
$\vdots$
Level 2: $\beta_{pj} = \gamma_{p0} + u_{pj}$
Mixed: $y_{ij} = \beta_0 + (\gamma_{10} x_{1ij} + \cdots + \gamma_{p0} x_{pij}) + (u_{1j} x_{1ij} + \cdots + u_{pj} x_{pij}) + \epsilon_{ij}$,

where $(u_{1j}, \ldots, u_{pj})' \sim MVN(0, \Sigma_u)$, $\Sigma_u = \begin{bmatrix} \sigma^2_{u_{1j}} & & \\ \vdots & \ddots & \\ \sigma_{u_{pj},u_{1j}} & \cdots & \sigma^2_{u_{pj}} \end{bmatrix}$.

Suppose we have only one $x$ and one $y$:

  • Random intercept and random slope

Level 1: $y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
Mixed: $y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + u_{1j} x_{ij} + u_{0j} + \epsilon_{ij}$,

where $(u_{0j}, u_{1j})' \sim MVN(0, \Sigma_u)$, $\Sigma_u = \begin{bmatrix} \sigma^2_{u_{0j}} & \\ \sigma_{u_{1j},u_{0j}} & \sigma^2_{u_{1j}} \end{bmatrix}$.

  • Random intercept only

Level 1: $y_{ij} = \beta_{0j} + \beta_1 x_{ij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$
Mixed: $y_{ij} = \gamma_{00} + \beta_1 x_{ij} + u_{0j} + \epsilon_{ij}$, where $u_{0j} \sim N(0, \sigma^2_{u_{0j}})$.

  • Random slope only

Level 1: $y_{ij} = \beta_0 + \beta_{1j} x_{ij} + \epsilon_{ij}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
Mixed: $y_{ij} = \beta_0 + \gamma_{10} x_{ij} + u_{1j} x_{ij} + \epsilon_{ij}$, where $u_{1j} \sim N(0, \sigma^2_{u_{1j}})$.
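To make the one-predictor mixed equation concrete, here is a base-R sketch that generates data from the random-intercept-and-slope model above. The parameter values (and the use of a Cholesky factor to draw correlated $(u_{0j}, u_{1j})$) are illustrative assumptions, not estimates from any data set in this chapter.

```r
# Simulate y_ij = gamma00 + gamma10*x_ij + u0j + u1j*x_ij + eps_ij,
# with (u0j, u1j) ~ MVN(0, Sigma_u).
set.seed(2)
m <- 30; nj <- 20                 # 30 groups, 20 observations each
gamma00 <- 5; gamma10 <- 2
Sigma_u <- matrix(c(4, 1,          # var(u0),  cov(u0, u1)
                    1, 1), 2, 2)   # cov(u1, u0), var(u1)

# Correlated random effects via the Cholesky factor: cov(Z R) = R'R = Sigma_u
u <- matrix(rnorm(m * 2), m, 2) %*% chol(Sigma_u)

j <- rep(seq_len(m), each = nj)    # group index for each observation
x <- rnorm(m * nj)
y <- gamma00 + gamma10 * x + u[j, 1] + u[j, 2] * x + rnorm(m * nj)

# Sample moments of the simulated random effects
round(c(var_u0 = var(u[, 1]), var_u1 = var(u[, 2]),
        cov_u0_u1 = cov(u[, 1], u[, 2])), 2)
```

The sample variances and covariance of the drawn $u$'s approximate the entries of $\Sigma_u$, which is exactly what the level-2 part of the model asserts.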

Quiz: why allow the lower-triangular (off-diagonal) elements of $\Sigma_u$ to be freely estimated?

4.3 Syntax style

  • Mplus: multilevel model
  • lme4: linear mixed model
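For the lme4 style, the three random-effect structures above map onto formulas as follows. These are plain R formula objects (no package is needed to build them); to fit a model, pass one to lme4::lmer() together with the data. The object names are ours, chosen for illustration.

```r
# lme4-style random-effects formulas for the sleepstudy variables:
f_both      <- Reaction ~ Days + (Days | Subject)      # random intercept + slope, correlated
f_both_unc  <- Reaction ~ Days + (Days || Subject)     # same, covariance fixed at 0
f_intercept <- Reaction ~ Days + (1 | Subject)         # random intercept only
f_slope     <- Reaction ~ Days + (0 + Days | Subject)  # random slope only
```

The double-bar form `(Days || Subject)` is the one used in the output later in this chapter; it estimates both variance components but constrains their covariance to zero.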

4.4 Variance decomposition and intra-class correlation (ICC)

4.4.1 Variance decomposition

  • Random intercept and random slopes

Mixed: $y_{ij} = \gamma_{00} + (\gamma_{10} x_{1ij} + \cdots + \gamma_{p0} x_{pij}) + (u_{1j} x_{1ij} + \cdots + u_{pj} x_{pij}) + u_{0j} + \epsilon_{ij}$

$$\begin{aligned} Var(y_{ij}) &= Var(u_{1j} x_{1ij} + \cdots + u_{pj} x_{pij} + u_{0j} + \epsilon_{ij}) \\ &= x^2_{1ij}\sigma^2_{u_{1j}} + \cdots + x^2_{pij}\sigma^2_{u_{pj}} + \sigma^2_{u_{0j}} + 2\sum_{k=1}^{p-1}\sum_{k'=k+1}^{p} x_{kij} x_{k'ij} \sigma_{u_{kj},u_{k'j}} + 2\sum_{k=1}^{p} x_{kij} \sigma_{u_{kj},u_{0j}} + \sigma^2_\epsilon \\ &= \sigma^2_B + \sigma^2_W, \end{aligned}$$

where $\sigma^2_B$ is the between-group variance and $\sigma^2_W$ is the within-group variance. Since $\epsilon_{ij}$ is random error, it is assumed to be independent of all other random terms; therefore $\sigma_{u_{0j},\epsilon_{ij}} = \sigma_{u_{kj},\epsilon_{ij}} = 0$.

  • Random intercept only

Mixed: $y_{ij} = \gamma_{00} + \beta_1 x_{1ij} + \cdots + \beta_p x_{pij} + u_{0j} + \epsilon_{ij}$

$$Var(y_{ij}) = Var(u_{0j} + \epsilon_{ij}) = \sigma^2_{u_{0j}} + \sigma^2_\epsilon.$$

  • Random slopes only

Mixed: $y_{ij} = \beta_0 + (\gamma_{10} x_{1ij} + \cdots + \gamma_{p0} x_{pij}) + (u_{1j} x_{1ij} + \cdots + u_{pj} x_{pij}) + \epsilon_{ij}$

$$Var(y_{ij}) = Var(u_{1j} x_{1ij} + \cdots + u_{pj} x_{pij} + \epsilon_{ij}) = x^2_{1ij}\sigma^2_{u_{1j}} + \cdots + x^2_{pij}\sigma^2_{u_{pj}} + 2\sum_{k=1}^{p-1}\sum_{k'=k+1}^{p} x_{kij} x_{k'ij} \sigma_{u_{kj},u_{k'j}} + \sigma^2_\epsilon.$$

Suppose we have only one $x$ and one $y$:

  • Random intercept and random slope

Mixed: $y_{ij} = \gamma_{00} + \gamma_{10} x_{ij} + u_{1j} x_{ij} + u_{0j} + \epsilon_{ij}$

$$Var(y_{ij}) = Var(u_{1j} x_{ij} + u_{0j} + \epsilon_{ij}) = x^2_{ij}\sigma^2_{u_{1j}} + \sigma^2_{u_{0j}} + 2 x_{ij} \sigma_{u_{1j},u_{0j}} + \sigma^2_\epsilon.$$

  • Random intercept only

Mixed: $y_{ij} = \gamma_{00} + \beta_1 x_{ij} + u_{0j} + \epsilon_{ij}$

$$Var(y_{ij}) = Var(u_{0j} + \epsilon_{ij}) = \sigma^2_{u_{0j}} + \sigma^2_\epsilon.$$

  • Random slope only

Mixed: $y_{ij} = \beta_0 + \gamma_{10} x_{ij} + u_{1j} x_{ij} + \epsilon_{ij}$

$$Var(y_{ij}) = Var(u_{1j} x_{ij} + \epsilon_{ij}) = x^2_{ij}\sigma^2_{u_{1j}} + \sigma^2_\epsilon.$$

4.4.2 Intra-class correlation (ICC)

A convenient summary of the "importance" of the grouping variable is the proportion of the total variance it accounts for, called the variance partition coefficient (VPC): $$VPC = \frac{\sigma^2_B}{\sigma^2_B + \sigma^2_W}.$$ For a multilevel linear model with random slopes, the VPC becomes a function of the $x$s. For example, fitting a multilevel model with both a random intercept and a random slope to the sleepstudy data, while fixing the covariance between the two random effects at 0 ($\sigma_{u_{0j},u_{1j}} = 0$), we have Level 1: $Reaction_{ij} = \beta_{0j} + \beta_{1j} Days_{ij} + \epsilon_{ij}$, Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$, Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$. Based on the estimated parameters, we can plot the VPC as a function of Days (figure not shown):

#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: Reaction ~ Days + (Days || Subject)
#>    Data: sleepstudy
#> 
#> REML criterion at convergence: 1743.7
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -3.9626 -0.4625  0.0204  0.4653  5.1860 
#> 
#> Random effects:
#>  Groups    Name        Variance Std.Dev.
#>  Subject   (Intercept) 627.57   25.051  
#>  Subject.1 Days         35.86    5.988  
#>  Residual              653.58   25.565  
#> Number of obs: 180, groups:  Subject, 18
#> 
#> Fixed effects:
#>             Estimate Std. Error      df t value Pr(>|t|)    
#> (Intercept)  251.405      6.885  18.156  36.513  < 2e-16 ***
#> Days          10.467      1.560  18.156   6.712 2.59e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>      (Intr)
#> Days -0.184

The intra-class correlation (ICC) is defined as $$ICC = \frac{\sigma^2_{u_{0j}}}{\sigma^2_{u_{0j}} + \sigma^2_W}.$$ For a random-intercept-only model, ICC = VPC; otherwise the ICC is a biased estimator of the VPC, so the common practice is to report the variance components instead.
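As a quick arithmetic check, we can plug the variance estimates printed in the sleepstudy output above (intercept variance 627.57, slope variance 35.86, residual variance 653.58, covariance fixed at 0) into the VPC formula. The helper `vpc` below is ours, written for illustration in base R:

```r
# Variance components from the fitted sleepstudy model shown above
var_u0  <- 627.57   # random-intercept variance
var_u1  <-  35.86   # random-slope variance
var_eps <- 653.58   # residual (within) variance

# With the covariance fixed at 0, the between part at a given Days value is
# var_u0 + Days^2 * var_u1, so the VPC depends on Days:
vpc <- function(days) {
  between <- var_u0 + days^2 * var_u1
  between / (between + var_eps)
}

round(c(day0 = vpc(0), day9 = vpc(9)), 3)  # day0 = 0.490, day9 = 0.844
```

At Days = 0 the VPC reduces to the intercept-only ICC (about 0.49); by day 9 the between-subject share of the variance is much larger, illustrating why the VPC is a function of $x$ in random-slope models.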

Based on the ICC, we can calculate the reliability of the $j$th group mean (, equation 3.5): $$\lambda_j = \frac{\sigma^2_{u_{0j}}}{\sigma^2_{u_{0j}} + \sigma^2_W / n_j}.$$
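As a numerical illustration of $\lambda_j$, we can reuse the sleepstudy variance estimates from the output above (intercept variance 627.57, residual variance 653.58), where each subject contributes $n_j = 10$ observations. The helper `lambda_j` is ours, for illustration only:

```r
# Group-mean reliability lambda_j = var_u0 / (var_u0 + var_w / n_j),
# using the sleepstudy variance estimates shown above
var_u0 <- 627.57
var_w  <- 653.58
lambda_j <- function(n_j) var_u0 / (var_u0 + var_w / n_j)

round(lambda_j(10), 3)  # each sleepstudy subject has 10 observations
```

With 10 observations per subject, the group means are highly reliable (about 0.91): averaging shrinks the within-group noise term to $\sigma^2_W / n_j$.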


4.5 Effects decomposition and centering

4.5.1 Within effect, between effect, contextual effect

The within effect describes the relationship between a within-level background variable and the within-level outcome variable. For example, how does a student's ses affect his or her math achievement within a given school?

The contextual effect describes the impact of the group context: the effect of a between-level background variable (e.g., group-mean ses) on the within-level outcome variable (e.g., math achievement), over and above the student's own ses. In Barcikowski's data, the contextual effect asks whether two students with the same individual ses, but from schools with different average ses, have different expected levels of math achievement. Another classic example of a contextual effect is the Big-Fish-Little-Pond effect (): high-achieving students in a school that is low-achieving on average feel better about their abilities than equally high-achieving students in a school with higher average achievement.

Examples from Rönkkö's video on the within effect, between effect, contextual effect, and population-average effect:

  • Y: Health; X: Vaccination (1 or 0); Group: Community.
    Within effect (positive): within a specific community, getting vaccinated decreases an individual's risk of contracting the disease.
    Contextual effect (positive): a community with a higher vaccination rate (average vaccination) decreases its dwellers' risk of contracting the disease.
  • Y: Profits; X: Overfishing; Group: Lake.
    Within effect (positive): within a specific lake, a professional fisherman who exceeds his fishing quota increases his profits because of the larger catch.
    Contextual effect (negative): in a lake with a higher level of overfishing (average overfishing), the profits of all fishermen decrease because of smaller catches.
  • Y: Competitive advantage; X: Innovativeness; Group: Industry.
    Within effect (positive): within an industry, an innovative firm can develop valuable capabilities that lead to competitive advantage.
    Contextual effect (negative): industries with a higher level of innovativeness (average innovativeness) have more innovations, so innovativeness is less likely to lead to competitive advantage.
  • Y: Performance; X: Gender (1 or 0); Group: Team.
    Within effect (zero): within a specific team, one's gender has no effect on one's performance.
    Contextual effect (inverted U-shaped): teams with half men and half women (a medium level of average gender) work best.

Between effect = within effect + contextual effect. The between effect describes the relationship between the group-level average of the background variable and the group-level average of the outcome variable. For example, how does a school's average ses affect the average math achievement of the students in that school?

Multiple linear regression mixes the above effects together. In a multilevel linear model, we can extract the three effects and further explain our findings. However, how these three effects are decomposed in a multilevel linear model depends on whether and how we center our data.

4.5.2 Centering or not? That is the question.

4.5.2.1 Raw data

Let's use Barcikowski's data as an example. Fitting a multilevel linear model with both a random slope and a random intercept, with the covariance between them fixed at 0, we have
Level 1: $mathach_{ij} = \beta_{0j} + \beta_{1j} ses_{ij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
Mixed: $mathach_{ij} = \gamma_{00} + \gamma_{10} ses_{ij} + u_{1j} ses_{ij} + u_{0j} + \epsilon_{ij}$.

#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ ses + (ses || ID)
#>    Data: Barcikowsk
#> 
#> REML criterion at convergence: 46640.7
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -3.12410 -0.73160  0.02253  0.75467  2.93201 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  ID       (Intercept)  4.853   2.2029  
#>  ID.1     ses          0.424   0.6511  
#>  Residual             36.822   6.0681  
#> Number of obs: 7185, groups:  ID, 160
#> 
#> Fixed effects:
#>             Estimate Std. Error       df t value Pr(>|t|)    
#> (Intercept)  12.6527     0.1903 146.2881   66.50   <2e-16 ***
#> ses           2.3955     0.1184 158.8487   20.23   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>     (Intr)
#> ses 0.000

In Barcikowski's data, $\hat{\gamma}_{10}$ is 2.3955, meaning that a student's predicted mathach increases by 2.3955 for every 1-unit increase in ses. However, this effect of ses on mathach is actually a confounded version of the within- and between-effects. We shall leave the explanation to the end of this section.

To properly decompose them, we need to use $meanses_j$ as a between-level predictor:
Level 1: $mathach_{ij} = \beta_{0j} + \beta_{1j} ses_{ij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + b_{01} meanses_j + u_{0j}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
Mixed: $mathach_{ij} = \gamma_{00} + \gamma_{10} ses_{ij} + b_{01} meanses_j + u_{1j} ses_{ij} + u_{0j} + \epsilon_{ij}$.

#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ ses + mean_group_ses + (ses || ID)
#>    Data: Barcikowsk
#> 
#> REML criterion at convergence: 46562.4
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -3.15390 -0.72254  0.01774  0.75562  2.95378 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  ID       (Intercept)  2.7129  1.6471  
#>  ID.1     ses          0.4835  0.6953  
#>  Residual             36.7793  6.0646  
#> Number of obs: 7185, groups:  ID, 160
#> 
#> Fixed effects:
#>                Estimate Std. Error       df t value Pr(>|t|)    
#> (Intercept)     12.6759     0.1510 153.1039  83.926   <2e-16 ***
#> ses              2.1923     0.1227 182.5178  17.867   <2e-16 ***
#> mean_group_ses   3.7762     0.3839 182.9834   9.836   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr) ses   
#> ses         -0.003       
#> mean_grp_ss  0.007 -0.258
  • $\hat{\gamma}_{00} = 12.6759$ is the predicted mathach of a student with $ses = 0$ and $meanses = 0$. However, it makes no practical sense for ses and meanses to equal 0, so the interpretation of $\hat{\gamma}_{00}$ lacks practical meaning; here centering can be a helping hand.
  • $\gamma_{10}$ is clearly the within effect. $\hat{\gamma}_{10} = 2.1923$, meaning that a student's predicted mathach increases by 2.1923 for every 1-unit increase in ses, controlling for meanses; that is, for two students from the same school (meanses held constant), the difference in their predicted mathach depends only on their ses.
  • $b_{01}$ represents the contextual effect. $\hat{b}_{01} = 3.7762$, meaning that a student's predicted mathach increases by 3.7762 for every 1-unit increase in the school's meanses, controlling for ses; that is, for two students with identical ses, the difference in their predicted mathach depends only on the meanses of their schools.
  • Between effect = 2.1923 + 3.7762 = 5.9685.

4.5.2.2 Centering in multiple linear regression

Centering changes the scale of an independent variable and facilitates the interpretation of the intercept.

Note

Note that we only center the background (independent) variables and leave the dependent variable in its original scale.

For example, suppose we fit a simple linear regression. With raw data we have $$y_i = \beta_0 + \beta_1 x_i + \epsilon_i,$$ of which the intercept is interpreted as the prediction of $y$ when $x = 0$. After centering $x$ we have $$y_i = \beta_0 + \beta_1 (x_i - \bar{x}) + \epsilon_i = \beta_0 + \beta_1 x_{ci} + \epsilon_i,$$ of which the intercept is interpreted as the prediction of $y$ when $x_{ci} = 0$, i.e., when $x_i = \bar{x}$.
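The two centering strategies discussed next can be computed in a couple of lines of base R. The toy data frame and column names below are ours, chosen for illustration (with the real data, `id` and `ses` would be the school identifier and the student-level predictor):

```r
# Toy data: 3 clusters of 4 observations each
dat <- data.frame(id  = rep(1:3, each = 4),
                  ses = c(1, 2, 3, 4,  3, 4, 5, 6,  5, 6, 7, 8))

dat$ses_gmc  <- dat$ses - mean(dat$ses)   # grand-mean centering (GMC)
dat$mean_ses <- ave(dat$ses, dat$id)      # cluster means (a between-level predictor)
dat$ses_cmc  <- dat$ses - dat$mean_ses    # cluster-mean centering (CMC)

# After CMC, every cluster mean of the centered variable is exactly 0:
tapply(dat$ses_cmc, dat$id, mean)
```

Note the side benefit: the cluster means computed for CMC are exactly the `meanses` variable needed as a between-level predictor in the models that follow.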

In the multilevel linear model, there are basically two centering strategies for within-level background variables: grand-mean centering (GMC) and group-mean centering, also known as cluster-mean centering (CMC). The abbreviation CMC is used throughout this chapter to avoid identical initials.

4.5.2.3 Grand-mean centering (GMC)

Let's fit a multilevel linear model with both a random slope and a random intercept to Barcikowski's data, again fixing the covariance between them at 0; ses is now grand-mean centered.
Level 1: $mathach_{ij} = \beta_{0j} + \beta_{1j} sesGMC_{ij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
Mixed: $mathach_{ij} = \gamma_{00} + \gamma_{10} sesGMC_{ij} + u_{1j} sesGMC_{ij} + u_{0j} + \epsilon_{ij}$,
where $\gamma_{00}$ represents the predicted mathach when ses equals the grand mean, and $\gamma_{10}$ represents the change in mathach for a 1-unit change in sesGMC.

#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ I(ses - mean_grand_ses) + (ses || ID)
#>    Data: Barcikowsk
#> 
#> REML criterion at convergence: 46640.7
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -3.12410 -0.73160  0.02253  0.75467  2.93201 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  ID       (Intercept)  4.853   2.2029  
#>  ID.1     ses          0.424   0.6511  
#>  Residual             36.822   6.0681  
#> Number of obs: 7185, groups:  ID, 160
#> 
#> Fixed effects:
#>                         Estimate Std. Error       df t value Pr(>|t|)    
#> (Intercept)              12.6531     0.1903 146.2881   66.50   <2e-16 ***
#> I(ses - mean_grand_ses)   2.3955     0.1184 158.8487   20.23   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr)
#> I(-mn_grn_) 0.000

Similarly, without meanses at the between level, we are still unable to decompose the within- and between-effects.

Note

Note that it is usually recommended to grand-mean center the group-mean variable before adding it to the between level, for better interpretation.

Level 1: $mathach_{ij} = \beta_{0j} + \beta_{1j} sesGMC_{ij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + b_{01} meansesGMC_j + u_{0j}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
Mixed: $mathach_{ij} = \gamma_{00} + \gamma_{10} sesGMC_{ij} + b_{01} meansesGMC_j + u_{1j} sesGMC_{ij} + u_{0j} + \epsilon_{ij}$.

#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ I(ses - mean_grand_ses) + mean_group_ses_cen + (ses ||  
#>     ID)
#>    Data: Barcikowsk
#> 
#> REML criterion at convergence: 46562.4
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -3.15390 -0.72254  0.01774  0.75562  2.95378 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  ID       (Intercept)  2.7129  1.6471  
#>  ID.1     ses          0.4835  0.6953  
#>  Residual             36.7793  6.0646  
#> Number of obs: 7185, groups:  ID, 160
#> 
#> Fixed effects:
#>                         Estimate Std. Error       df t value Pr(>|t|)    
#> (Intercept)              12.6767     0.1510 153.1025  83.931   <2e-16 ***
#> I(ses - mean_grand_ses)   2.1923     0.1227 182.5178  17.867   <2e-16 ***
#> mean_group_ses_cen        3.7762     0.3839 182.9834   9.836   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr) I(-m__
#> I(-mn_grn_) -0.003       
#> mn_grp_ss_c  0.007 -0.258
  • $\hat{\gamma}_{00} = 12.6767$ is the predicted mathach when $sesGMC = 0$ ($ses_{ij}$ equals the grand mean) and $meansesGMC = 0$ ($meanses_j$ equals the average meanses of all schools).
  • Within effect: $\hat{\gamma}_{10} = 2.1923$, meaning that a student's predicted mathach increases by 2.1923 for every 1-unit increase in sesGMC, controlling for meansesGMC; that is, for two students from the same school (meansesGMC held constant), the difference in their predicted mathach depends only on their sesGMC.
  • Contextual effect: $\hat{b}_{01} = 3.7762$, meaning that a student's predicted mathach increases by 3.7762 for every 1-unit increase in the school's meansesGMC, controlling for sesGMC; that is, for two students with identical sesGMC, the difference in their predicted mathach depends only on the meansesGMC of their schools, so $b_{01}$ represents the contextual effect.
  • Between effect = 2.1923 + 3.7762 = 5.9685.

4.5.2.4 Cluster-mean centering (CMC)

Centering ses with the cluster mean, we have
Level 1: $mathach_{ij} = \beta_{0j} + \beta_{1j} sesCMC_{ij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + u_{0j}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
Mixed: $mathach_{ij} = \gamma_{00} + \gamma_{10} sesCMC_{ij} + u_{1j} sesCMC_{ij} + u_{0j} + \epsilon_{ij}$.

#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ I(ses - mean_group_ses) + (ses || ID)
#>    Data: Barcikowsk
#> 
#> REML criterion at convergence: 46718.2
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -3.09777 -0.73211  0.01331  0.75450  2.92142 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  ID       (Intercept)  8.7395  2.9563  
#>  ID.1     ses          0.5101  0.7142  
#>  Residual             36.7703  6.0638  
#> Number of obs: 7185, groups:  ID, 160
#> 
#> Fixed effects:
#>                         Estimate Std. Error       df t value Pr(>|t|)    
#> (Intercept)              12.6320     0.2461 153.8997   51.32   <2e-16 ***
#> I(ses - mean_group_ses)   2.1417     0.1233 162.9470   17.37   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr)
#> I(-mn_grp_) -0.002
  • Within effect: $\hat{\gamma}_{10} = 2.1417$, meaning that, with everything else held constant, a student's mathach increases by 2.1417 for every 1-unit increase in sesCMC; this effect is constant across schools.
Note

Note that with cluster-mean centering, $\gamma_{10}$ is no longer a confounded version of the within- and between-effects; it is an unambiguous estimate of the within effect. Again, we shall leave the explanation to the end of this section.

Adding the group mean (grand-mean centered) as a between-level predictor, we have
Level 1: $mathach_{ij} = \beta_{0j} + \beta_{1j} sesCMC_{ij} + \epsilon_{ij}$
Level 2: $\beta_{0j} = \gamma_{00} + b_{01} meansesGMC_j + u_{0j}$
Level 2: $\beta_{1j} = \gamma_{10} + u_{1j}$
Mixed: $mathach_{ij} = \gamma_{00} + \gamma_{10} sesCMC_{ij} + b_{01} meansesGMC_j + u_{1j} sesCMC_{ij} + u_{0j} + \epsilon_{ij}$.

#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: mathach ~ I(ses - mean_group_ses) + mean_group_ses_cen + (ses ||  
#>     ID)
#>    Data: Barcikowsk
#> 
#> REML criterion at convergence: 46562.4
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -3.15390 -0.72254  0.01774  0.75562  2.95378 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  ID       (Intercept)  2.7129  1.6471  
#>  ID.1     ses          0.4835  0.6953  
#>  Residual             36.7793  6.0646  
#> Number of obs: 7185, groups:  ID, 160
#> 
#> Fixed effects:
#>                         Estimate Std. Error       df t value Pr(>|t|)    
#> (Intercept)              12.6767     0.1510 153.1025   83.93   <2e-16 ***
#> I(ses - mean_group_ses)   2.1923     0.1227 182.5178   17.87   <2e-16 ***
#> mean_group_ses_cen        5.9685     0.3717 156.1926   16.06   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr) I(-m__
#> I(-mn_grp_) -0.003       
#> mn_grp_ss_c  0.006  0.064
  • $\hat{\gamma}_{00} = 12.6767$ is the predicted $mathach_{ij}$ when $sesCMC_{ij} = 0$ ($ses_{ij}$ equals the average ses of the $j$th school) and $meansesGMC_j = 0$ ($meanses_j$ equals the average meanses of all schools).
  • Within effect: $\hat{\gamma}_{10} = 2.1923$, meaning that, with everything else held constant, a student's mathach increases by 2.1923 for every 1-unit increase in sesCMC; this is the effect of ses on mathach at the individual level.
  • Between effect: $\hat{b}_{01} = 5.9685$ now becomes the between effect. It describes how meanses affects mathach controlling for sesCMC; that is, for two students from two schools with identical sesCMC, the difference in their predicted mathach lies in the meanses of their schools. But why is $b_{01}$ the between effect rather than the contextual effect?
  • Contextual effect: Because identical sesCMC for those two students only implies that they hold the same position with regard to ses within their own schools (e.g., $sesCMC = 1$ means each student's ses is 1 unit above his or her school's meanses), while their schools can have different meanses. Identical sesCMC therefore masks the absolute difference between the school meanses, i.e., the contextual effect. But no worry, the contextual effect is already captured within $b_{01}$:

$$\begin{aligned} \text{Mixed: } y_{ij} &= \gamma_{00} + \gamma_{10} sesCMC_{ij} + b_{01} meansesGMC_j + u_{1j} sesCMC_{ij} + u_{0j} + \epsilon_{ij} \\ &= \gamma_{00} + \gamma_{10} sesCMC_{ij} + (b_{01} - \gamma_{10} + \gamma_{10}) meansesGMC_j + u_{1j} sesCMC_{ij} + u_{0j} + \epsilon_{ij} \\ &= \gamma_{00} + \gamma_{10} (sesCMC_{ij} + meansesGMC_j) + (b_{01} - \gamma_{10}) meansesGMC_j + u_{1j} sesCMC_{ij} + u_{0j} + \epsilon_{ij} \\ &= \gamma_{00} + \gamma_{10} sesGMC_{ij} + (b_{01} - \gamma_{10}) meansesGMC_j + u_{1j} sesCMC_{ij} + u_{0j} + \epsilon_{ij}, \end{aligned}$$

thus in this example the contextual effect is 5.9685 − 2.1923 = 3.7762.
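The identity between effect = within effect + contextual effect can be checked numerically with the estimates reported above for Barcikowski's data (the variable names below are ours):

```r
# Estimates from the models above:
within_eff  <- 2.1923  # gamma10-hat (same with raw, GMC, or CMC plus group mean)
contextual  <- 3.7762  # b01-hat when ses is raw or grand-mean centered
between_eff <- 5.9685  # b01-hat when ses is cluster-mean centered

within_eff + contextual  # reproduces the between effect, 5.9685
```

This is the same arithmetic as $b_{01} - \gamma_{10}$ in the derivation above, just read in the other direction.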
Note

If you add the group means as a between-level predictor, you will obtain exactly the same within-effect estimate (and p-value) for your within-level predictor regardless of which centering method you use.

Now let’s explain the two puzzles:

  1. Why is $\gamma_{10}$ a confounded version of the within- and between-effects when fitting a multilevel linear model to raw data without the group mean at the between level?

Because $ses_{ij} = (ses_{ij} - meanses_j) + meanses_j = sesCMC_{ij} + meanses_j$, the multilevel linear model with random intercept and random slope can be rewritten as

$$\text{Mixed: } mathach_{ij} = \gamma_{00} + \gamma_{10} ses_{ij} + u_{1j} ses_{ij} + u_{0j} + \epsilon_{ij} = \gamma_{00} + \gamma_{10} sesCMC_{ij} + \gamma_{10} meanses_j + u_{1j} ses_{ij} + u_{0j} + \epsilon_{ij},$$

and it is easy to see that the single coefficient $\gamma_{10}$ is forced to represent both the within effect and the between effect.

  2. Why is $\gamma_{10}$ an unambiguous estimate of the within effect, with or without the group mean at the between level?

Because $sesCMC_{ij} = ses_{ij} - meanses_j$,

$$\text{Mixed: } mathach_{ij} = \gamma_{00} + \gamma_{10} sesCMC_{ij} + u_{1j} sesCMC_{ij} + u_{0j} + \epsilon_{ij} = \gamma_{00} + \gamma_{10} ses_{ij} - \gamma_{10} meanses_j + u_{1j} sesCMC_{ij} + u_{0j} + \epsilon_{ij},$$

so the group-mean component of ses is subtracted out and $\gamma_{10}$ reflects only within-group variation in ses.

4.5.3 Summary

Effect decomposition and centering strategies

Within-level predictor   Group mean at between level   γ10                                      b01
raw                      no                            confounds within- and between- effects   NA
GMC                      no                            confounds within- and between- effects   NA
CMC                      no                            within effect                            NA
raw                      yes                           within effect                            contextual effect
GMC                      yes                           within effect                            contextual effect
CMC                      yes                           within effect                            between effect
  • If zero is a meaningful value of the raw predictor, there is no need for centering;
  • In a two-level model, if the group-mean variable contains a considerable amount of heterogeneity, it should be included as a between-level predictor.

Reference:

https://www.bristol.ac.uk/cmm/learning/support/datasets/