Supplemental

A Comparison to Latent Growth Curve Models

It is common in Structural Equation Modeling (SEM) to deal with longitudinal data via a Latent Growth Curve (LGC) model. It turns out that LGC models are, in a sense, just a different form of the very commonly used mixed model framework. In some ways they are more flexible, mostly because the standard SEM framework allows for indirect and other complex covariate relationships. In other ways they are less flexible, e.g. with missing data, estimating nonlinear relationships, incorporating many time points, and dealing with time-varying covariates. With appropriate tools there is little one can’t do with the usual mixed model approach relative to the SEM approach, and one would likely have easier interpretation. As such, I’d recommend sticking with the standard mixed model framework unless you really need to do otherwise, but it is useful to have both tools.

To best understand a growth curve model, I still think it’s instructive to see it from the mixed model perspective, where things are mostly interpretable in terms of what you already know from a standard linear model. We will use our GPA example from before, and one can refer to the appendix for more detail.

Random effects as latent variables

As before we assume the following for the GPA model. As a simple starting point, we merely model a trend over time (occasion, covering six semesters) and include random effects due to student for both the intercept and occasion. In this setting we are treating time as numeric, but one could treat the occasion variable as categorical28.

\[\mathcal{GPA} = (b_{\mathrm{intercept}} + \mathrm{re}_{\mathrm{intercept}}) + (b_{\mathrm{occ}} + \mathrm{re}_{\mathrm{occasion}})\cdot \mathrm{occasion} + \epsilon\]

\[\mathrm{re}_{\mathrm{intercept}} \sim \mathcal{N}(0, \tau)\] \[\mathrm{re}_{\mathrm{occasion}} \sim \mathcal{N}(0, \varphi)\] \[\epsilon \sim \mathcal{N}(0, \sigma)\]

Thus the student effects for the intercept and slope are random, and specifically are normally distributed with mean of zero and some estimated standard deviation (\(\tau\), \(\varphi\) respectively)29. We consider these effects as coming from unspecified, or latent, causes due to student. In addition, we have the usual residual error \(\epsilon\), which can also be thought of as a per-observation random effect due to all other unknown causes.

The ‘multilevel model’ version might look like the following, but it is identical.

\[\mathcal{GPA} = b_{\mathrm{int\_student}} + b_{\mathrm{occ\_student}}\cdot \mathrm{occasion} + \epsilon\]

\[b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + \mathrm{re}_{\mathrm{intercept}}\] \[b_{\mathrm{occ\_student}} = b_{\mathrm{occ}} + \mathrm{re}_{\mathrm{occasion}}\]

The corresponding model may be run using lme4 as follows.

load('data/gpa.RData')   

# if you haven't downloaded the workshop RStudio project
# load(url('https://github.com/m-clark/mixed-models-with-R/raw/master/data/gpa.RData?raw=true'))

library(lme4)
mixed_init = lmer(gpa ~ occasion + (1 + occasion|student), data = gpa)
# summary(mixed_init)

I show a simplified output below, so make sure you can match the results to the summary printout. The fixed (population-average) effects are the \(b_{\mathrm{intercept}}\) and \(b_{\mathrm{occ}}\) in the previous model depiction. The standard deviations of the random effects are the \(\tau\), \(\varphi\) and \(\epsilon\).
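The simplified tables here and below can be produced in various ways; a minimal sketch, assuming the mixedup package (which is also used later in this document):

# a possible way to get the simplified fixed effects table below
mixedup::extract_fixed_effects(mixed_init)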

term        value    se   lower_2.5   upper_97.5
Intercept    2.60  0.02        2.56         2.63
occasion     0.11  0.01        0.10         0.12

We can also get estimates of the student level effects. These are the \(\mathrm{re}_{\mathrm{intercept}}\) and \(\mathrm{re}_{\mathrm{occasion}}\) from before.
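A sketch of how one might obtain these, again assuming mixedup (lme4’s ranef would give the estimated deviations alone):

# per-student random effects with interval estimates (assumed approach)
mixedup::extract_random_effects(mixed_init) %>% 
  head()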

Table 1: Per-student random effects (sample)

group_var   effect      group    value      se   lower_2.5   upper_97.5
student     Intercept       1   -0.202   0.113      -0.424        0.019
student     Intercept       2   -0.211   0.113      -0.432        0.011
student     Intercept       3   -0.007   0.113      -0.228        0.215
student     Intercept       4   -0.093   0.113      -0.315        0.128
student     Intercept       5    0.087   0.113      -0.134        0.309
student     Intercept       6   -0.206   0.113      -0.427        0.016

Random effects in SEM

In SEM, we specify the latent linear, or common factor, model as follows.

\[Y = b_{\mathrm{intercept}} + \lambda F + \epsilon\] \[F \sim \mathcal{N}(0, \tau)\]

\[\epsilon \sim \mathcal{N}(0, \sigma)\]

In the above, \(Y\) is our observed variable, \(b_{intercept}\) is the intercept as in a standard linear regression model, \(\lambda\) is the coefficient (loading in factor analysis/SEM terminology) regarding the effect of the latent variable, represented as \(F\). The latent variable is assumed normally distributed, with zero mean, and some estimated variance, just like the random effects in mixed models.

Note that if \(\lambda = 1\), the right hand side becomes \(b_{\mathrm{intercept}} + F\), which is indistinguishable from the random intercept portion of the mixed model (\(b_{\mathrm{intercept}} + \mathrm{re}_{\mathrm{intercept}}\)). Through this we can start to get a sense of random effects as latent variables (or vice versa). Indeed, mixed models have ties to many other kinds of models (e.g. spatial, additive), because those models also add a ‘random’ component to the model in some fashion.

Running a growth curve model

The graphical model for the standard LGC model resembles that of confirmatory factor analysis (CFA) with two latent variables/factors. The observed, or manifest, measures are the dependent variable values at each respective time point. However, for those familiar with structural equation modeling (SEM), growth curve models will actually look a bit different compared with typical SEM, because we have to fix the factor loadings to specific values in order to make it work for the LGC. As we will see, this also leads to non-standard output relative to other SEM models, as there is nothing to estimate for the many fixed parameters.

More specifically, we’ll have a latent variable representing the random intercepts, as well as one representing the random slopes for the longitudinal trend (time), which in the GPA data is the semester indicator. All loadings for the intercept factor are 1. The loadings for the effect of time are arbitrary, but should accurately reflect the time spacing, and it is typically good to start at zero, so that the intercept has a meaningful interpretation (the expected value at the first time point).

Wide data

Given the above visualization, for the LGC our data needs to be in wide format, where each row represents the unit of interest, and we have separate columns for each time point of the target variable, as well as any other variable that varies over time. This contrasts with the long format we use for the mixed model, where rows represent observations at a given time point. We can use the pivot_wider function from tidyr to help with that. We end up with a data frame of two hundred observations and columns for each semester’s GPA (0 through 5 for six semesters) denoted by gpa_*.

gpa_wide = gpa %>% 
  select(student, sex, highgpa, occasion, gpa) %>% 
  pivot_wider(names_from = occasion, values_from = gpa) %>% 
  rename_at(vars(`0`,`1`,`2`,`3`,`4`,`5`), function(x) glue::glue('gpa_{x}')) %>% 
  mutate(female = as.numeric(sex)-1)   # convert to binary 0 = male 1 = female to be used later

We’ll use lavaan for our excursion into LGC. The syntax will require its own modeling code, but lavaan tries to keep to R regression model style. The names of intercept and occasion are arbitrary, and correspond to the intercepts and slopes factors of the previous visualization. The =~ is just denoting that the left-hand side is the latent variable, and the right-hand side are the observed/manifest variables. We use the standard fixed loadings for an LGC model.

lgc_init_model = '

  intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
  occasion  =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
  
'

Now we’re ready to run the model. Note that lavaan has a specific function, growth, to use for these models. It doesn’t spare us any effort for the model syntax, but does make it unnecessary to set various arguments for the more generic sem and lavaan functions.

library(lavaan)
lgc_init = growth(lgc_init_model, data = gpa_wide)
summary(lgc_init)
lavaan 0.6-9 ended normally after 73 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                        11
                                                      
  Number of observations                           200
                                                      
Model Test User Model:
                                                      
  Test statistic                                43.945
  Degrees of freedom                                16
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept =~                                        
    gpa_0             1.000                           
    gpa_1             1.000                           
    gpa_2             1.000                           
    gpa_3             1.000                           
    gpa_4             1.000                           
    gpa_5             1.000                           
  occasion =~                                         
    gpa_0             0.000                           
    gpa_1             1.000                           
    gpa_2             2.000                           
    gpa_3             3.000                           
    gpa_4             4.000                           
    gpa_5             5.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept ~~                                        
    occasion          0.002    0.002    1.629    0.103

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.000                           
   .gpa_1             0.000                           
   .gpa_2             0.000                           
   .gpa_3             0.000                           
   .gpa_4             0.000                           
   .gpa_5             0.000                           
    intercept         2.598    0.018  141.956    0.000
    occasion          0.106    0.005   20.338    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.080    0.010    8.136    0.000
   .gpa_1             0.071    0.008    8.799    0.000
   .gpa_2             0.054    0.006    9.039    0.000
   .gpa_3             0.029    0.003    8.523    0.000
   .gpa_4             0.015    0.002    5.986    0.000
   .gpa_5             0.016    0.003    4.617    0.000
    intercept         0.035    0.007    4.947    0.000
    occasion          0.003    0.001    5.645    0.000

Fixed effects

Most of the output is blank, which is needless clutter, but we do still get the five parameter values we are interested in.

We’ll start with the ‘intercepts’:

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)

    intercept         2.598    0.018  141.956    0.000
    occasion          0.106    0.005   20.338    0.000

It might be odd to call your fixed effects ‘intercepts,’ but it makes sense if we are thinking of it as a multilevel model as depicted previously, where we actually broke out the random effects as a separate model. These are the population average of the random intercepts and slopes for occasion. The estimates here are pretty much spot on with our mixed model estimates. To make the estimation approach as similar as possible, I’ve switched to standard maximum likelihood via REML = FALSE.

library(lme4)

gpa_mixed = lmer(
  gpa ~ occasion + (1 + occasion | student), 
  data = gpa, 
  REML = FALSE
)

summary(gpa_mixed, cor=F)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: gpa ~ occasion + (1 + occasion | student)
   Data: gpa

     AIC      BIC   logLik deviance df.resid 
   258.2    288.8   -123.1    246.2     1194 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2747 -0.5381 -0.0128  0.5327  3.1945 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 student  (Intercept) 0.044859 0.21180       
          occasion    0.004469 0.06685  -0.10
 Residual             0.042387 0.20588       
Number of obs: 1200, groups:  student, 200

Fixed effects:
            Estimate Std. Error t value
(Intercept)  2.59921    0.01831  141.94
occasion     0.10631    0.00587   18.11

Random effects

Now let’s look at the variance estimates, where we see some differences between the LGC and mixed model approaches. The LGC by default assumes a separate (heterogeneous) variance for each time point. Mixed models by default assume the same variance for each time point, though most modeling packages can allow them to be estimated separately. Likewise, we could fix the LGC variances to be identical here. Just know that this is why the results are not identical (along with their respective estimation approaches, which also differ by default).

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept ~~                                        
    occasion          0.002    0.002    1.629    0.103
    
Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.080    0.010    8.136    0.000
   .gpa_1             0.071    0.008    8.799    0.000
   .gpa_2             0.054    0.006    9.039    0.000
   .gpa_3             0.029    0.003    8.523    0.000
   .gpa_4             0.015    0.002    5.986    0.000
   .gpa_5             0.016    0.003    4.617    0.000
    intercept         0.035    0.007    4.947    0.000
    occasion          0.003    0.001    5.645    0.000
print(VarCorr(gpa_mixed), comp = 'Var')  # using print to show variance
 Groups   Name        Variance  Corr  
 student  (Intercept) 0.0448593       
          occasion    0.0044694 -0.096
 Residual             0.0423874       

Random intercepts

How can we put these models on the same footing? Let’s take a step back and do a model with only random intercepts. In this case, time is an observed measure, and has no person-specific variability. Our graphical model now looks like the following. Time, or time point (i.e. semester in our example), is now represented with a square to denote it is no longer affiliated with a latent variable.

We can do this by fixing the slope ‘factor’ to have zero variance. However, note also that in the LGC, at each time point of the gpa outcome, we have a unique (residual) variance associated with it. Conversely, this is constant in the mixed model setting, i.e. we only have one estimate for the residual variance that does not vary by occasion. We deal with this in the LGC by giving the parameter an arbitrary name, resid, and then applying it to each time point.

lgc_ran_int_model = '

 intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
 slope     =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
 
 slope     ~~ 0*slope    # slope variance is zero
 intercept ~~ 0*slope    # no covariance with intercept factor
 
 
 gpa_0 ~~ resid*gpa_0    # same residual variance for each time point
 gpa_1 ~~ resid*gpa_1
 gpa_2 ~~ resid*gpa_2
 gpa_3 ~~ resid*gpa_3
 gpa_4 ~~ resid*gpa_4
 gpa_5 ~~ resid*gpa_5
 
'

Now all time points will share a single residual variance estimate. Let’s run the LGC.

lgc_ran_int = growth(lgc_ran_int_model, data = gpa_wide)

# increase the number of digits shown, remove some output unnecessary to demo
summary(lgc_ran_int, nd = 4, header = FALSE)  

Parameter Estimates:

  Standard errors                                 Standard
  Information                                     Expected
  Information saturated (h1) model              Structured

Latent Variables:
                    Estimate   Std.Err   z-value   P(>|z|)
  intercept =~                                            
    gpa_0             1.0000                              
    gpa_1             1.0000                              
    gpa_2             1.0000                              
    gpa_3             1.0000                              
    gpa_4             1.0000                              
    gpa_5             1.0000                              
  slope =~                                                
    gpa_0             0.0000                              
    gpa_1             1.0000                              
    gpa_2             2.0000                              
    gpa_3             3.0000                              
    gpa_4             4.0000                              
    gpa_5             5.0000                              

Covariances:
                    Estimate   Std.Err   z-value   P(>|z|)
  intercept ~~                                            
    slope             0.0000                              

Intercepts:
                    Estimate   Std.Err   z-value   P(>|z|)
   .gpa_0             0.0000                              
   .gpa_1             0.0000                              
   .gpa_2             0.0000                              
   .gpa_3             0.0000                              
   .gpa_4             0.0000                              
   .gpa_5             0.0000                              
    intercept         2.5992    0.0217  120.0471    0.0000
    slope             0.1063    0.0041   26.1094    0.0000

Variances:
                    Estimate   Std.Err   z-value   P(>|z|)
    slope             0.0000                              
   .gpa_0   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_1   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_2   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_3   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_4   (resd)    0.0580    0.0026   22.3607    0.0000
   .gpa_5   (resd)    0.0580    0.0026   22.3607    0.0000
    intrcpt           0.0634    0.0073    8.6605    0.0000

Compare it to the corresponding mixed model.

mixed_ran_int = lmer(gpa ~ occasion + (1 | student), data = gpa, REML = FALSE)
summary(mixed_ran_int, cor = FALSE)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: gpa ~ occasion + (1 | student)
   Data: gpa

     AIC      BIC   logLik deviance df.resid 
   401.6    422.0   -196.8    393.6     1196 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.6188 -0.6370 -0.0002  0.6366  2.8330 

Random effects:
 Groups   Name        Variance Std.Dev.
 student  (Intercept) 0.06336  0.2517  
 Residual             0.05803  0.2409  
Number of obs: 1200, groups:  student, 200

Fixed effects:
            Estimate Std. Error t value
(Intercept) 2.599214   0.021652  120.05
occasion    0.106314   0.004072   26.11

Now we have essentially identical results to mixed_ran_int. The default estimation process is different for the two, resulting in some differences starting several decimal places out, but these are not meaningful differences. We can actually use the same estimator, but the results will still differ slightly due to the data differences.

Random intercepts and slopes

Now let’s let the slope for occasion vary. We can just delete or comment out the syntax related to the (co-) variance. By default slopes and intercepts are allowed to correlate as in the mixed model. We will continue to keep the variance constant.

lgc_ran_int_ran_slope_model = '

 intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
 slope     =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
 
 # slope     ~~ 0*slope  # slope variance is zero
 # intercept ~~ 0*slope  # no covariance
 
 
 gpa_0 ~~ resid*gpa_0    # same residual variance for each time point
 gpa_1 ~~ resid*gpa_1
 gpa_2 ~~ resid*gpa_2
 gpa_3 ~~ resid*gpa_3
 gpa_4 ~~ resid*gpa_4
 gpa_5 ~~ resid*gpa_5
 
'
lgc_ran_int_ran_slope = growth(lgc_ran_int_ran_slope_model, data = gpa_wide)
summary(lgc_ran_int_ran_slope, nd = 4, header = FALSE)  

Parameter Estimates:

  Standard errors                                 Standard
  Information                                     Expected
  Information saturated (h1) model              Structured

Latent Variables:
                    Estimate   Std.Err   z-value   P(>|z|)
  intercept =~                                            
    gpa_0             1.0000                              
    gpa_1             1.0000                              
    gpa_2             1.0000                              
    gpa_3             1.0000                              
    gpa_4             1.0000                              
    gpa_5             1.0000                              
  slope =~                                                
    gpa_0             0.0000                              
    gpa_1             1.0000                              
    gpa_2             2.0000                              
    gpa_3             3.0000                              
    gpa_4             4.0000                              
    gpa_5             5.0000                              

Covariances:
                    Estimate   Std.Err   z-value   P(>|z|)
  intercept ~~                                            
    slope            -0.0014    0.0016   -0.8337    0.4045

Intercepts:
                    Estimate   Std.Err   z-value   P(>|z|)
   .gpa_0             0.0000                              
   .gpa_1             0.0000                              
   .gpa_2             0.0000                              
   .gpa_3             0.0000                              
   .gpa_4             0.0000                              
   .gpa_5             0.0000                              
    intercept         2.5992    0.0183  141.9471    0.0000
    slope             0.1063    0.0059   18.1113    0.0000

Variances:
                    Estimate   Std.Err   z-value   P(>|z|)
   .gpa_0   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_1   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_2   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_3   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_4   (resd)    0.0424    0.0021   20.0000    0.0000
   .gpa_5   (resd)    0.0424    0.0021   20.0000    0.0000
    intrcpt           0.0449    0.0068    6.5992    0.0000
    slope             0.0045    0.0007    6.3874    0.0000

Again, we compare with the mixed model, which gives essentially identical output.

mixed_ran_int_ran_slope = lmer(gpa ~ occasion + (1 + occasion|student), data = gpa)
summary(mixed_ran_int_ran_slope, cor = FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ occasion + (1 + occasion | student)
   Data: gpa

REML criterion at convergence: 261

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2695 -0.5377 -0.0128  0.5326  3.1939 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 student  (Intercept) 0.045193 0.21259       
          occasion    0.004504 0.06711  -0.10
 Residual             0.042388 0.20588       
Number of obs: 1200, groups:  student, 200

Fixed effects:
            Estimate Std. Error t value
(Intercept) 2.599214   0.018357  141.59
occasion    0.106314   0.005885   18.07

In addition, the estimated random coefficients from the mixed model correlate perfectly with those of the latent variables.
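One way to check this, as a sketch (lavPredict returns the estimated factor scores, coef the per-student mixed model coefficients):

# per-student estimates from each approach
re_mixed = coef(mixed_ran_int_ran_slope)$student   # fixed + random effects per student
fs_lgc   = lavPredict(lgc_ran_int_ran_slope)       # factor scores for intercept and slope

cor(re_mixed[, '(Intercept)'], fs_lgc[, 'intercept'])
cor(re_mixed[, 'occasion'],    fs_lgc[, 'slope'])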

student   Int_mixed   Slope_mixed   Int_LGC   Slope_LGC
      1       2.397         0.166     2.398       0.166
      2       2.389         0.105     2.389       0.104
      3       2.593         0.149     2.593       0.149
      4       2.506         0.064     2.506       0.064
      5       2.686         0.080     2.686       0.081
      6       2.394         0.063     2.394       0.063

Note that the intercept-slope relationship in the LGC is expressed as a covariance. If we want correlation, we just ask for standardized output. I show only the line output of interest.

summary(lgc_ran_int_ran_slope, nd = 4, std = TRUE, header = FALSE)
Covariances:
                    Estimate   Std.Err   z-value   P(>|z|)    Std.lv   Std.all
  intercept ~~                                                                
    slope            -0.0014    0.0016   -0.8337    0.4045   -0.0963   -0.0963

The Std.all value is what we typically will look at.

Random effects with heterogeneous variances

We have demonstrated heterogeneous variances [previously][Heterogeneous Variance]. But to revisit here: lme4 does not provide an easy way to have a separate variance at each time point, sacrificing such model complexities for computational advantages. However, nlme provides a way, though not an entirely straightforward one, to get at these estimates. See the previous section for details.

library(nlme)

mixed_ran_int_ran_slope_hetero_var = lme(
  gpa ~ occasion,
  random = ~ 1 + occasion | student,
  data = gpa,
  weights = varIdent(form = ~1|occasion)
)

mixedup::summarise_model(mixed_ran_int_ran_slope_hetero_var)

Variance Components:
    Group    Effect Variance   SD SD_2.5 SD_97.5 Var_prop
  student Intercept     0.04 0.19   0.15    0.23     0.30
  student  occasion     0.00 0.06   0.05    0.07     0.03
 Residual               0.08 0.28   0.24    0.33     0.67

Fixed Effects:
      Term Value   SE      Z P_value Lower_2.5 Upper_97.5
 Intercept  2.60 0.02 141.60    0.00      2.56       2.63
  occasion  0.11 0.01  20.29    0.00      0.10       0.12

Table 2: Residual variance at each time point

Semester   Variance
       0      0.080
       1      0.071
       2      0.054
       3      0.029
       4      0.015
       5      0.016
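As a sketch of where Table 2 comes from, the per-semester variances can be recovered from the nlme fit by scaling the residual variance by the squared varIdent parameters; the exact approach shown here is an assumption on my part.

# relative residual SDs per occasion (reference occasion fixed at 1)
rel_sd = coef(
  mixed_ran_int_ran_slope_hetero_var$modelStruct$varStruct,
  unconstrained = FALSE,
  allCoef = TRUE
)

# residual variance at each time point
mixed_ran_int_ran_slope_hetero_var$sigma^2 * rel_sd^2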

Compare to the LGC (our lgc_init model).

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.080    0.010    8.136    0.000
   .gpa_1             0.071    0.008    8.799    0.000
   .gpa_2             0.054    0.006    9.039    0.000
   .gpa_3             0.029    0.003    8.523    0.000
   .gpa_4             0.015    0.002    5.986    0.000
   .gpa_5             0.016    0.003    4.617    0.000

Other covariates

Within these models we can have cluster level covariates which are constant over time, or covariates that vary over time. We will examine each in turn.

Cluster level covariates

Mixed model

To add a cluster-level covariate, for a mixed model, it looks something like this (ignoring lowest level subscript, \(b_0\) = intercept):

standard random intercept

\[\mathcal{GPA} = b_{\mathrm{int\_student}} + b_{\mathrm{occ}}\cdot\mathrm{occasion} + \epsilon \]

\[b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + \mathrm{re}_{\mathrm{intercept}}\]

Plugging in, this becomes:

\[\mathcal{GPA} = b_{\mathrm{intercept}} + b_{occ}\cdot\mathrm{occasion} + \mathrm{re}_{\mathrm{intercept}} + \epsilon \]

subject level covariate added

\[b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + b_{sex}\cdot\mathrm{sex} + \mathrm{re}_{\mathrm{intercept}}\]

But if we plug that into our level 1 model, it just becomes:

\[\mathcal{GPA} = b_{\mathrm{intercept}} + b_{sex}\cdot\mathrm{sex} + b_{occ}\cdot\mathrm{occasion} + \mathrm{re}_{\mathrm{intercept}} + \epsilon \]

In our previous modeling syntax it would look like this:

gpa_mixed = lmer(gpa ~ sex + occasion + (1|student), data = gpa)

We’d have a fixed effect for sex and interpret it just like in the standard regression setting.

LGC

With LGC, there is a tendency to interpret the model as an SEM, with the language of effects on latent variables, and certainly one can. For example, we can talk about the (implicitly causal) effect of sex on the intercepts factor, which represents GPA at the first semester. However, adding additional covariates typically causes confusion for those not familiar with mixed models. We literally do have to regress the intercept and slope latent variables on cluster level covariates as follows.

\[\mathcal{GPA} = b_{\mathrm{int\_student}} + b_{\mathrm{occ\_student}}\cdot \mathrm{occasion} + \epsilon\]

\[b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + b_{sex}\cdot\mathrm{sex} + \mathrm{re}_{\mathrm{intercept}}\]

Furthermore, people almost automatically put in an effect for the cluster level covariate on the slope factor also. In the mixed model this would result in the following:

subject level covariate added for slopes

\[b_{\mathrm{occ\_student}} = b_{\mathrm{occ}} + \gamma\cdot\mathrm{sex} + \mathrm{re}_{\mathrm{occasion}}\]

And after plugging in:

\[\mathcal{GPA} = \color{#b2001d}{b_{\mathrm{intercept}} + b_{sex}\cdot\mathrm{sex} + b_{occ}\cdot\mathrm{occasion} + \mathbf{\gamma\cdot\mathrm{sex}\cdot\mathrm{occasion}}} + \color{#001eb2}{\mathrm{re}_{\mathrm{intercept}} + \mathrm{re}_{\mathrm{occasion}}\cdot\mathrm{occasion}} + e\]

The fixed effects are in red, while the random effects are in blue. Focusing on the fixed effects, we can see that this warrants an interaction between sex and occasion. This is not required, but one should add it only if the interaction is actually of interest. Our graphical model looks like the following, using the above notation.

We are now ready to run the LGC for comparison.

lgc_cluster_level_model <- '

  intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
  occasion  =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
  
  # regressions
  intercept ~ female
  occasion  ~ female 
  
  gpa_0 ~~ resid*gpa_0    # same residual variance for each time point
  gpa_1 ~~ resid*gpa_1
  gpa_2 ~~ resid*gpa_2
  gpa_3 ~~ resid*gpa_3
  gpa_4 ~~ resid*gpa_4
  gpa_5 ~~ resid*gpa_5
  
'

lgc_cluster_level = growth(lgc_cluster_level_model, data = gpa_wide)
summary(lgc_cluster_level, std = TRUE, header = FALSE)

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept =~                                                          
    gpa_0             1.000                               0.212    0.717
    gpa_1             1.000                               0.212    0.710
    gpa_2             1.000                               0.212    0.671
    gpa_3             1.000                               0.212    0.613
    gpa_4             1.000                               0.212    0.551
    gpa_5             1.000                               0.212    0.492
  occasion =~                                                           
    gpa_0             0.000                               0.000    0.000
    gpa_1             1.000                               0.067    0.224
    gpa_2             2.000                               0.134    0.424
    gpa_3             3.000                               0.201    0.581
    gpa_4             4.000                               0.267    0.695
    gpa_5             5.000                               0.334    0.776

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept ~                                                           
    female            0.076    0.036    2.083    0.037    0.357    0.178
  occasion ~                                                            
    female            0.029    0.012    2.499    0.012    0.433    0.216

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .intercept ~~                                                          
   .occasion         -0.002    0.002   -1.184    0.237   -0.140   -0.140

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .gpa_0             0.000                               0.000    0.000
   .gpa_1             0.000                               0.000    0.000
   .gpa_2             0.000                               0.000    0.000
   .gpa_3             0.000                               0.000    0.000
   .gpa_4             0.000                               0.000    0.000
   .gpa_5             0.000                               0.000    0.000
   .intercept         2.560    0.026   97.376    0.000   12.085   12.085
   .occasion          0.091    0.008   10.865    0.000    1.363    1.363

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .gpa_0   (resd)    0.042    0.002   20.000    0.000    0.042    0.486
   .gpa_1   (resd)    0.042    0.002   20.000    0.000    0.042    0.476
   .gpa_2   (resd)    0.042    0.002   20.000    0.000    0.042    0.425
   .gpa_3   (resd)    0.042    0.002   20.000    0.000    0.042    0.355
   .gpa_4   (resd)    0.042    0.002   20.000    0.000    0.042    0.287
   .gpa_5   (resd)    0.042    0.002   20.000    0.000    0.042    0.229
   .intrcpt           0.043    0.007    6.525    0.000    0.968    0.968
   .occasin           0.004    0.001    6.273    0.000    0.953    0.953

Applied researchers commonly have difficulty interpreting the model due to past experience with SEM. While these are latent variables, they aren’t typical latent variables that represent underlying theoretical constructs. It doesn’t help that the output can be confusing, because now one has an ‘intercept for your intercepts’ and an ‘intercept for your slopes.’ In the multilevel context it makes sense, but there you know ‘intercept’ is just ‘fixed effect.’

This is the corresponding mixed model for comparison:

mixed_cluster_level_cov = lmer(
  gpa ~ sex + occasion + sex:occasion + (1 + occasion|student), 
  data = gpa
  )

summary(mixed_cluster_level_cov, cor = FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ sex + occasion + sex:occasion + (1 + occasion | student)
   Data: gpa

REML criterion at convergence: 256.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2556 -0.5409 -0.0142  0.5407  3.2263 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 student  (Intercept) 0.044096 0.20999       
          occasion    0.004328 0.06579  -0.14
 Residual             0.042388 0.20588       
Number of obs: 1200, groups:  student, 200

Fixed effects:
                   Estimate Std. Error t value
(Intercept)        2.559549   0.026418  96.888
sexfemale          0.075553   0.036460   2.072
occasion           0.091128   0.008429  10.811
sexfemale:occasion 0.028927   0.011634   2.486

Time-varying covariates

Mixed model

If we had a time-varying covariate, it’d look like the following. The gpa data doesn’t really come with a useful time-varying covariate, so I’ve added one just for demonstration: average weekly hours spent in the library (lib_hours).
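The fitting code isn’t shown here; a minimal sketch consistent with the summary below (lib_hours being the simulated covariate just mentioned):

gpa_mixed_tvc = lmer(
  gpa ~ occasion + lib_hours + (1 + occasion | student),
  data = gpa
)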

summary(gpa_mixed_tvc, cor = FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ occasion + lib_hours + (1 + occasion | student)
   Data: gpa

REML criterion at convergence: 48.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.4105 -0.5185 -0.0023  0.5202  2.9575 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 student  (Intercept) 0.033134 0.18203       
          occasion    0.002817 0.05307  -0.13
 Residual             0.037591 0.19388       
Number of obs: 1200, groups:  student, 200

Fixed effects:
            Estimate Std. Error t value
(Intercept) 2.385494   0.021079  113.17
occasion    0.082838   0.005196   15.94
lib_hours   0.032216   0.002024   15.92

Note that we could have a random slope for library hours if we wanted. The fixed effect for the covariate is still interpreted as it would be in a standard regression.

LGC

With time-varying covariates, the syntax starts to get tedious for the LGC. Here we add lib_hours to the model, which requires converting it to wide format and adding it to our previous data. Thus the lib_hours_* columns represent the average weekly hours spent in the library for each student at each semester.
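The model syntax for this first attempt isn’t shown; a sketch consistent with the output below would regress each GPA measure on its corresponding library-hours measure, keeping the equal residual variances (and assuming gpa_wide has been augmented with lib_hours_0 through lib_hours_5):

lgc_tvc_model <- '

  intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
  occasion  =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
  
  # time-varying covariate, coefficient freely estimated at each time point
  gpa_0 ~ lib_hours_0
  gpa_1 ~ lib_hours_1
  gpa_2 ~ lib_hours_2
  gpa_3 ~ lib_hours_3
  gpa_4 ~ lib_hours_4
  gpa_5 ~ lib_hours_5
  
  gpa_0 ~~ resid*gpa_0    # same residual variance for each time point
  gpa_1 ~~ resid*gpa_1
  gpa_2 ~~ resid*gpa_2
  gpa_3 ~~ resid*gpa_3
  gpa_4 ~~ resid*gpa_4
  gpa_5 ~~ resid*gpa_5
  
'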

lgc_tvc <- growth(lgc_tvc_model, data = gpa_wide)
summary(lgc_tvc, header = FALSE)

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept =~                                        
    gpa_0             1.000                           
    gpa_1             1.000                           
    gpa_2             1.000                           
    gpa_3             1.000                           
    gpa_4             1.000                           
    gpa_5             1.000                           
  occasion =~                                         
    gpa_0             0.000                           
    gpa_1             1.000                           
    gpa_2             2.000                           
    gpa_3             3.000                           
    gpa_4             4.000                           
    gpa_5             5.000                           

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  gpa_0 ~                                             
    lib_hours_0       0.045    0.004   10.701    0.000
  gpa_1 ~                                             
    lib_hours_1       0.039    0.003   13.514    0.000
  gpa_2 ~                                             
    lib_hours_2       0.033    0.002   13.752    0.000
  gpa_3 ~                                             
    lib_hours_3       0.028    0.003   11.271    0.000
  gpa_4 ~                                             
    lib_hours_4       0.024    0.003    8.527    0.000
  gpa_5 ~                                             
    lib_hours_5       0.022    0.003    6.348    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept ~~                                        
    occasion         -0.001    0.001   -0.656    0.512

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.000                           
   .gpa_1             0.000                           
   .gpa_2             0.000                           
   .gpa_3             0.000                           
   .gpa_4             0.000                           
   .gpa_5             0.000                           
    intercept         2.300    0.030   76.682    0.000
    occasion          0.122    0.011   11.123    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0   (resd)    0.036    0.002   20.000    0.000
   .gpa_1   (resd)    0.036    0.002   20.000    0.000
   .gpa_2   (resd)    0.036    0.002   20.000    0.000
   .gpa_3   (resd)    0.036    0.002   20.000    0.000
   .gpa_4   (resd)    0.036    0.002   20.000    0.000
   .gpa_5   (resd)    0.036    0.002   20.000    0.000
    intrcpt           0.030    0.005    6.019    0.000
    occasin           0.003    0.001    5.975    0.000

However, this result is not the same as our mixed model. Here is the corresponding graphical model. The * represents a coefficient that is freely estimated.

The problem here is similar to that seen with the residual variances. Unless we fix the coefficient to be constant, this is akin to having an interaction of the time-varying covariate with a categorical form of time. So within the same model we flip from treating time as a numeric, linear effect on the outcome to treating it as categorical. This is rarely done in typical mixed or other regression models, though for some reason it is the standard in the LGC setting. The following will get us back to the comparable mixed model.

lgc_tvc_model <- '

  intercept =~ 1*gpa_0 + 1*gpa_1 + 1*gpa_2 + 1*gpa_3 + 1*gpa_4 + 1*gpa_5
  occasion  =~ 0*gpa_0 + 1*gpa_1 + 2*gpa_2 + 3*gpa_3 + 4*gpa_4 + 5*gpa_5
  
  # time-varying covariates
  gpa_0 ~ lh_coef*lib_hours_0
  gpa_1 ~ lh_coef*lib_hours_1
  gpa_2 ~ lh_coef*lib_hours_2
  gpa_3 ~ lh_coef*lib_hours_3
  gpa_4 ~ lh_coef*lib_hours_4
  gpa_5 ~ lh_coef*lib_hours_5
    
  gpa_0 ~~ resid*gpa_0    # same residual variance for each time point
  gpa_1 ~~ resid*gpa_1
  gpa_2 ~~ resid*gpa_2
  gpa_3 ~~ resid*gpa_3
  gpa_4 ~~ resid*gpa_4
  gpa_5 ~~ resid*gpa_5  
  
'

lgc_tvc <- growth(lgc_tvc_model, data=gpa_wide)
summary(lgc_tvc, header = FALSE)

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept =~                                        
    gpa_0             1.000                           
    gpa_1             1.000                           
    gpa_2             1.000                           
    gpa_3             1.000                           
    gpa_4             1.000                           
    gpa_5             1.000                           
  occasion =~                                         
    gpa_0             0.000                           
    gpa_1             1.000                           
    gpa_2             2.000                           
    gpa_3             3.000                           
    gpa_4             4.000                           
    gpa_5             5.000                           

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  gpa_0 ~                                             
    lb_hr_0 (lh_c)    0.032    0.002   15.951    0.000
  gpa_1 ~                                             
    lb_hr_1 (lh_c)    0.032    0.002   15.951    0.000
  gpa_2 ~                                             
    lb_hr_2 (lh_c)    0.032    0.002   15.951    0.000
  gpa_3 ~                                             
    lb_hr_3 (lh_c)    0.032    0.002   15.951    0.000
  gpa_4 ~                                             
    lb_hr_4 (lh_c)    0.032    0.002   15.951    0.000
  gpa_5 ~                                             
    lb_hr_5 (lh_c)    0.032    0.002   15.951    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept ~~                                        
    occasion         -0.001    0.001   -0.973    0.331

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0             0.000                           
   .gpa_1             0.000                           
   .gpa_2             0.000                           
   .gpa_3             0.000                           
   .gpa_4             0.000                           
   .gpa_5             0.000                           
    intercept         2.385    0.021  113.389    0.000
    occasion          0.083    0.005   15.983    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .gpa_0   (resd)    0.038    0.002   20.000    0.000
   .gpa_1   (resd)    0.038    0.002   20.000    0.000
   .gpa_2   (resd)    0.038    0.002   20.000    0.000
   .gpa_3   (resd)    0.038    0.002   20.000    0.000
   .gpa_4   (resd)    0.038    0.002   20.000    0.000
   .gpa_5   (resd)    0.038    0.002   20.000    0.000
    intrcpt           0.033    0.005    6.148    0.000
    occasin           0.003    0.001    5.523    0.000

Compare again to the mixed model result.

summary(gpa_mixed_tvc, cor = FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: gpa ~ occasion + lib_hours + (1 + occasion | student)
   Data: gpa

REML criterion at convergence: 48.5

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.4105 -0.5185 -0.0023  0.5202  2.9575 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 student  (Intercept) 0.033134 0.18203       
          occasion    0.002817 0.05307  -0.13
 Residual             0.037591 0.19388       
Number of obs: 1200, groups:  student, 200

Fixed effects:
            Estimate Std. Error t value
(Intercept) 2.385494   0.021079  113.17
occasion    0.082838   0.005196   15.94
lib_hours   0.032216   0.002024   15.92

Now imagine having just a few of those kinds of variables as would be common in most longitudinal settings. In the mixed model framework one would add them in as any covariate in a regression model, and each covariate would be associated with a single fixed effect. In the LGC framework, one has to regress each time point for the target variable on its corresponding predictor time point. It might take a few paragraphs to explain the coefficients for just a handful of covariates. If you fix them to a single value, you would duplicate the mixed model, but the syntax requires even more tedium30.

Some differences between mixed models and growth curves

Wide vs. long

The SEM framework is inherently multivariate, i.e. assuming multiple outcomes, so your data will need to be in wide format. In the R world, this is ‘untidy’ data, and makes other data processing and visualization more tedious.

Random slopes

One difference seen in comparing LGC models vs. mixed models is that in the former, random slopes are always assumed, whereas in the latter, one would typically see if it’s worth adding random slopes in the first place, or simply not assume them.

Other random effects

Just about any LGC you come across in the wild has only one clustering level of interest. However, it’s very common to have multiple and non-hierarchical random effects structure or additional random effects beyond the time covariate. In our example, these might include school, district, or other complicated structure, or the library hours from the time-varying covariate example. Tools like lme4 handle random effects and complicated structure easily. SEM tools do not do this easily, and resort to the multilevel (long-format) approach, which more or less defeats the purpose of using them, as they merely mimic the standard mixed model approach, albeit with yet another and different type of syntax31. However, if you have other latent variables or complicated indirect effects, this may be the way to go.

Sample size

SEM is inherently a large sample technique. The growth curve model does not require as much for standard approaches, but may require a lot more depending on the model one tries to estimate. In my own simulations, I haven’t seen too much difference compared to mixed models even for notably small sample sizes, but those were for very simple models.

Number of time points

A basic growth curve model requires four time points to incorporate the flexibility that would make it worthwhile. Mixed models don’t have that restriction (outside of the obvious need for two). In addition, mixed models can handle any number of time points without changing the syntax at all, while the LGC is rarely applied to more than a handful of time points. Even then, when you have many time-varying covariates, which is common, the model syntax is tedious, and the number of parameters to estimate climbs rapidly, as the default model assumes interactions with time as a categorical variable.

Balance

Mixed models can run even if some clusters have a single value. SEM requires balanced data, and so one will always have to estimate missing values or drop them. Whether this missingness can be ignored in the standard mixed model framework is a matter of some debate. Most disciplines typically ignore the missingness, which for mixed models means assuming the observations are missing at random (MAR). With the LGC, the default is simply to drop any observation with missing values, and so the assumption there is missing completely at random (MCAR), a stronger assumption.
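If dropping cases is not acceptable, lavaan can use full information maximum likelihood instead, which relaxes the assumption to MAR; a minimal sketch with our earlier model:

# 'fiml' (a.k.a. 'ml') estimation instead of listwise deletion
lgc_init_fiml = growth(lgc_init_model, data = gpa_wide, missing = 'fiml')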

Numbering the time points

Numbering your time from zero makes sense in both worlds. This leads to the natural interpretation that the intercept is the mean of the outcome for your first time point. In other cases having a centered value would make sense, or numbering from 0 to a final value of 1, which would mean the slope coefficient represents the change over the whole time span.
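For example, one could rescale occasion to run from 0 to 1, so the slope reflects change over the whole study; an illustrative sketch (names are my own):

# occasion runs 0-5 in the gpa data; rescale so 1 marks the final semester
gpa_rescaled = gpa %>% 
  mutate(occasion_01 = occasion / max(occasion))

mixed_total_change = lmer(gpa ~ occasion_01 + (1 + occasion_01 | student), data = gpa_rescaled)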

Summary of LGC

Latent Growth Curve modeling is an alternative way to do what is very commonly accomplished through mixed models, and allows for more complex models than typically seen with standard mixed models. One’s default should probably be to use the far more commonly used, and probably more flexible (in most situations), mixed modeling tools; there are packages in R that can handle nonlinear effects, mediation, and multivariate outcomes for mixed models. I have other documents regarding mixed models on my website and code at GitHub, including a document that makes more comparisons to growth curve models. However, the latent variable approach may provide what you need in some circumstances, and at the very least gives you a fresh take on the standard mixed model perspective.

Correlation Structure Revisited

Let’s revisit the notion of autocorrelation/autoregressive (AR) residual structure. We’ll start by recalling the AR structure we noted before, with \(\rho\) our estimate of the covariance/correlation. For the following depiction, we have four observations per cluster, and observations are correlated in time. However, the residual correlation decreases the further apart in time the observations are from one another.

\[\Sigma = \sigma^2 \left[ \begin{array}{cccc} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \\ \end{array}\right]\]

How does this get into our model? We can find out via simulation. The next bit of code follows lme4 developer Ben Bolker’s example. Here we create a variable which is a multivariate draw with the specified correlational structure. In addition, we’ll have a single covariate, call it x. The linear predictor is based on an intercept value of 5 and a coefficient for x of .5.

simGroup <- function(g, n = 5, rho = 0.7, sigma = .5) {
  # create time points and group id
  times <- factor(1:n)
  group <- factor(rep(g, n))
  
  # create the ar structure
  cor_struct <- rho^as.matrix(dist(1:n))   
  
  # if you want to play around with the estimated variance of glmmTMB you can,
  # but this will change what the expected correlation should be; use cov2cor on
  # cor_struct after adding the diagonal to see that value
  # diag(cor_struct) = 2.5
  
  # Simulate the process
  x_ar <- MASS::mvrnorm(mu = rep(0, n), Sigma = cor_struct) 
  
  # add another covariate
  x <- rnorm(n)
  
  # linear predictor
  mu <- 5 + .5*x
  
  # Add measurement noise and create target variable
  y <-  mu + x_ar + rnorm(n, sd = sigma)                      
  
  data.frame(y, times, x, group)
}

simGroup(1)
         y times          x group
1 4.239340     1 -0.5119390     1
2 4.927270     2 -0.6460704     1
3 4.328641     3 -0.9695196     1
4 4.126718     4 -0.8317013     1
5 5.389997     5  1.0799571     1

Now let’s do this for 500 groups or clusters, each with 10 observations, sigma equal to 1.5 and \(\rho\) set at 0.8.

library(purrr)   # map_df comes from purrr (also loaded with tidyverse)

set.seed(1234)

test_df = map_df(1:500, simGroup, n = 10, rho = .8, sigma = 1.5) 

head(test_df)
             y times           x group
1...1 6.924564     1 -0.47719270     1
2...2 5.349769     2 -0.99838644     1
3...3 4.673456     3 -0.77625389     1
4...4 5.626127     4  0.06445882     1
5...5 4.831887     5  0.95949406     1
6...6 3.813962     6 -0.11028549     1

Now we run and summarize the model with glmmTMB.

library(glmmTMB)

model_tmb = glmmTMB(y ~ x + ar1(times + 0 | group), data = test_df)

summary(model_tmb)
 Family: gaussian  ( identity )
Formula:          y ~ x + ar1(times + 0 | group)
Data: test_df

     AIC      BIC   logLik deviance df.resid 
 19586.5  19619.1  -9788.3  19576.5     4995 

Random effects:

Conditional model:
 Groups   Name   Variance Std.Dev. Corr      
 group    times1 1.007    1.003    0.79 (ar1)
 Residual        2.180    1.477              
Number of obs: 5000, groups:  group, 500

Dispersion estimate for gaussian family (sigma^2): 2.18 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  5.04663    0.03842  131.35   <2e-16 ***
x            0.54392    0.02393   22.73   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We recover our estimates. The ar1 value is very close to the true value of 0.8 we specified, the residual standard deviation is close to 1.5, and our fixed effects are also as expected. We also have a value approaching 1 for the times1 variance, which is the diagonal of the AR correlation matrix and, in our simulation code, is separate from what we draw for the observation level residual variance.

Likewise, brms has similar syntax in that we add the AR component to the formula.

library(brms)

model_brm = brm(
  y ~ x + ar(times, group),
  data = test_df,
  cores = 4,
  seed = 1234
)
summary(model_brm)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ x + ar(times, group) 
   Data: test_df (Number of observations: 5000) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Correlation Structures:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
ar[1]     0.25      0.01     0.22     0.27 1.00     4777     3262

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     5.04      0.03     4.98     5.11 1.00     4929     3310
x             0.55      0.02     0.51     0.60 1.00     5076     3307

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.73      0.02     1.70     1.77 1.00     4763     3192

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

While the focus here is on the other packages, we could also use the gls function in nlme, which gives the same result as brms.

gls(
  y ~ x, 
  data = test_df, 
  correlation = corAR1(form = ~as.numeric(times)|group)
)
Generalized least squares fit by REML
  Model: y ~ x 
  Data: test_df 
  Log-restricted-likelihood: -9853.791

Coefficients:
(Intercept)           x 
  5.0443811   0.5514778 

Correlation Structure: AR(1)
 Formula: ~as.numeric(times) | group 
 Parameter estimate(s):
      Phi 
0.2453209 
Degrees of freedom: 5000 total; 4998 residual
Residual standard error: 1.784043 

It may seem at first blush that glmmTMB and brms have come to different conclusions about the correlation and variance estimates. However, close inspection reveals they are in fact providing the same information from different viewpoints. The simulation code shows how we start with our linear predictor, which includes the fixed effects, then add the random effect with autoregressive structure, and finally add our residual variance. But the way brms (and nlme) estimates it is how we’ve shown it in the matrix formulation above, as a single residual/random effect.


\[\Sigma_{brms} = \sigma^2_{brms} \left[ \begin{array}{cccc} 1 & \rho_{brms} & \rho_{brms}^2 & \rho_{brms}^3 \\ \rho_{brms} & 1 & \rho_{brms} & \rho_{brms}^2 \\ \rho_{brms}^2 & \rho_{brms} & 1 & \rho_{brms} \\ \rho_{brms}^3 & \rho_{brms}^2 & \rho_{brms} & 1 \\ \end{array}\right]\]


\[\Sigma_{tmb} = \sigma^2_{ar} \left[ \begin{array}{cccc} 1 & \rho_{tmb} & \rho_{tmb}^2 & \rho_{tmb}^3 \\ \rho_{tmb} & 1 & \rho_{tmb} & \rho_{tmb}^2 \\ \rho_{tmb}^2 & \rho_{tmb} & 1 & \rho_{tmb} \\ \rho_{tmb}^3 & \rho_{tmb}^2 & \rho_{tmb} & 1 \\ \end{array}\right] + \sigma^2_{tmb} \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{array}\right]\]

So if we take the glmmTMB ar1 estimate and variance estimates we can recover the brms estimate.
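The values used below aren’t created in the code shown; a sketch of how they might be extracted from the fitted models (the extraction approach is an assumption on my part):

# glmmTMB: AR(1) correlation, AR component variance, residual variance
vc_tmb       = VarCorr(model_tmb)$cond$group
ar_tmb       = attr(vc_tmb, 'correlation')[1, 2]
ar_var_tmb   = attr(vc_tmb, 'stddev')[1]^2
res_var_tmb  = sigma(model_tmb)^2

# brms: posterior means for the AR(1) coefficient and residual SD
ar_brm       = posterior_summary(model_brm)['ar[1]', 'Estimate']
res_var_brms = posterior_summary(model_brm)['sigma', 'Estimate']^2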

ar_tmb * ar_var_tmb/(ar_var_tmb + res_var_tmb)
[1] 0.2494005
ar_brm
[1] 0.247
# if we assume ar var = 1
ar_brm*res_var_brms
[1] 0.7434206
ar_tmb
[1] 0.7895
# if we knew the ar variance, we could use brms to get to tmb's estimate
ar_brm*(ar_var_tmb + res_var_tmb)/ar_var_tmb
[1] 0.7819009

The glmmTMB approach is interesting in how it explicitly separates out the AR component from the rest of the residual component. This seems non-standard, as I don’t recall papers reporting the AR standard deviation, for example, and every depiction I come across in the mixed model literature is the one that underlies brms. However, it might be useful and/or interesting in some settings, or maybe even preferable as an additional interpretation for a random effect, similar to the ones we commonly use.

We can change our function to force glmmTMB to come to the same conclusion as brms by not distinguishing the variance components. In this case, glmmTMB will move all residual variance to the AR estimate, and the estimated correlation is the same as what brms reports.

simGroup_alt <- function(g, n = 5, rho = 0.7, sigma = .5) {
  # create time points and group id
  times <- factor(1:n)
  group <- factor(rep(g, n))
  
  # create the ar structure
  cor_struct <- rho^as.matrix(dist(1:n))   
  
  # combine the residual variance
  resid_struct <- cor_struct*sigma^2
  
  # Simulate the process; note the difference
  x_ar <- MASS::mvrnorm(mu = rep(0, n), Sigma = resid_struct) 
  
  # add another covariate
  x <- rnorm(n)
  
  # linear predictor
  mu <- 5 + .5*x
  
  # Create target variable; residual already incorporated into x_ar
  y <-  mu + x_ar           
  
  data.frame(y, times, x, group)
}


set.seed(1234)

test_df_alt = map_df(1:500, simGroup_alt, n = 10, rho = ar_brm, sigma = 1.5) 

model_tmb_alt = glmmTMB(y ~ x + ar1(times + 0 | group), data = test_df_alt)

summary(model_tmb_alt)
 Family: gaussian  ( identity )
Formula:          y ~ x + ar1(times + 0 | group)
Data: test_df_alt

     AIC      BIC   logLik deviance df.resid 
 17774.8  17807.4  -8882.4  17764.8     4995 

Random effects:

Conditional model:
 Groups   Name   Variance Std.Dev. Corr      
 group    times1 2.17049  1.4733   0.27 (ar1)
 Residual        0.01834  0.1354             
Number of obs: 5000, groups:  group, 500

Dispersion estimate for gaussian family (sigma^2): 0.0183 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  4.96241    0.02665  186.20   <2e-16 ***
x            0.47687    0.01974   24.15   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Summary of residual correlation structure

What both of these syntactical approaches make clear is that, in specifying a particular correlational structure, we can think of it as adding a latent variable, i.e. a random effect, to our standard model, just as we have done with other random effects. This particular random effect induces correlated observations within a group as specified by the structure. The same thing applies in the case of heterogeneous variances, except that no specific correlation is assumed in that case. As such, it may make more sense to think of it as an additional component of the model structure/formula, as opposed to a function argument separate from your theoretical focus.


  28. I’m omitting the observation level subscript, so this can work for the single observation or entire data set.↩︎

  29. Usually we would draw both random effects from a multivariate normal distribution with some covariance.↩︎

  30. To be fair, Mplus (and presumably lavaan at some point in the future) has shortcuts to make the syntax easier, but it also can make for more esoteric and less understandable syntax.↩︎

  31. Honestly, for the same types of models I find the multilevel syntax of Mplus ridiculously complex relative to R packages.↩︎