Module 2 Linear Mixed Models

Reading

Basic Concepts

Examples of clustered data
  • Longitudinal data: Repeated measures data on each subject
    • yit denotes the measure at time t for individual i.
  • Clustered or multilevel data: Observations within nested group structures.
    • A collection of units from a population of similar units, e.g., yij denotes measure from person i in demographic group j.

Motivating Example

Example of both longitudinal data and grouped/nested data:

The Treatment of Lead-Exposed Children (TLC) was examined using a placebo-controlled randomized study of succimer (a chelating agent) in children with blood lead levels of 20-44 micrograms/dL. These data consist of four repeated measurements of blood lead levels obtained at baseline (or week 0) , week 1, week 4, and week 6 on 100 children who were randomly assigned to chelation treament with succimer or placebo.

Variables:

  • ID: Child ID number

  • Treatment Group: Placebo vs. Succimer

  • Week: week X (x=0, 1, 4, 6)

  • Lead: lead level

tlcdata <- readRDS("data/tlc.RDS")
head(tlcdata)
    id treatment week lead
1    1   placebo    0 30.8
101  1   placebo    1 26.9
201  1   placebo    4 25.8
301  1   placebo    6 23.8
2    2  succimer    0 26.5
102  2  succimer    1 14.8
library(ggplot2)
library(dplyr)
library(gridExtra) 

placebo <- tlcdata %>% 
  filter(treatment == "placebo") %>%
  ggplot() +
  geom_line(aes(x = week, y = lead, col = id)) +
  labs(x = "Week", y="Lead", title = "Placebo") +
  theme(legend.position = "none")

succimer <- tlcdata %>% 
  filter(treatment == "succimer") %>%
  ggplot() +
  geom_line(aes(x = week, y = lead, col = id, linetype = treatment)) +
  labs(x = "Week", y="Lead",title = "Succimer") +
  theme(legend.position = "none")

gridExtra::grid.arrange(placebo, succimer)

  • What are some scientific questions we may be interested in studying?

The Wrong Approach : Ignoring clustered or longitudinal structure= The wrong way to do it!!!

yij=β0+β^1xij+ϵijϵijiidN(0,σ2)

fit.lm = lm(lead ~ week, data = tlcdata)
summary(fit.lm)

Call:
lm(formula = lead ~ week, data = tlcdata)

Residuals:
    Min      1Q  Median      3Q     Max 
-19.775  -3.749   0.328   4.454  43.330 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  22.9760     0.6079  37.793   <2e-16 ***
week         -0.4010     0.1670  -2.401   0.0168 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.966 on 398 degrees of freedom
Multiple R-squared:  0.01428,   Adjusted R-squared:  0.0118 
F-statistic: 5.765 on 1 and 398 DF,  p-value: 0.01681
  • What assumptions of OLS are we violating?

  • If observations were uncorrelated, i.e., if we had measured different subjects at 0, 1, 4, and 6 weeks post-exposure, then we could fit the standard linear model:

    yit=β0+β1I(t=1)+β2I(t=4)+β3I(t=6)+eit

    • There is an indicator for each time point, where t=0 is treated as the reference group.

    • BUT the data are correlated. How do we handle this?

    • We extend the linear model to allow correlation (covariance).

    • A Better Approach: Linear Mixed Models

Approach 2: Mixed Models

Linear Mixed Models or LMMs
  • LMMs are extensions of regression in which intercepts and/or slopes are allowed to vary across individuals.

  • Mixed models were developed to deal with correlated data.

  • They allow for the inclusion of both fixed and random effects.

    • Fixed effect: a global or constant parameter that does not vary.

    • Random effect: Parameters that are random variables and are assigned a distributional assumption, e.g., βN(μ,σ2).

Fixed effects parametrization

  • Taking our motivating example, we can assign a separate intercept for each child:

    yij=β0i+β1xij+ϵij,ϵijiidN(0,σ2)

Interpretation:

  • β0i is the child-specific weight at zero and β1 is the constant slope.
  • σ2 captures within child variability.
  • Limitations:

    • Estimating lots of parameters: subject-specific coefficients do not leverage overall population information and have less precision because of smaller sample size.

    • Cannot forecast the grow curve of a new child.

Fitting a fixed effects model in R:

We treat the “group” or “nested” unit as a factor/dummy variable. Therefore, we DO NOT include the intercept (see Module 1). Including the “cluster” as a factor allows for individual intercepts across children. Use the factor(id) function to create a factor variable for child ID.

fit.strat = lm(lead ~week + factor(id)-1, data =tlcdata) #do not include the intercept!!
summary(fit.strat$coefficients[1])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -0.401  -0.401  -0.401  -0.401  -0.401  -0.401 
summary(fit.strat$coefficients[2])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  27.93   27.93   27.93   27.93   27.93   27.93 
  • Data are balanced (same number of observations for each child).

  • The slope of week is the same in the model with a single intercept and the model with child-specific intercepts.

  • Controlling for group-specific intercepts gives a smaller standard error for the slope of week because week is the same for all children.

  • Note that oftentimes, the standard error will be larger if there is sparse or unbalanced data.

In class exercise

What is the interpretation of βj^ in this model?

Random Intercept Model

Consider the random intercept model with a vector of predictors xij:

yij=μ+θi+xijTβ+ϵijϵijiidN(0,σ2)θiiidN(0,τ2)θiϵij

  • μ denotes the overall intercept (population mean when xij=0.

  • θi denotes the child specific difference from μ.

  • β0i=μ+θi participant specific intercept or child specific average.

  • β time coefficient that does not vary across participants.

  • τ2 denotes the between participant variability and here is the random effect variance.

  • σ2 denotes residual variance or within group variability “measurement error”.

What makes this a mixed model?

β0i is random vs. β is global or fixed.

The following two models are equivalent:

Model 1: yij=(μ+θi)+xijTβ+ϵij,θiN(0,τ2)andϵijN(0,σ2)

Model 2: The hierarchical model framework

yij=β0i+xijβ+ϵijβ0iN(μ,τ2)ϵijN(0,σ2)

  • Model 2 is an example of a hierarchical model in which there are hierarchical structures placed on random effects. Here the random coeffcient β0i have a higher level mean μ.

  • Assumptions:

    • ϵijθi for all i and j.

    • θi are indentical and independent and normal for all i.

    • ϵij are identical and independent and normal for all i and j.

Properties of the Random Intercept Model

In-class exercise
  1. What is the overall (average) trend?

  2. What is total variability around the overall trend?

  3. What is the conditional (group-specific) trend?

  4. What is the conditional (within-group) residual variance?

Fitting a mixed effects model with a random intercept in R:

library(lmerTest)
Loading required package: lme4
Loading required package: Matrix

Attaching package: 'lmerTest'
The following object is masked from 'package:lme4':

    lmer
The following object is masked from 'package:stats':

    step
fit.randomeffects = lmer(lead ~ week + (1|id), data = tlcdata)
summary(fit.randomeffects)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: lead ~ week + (1 | id)
   Data: tlcdata

REML criterion at convergence: 2695.1

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.6337 -0.3461  0.0601  0.4131  6.1167 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 29.63    5.444   
 Residual             33.98    5.829   
Number of obs: 400, groups:  id, 100

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  22.9760     0.7030 161.6442  32.683  < 2e-16 ***
week         -0.4010     0.1222 299.0000  -3.281  0.00116 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr)
week -0.478
  • Compared to a model fitted with group dummy variables, the week slope estimate and SE are identical.

Model Interpretation

Interpretation: The fixed effect captures the global or average change in lead level for every one unit change in time, i.e., from week 1 to week 2 β^1=0.410 meaning that there is an average 0.4 decrease in lead for every unit increase in week which was found to be statistically significant p<0.01.

Now the intercept 22.97 corresponds to the population-average lead at week 0.

The SE of the slope se(β1^)=0.122 capturing uncertainty associated with the estimate.

The random effect of child β0i allows for child-specific random intercepts, where the variance of the intercepts is 29.63.

#Extracting the random effects theta_i's
random.eff = ranef(fit.randomeffects)$id
hist(random.eff$`(Intercept)`)

Let’s fit an alternative model to the data including a treatment effect:

library(lmerTest)
fit.randomeffects = lmer(lead ~ week + treatment +  (1|id), data = tlcdata)
summary(fit.randomeffects)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: lead ~ week + treatment + (1 | id)
   Data: tlcdata

REML criterion at convergence: 2670.2

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4885 -0.3686 -0.0176  0.4514  6.3428 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 22.09    4.700   
 Residual             33.98    5.829   
Number of obs: 400, groups:  id, 100

Fixed effects:
                  Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)        25.7648     0.8512 136.0165  30.269  < 2e-16 ***
week               -0.4010     0.1222 299.0000  -3.281  0.00116 ** 
treatmentsuccimer  -5.5775     1.1060  98.0000  -5.043  2.1e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) week  
week        -0.395       
trtmntsccmr -0.650  0.000
In class exercise

What are the overall findings from this model? And how does this model differ from the previous model?

Covariance Structure

  • A random intercept model is also known as a two-level variance component model:

yij=μ+θi+βxij+ϵijθiiidN(0,τ2)ϵijiidN(0,σ2)

Random Intercept Model in Matrix Form

Random intercept model

Consider the mixed model with random intercepts for n groups and define N=i=1nri.

y=Zθ+Xβ+ϵ

where

  • y=N×1 vector of response

  • Z=N×n design matrix of indicator variables for each group.

  • θ=n×1 vector of random intercepts.

  • X=N×p design matrix of fixed effects (including overall intercept).

  • β=p×1 vector of fixed effects.

  • ϵ=N×1 vector of residual error.

Assumptions of the above model
  • θN(0,τ2In×n.

  • ϵN(0,σ2IN×N).

Summary I

What is the model?
  • In the random intercept model, we regress a vector of outcomes yij for individual i and within measure j on values on a vector of predictor values xij with:
  • Y=μ+θi+xijβ+ϵij
  • μ is the population mean or overall average across all subjects-times.
Model Assumptions
  • ϵijN(0,σ2)
  • Var(ϵ)=σ2 constant within subject variance
  • θiN(0,τ2).
  • Var(θi)=τ2 across subject variance.
  • θiϵij
Interpretation
  • θi *individual deviations away from the population mean μ.
  • β1 the global value of the change of Y for every one unit change of X.
  • σ2+τ2 is the total variance.

Intraclass Correlation

  • Note that the within-group covariance is: Cov(yij,yij)=τ2 .

  • So the correlation between observations within within the same group is

ρ=Corr(yij,yij)=τ2τ2+σ2 for all jj.

  • The value ρ is often called the intraclass correlation. It measures the degree of similarity among same-group observations compared to the residual error σ2.
  • Describes how strongly units in the same group resemble eachother.
  • ρ0 when τ20 (i.e. same intercept)
  • ρ0 when σ2 (i.e., growing measurement error).
  • ρ1 when τ2 (i.e., large separation in intercepts).
  • ρ1 when σ20 (i.e., zero measurement error).
  • The above ICC has an exchangeable structure because the correlation is constant between any pair of within-group observations.

Simulation Varying Between Subject Variability

Simulation Varying Measurement Error

Shrinkage and Random Effects

  • To simplify the derivation first consider a random effects model without fixed effects:

yij=θi+ϵij,θiiidN(0,τ2),ϵijiidN(0,σ2) - The join density of the data and random effects is given by:

i,jf(yij,θi)=i,jf(yij|θi)ig(θi)

  • Now consider the matrix formulation y=Zθ+ϵ where Z= deisgn matrix of indicator variables denoting the ijth observation belongs to group i.

  • Given the values of σ2 and τ2, it is easy to find the closed form solution to this.

  • Note that:

    • θ^i0 when τ20 (shrinks all random intercepts to zero).
    • θ^iy¯i. when τ2 (no shrinkage = raw group mean estimates).
    • θ^iy¯i. when σ20 (no shrinkage = raw group mean estimates).
    • θ^iy¯i. when r (no shrinkage = raw group mean estimates).

    τ2 controls the amount of shrinkage and how much information to borrow across groups.

Hierarchical Formulation

  • Now we assume the random effects are centered around a common mean. μ. Let’s take look at the hierarchical formulation.

yij=θi+ϵij,θiN(μ,τ2),ϵijN(0,σ2)

  • The joint density of the data and random effects is a multivariate normal given by:

i,jf(yij,θi)=i,jf(yij|θi)ig(θi)exp(12σ2[(yZθ)T(yZθ)+σ2τ2(θμ)T(θμ)])

Taking the first derivative, setting to 0, and solve! (On your own time).

Borrowing of Information
  • We can express θ^i as

θ^i=(1/τ2)μ+(ri/σ2)y¯i.1/τ2+(ri/σ2)

  • Since σ2/ri is the sample variance of the estimated sample mean y¯i., the above form shows that random effects can be viewed as weighted average of:

    • subject-specific mean: y¯i..
    • overall mean μ
  • We can also express θ^i in terms of the intraclass correlation ρ=τ2/(τ2+σ2).

    θ^i=ρ1μ+ri(1ρ)1y¯i.ρ1+ri(1ρ)1

  • Less shrinkage is expected for ρ1.

  • For unknown variance parameters θ^i is the BLUP: Best Linear Unbiased Predictor.

    • Unbiased: E(θ^i)=μ.
    • Best Linear Unbiased Prediction (BLUP) = Best means unbiased and minimum variance: E(θ^iθi)2 is minimized.
  • Note that

    • θ^iμ when you τ20 (shrink ALL random intercepts to a common mean μ).
    • θ^iy¯i. when τ2 (no shrinkage = raw mean estimates).

Parameter Estimation

y=Zθ+Xβ+ϵ - Since θ and ϵ are random effects, we can rewrite the above as

y=Xβ+ϵϵ=Zθ+ϵ - This is equivalent to integrating out the random effects. Then the marginal model is:

yN(Xβ,V)

  • The MLE approach from simple linear regression does not work because data are now correlated. We extend the simple OLS estimate to include covariance V:
MLE estimation

β^=(XTV1X)1XTV1y

  • The MLE estimate of the variances is biased! (Similar to the estimate in simple linear regression).

  • An alternative is the restricted maximum likelihood (REML):

σ^MLE2=1ni=1n(xix¯)2

vs.

σ^REML2=1n1i=1n(xix¯)2

Random Intercept and Random Slope Model

  • What are some scientific questions we may be interested in studying?
  • Consider the mixed effects model with a vector of predictors xij:

yij=(β0+θ0i)+(β1+θ1i)xij+ϵij,ϵijiidN(0,σ2)

  • $_{0i} = _0 + $ participant specific intercept or child specific average.

  • β1i=β1+θ1i participant specific rate of change or slope.

  • β0,β1 the intercept and slope that does not vary across participants.

  • σ2 within subject variability ie measurement error.

In class exercise

What do we do for θ1i?

Fitting the Random Intercept and Slope model in R

Let’s fit the model using the tlc data:

placebo_data = tlcdata %>% filter(treatment == "placebo")
fit_random <- lmer(lead ~ week + (week|id), data = placebo_data)
summary(fit_random)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: lead ~ week + (week | id)
   Data: placebo_data

REML criterion at convergence: 1053.3

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.20808 -0.49837 -0.02268  0.47100  2.30541 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 id       (Intercept) 23.3113  4.8282       
          week         0.1581  0.3976   0.02
 Residual              4.5016  2.1217       
Number of obs: 200, groups:  id, 50

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept) 25.68536    0.72018 49.01076  35.665  < 2e-16 ***
week        -0.37213    0.08437 49.00066  -4.411 5.64e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
     (Intr)
week -0.164

Interpretation:

  • Child-specific intercept: (β0+θ0i)N(25.68,23.31).
  • Child-specific slope: (β1+θ1i)N(0.372,0.158).
  • The correlation rho between (β0iβ1i)=0.02.
  • The 95% CIs of the subject-specific slopes does not include zero in the distribution. This we reject the null that a child’s lead levels do not increase with time.

Comparing Models

  • Is the random intercept plus slope preferred over random intercept only?
AIC
  • AIC: Akaike’s Information Criterion.

  • AIC is often used in model selection. It is a popular approach to choosing which variables should be in the model. AIC=2l(θ)+2p where l(θ) is the log-likelihood for all parameters θ and p is the number of parameters.

  • The MLE estimate of the variances is biased! (Similar to the estimate in simple linear regression).

  • The general rule of thumb is a differenc of 2+ is substantially better. A lower AIC is better.

  • The AIC uses the maximum likelihood so use REML=FALSE to set the parameter estimation to MLE.

BREAKOUT SESSION 2
  1. Read in the Nepalese Children data from Canvas.

Motivating Example

Study Design:

  1. Time-varying variables of 200 children collected up to 5 time points about 4 months apart.
  • age (month), indicator for current breastfeeding status, arm circumference (cm), height (cm), weight(kg).
  1. Time invariant baseline information:
  • sex of the child, mother’s age at birth, indicator of mother’s literacy.
load("/Users/emilypeterson/Library/CloudStorage/OneDrive-EmoryUniversity/BIOS 526/BIOS526_Book/data/Nepal200.RData")
  1. Review the data to make sure you understand the variables. \
  2. Fit the random intercept model using the code given above where arm circumference is regressed on age .
  3. Interpret the findings from this model.
  4. Fit the same model, but using MLE instead. Do this by setting REML=FALSE as a parameter in your code.
  5. Identify the differences in coefficient estimates.
  6. Fit the random intercept and slope model using the code given above where arm circumference is regressed on age.
  7. Interpret the findings of this model.
  8. Compare the random intercept only, and the random intercept plus slope model using AIC metrics. REML=FALSE must be included in the fit<lmer(y...,REML=F) function call. To get the AIC from a model fit using the function AIC(fit).
  9. Based on the AIC findings, which model is better?