8.8 Application: Mixed Models in Practice

Several R packages are available for fitting mixed-effects models, each with unique strengths:

  • nlme
    • Well suited to nested random effects and longitudinal data.
    • Flexible specification of within-group correlation and variance (covariance) structures.
    • Less intuitive syntax than lme4, and crossed random effects are awkward to specify (see the syntax sketch after this list).
  • lme4
    • Computationally efficient and widely used.
    • User-friendly formula syntax.
    • Handles non-normal responses via glmer() (i.e., GLMMs).
    • For detailed documentation, refer to D. Bates et al. (2014).
  • Others:
    • Bayesian Mixed Models: MCMCglmm, brms.
    • Genetics/Plant Breeding: ASReml.
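
A minimal syntax sketch contrasting the two main interfaces (y, group, and dat are placeholder names, not objects used later in this section):

library(lme4)
library(nlme)

# The same random-intercepts model written both ways (placeholder data)
fit_lmer <- lmer(y ~ 1 + (1 | group), data = dat)         # lme4: random term inside the formula
fit_lme  <- lme(y ~ 1, random = ~ 1 | group, data = dat)  # nlme: separate 'random' argument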

8.8.1 Example 1: Pulp Brightness Analysis

8.8.1.1 Model Specification

We start with a random-intercepts model for pulp brightness:

\[ y_{ij} = \mu + \alpha_i + \epsilon_{ij} \]

Where:

  • \(i = 1, \dots, a\) indexes the groups (levels of the random effect \(\alpha_i\)).

  • \(j = 1, \dots, n\) indexes the observations within each group.

  • \(\mu\) is the overall mean brightness (fixed effect).

  • \(\alpha_i \sim N(0, \sigma^2_\alpha)\) is the group-specific random effect.

  • \(\epsilon_{ij} \sim N(0, \sigma^2_\epsilon)\) is the residual error, independent of the \(\alpha_i\).

This implies a compound symmetry structure, where the intraclass correlation coefficient is:

\[ \rho = \frac{\sigma^2_\alpha}{\sigma^2_\alpha + \sigma^2_\epsilon} \]

  • If \(\sigma^2_\alpha \to 0\): low correlation within groups (\(\rho \to 0\)), i.e., the grouping factor explains little of the variation.
  • If \(\sigma^2_\alpha\) is large relative to \(\sigma^2_\epsilon\): high correlation within groups (\(\rho \to 1\)). The sketch below illustrates the implied within-group covariance matrix.
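
As a quick illustration (with made-up variance components, not values estimated from any data), the covariance matrix of the observations in one group is \(\sigma^2_\alpha + \sigma^2_\epsilon\) on the diagonal and \(\sigma^2_\alpha\) off the diagonal, so every within-group correlation equals \(\rho\):

# Illustrative (made-up) variance components
sigma2_alpha <- 0.07   # between-group variance
sigma2_eps   <- 0.11   # residual variance
n <- 5                 # observations per group

# Compound-symmetry covariance of (y_i1, ..., y_in)
V <- sigma2_alpha * matrix(1, n, n) + sigma2_eps * diag(n)

# Every off-diagonal correlation equals the intraclass correlation
rho <- sigma2_alpha / (sigma2_alpha + sigma2_eps)
all.equal(cov2cor(V)[1, 2], rho)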

8.8.1.2 Data Exploration

data(pulp, package = "faraway")
library(ggplot2)
library(dplyr)

# Visualize brightness by operator
ggplot(pulp, aes(x = operator, y = bright)) +
    geom_boxplot(fill = "lightblue") +
    labs(title = "Pulp Brightness by Operator",
         x = "Operator",
         y = "Brightness") +
    theme_minimal()


# Group-wise summary
pulp %>%
    group_by(operator) %>%
    summarise(average_brightness = mean(bright), .groups = 'drop')
#> # A tibble: 4 × 2
#>   operator average_brightness
#>   <fct>                 <dbl>
#> 1 a                      60.2
#> 2 b                      60.1
#> 3 c                      60.6
#> 4 d                      60.7

8.8.1.3 Fitting the Mixed Model with lme4

library(lme4)

# Random intercepts model: operator as a random effect
mixed_model <- lmer(bright ~ 1 + (1 | operator), data = pulp)

# Model summary
summary(mixed_model)
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: bright ~ 1 + (1 | operator)
#>    Data: pulp
#> 
#> REML criterion at convergence: 18.6
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.4666 -0.7595 -0.1244  0.6281  1.6012 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  operator (Intercept) 0.06808  0.2609  
#>  Residual             0.10625  0.3260  
#> Number of obs: 20, groups:  operator, 4
#> 
#> Fixed effects:
#>             Estimate Std. Error      df t value Pr(>|t|)    
#> (Intercept)  60.4000     0.1494  3.0000   404.2 3.34e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Fixed effects (overall mean)
fixef(mixed_model)
#> (Intercept) 
#>        60.4

# Random effects (BLUPs)
ranef(mixed_model)
#> $operator
#>   (Intercept)
#> a  -0.1219403
#> b  -0.2591231
#> c   0.1676679
#> d   0.2133955
#> 
#> with conditional variances for "operator"

# Variance components
VarCorr(mixed_model)
#>  Groups   Name        Std.Dev.
#>  operator (Intercept) 0.26093 
#>  Residual             0.32596
re_dat <- as.data.frame(VarCorr(mixed_model))

# Intraclass Correlation Coefficient
rho <- re_dat[1, 'vcov'] / (re_dat[1, 'vcov'] + re_dat[2, 'vcov'])
rho
#> [1] 0.3905354

8.8.1.4 Inference with lmerTest

To obtain p-values for fixed effects using Satterthwaite’s approximation:

library(lmerTest)

# Model with p-values
summary(lmer(bright ~ 1 + (1 | operator), data = pulp))$coefficients
#>             Estimate Std. Error df  t value     Pr(>|t|)
#> (Intercept)     60.4  0.1494434  3 404.1664 3.340265e-08

# Confidence interval for the fixed effect
confint(mixed_model)[3,]
#>   2.5 %  97.5 % 
#> 60.0713 60.7287

Here the profile-likelihood interval from confint() is broadly consistent with a t-based interval built from the Satterthwaite standard error and degrees of freedom reported by lmerTest; a sketch of the latter follows.
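
A sketch of that t-based interval, assuming the model was fitted with lmerTest loaded so that the coefficient table includes a df column:

ct <- summary(mixed_model)$coefficients
ct["(Intercept)", "Estimate"] +
    c(-1, 1) * qt(0.975, df = ct["(Intercept)", "df"]) * ct["(Intercept)", "Std. Error"]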

8.8.1.5 Bayesian Mixed Model with MCMCglmm

library(MCMCglmm)

# Bayesian mixed model
mixed_model_bayes <- MCMCglmm(
  bright ~ 1,
  random = ~ operator,
  data = pulp,
  verbose = FALSE
)

# Posterior summaries
summary(mixed_model_bayes)$solutions
#>             post.mean l-95% CI u-95% CI eff.samp pMCMC
#> (Intercept)  60.40789  60.1488 60.69595     1000 0.001

Bayesian credible intervals may differ slightly from the frequentist confidence intervals because of the prior assumptions and Monte Carlo error.
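
The MCMC samples can also be used for derived quantities. For example, a sketch of the posterior distribution of the intraclass correlation, built from the variance-component samples in $VCV (MCMCglmm labels the residual variance "units"):

library(coda)

icc_samples <- mixed_model_bayes$VCV[, "operator"] /
    (mixed_model_bayes$VCV[, "operator"] + mixed_model_bayes$VCV[, "units"])

mean(icc_samples)                  # posterior mean of rho
HPDinterval(as.mcmc(icc_samples))  # 95% highest posterior density interval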

8.8.1.6 Predictions

# Random effects predictions (BLUPs)
ranef(mixed_model)$operator
#>   (Intercept)
#> a  -0.1219403
#> b  -0.2591231
#> c   0.1676679
#> d   0.2133955

# Predictions per operator
fixef(mixed_model) + ranef(mixed_model)$operator
#>   (Intercept)
#> a    60.27806
#> b    60.14088
#> c    60.56767
#> d    60.61340

# Equivalent using predict()
predict(mixed_model, newdata = data.frame(operator = c('a', 'b', 'c', 'd')))
#>        1        2        3        4 
#> 60.27806 60.14088 60.56767 60.61340

For bootstrap confidence intervals, use:

bootMer(mixed_model, FUN = fixef, nsim = 100)
#> 
#> PARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> bootMer(x = mixed_model, FUN = fixef, nsim = 100)
#> 
#> 
#> Bootstrap Statistics :
#>     original        bias    std. error
#> t1*     60.4 -0.0005452538    0.156374
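
Because bootMer() returns a "boot" object, the standard boot machinery applies. A sketch of a percentile interval for the fixed intercept (confint(mixed_model, method = "boot") is a one-line alternative):

set.seed(123)  # make the bootstrap replicates reproducible
bb <- bootMer(mixed_model, FUN = fixef, nsim = 100)
boot::boot.ci(bb, type = "perc")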

8.8.2 Example 2: Penicillin Yield (GLMM with Blocking)

data(penicillin, package = "faraway")
library(ggplot2)

# Visualize yield by treatment and blend
ggplot(penicillin, aes(y = yield, x = treat, shape = blend, color = blend)) +
  geom_point(size = 3) +
  labs(title = "Penicillin Yield by Treatment and Blend") +
  theme_minimal()


# Mixed model: blend as random effect, treatment as fixed
mixed_model <- lmer(yield ~ treat + (1 | blend), data = penicillin)
summary(mixed_model)
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: yield ~ treat + (1 | blend)
#>    Data: penicillin
#> 
#> REML criterion at convergence: 103.8
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.4152 -0.5017 -0.1644  0.6830  1.2836 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  blend    (Intercept) 11.79    3.434   
#>  Residual             18.83    4.340   
#> Number of obs: 20, groups:  blend, 5
#> 
#> Fixed effects:
#>             Estimate Std. Error     df t value Pr(>|t|)    
#> (Intercept)   84.000      2.475 11.075  33.941 1.51e-12 ***
#> treatB         1.000      2.745 12.000   0.364   0.7219    
#> treatC         5.000      2.745 12.000   1.822   0.0935 .  
#> treatD         2.000      2.745 12.000   0.729   0.4802    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>        (Intr) treatB treatC
#> treatB -0.555              
#> treatC -0.555  0.500       
#> treatD -0.555  0.500  0.500

# BLUPs for each blend
ranef(mixed_model)$blend
#>        (Intercept)
#> Blend1   4.2878788
#> Blend2  -2.1439394
#> Blend3  -0.7146465
#> Blend4   1.4292929
#> Blend5  -2.8585859

Testing for Treatment Effect

# ANOVA for fixed effects
anova(mixed_model)
#> Type III Analysis of Variance Table with Satterthwaite's method
#>       Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
#> treat     70  23.333     3    12  1.2389 0.3387

Since the \(p\)-value is \(> .05\), we fail to reject the null hypothesis of no treatment effect.

Model Comparison with Kenward-Roger Approximation

library(pbkrtest)

# Full model vs. null model
full_model <- lmer(yield ~ treat + (1 | blend), penicillin, REML = FALSE)
null_model <- lmer(yield ~ 1 + (1 | blend), penicillin, REML = FALSE)

# Kenward-Roger approximation
KRmodcomp(full_model, null_model)
#> large : yield ~ treat + (1 | blend)
#> small : yield ~ 1 + (1 | blend)
#>          stat     ndf     ddf F.scaling p.value
#> Ftest  1.2389  3.0000 12.0000         1  0.3387

The results are consistent with the earlier ANOVA: no significant treatment effect.
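
pbkrtest also provides a parametric-bootstrap version of the same comparison. A sketch (nsim controls the number of simulated reference datasets, and the run can be slow):

set.seed(123)
PBmodcomp(full_model, null_model, nsim = 500)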

8.8.3 Example 3: Growth in Rats Over Time

rats <- read.csv(
    "images/rats.dat",
    header = FALSE,
    sep = ' ',
    col.names = c('Treatment', 'rat', 'age', 'y')
)

# Log-transformed time variable
rats$t <- log(1 + (rats$age - 45) / 10)

Model Fitting

# Treatment-specific slopes on transformed time (fixed), random intercepts for rats
rat_model <- lmer(y ~ t:Treatment + (1 | rat), data = rats)
summary(rat_model)
#> Linear mixed model fit by REML. t-tests use Satterthwaite's method [
#> lmerModLmerTest]
#> Formula: y ~ t:Treatment + (1 | rat)
#>    Data: rats
#> 
#> REML criterion at convergence: 932.4
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.25574 -0.65898 -0.01163  0.58356  2.88309 
#> 
#> Random effects:
#>  Groups   Name        Variance Std.Dev.
#>  rat      (Intercept) 3.565    1.888   
#>  Residual             1.445    1.202   
#> Number of obs: 252, groups:  rat, 50
#> 
#> Fixed effects:
#>                Estimate Std. Error       df t value Pr(>|t|)    
#> (Intercept)     68.6074     0.3312  89.0275  207.13   <2e-16 ***
#> t:Treatmentcon   7.3138     0.2808 247.2762   26.05   <2e-16 ***
#> t:Treatmenthig   6.8711     0.2276 247.7097   30.19   <2e-16 ***
#> t:Treatmentlow   7.5069     0.2252 247.5196   33.34   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr) t:Trtmntc t:Trtmnth
#> t:Tretmntcn -0.327                    
#> t:Tretmnthg -0.340  0.111             
#> t:Tretmntlw -0.351  0.115     0.119
anova(rat_model)
#> Type III Analysis of Variance Table with Satterthwaite's method
#>             Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
#> t:Treatment 3181.9  1060.6     3 223.21  734.11 < 2.2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F-test is highly significant, but note what it tests: whether the treatment-specific slopes on the transformed time scale are jointly zero. It confirms that the response changes over time within each treatment; it does not, by itself, show that the slopes differ between treatments. For that, compare against a model with a single common slope, as in the sketch below.
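
A sketch of that comparison as a likelihood-ratio test, refitting both models by maximum likelihood (the model names are ours):

common_slope    <- lmer(y ~ t + (1 | rat), data = rats, REML = FALSE)
separate_slopes <- lmer(y ~ t:Treatment + (1 | rat), data = rats, REML = FALSE)

# LRT for equal slopes across treatments (2-df difference)
anova(common_slope, separate_slopes)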

8.8.4 Example 4: Tree Water Use (Agridat)

library(agridat)
dat <- harris.wateruse

# Visualizing water use by species and age
library(latticeExtra)
useOuterStrips(
  xyplot(water ~ day | species * age, 
         dat, group = tree,
         type = c('p', 'smooth'), 
         main = "harris.wateruse 2 species, 2 ages (10 trees each)",
         as.table = TRUE)
)

Remove an outlying observation (day 268):

dat <- subset(dat, day != 268)

Plot water use against day for each tree in one age-species group (Age 2, Species 2):

xyplot(
    water ~ day | tree,
    dat,
    subset   = age == "A2" & species == "S2",
    as.table = TRUE,
    type     = c('p', 'smooth'),
    ylab     = "Water use profiles of individual trees",
    main     = "harris.wateruse (Age 2, Species 2)"
)

# Rescale day to improve readability of the output and avoid convergence issues
dat <- transform(dat, ti = day / 100)
# Add a quadratic term
dat <- transform(dat, ti2 = ti * ti)

# Start with a subgroup: age 2, species 2
d22 <- droplevels(subset(dat, age == "A2" & species == "S2"))

Fitting with nlme using lme

library(nlme)

## We use pdDiag() to get uncorrelated random effects
m1n <- lme(
    water ~ 1 + ti + ti2,
    # intercept, time, and time-squared as fixed effects
    data = d22,
    na.action = na.omit,
    random = list(tree = pdDiag(~ 1 + ti + ti2))
    # uncorrelated random intercept, time, and time-squared per tree
)

# for all trees
# m1n <- lme(
#   water ~ 1 + ti + ti2,
#   random = list(tree = pdDiag(~ 1 + ti + ti2)),
#   data = dat,
#   na.action = na.omit
# )

summary(m1n)
#> Linear mixed-effects model fit by REML
#>   Data: d22 
#>        AIC     BIC    logLik
#>   276.5142 300.761 -131.2571
#> 
#> Random effects:
#>  Formula: ~1 + ti + ti2 | tree
#>  Structure: Diagonal
#>         (Intercept)           ti          ti2  Residual
#> StdDev:   0.5187869 1.631223e-05 4.374982e-06 0.3836614
#> 
#> Fixed effects:  water ~ 1 + ti + ti2 
#>                  Value Std.Error  DF   t-value p-value
#> (Intercept) -10.798799 0.8814666 227 -12.25094       0
#> ti           12.346704 0.7827112 227  15.77428       0
#> ti2          -2.838503 0.1720614 227 -16.49704       0
#>  Correlation: 
#>     (Intr) ti    
#> ti  -0.979       
#> ti2  0.970 -0.997
#> 
#> Standardized Within-Group Residuals:
#>         Min          Q1         Med          Q3         Max 
#> -3.07588246 -0.58531056  0.01210209  0.65402695  3.88777402 
#> 
#> Number of Observations: 239
#> Number of Groups: 10

ranef(m1n)
#>     (Intercept)            ti           ti2
#> T04   0.1985796  2.070606e-09  6.397103e-10
#> T05   0.3492827  3.199664e-10 -6.211457e-11
#> T19  -0.1978989 -9.879555e-10 -2.514502e-10
#> T23   0.4519003 -4.206418e-10 -3.094113e-10
#> T38  -0.6457494 -2.069198e-09 -4.227912e-10
#> T40   0.3739432  4.199061e-10 -3.260161e-11
#> T49   0.8620648  1.160387e-09 -6.925457e-12
#> T53  -0.5655049 -1.064849e-09 -5.870462e-11
#> T67  -0.4394623 -4.482549e-10  2.752922e-11
#> T71  -0.3871552  1.020034e-09  4.767595e-10

fixef(m1n)
#> (Intercept)          ti         ti2 
#>  -10.798799   12.346704   -2.838503

Fitting with lme4 using lmer

library(lme4)

m1lmer <- lmer(
    water ~ 1 + ti + ti2 + (ti + ti2 || tree),
    data = d22,
    na.action = na.omit
)

# for all trees
# m1lmer <- lmer(water ~ 1 + ti + ti2 + (ti + ti2 || tree),
#                data = dat, na.action = na.omit)

# summary(m1lmer)
ranef(m1lmer)
#> $tree
#>     (Intercept) ti ti2
#> T04   0.1985796  0   0
#> T05   0.3492827  0   0
#> T19  -0.1978989  0   0
#> T23   0.4519003  0   0
#> T38  -0.6457494  0   0
#> T40   0.3739432  0   0
#> T49   0.8620648  0   0
#> T53  -0.5655049  0   0
#> T67  -0.4394623  0   0
#> T71  -0.3871552  0   0
#> 
#> with conditional variances for "tree"

fixef(m1lmer)
#> (Intercept)          ti         ti2 
#>  -10.798799   12.346704   -2.838503

Notes:

  • The double pipe || specifies uncorrelated random effects.
  • To remove the intercept from a random-effects term, use (0 + ti | tree) or, equivalently, (ti - 1 | tree).
  • The random-slope standard deviations for ti and ti2 are estimated as essentially zero here, which is why their BLUPs print as exactly 0 from lmer and as values on the order of 1e-9 from lme.

An equivalent specification writes each uncorrelated term separately:
m1l <- lmer(
    water ~ 1 + ti + ti2 + (1 | tree) + (0 + ti | tree) + (0 + ti2 | tree),
    data = d22
)
ranef(m1l)
#> $tree
#>     (Intercept) ti ti2
#> T04   0.1985796  0   0
#> T05   0.3492827  0   0
#> T19  -0.1978989  0   0
#> T23   0.4519003  0   0
#> T38  -0.6457494  0   0
#> T40   0.3739432  0   0
#> T49   0.8620648  0   0
#> T53  -0.5655049  0   0
#> T67  -0.4394623  0   0
#> T71  -0.3871552  0   0
#> 
#> with conditional variances for "tree"
fixef(m1l)
#> (Intercept)          ti         ti2 
#>  -10.798799   12.346704   -2.838503

Adding Correlation Structure

To let residuals within the same tree be serially correlated, we add an exponential correlation structure over day via corExp():

m2n <- lme(
    water ~ 1 + ti + ti2,
    data = d22,
    random = ~ 1 | tree,
    cor = corExp(form =  ~ day | tree),
    na.action = na.omit
)

# for all trees
# m2n <- lme(
#   water ~ 1 + ti + ti2,
#   random = ~ 1 | tree,
#   cor = corExp(form = ~ day | tree),
#   data = dat,
#   na.action = na.omit
# )

summary(m2n)
#> Linear mixed-effects model fit by REML
#>   Data: d22 
#>        AIC      BIC   logLik
#>   263.3081 284.0911 -125.654
#> 
#> Random effects:
#>  Formula: ~1 | tree
#>         (Intercept)  Residual
#> StdDev:   0.5154042 0.3925777
#> 
#> Correlation Structure: Exponential spatial correlation
#>  Formula: ~day | tree 
#>  Parameter estimate(s):
#>    range 
#> 3.794624 
#> Fixed effects:  water ~ 1 + ti + ti2 
#>                  Value Std.Error  DF   t-value p-value
#> (Intercept) -11.223310 1.0988725 227 -10.21348       0
#> ti           12.712094 0.9794235 227  12.97916       0
#> ti2          -2.913682 0.2148551 227 -13.56115       0
#>  Correlation: 
#>     (Intr) ti    
#> ti  -0.985       
#> ti2  0.976 -0.997
#> 
#> Standardized Within-Group Residuals:
#>         Min          Q1         Med          Q3         Max 
#> -3.04861039 -0.55703950  0.00278101  0.62558762  3.80676991 
#> 
#> Number of Observations: 239
#> Number of Groups: 10
ranef(m2n)
#>     (Intercept)
#> T04   0.1929971
#> T05   0.3424631
#> T19  -0.1988495
#> T23   0.4538660
#> T38  -0.6413664
#> T40   0.3769378
#> T49   0.8410043
#> T53  -0.5528236
#> T67  -0.4452930
#> T71  -0.3689358
fixef(m2n)
#> (Intercept)          ti         ti2 
#>  -11.223310   12.712094   -2.913682
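
Since m1n and m2n have the same fixed effects and were both fitted by REML, they can be compared directly; they are not nested, however, so focus on AIC/BIC rather than the likelihood-ratio test that anova() also prints. A sketch:

# Compare the models with and without the exponential serial correlation
anova(m1n, m2n)

The lower AIC of m2n (263.3 vs 276.5) favors the model with serial correlation.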

Key Takeaways

  • lme4 is preferred for general mixed models due to efficiency and simplicity.

  • nlme is powerful for complex correlation structures and nested designs.

  • Bayesian models (e.g., MCMCglmm) offer flexible inference under uncertainty.

  • Always consider model diagnostics and random effects structure carefully.

References

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2014. “Fitting Linear Mixed-Effects Models Using lme4.” https://arxiv.org/abs/1406.5823.