9.6 Application: Nonlinear and Generalized Linear Mixed Models

9.6.1 Binomial Data: CBPP Dataset

We will use the CBPP dataset from the lme4 package to demonstrate different estimation approaches for binomial mixed models.

library(lme4)
data(cbpp, package = "lme4")
head(cbpp)
#>   herd incidence size period
#> 1    1         2   14      1
#> 2    1         3   12      2
#> 3    1         4    9      3
#> 4    1         0    5      4
#> 5    2         3   22      1
#> 6    2         1   18      2

The data contain information about contagious bovine pleuropneumonia (CBPP) cases across different herds and periods.

Penalized Quasi-Likelihood

Pros:

Linearizes the response to create a pseudo-response, similar to linear mixed models.
Computationally efficient.

Cons:

Biased for binary or Poisson data with small counts.
Random effects must be interpreted on the link scale.
AIC/BIC values are not interpretable since PQL does not rely on full likelihood.

library(MASS)
pql_cbpp <- glmmPQL(
    cbind(incidence, size - incidence) ~ period,
    random  = ~ 1 | herd,
    data    = cbpp,
    family  = binomial(link = "logit"),
    verbose = FALSE
)
summary(pql_cbpp)
#> Linear mixed-effects model fit by maximum likelihood
#>   Data: cbpp 
#>   AIC BIC logLik
#>    NA  NA     NA
#> 
#> Random effects:
#>  Formula: ~1 | herd
#>         (Intercept) Residual
#> StdDev:   0.5563535 1.184527
#> 
#> Variance function:
#>  Structure: fixed weights
#>  Formula: ~invwt 
#> Fixed effects:  cbind(incidence, size - incidence) ~ period 
#>                 Value Std.Error DF   t-value p-value
#> (Intercept) -1.327364 0.2390194 38 -5.553372  0.0000
#> period2     -1.016126 0.3684079 38 -2.758156  0.0089
#> period3     -1.149984 0.3937029 38 -2.920944  0.0058
#> period4     -1.605217 0.5178388 38 -3.099839  0.0036
#>  Correlation: 
#>         (Intr) perid2 perid3
#> period2 -0.399              
#> period3 -0.373  0.260       
#> period4 -0.282  0.196  0.182
#> 
#> Standardized Within-Group Residuals:
#>        Min         Q1        Med         Q3        Max 
#> -2.0591168 -0.6493095 -0.2747620  0.5170492  2.6187632 
#> 
#> Number of Observations: 56
#> Number of Groups: 15

Interpretation

exp(0.556)
#> [1] 1.743684

The above result shows how herd-specific odds vary, accounting for random effects.

The fixed effects are interpreted similarly to logistic regression. For example, with the logit link:

The log odds of having a case in period 2 are -1.016 less than in period 1 (baseline).

summary(pql_cbpp)$tTable
#>                 Value Std.Error DF   t-value      p-value
#> (Intercept) -1.327364 0.2390194 38 -5.553372 2.333216e-06
#> period2     -1.016126 0.3684079 38 -2.758156 8.888179e-03
#> period3     -1.149984 0.3937029 38 -2.920944 5.843007e-03
#> period4     -1.605217 0.5178388 38 -3.099839 3.637000e-03

Numerical Integration with glmer

Pros:

More accurate estimation since the method directly integrates over random effects.

Cons:

Computationally more expensive, especially with high-dimensional random effects.
May struggle with convergence for complex models.

numint_cbpp <- glmer(
    cbind(incidence, size - incidence) ~ period + (1 | herd),
    data = cbpp,
    family = binomial(link = "logit")
)
summary(numint_cbpp)
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: cbind(incidence, size - incidence) ~ period + (1 | herd)
#>    Data: cbpp
#> 
#>       AIC       BIC    logLik -2*log(L)  df.resid 
#>     194.1     204.2     -92.0     184.1        51 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -2.3816 -0.7889 -0.2026  0.5142  2.8791 
#> 
#> Random effects:
#>  Groups Name        Variance Std.Dev.
#>  herd   (Intercept) 0.4123   0.6421  
#> Number of obs: 56, groups:  herd, 15
#> 
#> Fixed effects:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -1.3983     0.2312  -6.048 1.47e-09 ***
#> period2      -0.9919     0.3032  -3.272 0.001068 ** 
#> period3      -1.1282     0.3228  -3.495 0.000474 ***
#> period4      -1.5797     0.4220  -3.743 0.000182 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>         (Intr) perid2 perid3
#> period2 -0.363              
#> period3 -0.340  0.280       
#> period4 -0.260  0.213  0.198

Comparing PQL and Numerical Integration

For small datasets, the difference between PQL and numerical integration may be minimal.

library(rbenchmark)
benchmark(
    "PQL (MASS)" = {
        glmmPQL(
            cbind(incidence, size - incidence) ~ period,
            random = ~ 1 | herd,
            data = cbpp,
            family = binomial(link = "logit"),
            verbose = FALSE
        )
    },
    "Numerical Integration (lme4)" = {
        glmer(
            cbind(incidence, size - incidence) ~ period + (1 | herd),
            data = cbpp,
            family = binomial(link = "logit")
        )
    },
    replications = 50,
    columns = c("test", "replications", "elapsed", "relative"),
    order = "relative"
)
#>                           test replications elapsed relative
#> 1                   PQL (MASS)           50    4.48    1.000
#> 2 Numerical Integration (lme4)           50    9.16    2.045

Improving Accuracy with Gauss-Hermite Quadrature

Setting nAGQ > 1 increases the accuracy of the likelihood approximation:

numint_cbpp_GH <- glmer(
    cbind(incidence, size - incidence) ~ period + (1 | herd),
    data = cbpp,
    family = binomial(link = "logit"),
    nAGQ = 20
)
summary(numint_cbpp_GH)$coefficients[, 1] - 
    summary(numint_cbpp)$coefficients[, 1]
#>   (Intercept)       period2       period3       period4 
#> -0.0008808634  0.0005160912  0.0004066218  0.0002644629

Bayesian Approach with MCMCglmm

Pros:

Incorporates prior information and handles complex models with intractable likelihoods.
Provides full posterior distributions for parameters.

Cons:

Computationally intensive, especially with large datasets or complex hierarchical structures.

library(MCMCglmm)
Bayes_cbpp <- MCMCglmm(
    cbind(incidence, size - incidence) ~ period,
    random = ~ herd,
    data = cbpp,
    family = "multinomial2",
    verbose = FALSE
)
summary(Bayes_cbpp)
#> 
#>  Iterations = 3001:12991
#>  Thinning interval  = 10
#>  Sample size  = 1000 
#> 
#>  DIC: 537.6541 
#> 
#>  G-structure:  ~herd
#> 
#>      post.mean  l-95% CI u-95% CI eff.samp
#> herd   0.01121 1.144e-16   0.0425    71.03
#> 
#>  R-structure:  ~units
#> 
#>       post.mean l-95% CI u-95% CI eff.samp
#> units     1.128   0.2475    2.123    285.4
#> 
#>  Location effects: cbind(incidence, size - incidence) ~ period 
#> 
#>             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
#> (Intercept)   -1.5410  -2.1748  -0.8358    797.6 <0.001 ***
#> period2       -1.2464  -2.3298  -0.2327    708.7  0.014 *  
#> period3       -1.3737  -2.4446  -0.2574    719.4  0.018 *  
#> period4       -1.9347  -3.2662  -0.7931    580.2  0.002 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

MCMCglmm fits a residual variance component (useful with dispersion issues).

Variance Component Analysis

# explains less variability
apply(Bayes_cbpp$VCV, 2, sd)
#>       herd      units 
#> 0.05822934 0.51923602

Posterior Summaries

summary(Bayes_cbpp)$solutions
#>             post.mean  l-95% CI   u-95% CI eff.samp pMCMC
#> (Intercept) -1.541044 -2.174758 -0.8357819 797.6463 0.001
#> period2     -1.246428 -2.329763 -0.2327008 708.6562 0.014
#> period3     -1.373747 -2.444636 -0.2574092 719.3625 0.018
#> period4     -1.934747 -3.266209 -0.7930924 580.1757 0.002

MCMC Diagnostics

library(lattice)
xyplot(as.mcmc(Bayes_cbpp$Sol), layout = c(2, 2))

Figure 9.1: MCMC Diagnostic

There is no trend (i.e., well-mixed).

xyplot(as.mcmc(Bayes_cbpp$VCV), layout = c(2, 1))

Figure 9.2: Variable MCMC Diagnostic

For the herd variable, many of the values are 0, which suggests a problem. To address the instability in the herd effect sampling, we can either:

Modify prior distributions,
Increase the number of iterations

Bayes_cbpp2 <- MCMCglmm(
    cbind(incidence, size - incidence) ~ period,
    random = ~ herd,
    data = cbpp,
    family = "multinomial2",
    nitt = 20000,
    burnin = 10000,
    prior = list(G = list(list(V = 1, nu = 0.1))),
    verbose = FALSE
)
xyplot(as.mcmc(Bayes_cbpp2$VCV), layout = c(2, 1))

Figure 9.3: Variable MCMC Diagnostic (Alternative)

To change the shape of priors, in MCMCglmm use:

V controls for the location of the distribution (default = 1)
nu controls for the concentration around V (default = 0)

9.6.2 Count Data: Owl Dataset

We’ll now model count data using the Owl dataset

library(glmmTMB)
library(dplyr)
data(Owls, package = "glmmTMB")
Owls <- Owls %>% rename(Ncalls = SiblingNegotiation)

Poisson GLMM

Modeling call counts with a Poisson distribution:

In a typical Poisson model, the Poisson mean $\lambda$ is modeled as: $\log(\lambda) = x' \beta$ However, if the response variable represents a rate (e.g., counts per BroodSize), we can model it as: $\log\left(\frac{\lambda}{b}\right) = x' \beta$ This is equivalent to: $\log(\lambda) = \log(b) + x' \beta$ where $b$ represents BroodSize. In this formulation, we “offset” the mean by including the logarithm of $b$ as an offset term in the model. This adjustment accounts for the varying exposure or denominator in rate-based responses.

owls_glmer <- glmer(
    Ncalls ~ offset(log(BroodSize)) + FoodTreatment * SexParent + (1 | Nest),
    family = poisson,
    data = Owls
)
summary(owls_glmer)
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: poisson  ( log )
#> Formula: Ncalls ~ offset(log(BroodSize)) + FoodTreatment * SexParent +  
#>     (1 | Nest)
#>    Data: Owls
#> 
#>       AIC       BIC    logLik -2*log(L)  df.resid 
#>    5212.8    5234.8   -2601.4    5202.8       594 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -3.5529 -1.7971 -0.6842  1.2689 11.4312 
#> 
#> Random effects:
#>  Groups Name        Variance Std.Dev.
#>  Nest   (Intercept) 0.2063   0.4542  
#> Number of obs: 599, groups:  Nest, 27
#> 
#> Fixed effects:
#>                                     Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)                          0.65585    0.09567   6.855 7.12e-12 ***
#> FoodTreatmentSatiated               -0.65612    0.05606 -11.705  < 2e-16 ***
#> SexParentMale                       -0.03705    0.04501  -0.823   0.4104    
#> FoodTreatmentSatiated:SexParentMale  0.13135    0.07036   1.867   0.0619 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr) FdTrtS SxPrnM
#> FdTrtmntStt -0.225              
#> SexParentMl -0.292  0.491       
#> FdTrtmS:SPM  0.170 -0.768 -0.605

Nest explains a relatively large proportion of the variability (its standard deviation is larger than some coefficients).
The model fit isn’t great (deviance of 5202 on 594 df).

Negative Binomial Model

Addressing overdispersion using the negative binomial distribution:

owls_glmerNB <- glmer.nb(
    Ncalls ~ offset(log(BroodSize)) + FoodTreatment * SexParent + (1 | Nest),
    data = Owls
)
summary(owls_glmerNB)
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: Negative Binomial(0.8423)  ( log )
#> Formula: Ncalls ~ offset(log(BroodSize)) + FoodTreatment * SexParent +  
#>     (1 | Nest)
#>    Data: Owls
#> 
#>       AIC       BIC    logLik -2*log(L)  df.resid 
#>    3495.6    3522.0   -1741.8    3483.6       593 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -0.8859 -0.7737 -0.2701  0.4443  6.1432 
#> 
#> Random effects:
#>  Groups Name        Variance Std.Dev.
#>  Nest   (Intercept) 0.1245   0.3529  
#> Number of obs: 599, groups:  Nest, 27
#> 
#> Fixed effects:
#>                                     Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)                          0.69005    0.13400   5.150 2.61e-07 ***
#> FoodTreatmentSatiated               -0.76657    0.16509  -4.643 3.43e-06 ***
#> SexParentMale                       -0.02605    0.14575  -0.179    0.858    
#> FoodTreatmentSatiated:SexParentMale  0.15680    0.20512   0.764    0.445    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation of Fixed Effects:
#>             (Intr) FdTrtS SxPrnM
#> FdTrtmntStt -0.602              
#> SexParentMl -0.683  0.553       
#> FdTrtmS:SPM  0.450 -0.744 -0.671

There is an improvement using negative binomial considering over-dispersion

library(ggplot2)
ggplot(Owls, aes(x = Ncalls)) +
  geom_histogram(
    bins = 30,
    fill = "steelblue",
    color = "white",
    alpha = 0.8
  ) +
  labs(title = "Distribution of Owl Call Counts",
       x = "Number of Calls",
       y = "Frequency") +
  theme_minimal(base_size = 14)

Figure 9.4: Distribution of Owl Call Counts

Zero-Inflated Model

Handling excess zeros with a zero-inflated Poisson model:

library(glmmTMB)
owls_glmm <-
    glmmTMB(
        Ncalls ~ FoodTreatment * SexParent + offset(log(BroodSize)) +
            (1 | Nest),
        ziformula =  ~ 0,
        family = nbinom2(link = "log"),
        data = Owls
    )
owls_glmm_zi <-
    glmmTMB(
        Ncalls ~ FoodTreatment * SexParent + offset(log(BroodSize)) +
            (1 | Nest),
        ziformula =  ~ 1,
        family = nbinom2(link = "log"),
        data = Owls
    )

# Scale Arrival time to use as a covariate for zero-inflation parameter
Owls$ArrivalTime <- scale(Owls$ArrivalTime)
owls_glmm_zi_cov <- glmmTMB(
    Ncalls ~ FoodTreatment * SexParent +
        offset(log(BroodSize)) +
        (1 | Nest),
    ziformula =  ~ ArrivalTime,
    family = nbinom2(link = "log"),
    data = Owls
)

as.matrix(anova(owls_glmm, owls_glmm_zi))
#>              Df      AIC      BIC    logLik deviance    Chisq Chi Df
#> owls_glmm     6 3495.610 3521.981 -1741.805 3483.610       NA     NA
#> owls_glmm_zi  7 3431.646 3462.413 -1708.823 3417.646 65.96373      1
#>                Pr(>Chisq)
#> owls_glmm              NA
#> owls_glmm_zi 4.592983e-16


as.matrix(anova(owls_glmm_zi, owls_glmm_zi_cov))
#>                  Df      AIC      BIC    logLik deviance    Chisq Chi Df
#> owls_glmm_zi      7 3431.646 3462.413 -1708.823 3417.646       NA     NA
#> owls_glmm_zi_cov  8 3422.532 3457.694 -1703.266 3406.532 11.11411      1
#>                    Pr(>Chisq)
#> owls_glmm_zi               NA
#> owls_glmm_zi_cov 0.0008567362


summary(owls_glmm_zi_cov)
#>  Family: nbinom2  ( log )
#> Formula:          
#> Ncalls ~ FoodTreatment * SexParent + offset(log(BroodSize)) +      (1 | Nest)
#> Zero inflation:          ~ArrivalTime
#> Data: Owls
#> 
#>       AIC       BIC    logLik -2*log(L)  df.resid 
#>    3422.5    3457.7   -1703.3    3406.5       591 
#> 
#> Random effects:
#> 
#> Conditional model:
#>  Groups Name        Variance Std.Dev.
#>  Nest   (Intercept) 0.07487  0.2736  
#> Number of obs: 599, groups:  Nest, 27
#> 
#> Dispersion parameter for nbinom2 family (): 2.22 
#> 
#> Conditional model:
#>                                     Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)                          0.84778    0.09961   8.511  < 2e-16 ***
#> FoodTreatmentSatiated               -0.39529    0.13742  -2.877  0.00402 ** 
#> SexParentMale                       -0.07025    0.10435  -0.673  0.50079    
#> FoodTreatmentSatiated:SexParentMale  0.12388    0.16449   0.753  0.45138    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Zero-inflation model:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  -1.3018     0.1261  -10.32  < 2e-16 ***
#> ArrivalTime   0.3545     0.1074    3.30 0.000966 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

glmmTMB can handle ZIP GLMMs since it adds automatic differentiation to existing estimation strategies.

We can see ZIP GLMM with an arrival time covariate on the zero is best.

Arrival time has a positive effect on observing a nonzero number of calls
Interactions are non significant, the food treatment is significant (fewer calls after eating)
Nest variability is large in magnitude (without this, the parameter estimates change)

9.6.3 Binomial Example: Gotway Hessian Fly Data

We will analyze the Gotway Hessian Fly dataset from the agridat package to model binomial outcomes using both frequentist and Bayesian approaches.

9.6.3.1 Data Visualization

library(agridat)
library(ggplot2)
library(lme4)
library(spaMM)

data(gotway.hessianfly)
dat <- gotway.hessianfly
dat$prop <- dat$y / dat$n  # Proportion of successes

ggplot(dat, aes(x = lat, y = long, fill = prop)) +
    geom_tile() +
    scale_fill_gradient(low = 'white', high = 'black') +
    geom_text(aes(label = gen, color = block)) +
    ggtitle('Gotway Hessian Fly: Proportion of Infestation')

Figure 9.5: Gotway Hessian Fly: Proportion of Infestation

9.6.3.2 Model Specification

Fixed Effects ( $\boldsymbol{\beta}$ ): Genotype (gen)
Random Effects ( $\boldsymbol{\alpha}$ ): Block (block), accounting for spatial or experimental design variability

Frequentist Approach with glmer

flymodel <- glmer(
    cbind(y, n - y) ~ gen + (1 | block),
    data   = dat,
    family = binomial,
    nAGQ   = 5  # Using adaptive Gauss-Hermite quadrature for accuracy
)
summary(flymodel)
#> Generalized linear mixed model fit by maximum likelihood (Adaptive
#>   Gauss-Hermite Quadrature, nAGQ = 5) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: cbind(y, n - y) ~ gen + (1 | block)
#>    Data: dat
#> 
#>       AIC       BIC    logLik -2*log(L)  df.resid 
#>     162.2     198.9     -64.1     128.2        47 
#> 
#> Scaled residuals: 
#>      Min       1Q   Median       3Q      Max 
#> -2.38644 -1.01188  0.09631  1.03468  2.75479 
#> 
#> Random effects:
#>  Groups Name        Variance Std.Dev.
#>  block  (Intercept) 0.001022 0.03196 
#> Number of obs: 64, groups:  block, 4
#> 
#> Fixed effects:
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)   1.5035     0.3914   3.841 0.000122 ***
#> genG02       -0.1939     0.5302  -0.366 0.714645    
#> genG03       -0.5408     0.5103  -1.060 0.289263    
#> genG04       -1.4342     0.4714  -3.043 0.002346 ** 
#> genG05       -0.2037     0.5429  -0.375 0.707487    
#> genG06       -0.9783     0.5046  -1.939 0.052534 .  
#> genG07       -0.6041     0.5111  -1.182 0.237237    
#> genG08       -1.6774     0.4907  -3.418 0.000630 ***
#> genG09       -1.3984     0.4725  -2.960 0.003079 ** 
#> genG10       -0.6817     0.5333  -1.278 0.201183    
#> genG11       -1.4630     0.4843  -3.021 0.002522 ** 
#> genG12       -1.4591     0.4918  -2.967 0.003010 ** 
#> genG13       -3.5528     0.6600  -5.383 7.31e-08 ***
#> genG14       -2.5073     0.5264  -4.763 1.90e-06 ***
#> genG15       -2.0872     0.4851  -4.302 1.69e-05 ***
#> genG16       -2.9697     0.5383  -5.517 3.46e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interpretation:

The fixed effects (gen) indicate how different genotypes influence the infestation probability.
The random effect for block captures variability due to experimental blocks, improving model robustness.
Odds Ratios: Exponentiating coefficients helps interpret the impact on infestation odds.

Bayesian Approach with MCMCglmm

library(MCMCglmm)
library(coda)

Bayes_flymodel <- MCMCglmm(
    cbind(y, n - y) ~ gen,
    random  = ~ block,
    data    = dat,
    family  = "multinomial2",
    verbose = FALSE
)
summary(Bayes_flymodel)
#> 
#>  Iterations = 3001:12991
#>  Thinning interval  = 10
#>  Sample size  = 1000 
#> 
#>  DIC: 877.3603 
#> 
#>  G-structure:  ~block
#> 
#>       post.mean l-95% CI u-95% CI eff.samp
#> block   0.02354 6.92e-17  0.05404    798.1
#> 
#>  R-structure:  ~units
#> 
#>       post.mean l-95% CI u-95% CI eff.samp
#> units    0.9667   0.3118    1.821    380.8
#> 
#>  Location effects: cbind(y, n - y) ~ gen 
#> 
#>             post.mean l-95% CI u-95% CI eff.samp  pMCMC    
#> (Intercept)   1.94362  0.60729  3.33723    815.1  0.002 ** 
#> genG02       -0.40383 -2.36504  1.26737   1016.4  0.694    
#> genG03       -0.72396 -2.72598  0.95420    847.0  0.418    
#> genG04       -1.81903 -3.70454 -0.09682    907.4  0.044 *  
#> genG05       -0.37390 -2.39632  1.32895    865.5  0.708    
#> genG06       -1.24261 -3.10044  0.37768    845.7  0.140    
#> genG07       -0.76143 -2.57149  1.03524    831.5  0.390    
#> genG08       -2.08644 -3.67326 -0.17025    825.0  0.012 *  
#> genG09       -1.82256 -3.57204 -0.16156    899.8  0.026 *  
#> genG10       -0.79585 -2.66493  1.07283   1121.7  0.400    
#> genG11       -1.95648 -3.58516 -0.15710    884.8  0.018 *  
#> genG12       -1.92374 -3.89023 -0.29812    813.6  0.030 *  
#> genG13       -4.40453 -6.55523 -2.39214    666.9 <0.001 ***
#> genG14       -3.20863 -5.22849 -1.44975    912.6 <0.001 ***
#> genG15       -2.76870 -4.65470 -1.04934    825.6 <0.001 ***
#> genG16       -3.81851 -5.74736 -1.98829   1035.0 <0.001 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

MCMC Diagnostics

Trace Plot: Checks for chain mixing and convergence.
Autocorrelation Plot: Evaluates dependency between MCMC samples.

# Trace plot for the first fixed effect
plot(Bayes_flymodel$Sol[, 1],
     main = colnames(Bayes_flymodel$Sol)[1])

Two-panel figure showing data related to the intercept. The left panel is a time series plot displaying the intercept values over iterations ranging from 4000 to 12000, with values fluctuating between 0 and 4. The right panel is a density plot of the intercept, with a peak around 2 and a range from -1 to 5. The density plot includes a note:

Figure 9.6: Trace Plot

# Autocorrelation plot
autocorr.plot(Bayes_flymodel$Sol[, 1],
              main = colnames(Bayes_flymodel$Sol)[1])

Figure 9.7: Autocorrelation Plot

Bayesian Interpretation:

Posterior Means: Represent the central tendency of the parameter estimates.
Credible Intervals: Unlike frequentist confidence intervals, they can be interpreted directly as the probability that the parameter lies within the interval.

9.6.4 Nonlinear Mixed Model: Yellow Poplar Data

This dataset comes from Schabenberger and Pierce (2001)

9.6.4.1 Data Preparation

dat2 <- read.table("images/YellowPoplarData_r.txt")
names(dat2) <-
    c('tn', 'k', 'dbh', 'totht', 'dob', 'ht', 'maxd', 'cumv')
dat2$t <- dat2$dob / dat2$dbh
dat2$r <- 1 - dat2$dob / dat2$totht

9.6.4.2 Data Visualization

library(dplyr)

dat2 <- dat2 %>% group_by(tn) %>% mutate(
    z = case_when(
        totht < 74 ~ 'a: 0-74ft',
        totht < 88 ~ 'b: 74-88',
        totht < 95 ~ 'c: 88-95',
        totht < 99 ~ 'd: 95-99',
        totht < 104 ~ 'e: 99-104',
        totht < 109 ~ 'f: 104-109',
        totht < 115 ~ 'g: 109-115',
        totht < 120 ~ 'h: 115-120',
        totht < 140 ~ 'i: 120-150',
        TRUE ~ 'j: 150+'
    )
)

ggplot(dat2, aes(x = r, y = cumv)) +
  geom_point(size = 0.5) +
  facet_wrap(vars(z)) +
  labs(title = "Cumulative Volume vs. Relative Height by Tree Height Group")

Scatter plot panels showing cumulative volume versus relative height, categorized by tree height groups labeled a to i. Each panel displays data points with varying density and distribution patterns. The x-axis represents relative height (r) ranging from 0.7 to 1.0, and the y-axis represents cumulative volume (cumv) from 0 to 200. Tree height groups are: a: 0–74ft, b: 74–88, c: 88–95, d: 95–99, e: 99–104, f: 104–109, g: 109–115, h: 115–120, i: 120–150.

Figure 9.8: Cumulative Volume vs. Relative Height by Tree Height Group

9.6.4.3 Model Specification

The proposed Nonlinear Mixed Model is: $V_{ij} = \left(\beta_0 + (\beta_1 + b_{1i})\frac{D_i^2 H_i}{1000}\right) \exp\left[-(\beta_2 + b_{2i}) t_{ij} \exp(\beta_3 t_{ij})\right] + e_{ij}$ Where:

$b_{1i}, b_{2i}$ are random effects for tree $i$ .
$e_{ij}$ are residual errors.

9.6.4.4 Fitting the Nonlinear Mixed Model

library(nlme)

tmp <- nlme(
    cumv ~ (b0 + (b1 + u1) * (dbh^2 * totht / 1000)) * 
        exp(-(b2 + u2) * (t / 1000) * exp(b3 * t)), 
    data = dat2,
    fixed = b0 + b1 + b2 + b3 ~ 1,
    random = pdDiag(u1 + u2 ~ 1),  # Uncorrelated random effects
    groups = ~ tn,                 # Grouping by tree
    start = list(fixed = c(b0 = 0.25, b1 = 2.3, b2 = 2.87, b3 = 6.7))
)
summary(tmp)
#> Nonlinear mixed-effects model fit by maximum likelihood
#>   Model: cumv ~ (b0 + (b1 + u1) * (dbh^2 * totht/1000)) * exp(-(b2 + u2) *      (t/1000) * exp(b3 * t)) 
#>   Data: dat2 
#>        AIC      BIC    logLik
#>   31103.73 31151.33 -15544.86
#> 
#> Random effects:
#>  Formula: list(u1 ~ 1, u2 ~ 1)
#>  Level: tn
#>  Structure: Diagonal
#>                u1       u2 Residual
#> StdDev: 0.1508094 0.447829 2.226361
#> 
#> Fixed effects:  b0 + b1 + b2 + b3 ~ 1 
#>       Value  Std.Error   DF  t-value p-value
#> b0 0.249386 0.12894686 6297   1.9340  0.0532
#> b1 2.288832 0.01266804 6297 180.6777  0.0000
#> b2 2.500497 0.05606686 6297  44.5985  0.0000
#> b3 6.848871 0.02140677 6297 319.9395  0.0000
#>  Correlation: 
#>    b0     b1     b2    
#> b1 -0.639              
#> b2  0.054  0.056       
#> b3 -0.011 -0.066 -0.850
#> 
#> Standardized Within-Group Residuals:
#>           Min            Q1           Med            Q3           Max 
#> -6.694575e+00 -3.081861e-01 -8.907041e-05  3.469469e-01  7.855665e+00 
#> 
#> Number of Observations: 6636
#> Number of Groups: 336
nlme::intervals(tmp)
#> Approximate 95% confidence intervals
#> 
#>  Fixed effects:
#>           lower      est.     upper
#> b0 -0.003317833 0.2493858 0.5020894
#> b1  2.264006069 2.2888323 2.3136585
#> b2  2.390620116 2.5004971 2.6103742
#> b3  6.806919325 6.8488712 6.8908232
#> 
#>  Random Effects:
#>   Level: tn 
#>            lower      est.     upper
#> sd(u1) 0.1376068 0.1508094 0.1652787
#> sd(u2) 0.4056207 0.4478290 0.4944295
#> 
#>  Within-group standard error:
#>    lower     est.    upper 
#> 2.187259 2.226361 2.266161

9.6.4.5 Interpretation:

Fixed Effects ( $\beta$ ): Describe the average growth pattern across all trees.
Random Effects ( $b_i$ ): Capture tree-specific deviations from the average trend.

This result is a bit different from the original study because of different implementation of nonlinear mixed models.

9.6.4.6 Visualizing Model Predictions

library(cowplot)

# Prediction function
nlmmfn <- function(fixed, rand, dbh, totht, t) {
    (fixed[1] + (fixed[2] + rand[1]) * (dbh ^ 2 * totht / 1000)) *
        exp(-(fixed[3] + rand[2]) * (t / 1000) * exp(fixed[4] * t))
}

# Function to generate plots for selected trees
plot_tree <- function(tree_id) {
    pred <- data.frame(dob = seq(1, max(dat2$dob), length.out = 100))
    pred$tn <- tree_id
    pred$dbh <- unique(dat2$dbh[dat2$tn == tree_id])
    pred$t <- pred$dob / pred$dbh
    pred$totht <- unique(dat2$totht[dat2$tn == tree_id])
    pred$r <- 1 - pred$dob / pred$totht
    
    pred$with_random <- predict(tmp, pred)
    pred$without_random <-
        nlmmfn(tmp$coefficients$fixed,
               c(0, 0),
               pred$dbh,
               pred$totht,
               pred$t)
    
    ggplot(pred) +
        geom_line(aes(x = r, y = with_random, color = 'With Random Effects')) +
        geom_line(aes(x = r, y = without_random, color = 'Without Random Effects')) +
        geom_point(data = dat2[dat2$tn == tree_id,], aes(x = r, y = cumv)) +
        labs(title = paste('Tree', tree_id), colour = "") +
        theme(legend.position = "bottom")
}

# Plotting for selected trees
p1 <- plot_tree(1)
p2 <- plot_tree(151)
p3 <- plot_tree(279)

plot_grid(p1, p2, p3)

Figure 9.9: Individual Tree Scatter Plot

Red Line: Model predictions with tree-specific random effects.
Teal Line: Model predictions based only on fixed effects (ignoring tree-specific variation).
Dots: Observed cumulative volume for each tree.

References

Schabenberger, Oliver, and Francis J Pierce. 2001. Contemporary Statistical Models for the Plant and Soil Sciences. CRC press.