9.1 Estimation

In linear mixed models, the marginal likelihood for \(\mathbf{y}\) is the integration of the random effects from the hierarchical formulation

\[ f(\mathbf{y}) = \int f(\mathbf{y}| \alpha) f(\alpha) d \alpha \]

For linear mixed models, we assumed that the 2 component distributions were Gaussian with linear relationships, which implied the marginal distribution was also linear and Gaussian and allows us to solve this integral analytically.

On the other hand, GLMMs, the distribution for \(f(\mathbf{y} | \alpha)\) is not Gaussian in general, and for NLMMs, the functional form between the mean response and the random (and fixed) effects is nonlinear. In both cases, we can’t perform the integral analytically, which means we have to solve it

9.1.1 Estimation by Numerical Integration

The marginal likelihood is

\[ L(\beta; \mathbf{y}) = \int f(\mathbf{y} | \alpha) f(\alpha) d \alpha \]

Estimation fo the fixed effects requires \(\frac{\partial l}{\partial \beta}\), where \(l\) is the log-likelihood

One way to obtain the marginal inference is to numerically integrate out the random effects through

  • numerical quadrature

  • Laplace approximation

  • Monte Carlo methods

When the dimension of \(\mathbf{\alpha}\) is relatively low, this is easy. But when the dimension of \(\alpha\) is high, additional approximation is required.

9.1.2 Estimation by Linearization

Idea: Linearized version of the response (known as working response, or pseudo-response) called \(\tilde{y}_i\) and then the conditional mean is

\[ E(\tilde{y}_i | \alpha) = \mathbf{x}_i' \beta + \mathbf{z}_i' \alpha \]

and also estimate \(var(\tilde{y}_i | \alpha)\). then, apply Linear Mixed Models estimation as usual.

The difference is only in how the linearization is done (i.e., how to expand \(f(\mathbf{x, \theta, \alpha})\) or the inverse link function Penalized quasi-likelihood


This is the more popular method

\[ \tilde{y}_i^{(k)} = \hat{\eta}_i^{(k-1)} + ( y_i - \hat{\mu}_i^{(k-1)})\frac{d \eta}{d \mu}| \hat{\eta}_i^{(k-1)} \]


  • \(\eta_i = g(\mu_i)\) is the linear predictor

  • \(k\) = iteration of the optimization algorithm

The algorithm updates \(\tilde{y}_i\) after each linear mixed model fit using \(E(\tilde{y}_i | \alpha)\) and \(var(\tilde{y}_i | \alpha)\)


  • Easy to implement

  • Inference is only asymptotically correct due to the linearizaton

  • Biased estimates are likely for binomial response with small groups and worst for Bernoulli response. Similarly for Poisson models with small counts. (Faraway 2016)

  • Hypothesis testing and confidence intervals also have problems. Generalized Estimating Equations


Let a marginal generalized linear model for the mean of y as a function of the predictors, which means we linearize the mean response function and assume a dependent error structure

Binary data:

\[ logit (E(\mathbf{y})) = \mathbf{X} \beta \]

If we assume a “working covariance matrix”, \(\mathbf{V}\) the the elements of \(\mathbf{y}\), then the maximum likelihood equations for estimating \(\beta\) is

\[ \mathbf{X'V^{-1}y} = \mathbf{X'V^{-1}} E(\mathbf{y}) \]

If \(\mathbf{V}\) is correct, then unbiased estimating equations

We typically define \(\mathbf{V} = \mathbf{I}\). Solutions to unbiased estimating equation give consistent estimators.

In practice, we assume a covariance structure, and then do a logistic regression, and calculate its large sample variance

Let \(y_{ij} , j = 1,..,n_i, i = 1,..,K\) be the j-th measurement on the \(i\)-th subject.

\[ \mathbf{y}_i = \left( \begin{array} {c} y_{i1} \\ . \\ y_{in_i} \end{array} \right) \]

with mean

\[ \mathbf{\mu}_i = \left( \begin{array} {c} \mu_{i1} \\ . \\ \mu_{in_i} \end{array} \right) \]


\[ \mathbf{x}_{ij} = \left( \begin{array} {c} X_{ij1} \\ . \\ X_{ijp} \end{array} \right) \]

Let \(\mathbf{V}_i = cov(\mathbf{y}_i)\), then based on(Liang and Zeger 1986) GEE estimates for \(\beta\) can be obtained from solving the equation:

\[ S(\beta) = \sum_{i=1}^K \frac{\partial \mathbf{\mu}_i'}{\partial \beta} \mathbf{V}^{-1}(\mathbf{y}_i - \mathbf{\mu}_i) = 0 \]

Let \(\mathbf{R}_i (\mathbf{c})\) be an \(n_i \times n_i\) “working” correlation matrix specified up to some parameters \(\mathbf{c}\). Then, \(\mathbf{V}_i = a(\phi) \mathbf{B}_i^{1/2}\mathbf{R}(\mathbf{c}) \mathbf{B}_i^{1/2}\), where \(\mathbf{B}_i\) is an \(n_i \times n_i\) diagonal matrix with \(V(\mu_{ij})\) on the j-th diagonal

If \(\mathbf{R}(\mathbf{c})\) is the true correlation matrix of \(\mathbf{y}_i\), then \(\mathbf{V}_i\) is the true covariance matrix

The working correlation matrix must be estimated iteratively by a fitting algorithm:

  1. Compute the initial estimate of \(\beta\) (using GLM under the independence assumption)

  2. Compute the working correlation matrix \(\mathbf{R}\) based upon studentized residuals

  3. Compute the estimate covariance \(\hat{\mathbf{V}}_i\)

  4. Update \(\beta\) according to

    \[ \beta_{r+1} = \beta_r + (\sum_{i=1}^K \frac{\partial \mathbf{\mu}'_i}{\partial \beta} \hat{\mathbf{V}}_i^{-1} \frac{\partial \mathbf{\mu}_i}{\partial \beta}) \]

  5. Iterate until the algorithm converges

Note: Inference based on likelihoods is not appropriate because this is not a likelihood estimator

9.1.3 Estimation by Bayesian Hierarchical Models

Bayesian Estimation

\[ f(\mathbf{\alpha}, \mathbf{\beta} | \mathbf{y}) \propto f(\mathbf{y} | \mathbf{\alpha}, \mathbf{\beta}) f(\mathbf{\alpha})f(\mathbf{\beta}) \]

Numerical techniques (e.g., MCMC) can be used to find posterior distribution. This method is best in terms of not having to make simplifying approximation and fully accounting for uncertainty in estimation and prediction, but it could be complex, time-consuming, and computationally intensive.

Implementation Issues:

  • No valid joint distribution can be constructed from the given conditional model and random parameters

  • The mean/ variance relationship and the random effects lead to constraints on the marginal covariance model

  • Difficult to fit computationally

2 types of estimation approaches:

  1. Approximate the objective function (marginal likelihood) through integral approximation

    1. Laplace methods

    2. Quadrature methods

    3. Monte Carlo integration

  2. Approximate the model (based on Taylor series linearization)

Packages in R

  • GLMM: MASS:glmmPQL lme4::glmer glmmTMB

  • NLMM: nlme::nlme; lme4::nlmer brms::brm

  • Bayesian: MCMCglmm ; brms:brm

Example: Non-Gaussian Repeated measurements


Faraway, Julian J. 2016. Extending the Linear Model with r: Generalized Linear, Mixed Effects and Nonparametric Regression Models. CRC press.
Liang, Kung-Yee, and Scott L Zeger. 1986. “Longitudinal Data Analysis Using Generalized Linear Models.” Biometrika 73 (1): 13–22.