9.4 Marginal Properties of GLMMs
9.4.1 Marginal Mean of \(y_i\)
The marginal mean is obtained by integrating over the distribution of the random effects:
\[ E(y_i) = E_{\boldsymbol{\alpha}}(E(y_i \mid \boldsymbol{\alpha})) = E_{\boldsymbol{\alpha}}(\mu_i) = E\left(g^{-1}(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha})\right) \]
Because \(g^{-1}(\cdot)\) is nonlinear, the expectation cannot be moved inside the inverse link: in general \(E(y_i) \neq g^{-1}(\mathbf{x}_i' \boldsymbol{\beta})\), so no further simplification is possible without specific distributional assumptions on \(\boldsymbol{\alpha}\).
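Numerically, the marginal mean can be approximated by Monte Carlo integration over the random-effects distribution. The sketch below is a minimal illustration, assuming a logit link, a scalar normal random intercept, and made-up values for \(\mathbf{x}_i' \boldsymbol{\beta}\) and \(\sigma_{\alpha}\); it contrasts \(E(y_i)\) with the naive plug-in value \(g^{-1}(\mathbf{x}_i' \boldsymbol{\beta})\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed (illustrative) quantities: fixed-effect part of the linear
# predictor and the standard deviation of a normal random intercept.
xb = 0.5
sigma_alpha = 1.2

def inv_logit(eta):
    """Inverse logit link g^{-1}(eta) = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + np.exp(-eta))

# Monte Carlo approximation of E[ g^{-1}(x'beta + alpha) ]
alpha = rng.normal(0.0, sigma_alpha, size=1_000_000)
marginal_mean = inv_logit(xb + alpha).mean()

print(f"E(y_i) by Monte Carlo: {marginal_mean:.4f}")
print(f"Naive g^-1(x'beta):    {inv_logit(xb):.4f}")  # differs: g^-1 is nonlinear
```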
9.4.1.1 Special Case: Log Link Function
For a log-link function, \(g(\mu) = \log(\mu)\), the inverse link is the exponential function:
\[ E(y_i) = E\left(\exp(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha})\right) \]
Because \(\mathbf{x}_i' \boldsymbol{\beta}\) is fixed (nonrandom), it factors out of the expectation:
\[ E(y_i) = \exp(\mathbf{x}_i' \boldsymbol{\beta}) \cdot E\left(\exp(\mathbf{z}_i' \boldsymbol{\alpha})\right) \]
The remaining factor, \(E(\exp(\mathbf{z}_i' \boldsymbol{\alpha}))\), is the moment-generating function (MGF) of \(\boldsymbol{\alpha}\) evaluated at \(\mathbf{z}_i\).
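If, in addition, the random effects are assumed to be normally distributed, say \(\boldsymbol{\alpha} \sim N(\mathbf{0}, \mathbf{D})\) with covariance matrix \(\mathbf{D}\), the MGF has the closed form \(E(\exp(\mathbf{t}' \boldsymbol{\alpha})) = \exp\left(\tfrac{1}{2} \mathbf{t}' \mathbf{D} \mathbf{t}\right)\), and the marginal mean simplifies to
\[ E(y_i) = \exp\left(\mathbf{x}_i' \boldsymbol{\beta} + \tfrac{1}{2} \mathbf{z}_i' \mathbf{D} \mathbf{z}_i\right), \]
which exceeds \(\exp(\mathbf{x}_i' \boldsymbol{\beta})\) whenever the random effects have positive variance.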
9.4.2 Marginal Variance of \(y_i\)
The variance decomposition formula applies:
\[ \begin{aligned} \operatorname{Var}(y_i) &= \operatorname{Var}_{\boldsymbol{\alpha}}\left(E(y_i \mid \boldsymbol{\alpha})\right) + E_{\boldsymbol{\alpha}}\left(\operatorname{Var}(y_i \mid \boldsymbol{\alpha})\right) \\ &= \operatorname{Var}(\mu_i) + E\left(a(\phi) V(\mu_i)\right) \end{aligned} \]
Expressed explicitly:
\[ \operatorname{Var}(y_i) = \operatorname{Var}\left(g^{-1}(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha})\right) + E\left(a(\phi) V\left(g^{-1}(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha})\right)\right) \]
Without specific assumptions about \(g(\cdot)\) and the distribution of \(\boldsymbol{\alpha}\), this is the most general form.
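As a concrete case, assume a Poisson response with log link and a single normal random intercept, \(\alpha_i \sim N(0, \sigma^2_{\alpha})\) (the model used in the example below). Then \(a(\phi) V(\mu_i) = \mu_i\) and \(\mu_i = \exp(\mathbf{x}_i' \boldsymbol{\beta} + \alpha_i)\), so writing \(m_i = E(y_i) = \exp(\mathbf{x}_i' \boldsymbol{\beta} + \sigma^2_{\alpha}/2)\),
\[ \operatorname{Var}(y_i) = \operatorname{Var}(\mu_i) + E(\mu_i) = m_i^2\left(e^{\sigma^2_{\alpha}} - 1\right) + m_i . \]
The marginal variance exceeds the marginal mean, so the random effect induces overdispersion relative to a pure Poisson model.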
9.4.3 Marginal Covariance of \(\mathbf{y}\)
Random effects induce correlation between observations within the same cluster. The covariance between \(y_i\) and \(y_j\) is:
\[ \begin{aligned} \operatorname{Cov}(y_i, y_j) &= \operatorname{Cov}_{\boldsymbol{\alpha}}\left(E(y_i \mid \boldsymbol{\alpha}), E(y_j \mid \boldsymbol{\alpha})\right) + E_{\boldsymbol{\alpha}}\left(\operatorname{Cov}(y_i, y_j \mid \boldsymbol{\alpha})\right) \\ &= \operatorname{Cov}(\mu_i, \mu_j) + E(0) \\ &= \operatorname{Cov}\left(g^{-1}(\mathbf{x}_i' \boldsymbol{\beta} + \mathbf{z}_i' \boldsymbol{\alpha}), g^{-1}(\mathbf{x}_j' \boldsymbol{\beta} + \mathbf{z}_j' \boldsymbol{\alpha})\right) \end{aligned} \]
The second term vanishes because \(y_i\) and \(y_j\) are assumed conditionally independent given \(\boldsymbol{\alpha}\); the marginal dependence that remains is induced entirely by the shared random effects and is a hallmark of mixed models.
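To make the induced dependence concrete, suppose \(y_i\) and \(y_j\) share a single normal random intercept under a log link, \(\mu_i = \exp(\mathbf{x}_i' \boldsymbol{\beta} + \alpha)\) with \(\alpha \sim N(0, \sigma^2_{\alpha})\) (the setting of the example below). With \(m_i = \exp(\mathbf{x}_i' \boldsymbol{\beta} + \sigma^2_{\alpha}/2)\) as above,
\[ \operatorname{Cov}(y_i, y_j) = \operatorname{Cov}(\mu_i, \mu_j) = e^{(\mathbf{x}_i + \mathbf{x}_j)' \boldsymbol{\beta}} \left(e^{2\sigma^2_{\alpha}} - e^{\sigma^2_{\alpha}}\right) = m_i m_j \left(e^{\sigma^2_{\alpha}} - 1\right), \]
which is strictly positive whenever \(\sigma^2_{\alpha} > 0\).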
Example: Repeated Measurements with a Poisson GLMM
Consider repeated count measurements for subjects:
- Let \(y_{ij}\) be the \(j\)-th count for subject \(i\).
- Assume the counts are conditionally independent given the random effect: \(y_{ij} \mid \alpha_i \sim \text{Poisson}(\mu_{ij})\).
The model is specified as:
\[ \log(\mu_{ij}) = \mathbf{x}_{ij}' \boldsymbol{\beta} + \alpha_i \]
where:
- \(\alpha_i \sim \text{i.i.d. } N(0, \sigma^2_{\alpha})\) represents subject-specific random effects.

This is a log-link GLMM with a random intercept for each subject.
The random intercept \(\alpha_i\) captures unobserved subject-level heterogeneity and, by the results above, induces both overdispersion in each \(y_{ij}\) and positive correlation among repeated measurements on the same subject.
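A short simulation can make these marginal properties visible. The sketch below uses arbitrary illustrative values for \(\boldsymbol{\beta}\), \(\sigma_{\alpha}\), and the design (not estimates from any real data) and checks the simulated marginal mean, variance, and within-subject covariance against the closed-form expressions derived above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative (assumed) parameter values and design: two repeated
# measurements per subject with covariate values x = 0 and x = 1.
n_subjects = 200_000
beta0, beta1 = 0.3, 0.6
sigma_alpha = 0.8
x = np.array([0.0, 1.0])

# Subject-specific random intercepts, shared across a subject's measurements.
alpha = rng.normal(0.0, sigma_alpha, size=(n_subjects, 1))

# Conditional means and Poisson counts, shape (n_subjects, 2).
mu = np.exp(beta0 + beta1 * x + alpha)
y = rng.poisson(mu)

# Closed-form marginal moments for comparison.
m = np.exp(beta0 + beta1 * x + sigma_alpha**2 / 2)
var_closed = m + m**2 * (np.exp(sigma_alpha**2) - 1)
cov_closed = m[0] * m[1] * (np.exp(sigma_alpha**2) - 1)

print("marginal mean:       sim", y.mean(axis=0), "  closed form", m)
print("marginal variance:   sim", y.var(axis=0), "  closed form", var_closed)
print("within-subject cov:  sim", np.cov(y[:, 0], y[:, 1])[0, 1], "  closed form", cov_closed)
```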