8.7 Unbalanced or Unequally Spaced Data

In many real-world applications, data are unbalanced (i.e., different numbers of observations per subject) or unequally spaced over time. This is common in longitudinal studies, clinical trials, and business analytics where subjects may be observed at irregular intervals or miss certain time points.

Mixed-effects models are flexible enough to handle such data structures, especially when we carefully model the variance-covariance structure of repeated measurements.

Consider the following mixed-effects model:

\[ Y_{ikt} = \beta_0 + \beta_{0i} + \beta_{1} t + \beta_{1i} t + \beta_{2} t^2 + \beta_{2i} t^2 + \epsilon_{ikt} \]

Where:

  • \(Y_{ikt}\) = Response for the \(k\)-th subject in the \(i\)-th group at time \(t\).

  • \(i = 1, 2\) = Groups (e.g., treatment vs. control).

  • \(k = 1, \dots, n_i\) = Individuals within group \(i\).

  • \(t \in \{t_1, t_2, t_3, t_4\}\) = Measurement times (which may be unequally spaced, and which need not all be observed for every subject).

Model Components:

  • Fixed Effects:
    • \(\beta_0\) = Overall intercept (baseline).
    • \(\beta_1\) = Common linear time trend.
    • \(\beta_2\) = Common quadratic time trend.
  • Random Effects:
    • \(\beta_{0i}\) = Random intercept for group \(i\) (captures group-specific baseline variation).
    • \(\beta_{1i}\) = Random slope for time in group \(i\) (captures group-specific linear trends).
    • \(\beta_{2i}\) = Random quadratic effect for group \(i\) (captures group-specific curvature over time).
  • Residual Error:
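    • \(\epsilon_{ikt}\) = Residual error with marginal distribution \(N(0, \sigma^2)\), assumed independent of the random effects. Errors from the same subject may be correlated over time (see Section 8.7.1).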
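
To make the model concrete, the following minimal sketch evaluates the group-specific mean curve \((\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i}) t + (\beta_2 + \beta_{2i}) t^2\) for two subjects observed at different, unequally spaced times. The function name `mean_response` and all coefficient values are hypothetical, chosen only for illustration; they are not estimates from any data set.

```python
import numpy as np

def mean_response(t, beta, beta_group):
    """Mean of Y at times t: (b0 + b0i) + (b1 + b1i) * t + (b2 + b2i) * t^2."""
    t = np.asarray(t, dtype=float)
    # Design matrix with columns 1, t, t^2 (one row per observed time point)
    X = np.column_stack([np.ones_like(t), t, t**2])
    coef = np.asarray(beta, dtype=float) + np.asarray(beta_group, dtype=float)
    return X @ coef

# Hypothetical coefficient values (illustration only)
beta = [10.0, 2.0, -0.1]         # beta_0, beta_1, beta_2
beta_group = [1.0, -0.5, 0.05]   # beta_0i, beta_1i, beta_2i for one group

# Two subjects in the same group with different observation schedules
times_a = [0.0, 1.0, 3.0, 6.0]   # four unequally spaced visits
times_b = [0.0, 2.5]             # only two visits (unbalanced)

mu_a = mean_response(times_a, beta, beta_group)
mu_b = mean_response(times_b, beta, beta_group)
```

Nothing in the mean structure requires a common schedule: each subject simply contributes one design-matrix row per observed time point.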

8.7.1 Variance-Covariance Structure: Power Model

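Since observations are taken at unequally spaced time points, a standard AR(1) structure, which assumes equally spaced measurements, is not appropriate, and compound symmetry ignores the time gaps altogether by giving every pair of observations the same correlation. Instead, we use a power covariance model, which lets the correlation depend on the distance between time points.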

The variance-covariance matrix of the repeated measurements for subject \(k\) in group \(i\) is:

\[ \mathbf{\Sigma}_{ik} = \sigma^2 \begin{pmatrix} 1 & \rho^{|t_2 - t_1|} & \rho^{|t_3 - t_1|} & \rho^{|t_4 - t_1|} \\ \rho^{|t_2 - t_1|} & 1 & \rho^{|t_3 - t_2|} & \rho^{|t_4 - t_2|} \\ \rho^{|t_3 - t_1|} & \rho^{|t_3 - t_2|} & 1 & \rho^{|t_4 - t_3|} \\ \rho^{|t_4 - t_1|} & \rho^{|t_4 - t_2|} & \rho^{|t_4 - t_3|} & 1 \end{pmatrix} \]

Where:

  • \(\sigma^2\) = Residual variance.

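  • \(\rho\) = Correlation parameter (\(0 < \rho < 1\)), the correlation between two measurements one time unit apart; it controls how quickly correlation decays as the time gap grows. It is restricted to positive values because a negative \(\rho\) raised to a non-integer power is not real-valued.

  • \(|t_j - t_{j'}|\) = Absolute time difference between measurements at times \(t_j\) and \(t_{j'}\) (here the subscripts index time points, not groups).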

Key Characteristics:

  • The correlation between observations decreases as the time difference increases, as in AR(1), but the structure accommodates unequal time intervals; when measurements are equally spaced one time unit apart, it reduces to AR(1).
  • This structure is sometimes referred to as a continuous-time autoregressive model or power covariance model.
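
The power structure can be built directly from a vector of observation times. The sketch below is a minimal illustration, assuming NumPy is available; the function name `power_cov`, the time points, and the parameter values are hypothetical. It restricts \(\rho\) to \((0, 1)\) so that fractional powers are well defined.

```python
import numpy as np

def power_cov(times, sigma2, rho):
    """Return sigma2 * rho^{|t_j - t_j'|} for every pair of time points."""
    times = np.asarray(times, dtype=float)
    if not (0.0 < rho < 1.0):
        raise ValueError("rho must lie in (0, 1)")
    if sigma2 <= 0.0:
        raise ValueError("sigma2 must be positive")
    # Matrix of absolute time gaps between all pairs of measurements
    lags = np.abs(times[:, None] - times[None, :])
    return sigma2 * rho ** lags

# Unequally spaced visits (hypothetical values)
Sigma = power_cov([0.0, 1.0, 3.0, 6.0], sigma2=2.0, rho=0.5)

# Equally spaced unit intervals: the same function gives the AR(1) pattern
Sigma_ar1 = power_cov([0.0, 1.0, 2.0, 3.0], sigma2=1.0, rho=0.5)
```

Because the matrix depends only on the observed times, a subject with fewer or differently spaced visits simply gets a smaller or differently patterned matrix from the same two parameters.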

After fitting the full model, we can evaluate whether all terms are necessary, focusing on the random effects:

  • \(\beta_{0i}\) (Random Intercepts):
    Is there significant baseline variability between groups?

  • \(\beta_{1i}\) (Random Slopes for Time):
    Do groups exhibit different linear trends over time?

  • \(\beta_{2i}\) (Random Quadratic Terms):
    Is there group-specific curvature in the response over time?

Model Comparison Approach:

  1. Fit the Full Model:
    Includes all random effects.

  2. Fit Reduced Models:
    Systematically remove random effects (e.g., quadratic terms) to create simpler models.

  3. Compare Models Using:

    • Likelihood Ratio Tests (LRT):
      Test whether the more complex model significantly improves fit.

    • Information Criteria (AIC, BIC):
      Lower values indicate a better trade-off between fit and complexity.

  4. Assess Random Effects:

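    • Use an exact restricted likelihood ratio test (for example, the exactRLRT function in the R package RLRsim) to test whether a random-effect variance is zero. A standard LRT tends to be conservative here because the null value of a variance lies on the boundary of the parameter space.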
    • Check variance estimates: if the variance of a random effect is near zero, it may not be necessary.
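
The comparison steps above can be sketched with a few helper functions. This is an illustration under stated assumptions, not a full testing procedure: the helper names (`aic`, `bic`, `lrt_one_df`), the log-likelihoods, and the parameter counts are all hypothetical placeholders, and the test covers only the case where the two nested models differ by a single parameter. The `boundary=True` option halves the p-value, which corresponds to the commonly used 50:50 mixture of \(\chi^2_0\) and \(\chi^2_1\) as the reference distribution when a single variance is tested against zero.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion (lower is better)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def lrt_one_df(loglik_full, loglik_reduced, boundary=False):
    """LRT for nested models that differ by one parameter.

    Returns (statistic, p_value). For 1 degree of freedom,
    P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    if boundary and stat > 0.0:
        # 50:50 mixture of chi2_0 and chi2_1 as reference distribution
        p_value *= 0.5
    return stat, p_value

# Hypothetical placeholder values (not results from any real fit)
ll_full, k_full = -512.3, 10
ll_reduced, k_reduced = -514.8, 9
n_obs = 200

stat, p_naive = lrt_one_df(ll_full, ll_reduced)
_, p_boundary = lrt_one_df(ll_full, ll_reduced, boundary=True)
aic_full, aic_reduced = aic(ll_full, k_full), aic(ll_reduced, k_reduced)
bic_full, bic_reduced = bic(ll_full, k_full, n_obs), bic(ll_reduced, k_reduced, n_obs)
```

Dropping a random effect that is correlated with others removes its covariances as well as its variance, so more than one parameter changes; in that case a simulation-based test is a safer choice than this one-parameter shortcut.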

In matrix notation, the model can be written as:

\[ \mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \mathbf{b} + \boldsymbol{\epsilon} \]

Where:

  • \(\mathbf{Y}\) = Vector of observed responses.

  • \(\mathbf{X}\) = Design matrix for fixed effects (intercept, time, time², and group interactions).

  • \(\boldsymbol{\beta}\) = Vector of fixed-effect coefficients.

  • \(\mathbf{Z}\) = Design matrix for random effects (random intercepts, slopes, etc.).

  • \(\mathbf{b} \sim N(0, \mathbf{G})\) = Vector of random effects with covariance matrix \(\mathbf{G}\).

  • \(\boldsymbol{\epsilon} \sim N(0, \mathbf{R})\) = Vector of residual errors with covariance matrix \(\mathbf{R}\), where \(\mathbf{R}\) follows the power covariance structure.
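
Under this formulation the marginal covariance of the responses is \(\mathbf{Z} \mathbf{G} \mathbf{Z}^\top + \mathbf{R}\). The sketch below assembles it for a single block of four unequally spaced measurements, with random intercept, linear, and quadratic terms. All numerical values are hypothetical, and \(\mathbf{G}\) is taken to be diagonal purely for simplicity; in practice the random effects may be correlated.

```python
import numpy as np

# Observation times for one block of repeated measurements (hypothetical)
times = np.array([0.0, 1.0, 3.0, 6.0])

# Random-effects design matrix: columns for intercept, t, and t^2
Z = np.column_stack([np.ones_like(times), times, times**2])

# Covariance of the random effects (assumed diagonal for illustration)
G = np.diag([1.0, 0.25, 0.01])

# Residual covariance with the power structure: sigma2 * rho^{|t_j - t_j'|}
sigma2, rho = 2.0, 0.5
R = sigma2 * rho ** np.abs(times[:, None] - times[None, :])

# Marginal covariance of the four responses
V = Z @ G @ Z.T + R
```

The random effects contribute variance that grows with \(t\) through \(\mathbf{Z}\), while \(\mathbf{R}\) contributes serial correlation that decays with the time gap.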