8.7 Unbalanced or Unequally Spaced Data
In many real-world applications, data are unbalanced (i.e., different numbers of observations per subject) or unequally spaced over time. This is common in longitudinal studies, clinical trials, and business analytics where subjects may be observed at irregular intervals or miss certain time points.
Mixed-effects models are flexible enough to handle such data structures, especially when we carefully model the variance-covariance structure of repeated measurements.
Consider the following mixed-effects model:
$$Y_{ikt} = \beta_0 + \beta_{0i} + \beta_1 t + \beta_{1i} t + \beta_2 t^2 + \beta_{2i} t^2 + \epsilon_{ikt}$$
Where:
$Y_{ikt}$ = Response for the $k$-th subject in the $i$-th group at time $t$.
$i = 1, 2$ = Groups (e.g., treatment vs. control).
$k = 1, \dots, n_i$ = Individuals within group $i$.
$t = (t_1, t_2, t_3, t_4)$ = Time points (which may be unequally spaced).
Model Components:
- Fixed Effects:
  - $\beta_0$ = Overall intercept (baseline).
  - $\beta_1$ = Common linear time trend.
  - $\beta_2$ = Common quadratic time trend.
- Random Effects:
  - $\beta_{0i}$ = Random intercept for group $i$ (captures group-specific baseline variation).
  - $\beta_{1i}$ = Random slope for time in group $i$ (captures group-specific linear trends).
  - $\beta_{2i}$ = Random quadratic effect for group $i$ (captures group-specific curvature over time).
- Residual Error:
  - $\epsilon_{ikt} \sim N(0, \sigma^2)$ = Measurement error, assumed independent of the random effects. Errors from the same subject are not independent of each other; their correlation follows the power structure described in Section 8.7.1.
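To make the split between fixed and random effects concrete, here is a minimal Python sketch (standard library only). The function name mean_trajectory, the visit times, and all coefficient values are hypothetical and chosen purely for illustration. It evaluates the conditional mean $(\beta_0 + \beta_{0i}) + (\beta_1 + \beta_{1i})t + (\beta_2 + \beta_{2i})t^2$ at unequally spaced times.

```python
from typing import List, Sequence, Tuple


def mean_trajectory(
    times: Sequence[float],
    fixed: Tuple[float, float, float],
    group_dev: Tuple[float, float, float] = (0.0, 0.0, 0.0),
) -> List[float]:
    """Conditional mean of Y_ikt given group i's random effects, at each time t.

    fixed     = (beta0, beta1, beta2): population intercept, linear and quadratic terms.
    group_dev = (beta0i, beta1i, beta2i): group i's deviations from those terms.
    The residual error has mean zero, so it does not appear in the mean.
    """
    b0, b1, b2 = fixed
    r0, r1, r2 = group_dev
    return [(b0 + r0) + (b1 + r1) * t + (b2 + r2) * t ** 2 for t in times]


# Hypothetical, unequally spaced visit times and illustrative coefficient values.
times = [0.0, 1.0, 3.0, 6.0]
population = mean_trajectory(times, fixed=(10.0, 2.0, -0.1))
group_1 = mean_trajectory(times, fixed=(10.0, 2.0, -0.1), group_dev=(1.0, -0.5, 0.0))
print("population:", population)
print("group 1:   ", group_1)
```

Setting the group deviations to zero recovers the population-average curve; each group's curve is the same quadratic with shifted coefficients.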
8.7.1 Variance-Covariance Structure: Power Model
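Because observations are taken at unequally spaced time points, the usual AR(1) structure, which assumes equally spaced measurements, is not appropriate, and compound symmetry ignores the time gaps altogether by giving every pair of measurements the same correlation. Instead, we use a power covariance model, in which the correlation between two measurements depends on the time elapsed between them.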
The variance-covariance matrix of the repeated measurements for subject k in group i is:
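$$\Sigma_{ik} = \sigma^2 \begin{pmatrix} 1 & \rho^{|t_2 - t_1|} & \rho^{|t_3 - t_1|} & \rho^{|t_4 - t_1|} \\ \rho^{|t_2 - t_1|} & 1 & \rho^{|t_3 - t_2|} & \rho^{|t_4 - t_2|} \\ \rho^{|t_3 - t_1|} & \rho^{|t_3 - t_2|} & 1 & \rho^{|t_4 - t_3|} \\ \rho^{|t_4 - t_1|} & \rho^{|t_4 - t_2|} & \rho^{|t_4 - t_3|} & 1 \end{pmatrix}$$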
Where:
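$\sigma^2$ = Residual variance (the variance of a single measurement).
$\rho$ = Correlation parameter ($0 < \rho < 1$), controlling how quickly the correlation decays as the time gap grows. A negative $\rho$ cannot be used here, because $\rho^{|t_j - t_i|}$ is undefined for non-integer gaps.
$|t_j - t_i|$ = Absolute time difference between the measurements taken at times $t_i$ and $t_j$.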
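The definitions above translate directly into code. Below is a minimal Python sketch (standard library only; the function name power_covariance, the visit times, and the parameter values are hypothetical) that builds $\Sigma_{ik}$ from whatever times a subject was actually observed, so a subject with missed visits simply passes fewer times.

```python
from typing import List, Sequence


def power_covariance(times: Sequence[float], sigma2: float, rho: float) -> List[List[float]]:
    """Return the matrix with entries sigma2 * rho ** |t_j - t_i| (power covariance)."""
    if not 0.0 < rho < 1.0:
        # rho ** gap is only real-valued for non-integer gaps when rho is positive
        raise ValueError("rho must lie in (0, 1)")
    if sigma2 <= 0.0:
        raise ValueError("sigma2 must be positive")
    return [[sigma2 * rho ** abs(tj - ti) for tj in times] for ti in times]


# Hypothetical unequally spaced visits (e.g., months 0, 1, 3, 6).
sigma = power_covariance([0.0, 1.0, 3.0, 6.0], sigma2=2.0, rho=0.5)
for row in sigma:
    print([round(v, 5) for v in row])
```

For an unbalanced subject seen only at months 0 and 6, power_covariance([0.0, 6.0], 2.0, 0.5) gives a 2×2 block whose off-diagonal entry is $2 \cdot 0.5^{6}$.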
Key Characteristics:
- The correlation between observations decreases as the time difference increases, similar to AR(1), but flexible enough to handle unequal time intervals.
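- This structure is also called a continuous-time autoregressive (CAR(1)) model: when the time points are equally spaced one unit apart, $|t_j - t_i|$ is simply the lag between visits and the matrix reduces exactly to AR(1).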
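A small worked example shows why the actual gaps matter. Take hypothetical visit times $t = (0, 1, 3, 6)$ and $\rho = 0.5$. The implied correlations between consecutive visits are

$$\rho^{|t_2 - t_1|} = 0.5^{1} = 0.5, \qquad \rho^{|t_3 - t_2|} = 0.5^{2} = 0.25, \qquad \rho^{|t_4 - t_3|} = 0.5^{3} = 0.125,$$

so consecutive visits that lie further apart in time are less correlated, whereas an AR(1) structure indexed by visit number would give all three pairs the same correlation $\rho$.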
After fitting the full model, we can evaluate whether all terms are necessary, focusing on the random effects:
- $\beta_{0i}$ (Random Intercepts): Is there significant baseline variability between groups?
- $\beta_{1i}$ (Random Slopes for Time): Do groups exhibit different linear trends over time?
- $\beta_{2i}$ (Random Quadratic Terms): Is there group-specific curvature in the response over time?
Model Comparison Approach:
1. Fit the full model, including all random effects.
2. Fit reduced models by systematically removing random effects (e.g., the quadratic terms) to create simpler models.
3. Compare the models using:
   - Likelihood Ratio Tests (LRT): test whether the more complex model significantly improves fit. When the models differ only in their random effects, fit both by REML with the same fixed effects; when the fixed effects differ, use ML instead.
   - Information Criteria (AIC, BIC): lower values indicate a better trade-off between fit and complexity.
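The comparison step can be sketched numerically. The Python below (standard library only; all function names are hypothetical, and the log-likelihood values are invented purely for illustration) computes the LRT statistic $2(\ell_{\text{full}} - \ell_{\text{reduced}})$ with a naive $\chi^2$ p-value, plus AIC $= -2\ell + 2p$ and BIC $= -2\ell + p \log n$. Two assumptions are flagged: BIC here takes $n$ to be the number of observations, although software differs on whether to count observations or subjects; and df = 3 in the usage line assumes that dropping $\beta_{2i}$ removes its variance and its two covariances in an unstructured random-effects covariance matrix.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with integer df >= 1 (standard library only)."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    # Base cases: df = 1 via the complementary error function, df = 2 is exponential.
    if df % 2 == 1:
        sf, k = math.erfc(math.sqrt(half)), 1
    else:
        sf, k = math.exp(-half), 2
    # Step up two df at a time: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1).
    while k < df:
        sf += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return sf


def likelihood_ratio_test(loglik_reduced: float, loglik_full: float, df: int) -> Tuple[float, float]:
    """LRT statistic 2 * (l_full - l_reduced) and its naive chi-square p-value."""
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))  # guard against tiny negative values
    return stat, chi2_sf(stat, df)


def aic(loglik: float, n_params: int) -> float:
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    # n_obs is assumed to be the number of observations; packages differ on this choice.
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical log-likelihoods: the reduced model drops the random quadratic term.
stat, p_naive = likelihood_ratio_test(loglik_reduced=-250.0, loglik_full=-246.0, df=3)
print(f"LRT = {stat:.2f}, naive p = {p_naive:.4f}")
```

The p-value is labeled naive because the $\chi^2$ reference distribution is only approximate when the null hypothesis fixes a variance at zero.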
Assess Random Effects:
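- Use exactRLRT() from the R package RLRsim to test whether a single random-effect variance is zero. A zero variance lies on the boundary of the parameter space, so the usual χ² reference distribution for the LRT is conservative; exactRLRT instead simulates the exact finite-sample null distribution of the restricted likelihood ratio test.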
- Check variance estimates: if the variance of a random effect is near zero, it may not be necessary.
In matrix notation, the model can be written as:
$$\mathbf{Y} = \mathbf{X} \boldsymbol{\beta} + \mathbf{Z} \mathbf{b} + \boldsymbol{\epsilon}$$
Where:
$\mathbf{Y}$ = Vector of observed responses.
$\mathbf{X}$ = Design matrix for the fixed effects (columns for the intercept, $t$, and $t^2$).
$\boldsymbol{\beta}$ = Vector of fixed-effect coefficients $(\beta_0, \beta_1, \beta_2)^\top$.
$\mathbf{Z}$ = Design matrix for the random effects (intercept, $t$, and $t^2$ columns for each group's random intercept, slope, and quadratic term).
$\mathbf{b} \sim N(\mathbf{0}, \mathbf{G})$ = Vector of random effects with covariance matrix $\mathbf{G}$.
$\boldsymbol{\epsilon} \sim N(\mathbf{0}, \mathbf{R})$ = Vector of residual errors with covariance matrix $\mathbf{R}$, which is block diagonal with one power-structured block $\Sigma_{ik}$ per subject.
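Assuming, as is standard, that $\mathbf{b}$ and $\boldsymbol{\epsilon}$ are independent, the marginal covariance of the responses follows in one step. The per-subject notation below ($\mathbf{Y}_{ik}$, $\mathbf{Z}_{ik}$, $\mathbf{b}_i$, $\mathbf{G}_i$) is introduced here for illustration, and residuals from different subjects are assumed independent:

$$\operatorname{Var}(\mathbf{Y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}^\top + \mathbf{R}$$

$$\mathbf{Z}_{ik} = \begin{pmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \\ 1 & t_4 & t_4^2 \end{pmatrix}, \qquad \mathbf{b}_i = (\beta_{0i}, \beta_{1i}, \beta_{2i})^\top \sim N(\mathbf{0}, \mathbf{G}_i)$$

$$\operatorname{Var}(\mathbf{Y}_{ik}) = \mathbf{Z}_{ik}\mathbf{G}_i\mathbf{Z}_{ik}^\top + \Sigma_{ik}, \qquad \operatorname{Cov}(\mathbf{Y}_{ik}, \mathbf{Y}_{ik'}) = \mathbf{Z}_{ik}\mathbf{G}_i\mathbf{Z}_{ik'}^\top \quad (k \neq k')$$

Because the random effects are shared by every subject in group $i$, subjects in the same group are correlated through $\mathbf{G}_i$ even though their residuals are not. With unbalanced data, $\mathbf{Z}_{ik}$ and $\Sigma_{ik}$ keep only the rows (and columns) for the times at which subject $k$ was actually observed.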