8.1 Dependent Data
Forms of dependent data:
- Multivariate measurements on individuals: (e.g., a person’s blood pressure, body fat, etc. are correlated)
- Clustered measurements: (e.g., blood pressure measurements of people in the same family can be correlated).
- Repeated measurements: (e.g., measurements of cholesterol over time can be correlated) “If data are collected repeatedly on experimental material to which treatments were applied initially, the data is a repeated measure.” (Schabenberger and Pierce 2001)
- Longitudinal data: (e.g., an individual’s cholesterol measurements tracked over time are correlated): “data collected repeatedly over time in an observational study are termed longitudinal.” (Schabenberger and Pierce 2001)
- Spatial data: (e.g., measurements of individuals living in the same neighborhood are correlated)
Hence, we would like to account for these correlations.
The Linear Mixed Model (LMM), also known as the Mixed Linear Model, has two components:
- Fixed effects (e.g., gender, age, diet, time)
- Random effects representing individual variation or autocorrelation/spatial effects that imply dependent (correlated) errors
Review Two-Way Mixed Effects ANOVA
We choose to model the random subject-specific effect instead of including dummy subject covariates in our model because:
- reduction in the number of parameters to estimate
- when you do inference, it makes more sense to infer about the population from which the subjects were drawn (i.e., the random effect).
LMM Motivation
In a repeated measurements analysis where \(Y_{ij}\) is the response for the \(i\)-th individual measured at the \(j\)-th time,
\(i = 1,\dots,N\); \(j = 1,\dots,n_i\)
\[ \mathbf{Y}_i = \left( \begin{array} {c} Y_{i1} \\ \vdots \\ Y_{in_i} \end{array} \right) \]
is all measurements for subject \(i\).
Stage 1 (Regression Model): how the response changes over time for the \(i\)-th subject
\[ \mathbf{Y_i = Z_i \beta_i + \epsilon_i} \]
where
- \(Z_i\) is an \(n_i \times q\) matrix of known covariates
- \(\beta_i\) is an unknown \(q \times 1\) vector of subject-specific coefficients (the regression coefficients differ by subject)
- \(\epsilon_i\) are the random errors (typically \(\sim N(0, \sigma^2 I)\))
We notice that there are too many \(\beta_i\) to estimate here (one vector per subject). This motivates the second stage.
Stage 2 (Parameter Model):
\[ \mathbf{\beta_i = K_i \beta + b_i} \]
where
- \(K_i\) is a \(q \times p\) matrix of known covariates
- \(\beta\) is a \(p \times 1\) vector of unknown parameters
- \(\mathbf{b}_i\) are independent \(N(0,D)\) random variables
This model explains the observed variability between subjects with respect to the subject-specific regression coefficients, \(\beta_i\): each subject’s coefficients \(\beta_i\) are modeled in terms of the population-level parameter \(\beta\).
Example:
Stage 1:
\[ Y_{ij} = \beta_{1i} + \beta_{2i}t_{ij} + \epsilon_{ij} \]
where
- \(j = 1,..,n_i\)
In the matrix notation,
\[ \mathbf{Y_i} = \left( \begin{array} {c} Y_{i1} \\ \vdots \\ Y_{in_i} \end{array} \right); \quad \mathbf{Z}_i = \left( \begin{array} {cc} 1 & t_{i1} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{array} \right) \]
\[ \beta_i = \left( \begin{array} {c} \beta_{1i} \\ \beta_{2i} \end{array} \right); \quad \epsilon_i = \left( \begin{array} {c} \epsilon_{i1} \\ \vdots \\ \epsilon_{in_i} \end{array} \right) \]
Thus,
\[ \mathbf{Y_i = Z_i \beta_i + \epsilon_i} \]
Stage 2:
\[ \begin{aligned} \beta_{1i} &= \beta_0 + b_{1i} \\ \beta_{2i} &= \beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i} \end{aligned} \]
where \(L_i, H_i, C_i\) are indicator variables that equal 1 when subject \(i\) falls into the corresponding treatment category (and 0 otherwise).
Subject-specific intercepts do not depend on treatment: \(\beta_0\) is the average response at the start of treatment, and \(\beta_1, \beta_2, \beta_3\) are the average time effects for each of the three treatment groups.
\[ \begin{aligned} \mathbf{K}_i &= \left( \begin{array} {cccc} 1 & 0 & 0 & 0 \\ 0 & L_i & H_i & C_i \end{array} \right) \\ \beta &= (\beta_0 , \beta_1, \beta_2, \beta_3)' \\ \mathbf{b}_i &= \left( \begin{array} {c} b_{1i} \\ b_{2i} \\ \end{array} \right) \\ \beta_i &= \mathbf{K_i \beta + b_i} \end{aligned} \]
To get \(\hat{\beta}\), we can fit the model sequentially:
- Estimate \(\hat{\beta_i}\) in the first stage
- Estimate \(\hat{\beta}\) in the second stage by replacing \(\beta_i\) with \(\hat{\beta}_i\)
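As a concrete illustration, here is a minimal sketch of this sequential (two-stage) fit on simulated data; the data-generating values, sample sizes, and the simplification \(K_i = I\) (so stage 2 reduces to averaging the stage-1 estimates) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
N, n_i = 20, 5                        # number of subjects and measurements per subject (assumed)
t = np.arange(n_i, dtype=float)       # common measurement times (assumed)

# Simulate Y_ij = beta_1i + beta_2i * t_ij + eps_ij with subject-specific coefficients
beta_true = rng.multivariate_normal([10.0, 2.0], np.diag([1.0, 0.25]), size=N)
Y = beta_true[:, [0]] + beta_true[:, [1]] * t + rng.normal(0.0, 0.5, size=(N, n_i))

# Stage 1: estimate beta_i by per-subject least squares
Z = np.column_stack([np.ones(n_i), t])                          # Z_i (same for every subject here)
beta_hat = np.array([np.linalg.lstsq(Z, Y[i], rcond=None)[0] for i in range(N)])

# Stage 2: model the stage-1 estimates; with K_i = I this reduces to averaging them
beta_stage2 = beta_hat.mean(axis=0)
print(beta_stage2)                                              # rough estimate of the population beta
```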
However, problems arise from this method:
- information is lost by summarizing the vector \(\mathbf{Y}_i\) solely by \(\hat{\beta}_i\)
- we need to account for variability when replacing \(\beta_i\) with its estimate
- different subjects might have different numbers of observations.
To address these problems, we can use the Linear Mixed Model (Laird and Ware 1982).
Substituting stage 2 into stage 1:
\[ \mathbf{Y}_i = \mathbf{Z}_i \mathbf{K}_i \beta + \mathbf{Z}_i \mathbf{b}_i + \mathbf{\epsilon}_i \]
Let \(\mathbf{X}_i = \mathbf{Z}_i \mathbf{K}_i\) be an \(n_i \times p\) matrix. Then, the LMM is
\[ \mathbf{Y}_i = \mathbf{X}_i \beta + \mathbf{Z}_i \mathbf{b}_i + \mathbf{\epsilon}_i \]
where
- \(i = 1,..,N\)
- \(\beta\) are the fixed effects, which are common to all subjects
- \(\mathbf{b}_i\) are the subject specific random effects. \(\mathbf{b}_i \sim N_q (\mathbf{0,D})\)
- \(\mathbf{\epsilon}_i \sim N_{n_i}(\mathbf{0,\Sigma_i})\)
- \(\mathbf{b}_i\) and \(\epsilon_i\) are independent
- \(\mathbf{Z}_{i(n_i \times q)}\) and \(\mathbf{X}_{i(n_i \times p)}\) are matrices of known covariates.
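As a rough sketch of fitting such a model in practice, the following uses statsmodels’ MixedLM on a simulated long-format data set; all column names (y, time, treatment, subject), the simulated values, and the particular formula are hypothetical choices, not part of the original example.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Simulate a small long-format data set (hypothetical names and values)
N, n_i = 30, 6
subject = np.repeat(np.arange(N), n_i)
time = np.tile(np.arange(n_i, dtype=float), N)
treatment = np.repeat(rng.choice(["L", "H", "C"], size=N), n_i)
b1 = np.repeat(rng.normal(0.0, 1.0, N), n_i)     # subject-specific random intercepts
b2 = np.repeat(rng.normal(0.0, 0.3, N), n_i)     # subject-specific random slopes
slope = np.select([treatment == "L", treatment == "H", treatment == "C"], [1.0, 2.0, 3.0])
y = 10.0 + b1 + (slope + b2) * time + rng.normal(0.0, 0.5, N * n_i)
df = pd.DataFrame({"y": y, "time": time, "treatment": treatment, "subject": subject})

# Fixed effects: common intercept and treatment-specific time slopes (mirroring stage 2);
# random intercept and random slope for time within each subject.
model = smf.mixedlm("y ~ time:treatment", df, groups="subject", re_formula="~time")
result = model.fit()
print(result.summary())
```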
Equivalently, we can write the model in what is called the conditional or hierarchical formulation of the linear mixed model:
\[ \begin{aligned} \mathbf{Y}_i | \mathbf{b}_i &\sim N(\mathbf{X}_i \beta+ \mathbf{Z}_i \mathbf{b}_i, \mathbf{\Sigma}_i) \\ \mathbf{b}_i &\sim N(\mathbf{0,D}) \end{aligned} \]
for \(i = 1,..,N\). Denote the respective density functions by \(f(\mathbf{Y}_i |\mathbf{b}_i)\) and \(f(\mathbf{b}_i)\).
In general,
\[ \begin{aligned} f(A,B) &= f(A|B)f(B) \\ f(A) &= \int f(A,B)dB = \int f(A|B) f(B) dB \end{aligned} \]
In the LMM, the marginal density of \(\mathbf{Y}_i\) is
\[ f(\mathbf{Y}_i) = \int f(\mathbf{Y}_i | \mathbf{b}_i) f(\mathbf{b}_i) d\mathbf{b}_i \]
which can be shown to be
\[ \mathbf{Y}_i \sim N(\mathbf{X_i \beta, Z_i DZ'_i + \Sigma_i}) \]
This is the marginal formulation of the linear mixed model
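The mean and covariance follow directly from the combined equation \(\mathbf{Y}_i = \mathbf{X}_i \beta + \mathbf{Z}_i \mathbf{b}_i + \epsilon_i\), using the independence of \(\mathbf{b}_i\) and \(\epsilon_i\):
\[ \begin{aligned} E(\mathbf{Y}_i) &= \mathbf{X}_i \beta + \mathbf{Z}_i E(\mathbf{b}_i) + E(\epsilon_i) = \mathbf{X}_i \beta \\ var(\mathbf{Y}_i) &= \mathbf{Z}_i var(\mathbf{b}_i) \mathbf{Z}_i' + var(\epsilon_i) = \mathbf{Z}_i \mathbf{D} \mathbf{Z}_i' + \mathbf{\Sigma}_i \end{aligned} \]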
Notes:
In the marginal formulation, \(\mathbf{Z}_i \mathbf{b}_i\) no longer appears in the mean; its contribution moves into the covariance, which induces marginal dependence among the elements of \(\mathbf{Y}_i\). Intuitively, the common subject-specific effect is averaged (integrated) out of the mean. Strictly speaking, \(\mathbf{b}_i\) is a random effect rather than an error, so it is more accurate to say that the random-effect variability is added to the variance-covariance matrix.
Continuing with our example,
\[ Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1L_i + \beta_2 H_i + \beta_3 C_i + b_{2i})t_{ij} + \epsilon_{ij} \]
for each treatment group
\[ Y_{ij}= \begin{cases} \beta_0 + b_{1i} + (\beta_1 + b_{2i})t_{ij} + \epsilon_{ij} & L \\ \beta_0 + b_{1i} + (\beta_2 + b_{2i})t_{ij} + \epsilon_{ij} & H\\ \beta_0 + b_{1i} + (\beta_3 + b_{2i})t_{ij} + \epsilon_{ij} & C \end{cases} \]
- Intercepts and slopes are all subject-specific
- Different treatment groups have different slopes, but the same intercept.
In the hierarchical model form
\[ \begin{aligned} \mathbf{Y}_i | \mathbf{b}_i &\sim N(\mathbf{X}_i \beta + \mathbf{Z}_i \mathbf{b}_i, \mathbf{\Sigma}_i)\\ \mathbf{b}_i &\sim N(\mathbf{0,D}) \end{aligned} \]
where
\[ \beta = (\beta_0, \beta_1, \beta_2, \beta_3)' \]
and the fixed-effects design matrix takes the form
\[ \begin{aligned} \mathbf{X}_i &= \mathbf{Z}_i \mathbf{K}_i \\ &= \left[ \begin{array} {cc} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{array} \right] \times \left[ \begin{array} {cccc} 1 & 0 & 0 & 0 \\ 0 & L_i & H_i & C_i \\ \end{array} \right] \\ &= \left[ \begin{array} {cccc} 1 & t_{i1}L_i & t_{i1}H_i & t_{i1}C_i \\ 1 & t_{i2}L_i & t_{i2}H_i & t_{i2}C_i \\ \vdots & \vdots & \vdots & \vdots \\ 1 & t_{in_i}L_i & t_{in_i}H_i & t_{in_i}C_i \\ \end{array} \right]\end{aligned} \]
\[ \mathbf{b}_i = \left( \begin{array} {c} b_{1i} \\ b_{2i} \end{array} \right) \]
\[ D = \left( \begin{array} {cc} d_{11} & d_{12}\\ d_{12} & d_{22} \end{array} \right) \]
We assume \(\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}_{n_i}\), which is called conditional independence: the responses on subject \(i\) are independent conditional on \(\mathbf{b}_i\) and \(\beta\).
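A small numerical sketch of the design construction \(\mathbf{X}_i = \mathbf{Z}_i \mathbf{K}_i\) above; the measurement times and the subject’s treatment group are made-up values for illustration.

```python
import numpy as np

t_i = np.array([0.0, 1.0, 2.0, 3.0])             # measurement times for subject i (assumed)
L_i, H_i, C_i = 1, 0, 0                          # subject i assumed to be in the "L" group

Z_i = np.column_stack([np.ones_like(t_i), t_i])  # n_i x 2 within-subject design
K_i = np.array([[1, 0,   0,   0],
                [0, L_i, H_i, C_i]])             # 2 x 4 between-subject design

X_i = Z_i @ K_i                                  # n_i x 4 fixed-effects design matrix
print(X_i)                                       # columns: 1, L_i*t_ij, H_i*t_ij, C_i*t_ij
```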
In the marginal model form
\[ Y_{ij} = \beta_0 + \beta_1 L_i t_{ij} + \beta_2 H_i t_{ij} + \beta_3 C_i t_{ij} + \eta_{ij} \]
where \(\eta_i \sim N(\mathbf{0},\mathbf{Z}_i\mathbf{DZ}_i'+ \mathbf{\Sigma}_i)\)
Equivalently,
\[ \mathbf{Y_i \sim N(X_i \beta, Z_i DZ_i' + \Sigma_i}) \]
In the case that \(n_i = 2\),
\[ \begin{aligned} \mathbf{Z_iDZ_i'} &= \left( \begin{array} {cc} 1 & t_{i1} \\ 1 & t_{i2} \end{array} \right) \left( \begin{array} {cc} d_{11} & d_{12} \\ d_{12} & d_{22} \end{array} \right) \left( \begin{array} {cc} 1 & 1 \\ t_{i1} & t_{i2} \end{array} \right) \\ &= \left( \begin{array} {cc} d_{11} + 2d_{12}t_{i1} + d_{22}t_{i1}^2 & d_{11} + d_{12}(t_{i1} + t_{i2}) + d_{22}t_{i1}t_{i2} \\ d_{11} + d_{12}(t_{i1} + t_{i2}) + d_{22} t_{i1} t_{i2} & d_{11} + 2d_{12}t_{i2} + d_{22}t_{i2}^2 \end{array} \right) \end{aligned} \]
\[ var(Y_{i1}) = d_{11} + 2d_{12}t_{i1} + d_{22} t_{i1}^2 + \sigma^2 \]
On top of correlation in the errors, the marginal model implies that the variance function of the response is quadratic over time, with positive curvature \(d_{22}\).
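A quick numerical check of this marginal covariance for \(n_i = 2\); the values of \(d_{11}, d_{12}, d_{22}, \sigma^2\) and the times below are arbitrary choices.

```python
import numpy as np

d11, d12, d22, sigma2 = 2.0, 0.5, 0.3, 1.0       # assumed variance components
t_i1, t_i2 = 1.0, 2.0                            # assumed measurement times

Z_i = np.array([[1.0, t_i1],
                [1.0, t_i2]])
D = np.array([[d11, d12],
              [d12, d22]])

V_i = Z_i @ D @ Z_i.T + sigma2 * np.eye(2)       # marginal covariance Z_i D Z_i' + sigma^2 I
print(V_i)

# var(Y_i1) is quadratic in t_i1, matching the closed form above
print(d11 + 2 * d12 * t_i1 + d22 * t_i1**2 + sigma2, V_i[0, 0])
```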
8.1.1 Random-Intercepts Model
If we remove the random slopes,
- the assumption is that all variability in subject-specific slopes can be attributed to treatment differences
- the model is the random-intercepts model, which has subject-specific intercepts but the same slope within each treatment group.
\[ \begin{aligned} \mathbf{Y}_i | b_i &\sim N(\mathbf{X}_i \beta + 1 b_i , \Sigma_i) \\ b_i &\sim N(0,d_{11}) \end{aligned} \]
The marginal model is then (\(\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}\))
\[ \mathbf{Y}_i \sim N(\mathbf{X}_i \beta, 11'd_{11} + \sigma^2 \mathbf{I}) \]
The marginal covariance matrix is
\[ \begin{aligned} cov(\mathbf{Y}_i) &= 11'd_{11} + \sigma^2I \\ &= \left( \begin{array} {cccc} d_{11}+ \sigma^2 & d_{11} & \cdots & d_{11} \\ d_{11} & d_{11} + \sigma^2 & \cdots & d_{11} \\ \vdots & \vdots & \ddots & \vdots \\ d_{11} & d_{11} & \cdots & d_{11} + \sigma^2 \end{array} \right) \end{aligned} \]
the associated correlation matrix is
\[ corr(\mathbf{Y}_i) = \left( \begin{array} {cccc} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \\ \end{array} \right) \]
where \(\rho \equiv \frac{d_{11}}{d_{11} + \sigma^2}\)
Thus, we have
- constant variance over time
- equal, positive correlation between any two measurements from the same subject
- a covariance structure that is called compound symmetry, and \(\rho\) is called the intra-class correlation
- when \(\rho\) is large, the inter-subject variability (\(d_{11}\)) is large relative to the intra-subject variability (\(\sigma^2\))
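A small sketch constructing this compound-symmetry covariance and the implied intra-class correlation; the values of \(d_{11}\), \(\sigma^2\), and \(n_i\) are arbitrary.

```python
import numpy as np

d11, sigma2, n_i = 2.0, 1.0, 4                   # assumed variance components and cluster size

ones = np.ones((n_i, 1))
V = d11 * ones @ ones.T + sigma2 * np.eye(n_i)   # 11' d11 + sigma^2 I (compound symmetry)

rho = d11 / (d11 + sigma2)                       # intra-class correlation
corr = V / np.sqrt(np.outer(np.diag(V), np.diag(V)))
print(V)
print(rho, corr[0, 1])                           # every off-diagonal correlation equals rho
```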
8.1.2 Covariance Models
If the conditional independence assumption (\(\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}_{n_i}\)) is too restrictive, we can relax it by decomposing the error as \(\epsilon_i = \epsilon_{(1)i} + \epsilon_{(2)i}\), where
- \(\epsilon_{(1)i}\) is a “serial correlation” component. That is, part of the individual’s profile is a response to time-varying stochastic processes.
- \(\epsilon_{(2)i}\) is the measurement error component, and is independent of \(\epsilon_{(1)i}\)
Then
\[ \mathbf{Y_i = X_i \beta + Z_i b_i + \epsilon_{(1)i} + \epsilon_{(2)i}} \]
where
- \(\mathbf{b}_i \sim N(\mathbf{0,D})\)
- \(\epsilon_{(2)i} \sim N(\mathbf{0},\sigma^2 \mathbf{I}_{n_i})\)
- \(\epsilon_{(1)i} \sim N(\mathbf{0},\tau^2\mathbf{H}_i)\)
- \(\mathbf{b}_i\), \(\epsilon_{(1)i}\), and \(\epsilon_{(2)i}\) are mutually independent
We need to model the structure of the \(n_i \times n_i\) correlation (or covariance) matrix \(\mathbf{H}_i\). Let the \((j,k)\)-th element of \(\mathbf{H}_i\) be \(h_{ijk} = g(t_{ij}, t_{ik})\), a function of the times \(t_{ij}\) and \(t_{ik}\), which is typically assumed to depend only on the “distance” between the times:
\[ h_{ijk} = g(|t_{ij}-t_{ik}|) \]
for some decreasing function \(g(.)\) with \(g(0)=1\) (for correlation matrices).
Examples of this type of function:
- Exponential function: \(g(|t_{ij}-t_{ik}|) = \exp(-\phi|t_{ij} - t_{ik}|)\)
- Gaussian function: \(g(|t_{ij} - t_{ik}|) = \exp(-\phi(t_{ij} - t_{ik})^2)\)
Similar structures could also be used for the \(\mathbf{D}\) matrix (the covariance of \(\mathbf{b}_i\)).
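A short sketch of building \(\mathbf{H}_i\) from these correlation functions for a vector of measurement times; the times and the decay parameter \(\phi\) are made-up values.

```python
import numpy as np

t_i = np.array([0.0, 0.5, 1.5, 3.0])             # assumed measurement times for subject i
phi = 0.8                                        # assumed decay parameter

lags = np.abs(t_i[:, None] - t_i[None, :])       # |t_ij - t_ik| for all pairs of times
H_exp = np.exp(-phi * lags)                      # exponential correlation function
H_gau = np.exp(-phi * lags**2)                   # Gaussian correlation function
print(H_exp)
print(H_gau)                                     # both have 1s on the diagonal, since g(0) = 1
```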
Example: Autoregressive Covariance Structure
A first order Autoregressive Model (AR(1)) has the form
\[ \alpha_t = \phi \alpha_{t-1} + \eta_t \]
where \(\eta_t \sim iid N (0,\sigma^2_\eta)\)
Then, the covariance between two observations is
\[ cov(\alpha_t, \alpha_{t+h}) = \frac{\sigma^2_\eta \phi^{|h|}}{1- \phi^2} \]
for \(h = 0, \pm 1, \pm 2, ...; |\phi|<1\)
Hence,
\[ corr(\alpha_t, \alpha_{t+h}) = \phi^{|h|} \]
If we let \(\boldsymbol{\alpha} = (\alpha_1,\dots,\alpha_T)'\), then
\[ corr(\boldsymbol{\alpha}) = \left[ \begin{array} {ccccc} 1 & \phi & \phi^2 & \cdots & \phi^{T-1} \\ \phi & 1 & \phi & \cdots & \phi^{T-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{T-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \phi^{T-1} & \phi^{T-2} & \phi^{T-3} & \cdots & 1 \end{array} \right] \]
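A minimal sketch of this AR(1) correlation matrix, built with scipy’s Toeplitz helper; the series length \(T\) and the coefficient \(\phi\) are arbitrary choices.

```python
import numpy as np
from scipy.linalg import toeplitz

T, phi = 5, 0.6                                  # assumed series length and AR(1) coefficient

corr = toeplitz(phi ** np.arange(T))             # (j,k) entry is phi^{|j-k|}
print(corr)
```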
Notes:
- The correlation decreases as time lag increases
- This matrix structure is known as a Toeplitz structure
- More complicated covariance structures are possible, which are a critical component of spatial random effects models and time series models.
- Often, we don’t need both random effects \(\mathbf{b}\) and \(\epsilon_{(1)i}\)
More in the Time Series section