8.1 Dependent Data

Forms of dependent data:

  • Multivariate measurements on different individuals: (e.g., a person’s blood pressure, fat, etc are correlated)
  • Clustered measurements: (e.g., blood pressure measurements of people in the same family can be correlated).
  • Repeated measurements: (e.g., measurement of cholesterol over time can be correlated) “If data are collected repeatedly on experimental material to which treatments were applied initially, the data is a repeated measure.” (Schabenberger and Pierce 2001)
  • Longitudinal data: (e.g., individual’s cholesterol tracked over time are correlated): “data collected repeatedly over time in an observational study are termed longitudinal.” (Schabenberger and Pierce 2001)
  • Spatial data: (e.g., measurement of individuals living in the same neighborhood are correlated)

Hence, we like to account for these correlations.

Linear Mixed Model (LMM), also known as Mixed Linear Model has 2 components:

  • Fixed effect (e.g, gender, age, diet, time)

  • Random effects representing individual variation or auto correlation/spatial effects that imply dependent (correlated) errors

Review Two-Way Mixed Effects ANOVA


We choose to model the random subject-specific effect instead of including dummy subject covariates in our model because:

  • reduction in the number of parameters to estimate
  • when you do inference, it would make more sense that you can infer from a population (i.e., random effect).


LLM Motivation

In a repeated measurements analysis where \(Y_{ij}\) is the response for the i-th individual measured at the j-th time,

\(i =1,…,N\) ; \(j = 1,…,n_i\)

\[ \mathbf{Y}_i = \left( \begin{array} {c} Y_{i1} \\ . \\ .\\ .\\ Y_{in_i} \end{array} \right) \]

is all measurements for subject i.

Stage 1: (Regression Model) how the response changes over time for the ith subject

\[ \mathbf{Y_i = Z_i \beta_i + \epsilon_i} \]

where

  • \(Z_i\) is an \(n_i \times q\) matrix of known covariates
  • \(\beta_i\) is an unknown q x 1 vector of subjective -specific coefficients (regression coefficients different for each subject)
  • \(\epsilon_i\) are the random errors (typically \(\sim N(0, \sigma^2 I)\))

We notice that there are two many \(\beta\) to estimate here. Hence, this is the motivation for the second stage

Stage 2: (Parameter Model)

\[ \mathbf{\beta_i = K_i \beta + b_i} \]

where

  • \(K_i\) is a q x p matrix of known covariates
  • \(\beta\) is a p x 1 vector of unknown parameter
  • \(\mathbf{b}_i\) are independent \(N(0,D)\) random variables

This model explain the observed variability between subjects with respect to the subject-specific regression coefficients, \(\beta_i\). We model our different coefficient (\(\beta_i\)) with respect to \(\beta\).

Example:

Stage 1:

\[ Y_{ij} = \beta_{1i} + \beta_{2i}t_{ij} + \epsilon_{ij} \]

where

  • \(j = 1,..,n_i\)

In the matrix notation,

\[ \mathbf{Y_i} = \left( \begin{array} {c} Y_{i1} \\ .\\ Y_{in_i} \end{array} \right) \]

\[ \mathbf{Z}_i = \left( \begin{array} {cc} 1 & t_{i1} \\ . & . \\ 1 & t_{in_i} \end{array} \right) \]

\[ \beta_i = \left( \begin{array} {c} \beta_{1i} \\ \beta_{2i} \end{array} \right) \]

\[ \epsilon_i = \left( \begin{array} {c} \epsilon_{i1} \\ . \\ \epsilon_{in_i} \end{array} \right) \]

Thus,

\[ \mathbf{Y_i = Z_i \beta_i + \epsilon_i} \]

Stage 2:

\[ \beta_{1i} = \beta_0 + b_{1i} \\ \beta_{2i} = \beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i} \]

where \(L_i, H_i, C_i\) are indicator variables defined to 1 as the subject falls into different categories.

Subject specific intercepts do not depend upon treatment, with \(\beta_0\) (the average response at the start of treatment), and \(\beta_1 , \beta_2, \beta_3\) (the average time effects for each of three treatment groups).

\[ \mathbf{K}_i = \left( \begin{array} {cccc} 1 & 0 & 0 & 0 \\ 0 & L_i & H_i & C_i \end{array} \right) \\ \beta = (\beta_0 , \beta_1, \beta_2, \beta_3)' \\ \mathbf{b}_i = \left( \begin{array} {c} b_{1i} \\ b_{2i} \\ \end{array} \right) \\ \beta_i = \mathbf{K_i \beta + b_i} \]

To get \(\hat{\beta}\), we can fit the model sequentially:

  1. Estimate \(\hat{\beta_i}\) in the first stage
  2. Estimate \(\hat{\beta}\) in the second stage by replacing \(\beta_i\) with \(\hat{\beta}_i\)

However, problems arise from this method:

  • information is lost by summarizing the vector \(\mathbf{Y}_i\) solely by \(\hat{\beta}_i\)
  • we need to account for variability when replacing \(\beta_i\) with its estimate
  • different subjects might have different number of observations.

To address these problems, we can use Linear Mixed Model (Laird and Ware 1982)

Substituting stage 2 into stage 1:

\[ \mathbf{Y}_i = \mathbf{Z}_i \mathbf{K}_i \beta + \mathbf{Z}_i \mathbf{b}_i + \mathbf{\epsilon}_i \]

Let \(\mathbf{X}_i = \mathbf{Z}_i \mathbf{K}_i\) be an \(n_i \times p\) matrix . Then, the LMM is

\[ \mathbf{Y}_i = \mathbf{X}_i \beta + \mathbf{Z}_i \mathbf{b}_i + \mathbf{\epsilon}_i \]

where

  • \(i = 1,..,N\)
  • \(\beta\) are the fixed effects, which are common to all subjects
  • \(\mathbf{b}_i\) are the subject specific random effects. \(\mathbf{b}_i \sim N_q (\mathbf{0,D})\)
  • \(\mathbf{\epsilon}_i \sim N_{n_i}(\mathbf{0,\Sigma_i})\)
  • \(\mathbf{b}_i\) and \(\epsilon_i\) are independent
  • \(\mathbf{Z}_{i(n_i \times q})\) and \(\mathbf{X}_{i(n_i \times p})\) are matrices of known covariates.

Equivalently, in the hierarchical form, we call conditional or hierarchical formulation of the linear mixed model

\[ \mathbf{Y}_i | \mathbf{b}_i \sim N(\mathbf{X}_i \beta+ \mathbf{Z}_i \mathbf{b}_i, \mathbf{\Sigma}_i) \\ \mathbf{b}_i \sim N(\mathbf{0,D}) \]

for \(i = 1,..,N\). denote the respective functions by \(f(\mathbf{Y}_i |\mathbf{b}_i)\) and \(f(\mathbf{b}_i)\)

In general,

\[ f(A,B) = f(A|B)f(B) \\ f(A) = \int f(A,B)dB = \int f(A|B) f(B) dB \]

In the LMM, the marginal density of \(\mathbf{Y}_i\) is

\[ f(\mathbf{Y}_i) = \int f(\mathbf{Y}_i | \mathbf{b}_i) f(\mathbf{b}_i) d\mathbf{b}_i \]

which can be shown

\[ \mathbf{Y}_i \sim N(\mathbf{X_i \beta, Z_i DZ'_i + \Sigma_i}) \]

This is the marginal formulation of the linear mixed model

Notes:

We no longer have \(Z_i b_i\) in the mean, but add error in the variance (marginal dependence in Y). kinda of averaging out the common effect. Technically, we shouldn’t call it averaging the error b (adding it to the variance covariance matrix), it should be called adding random effect

Continue with our example

\[ Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1L_i + \beta_2 H_i + \beta_3 C_i + b_{2i})t_{ij} + \epsilon_{ij} \]

for each treatment group

\[ Y_{ik}= \begin{cases} \beta_0 + b_{1i} + (\beta_1 + \ b_{2i})t_{ij} + \epsilon_{ij} & L \\ \beta_0 + b_{1i} + (\beta_2 + \ b_{2i})t_{ij} + \epsilon_{ij} & H\\ \beta_0 + b_{1i} + (\beta_3 + \ b_{2i})t_{ij} + \epsilon_{ij} & C \end{cases} \]

  • Intercepts and slopes are all subject specific
  • Different treatment groups have different slops, but the same intercept.

In the hierarchical model form

\[ \mathbf{Y}_i | \mathbf{b}_i \sim N(\mathbf{X}_i \beta + \mathbf{Z}_i \mathbf{b}_i, \mathbf{\Sigma}_i)\\ \mathbf{b}_i \sim N(\mathbf{0,D}) \]

X will be in the form of

\[ \mathbf{X}_i = \mathbf{Z}_i \mathbf{K}_i \\ = \left[ \begin{array} {cc} 1 & t_{i1} \\ 1 & t_{i2} \\ . & . \\ 1 & t_{in_i} \end{array} \right] \times \left[ \begin{array} {cccc} 1 & 0 & 0 & 0 \\ 0 & L_i & H_i & C_i \\ \end{array} \right] \\ = \left[ \begin{array} {cccc} 1 & t_{i1}L_i & t_{i1}H_i & T_{i1}C_i \\ 1 & t_{i2}L_i & t_{i2}H_i & T_{i2}C_i \\ . &. &. &. \\ 1 & t_{in_i}L_i & t_{in_i}H_i & T_{in_i}C_i \\ \end{array} \right] \]

\[ \beta = (\beta_0, \beta_1, \beta_2, \beta_3)' \\ \mathbf{b}_i = \left( \begin{array} {c} b_{1i} \\ b_{2i} \end{array} \right) , D = \left( \begin{array} {cc} d_{11} & d_{12}\\ d_{12} & d_{22} \end{array} \right) \]

Assuming \(\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}_{n_i}\), which is called conditional independence, meaning the response on subject i are independent conditional on \(\mathbf{b}_i\) and \(\beta\)


In the marginal model form

\[ Y_{ij} = \beta_0 + \beta_1 L_i t_{ij} + \beta_2 H_i t_{ij} + \beta_3 C_i t_{ij} + \eta_{ij} \]

where \(\eta_i \sim N(\mathbf{0},\mathbf{Z}_i\mathbf{DZ}_i'+ \mathbf{\Sigma}_i)\)

Equivalently,

\[ \mathbf{Y_i \sim N(X_i \beta, Z_i DZ_i' + \Sigma_i}) \]

In this case that \(n_i = 2\)

\[ \mathbf{Z_iDZ_i'} = \left( \begin{array} {cc} 1 & t_{i1} \\ 1 & t_{i2} \end{array} \right) \left( \begin{array} {cc} d_{11} & d_{12} \\ d_{12} & d_{22} \end{array} \right) \left( \begin{array} {cc} 1 & 1 \\ t_{i1} & t_{i2} \end{array} \right) \\ = \left( \begin{array} {cc} d_{11} + 2d_{12}t_{i1} + d_{22}t_{i1}^2 & d_{11} + d_{12}(t_{i1} + t_{i2}) + d_{22}t_{i1}t_{i2} \\ d_{11} + d_{12}(t_{i1} + t_{i2}) + d_{22} t_{i1} t_{i2} & d_{11} + 2d_{12}t_{i2} + d_{22}t_{i2}^2 \end{array} \right) \]

\[ var(Y_{i1}) = d_{11} + 2d_{12}t_{i1} + d_{22} t_{i1}^2 + \sigma^2 \]

On top of correlation in the errors, the marginal implies that the variance function of the response is quadratic over time, with positive curvature \(d_{22}\)

8.1.1 Random-Intercepts Model

If we remove the random slopes,

  • the assumption is that all variability in subject-specific slopes can be attributed to treatment differences
  • the model is random-intercepts model. This has subject specific intercepts, but the same slopes within each treatment group.

\[ \mathbf{Y}_i | b_i \sim N(\mathbf{X}_i \beta + 1 b_i , \Sigma_i) \\ b_i \sim N(0,d_{11}) \]

The marginal model is then (\(\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}\))

\[ \mathbf{Y}_i \sim N(\mathbf{X}_i \beta, 11'd_{11} + \sigma^2 \mathbf{I}) \]

The marginal covariance matrix is

\[ cov(\mathbf{Y}_i) = 11'd_{11} + \sigma^2I \\ = \left( \begin{array} {cccc} d_{11}+ \sigma^2 & d_{11} & ... & d_{11} \\ d_{11} & d_{11} + \sigma^2 & d_{11} & ... \\ . & . & . & . \\ d_{11} & ... & ... & d_{11} + \sigma^2 \end{array} \right) \]

the associated correlation matrix is

\[ corr(\mathbf{Y}_i) = \left( \begin{array} {cccc} 1 & \rho & ... & \rho \\ \rho & 1 & \rho & ... \\ . & . & . & . \\ \rho & ... & ... & 1 \\ \end{array} \right) \]

where \(\rho \equiv \frac{d_{11}}{d_{11} + \sigma^2}\)

Thu, we have

  • constant variance over time
  • equal, positive correlation between any two measurements from the same subject
  • a covariance structure that is called compound symmetry, and \(\rho\) is called the intra-class correlation
  • that when \(\rho\) is large, the inter-subject variability (\(d_{11}\)) is large relative to the intra-subject variability (\(\sigma^2\))

8.1.2 Covariance Models

If the conditional independence assumption, (\(\mathbf{\Sigma_i= \sigma^2 I_{n_i}}\)). Consider, \(\epsilon_i = \epsilon_{(1)i} + \epsilon_{(2)i}\), where

  • \(\epsilon_{(1)i}\) is a “serial correlation” component. That is, part of the individual’s profile is a response to time-varying stochastic processes.
  • \(\epsilon_{(2)i}\) is the measurement error component, and is independent of \(\epsilon_{(1)i}\)

Then

\[ \mathbf{Y_i = X_i \beta + Z_i b_i + \epsilon_{(1)i} + \epsilon_{(2)i}} \]

where

  • \(\mathbf{b_i} \sim N(\mathbf{0,D})\)

  • \(\epsilon_{(2)i} \sim N(\mathbf{0,\sigma^2 I_{n_i}})\)

  • \(\epsilon_{(1)i} \sim N(\mathbf{0,\tau^2H_i})\)

  • \(\mathbf{b}_i\) and \(\epsilon_i\) are mutually independent

To model the structure of the \(n_i \times n_i\) correlation (or covariance ) matrix \(\mathbf{H}_i\). Let the (j,k)th element of \(\mathbf{H}_i\) be \(h_{ijk}= g(t_{ij}t_{ik})\). that is a function of the times \(t_{ij}\) and \(t_{ik}\) , which is assumed to be some function of the "distance’ between the times.

\[ h_{ijk} = g(|t_{ij}-t_{ik}|) \]

for some decreasing function \(g(.)\) with \(g(0)=1\) (for correlation matrices).

Examples of this type of function:

  • Exponential function: \(g(|t_{ij}-t_{ik}|) = \exp(-\phi|t_{ij} - t_{ik}|)\)
  • Gaussian function: \(g(|t_{ij} - t_{ik}|) = \exp(-\phi(t_{ij} - t_{ik})^2)\)

Similar structures could also be used for \(\mathbf{D}\) matrix (of \(\mathbf{b}\))

Example: Autoregressive Covariance Structure

A first order Autoregressive Model (AR(1)) has the form

\[ \alpha_t = \phi \alpha_{t-1} + \eta_t \]

where \(\eta_t \sim iid N (0,\sigma^2_\eta)\)

Then, the covariance between two observations is

\[ cov(\alpha_t, \alpha_{t+h}) = \frac{\sigma^2_\eta \phi^{|h|}}{1- \phi^2} \]

for \(h = 0, \pm 1, \pm 2, ...; |\phi|<1\)

Hence,

\[ corr(\alpha_t, \alpha_{t+h}) = \phi^{|h|} \]

If we let \(\alpha_T = (\alpha_1,...\alpha_T)'\), then

\[ corr(\alpha_T) = \left[ \begin{array} {ccccc} 1 & \phi^1 & \phi^2 & ... & \phi^2 \\ \phi^1 & 1 & \phi^1 & ... & \phi^{T-1} \\ \phi^2 & \phi^1 & 1 & ... & \phi^{T-2} \\ . & . & . & . &. \\ \phi^T & \phi^{T-1} & \phi^{T-2} & ... & 1 \end{array} \right] \]

Notes:

  • The correlation decreases as time lag increases
  • This matrix structure is known as a Toeplitz structure
  • More complicated covariance structures are possible, which is critical component of spatial random effects models and time series models.
  • Often, we don’t need both random effects \(\mathbf{b}\) and \(\epsilon_{(1)i}\)

More in the Time Series section