## 8.1 Dependent Data

Forms of dependent data:

• Multivariate measurements on different individuals: (e.g., a person’s blood pressure, fat, etc are correlated)
• Clustered measurements: (e.g., blood pressure measurements of people in the same family can be correlated).
• Repeated measurements: (e.g., measurement of cholesterol over time can be correlated) “If data are collected repeatedly on experimental material to which treatments were applied initially, the data is a repeated measure.”
• Longitudinal data: (e.g., individual’s cholesterol tracked over time are correlated): “data collected repeatedly over time in an observational study are termed longitudinal.”
• Spatial data: (e.g., measurement of individuals living in the same neighborhood are correlated)

Hence, we like to account for these correlations.

Linear Mixed Model (LMM), also known as Mixed Linear Model has 2 components:

• Fixed effect (e.g, gender, age, diet, time)

• Random effects representing individual variation or auto correlation/spatial effects that imply dependent (correlated) errors

We choose to model the random subject-specific effect instead of including dummy subject covariates in our model because:

• reduction in the number of parameters to estimate
• when you do inference, it would make more sense that you can infer from a population (i.e., random effect).

LLM Motivation

In a repeated measurements analysis where $$Y_{ij}$$ is the response for the $$i$$-th individual measured at the $$j$$-th time,

$$i =1,…,N$$ ; $$j = 1,…,n_i$$

$\mathbf{Y}_i = \left( \begin{array} {c} Y_{i1} \\ . \\ .\\ .\\ Y_{in_i} \end{array} \right)$

is all measurements for subject $$i$$.

Stage 1: (Regression Model) how the response changes over time for the $$i$$-th subject

$\mathbf{Y_i = Z_i \beta_i + \epsilon_i}$

where

• $$Z_i$$ is an $$n_i \times q$$ matrix of known covariates
• $$\beta_i$$ is an unknown $$q \times 1$$ vector of subjective -specific coefficients (regression coefficients different for each subject)
• $$\epsilon_i$$ are the random errors (typically $$\sim N(0, \sigma^2 I)$$)

We notice that there are two many $$\beta$$ to estimate here. Hence, this is the motivation for the second stage

Stage 2: (Parameter Model)

$\mathbf{\beta_i = K_i \beta + b_i}$

where

• $$K_i$$ is a $$q \times p$$ matrix of known covariates
• $$\beta$$ is a $$p \times 1$$ vector of unknown parameter
• $$\mathbf{b}_i$$ are independent $$N(0,D)$$ random variables

This model explain the observed variability between subjects with respect to the subject-specific regression coefficients, $$\beta_i$$. We model our different coefficient ($$\beta_i$$) with respect to $$\beta$$.

Example:

Stage 1:

$Y_{ij} = \beta_{1i} + \beta_{2i}t_{ij} + \epsilon_{ij}$

where

• $$j = 1,..,n_i$$

In the matrix notation,

$\mathbf{Y_i} = \left( \begin{array} {c} Y_{i1} \\ .\\ Y_{in_i} \end{array} \right); \mathbf{Z}_i = \left( \begin{array} {cc} 1 & t_{i1} \\ . & . \\ 1 & t_{in_i} \end{array} \right)$

$\beta_i = \left( \begin{array} {c} \beta_{1i} \\ \beta_{2i} \end{array} \right); \epsilon_i = \left( \begin{array} {c} \epsilon_{i1} \\ . \\ \epsilon_{in_i} \end{array} \right)$

Thus,

$\mathbf{Y_i = Z_i \beta_i + \epsilon_i}$

Stage 2:

\begin{aligned} \beta_{1i} &= \beta_0 + b_{1i} \\ \beta_{2i} &= \beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i} \end{aligned}

where $$L_i, H_i, C_i$$ are indicator variables defined to 1 as the subject falls into different categories.

Subject specific intercepts do not depend upon treatment, with $$\beta_0$$ (the average response at the start of treatment), and $$\beta_1 , \beta_2, \beta_3$$ (the average time effects for each of three treatment groups).

\begin{aligned} \mathbf{K}_i &= \left( \begin{array} {cccc} 1 & 0 & 0 & 0 \\ 0 & L_i & H_i & C_i \end{array} \right) \\ \beta &= (\beta_0 , \beta_1, \beta_2, \beta_3)' \\ \mathbf{b}_i &= \left( \begin{array} {c} b_{1i} \\ b_{2i} \\ \end{array} \right) \\ \beta_i &= \mathbf{K_i \beta + b_i} \end{aligned}

To get $$\hat{\beta}$$, we can fit the model sequentially:

1. Estimate $$\hat{\beta_i}$$ in the first stage
2. Estimate $$\hat{\beta}$$ in the second stage by replacing $$\beta_i$$ with $$\hat{\beta}_i$$

However, problems arise from this method:

• information is lost by summarizing the vector $$\mathbf{Y}_i$$ solely by $$\hat{\beta}_i$$
• we need to account for variability when replacing $$\beta_i$$ with its estimate
• different subjects might have different number of observations.

To address these problems, we can use Linear Mixed Model

Substituting stage 2 into stage 1:

$\mathbf{Y}_i = \mathbf{Z}_i \mathbf{K}_i \beta + \mathbf{Z}_i \mathbf{b}_i + \mathbf{\epsilon}_i$

Let $$\mathbf{X}_i = \mathbf{Z}_i \mathbf{K}_i$$ be an $$n_i \times p$$ matrix . Then, the LMM is

$\mathbf{Y}_i = \mathbf{X}_i \beta + \mathbf{Z}_i \mathbf{b}_i + \mathbf{\epsilon}_i$

where

• $$i = 1,..,N$$
• $$\beta$$ are the fixed effects, which are common to all subjects
• $$\mathbf{b}_i$$ are the subject specific random effects. $$\mathbf{b}_i \sim N_q (\mathbf{0,D})$$
• $$\mathbf{\epsilon}_i \sim N_{n_i}(\mathbf{0,\Sigma_i})$$
• $$\mathbf{b}_i$$ and $$\epsilon_i$$ are independent
• $$\mathbf{Z}_{i(n_i \times q})$$ and $$\mathbf{X}_{i(n_i \times p})$$ are matrices of known covariates.

Equivalently, in the hierarchical form, we call conditional or hierarchical formulation of the linear mixed model

\begin{aligned} \mathbf{Y}_i | \mathbf{b}_i &\sim N(\mathbf{X}_i \beta+ \mathbf{Z}_i \mathbf{b}_i, \mathbf{\Sigma}_i) \\ \mathbf{b}_i &\sim N(\mathbf{0,D}) \end{aligned}

for $$i = 1,..,N$$. denote the respective functions by $$f(\mathbf{Y}_i |\mathbf{b}_i)$$ and $$f(\mathbf{b}_i)$$

In general,

\begin{aligned} f(A,B) &= f(A|B)f(B) \\ f(A) &= \int f(A,B)dB = \int f(A|B) f(B) dB \end{aligned}

In the LMM, the marginal density of $$\mathbf{Y}_i$$ is

$f(\mathbf{Y}_i) = \int f(\mathbf{Y}_i | \mathbf{b}_i) f(\mathbf{b}_i) d\mathbf{b}_i$

which can be shown

$\mathbf{Y}_i \sim N(\mathbf{X_i \beta, Z_i DZ'_i + \Sigma_i})$

This is the marginal formulation of the linear mixed model

Notes:

We no longer have $$Z_i b_i$$ in the mean, but add error in the variance (marginal dependence in Y). kinda of averaging out the common effect. Technically, we shouldn’t call it averaging the error b (adding it to the variance covariance matrix), it should be called adding random effect

Continue with our example

$Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1L_i + \beta_2 H_i + \beta_3 C_i + b_{2i})t_{ij} + \epsilon_{ij}$

for each treatment group

$Y_{ik}= \begin{cases} \beta_0 + b_{1i} + (\beta_1 + \ b_{2i})t_{ij} + \epsilon_{ij} & L \\ \beta_0 + b_{1i} + (\beta_2 + \ b_{2i})t_{ij} + \epsilon_{ij} & H\\ \beta_0 + b_{1i} + (\beta_3 + \ b_{2i})t_{ij} + \epsilon_{ij} & C \end{cases}$

• Intercepts and slopes are all subject specific
• Different treatment groups have different slops, but the same intercept.

In the hierarchical model form

\begin{aligned} \mathbf{Y}_i | \mathbf{b}_i &\sim N(\mathbf{X}_i \beta + \mathbf{Z}_i \mathbf{b}_i, \mathbf{\Sigma}_i)\\ \mathbf{b}_i &\sim N(\mathbf{0,D}) \end{aligned}

X will be in the form of

$\beta = (\beta_0, \beta_1, \beta_2, \beta_3)'$

\begin{aligned} \mathbf{X}_i &= \mathbf{Z}_i \mathbf{K}_i \\ &= \left[ \begin{array} {cc} 1 & t_{i1} \\ 1 & t_{i2} \\ . & . \\ 1 & t_{in_i} \end{array} \right] \times \left[ \begin{array} {cccc} 1 & 0 & 0 & 0 \\ 0 & L_i & H_i & C_i \\ \end{array} \right] \\ &= \left[ \begin{array} {cccc} 1 & t_{i1}L_i & t_{i1}H_i & T_{i1}C_i \\ 1 & t_{i2}L_i & t_{i2}H_i & T_{i2}C_i \\ . &. &. &. \\ 1 & t_{in_i}L_i & t_{in_i}H_i & T_{in_i}C_i \\ \end{array} \right]\end{aligned}

$\mathbf{b}_i = \left( \begin{array} {c} b_{1i} \\ b_{2i} \end{array} \right)$

$D = \left( \begin{array} {cc} d_{11} & d_{12}\\ d_{12} & d_{22} \end{array} \right)$

Assuming $$\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}_{n_i}$$, which is called conditional independence, meaning the response on subject i are independent conditional on $$\mathbf{b}_i$$ and $$\beta$$

In the marginal model form

$Y_{ij} = \beta_0 + \beta_1 L_i t_{ij} + \beta_2 H_i t_{ij} + \beta_3 C_i t_{ij} + \eta_{ij}$

where $$\eta_i \sim N(\mathbf{0},\mathbf{Z}_i\mathbf{DZ}_i'+ \mathbf{\Sigma}_i)$$

Equivalently,

$\mathbf{Y_i \sim N(X_i \beta, Z_i DZ_i' + \Sigma_i})$

In this case that $$n_i = 2$$

\begin{aligned} \mathbf{Z_iDZ_i'} &= \left( \begin{array} {cc} 1 & t_{i1} \\ 1 & t_{i2} \end{array} \right) \left( \begin{array} {cc} d_{11} & d_{12} \\ d_{12} & d_{22} \end{array} \right) \left( \begin{array} {cc} 1 & 1 \\ t_{i1} & t_{i2} \end{array} \right) \\ &= \left( \begin{array} {cc} d_{11} + 2d_{12}t_{i1} + d_{22}t_{i1}^2 & d_{11} + d_{12}(t_{i1} + t_{i2}) + d_{22}t_{i1}t_{i2} \\ d_{11} + d_{12}(t_{i1} + t_{i2}) + d_{22} t_{i1} t_{i2} & d_{11} + 2d_{12}t_{i2} + d_{22}t_{i2}^2 \end{array} \right) \end{aligned}

$var(Y_{i1}) = d_{11} + 2d_{12}t_{i1} + d_{22} t_{i1}^2 + \sigma^2$

On top of correlation in the errors, the marginal implies that the variance function of the response is quadratic over time, with positive curvature $$d_{22}$$

### 8.1.1 Random-Intercepts Model

If we remove the random slopes,

• the assumption is that all variability in subject-specific slopes can be attributed to treatment differences
• the model is random-intercepts model. This has subject specific intercepts, but the same slopes within each treatment group.

\begin{aligned} \mathbf{Y}_i | b_i &\sim N(\mathbf{X}_i \beta + 1 b_i , \Sigma_i) \\ b_i &\sim N(0,d_{11}) \end{aligned}

The marginal model is then ($$\mathbf{\Sigma}_i = \sigma^2 \mathbf{I}$$)

$\mathbf{Y}_i \sim N(\mathbf{X}_i \beta, 11'd_{11} + \sigma^2 \mathbf{I})$

The marginal covariance matrix is

\begin{aligned} cov(\mathbf{Y}_i) &= 11'd_{11} + \sigma^2I \\ &= \left( \begin{array} {cccc} d_{11}+ \sigma^2 & d_{11} & ... & d_{11} \\ d_{11} & d_{11} + \sigma^2 & d_{11} & ... \\ . & . & . & . \\ d_{11} & ... & ... & d_{11} + \sigma^2 \end{array} \right) \end{aligned}

the associated correlation matrix is

$corr(\mathbf{Y}_i) = \left( \begin{array} {cccc} 1 & \rho & ... & \rho \\ \rho & 1 & \rho & ... \\ . & . & . & . \\ \rho & ... & ... & 1 \\ \end{array} \right)$

where $$\rho \equiv \frac{d_{11}}{d_{11} + \sigma^2}$$

Thu, we have

• constant variance over time
• equal, positive correlation between any two measurements from the same subject
• a covariance structure that is called compound symmetry, and $$\rho$$ is called the intra-class correlation
• that when $$\rho$$ is large, the inter-subject variability ($$d_{11}$$) is large relative to the intra-subject variability ($$\sigma^2$$)

### 8.1.2 Covariance Models

If the conditional independence assumption, ($$\mathbf{\Sigma_i= \sigma^2 I_{n_i}}$$). Consider, $$\epsilon_i = \epsilon_{(1)i} + \epsilon_{(2)i}$$, where

• $$\epsilon_{(1)i}$$ is a “serial correlation” component. That is, part of the individual’s profile is a response to time-varying stochastic processes.
• $$\epsilon_{(2)i}$$ is the measurement error component, and is independent of $$\epsilon_{(1)i}$$

Then

$\mathbf{Y_i = X_i \beta + Z_i b_i + \epsilon_{(1)i} + \epsilon_{(2)i}}$

where

• $$\mathbf{b_i} \sim N(\mathbf{0,D})$$

• $$\epsilon_{(2)i} \sim N(\mathbf{0,\sigma^2 I_{n_i}})$$

• $$\epsilon_{(1)i} \sim N(\mathbf{0,\tau^2H_i})$$

• $$\mathbf{b}_i$$ and $$\epsilon_i$$ are mutually independent

To model the structure of the $$n_i \times n_i$$ correlation (or covariance ) matrix $$\mathbf{H}_i$$. Let the (j,k)th element of $$\mathbf{H}_i$$ be $$h_{ijk}= g(t_{ij}t_{ik})$$. that is a function of the times $$t_{ij}$$ and $$t_{ik}$$ , which is assumed to be some function of the “distance’ between the times.

$h_{ijk} = g(|t_{ij}-t_{ik}|)$

for some decreasing function $$g(.)$$ with $$g(0)=1$$ (for correlation matrices).

Examples of this type of function:

• Exponential function: $$g(|t_{ij}-t_{ik}|) = \exp(-\phi|t_{ij} - t_{ik}|)$$
• Gaussian function: $$g(|t_{ij} - t_{ik}|) = \exp(-\phi(t_{ij} - t_{ik})^2)$$

Similar structures could also be used for $$\mathbf{D}$$ matrix (of $$\mathbf{b}$$)

Example: Autoregressive Covariance Structure

A first order Autoregressive Model (AR(1)) has the form

$\alpha_t = \phi \alpha_{t-1} + \eta_t$

where $$\eta_t \sim iid N (0,\sigma^2_\eta)$$

Then, the covariance between two observations is

$cov(\alpha_t, \alpha_{t+h}) = \frac{\sigma^2_\eta \phi^{|h|}}{1- \phi^2}$

for $$h = 0, \pm 1, \pm 2, ...; |\phi|<1$$

Hence,

$corr(\alpha_t, \alpha_{t+h}) = \phi^{|h|}$

If we let $$\alpha_T = (\alpha_1,...\alpha_T)'$$, then

$corr(\alpha_T) = \left[ \begin{array} {ccccc} 1 & \phi^1 & \phi^2 & ... & \phi^2 \\ \phi^1 & 1 & \phi^1 & ... & \phi^{T-1} \\ \phi^2 & \phi^1 & 1 & ... & \phi^{T-2} \\ . & . & . & . &. \\ \phi^T & \phi^{T-1} & \phi^{T-2} & ... & 1 \end{array} \right]$

Notes:

• The correlation decreases as time lag increases
• This matrix structure is known as a Toeplitz structure
• More complicated covariance structures are possible, which is critical component of spatial random effects models and time series models.
• Often, we don’t need both random effects $$\mathbf{b}$$ and $$\epsilon_{(1)i}$$

More in the Time Series section

### References

Laird, Nan M, and James H Ware. 1982. “Random-Effects Models for Longitudinal Data.” Biometrics, 963–74.
Schabenberger, Oliver, and Francis J Pierce. 2001. Contemporary Statistical Models for the Plant and Soil Sciences. CRC press.