Generalized Estimating Equations (GEE) extend regression models to handle correlated response data while still targeting population-level associations. GEE does not require a full likelihood specification; instead, it solves estimating equations built around a working correlation structure, which improves efficiency when that structure is roughly right (and still yields consistent estimates of \(\beta\) when it is not). Both WLS and GLS can be seen as special cases of this framework: GEE extends their ideas beyond linear models to correlated, non-Gaussian outcomes through a quasi-likelihood approach. Typical settings include:
- Longitudinal data with repeated measures
- Clustered outcomes (e.g., patients within hospitals, students within schools)
- Multilevel or correlated designs
- Non-normal outcomes (e.g., binary or count responses)
For example, suppose we have repeated binary outcomes \(y_{it}\) for subject \(i\) at time \(t\):
\[
\text{logit}\{\Pr(y_{it}=1)\} = \mathbf{x}_{it}^\top \beta
\]
If we fit this with standard logistic regression, we would incorrectly assume independence across time within each subject. GEE addresses this by introducing a working covariance matrix \(\mathbf{V}_i\) for subject \(i\) that accounts for correlation between repeated measures:
\[
\mathbf{V}_i = \mathbf{A}_i^{1/2}\,\mathbf{R}(\alpha)\,\mathbf{A}_i^{1/2},
\]
where \(\mathbf{A}_i\) is a diagonal variance matrix (from the mean model) and \(\mathbf{R}(\alpha)\) is a correlation structure (e.g., exchangeable, AR(1)).
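To make this concrete, here is a minimal NumPy sketch (the fitted means, the value of \(\alpha\), and the use of the binomial variance function are illustrative assumptions) that assembles \(\mathbf{V}_i\) under an exchangeable working correlation:

```python
import numpy as np

def exchangeable_corr(n, alpha):
    """Exchangeable working correlation R(alpha): 1 on the diagonal, alpha elsewhere."""
    R = np.full((n, n), alpha)
    np.fill_diagonal(R, 1.0)
    return R

# Hypothetical fitted means for one subject with 4 repeated binary measurements
mu_i = np.array([0.2, 0.35, 0.5, 0.6])

# A_i^{1/2} is diagonal; v(mu) = mu(1 - mu) is the binomial variance function
A_half = np.diag(np.sqrt(mu_i * (1 - mu_i)))

V_i = A_half @ exchangeable_corr(len(mu_i), alpha=0.3) @ A_half
print(V_i)
```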
We are interested in learning the average effect of covariates on the outcome across the population, while accounting for within-subject correlation. GEE provides a way to solve for \(\beta\) using the estimating equation:
\[
U(\beta) = \sum_i D_i^\top V_i^{-1}(y_i - \mu_i) = 0,
\]
where \(D_i = \partial \mu_i/\partial \beta^\top\).
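In practice one rarely solves \(U(\beta)=0\) by hand. A minimal sketch using the GEE implementation in statsmodels, fit to simulated data (the data-generating process, variable names, and parameter values here are illustrative assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate 200 subjects with 4 repeated binary measurements each;
# a shared subject-level effect induces within-subject correlation.
n_subj, n_time = 200, 4
subject = np.repeat(np.arange(n_subj), n_time)
x = rng.normal(size=n_subj * n_time)
b = np.repeat(rng.normal(scale=0.8, size=n_subj), n_time)  # subject effect
p = 1 / (1 + np.exp(-(-0.5 + 1.0 * x + b)))
y = rng.binomial(1, p)
df = pd.DataFrame({"y": y, "x": x, "subject": subject})

# Marginal logistic model with an exchangeable working correlation
model = smf.gee("y ~ x", groups="subject", data=df,
                family=sm.families.Binomial(),
                cov_struct=sm.cov_struct.Exchangeable())
result = model.fit()
print(result.summary())
```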
A Motivating Toy Example
Suppose we collect three measurements, two from the same subject and one from a different subject. Because two measurements come from the same subject, they might be correlated, while the third is independent. How should we combine these observations to estimate the overall population mean?
Consider data \(y_1, y_2, y_3\). Mathematically, we can represent this with a covariance matrix in which \(y_1\) and \(y_2\) are correlated, while \(y_3\) is independent:
\[
\begin{pmatrix}
y_1 \\
y_2 \\
y_3
\end{pmatrix} \sim N \left(\begin{pmatrix}
\mu\\
\mu \\
\mu\end{pmatrix},\;
\Sigma = \begin{bmatrix}
\sigma^2 & R & 0\\
R & \sigma^2 & 0 \\
0 & 0 & \sigma^2 \end{bmatrix}
\right)
\]
Things to note: in this example, \(y_1\) and \(y_2\) come from the same cluster and may be correlated (\(R \neq 0\)).
If we simply take the average of the three values, \(\hat{\mu} = \frac{1}{3}(y_1 + y_2 + y_3)\), we still get an unbiased estimate of the population mean. But when \(y_1\) and \(y_2\) are correlated, this average does not make the best use of the data: the variance of the estimator depends on the correlation \(R\). The above estimator has variance
\[
\left[\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3}\right]\Sigma\left[\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3}\right]^\top = \frac{3\sigma^2 + 2R}{9},
\]
so as \(R\) increases, the variance of \(\hat{\mu}\) increases.
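A quick simulation check of both the covariance structure and this variance formula (a minimal NumPy sketch; the values of \(\mu\), \(\sigma^2\), and \(R\) are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, R = 1.0, 2.0, 1.5   # arbitrary illustrative values

Sigma = np.array([[sigma2, R,      0.0],
                  [R,      sigma2, 0.0],
                  [0.0,    0.0,    sigma2]])

# Draw many replicates of (y1, y2, y3) and average each triple
y = rng.multivariate_normal(mean=[mu] * 3, cov=Sigma, size=200_000)
mu_hat = y.mean(axis=1)

print(mu_hat.var())               # Monte Carlo variance of the simple average
print((3 * sigma2 + 2 * R) / 9)   # theoretical (3*sigma^2 + 2R)/9
```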
In a general context, we wish to find weights \(\mathbf{w} = (w_1, w_2, w_3)\) that satisfy these conditions:
- \(\sum_k w_k = 1\) (so that \(\hat{\mu}\) is unbiased);
- \(w_k > 0\) for all \(k\);
- the variance of the estimator is as small as possible:
\[
\begin{align*}
\operatorname{Var}(\hat{\mu}) &= [w_1, w_2, w_3]\,\Sigma\,[w_1, w_2, w_3]^\top\\
& = [w_1, w_2, w_3]
\begin{bmatrix}
\sigma^2 & R & 0\\
R & \sigma^2 & 0\\
0 & 0 & \sigma^2 \end{bmatrix}
[w_1, w_2, w_3]^\top\\
& = [w_1 \sigma^2 + w_2 R,\; w_1 R + w_2 \sigma^2,\; w_3 \sigma^2] \cdot [w_1, w_2, w_3]^\top\\
& = (w_1^2 + w_2^2 + w_3^2)\sigma^2 + 2 w_1 w_2 R
\end{align*}
\]
When \(R = 0\), the variance is minimized by \(w_1 = w_2 = w_3 = \tfrac{1}{3}\) (the simple sample average), but this is no longer optimal when \(R \neq 0\). So let’s look at some other options:
- Simple average:
\[
\mathbf{w} = \left[\frac{1}{3},\frac{1}{3}, \frac{1}{3}\right]
\]
- Use only one of \(y_1\) or \(y_2\):
\[
\mathbf{w} = \left[\frac{1}{2},0, \frac{1}{2}\right]
\]
- Down-weight the correlated pair:
\[
\mathbf{w} = \left[\frac{1}{4},\frac{1}{4}, \frac{1}{2}\right]
\]
- The “secret” optimal weights, assuming \(R\) and \(\sigma^2\) are known (derived numerically in the sketch after this list):
\[
\mathbf{w} = \left( \frac{1}{3 + R/\sigma^2},\frac{1}{3 + R/\sigma^2}, \frac{1 + R/\sigma^2}{3 + R/\sigma^2}\right)
\]
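Where do these weights come from? A minimal sketch using scipy (the values of \(R\) and \(\sigma^2\) are illustrative) minimizes \(\operatorname{Var}(\hat\mu)\) subject to \(\sum_k w_k = 1\) and recovers the closed form above:

```python
import numpy as np
from scipy.optimize import minimize

sigma2, R = 2.0, 1.5  # illustrative values

def var_mu(w):
    # (w1^2 + w2^2 + w3^2) * sigma^2 + 2 * w1 * w2 * R
    return np.sum(w**2) * sigma2 + 2 * w[0] * w[1] * R

constraint = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
res = minimize(var_mu, x0=np.ones(3) / 3, constraints=[constraint])

print(res.x)  # numerical optimum
d = 3 + R / sigma2
print(np.array([1 / d, 1 / d, (1 + R / sigma2) / d]))  # closed form
```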
Toy example: efficiency under different weighting schemes.

| Weights \(\mathbf{w}\) | \(\operatorname{Var}(\hat{\mu})\) | \(R=0\) | \(R=\sigma^2/2\) | \(R=\sigma^2\) |
|---|---|---|---|---|
| \(\left[\tfrac{1}{3},\tfrac{1}{3},\tfrac{1}{3}\right]\) | \(\tfrac{3\sigma^2+2R}{9}\) | \(\tfrac{1}{3}\sigma^2\) | \(\tfrac{4}{9}\sigma^2\) | \(\tfrac{5}{9}\sigma^2\) |
| \(\left[\tfrac{1}{2},0,\tfrac{1}{2}\right]\) | \(\tfrac{1}{2}\sigma^2\) | \(\tfrac{1}{2}\sigma^2\) | \(\tfrac{1}{2}\sigma^2\) | \(\tfrac{1}{2}\sigma^2\) |
| \(\left[\tfrac{1}{4},\tfrac{1}{4},\tfrac{1}{2}\right]\) | \(\tfrac{3}{8}\sigma^2+\tfrac{1}{8}R\) | \(\tfrac{3}{8}\sigma^2\) | \(\tfrac{7}{16}\sigma^2\) | \(\tfrac{1}{2}\sigma^2\) |
| \(\left[\tfrac{1}{3+R/\sigma^2},\tfrac{1}{3+R/\sigma^2},\tfrac{1+R/\sigma^2}{3+R/\sigma^2}\right]\) | \(\tfrac{1+R/\sigma^2}{3+R/\sigma^2}\,\sigma^2\) | \(\tfrac{1}{3}\sigma^2\) | \(\tfrac{3}{7}\sigma^2\) | \(\tfrac{1}{2}\sigma^2\) |
Notes (each table entry is checked numerically in the sketch below).
- The optimal weights (last row) come from minimizing \(\operatorname{Var}(\hat\mu)\) subject to \(\sum_k w_k=1\), which yields \(w_1=w_2=\sigma^2/(R+3\sigma^2)\), \(w_3=(R+\sigma^2)/(R+3\sigma^2)\), and \(\operatorname{Var}(\hat\mu)=\sigma^2(R+\sigma^2)/(R+3\sigma^2)\).
- The optimal weights have the smallest variance over the complete range of \(R\) values.
- The optimal weights coincide with the simple average (row 1) when \(R=0\) (independence).
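A short NumPy check of the table (a sketch; variances are reported in units of \(\sigma^2\) by setting \(\sigma^2 = 1\)) computes \(\mathbf{w}^\top \Sigma\, \mathbf{w}\) for each scheme at the three tabulated values of \(R\):

```python
import numpy as np

sigma2 = 1.0  # results below are in units of sigma^2

def var_mu(w, R):
    Sigma = np.array([[sigma2, R,      0.0],
                      [R,      sigma2, 0.0],
                      [0.0,    0.0,    sigma2]])
    return w @ Sigma @ w

def optimal_w(R):
    d = 3 + R / sigma2
    return np.array([1 / d, 1 / d, (1 + R / sigma2) / d])

for R in [0.0, sigma2 / 2, sigma2]:
    schemes = {
        "simple average": np.array([1/3, 1/3, 1/3]),
        "drop y2":        np.array([1/2, 0.0, 1/2]),
        "down-weight":    np.array([1/4, 1/4, 1/2]),
        "optimal":        optimal_w(R),
    }
    print(f"R = {R}:", {k: round(var_mu(w, R), 4) for k, w in schemes.items()})
```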
Estimating Equations for \(\hat{\mu}\)
In the toy example with three observations \(y_1, y_2, y_3\), we define an estimator of the population mean as a weighted average:
\[
\hat{\mu} = \sum_{k=1}^3 w_k y_k, \qquad \sum_{k=1}^3 w_k = 1.
\]
Each estimator arises as the solution to an estimating equation of the form
\[
U(\mu) = \sum_{k=1}^3 w_k (y_k - \mu) = 0.
\]
Solving \(U(\mu)=0\) gives \(\hat{\mu} = \sum w_k y_k\).
The difference between estimators lies in the choice of weights \(w_k\).
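To make the estimating-equation view concrete, a minimal sketch (the toy data values and weights are arbitrary) solves \(U(\mu)=0\) with a root-finder and confirms the solution matches the weighted average:

```python
import numpy as np
from scipy.optimize import brentq

y = np.array([1.2, 1.8, 0.9])    # toy observations y1, y2, y3
w = np.array([0.25, 0.25, 0.5])  # any weights with sum(w) == 1

def U(mu):
    # Estimating function U(mu) = sum_k w_k * (y_k - mu)
    return np.sum(w * (y - mu))

# U is decreasing in mu, so it has a single root inside the bracket
mu_hat = brentq(U, y.min() - 1, y.max() + 1)
print(mu_hat, np.sum(w * y))     # both equal the weighted average
```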