11 경시적 자료를 위한 기본적인 선형 모형

Acknowledgement: 본 절의 구성에는 다음 교재를 발췌하였다.

주 출처

이근백 (2022). 경시적 자료분석 - R 활용. 자유아카데미. - 4장 연속형 경시적 자료분석을 위한 선형모형

보조 출처

김양진 (2020). R과 SAS를 활용한 경시적 자료분석. 자유아카데미. - 3장 반복측정 ANOVA 모형

11.1 자료분석 예제: 납노출 아동 치료 연구 자료

어린이에게 납노출은 인지기능 손상 위험, 예전 집페인트는 납성분 함유
석신산염(succimer): 신규 경구약 (먹는 치료제)
납노출 아동 치료 연구(Treatment of lead-exposed children trial: TLC trial), 1990년
- 대상: 혈중 납농도가 \(20 \sim 44 \mu \mathrm{~g} / \mathrm{dL}\)인 12–33개월령 어린이, 교재에서는 100명
- succimer 치료그룹 / 위약 통제그룹 임의실험 연구
- 반응변수 \(y_{ij}\): 0, 1, 4, 6주차의 혈중 납 수준(\(j=1,2,3,4\))
- \(y_{ij}\): 개체(어린이) \(i\) 의 \(j\) 번째 주기에 속한 혈중 납 수준
연구 목적
- 1. 두 그룹의 혈중 납수준이 차이나는가?
- 1. 두 그룹의 혈중 납수준 차이가 주차가 증가함에 따라 증가하는가?

11.1.1 데이터 전처리와 탐색

데이터 로드

library(psych)
tlc = read.table("data/tlc.txt", header=F)
colnames(tlc) = c("ID", "Group", "week.0", "week.1", "week.4", "week.6")
str(tlc) # wide-format data

'data.frame':   100 obs. of  6 variables:
 $ ID    : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Group : chr  "P" "A" "A" "P" ...
 $ week.0: num  30.8 26.5 25.8 24.7 20.4 20.4 28.6 33.7 19.7 31.1 ...
 $ week.1: num  26.9 14.8 23 24.5 2.8 5.4 20.8 31.6 14.9 31.2 ...
 $ week.4: num  25.8 19.5 19.1 22 3.2 4.5 19.2 28.5 15.3 29.2 ...
 $ week.6: num  23.8 21 23.2 22.5 9.4 11.9 18.4 25.1 14.7 30.1 ...

# convert into long-format data
library(reshape2)
tlcl = reshape(tlc, direction="long",
  varying = list(c("week.0", "week.1", "week.4", "week.6")),
  v.names = "lead", 
  timevar = "week", 
  times = c(0,1,4,6), 
  idvar = "ID")
head(tlcl) # P: placebo, A: action (treatment) applied

    ID Group week lead
1.0  1     P    0 30.8
2.0  2     A    0 26.5
3.0  3     A    0 25.8
4.0  4     P    0 24.7
5.0  5     A    0 20.4
6.0  6     A    0 20.4

Boxplot, spaghetti plots
대조군(P): 점진적 감소
처치군(A): 1주 차에 급격 감소 후 다시 증가, 대조군에 비해 변동폭이 큼

library(ggplot2)


Attaching package: 'ggplot2'

The following objects are masked from 'package:psych':

    %+%, alpha

## LLM prompt example
# 나: 내가 R에서 그림을 하나 그릴 건데 나좀 도와줘. 데이터는 이렇게 생겼어.
# > head(tlcl, 20)
#      ID Group week lead
# 1.0   1     P    0 30.8
# 2.0   2     A    0 26.5
# 3.0   3     A    0 25.8
# 4.0   4     P    0 24.7
# 5.0   5     A    0 20.4
# 6.0   6     A    0 20.4
# 7.0   7     P    0 28.6
# 8.0   8     P    0 33.7
# 9.0   9     P    0 19.7
# 10.0 10     P    0 31.1
# 11.0 11     P    0 19.8
# 12.0 12     A    0 24.8
# 13.0 13     P    0 21.4
# 14.0 14     A    0 27.9
# 15.0 15     P    0 21.1
# 16.0 16     P    0 20.6
# 17.0 17     P    0 24.0
# 18.0 18     P    0 37.6
# 19.0 19     A    0 35.3
#
#
# 나: 그룹, 시점별로 boxplot을 그려줘

# 그룹과 주별 boxplot
ggplot(tlcl, aes(x = factor(week), y = lead, fill = Group)) +
  geom_boxplot() +
  labs(title = "Weekly Boxplot of Lead Values by Group",
       x = "Week",
       y = "Lead") +
  theme_minimal() +
  scale_fill_manual(values = c("P" = "skyblue", "A" = "salmon"))

# LLM prompt example: 좋아. 이번엔 spagetti plot을 그려보자. 전체적인 트렌드 변화가 아니라 개개인별 트렌드의 변화를 보기 위함이야.
# 그룹과 주별 spaghetti plot
ggplot(tlcl, aes(x = week, y = lead, group = ID, color = Group)) +
  geom_line(alpha = 0.5) +
  labs(title = "Individual Lead Value Trends by Week (Spaghetti Plot)",
       x = "Week",
       y = "Lead") +
  theme_minimal() +
  scale_color_manual(values = c("P" = "skyblue", "A" = "salmon"))

# LLM prompt example: 두 그룹에 대해서 panel을 분리해줘
# 그룹별 패널을 분리한 스파게티 플롯
ggplot(tlcl, aes(x = week, y = lead, group = ID)) +
  geom_line(aes(color = Group), alpha = 0.5) +
  labs(title = "Individual Lead Value Trends by Week (Spaghetti Plot)",
       x = "Week",
       y = "Lead") +
  theme_minimal() +
  scale_color_manual(values = c("P" = "skyblue", "A" = "salmon")) +
  facet_wrap(~ Group)

다중 상관관계 탐색
- Insight: 반복 측정의 상관관계를 설명하는 모형이 필요함

library(psych)
pairs.panels(tlc[ ,c("week.0", "week.1", "week.4", "week.6")])

pairs.panels(tlc[tlc$Group=="A" ,c("week.0", "week.1", "week.4", "week.6")])

pairs.panels(tlc[tlc$Group=="P" ,c("week.0", "week.1", "week.4", "week.6")])

11.2 반복측정의 상관관계를 무시하고 선형회귀모형 적합하기

모든 반복측정자료가 독립이라는 가정하에서 회귀분석을 시행해 보자.

모형식
- \(\mathbb{I}({\rm week}_{ij} =1)\), \(=4)\), \(=6)\): 개체 \(i\)의 측정시점 \(j\)가 각각 1주차, 4주차, 6주차인지 여부에 대한 지시함수(indicator function)
- \(A_{i}\): 개체 \(i\)가 실험군이면 1, 대조군이면 0
가정: \(\epsilon_{ij} \stackrel{\rm iid}{\sim} N(0,\sigma^2)\)

\[ \begin{aligned} Y_{i j} &= \beta_{0}+\beta_{1} \mathbb{I}({\rm week}_{ij} =1) +\beta_{2} \mathbb{I}({\rm week}_{ij} =4)+\beta_{3} \mathbb{I}({\rm week}_{ij} = 6)+\beta_{4} A_{i} \\ & ~~ +\beta_{5} A_{i} \mathbb{I}({\rm week}_{ij} =1) +\beta_{6} A_{i} \mathbb{I}({\rm week}_{ij} =4) +\beta_{7} A_{i} \mathbb{I}({\rm week}_{ij} = 6) + \epsilon_{ij} \end{aligned} \tag{1} \]

적합 결과 (*는 유의수준 5%에서 통계적으로 유의함을 뜻함)
- 교호작용 항들이 모두 유의함.
- lm에서 지원하는 모형 진단 코드들 \(\to\) 모형이 잘 설명하지 못하는 이상수치(outlier)가 존재함을 확인할 수 있음

\[ \begin{aligned} \widehat{E}\left(Y_{i j}\right) & = 26.27^{*} - 1.61 \mathbb{I}({\rm week}_{ij} =1) -2.20 \mathbb{I}({\rm week}_{ij} =4) - 2.63^{*} \mathbb{I}({\rm week}_{ij} = 6) + 0.27 A_{i} \\ & ~~-11.41^{*} A_{i} \mathbb{I}({\rm week}_{ij} =1) -8.82^{*} A_{i} \mathbb{I}({\rm week}_{ij} =4) -3.15^{*} A_{i} \mathbb{I}({\rm week}_{ij} = 6) \end{aligned} \]

코드

# just linear regression!

# make referecne category for "P"
tlcl$Group.f =relevel(factor(tlcl$Group), ref="P")

tlc.lm = lm(lead ~ factor(week)*Group.f, data=tlcl)
summary(tlc.lm)


Call:
lm(formula = lead ~ factor(week) * Group.f, data = tlcl)

Residuals:
    Min      1Q  Median      3Q     Max 
-16.662  -4.621  -0.993   3.673  43.138 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)              26.272      0.937  28.038  < 2e-16 ***
factor(week)1            -1.612      1.325  -1.216   0.2245    
factor(week)4            -2.202      1.325  -1.662   0.0974 .  
factor(week)6            -2.626      1.325  -1.982   0.0482 *  
Group.fA                  0.268      1.325   0.202   0.8398    
factor(week)1:Group.fA  -11.406      1.874  -6.086 2.75e-09 ***
factor(week)4:Group.fA   -8.824      1.874  -4.709 3.47e-06 ***
factor(week)6:Group.fA   -3.152      1.874  -1.682   0.0934 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.626 on 392 degrees of freedom
Multiple R-squared:  0.3284,    Adjusted R-squared:  0.3164 
F-statistic: 27.38 on 7 and 392 DF,  p-value: < 2.2e-16

# anova(tlc.lm)
par(mfrow=c(2,2))
plot(tlc.lm)

11.3 반복측정의 상관관계를 고려한 회귀분석 1: 일반 선형 모형 (general linear model) 기반 방법

11.3.1 개요

일반화선형모형 (Generalized linear model)과는 지칭하는 모형 명이 다름에 주의!
식 1과 모형은 동일하나, error term에 대한 가정이 완화된다. 각 개채 \(i\)에 대하여, \(\boldsymbol{\epsilon}_i = (\epsilon_{i1}, \epsilon_{i2}, \ldots, \epsilon_{iK})^T\)라 하면, (\(K\): 최대 관찰 횟수), 일반선형모형은 다음처럼 특수한 구조를 가진 공분산 모형들을 허용한다.
- 교환가능 상관계수(exchangeable correlation) 또는 복합대칭 (compound symmetry): \[ {\rm Var}(\boldsymbol{\epsilon}_i) = \sigma^2 \times \begin{bmatrix} 1 & \rho & \rho & \cdots & \rho \\ \rho & 1 & \rho & \cdots & \rho \\ \vdots & \vdots & & \ddots & \vdots \\ \rho & \rho & & \cdots & 1 \end{bmatrix} \]
- 1차 자기회귀 (first-order autoregressive, or AR(1)) \[ {\rm Var}(\boldsymbol{\epsilon}_i) = \sigma^2 \times \begin{bmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{K-1} \\ \rho & 1 & \rho & \cdots & \rho^{K-2} \\ \vdots & \vdots & & \ddots & \vdots \\ \rho^{K-1} & \rho^{K-2} & & \cdots & 1 \end{bmatrix} \]
- 띠 상관계수 (banded correlation) \[ {\rm Var}(\boldsymbol{\epsilon}_i) = \sigma^2 \times \begin{bmatrix} 1 & \rho & 0 & \cdots & 0 \\ \rho & 1 & \rho & \cdots & \rho \\ \vdots & \vdots & & \ddots & \vdots \\ 0 & 0 & & \cdots & 1 \end{bmatrix} \]
모수 추정 방법: 최대가능도추정법(maximum likelihood, ML) 또는 제한된 최대가능도추정법(restricted maximum likelihood, REML)이 있다.
- 두 추정법은 sample size가 무한히 커지면 서로 차이가 없게 되나, finite sample에서는 다르다.
- REML 추정법은 ML추정량이 가지는 편향(bias)를 보정하는 이점이 있다.

11.3.2 적합 결과

ML 추정에 의한 계수
- 앞 결과와 비슷하나 세부적으로는 차이가 있음, 차이점은 \(\mathbb{I}({\rm week}_{ij} =4)\) 항이 유의해짐 \[ \begin{aligned} \widehat{E}\left(Y_{i j}\right) & = 26.27^{*} - 1.61 \mathbb{I}({\rm week}_{ij} =1) - 2.20^{*} \mathbb{I}({\rm week}_{ij} =4) - 2.63^{*} \mathbb{I}({\rm week}_{ij} = 6) + 0.27 A_{i} \\ & ~~-11.41^{*} A_{i} \mathbb{I}({\rm week}_{ij} =1) -8.82^{*} A_{i} \mathbb{I}({\rm week}_{ij} =4) -3.15^{*} A_{i} \mathbb{I}({\rm week}_{ij} = 6) \end{aligned} \]
두 군에 대하여 모형식을 따로 보면 아래와 같음. 1, 4, 6주차에 실험군에서 혈중 납 수준이 더 낮고, 그 차이는 유의함.
- 대조군: \(\hat{y}_{i j} = 26.27-1.61 \mathbb{I}({\rm week}_{ij} =1)-2.20 \mathbb{I}({\rm week}_{ij} =4)-2.63 \mathbb{I}({\rm week}_{ij} =6)\)
- 실험군: \(\hat{y}_{i j} = 26.54-13.02 \mathbb{I}({\rm week}_{ij} =1)-11.02 \mathbb{I}({\rm week}_{ij} =4)-5.78 \mathbb{I}({\rm week}_{ij} =6)\)

#General linear model using ML & REML
library(nlme)

# sorting data by ID and week
o <- order(tlcl$Group, tlcl$ID, tlcl$week)
tlcl <- tlcl[o,]

# ML estimation
tlc.ml = gls(lead ~ factor(week)*Group.f, data=tlcl, method="ML",
    correlation = corCompSymm(form = ~ 1 | ID))
# "~ 1": 관측치들 사이의 등분산을 가정함
# "|ID": 반복 측정된 데이터의 그룹화 변수가 ID임을 알려줌
summary(tlc.ml)

Generalized least squares fit by maximum likelihood
  Model: lead ~ factor(week) * Group.f 
  Data: tlcl 
       AIC      BIC    logLik
  2490.822 2530.736 -1235.411

Correlation Structure: Compound symmetry
 Formula: ~1 | ID 
 Parameter estimate(s):
      Rho 
0.5954401 

Coefficients:
                         Value Std.Error   t-value p-value
(Intercept)             26.272 0.9370175 28.037898  0.0000
factor(week)1           -1.612 0.8428574 -1.912542  0.0565
factor(week)4           -2.202 0.8428574 -2.612542  0.0093
factor(week)6           -2.626 0.8428574 -3.115592  0.0020
Group.fA                 0.268 1.3251428  0.202242  0.8398
factor(week)1:Group.fA -11.406 1.1919804 -9.568950  0.0000
factor(week)4:Group.fA  -8.824 1.1919804 -7.402807  0.0000
factor(week)6:Group.fA  -3.152 1.1919804 -2.644339  0.0085

 Correlation: 
                       (Intr) fct()1 fct()4 fct()6 Grp.fA f()1:G f()4:G
factor(week)1          -0.450                                          
factor(week)4          -0.450  0.500                                   
factor(week)6          -0.450  0.500  0.500                            
Group.fA               -0.707  0.318  0.318  0.318                     
factor(week)1:Group.fA  0.318 -0.707 -0.354 -0.354 -0.450              
factor(week)4:Group.fA  0.318 -0.354 -0.707 -0.354 -0.450  0.500       
factor(week)6:Group.fA  0.318 -0.354 -0.354 -0.707 -0.450  0.500  0.500

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.5402789 -0.7044388 -0.1513922  0.5599072  6.5767945 

Residual standard error: 6.559122 
Degrees of freedom: 400 total; 392 residual

잔차 분석

# model diagnosis

# 목적: 각 ID별로 잔차를 시각화하여 특정 ID에서 비정상적으로 큰 잔차가 있는지 확인
# resid(.)~ID: 잔차를 ID별로 플로팅,  특정 관측치가 모델과 큰 차이를 보이는지 확인 가능
# id=0.01: 잔차가 상위 1%에 속하는 이상치에 ID를 표시
# 어떤 ID가 큰 잔차를 가지면, 그 데이터가 모델에 잘 맞지 않거나 비정상적인 패턴을 가질 가능성이 있음.
plot(tlc.ml, resid(.)~ID, id=0.01)

# 목적: week별로 잔차를 시각화하여 시간에 따른 잔차의 변화를 Group별로 확인합니다. 이는 시간에 따라 잔차가 특정 방향으로 편향되는지 확인하는 데 유용합니다.
# 설명:
# resid(.)~week | Group: Group별로 week에 따른 잔차를 플로팅합니다.
# abline=0: 기준선(y = 0)을 추가하여 잔차의 편향을 시각화합니다.
# id=~ID==40: ID가 40인 관측치의 잔차를 강조 표시합니다. 특정 ID의 잔차가 어떻게 분포되는지 확인할 수 있습니다.
# 해석: 잔차가 특정 week에서 일정한 패턴이나 추세를 보인다면, 이는 모델이 시간에 따라 데이터의 패턴을 제대로 반영하지 못하고 있음을 나타낼 수 있습니다.
plot(tlc.ml,resid(.)~week | Group, abline=0, id=~ID==40)

# 목적: 예측된 값(fitted(.))에 대한 잔차를 플로팅하여 Group별로 적합도 검토 및 패턴을 확인합니다.
# 설명:
# resid(.,type="p")~fitted(.) | Group: 예측된 값에 대한 피어슨 잔차(Pearson residuals)를 Group별로 플로팅합니다.
# id=~ID==40: ID가 40인 관측치의 잔차를 강조 표시합니다.
# 해석: 이상적인 경우 잔차는 예측된 값과 무관하게 무작위로 분포해야 합니다. 잔차가 특정 패턴을 보이면, 모델에 비선형성이나 오차의 이분산성(heteroscedasticity) 등의 문제가 있을 수 있습니다.
plot(tlc.ml,resid(.,type="p")~fitted(.) | Group, id=~ID==40)

REML 추정에 의한 계수 \[ \begin{aligned} \widehat{E}\left(Y_{i j}\right) & = 26.27^{*} - 1.61 \mathbb{I}({\rm week}_{ij} =1) -2.20 \mathbb{I}({\rm week}_{ij} =4) - 2.63^{*} \mathbb{I}({\rm week}_{ij} = 6) + 0.27 A_{i} \\ & ~~-11.41^{*} A_{i} \mathbb{I}({\rm week}_{ij} =1) -8.82^{*} A_{i} \mathbb{I}({\rm week}_{ij} =4) -3.15^{*} A_{i} \mathbb{I}({\rm week}_{ij} = 6) \end{aligned} \]

# REML
tlc.reml <- gls(lead ~ factor(week)*Group.f, data=tlcl, method="REML",
    correlation=corCompSymm(form = ~1|ID))
summary(tlc.reml)

Generalized least squares fit by REML
  Model: lead ~ factor(week) * Group.f 
  Data: tlcl 
       AIC      BIC    logLik
  2480.621 2520.334 -1230.311

Correlation Structure: Compound symmetry
 Formula: ~1 | ID 
 Parameter estimate(s):
      Rho 
0.5954401 

Coefficients:
                         Value Std.Error   t-value p-value
(Intercept)             26.272 0.9370175 28.037898  0.0000
factor(week)1           -1.612 0.8428574 -1.912542  0.0565
factor(week)4           -2.202 0.8428574 -2.612542  0.0093
factor(week)6           -2.626 0.8428574 -3.115592  0.0020
Group.fA                 0.268 1.3251428  0.202242  0.8398
factor(week)1:Group.fA -11.406 1.1919804 -9.568950  0.0000
factor(week)4:Group.fA  -8.824 1.1919804 -7.402807  0.0000
factor(week)6:Group.fA  -3.152 1.1919804 -2.644339  0.0085

 Correlation: 
                       (Intr) fct()1 fct()4 fct()6 Grp.fA f()1:G f()4:G
factor(week)1          -0.450                                          
factor(week)4          -0.450  0.500                                   
factor(week)6          -0.450  0.500  0.500                            
Group.fA               -0.707  0.318  0.318  0.318                     
factor(week)1:Group.fA  0.318 -0.707 -0.354 -0.354 -0.450              
factor(week)4:Group.fA  0.318 -0.354 -0.707 -0.354 -0.450  0.500       
factor(week)6:Group.fA  0.318 -0.354 -0.354 -0.707 -0.450  0.500  0.500

Standardized residuals:
       Min         Q1        Med         Q3        Max 
-2.5147478 -0.6973588 -0.1498706  0.5542799  6.5106944 

Residual standard error: 6.625714 
Degrees of freedom: 400 total; 392 residual

11.4 반복측정의 상관관계를 고려한 회귀분석 2: 일반화 추정 방정식 (generalized estimating equation, GEE) 기반 방법

11.4.1 개요

위의 방법 일반 선형 모형 + (RE)ML 방법은 공분산 행렬에 대한 가정이 올바르다는 전제 하에서 모수를 추정한다. 만약 가정이 성립하지 않으면, \(\widehat{\beta}\)의 표준오차 계산이 정확하지 않을 수 있다. (즉 유의한/비유의한 변수를 잘못 검출할 수 있다.)
대신 일반화 추정 방정식 (generalized estimating equation, GEE) 방법에서 공분산 행렬에 대한 “샌드위치 강건 공분산”(sandwich robust variance) 추정량을 이용하면, 공분산 모형 가정에 강건한(robust) 표준오차 계산이 가능하다.
주의점은, \(\widehat{\beta}\)의 계산 과정에서 가상관행렬(working correlation matrix)의 지정을 필요로 한다. 어떠한 가상관행렬 구조도 상관없으나, 가상관행렬이 error의 true 상관행렬 \({\rm Var}(\boldsymbol{\epsilon}_i)\)와 가까울수록 결과적인 \(\widehat{\beta}\)의 분산이 줄어든다. (좋아진다는 뜻이다) 대표적인 옵션은 아래 세 개이다.
- Independence (단위행렬)
- Exchangeable (교환가능 상관계수)
- Unstructured (표본 공분산)

11.4.2 적합 결과

추정 계수 \[ \begin{aligned} \widehat{E}\left(Y_{i j}\right) & = 26.27^{*} - 1.61^{*} \mathbb{I}({\rm week}_{ij} =1) -2.20^{*} \mathbb{I}({\rm week}_{ij} =4) - 2.63^{*} \mathbb{I}({\rm week}_{ij} = 6) + 0.27 A_{i} \\ & ~~-11.41^{*} A_{i} \mathbb{I}({\rm week}_{ij} =1) -8.82^{*} A_{i} \mathbb{I}({\rm week}_{ij} =4) -3.15^{*} A_{i} \mathbb{I}({\rm week}_{ij} = 6) \end{aligned} \]
세부 적합 결과
- Naive S.E.: 모형기반 표준오차(model-based S.E.)
  - GEE 가상관행렬 옵션에서 ’independece’을 선택 하여 나온 결과의 Naive S.E.는 맨 위의 개체내 상관관계를 무시한 회귀분석에서 나온 S.E.와 동일함 (정확히 같다. 왜냐면 가정과 추정방법이 동일해지기 때문)
  - ‘exchangeable’을 선택하여 나온 결과의 Naive S.E.는 위의 일반선형 모형에서 ’corCompSymm’ 옵션을 지정한 최대가능도 방법 결과와 동일함 (역시 가정과 추정방법이 동일해지기 때문에 정확히 같은 결과이다)
- Robust S.E.는 샌드위치 추정량을 이용한 강전한 표준오차를 의미. 가상관행렬 옵션지정에 상관없이 결과는 동일함.
- Naive S.E.의 값보다는 Robust S.E.의 값이 대체로 작음. 즉 더 좁은 신뢰구간을 얻을 수 있음 (통계적으로 효율적efficient임)

# General linear model using GEE
library(gee)
# independence working correlation
tlcl.gee.indep = gee(lead ~ factor(week)*Group.f, id=ID,
  corstr="independence", data=tlcl)

Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27

running glm to get initial regression estimate

           (Intercept)          factor(week)1          factor(week)4 
                26.272                 -1.612                 -2.202 
         factor(week)6               Group.fA factor(week)1:Group.fA 
                -2.626                  0.268                -11.406 
factor(week)4:Group.fA factor(week)6:Group.fA 
                -8.824                 -3.152

summary(tlcl.gee.indep)


 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998) 

Model:
 Link:                      Identity 
 Variance to Mean Relation: Gaussian 
 Correlation Structure:     Independent 

Call:
gee(formula = lead ~ factor(week) * Group.f, id = ID, data = tlcl, 
    corstr = "independence")

Summary of Residuals:
     Min       1Q   Median       3Q      Max 
-16.6620  -4.6205  -0.9930   3.6725  43.1380 


Coefficients:
                       Estimate Naive S.E.    Naive z Robust S.E.    Robust z
(Intercept)              26.272  0.9370175 28.0378980   0.7033749  37.3513444
factor(week)1            -1.612  1.3251428 -1.2164727   0.4330325  -3.7225846
factor(week)4            -2.202  1.3251428 -1.6617077   0.4386752  -5.0196593
factor(week)6            -2.626  1.3251428 -1.9816732   0.5278091  -4.9752834
Group.fA                  0.268  1.3251428  0.2022424   0.9944085   0.2695069
factor(week)1:Group.fA  -11.406  1.8740349 -6.0863327   1.1086833 -10.2878794
factor(week)4:Group.fA   -8.824  1.8740349 -4.7085569   1.1408849  -7.7343471
factor(week)6:Group.fA   -3.152  1.8740349 -1.6819324   1.2439296  -2.5339055

Estimated Scale Parameter:  43.90009
Number of Iterations:  1

Working Correlation
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    0
[2,]    0    1    0    0
[3,]    0    0    1    0
[4,]    0    0    0    1

#exchangeable working correlation
tlcl.gee.exchange = gee(lead ~ factor (week)*Group.f, id=ID,
  corstr="exchangeable", data=tlcl)

Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate

           (Intercept)          factor(week)1          factor(week)4 
                26.272                 -1.612                 -2.202 
         factor(week)6               Group.fA factor(week)1:Group.fA 
                -2.626                  0.268                -11.406 
factor(week)4:Group.fA factor(week)6:Group.fA 
                -8.824                 -3.152

summary(tlcl.gee.exchange)


 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998) 

Model:
 Link:                      Identity 
 Variance to Mean Relation: Gaussian 
 Correlation Structure:     Exchangeable 

Call:
gee(formula = lead ~ factor(week) * Group.f, id = ID, data = tlcl, 
    corstr = "exchangeable")

Summary of Residuals:
     Min       1Q   Median       3Q      Max 
-16.6620  -4.6205  -0.9930   3.6725  43.1380 


Coefficients:
                       Estimate Naive S.E.    Naive z Robust S.E.    Robust z
(Intercept)              26.272  0.9370175 28.0378980   0.7033749  37.3513444
factor(week)1            -1.612  0.8470380 -1.9031023   0.4330325  -3.7225846
factor(week)4            -2.202  0.8470380 -2.5996472   0.4386752  -5.0196593
factor(week)6            -2.626  0.8470380 -3.1002150   0.5278091  -4.9752834
Group.fA                  0.268  1.3251428  0.2022424   0.9944085   0.2695069
factor(week)1:Group.fA  -11.406  1.1978927 -9.5217212   1.1086833 -10.2878794
factor(week)4:Group.fA   -8.824  1.1978927 -7.3662693   1.1408849  -7.7343471
factor(week)6:Group.fA   -3.152  1.1978927 -2.6312875   1.2439296  -2.5339055

Estimated Scale Parameter:  43.90009
Number of Iterations:  1

Working Correlation
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.5914168 0.5914168 0.5914168
[2,] 0.5914168 1.0000000 0.5914168 0.5914168
[3,] 0.5914168 0.5914168 1.0000000 0.5914168
[4,] 0.5914168 0.5914168 0.5914168 1.0000000

# unstructured working correlation
tlcl.gee.unstruct = gee(lead ~ factor(week)*Group.f,,id=ID,
  corstr="unstructured", data=tlcl)

Beginning Cgee S-function, @(#) geeformula.q 4.13 98/01/27
running glm to get initial regression estimate

           (Intercept)          factor(week)1          factor(week)4 
                26.272                 -1.612                 -2.202 
         factor(week)6               Group.fA factor(week)1:Group.fA 
                -2.626                  0.268                -11.406 
factor(week)4:Group.fA factor(week)6:Group.fA 
                -8.824                 -3.152

summary(tlcl.gee.unstruct)


 GEE:  GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
 gee S-function, version 4.13 modified 98/01/27 (1998) 

Model:
 Link:                      Identity 
 Variance to Mean Relation: Gaussian 
 Correlation Structure:     Unstructured 

Call:
gee(formula = lead ~ factor(week) * Group.f, id = ID, data = tlcl, 
    corstr = "unstructured")

Summary of Residuals:
     Min       1Q   Median       3Q      Max 
-16.6620  -4.6205  -0.9930   3.6725  43.1380 


Coefficients:
                       Estimate Naive S.E.    Naive z Robust S.E.    Robust z
(Intercept)              26.272  0.9370175 28.0378980   0.7033749  37.3513444
factor(week)1            -1.612  0.9958441 -1.6187273   0.4330325  -3.7225846
factor(week)4            -2.202  0.9838820 -2.2380732   0.4386752  -5.0196593
factor(week)6            -2.626  0.9316319 -2.8187099   0.5278091  -4.9752834
Group.fA                  0.268  1.3251428  0.2022424   0.9944085   0.2695069
factor(week)1:Group.fA  -11.406  1.4083362 -8.0989182   1.1086833 -10.2878794
factor(week)4:Group.fA   -8.824  1.3914193 -6.3417260   1.1408849  -7.7343471
factor(week)6:Group.fA   -3.152  1.3175265 -2.3923618   1.2439296  -2.5339055

Estimated Scale Parameter:  43.90009
Number of Iterations:  1

Working Correlation
          [,1]      [,2]      [,3]      [,4]
[1,] 1.0000000 0.4352486 0.4487346 0.5057311
[2,] 0.4352486 1.0000000 0.8094551 0.6759677
[3,] 0.4487346 0.8094551 1.0000000 0.6975035
[4,] 0.5057311 0.6759677 0.6975035 1.0000000