24.6 Single Factor Covariance Model
The single-factor covariance model (Analysis of Covariance, ANCOVA) accounts for both treatment effects and a continuous covariate:
\[ Y_{ij} = \mu_{.} + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) + \epsilon_{ij} \]
for \(i = 1, \dots, r\) (treatments) and \(j = 1, \dots, n_i\) (observations per treatment).
- \(\mu_{.}\): Overall mean response.
- \(\tau_i\): Fixed treatment effects (\(\sum \tau_i = 0\)).
- \(\gamma\): Fixed regression coefficient (relationship between covariate \(X\) and response \(Y\)).
- \(X_{ij}\): Observed covariate (fixed, not random).
- \(\epsilon_{ij} \sim iid N(0, \sigma^2)\): Independent random errors.
If we use \(\gamma X_{ij}\) directly (without centering), then \(\mu_{.}\) is no longer the overall mean. Thus, centering the covariate is necessary to maintain interpretability.
Expectation and Variance
\[ \begin{aligned} E(Y_{ij}) &= \mu_. + \tau_i + \gamma(X_{ij}-\bar{X}_{..}) \\ var(Y_{ij}) &= \sigma^2 \end{aligned} \]
Since \(Y_{ij} \sim N(\mu_{ij},\sigma^2)\), we express:
\[ \mu_{ij} = \mu_. + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) \]
where \(\sum \tau_i = 0\). The mean response \(\mu_{ij}\) is a regression line with intercept \(\mu_. + \tau_i\) and slope \(\gamma\) for each treatment \(i\).
Key Assumptions
- All treatments share the same slope (\(\gamma\)).
- No interaction between treatment and covariate (parallel regression lines).
- If slopes differ, ANCOVA is not appropriate → use separate regressions per treatment.
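As a concrete illustration, the minimal sketch below simulates data from the single-factor covariance model and fits it with Python's `statsmodels`. The parameter values, sample sizes, and column names are assumptions chosen only for this example; the sum-to-zero coding `C(trt, Sum)` mirrors the constraint \(\sum \tau_i = 0\), and the covariate is centered before fitting.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
r, n = 3, 10                                    # 3 treatments, 10 cases each (assumed)
mu, gamma, sigma = 20.0, 0.9, 1.0               # assumed parameter values
tau = np.array([2.0, -1.0, -1.0])               # treatment effects, sum to zero (assumed)

trt = np.repeat([1, 2, 3], n)                   # treatment labels
X = rng.uniform(10, 30, size=r * n)             # covariate values (treated as fixed once observed)
Y = mu + tau[trt - 1] + gamma * (X - X.mean()) + rng.normal(0, sigma, r * n)

df = pd.DataFrame({"Y": Y, "trt": pd.Categorical(trt), "xc": X - X.mean()})
fit = smf.ols("Y ~ C(trt, Sum) + xc", data=df).fit()   # sum coding enforces sum(tau_i) = 0
print(fit.params)   # intercept ~ mu_., C(trt, Sum)[S.1] ~ tau_1, [S.2] ~ tau_2, xc ~ gamma
```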
A more general model allows multiple covariates:
\[ Y_{ij} = \mu_. + \tau_i + \gamma_1(X_{ij1}-\bar{X}_{..1}) + \gamma_2(X_{ij2}-\bar{X}_{..2}) + \epsilon_{ij} \]
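Continuing the simulated data frame `df` from the sketch above, a second covariate enters the fit as just another centered term. The column `xc2` below is hypothetical and was not used to generate `Y`, so its estimated slope should be near zero.

```python
# A second (hypothetical) covariate: center it, then add it to the formula.
df["xc2"] = rng.uniform(0, 5, size=r * n)
df["xc2"] = df["xc2"] - df["xc2"].mean()        # center the second covariate as well
fit2 = smf.ols("Y ~ C(trt, Sum) + xc + xc2", data=df).fit()
print(fit2.params)                              # gamma_1 (xc) and gamma_2 (xc2) are separate slopes
```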
Using indicator variables for treatments:
The first indicator variable: \[ I_1 = \begin{cases} 1 & \text{if case belongs to treatment 1} \\ -1 & \text{if case belongs to treatment $r$} \\ 0 & \text{otherwise} \end{cases} \]
Analogously, the \((r-1)\)-th indicator variable: \[ I_{r-1} = \begin{cases} 1 & \text{if case belongs to treatment $r-1$} \\ -1 & \text{if case belongs to treatment $r$} \\ 0 & \text{otherwise} \end{cases} \]
Defining \(x_{ij} = X_{ij} - \bar{X}_{..}\), the regression model is:
\[ Y_{ij} = \mu_. + \tau_1 I_{ij,1} + \dots + \tau_{r-1} I_{ij,r-1} + \gamma x_{ij} + \epsilon_{ij} \]
where \(I_{ij,1}\) is the value of the indicator variable \(I_1\) for the \(j\)-th case in treatment \(i\).
The treatment effects (\(\tau_i\)) are simply regression coefficients for the indicator variables.
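The sketch below (reusing `df` and `trt` from the simulated example above) builds the \(+1/-1/0\) indicator columns by hand and fits the same regression; the fitted coefficients are \(\hat{\mu}_., \hat{\tau}_1, \hat{\tau}_2, \hat{\gamma}\), matching the sum-coded formula fit.

```python
import numpy as np
import statsmodels.api as sm

# +1 for the named treatment, -1 for the last treatment (r = 3), 0 otherwise
I1 = np.where(trt == 1, 1, np.where(trt == 3, -1, 0))
I2 = np.where(trt == 2, 1, np.where(trt == 3, -1, 0))

Xmat = sm.add_constant(np.column_stack([I1, I2, df["xc"]]))   # intercept, I1, I2, centered covariate
fit_ind = sm.OLS(df["Y"].values, Xmat).fit()
print(fit_ind.params)   # [mu_., tau_1, tau_2, gamma]; matches the C(trt, Sum) fit above
```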
24.6.1 Statistical Inference for Treatment Effects
To test treatment effects:
\[ \begin{aligned} &H_0: \tau_1 = \tau_2 = \dots = 0 \\ &H_a: \text{Not all } \tau_i = 0 \end{aligned} \]
Full Model (with treatment effects): \[ Y_{ij} = \mu_. + \tau_i + \gamma x_{ij} + \epsilon_{ij} \]
Reduced Model (without treatment effects): \[ Y_{ij} = \mu_. + \gamma x_{ij} + \epsilon_{ij} \]
F-Test for Treatment Effects
The test statistic is:
\[ F = \frac{SSE(R) - SSE(F)}{(N-2)-(N-(r+1))} \Big/ \frac{SSE(F)}{N-(r+1)} \]
where:
- \(SSE(R)\): Sum of squared errors for the reduced model.
- \(SSE(F)\): Sum of squared errors for the full model.
- \(N\): Total number of observations.
- \(r\): Number of treatment groups.
Under \(H_0\), the statistic follows an \(F\)-distribution:
\[ F \sim F_{(r-1, N-(r+1))} \]
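Reusing the simulated data above, the sketch below fits the full and reduced models, computes \(SSE(R)\), \(SSE(F)\), and the \(F\) statistic directly, and obtains the p-value from the corresponding \(F\) distribution; the same numbers come from `full.compare_f_test(reduced)`.

```python
import scipy.stats as st

full = smf.ols("Y ~ C(trt, Sum) + xc", data=df).fit()    # SSE(F), error df = N - (r + 1)
reduced = smf.ols("Y ~ xc", data=df).fit()                # SSE(R), error df = N - 2
N = len(df)

F = ((reduced.ssr - full.ssr) / (r - 1)) / (full.ssr / (N - (r + 1)))
p_value = st.f.sf(F, r - 1, N - (r + 1))
print(F, p_value)   # same result as full.compare_f_test(reduced)
```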
Comparisons of Treatment Effects
For \(r = 3\), with \(\hat{\tau}_3 = -\hat{\tau}_1 - \hat{\tau}_2\) implied by the constraint \(\sum \tau_i = 0\), we estimate:

| Comparison | Estimate | Variance of Estimator |
|---|---|---|
| \(\tau_1 - \tau_2\) | \(\hat{\tau}_1 - \hat{\tau}_2\) | \(var(\hat{\tau}_1) + var(\hat{\tau}_2) - 2cov(\hat{\tau}_1, \hat{\tau}_2)\) |
| \(\tau_1 - \tau_3\) | \(2\hat{\tau}_1 + \hat{\tau}_2\) | \(4var(\hat{\tau}_1) + var(\hat{\tau}_2) + 4cov(\hat{\tau}_1, \hat{\tau}_2)\) |
| \(\tau_2 - \tau_3\) | \(\hat{\tau}_1 + 2\hat{\tau}_2\) | \(var(\hat{\tau}_1) + 4var(\hat{\tau}_2) + 4cov(\hat{\tau}_1, \hat{\tau}_2)\) |
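These estimates and variances can be read off the fitted coefficient vector and its covariance matrix. The sketch below, using the `full` fit from the previous sketch, computes each comparison as a contrast \(c'\hat{b}\) with variance \(c'\,\widehat{cov}(\hat{b})\,c\).

```python
# Coefficient order in `full`: [Intercept, tau_1, tau_2, gamma]; tau_3_hat = -tau_1_hat - tau_2_hat.
b = full.params.values
V = full.cov_params().values          # estimated covariance matrix of the coefficients

contrasts = {
    "tau1 - tau2": np.array([0.0, 1.0, -1.0, 0.0]),
    "tau1 - tau3": np.array([0.0, 2.0, 1.0, 0.0]),   # 2*tau_1 + tau_2
    "tau2 - tau3": np.array([0.0, 1.0, 2.0, 0.0]),   # tau_1 + 2*tau_2
}
for name, c in contrasts.items():
    estimate = c @ b
    variance = c @ V @ c              # expands to the var/cov expressions in the table
    print(name, round(estimate, 3), round(variance, 4))
```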
24.6.2 Testing for Parallel Slopes
To check whether the slopes differ across treatments (here for \(r = 3\)), we add treatment-by-covariate interaction terms:
\[ Y_{ij} = \mu_{.} + \tau_1 I_{ij,1} + \tau_2 I_{ij,2} + \gamma x_{ij} + \beta_1 I_{ij,1} x_{ij} + \beta_2 I_{ij,2} x_{ij} + \epsilon_{ij} \]
where:
- \(\beta_1, \beta_2\): Interaction coefficients (slope differences across treatments).
Hypothesis Test
\[ \begin{aligned} &H_0: \beta_1 = \beta_2 = 0 \quad (\text{Slopes are equal}) \\ &H_a: \text{At least one } \beta \neq 0 \quad (\text{Slopes differ}) \end{aligned} \]
If the \(F\)-test fails to reject \(H_0\), then we assume parallel slopes.
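With the simulated data above, one way to carry out this test is to fit the interaction model and compare it against the parallel-slopes fit `full`:

```python
# Fit the interaction model; with sum coding, the C(trt, Sum)[S.k]:xc terms play the role of beta_1, beta_2.
interact = smf.ols("Y ~ C(trt, Sum) * xc", data=df).fit()
f_stat, p_val, df_diff = interact.compare_f_test(full)      # H0: beta_1 = beta_2 = 0
print(f_stat, p_val)   # a large p-value is consistent with parallel slopes
```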
24.6.3 Adjusted Means
The adjusted treatment means account for covariate effects:
\[ \bar{Y}_{i.}(\text{adj}) = \bar{Y}_{i.} - \hat{\gamma}(\bar{X}_{i.} - \bar{X}_{..}) \]
where:
- \(\bar{Y}_{i.}\): Observed mean response for treatment \(i\).
- \(\hat{\gamma}\): Estimated regression coefficient.
- \(\bar{X}_{i.}\): Mean covariate value for treatment \(i\).
- \(\bar{X}_{..}\): Overall mean covariate value.
This provides estimated treatment means after controlling for covariate effects.
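A short sketch of the computation, continuing the example above: because the column `xc` is already centered, \(\bar{X}_{..}\) corresponds to `xc = 0` and the adjustment reduces to subtracting \(\hat{\gamma}\) times each group's mean centered covariate.

```python
# Adjusted means: Ybar_i.(adj) = Ybar_i. - gamma_hat * xcbar_i. (xc is the centered covariate)
gamma_hat = full.params["xc"]
for i in [1, 2, 3]:
    grp = df[df["trt"] == i]
    adj_mean = grp["Y"].mean() - gamma_hat * grp["xc"].mean()
    print(f"treatment {i}: adjusted mean = {adj_mean:.2f}")
```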