24.6 Single Factor Covariance Model

The single-factor covariance model (Analysis of Covariance, ANCOVA) accounts for both treatment effects and a continuous covariate:

\[ Y_{ij} = \mu_{.} + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) + \epsilon_{ij} \]

for \(i = 1, \dots, r\) (treatments) and \(j = 1, \dots, n_i\) (observations per treatment).

  • \(\mu_{.}\): Overall mean response.
  • \(\tau_i\): Fixed treatment effects (\(\sum \tau_i = 0\)).
  • \(\gamma\): Fixed regression coefficient (relationship between covariate \(X\) and response \(Y\)).
  • \(X_{ij}\): Observed covariate (fixed, not random).
  • \(\epsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)\): Independent random errors.

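If we use \(\gamma X_{ij}\) directly (without centering), the constant term is no longer the overall mean \(\mu_{.}\); it becomes the mean response at \(X = 0\). Centering the covariate therefore keeps \(\mu_{.}\) interpretable, while the treatment effects \(\tau_i\) and the slope \(\gamma\) are unaffected.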
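A short worked step makes this explicit. It only regroups the terms of the model above; the symbol \(\mu'\) is introduced here purely for illustration:

\[ \begin{aligned} Y_{ij} &= \mu_{.} + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) + \epsilon_{ij} \\ &= (\mu_{.} - \gamma\bar{X}_{..}) + \tau_i + \gamma X_{ij} + \epsilon_{ij} \\ &= \mu' + \tau_i + \gamma X_{ij} + \epsilon_{ij}, \qquad \mu' = \mu_{.} - \gamma\bar{X}_{..} \end{aligned} \]

The uncentered constant \(\mu'\) differs from \(\mu_{.}\) by \(\gamma\bar{X}_{..}\), whereas \(\tau_i\) and \(\gamma\) are the same in both forms.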

Expectation and Variance

\[ \begin{aligned} E(Y_{ij}) &= \mu_. + \tau_i + \gamma(X_{ij}-\bar{X}_{..}) \\ var(Y_{ij}) &= \sigma^2 \end{aligned} \]

Since \(Y_{ij} \sim N(\mu_{ij},\sigma^2)\), we express:

\[ \mu_{ij} = \mu_. + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) \]

where \(\sum \tau_i = 0\). For each treatment \(i\), the mean response \(\mu_{ij}\) is a regression line in the centered covariate \(X_{ij} - \bar{X}_{..}\), with intercept \(\mu_. + \tau_i\) and common slope \(\gamma\).


Key Assumptions

  1. All treatments share the same slope (\(\gamma\)).
  2. Equivalently, there is no interaction between treatment and covariate, so the treatment regression lines are parallel.
  3. If slopes differ, this ANCOVA model is not appropriate; fit separate regressions per treatment instead.

A more general model allows multiple covariates:

\[ Y_{ij} = \mu_. + \tau_i + \gamma_1(X_{ij1}-\bar{X}_{..1}) + \gamma_2(X_{ij2}-\bar{X}_{..2}) + \epsilon_{ij} \]


Using indicator variables for treatments:

For treatment \(i = 1\): \[ I_1 = \begin{cases} 1 & \text{if case belongs to treatment 1} \\ -1 & \text{if case belongs to treatment $r$} \\ 0 & \text{otherwise} \end{cases} \]

\[ \vdots \]

For treatment \(i = r-1\): \[ I_{r-1} = \begin{cases} 1 & \text{if case belongs to treatment $r-1$} \\ -1 & \text{if case belongs to treatment $r$} \\ 0 & \text{otherwise} \end{cases} \]

Defining \(x_{ij} = X_{ij}- \bar{X}_{..}\), the regression model is:

\[ Y_{ij} = \mu_. + \tau_1 I_{ij,1} + \dots + \tau_{r-1} I_{ij,r-1} + \gamma x_{ij} + \epsilon_{ij} \]

where \(I_{ij,1}\) is the value of the indicator variable \(I_1\) for the \(j\)-th case in treatment \(i\). The effect of the last treatment follows from the constraint: \(\tau_r = -(\tau_1 + \dots + \tau_{r-1})\).

The treatment effects (\(\tau_i\)) are simply regression coefficients for the indicator variables.
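As a minimal sketch of this regression formulation, the following Python example builds the 1 / −1 / 0 indicator columns and the centered covariate, then fits the model by least squares. It assumes NumPy is available; the helper name `effect_coded_design` and the noise-free data are hypothetical, made up so that the fitted coefficients can be compared with known values.

```python
import numpy as np

def effect_coded_design(groups, x, r):
    """Design matrix: intercept, I_1..I_{r-1} (1 / -1 / 0 coding), centered covariate."""
    groups = np.asarray(groups)
    x = np.asarray(x, dtype=float)
    n = len(groups)
    indicators = np.zeros((n, r - 1))
    for k in range(r - 1):
        indicators[groups == k + 1, k] = 1.0   # case belongs to treatment k+1
    indicators[groups == r, :] = -1.0          # case belongs to treatment r
    x_centered = x - x.mean()                  # x_ij = X_ij - overall mean
    return np.column_stack([np.ones(n), indicators, x_centered])

# Hypothetical noise-free data generated from known parameters (r = 3).
mu, tau, gamma = 10.0, np.array([2.0, -0.5, -1.5]), 0.5   # tau sums to zero
groups = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3])
x = np.array([1.0, 2.0, 4.0, 2.0, 3.0, 5.0, 0.0, 3.0, 7.0])
y = mu + tau[groups - 1] + gamma * (x - x.mean())

X = effect_coded_design(groups, x, r=3)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

mu_hat, tau_hat, gamma_hat = coef[0], coef[1:3], coef[3]
tau_r_hat = -tau_hat.sum()   # effect of treatment r from the sum-to-zero constraint

print("mu_hat    =", round(float(mu_hat), 6))
print("tau_hat   =", np.round(tau_hat, 6), "tau_r_hat =", round(float(tau_r_hat), 6))
print("gamma_hat =", round(float(gamma_hat), 6))
```

Because the data contain no error term, the fit should reproduce the generating parameters up to floating-point precision; with real data the same code returns the least-squares estimates.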


24.6.1 Statistical Inference for Treatment Effects

To test treatment effects:

\[ \begin{aligned} &H_0: \tau_1 = \tau_2 = \dots = \tau_r = 0 \\ &H_a: \text{Not all } \tau_i = 0 \end{aligned} \]

  1. Full Model (with treatment effects): \[ Y_{ij} = \mu_. + \tau_i + \gamma (X_{ij} - \bar{X}_{..}) + \epsilon_{ij} \]

  2. Reduced Model (without treatment effects): \[ Y_{ij} = \mu_. + \gamma (X_{ij} - \bar{X}_{..}) + \epsilon_{ij} \]

F-Test for Treatment Effects

The test statistic is:

\[ F = \frac{SSE(R) - SSE(F)}{(N-2)-(N-(r+1))} \Big/ \frac{SSE(F)}{N-(r+1)} \]

where:

  • \(SSE(R)\): Sum of squared errors for the reduced model.

  • \(SSE(F)\): Sum of squared errors for the full model.

  • \(N\): Total number of observations.

  • \(r\): Number of treatment groups.

Under \(H_0\), the statistic follows an \(F\)-distribution:

\[ F \sim F_{(r-1, N-(r+1))} \]
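The full-versus-reduced comparison can be sketched in Python as follows. This is an illustration under stated assumptions: NumPy is available, the helper `sse` is a hypothetical name, and the small data set is made up for the example rather than taken from any study.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Made-up illustrative data: r = 3 treatments, 4 cases each.
groups = np.repeat([1, 2, 3], 4)
x = np.array([3, 5, 6, 8, 2, 4, 7, 9, 1, 4, 6, 7], dtype=float)
y = np.array([12, 14, 15, 18, 8, 9, 12, 13, 5, 8, 9, 11], dtype=float)

r, N = 3, len(y)
xc = x - x.mean()                                                # centered covariate
I1 = (groups == 1).astype(float) - (groups == 3).astype(float)   # 1 / -1 / 0 coding
I2 = (groups == 2).astype(float) - (groups == 3).astype(float)

X_full = np.column_stack([np.ones(N), I1, I2, xc])   # with treatment effects
X_reduced = np.column_stack([np.ones(N), xc])        # without treatment effects

sse_full = sse(X_full, y)
sse_reduced = sse(X_reduced, y)

df_num = (N - 2) - (N - (r + 1))   # simplifies to r - 1
df_den = N - (r + 1)
F = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)

print(f"SSE(R) = {sse_reduced:.3f}, SSE(F) = {sse_full:.3f}")
print(f"F = {F:.3f} on ({df_num}, {df_den}) degrees of freedom")
```

The resulting statistic is then compared with the upper quantile of the \(F\)-distribution with the printed degrees of freedom.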


Comparisons of Treatment Effects

For \(r = 3\), the constraint \(\sum \tau_i = 0\) gives \(\tau_3 = -\tau_1 - \tau_2\), so every pairwise comparison can be written in terms of \(\hat{\tau}_1\) and \(\hat{\tau}_2\):

\[ \begin{array}{lll} \hline \text{Comparison} & \text{Estimate} & \text{Variance of Estimator} \\ \hline \tau_1 - \tau_2 & \hat{\tau}_1 - \hat{\tau}_2 & var(\hat{\tau}_1) + var(\hat{\tau}_2) - 2cov(\hat{\tau}_1, \hat{\tau}_2) \\ \tau_1 - \tau_3 & 2 \hat{\tau}_1 + \hat{\tau}_2 & 4var(\hat{\tau}_1) + var(\hat{\tau}_2) + 4cov(\hat{\tau}_1, \hat{\tau}_2) \\ \tau_2 - \tau_3 & \hat{\tau}_1 + 2 \hat{\tau}_2 & var(\hat{\tau}_1) + 4var(\hat{\tau}_2) + 4cov(\hat{\tau}_1, \hat{\tau}_2) \\ \hline \end{array} \]
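The pairwise comparisons for \(r = 3\) can be computed as linear combinations of the fitted coefficients. The sketch below assumes NumPy, uses made-up data, and estimates the coefficient covariance matrix as \(MSE \cdot (X'X)^{-1}\) from the regression with 1 / −1 / 0 indicators; the dictionary `contrasts` and the other variable names are illustrative choices.

```python
import numpy as np

# Made-up illustrative data: r = 3 treatments, 4 cases each.
groups = np.repeat([1, 2, 3], 4)
x = np.array([3, 5, 6, 8, 2, 4, 7, 9, 1, 4, 6, 7], dtype=float)
y = np.array([12, 14, 15, 18, 8, 9, 12, 13, 5, 8, 9, 11], dtype=float)

r, N = 3, len(y)
xc = x - x.mean()
I1 = (groups == 1).astype(float) - (groups == 3).astype(float)
I2 = (groups == 2).astype(float) - (groups == 3).astype(float)
X = np.column_stack([np.ones(N), I1, I2, xc])   # columns: mu, tau_1, tau_2, gamma

XtX = X.T @ X
coef = np.linalg.solve(XtX, X.T @ y)            # least-squares estimates
resid = y - X @ coef
mse = float(resid @ resid) / (N - (r + 1))
cov = mse * np.linalg.inv(XtX)                  # estimated covariance of the estimates

# Each comparison as a coefficient vector over (mu, tau_1, tau_2, gamma),
# using tau_3 = -tau_1 - tau_2.
contrasts = {
    "tau_1 - tau_2": np.array([0.0, 1.0, -1.0, 0.0]),
    "tau_1 - tau_3": np.array([0.0, 2.0, 1.0, 0.0]),
    "tau_2 - tau_3": np.array([0.0, 1.0, 2.0, 0.0]),
}

results = {}
for name, c in contrasts.items():
    estimate = float(c @ coef)
    variance = float(c @ cov @ c)               # c' Cov c expands to the variance formula
    results[name] = (estimate, variance)
    print(f"{name}: estimate = {estimate:.4f}, variance = {variance:.4f}")
```

Writing each comparison as a vector keeps the variance computation to a single quadratic form, which avoids transcribing the variance and covariance terms by hand.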

24.6.2 Testing for Parallel Slopes

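To check if slopes differ across treatments, we add treatment-by-covariate interaction terms to the regression model. For \(r = 3\):

\[ Y_{ij} = \mu_{.} + \tau_1 I_{ij,1} + \tau_2 I_{ij,2} + \gamma x_{ij} + \beta_1 I_{ij,1}x_{ij} + \beta_2 I_{ij,2}x_{ij} + \epsilon_{ij} \]

where:

  • \(x_{ij} = X_{ij} - \bar{X}_{..}\): Centered covariate.

  • \(\beta_1, \beta_2\): Interaction coefficients (slope differences across treatments). The slope is \(\gamma + \beta_1\) for treatment 1, \(\gamma + \beta_2\) for treatment 2, and \(\gamma - \beta_1 - \beta_2\) for treatment 3.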

Hypothesis Test

\[ \begin{aligned} &H_0: \beta_1 = \beta_2 = 0 \quad (\text{Slopes are equal}) \\ &H_a: \text{At least one } \beta \neq 0 \quad (\text{Slopes differ}) \end{aligned} \]

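The test compares the interaction model (full) with the parallel-slopes model (reduced) using the general linear \(F\)-test. If the \(F\)-test fails to reject \(H_0\), there is no evidence that the slopes differ, and we proceed with the parallel-slopes ANCOVA model.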
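A minimal sketch of this check in Python, assuming NumPy and the same kind of 1 / −1 / 0 indicator coding, compares the interaction model with the common-slope model. The data are made up for illustration and the helper name `sse` is hypothetical.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from an ordinary least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

# Made-up illustrative data: r = 3 treatments, 4 cases each.
groups = np.repeat([1, 2, 3], 4)
x = np.array([3, 5, 6, 8, 2, 4, 7, 9, 1, 4, 6, 7], dtype=float)
y = np.array([12, 14, 15, 18, 8, 9, 12, 13, 5, 8, 9, 11], dtype=float)

r, N = 3, len(y)
xc = x - x.mean()
I1 = (groups == 1).astype(float) - (groups == 3).astype(float)
I2 = (groups == 2).astype(float) - (groups == 3).astype(float)
ones = np.ones(N)

# Reduced model: common slope (H0: beta_1 = beta_2 = 0).
X_reduced = np.column_stack([ones, I1, I2, xc])
# Full model: adds the treatment-by-covariate interaction columns.
X_full = np.column_stack([ones, I1, I2, xc, I1 * xc, I2 * xc])

sse_reduced = sse(X_reduced, y)
sse_full = sse(X_full, y)

df_num = r - 1        # number of interaction coefficients tested
df_den = N - 2 * r    # N minus the 2r parameters of the full model
F = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)

print(f"F = {F:.3f} on ({df_num}, {df_den}) degrees of freedom")
```

The full model here is equivalent to fitting a separate straight line in each treatment, so its error sum of squares equals the sum of the per-treatment regression residual sums of squares.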


24.6.3 Adjusted Means

The adjusted treatment means account for covariate effects:

\[ \bar{Y}_{i.}(\text{adj}) = \bar{Y}_{i.} - \hat{\gamma}(\bar{X}_{i.} - \bar{X}_{..}) \]

where:

  • \(\bar{Y}_{i.}\): Observed mean response for treatment \(i\).

  • \(\hat{\gamma}\): Estimated regression coefficient.

  • \(\bar{X}_{i.}\): Mean covariate value for treatment \(i\).

  • \(\bar{X}_{..}\): Overall mean covariate value.

This provides the estimated treatment means at the common covariate value \(\bar{X}_{..}\), that is, after controlling for covariate effects.
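The adjustment can be sketched directly from the formula. The example below assumes NumPy and made-up data; the function name `adjusted_means` is hypothetical. It estimates \(\hat{\gamma}\) as the pooled within-treatment slope, which coincides with the least-squares estimate of \(\gamma\) in the common-slope model.

```python
import numpy as np

def adjusted_means(groups, x, y):
    """Return (gamma_hat, {treatment: adjusted mean}) for the common-slope model."""
    groups = np.asarray(groups)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    labels = np.unique(groups)

    # Pooled within-treatment slope: sum of within-group cross-products
    # divided by the sum of within-group squared covariate deviations.
    sxx = sxy = 0.0
    for g in labels:
        m = groups == g
        dx = x[m] - x[m].mean()
        dy = y[m] - y[m].mean()
        sxx += float(dx @ dx)
        sxy += float(dx @ dy)
    gamma_hat = sxy / sxx

    # Shift each observed treatment mean to the overall covariate mean.
    x_bar = x.mean()
    adj = {}
    for g in labels:
        m = groups == g
        adj[int(g)] = float(y[m].mean() - gamma_hat * (x[m].mean() - x_bar))
    return gamma_hat, adj

# Made-up illustrative data: r = 3 treatments, 4 cases each.
groups = np.repeat([1, 2, 3], 4)
x = np.array([3, 5, 6, 8, 2, 4, 7, 9, 1, 4, 6, 7], dtype=float)
y = np.array([12, 14, 15, 18, 8, 9, 12, 13, 5, 8, 9, 11], dtype=float)

gamma_hat, adj = adjusted_means(groups, x, y)
print(f"gamma_hat = {gamma_hat:.4f}")
for g, value in adj.items():
    print(f"treatment {g}: observed mean = {y[groups == g].mean():.3f}, adjusted mean = {value:.3f}")
```

Treatments whose covariate mean lies above \(\bar{X}_{..}\) are adjusted downward when \(\hat{\gamma} > 0\), and those below it are adjusted upward.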