24.6 Single Factor Covariance Model

The single-factor covariance model (Analysis of Covariance, ANCOVA) accounts for both treatment effects and a continuous covariate:

\[ Y_{ij} = \mu_{.} + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) + \epsilon_{ij} \]

for \(i = 1, \dots, r\) (treatments) and \(j = 1, \dots, n_i\) (observations per treatment).

  • \(\mu_{.}\): Overall mean response.
  • \(\tau_i\): Fixed treatment effects (\(\sum \tau_i = 0\)).
  • \(\gamma\): Fixed regression coefficient (relationship between covariate \(X\) and response \(Y\)).
  • \(X_{ij}\): Observed covariate (fixed, not random).
  • \(\epsilon_{ij} \sim iid N(0, \sigma^2)\): Independent random errors.

If we use \(\gamma X_{ij}\) directly (without centering), then \(\mu_{.}\) is no longer the overall mean. Thus, centering the covariate is necessary to maintain interpretability.
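The effect of centering can be verified numerically. Below is a minimal sketch in Python (the data values and the balanced two-treatment layout are illustrative assumptions, not from the text): with a centered covariate and sum-to-zero coding, the fitted intercept equals \(\bar{Y}_{..}\) in a balanced design, while with the raw covariate it does not.

```python
import numpy as np

# Illustrative balanced data: two treatments, n_i = 3 each (assumed values).
Y = np.array([10.0, 12.0, 11.0, 14.0, 15.0, 16.0])   # response
X = np.array([1.0, 2.0, 3.0, 2.0, 3.0, 4.0])          # covariate
I1 = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])      # sum-to-zero coding, r = 2

def intercept(xcol):
    # Fit Y = mu + tau*I1 + gamma*xcol by least squares; return mu-hat.
    D = np.column_stack([np.ones(6), I1, xcol])
    return np.linalg.lstsq(D, Y, rcond=None)[0][0]

print(round(intercept(X - X.mean()), 3), round(Y.mean(), 3))  # 13.0 13.0
print(round(intercept(X), 3))                                  # 11.125 (not the mean)
```

With the centered covariate the intercept reproduces the overall mean response exactly; dropping the centering shifts it by \(\hat{\gamma}\bar{X}_{..}\).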

Expectation and Variance

\[ \begin{aligned} E(Y_{ij}) &= \mu_. + \tau_i + \gamma(X_{ij}-\bar{X}_{..}) \\ var(Y_{ij}) &= \sigma^2 \end{aligned} \]

Since \(Y_{ij} \sim N(\mu_{ij},\sigma^2)\), we express:

\[ \mu_{ij} = \mu_. + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) \]

where \(\sum \tau_i = 0\). The mean response \(\mu_{ij}\) is a regression line with intercept \(\mu_. + \tau_i\) and slope \(\gamma\) for each treatment \(i\).


Key Assumptions

  1. All treatments share the same slope (\(\gamma\)).
  2. No interaction between treatment and covariate (parallel regression lines).
  3. If slopes differ, ANCOVA is not appropriate → use separate regressions per treatment.

A more general model allows multiple covariates:

\[ Y_{ij} = \mu_. + \tau_i + \gamma_1(X_{ij1}-\bar{X}_{..1}) + \gamma_2(X_{ij2}-\bar{X}_{..2}) + \epsilon_{ij} \]


Using indicator variables for treatments:

For treatment \(i = 1\): \[ I_1 = \begin{cases} 1 & \text{if case belongs to treatment 1} \\ -1 & \text{if case belongs to treatment $r$} \\ 0 & \text{otherwise} \end{cases} \]

For treatment \(i = r-1\): \[ I_{r-1} = \begin{cases} 1 & \text{if case belongs to treatment $r-1$} \\ -1 & \text{if case belongs to treatment $r$} \\ 0 & \text{otherwise} \end{cases} \]

Defining \(x_{ij} = X_{ij} - \bar{X}_{..}\), the regression model is:

\[ Y_{ij} = \mu_. + \tau_1 I_{ij,1} + \dots + \tau_{r-1} I_{ij,r-1} + \gamma x_{ij} + \epsilon_{ij} \]

where \(I_{ij,k}\) is the value of the indicator variable \(I_k\) for the \(j\)-th case in treatment \(i\).

The treatment effects (\(\tau_i\)) are simply regression coefficients for the indicator variables.
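As a sketch of this formulation (all numerical values below are illustrative assumptions), the following Python snippet generates data from the model with \(r = 3\) treatments, builds the \(1/-1/0\) indicator columns, and recovers \(\mu_.\), \(\tau_1\), \(\tau_2\), and \(\gamma\) as ordinary least-squares coefficients:

```python
import numpy as np

# Simulate from the ANCOVA model (parameter values are assumptions).
rng = np.random.default_rng(1)
r, n = 3, 30
mu, gamma = 10.0, 2.0
tau = np.array([1.0, -0.5, -0.5])            # sum-to-zero treatment effects
g = np.repeat(np.arange(r), n)               # treatment label of each case
X = rng.normal(5.0, 1.0, size=r * n)         # fixed covariate values
x = X - X.mean()                             # centered covariate
Y = mu + tau[g] + gamma * x + rng.normal(0.0, 0.3, size=r * n)

# Effect coding: I_k = 1 in treatment k, -1 in treatment r, 0 otherwise.
I = np.zeros((r * n, r - 1))
for k in range(r - 1):
    I[g == k, k] = 1.0
I[g == r - 1, :] = -1.0

D = np.column_stack([np.ones(r * n), I, x])  # design: [1, I_1, I_2, x]
b, *_ = np.linalg.lstsq(D, Y, rcond=None)    # [mu, tau_1, tau_2, gamma]
print(np.round(b, 2))                        # close to [10, 1, -0.5, 2]
```

The estimate for \(\tau_r\) is then recovered from the constraint as \(-\hat{\tau}_1 - \dots - \hat{\tau}_{r-1}\).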


24.6.1 Statistical Inference for Treatment Effects

To test treatment effects:

\[ \begin{aligned} &H_0: \tau_1 = \tau_2 = \dots = 0 \\ &H_a: \text{Not all } \tau_i = 0 \end{aligned} \]

  1. Full Model (with treatment effects): \[ Y_{ij} = \mu_. + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) + \epsilon_{ij} \]

  2. Reduced Model (without treatment effects): \[ Y_{ij} = \mu_. + \gamma(X_{ij} - \bar{X}_{..}) + \epsilon_{ij} \]

F-Test for Treatment Effects

The test statistic is:

\[ F = \frac{SSE(R) - SSE(F)}{(N-2)-(N-(r+1))} \Big/ \frac{SSE(F)}{N-(r+1)} \]

where:

  • \(SSE(R)\): Sum of squared errors for the reduced model.

  • \(SSE(F)\): Sum of squared errors for the full model.

  • \(N\): Total number of observations.

  • \(r\): Number of treatment groups.

Under \(H_0\), the statistic follows an \(F\)-distribution:

\[ F \sim F_{(r-1, N-(r+1))} \]
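This full-versus-reduced comparison can be sketched directly from the error sums of squares. In the Python snippet below (the simulated data set is an illustrative assumption), the reduced model has \(2\) parameters and the full model \(r + 1\), giving numerator degrees of freedom \((N-2)-(N-(r+1)) = r-1\):

```python
import numpy as np

# Simulated data with a real treatment effect (assumed parameter values).
rng = np.random.default_rng(2)
r, n = 3, 20
N = r * n
g = np.repeat(np.arange(r), n)
X = rng.normal(5.0, 1.0, size=N)
x = X - X.mean()
tau = np.array([1.0, -0.5, -0.5])
Y = 10.0 + tau[g] + 2.0 * x + rng.normal(0.0, 0.5, size=N)

def sse(D):
    # Error sum of squares of the least-squares fit of Y on D.
    b, *_ = np.linalg.lstsq(D, Y, rcond=None)
    resid = Y - D @ b
    return resid @ resid

I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[g == k, k] = 1.0
I[g == r - 1, :] = -1.0

sse_f = sse(np.column_stack([np.ones(N), I, x]))  # full: r + 1 parameters
sse_r = sse(np.column_stack([np.ones(N), x]))     # reduced: 2 parameters
df_num = (N - 2) - (N - (r + 1))                  # = r - 1
df_den = N - (r + 1)
F = ((sse_r - sse_f) / df_num) / (sse_f / df_den)
print(df_num, df_den, round(F, 2))                # compare with F_{(r-1, N-r-1)}
```

A large \(F\) relative to \(F_{(r-1,\, N-(r+1))}\) leads to rejecting \(H_0\).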


Comparisons of Treatment Effects

For \(r = 3\), we estimate:

Estimates and Variances for Treatment Comparisons

Under the sum-to-zero constraint, \(\hat{\tau}_3 = -\hat{\tau}_1 - \hat{\tau}_2\), so:

| Comparison | Estimate | Variance of Estimator |
|---|---|---|
| \(\tau_1 - \tau_2\) | \(\hat{\tau}_1 - \hat{\tau}_2\) | \(var(\hat{\tau}_1) + var(\hat{\tau}_2) - 2cov(\hat{\tau}_1, \hat{\tau}_2)\) |
| \(\tau_1 - \tau_3\) | \(2\hat{\tau}_1 + \hat{\tau}_2\) | \(4var(\hat{\tau}_1) + var(\hat{\tau}_2) + 4cov(\hat{\tau}_1, \hat{\tau}_2)\) |
| \(\tau_2 - \tau_3\) | \(\hat{\tau}_1 + 2\hat{\tau}_2\) | \(var(\hat{\tau}_1) + 4var(\hat{\tau}_2) + 4cov(\hat{\tau}_1, \hat{\tau}_2)\) |
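As a numerical check (on simulated data with assumed parameter values), the variance of a contrast such as \(\tau_1 - \tau_3 = 2\tau_1 + \tau_2\) can be formed from the estimated coefficient covariance matrix \(\hat{\sigma}^2(\mathbf{D}'\mathbf{D})^{-1}\); note that \(var(2\hat{\tau}_1 + \hat{\tau}_2) = 4var(\hat{\tau}_1) + var(\hat{\tau}_2) + 4cov(\hat{\tau}_1, \hat{\tau}_2)\), with a positive covariance term:

```python
import numpy as np

# Simulated ANCOVA data, r = 3 (all numerical values are assumptions).
rng = np.random.default_rng(3)
r, n = 3, 20
N = r * n
g = np.repeat(np.arange(r), n)
x = rng.normal(0.0, 1.0, size=N)
Y = 10.0 + np.array([1.0, -0.5, -0.5])[g] + 2.0 * x + rng.normal(0.0, 0.5, N)

I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[g == k, k] = 1.0
I[g == r - 1, :] = -1.0
D = np.column_stack([np.ones(N), I, x])   # columns: 1, I_1, I_2, x

b, *_ = np.linalg.lstsq(D, Y, rcond=None)
mse = (Y - D @ b) @ (Y - D @ b) / (N - (r + 1))
V = mse * np.linalg.inv(D.T @ D)          # est. cov of [mu, tau1, tau2, gamma]

# Expanded form: 4 var(t1) + var(t2) + 4 cov(t1, t2) ...
direct = 4 * V[1, 1] + V[2, 2] + 4 * V[1, 2]
# ... equals the quadratic form c' V c with contrast vector c = (0, 2, 1, 0).
c = np.array([0.0, 2.0, 1.0, 0.0])
print(np.isclose(direct, c @ V @ c))      # True
```

The quadratic-form version \(c'\hat{V}c\) generalizes immediately to any linear combination of the coefficients.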

24.6.2 Testing for Parallel Slopes

To check if slopes differ across treatments, we use the model:

\[ Y_{ij} = \mu_{.} + \tau_1 I_{ij,1} + \tau_2 I_{ij,2} + \gamma X_{ij} + \beta_1 I_{ij,1}X_{ij} + \beta_2 I_{ij,2}X_{ij} + \epsilon_{ij} \]

where:

  • \(\beta_1, \beta_2\): Interaction coefficients (slope differences across treatments).

Hypothesis Test

\[ \begin{aligned} &H_0: \beta_1 = \beta_2 = 0 \quad (\text{Slopes are equal}) \\ &H_a: \text{At least one } \beta \neq 0 \quad (\text{Slopes differ}) \end{aligned} \]

If the \(F\)-test fails to reject \(H_0\), the parallel-slopes assumption is reasonable and the standard ANCOVA model applies. If \(H_0\) is rejected, fit separate regressions per treatment instead.
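This check is again a full-versus-reduced \(F\)-test. A sketch in Python (the data are simulated with equal slopes, so \(H_0\) holds by construction; all numerical values are assumptions):

```python
import numpy as np

# Simulated data with a common slope across r = 3 treatments.
rng = np.random.default_rng(4)
r, n = 3, 20
N = r * n
g = np.repeat(np.arange(r), n)
X = rng.normal(5.0, 1.0, size=N)
Y = 10.0 + np.array([1.0, -0.5, -0.5])[g] + 2.0 * X + rng.normal(0.0, 0.5, N)

I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[g == k, k] = 1.0
I[g == r - 1, :] = -1.0

def sse(D):
    b, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return (Y - D @ b) @ (Y - D @ b)

reduced = np.column_stack([np.ones(N), I, X])        # common slope
full = np.column_stack([reduced, I * X[:, None]])    # adds I_k * X interactions
df_num = r - 1                                       # one beta_k per indicator
df_den = N - 2 * r                                   # full model has 2r parameters
F = ((sse(reduced) - sse(full)) / df_num) / (sse(full) / df_den)
print(round(F, 2))                                   # compare with F_{(r-1, N-2r)}
```

A small \(F\) relative to \(F_{(r-1,\, N-2r)}\) supports the parallel-slopes assumption.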


24.6.3 Adjusted Means

The adjusted treatment means account for covariate effects:

\[ \bar{Y}_{i.}(\text{adj}) = \bar{Y}_{i.} - \hat{\gamma}(\bar{X}_{i.} - \bar{X}_{..}) \]

where:

  • \(\bar{Y}_{i.}\): Observed mean response for treatment \(i\).

  • \(\hat{\gamma}\): Estimated regression coefficient.

  • \(\bar{X}_{i.}\): Mean covariate value for treatment \(i\).

  • \(\bar{X}_{..}\): Overall mean covariate value.

This provides estimated treatment means after controlling for covariate effects.
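The adjustment can be sketched as follows (simulated data with assumed parameter values; the covariate is deliberately imbalanced across treatments so that raw and adjusted means differ):

```python
import numpy as np

# Simulated ANCOVA data with covariate imbalance across groups.
rng = np.random.default_rng(5)
r, n = 3, 20
N = r * n
g = np.repeat(np.arange(r), n)
X = rng.normal(5.0, 1.0, size=N) + 0.5 * g    # group means of X differ
Y = 10.0 + np.array([1.0, -0.5, -0.5])[g] + 2.0 * X + rng.normal(0.0, 0.5, N)

# Fit the ANCOVA model to obtain the pooled slope estimate gamma-hat.
I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[g == k, k] = 1.0
I[g == r - 1, :] = -1.0
D = np.column_stack([np.ones(N), I, X - X.mean()])
b, *_ = np.linalg.lstsq(D, Y, rcond=None)
gamma_hat = b[-1]

# Slide each treatment mean along the fitted slope to the overall X mean.
adjusted = np.array([Y[g == i].mean() - gamma_hat * (X[g == i].mean() - X.mean())
                     for i in range(r)])
print(np.round(adjusted, 2))
```

Here treatment 1 has the largest adjusted mean, reflecting its positive \(\tau_1\), even though the raw means are also inflated or deflated by the groups' differing covariate levels.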


This chapter has provided an exploration of ANOVA, a foundational technique for comparing group means across multiple experimental conditions. Beginning with the Completely Randomized Design, we established the basic framework for understanding how ANOVA partitions variability. We then extended the discussion to Nonparametric ANOVA, accommodating situations where the assumptions of traditional ANOVA are violated.

Subsequent sections introduced advanced designs such as Randomized Block and Nested Designs, which offer increased precision and control over variability by accounting for structured sources of heterogeneity. We also addressed Sample Size Planning, emphasizing the importance of statistical power and design efficiency. Finally, we examined the Single Factor Covariance Model, integrating covariates into the ANOVA framework to adjust for confounding variables and improve estimation accuracy.

Together, these topics equip the reader with the methodological rigor needed to design, implement, and interpret experiments involving multiple group comparisons in complex real-world settings.