20.6 Single Factor Covariance Model

\[ Y_{ij} = \mu_{.} + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) + \epsilon_{ij} \]

for \(i = 1, \dots, r;\ j = 1, \dots, n_i\)

where

  • \(\mu_.\): overall mean
  • \(\tau_i\): fixed treatment effects, with \(\sum \tau_i = 0\)
  • \(\gamma\): fixed regression coefficient relating X to Y
  • \(X_{ij}\): covariate (treated as fixed, not random)
  • \(\epsilon_{ij} \overset{iid}{\sim} N(0,\sigma^2)\): random errors

If we used \(\gamma X_{ij}\) as the regression term (rather than \(\gamma(X_{ij}-\bar{X}_{..})\)), then \(\mu_.\) would no longer be the overall mean; thus we center the covariate.

\[ E(Y_{ij}) = \mu_. + \tau_i + \gamma(X_{ij}-\bar{X}_{..}) \\ var(Y_{ij}) = \sigma^2 \]

\(Y_{ij} \sim N(\mu_{ij},\sigma^2)\), where

\[ \mu_{ij} = \mu_. + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) \\ \sum \tau_i =0 \]

Thus, the mean response \(\mu_{ij}\) is a regression line with intercept \(\mu_. + \tau_i\) and slope \(\gamma\) for each treatment \(i\).
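To make the parallel-lines picture concrete, here is a tiny numpy sketch; the parameter values (\(\mu_. = 10\), \(\tau = (2, -1, -1)\), \(\gamma = 0.8\)) are hypothetical:

```python
import numpy as np

# Hypothetical parameters for r = 3 treatments
mu_dot = 10.0
tau = np.array([2.0, -1.0, -1.0])   # fixed treatment effects, sum to zero
gamma = 0.8                          # common slope shared by all treatments

# Each treatment's mean response is a line with intercept mu. + tau_i
intercepts = mu_dot + tau            # -> [12., 9., 9.]

# Mean response mu_ij at a centered covariate value x = 2.5, per treatment
x = 2.5
mu_at_x = intercepts + gamma * x     # same slope, shifted intercepts
```

All three lines rise by the same amount \(\gamma x\); only the intercepts differ.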

Assumption:

  • All treatment regression lines have the same slope.
  • When treatment interacts with the covariate X (non-parallel slopes), covariance analysis is not appropriate; in that case we should fit separate regression lines.

More complicated regression features (e.g., quadratic or cubic terms) or additional covariates can be included, e.g.,

\[ Y_{ij} = \mu_. + \tau_i + \gamma_1(X_{ij1}-\bar{X}_{..1}) + \gamma_2(X_{ij2}-\bar{X}_{..2}) + \epsilon_{ij} \]

Regression Formulation

We can use indicator variables for treatments

\[ I_1 = \begin{cases} 1 & \text{if case is from treatment } 1\\ -1 & \text{if case is from treatment } r\\ 0 & \text{otherwise} \end{cases} \qquad \cdots \qquad I_{r-1} = \begin{cases} 1 & \text{if case is from treatment } r-1\\ -1 & \text{if case is from treatment } r\\ 0 & \text{otherwise} \end{cases} \]

Let \(x_{ij} = X_{ij} - \bar{X}_{..}\). The regression model is

\[ Y_{ij} = \mu_. + \tau_1 I_{ij,1} + \dots + \tau_{r-1} I_{ij,r-1} + \gamma x_{ij} + \epsilon_{ij} \]

where \(I_{ij,1}\) is the indicator variable \(I_1\) for the \(j\)-th case from treatment \(i\). The treatment effects \(\tau_1, \dots, \tau_{r-1}\) are just regression coefficients for the indicator variables.

We can use the same regression diagnostic tools in this setting.
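The regression formulation can be sketched in numpy on simulated data; the seed, sample sizes, and parameter values below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)       # hypothetical simulated data
r, n = 3, 30
N = r * n
trt = np.repeat(np.arange(r), n)     # treatment label for each case
X = rng.uniform(0, 10, size=N)
xc = X - X.mean()                    # centered covariate x_ij

# True (hypothetical) parameters: mu. = 10, tau = (2, -1, -1), gamma = 0.8
tau = np.array([2.0, -1.0, -1.0])
Y = 10.0 + tau[trt] + 0.8 * xc + rng.normal(0, 1.0, size=N)

# Effect (sum-to-zero) coding: I_1, I_2; treatment r gets -1 in every column
I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0

D = np.column_stack([np.ones(N), I, xc])      # design: [1, I_1, I_2, x]
beta, *_ = np.linalg.lstsq(D, Y, rcond=None)  # [mu., tau_1, tau_2, gamma]
```

The fitted `beta` should land near `[10, 2, -1, 0.8]`, and \(\hat{\tau}_3\) is recovered as \(-\hat{\tau}_1 - \hat{\tau}_2\).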

Inference

Treatment effects

\[ H_0: \tau_1 = \tau_2 = \dots = \tau_{r-1} = 0 \\ H_a: \text{not all } \tau_i = 0 \]

\[ \text{Full Model}: Y_{ij} = \mu_. + \tau_i + \gamma x_{ij} +\epsilon_{ij} \\ \text{Reduced Model}: Y_{ij} = \mu_. + \gamma x_{ij} + \epsilon_{ij} \]

\[ F = \frac{[SSE(R) - SSE(F)]/[(N-2)-(N-(r+1))]}{SSE(F)/(N-(r+1))} = \frac{[SSE(R) - SSE(F)]/(r-1)}{SSE(F)/(N-(r+1))} \sim F_{(r-1,\,N-(r+1))} \]
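A minimal numpy sketch of this full-versus-reduced F test on simulated data (seed, sample sizes, and effects are hypothetical; the treatment effects are nonzero, so \(H_0\) is false here):

```python
import numpy as np

rng = np.random.default_rng(1)       # hypothetical simulated data
r, n = 3, 20
N = r * n
trt = np.repeat(np.arange(r), n)
xc = rng.uniform(0, 10, size=N)
xc -= xc.mean()                      # centered covariate
tau = np.array([2.0, -1.0, -1.0])    # nonzero effects: H0 should be rejected
Y = 10.0 + tau[trt] + 0.8 * xc + rng.normal(0, 1.0, size=N)

I = np.zeros((N, r - 1))             # effect coding, treatment r coded -1
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0

def sse(D, y):
    """Residual sum of squares from an OLS fit of y on design D."""
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    return float(np.sum((y - D @ b) ** 2))

full = np.column_stack([np.ones(N), I, xc])   # mu., tau_1, tau_2, gamma
reduced = np.column_stack([np.ones(N), xc])   # mu., gamma only

F = ((sse(reduced, Y) - sse(full, Y)) / (r - 1)) / (sse(full, Y) / (N - (r + 1)))
# Large F relative to F_{r-1, N-(r+1)} rejects H0: all tau_i = 0
```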

Suppose we are interested in pairwise comparisons of treatment effects. For example, with \(r = 3\), we estimate \(\tau_1\), \(\tau_2\), and \(\tau_3 = -\tau_1 - \tau_2\).

| Comparison | Estimate | Variance of Estimator |
|---|---|---|
| \(\tau_1 - \tau_2\) | \(\hat{\tau}_1 - \hat{\tau}_2\) | \(var(\hat{\tau}_1) + var(\hat{\tau}_2) - 2cov(\hat{\tau}_1,\hat{\tau}_2)\) |
| \(\tau_1 - \tau_3\) | \(2\hat{\tau}_1 + \hat{\tau}_2\) | \(4var(\hat{\tau}_1) + var(\hat{\tau}_2) + 4cov(\hat{\tau}_1,\hat{\tau}_2)\) |
| \(\tau_2 - \tau_3\) | \(\hat{\tau}_1 + 2\hat{\tau}_2\) | \(var(\hat{\tau}_1) + 4var(\hat{\tau}_2) + 4cov(\hat{\tau}_1,\hat{\tau}_2)\) |
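These variance expressions come from the covariance matrix of the regression coefficients, \(\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\); note that for a sum such as \(2\hat{\tau}_1 + \hat{\tau}_2\) the covariance term enters with a plus sign. A sketch on a hypothetical design that checks the \(\tau_1 - \tau_3\) case against the general contrast form \(\mathbf{c}' Cov(\hat{\boldsymbol\beta}) \mathbf{c}\):

```python
import numpy as np

rng = np.random.default_rng(2)           # hypothetical design
r, n = 3, 20
N = r * n
trt = np.repeat(np.arange(r), n)
xc = rng.uniform(0, 10, size=N)
xc -= xc.mean()

I = np.zeros((N, r - 1))                 # effect coding, treatment r coded -1
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0
D = np.column_stack([np.ones(N), I, xc]) # [1, I_1, I_2, x]

sigma2 = 1.0                             # taken as known for illustration
Cov = sigma2 * np.linalg.inv(D.T @ D)    # Cov of (mu., tau_1, tau_2, gamma)
v1, v2, c12 = Cov[1, 1], Cov[2, 2], Cov[1, 2]

# tau_1 - tau_3 = 2*tau_1 + tau_2, so the estimator variance expands to
var_13 = 4 * v1 + v2 + 4 * c12
# This matches the general contrast form c' Cov c with c = (0, 2, 1, 0)
c = np.array([0.0, 2.0, 1.0, 0.0])
```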

Testing for Parallel Slopes

Example:

r = 3

\[ Y_{ij} = \mu_{.} + \tau_1 I_{ij,1} + \tau_2 I_{ij,2} + \gamma X_{ij} + \beta_1 I_{ij,1}X_{ij} + \beta_2 I_{ij,2}X_{ij} + \epsilon_{ij} \]

where \(\beta_1,\beta_2\): interaction coefficients.

\[ H_0: \beta_1 = \beta_2 = 0 \\ H_a: \text{at least one } \beta \neq 0 \]

If we fail to reject \(H_0\) with the F-test, the data are consistent with parallel slopes.
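A numpy sketch of this interaction F test, with hypothetical data generated under a common slope so that \(H_0\) holds:

```python
import numpy as np

rng = np.random.default_rng(3)           # hypothetical simulated data
r, n = 3, 20
N = r * n
trt = np.repeat(np.arange(r), n)
X = rng.uniform(0, 10, size=N)
# Generated with a COMMON slope of 0.8, so H0: beta_1 = beta_2 = 0 holds
Y = 10.0 + np.array([2.0, -1.0, -1.0])[trt] + 0.8 * X + rng.normal(0, 1.0, size=N)

I = np.zeros((N, r - 1))                 # effect coding, treatment r coded -1
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0

def sse(D, y):
    """Residual sum of squares from an OLS fit of y on design D."""
    b, *_ = np.linalg.lstsq(D, y, rcond=None)
    return float(np.sum((y - D @ b) ** 2))

base = np.column_stack([np.ones(N), I, X])       # parallel-slopes model
inter = np.column_stack([base, I * X[:, None]])  # adds I_1*X and I_2*X

q = 2                                            # interaction terms tested
F = ((sse(base, Y) - sse(inter, Y)) / q) / (sse(inter, Y) / (N - inter.shape[1]))
# Compare F with F_{2, N-6}; a small value is consistent with parallel slopes
```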

Adjusted Means

The treatment means of the response after adjusting for the covariate effect:

\[ \bar{Y}_{i.(\text{adj})} = \bar{Y}_{i.} - \hat{\gamma}(\bar{X}_{i.} - \bar{X}_{..}) \]
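A sketch computing adjusted means on simulated data (all values hypothetical); since the data are generated with a centered covariate, the adjusted means should land near \(\mu_. + \tau_i\):

```python
import numpy as np

rng = np.random.default_rng(4)           # hypothetical simulated data
r, n = 3, 20
N = r * n
trt = np.repeat(np.arange(r), n)
X = rng.uniform(0, 10, size=N)
xc = X - X.mean()
tau = np.array([2.0, -1.0, -1.0])
Y = 10.0 + tau[trt] + 0.8 * xc + rng.normal(0, 1.0, size=N)

# Estimate gamma from the ANCOVA fit with effect-coded treatments
I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0
D = np.column_stack([np.ones(N), I, xc])
b, *_ = np.linalg.lstsq(D, Y, rcond=None)
gamma_hat = b[-1]

# Slide each treatment mean along the fitted slope to the overall X mean
Ybar = np.array([Y[trt == i].mean() for i in range(r)])
Xbar = np.array([X[trt == i].mean() for i in range(r)])
Y_adj = Ybar - gamma_hat * (Xbar - X.mean())
```

Here `Y_adj` should be close to `[12, 9, 9]`, i.e., \(\mu_. + \tau_i\), once the covariate imbalance between groups is removed.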