## 20.6 Single Factor Covariance Model

$Y_{ij} = \mu_{.} + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) + \epsilon_{ij}$

for $$i = 1,...,r; \; j = 1,...,n_i$$

where

• $$\mu_.$$: overall mean
• $$\tau_i$$: fixed treatment effects ($$\sum \tau_i = 0$$)
• $$\gamma$$: fixed regression coefficient relating X and Y
• $$X_{ij}$$: covariate (fixed, not random)
• $$\epsilon_{ij} \sim \text{iid } N(0,\sigma^2)$$: random errors

If we use just $$\gamma X_{ij}$$ as the regression term (rather than $$\gamma(X_{ij}-\bar{X}_{..})$$), then $$\mu_.$$ is no longer the overall mean; thus we center the covariate.

$E(Y_{ij}) = \mu_. + \tau_i + \gamma(X_{ij}-\bar{X}_{..}) \\ var(Y_{ij}) = \sigma^2$

$$Y_{ij} \sim N(\mu_{ij},\sigma^2)$$, where

$\mu_{ij} = \mu_. + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) \\ \sum \tau_i =0$

Thus, the mean response $$\mu_{ij}$$ is a regression line with intercept $$\mu_. + \tau_i$$ and slope $$\gamma$$ for each treatment $$i$$.

Assumptions:

• All treatment regression lines have the same slope.
• When treatments interact with the covariate X (non-parallel slopes), covariance analysis is not appropriate; in that case we should fit separate regression lines for each treatment.

More complicated regression relations (e.g., quadratic, cubic) or additional covariates can be accommodated, e.g., with two covariates:

$Y_{ij} = \mu_. + \tau_i + \gamma_1(X_{ij1}-\bar{X}_{..1}) + \gamma_2(X_{ij2}-\bar{X}_{..2}) + \epsilon_{ij}$

Regression Formulation

We can use indicator variables for the treatments:

$I_1 = \begin{cases} 1 & \text{if case is from treatment } 1\\ -1 & \text{if case is from treatment } r\\ 0 &\text{otherwise} \end{cases} \quad \cdots \quad I_{r-1} = \begin{cases} 1 & \text{if case is from treatment } r-1\\ -1 & \text{if case is from treatment } r\\ 0 &\text{otherwise} \end{cases}$

Let $$x_{ij} = X_{ij}- \bar{X}_{..}$$. The regression model is

$Y_{ij} = \mu_. + \tau_1 I_{ij,1} + \cdots + \tau_{r-1} I_{ij,r-1} + \gamma x_{ij}+\epsilon_{ij}$

where $$I_{ij,1}$$ is the value of the indicator variable $$I_1$$ for the j-th case from treatment i. The treatment effects $$\tau_1,...,\tau_{r-1}$$ are just regression coefficients for the indicator variables.
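As a sketch of this regression formulation, the model can be fit by ordinary least squares on a design matrix built from the ±1/0 indicator coding and the centered covariate. The data below are simulated for illustration (the treatment effects, slope, and sample sizes are hypothetical, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: r = 3 treatments, n = 10 cases each.
r, n = 3, 10
trt = np.repeat(np.arange(r), n)           # treatment label per case
X = rng.normal(10, 2, size=r * n)          # fixed covariate values
tau = np.array([1.0, -0.5, -0.5])          # true effects, sum to 0
Y = 5.0 + tau[trt] + 0.8 * (X - X.mean()) + rng.normal(0, 0.3, r * n)

x = X - X.mean()                           # centered covariate

# +1/-1/0 coding: I_k = 1 for treatment k, -1 for treatment r, else 0.
def indicators(trt, r):
    I = np.zeros((len(trt), r - 1))
    for k in range(r - 1):
        I[trt == k, k] = 1.0
    I[trt == r - 1, :] = -1.0
    return I

# Design matrix: intercept, r-1 indicators, centered covariate.
Z = np.column_stack([np.ones(r * n), indicators(trt, r), x])
beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
# beta = (mu_hat, tau_1_hat, ..., tau_{r-1}_hat, gamma_hat)
```

With this coding the fitted coefficients recover $$\mu_.$$, the first $$r-1$$ treatment effects, and the common slope $$\gamma$$ directly; $$\hat{\tau}_r = -\hat{\tau}_1 - \cdots - \hat{\tau}_{r-1}$$.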

We can use the same regression diagnostic tools in this setting.

Inference

Treatment effects

$H_0: \tau_1 = \tau_2 = ...= 0 \\ H_a: \text{not all } \tau_i =0$

$\text{Full Model}: Y_{ij} = \mu_. + \tau_i + \gamma x_{ij} +\epsilon_{ij} \\ \text{Reduced Model}: Y_{ij} = \mu_. + \gamma x_{ij} + \epsilon_{ij}$

$F = \frac{SSE(R) - SSE(F)}{(N-2)-(N-(r+1))} \bigg/ \frac{SSE(F)}{N-(r+1)} = \frac{SSE(R) - SSE(F)}{r-1} \bigg/ \frac{SSE(F)}{N-(r+1)} \sim F_{(r-1,\,N-(r+1))}$
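The general linear test above can be sketched numerically: fit the full (treatments + covariate) and reduced (covariate-only) models, then compare their error sums of squares. The data are simulated and the effect sizes hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
r, n = 3, 12
N = r * n
trt = np.repeat(np.arange(r), n)
x = rng.normal(0, 1, N)
x = x - x.mean()                          # centered covariate
tau = np.array([2.0, 0.0, -2.0])          # nonzero treatment effects
Y = 10.0 + tau[trt] + 1.5 * x + rng.normal(0, 1.0, N)

def sse(Z, Y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return float(np.sum((Y - Z @ beta) ** 2))

# +1/-1/0 indicator coding for the treatments.
I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0

Z_full = np.column_stack([np.ones(N), I, x])   # r + 1 parameters
Z_red = np.column_stack([np.ones(N), x])       # 2 parameters

df_num = (N - 2) - (N - (r + 1))               # = r - 1
df_den = N - (r + 1)
F = ((sse(Z_red, Y) - sse(Z_full, Y)) / df_num) / (sse(Z_full, Y) / df_den)
# Compare F against the F(r-1, N-(r+1)) critical value.
```

Since the simulated treatment effects are large relative to the noise, the resulting F statistic far exceeds typical critical values, leading to rejection of $$H_0$$.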

If we are interested in comparisons of treatment effects, for example with r = 3, we estimate $$\tau_1, \tau_2$$, and $$\tau_3 = -\tau_1 - \tau_2$$.

| Comparison | Estimate | Variance of Estimator |
| --- | --- | --- |
| $$\tau_1 - \tau_2$$ | $$\hat{\tau}_1 - \hat{\tau}_2$$ | $$var(\hat{\tau}_1) + var(\hat{\tau}_2) - 2cov(\hat{\tau}_1,\hat{\tau}_2)$$ |
| $$\tau_1 - \tau_3$$ | $$2\hat{\tau}_1 + \hat{\tau}_2$$ | $$4var(\hat{\tau}_1) + var(\hat{\tau}_2) + 4cov(\hat{\tau}_1,\hat{\tau}_2)$$ |
| $$\tau_2 - \tau_3$$ | $$\hat{\tau}_1 + 2\hat{\tau}_2$$ | $$var(\hat{\tau}_1) + 4var(\hat{\tau}_2) + 4cov(\hat{\tau}_1,\hat{\tau}_2)$$ |
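Each comparison is a linear combination $$c'\hat{\beta}$$ of the fitted coefficients, so its variance is $$c' \, Cov(\hat{\beta}) \, c$$, which reproduces the table entries (note that $$var(2\hat{\tau}_1 + \hat{\tau}_2)$$ carries a $$+4cov$$ term). A minimal sketch with simulated, hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(2)
r, n = 3, 8
N = r * n
trt = np.repeat(np.arange(r), n)
x = rng.normal(0, 1, N)
x = x - x.mean()
Y = 4.0 + np.array([1.0, 0.0, -1.0])[trt] + 0.5 * x + rng.normal(0, 1, N)

# Design matrix with +1/-1/0 indicator coding.
I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0
Z = np.column_stack([np.ones(N), I, x])

beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ beta
s2 = resid @ resid / (N - Z.shape[1])          # MSE
cov = s2 * np.linalg.inv(Z.T @ Z)              # estimated Cov(beta_hat)

# Contrast vectors over (mu, tau_1, tau_2, gamma); tau_3 = -tau_1 - tau_2.
contrasts = {
    "tau1 - tau2": np.array([0, 1, -1, 0]),
    "tau1 - tau3": np.array([0, 2, 1, 0]),
    "tau2 - tau3": np.array([0, 1, 2, 0]),
}
estimates = {name: float(c @ beta) for name, c in contrasts.items()}
variances = {name: float(c @ cov @ c) for name, c in contrasts.items()}
```

Expanding $$c' \, Cov(\hat{\beta}) \, c$$ for, say, $$c = (0, 2, 1, 0)$$ gives exactly $$4var(\hat{\tau}_1) + var(\hat{\tau}_2) + 4cov(\hat{\tau}_1,\hat{\tau}_2)$$ from the table.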

Testing for Parallel Slopes

Example:

r = 3

$Y_{ij} = \mu_{.} + \tau_1 I_{ij,1} + \tau_2 I_{ij,2} + \gamma X_{ij} + \beta_1 I_{ij,1}X_{ij} + \beta_2 I_{ij,2}X_{ij} + \epsilon_{ij}$

where $$\beta_1,\beta_2$$: interaction coefficients.

$H_0: \beta_1 = \beta_2 = 0 \\ H_a: \text{at least one } \beta \neq 0$

If we fail to reject $$H_0$$ with the F-test, we have no evidence against parallel slopes, and the covariance model is reasonable.
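This test is again a general linear test: the reduced model has a common slope, and the full model adds the treatment-by-covariate interaction columns $$I_{ij,k}X_{ij}$$. A sketch with simulated data generated under equal slopes (all names and effect sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
r, n = 3, 15
N = r * n
trt = np.repeat(np.arange(r), n)
X = rng.normal(0, 1, N)
# Equal slopes by construction, so H0 (parallel lines) holds here.
Y = 3.0 + np.array([1.0, -1.0, 0.0])[trt] + 1.0 * X + rng.normal(0, 1, N)

I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0

def sse(Z):
    b, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return float(np.sum((Y - Z @ b) ** 2))

reduced = np.column_stack([np.ones(N), I, X])        # common slope
full = np.column_stack([reduced, I * X[:, None]])    # + I_1*X, I_2*X

df_den = N - full.shape[1]                           # N - 6 here
F = ((sse(reduced) - sse(full)) / 2) / (sse(full) / df_den)
# Compare F against the F(2, N-6) critical value; a small F means
# no evidence against parallel slopes.
```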

Adjusted Means

The treatment means of the response after adjusting for the covariate effect:

$\bar{Y}_{i.(\text{adj})} = \bar{Y}_{i.} - \hat{\gamma}(\bar{X}_{i.} - \bar{X}_{..})$
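Adjusted means matter most when the covariate means differ across treatments, since then the raw treatment means are confounded with the covariate effect. A sketch with simulated, hypothetical data where this happens:

```python
import numpy as np

rng = np.random.default_rng(4)
r, n = 3, 20
N = r * n
trt = np.repeat(np.arange(r), n)
# Covariate means differ by treatment (8, 10, 12), so raw means
# of Y mix the treatment effect with the covariate effect.
X = np.array([8.0, 10.0, 12.0])[trt] + rng.normal(0, 1, N)
gamma = 0.7
tau = np.array([0.5, 0.0, -0.5])
Y = 5.0 + tau[trt] + gamma * (X - X.mean()) + rng.normal(0, 0.4, N)

# Fit the covariance model to estimate gamma.
x = X - X.mean()
I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0
Z = np.column_stack([np.ones(N), I, x])
beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
gamma_hat = beta[-1]

raw = np.array([Y[trt == i].mean() for i in range(r)])
Xbar = np.array([X[trt == i].mean() for i in range(r)])
adjusted = raw - gamma_hat * (Xbar - X.mean())   # Ybar_i.(adj)
```

Here the raw mean for treatment 1 is pulled down by its below-average covariate values; the adjustment recovers the mean response at a common covariate level $$\bar{X}_{..}$$.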