## 20.6 Single Factor Covariance Model

$Y_{ij} = \mu_{.} + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) + \epsilon_{ij}$

for $$i = 1,...,r; \; j = 1,...,n_i$$

where

• $$\mu_.$$: overall mean
• $$\tau_i$$: fixed treatment effects ($$\sum \tau_i = 0$$)
• $$\gamma$$: fixed regression coefficient relating X and Y
• $$X_{ij}$$: covariate (fixed, not random)
• $$\epsilon_{ij} \sim \text{iid } N(0,\sigma^2)$$: random errors

If we use just $$\gamma X_{ij}$$ as the regression term (rather than $$\gamma(X_{ij}-\bar{X}_{..})$$), then $$\mu_.$$ is no longer the overall mean; thus we center the covariate.

$E(Y_{ij}) = \mu_. + \tau_i + \gamma(X_{ij}-\bar{X}_{..}) \\ var(Y_{ij}) = \sigma^2$

$$Y_{ij} \sim N(\mu_{ij},\sigma^2)$$, where

$\mu_{ij} = \mu_. + \tau_i + \gamma(X_{ij} - \bar{X}_{..}) \\ \sum \tau_i =0$

Thus, the mean response $$\mu_{ij}$$ is a regression line with intercept $$\mu_. + \tau_i$$ and slope $$\gamma$$ for each treatment $$i$$.

Assumptions:

• All treatment regression lines have the same slope.
• When treatments interact with the covariate X (non-parallel slopes), covariance analysis is not appropriate; in that case we should fit separate regression lines for each treatment.

More complicated regression relations (e.g., quadratic, cubic) or additional covariates can be accommodated, e.g., with two covariates:

$Y_{ij} = \mu_. + \tau_i + \gamma_1(X_{ij1}-\bar{X}_{..1}) + \gamma_2(X_{ij2}-\bar{X}_{..2}) + \epsilon_{ij}$

Regression Formulation

We can use indicator variables for the treatments:

$I_1 = \begin{cases} 1 & \text{if case is from treatment } 1\\ -1 & \text{if case is from treatment } r\\ 0 &\text{otherwise} \end{cases} \quad \cdots \quad I_{r-1} = \begin{cases} 1 & \text{if case is from treatment } r-1\\ -1 & \text{if case is from treatment } r\\ 0 &\text{otherwise} \end{cases}$

Let $$x_{ij} = X_{ij}- \bar{X}_{..}$$. The regression model is

$Y_{ij} = \mu_. + \tau_1 I_{ij,1} + \cdots + \tau_{r-1} I_{ij,r-1} + \gamma x_{ij}+\epsilon_{ij}$

where $$I_{ij,1}$$ is the value of the indicator variable $$I_1$$ for the j-th case from treatment i. The treatment effects $$\tau_1,...,\tau_{r-1}$$ are just regression coefficients for the indicator variables.
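As a sketch of this regression formulation, the model can be fit by ordinary least squares on a design matrix built from the ±1/0 indicator coding and the centered covariate. The data below are simulated for illustration (the treatment effects, slope, and sample sizes are hypothetical, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: r = 3 treatments, n = 10 cases each.
r, n = 3, 10
trt = np.repeat(np.arange(r), n)           # treatment label per case
X = rng.normal(10, 2, size=r * n)          # fixed covariate values
tau = np.array([1.0, -0.5, -0.5])          # true effects, sum to 0
Y = 5.0 + tau[trt] + 0.8 * (X - X.mean()) + rng.normal(0, 0.3, r * n)

x = X - X.mean()                           # centered covariate

# +1/-1/0 coding: I_k = 1 for treatment k, -1 for treatment r, else 0.
def indicators(trt, r):
    I = np.zeros((len(trt), r - 1))
    for k in range(r - 1):
        I[trt == k, k] = 1.0
    I[trt == r - 1, :] = -1.0
    return I

# Design matrix: intercept, r-1 indicators, centered covariate.
Z = np.column_stack([np.ones(r * n), indicators(trt, r), x])
beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
# beta = (mu_hat, tau_1_hat, ..., tau_{r-1}_hat, gamma_hat)
```

With this coding the fitted coefficients recover $$\mu_.$$, the first $$r-1$$ treatment effects, and the common slope $$\gamma$$ directly; $$\hat{\tau}_r = -\hat{\tau}_1 - \cdots - \hat{\tau}_{r-1}$$.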

We can use the same regression diagnostic tools in this setting.

Inference

Treatment effects

$H_0: \tau_1 = \tau_2 = ...= 0 \\ H_a: \text{not all } \tau_i =0$

$\text{Full Model}: Y_{ij} = \mu_. + \tau_i + \gamma x_{ij} +\epsilon_{ij} \\ \text{Reduced Model}: Y_{ij} = \mu_. + \gamma x_{ij} + \epsilon_{ij}$

$F = \frac{SSE(R) - SSE(F)}{(N-2)-(N-(r+1))} \bigg/ \frac{SSE(F)}{N-(r+1)} = \frac{SSE(R) - SSE(F)}{r-1} \bigg/ \frac{SSE(F)}{N-(r+1)} \sim F_{(r-1,\,N-(r+1))}$
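The general linear test above can be sketched numerically: fit the full (treatments + covariate) and reduced (covariate-only) models, then compare their error sums of squares. The data are simulated and the effect sizes hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
r, n = 3, 12
N = r * n
trt = np.repeat(np.arange(r), n)
x = rng.normal(0, 1, N)
x = x - x.mean()                          # centered covariate
tau = np.array([2.0, 0.0, -2.0])          # nonzero treatment effects
Y = 10.0 + tau[trt] + 1.5 * x + rng.normal(0, 1.0, N)

def sse(Z, Y):
    """Residual sum of squares from a least-squares fit."""
    beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return float(np.sum((Y - Z @ beta) ** 2))

# +1/-1/0 indicator coding for the treatments.
I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0

Z_full = np.column_stack([np.ones(N), I, x])   # r + 1 parameters
Z_red = np.column_stack([np.ones(N), x])       # 2 parameters

df_num = (N - 2) - (N - (r + 1))               # = r - 1
df_den = N - (r + 1)
F = ((sse(Z_red, Y) - sse(Z_full, Y)) / df_num) / (sse(Z_full, Y) / df_den)
# Compare F against the F(r-1, N-(r+1)) critical value.
```

Since the simulated treatment effects are large relative to the noise, the resulting F statistic far exceeds typical critical values, leading to rejection of $$H_0$$.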

If we are interested in comparisons of treatment effects, for example with r = 3, we estimate $$\tau_1, \tau_2$$, and $$\tau_3 = -\tau_1 - \tau_2$$.

| Comparison | Estimate | Variance of Estimator |
| --- | --- | --- |
| $$\tau_1 - \tau_2$$ | $$\hat{\tau}_1 - \hat{\tau}_2$$ | $$var(\hat{\tau}_1) + var(\hat{\tau}_2) - 2cov(\hat{\tau}_1,\hat{\tau}_2)$$ |
| $$\tau_1 - \tau_3$$ | $$2\hat{\tau}_1 + \hat{\tau}_2$$ | $$4var(\hat{\tau}_1) + var(\hat{\tau}_2) + 4cov(\hat{\tau}_1,\hat{\tau}_2)$$ |
| $$\tau_2 - \tau_3$$ | $$\hat{\tau}_1 + 2\hat{\tau}_2$$ | $$var(\hat{\tau}_1) + 4var(\hat{\tau}_2) + 4cov(\hat{\tau}_1,\hat{\tau}_2)$$ |
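Each comparison is a linear combination $$c'\hat{\beta}$$ of the fitted coefficients, so its variance is $$c' \, Cov(\hat{\beta}) \, c$$, which reproduces the table entries (note that $$var(2\hat{\tau}_1 + \hat{\tau}_2)$$ carries a $$+4cov$$ term). A minimal sketch with simulated, hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(2)
r, n = 3, 8
N = r * n
trt = np.repeat(np.arange(r), n)
x = rng.normal(0, 1, N)
x = x - x.mean()
Y = 4.0 + np.array([1.0, 0.0, -1.0])[trt] + 0.5 * x + rng.normal(0, 1, N)

# Design matrix with +1/-1/0 indicator coding.
I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0
Z = np.column_stack([np.ones(N), I, x])

beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ beta
s2 = resid @ resid / (N - Z.shape[1])          # MSE
cov = s2 * np.linalg.inv(Z.T @ Z)              # estimated Cov(beta_hat)

# Contrast vectors over (mu, tau_1, tau_2, gamma); tau_3 = -tau_1 - tau_2.
contrasts = {
    "tau1 - tau2": np.array([0, 1, -1, 0]),
    "tau1 - tau3": np.array([0, 2, 1, 0]),
    "tau2 - tau3": np.array([0, 1, 2, 0]),
}
estimates = {name: float(c @ beta) for name, c in contrasts.items()}
variances = {name: float(c @ cov @ c) for name, c in contrasts.items()}
```

Expanding $$c' \, Cov(\hat{\beta}) \, c$$ for, say, $$c = (0, 2, 1, 0)$$ gives exactly $$4var(\hat{\tau}_1) + var(\hat{\tau}_2) + 4cov(\hat{\tau}_1,\hat{\tau}_2)$$ from the table.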

Testing for Parallel Slopes

Example:

r = 3

$Y_{ij} = \mu_{.} + \tau_1 I_{ij,1} + \tau_2 I_{ij,2} + \gamma X_{ij} + \beta_1 I_{ij,1}X_{ij} + \beta_2 I_{ij,2}X_{ij} + \epsilon_{ij}$

where $$\beta_1,\beta_2$$: interaction coefficients.

$H_0: \beta_1 = \beta_2 = 0 \\ H_a: \text{at least one } \beta \neq 0$

If we fail to reject $$H_0$$ with the F-test, we have no evidence against parallel slopes, and the covariance model is reasonable.
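This test is again a general linear test: the reduced model has a common slope, and the full model adds the treatment-by-covariate interaction columns $$I_{ij,k}X_{ij}$$. A sketch with simulated data generated under equal slopes (all names and effect sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
r, n = 3, 15
N = r * n
trt = np.repeat(np.arange(r), n)
X = rng.normal(0, 1, N)
# Equal slopes by construction, so H0 (parallel lines) holds here.
Y = 3.0 + np.array([1.0, -1.0, 0.0])[trt] + 1.0 * X + rng.normal(0, 1, N)

I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0

def sse(Z):
    b, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return float(np.sum((Y - Z @ b) ** 2))

reduced = np.column_stack([np.ones(N), I, X])        # common slope
full = np.column_stack([reduced, I * X[:, None]])    # + I_1*X, I_2*X

df_den = N - full.shape[1]                           # N - 6 here
F = ((sse(reduced) - sse(full)) / 2) / (sse(full) / df_den)
# Compare F against the F(2, N-6) critical value; a small F means
# no evidence against parallel slopes.
```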

Adjusted Means

The treatment means of the response after adjusting for the covariate effect:

$\bar{Y}_{i.(\text{adj})} = \bar{Y}_{i.} - \hat{\gamma}(\bar{X}_{i.} - \bar{X}_{..})$
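Adjusted means matter most when the covariate means differ across treatments, since then the raw treatment means are confounded with the covariate effect. A sketch with simulated, hypothetical data where this happens:

```python
import numpy as np

rng = np.random.default_rng(4)
r, n = 3, 20
N = r * n
trt = np.repeat(np.arange(r), n)
# Covariate means differ by treatment (8, 10, 12), so raw means
# of Y mix the treatment effect with the covariate effect.
X = np.array([8.0, 10.0, 12.0])[trt] + rng.normal(0, 1, N)
gamma = 0.7
tau = np.array([0.5, 0.0, -0.5])
Y = 5.0 + tau[trt] + gamma * (X - X.mean()) + rng.normal(0, 0.4, N)

# Fit the covariance model to estimate gamma.
x = X - X.mean()
I = np.zeros((N, r - 1))
for k in range(r - 1):
    I[trt == k, k] = 1.0
I[trt == r - 1, :] = -1.0
Z = np.column_stack([np.ones(N), I, x])
beta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
gamma_hat = beta[-1]

raw = np.array([Y[trt == i].mean() for i in range(r)])
Xbar = np.array([X[trt == i].mean() for i in range(r)])
adjusted = raw - gamma_hat * (Xbar - X.mean())   # Ybar_i.(adj)
```

Here the raw mean for treatment 1 is pulled down by its below-average covariate values; the adjustment recovers the mean response at a common covariate level $$\bar{X}_{..}$$.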