21.1 Completely Randomized Design (CRD)
Treatment factor A with \(a\ge2\) treatments levels. Experimental units are randomly assigned to each treatment. The number of experimental units in each group can be
- equal (balanced): n
- unequal (unbalanced): \(n_i\) for the i-th group (i = 1,…,a).
The total sample size is \(N=\sum_{i=1}^{a}n_i\)
Possible assignments of units to treatments are \(k=\frac{N!}{n_1!n_2!...n_a!}\)
Each has probability 1/k of being selected. Each experimental unit is measured with a response \(Y_{ij}\), in which j denotes unit and i denotes treatment.
Treatment
1 | 2 | … | a | |
---|---|---|---|---|
\(Y_{11}\) | \(Y_{21}\) | … | \(Y_{a1}\) | |
\(Y_{12}\) | … | … | … | |
… | … | … | … | |
Sample Mean | \(\bar{Y_{1.}}\) | \(\bar{Y_{2.}}\) | … | \(\bar{Y_{a.}}\) |
Sample SD | \(s_1\) | \(s_2\) | … | \(s_a\) |
where \(\bar{Y_{i.}}=\frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij}\)
\(s_i^2=\frac{1}{n_i-1}\sum_{j=1}^{n_i}(Y_{ij}-\bar{Y_i})^2\)
And the grand mean is \(\bar{Y_{..}}=\frac{1}{N}\sum_{i}\sum_{j}Y_{ij}\)
21.1.1 Single Factor Fixed Effects Model
also known as Single Factor (One-Way) ANOVA or ANOVA Type I model.
Partitioning the Variance
The total variability of the \(Y_{ij}\) observation can be measured as the deviation of \(Y_{ij}\) around the overall mean \(\bar{Y_{..}}\): \(Y_{ij} - \bar{Y_{..}}\)
This can be rewritten as:
\[ \begin{aligned} Y_{ij} - \bar{Y_{..}}&=Y_{ij} - \bar{Y_{..}} + \bar{Y_{i.}} - \bar{Y_{i.}} \\ &= (\bar{Y_{i.}}-\bar{Y_{..}})+(Y_{ij}-\bar{Y_{i.}}) \end{aligned} \]
where
- the first term is the between treatment differences (i.e., the deviation of the treatment mean from the overall mean)
- the second term is within treatment differences (i.e., the deviation of the observation around its treatment mean)
\[ \begin{aligned} \sum_{i}\sum_{j}(Y_{ij} - \bar{Y_{..}})^2 &= \sum_{i}n_i(\bar{Y_{i.}}-\bar{Y_{..}})^2+\sum_{i}\sum_{j}(Y_{ij}-\bar{Y_{i.}})^2 \\ SSTO &= SSTR + SSE \\ total~SS &= treatment~SS + error~SS \\ (N-1)~d.f. &= (a-1)~d.f. + (N - a) ~ d.f. \end{aligned} \]
we lose a d.f. for the total corrected SSTO because of the estimation of the mean (\(\sum_{i}\sum_{j}(Y_{ij} - \bar{Y_{..}})=0\))
And, for the SSTR \(\sum_{i}n_i(\bar{Y_{i.}}-\bar{Y_{..}})=0\)
Accordingly, \(MSTR= \frac{SST}{a-1}\) and \(MSR=\frac{SSE}{N-a}\)
ANOVA Table
Source of Variation | SS | df | MS |
---|---|---|---|
Between Treatments | \(\sum_{i}n_i (\bar{Y_{i.}}-\bar{Y_{..}})^2\) | a-1 | SSTR/(a-1) |
Error (within treatments) | \(\sum_{i}\sum_{j}(Y_{ij}-\bar{Y_{i.}})^2\) | N-a | SSE/(N-a) |
Total (corrected) | \(\sum_{i}n_i (\bar{Y_{i.}}-\bar{Y_{..}})^2\) | N-1 |
Linear Model Explanation of ANOVA
21.1.1.1 Cell means model
\[ Y_{ij}=\mu_i+\epsilon\_{ij} \]
where
\(Y_{ij}\) response variable in \(j\)-th subject for the \(i\)-th treatment
\(\mu_i\): parameters (fixed) representing the unknown population mean for the i-th treatment
\(\epsilon_{ij}\) independent \(N(0,\sigma^2)\) errors
\(E(Y_{ij})=\mu_i\) \(var(Y_{ij})=var(\epsilon_{ij})=\sigma^2\)
All observations have the same variance
Example:
\(a = 3\) (3 treatments) \(n_1=n_2=n_3=2\)
\[ \begin{aligned} \left(\begin{array}{c} Y_{11}\\ Y_{12}\\ Y_{21}\\ Y_{22}\\ Y_{31}\\ Y_{32}\\ \end{array}\right) &= \left(\begin{array}{ccc} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ \end{array}\right) \left(\begin{array}{c} \mu_1 \\ \mu_2 \\ \mu_3 \\ \end{array}\right) + \left(\begin{array}{c} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{31} \\ \epsilon_{32} \\ \end{array}\right)\\ \mathbf{y} &= \mathbf{X\beta} +\mathbf{\epsilon} \end{aligned} \]
\(X_{k,ij}=1\) if the \(k\)-th treatment is used
\(X_{k,ij}=0\) Otherwise
Note: no intercept term.
\[\begin{equation} \begin{aligned} \mathbf{b}= \left[\begin{array}{c} \mu_1 \\ \mu_2 \\ \mu_3 \\ \end{array}\right] &= (\mathbf{x}'\mathbf{x})^{-1}\mathbf{x}'\mathbf{y} \\ & = \left[\begin{array}{ccc} n_1 & 0 & 0\\ 0 & n_2 & 0\\ 0 & 0 & n_3 \\ \end{array}\right]^{-1} \left[\begin{array}{c} Y_1\\ Y_2\\ Y_3\\ \end{array}\right] \\ & = \left[\begin{array}{c} \bar{Y_1}\\ \bar{Y_2}\\ \bar{Y_3}\\ \end{array}\right] \end{aligned} \tag{21.1} \end{equation}\]
is the BLUE (best linear unbiased estimator) for \(\beta=[\mu_1 \mu_2\mu_3]'\)
\[ E(\mathbf{b})=\beta \]
\[ var(\mathbf{b})=\sigma^2(\mathbf{X'X})^{-1}=\sigma^2 \left[\begin{array}{ccc} 1/n_1 & 0 & 0\\ 0 & 1/n_2 & 0\\ 0 & 0 & 1/n_3\\ \end{array}\right] \]
\(var(b_i)=var(\hat{\mu_i})=\sigma^2/n_i\) where \(\mathbf{b} \sim N(\beta,\sigma^2(\mathbf{X'X})^{-1})\)
\[ \begin{aligned} MSE &= \frac{1}{N-a} \sum_{i}\sum_{j}(Y_{ij}-\bar{Y_{i.}})^2 \\ &= \frac{1}{N-a} \sum_{i}[(n_i-1)\frac{\sum_{i}(Y_{ij}-\bar{Y_{i.}})^2}{n_i-1}] \\ &= \frac{1}{N-a} \sum_{i}(n_i-1)s_1^2 \end{aligned} \]
We have \(E(s_i^2)=\sigma^2\)
\(E(MSE)=\frac{1}{N-a}\sum_{i}(n_i-1)\sigma^2=\sigma^2\)
Hence, MSE is an unbiased estimator of \(\sigma^2\), regardless of whether the treatment means are equal or not.
\(E(MSTR)=\sigma^2+\frac{\sum_{i}n_i(\mu_i-\mu_.)^2}{a-1}\)
where \(\mu_.=\frac{\sum_{i=1}^{a}n_i\mu_i}{\sum_{i=1}^{a}n_i}\)
If all treatment means are equals (=\(\mu_.\)), \(E(MSTR)=\sigma^2\).
Then we can use an \(F\)-test for the equality of all treatment means:
\[H_0:\mu_1=\mu_2=..=\mu_a\]
\[H_a: not~al l~ \mu_i ~ are ~ equal \]
\(F=\frac{MSTR}{MSE}\)
where large values of F support \(H_a\) (since MSTR will tend to exceed MSE when \(H_a\) holds)
and F near 1 support \(H_0\) (upper tail test)
Equivalently, when \(H_0\) is true, \(F \sim f_{(a-1,N-a)}\)
- If \(F \leq f_{(a-1,N-a;1-\alpha)}\), we cannot reject \(H_0\)
- If \(F \geq f_{(a-1,N-a;1-\alpha)}\), we reject \(H_0\)
Note: If \(a = 2\) (2 treatments), \(F\)-test = two sample \(t\)-test
21.1.1.2 Treatment Effects (Factor Effects)
Besides Cell means model, we have another way to formalize one-way ANOVA: \[Y_{ij} = \mu + \tau_i + \epsilon_{ij}\] where
- \(Y_{ij}\) is the \(j\)-th response for the \(i\)-th treatment
- \(\tau_i\) is \(i\)-th treatment effect
- \(\mu\) constant component, common to all observations
- \(\epsilon_{ij}\) independent random errors ~ \(N(0,\sigma^2)\)
For example, \(a = 3\), \(n_1=n_2=n_3=2\)
\[\begin{equation} \begin{aligned} \left(\begin{array}{c} Y_{11}\\ Y_{12}\\ Y_{21}\\ Y_{22}\\ Y_{31}\\ Y_{32}\\ \end{array}\right) &= \left(\begin{array}{cccc} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ \end{array}\right) \left(\begin{array}{c} \mu \\ \tau_1 \\ \tau_2 \\ \tau_3\\ \end{array}\right) + \left(\begin{array}{c} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{31} \\ \epsilon_{32} \\ \end{array}\right)\\ \mathbf{y} &= \mathbf{X\beta} +\mathbf{\epsilon} \end{aligned} \tag{21.2} \end{equation}\]
However,
\[ \mathbf{X'X} = \left( \begin{array} {cccc} \sum_{i}n_i & n_1 & n_2 & n_3 \\ n_1 & n_1 & 0 & 0 \\ n_2 & 0 & n_2 & 0 \\ n_3 & 0 & 0 & n_3 \\ \end{array} \right) \]
is singular thus does not exist, \(\mathbf{b}\) is insolvable (infinite solutions)
Hence, we have to impose restrictions on the parameters to a model matrix \(\mathbf{X}\) of full rank.
Whatever restriction we use, we still have:
\(E(Y_{ij})=\mu + \tau_i = \mu_i = mean ~ response ~ for ~ i-th ~ treatment\)
21.1.1.2.1 Restriction on sum of tau
\(\sum_{i=1}^{a}\tau_i=0\)
implies
\[ \mu= \mu +\frac{1}{a}\sum_{i=1}^{a}(\mu+\tau_i) \]
is the average of the treatment mean (grand mean) (overall mean)
\[ \begin{aligned} \tau_i &=(\mu+\tau_i) -\mu = \mu_i-\mu \\ &= \text{treatment mean} - \text{grand~mean} \\ &= \text{treatment effect} \end{aligned} \]
\[ \tau_a=-\tau_1-\tau_2-...-\tau_{a-1} \]
Hence, the mean for the a-th treatment is
\[ \mu_a=\mu+\tau_a=\mu-\tau_1-\tau_2-...-\tau_{a-1} \]
Hence, the model need only “a” parameters:
\[ \mu,\tau_1,\tau_2,..,\tau_{a-1} \]
Equation (21.2) becomes
\[\begin{equation} \begin{aligned} \left(\begin{array}{c} Y_{11}\\ Y_{12}\\ Y_{21}\\ Y_{22}\\ Y_{31}\\ Y_{32}\\ \end{array}\right) &= \left(\begin{array}{ccc} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \\ \end{array}\right) \left(\begin{array}{c} \mu \\ \tau_1 \\ \tau_2 \\ \end{array}\right) + \left(\begin{array}{c} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{31} \\ \epsilon_{32} \\ \end{array}\right)\\ \mathbf{y} &= \mathbf{X\beta} +\mathbf{\epsilon} \end{aligned} \end{equation}\]
where \(\beta\equiv[\mu,\tau_1,\tau_2]'\)
Equation (21.1) with \(\sum_{i}\tau_i=0\) becomes
\[ \begin{aligned} \mathbf{b}= \left[\begin{array}{c} \hat{\mu} \\ \hat{\tau_1} \\ \hat{\tau_2} \\ \end{array}\right] &= (\mathbf{x}'\mathbf{x})^{-1}\mathbf{x}'\mathbf{y} \\ & = \left[\begin{array}{ccc} \sum_{i}n_i & n_1-n_3 & n_2-n_3\\ n_1-n_3 & n_1+n_3 & n_3\\ n_2-n_3 & n_3 & n_2-n_3 \\ \end{array}\right]^{-1} \left[\begin{array}{c} Y_{..}\\ Y_{1.}-Y_{3.}\\ Y_{2.}-Y_{3.}\\ \end{array}\right] \\ & = \left[\begin{array}{c} \frac{1}{3}\sum_{i=1}^{3}\bar{Y_{i.}}\\ \bar{Y_{1.}}-\frac{1}{3}\sum_{i=1}^{3}\bar{Y_{i.}}\\ \bar{Y_{2.}}-\frac{1}{3}\sum_{i=1}^{3}\bar{Y_{i.}}\\ \end{array}\right]\\ & = \left[\begin{array}{c} \hat{\mu}\\ \hat{\tau_1}\\ \hat{\tau_2}\\ \end{array}\right] \end{aligned} \]
and \(\hat{\tau_3}=-\hat{\tau_1}-\hat{\tau_2}=\bar{Y_3}-\frac{1}{3} \sum_{i}\bar{Y_{i.}}\)
21.1.1.2.2 Restriction on first tau
In R, lm()
uses the restriction \(\tau_1=0\)
For the previous example, for \(n_1=n_2=n_3=2\), and \(\tau_1=0\).
Then the treatment means can be written as:
\[ \begin{aligned} \mu_1 &= \mu + \tau_1 = \mu + 0 = \mu \\ \mu_2 &= \mu + \tau_2 \\ \mu_3 &= \mu + \tau_3 \end{aligned} \]
Hence, \(\mu\) is the mean response for the first treatment
In the matrix form,
\[ \begin{aligned} \left(\begin{array}{c} Y_{11}\\ Y_{12}\\ Y_{21}\\ Y_{22}\\ Y_{31}\\ Y_{32}\\ \end{array}\right) &= \left(\begin{array}{ccc} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ \end{array}\right) \left(\begin{array}{c} \mu \\ \tau_2 \\ \tau_3 \\ \end{array}\right) + \left(\begin{array}{c} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{31} \\ \epsilon_{32} \\ \end{array}\right)\\ \mathbf{y} &= \mathbf{X\beta} +\mathbf{\epsilon} \end{aligned} \]
\(\beta = [\mu,\tau_2,\tau_3]'\)
\[ \begin{aligned} \mathbf{b}= \left[\begin{array}{c} \hat{\mu} \\ \hat{\tau_2} \\ \hat{\tau_3} \\ \end{array}\right] &= (\mathbf{x}'\mathbf{x})^{-1}\mathbf{x}'\mathbf{y} \\ & = \left[\begin{array}{ccc} \sum_{i}n_i & n_2 & n_3\\ n_2 & n_2 & 0\\ n_3 & 0 & n_3 \\ \end{array}\right]^{-1} \left[\begin{array}{c} Y_{..}\\ Y_{2.}\\ Y_{3.}\\ \end{array}\right] \\ & = \left[ \begin{array}{c} \bar{Y_{1.}} \\ \bar{Y_{2.}} - \bar{Y_{1.}} \\ \bar{Y_{3.}} - \bar{Y_{1.}}\\ \end{array}\right] \end{aligned} \]
\[ E(\mathbf{b})= \beta = \left[\begin{array}{c} {\mu}\\ {\tau_2}\\ {\tau_3}\\ \end{array}\right] = \left[\begin{array}{c} \mu_1\\ \mu_2-\mu_1\\ \mu_3-\mu_1\\ \end{array}\right] \]
\[ \begin{aligned} var(\mathbf{b}) &= \sigma^2(\mathbf{X'X})^{-1} \\ var(\hat{\mu}) &= var(\bar{Y_{1.}})=\sigma^2/n_1 \\ var(\hat{\tau_2}) &= var(\bar{Y_{2.}}-\bar{Y_{1.}}) = \sigma^2/n_2 + \sigma^2/n_1 \\ var(\hat{\tau_3}) &= var(\bar{Y_{3.}}-\bar{Y_{1.}}) = \sigma^2/n_3 + \sigma^2/n_1 \end{aligned} \]
Note For all three parameterization, the ANOVA table is the same
- Model 1: \(Y_{ij} = \mu_i + \epsilon_{ij}\)
- Model 2: \(Y_{ij} = \mu + \tau_i + \epsilon_{ij}\) where \(\sum_{i} \tau_i=0\)
- Model 3: \(Y_{ij}= \mu + \tau_i + \epsilon_{ij}\) where \(\tau_1=0\)
All models have the same calculation for \(\hat{Y}\) as
\[ \mathbf{\hat{Y} = X(X'X)^{-1}X'Y=PY = Xb} \]
ANOVA Table
Source of Variation | SS | df | MS | F |
---|---|---|---|---|
Between Treatments | \(\sum_{i} n _ i (\bar { Y_ {i .} } -\bar{Y_{..}})^2 = \mathbf{Y ' (P-P_1)Y}\) | a-1 | \(\frac{SSTR}{a-1}\) | \(\frac{MSTR}{MSE}\) |
Error (within treatments) |
\(\sum_{i}\sum_{j}(Y_{ij} -\bar{Y_{i.}})^2=\mathbf{e'e}\) | N-a | \(\frac{SSE}{N-a}\) | |
Total (corrected) | \(\sum_{i } n_i(\bar{Y_{i.}}-\bar{Y_{..}})^2=\mathbf{Y'Y - Y'P_1Y}\) | N-1 |
where \(\mathbf{P_1} = \frac{1}{n}\mathbf{J}\)
The \(F\)-statistic here has \((a-1,N-a)\) degrees of freedom, which gives the same value for all three parameterization, but the hypothesis test is written a bit different:
\[ \begin{aligned} &H_0 : \mu_1 = \mu_2 = ... = \mu_a \\ &H_0 : \mu + \tau_1 = \mu + \tau_2 = ... = \mu + \tau_a \\ &H_0 : \tau_1 = \tau_2 = ...= \tau_a \end{aligned} \]
The \(F\)-test here serves as a preliminary analysis, to see if there is any difference at different factors. For more in-depth analysis, we consider different testing of treatment effects.
21.1.1.3 Testing of Treatment Effects
- A Single Treatment Mean \(\mu_i\)
- A Differences Between Treatment Means
- A Contrast Among Treatment Means
- A Linear Combination of Treatment Means
21.1.1.3.1 Single Treatment Mean
We have \(\hat{\mu_i}=\bar{Y_{i.}}\) where
- \(E(\bar{Y_{i.}})=\mu_i\)
- \(var(\bar{Y_{i}})=\sigma^2/n_i\) estimated by \(s^2(\bar{Y_{i.}})=MSE / n_i\)
Since \(\frac{\bar{Y_{i.}}-\mu_i}{s(\bar{Y_{i.}})} \sim t_{N-a}\) and the confidence interval for \(\mu_i\) is \(\bar{Y_{i.}} \pm t_{1-\alpha/2;N-a}s(\bar{Y_{i.}})\),
then we can do a t-test for the means difference with some constant \(c\)
\[ \begin{aligned} &H_0: \mu_i = c \\ &H_1: \mu_i \neq c \end{aligned} \]
where
\[ T =\frac{\bar{Y_{i.}}-c}{s(\bar{Y_{i.}})} \]
follows \(t_{N-a}\) when \(H_0\) is true.
If \(|T| > t_{1-\alpha/2;N-a}\), we can reject \(H_0\)
21.1.1.3.2 Differences Between Treatment Means
Let \(D=\mu_i - \mu_i'\), also known as pairwise comparison
\(D\) can be estimated by \(\hat{D}=\bar{Y_{i}}-\bar{Y_{i}}'\) is unbiased (\(E(\hat{D})=\mu_i-\mu_i'\))
Since \(\bar{Y_{i}}\) and \(\bar{Y_{i}}'\) are independent, then
\[ var(\hat{D})=var(\bar{Y_{i}}) + var(\bar{Y_{i'}}) = \sigma^2(1/n_i + 1/n_i') \]
can be estimated with
\[ s^2(\hat{D}) = MSE(1/n_i + 1/n_i') \]
With the single treatment inference,
\[ \frac{\hat{D}-D}{s(\hat{D})} \sim t_{N-a} \]
hence,
\[ \hat{D} \pm t_{(1-\alpha/2;N-a)}s(\hat{D}) \]
Hypothesis tests:
\[ \begin{aligned} &H_0: \mu_i = \mu_i' \\ &H_a: \mu_i \neq \mu_i' \end{aligned} \]
can be tested by the following statistic
\[ T = \frac{\hat{D}}{s(\hat{D})} \sim t_{1-\alpha/2;N-a} \]
reject \(H_0\) if \(|T| > t_{1-\alpha/2;N-a}\)
21.1.1.3.3 Contrast Among Treatment Means
generalize the comparison of two means, we have contrasts
A contrast is a linear combination of treatment means:
\[ L = \sum_{i=1}^{a}c_i \mu_i \]
where each \(c_i\) is non-random constant and sum to 0:
\[ \sum_{i=1}^{a} c_i = 0 \]
An unbiased estimator of a contrast L is
\[ \hat{L} = \sum_{i=1}^{a}c_i \bar{Y}_{i.} \]
and \(E(\hat{L}) = L\). Since the \(\bar{Y}_{i.}\), i = 1,…, a are independent.
\[ \begin{aligned} var(\hat{L}) &= var(\sum_{i=1}^a c_i \bar{Y}_{i.}) = \sum_{i=1}^a var(c_i \bar{Y}_i) \\ &= \sum_{i=1}^a c_i^2 var(\bar{Y}_i) = \sum_{i=1}^a c_i^2 \sigma^2 /n_i \\ &= \sigma^2 \sum_{i=1}^{a} c_i^2 /n_i \end{aligned} \]
Estimation of the variance:
\[ s^2(\hat{L}) = MSE \sum_{i=1}^{a} \frac{c_i^2}{n_i} \]
\(\hat{L}\) is normally distributed (since it is a linear combination of independent normal random variables).
Then, since \(SSE/\sigma^2\) is \(\chi_{N-a}^2\)
\[ \frac{\hat{L}-L}{s(\hat{L})} \sim t_{N-a} \]
A \(1-\alpha\) confidence limits are given by
\[ \hat{L} \pm t_{1-\alpha/2; N-a}s(\hat{L}) \]
Hypothesis testing
\[ \begin{aligned} &H_0: L = 0 \\ &H_a: L \neq 0 \end{aligned} \]
with
\[ T = \frac{\hat{L}}{s(\hat{L})} \]
reject \(H_0\) if \(|T| > t_{1-\alpha/2;N-a}\)
21.1.1.3.4 Linear Combination of Treatment Means
just like contrast \(L = \sum_{i=1}^a c_i \mu_i\) but no restrictions on the \(c_i\) coefficients.
Tests on a single treatment mean, two treatment means, and contrasts can all be considered form the same perspective.
\[ \begin{aligned} &H_0: \sum c_i \mu_i = c \\ &H_a: \sum c_i \mu_i \neq c \end{aligned} \]
The test statistics ( \(t\)-stat) can be considered equivalently as \(F\)-tests; \(F = (T)^2\) where \(F \sim F_{1,N-a}\). Since the numerator degrees of freedom is always 1 in these cases, we refer to them as single-degree-of-freedom tests.
Multiple Contrasts
To test simultaneously \(k \ge 2\) contrasts, let \(T_1,...,T_k\) be the t-stat. The joint distribution of these random variables is a multivariate t-distribution (the tests are dependent since they re based on the same data).
Limitations for comparing multiple contrasts:
The confidence coefficient \(1-\alpha\) only applies to a particular estimate, not a series of estimates; similarly, the Type I error rate, \(\alpha\), applies to a particular test, not a series of tests. Example: 3 \(t\)-tests at \(\alpha = 0.05\), if tests are independent (which they are not), \(0.95^3 = 0.857\) (thus \(\alpha - 0.143\) not 0.05)
The confidence coefficient \(1-\alpha\) and significance level \(\alpha\) are appropriate only if the test was not suggest by the data.
- often, the results of an experiment suggest important (i.e.,..g, potential significant) relationships.
- the process of studying effects suggests by the data is called data snooping
Multiple Comparison Procedures:
21.1.1.3.4.1 Tukey
All pairwise comparisons of factor level means. All pairs \(D = \mu_i - \mu_i'\) or all tests of the form:
\[ \begin{aligned} &H_0: \mu_i -\mu_i' = 0 \\ &H_a: \mu_i - \mu_i' \neq 0 \end{aligned} \]
- When all sample sizes are equal (\(n_1 = n_2 = ... = n_a\)) then the Tukey method family confidence coefficient is exactly \(1-\alpha\) and the significance level is exactly \(\alpha\)
- When the sample sizes are not equal, the family confidence coefficient is greater than \(1-\alpha\) (i.e., the significance level is less than \(\alpha\)) so the test conservative
- Tukey considers the studentized range distribution. If we have \(Y_1,..,Y_r\), observations from a normal distribution with mean \(\alpha\) and variance \(\sigma^2\). Define: \[ w = max(Y_i) - min(Y_i) \] as the range of the observations. Let \(s^2\) be an estimate of \(\sigma^2\) with v degrees of freedom. Then, \[ q(r,v) = \frac{w}{s} \] is called the studentized range. The distribution of q uses a special table.
Notes
- when we are not interested in testing all pairwise comparison,s the confidence coefficient for the family of comparisons under consideration will be greater than \(1-\alpha\) (with the significance level less than \(\alpha\))
- Tukey can be used for “data snooping” as long as the effects to be studied on the basis of preliminary data analysis are pairwise comparisons.
21.1.1.3.4.2 Scheffe
This method applies when the family of interest is the set of possible contrasts among the treatment means:
\[ L = \sum_{i=1}^a c_i \mu_i \]
where \(\sum_{i=1}^a c_i =0\)
That is, the family of all possible contrasts \(L\) or
\[ \begin{aligned} &H_0: L = 0 \\ &H_a: L \neq 0 \end{aligned} \]
The family confidence level for the Scheffe procedure is exactly \(1-\alpha\) (i.e., significance level = \(\alpha\)) whether the sample sizes are equal or not.
For simultaneous confidence intervals,
\[ \hat{L} \pm Ss(\hat{L}) \]
where \(\hat{L}=\sum c_i \bar{Y}_{i.},s^2(\hat{L}) = MSE \sum c_i^2/n_i\) and \(S^2 = (a-1)f_{1-\alpha;a-1,N-a}\)
The Scheffe procedure considers
\[ F = \frac{\hat{L}^2}{(a-1)s^2(\hat{L})} \]
where we reject \(H_0\) at the family significance level \(\alpha\) if \(F > f_{(1-\alpha;a-1,N-a)}\)
Note
- Since applications of the Scheffe never involve all conceivable contrasts, the finite family confidence coefficient will be larger than \(1-\alpha\), so \(1-\alpha\) is a lower bound. Thus, people often consider a larger \(\alpha\) (e.g., 90% confidence interval)
- Scheffe can be used for “data scooping” since the family of statements contains all possible contrasts.
- If only pairwise comparisons are to be considered, The Tukey procedure gives narrower confidence limits.
21.1.1.3.4.3 Bonferroni
Applicable whether the sample sizes are equal or unequal.
For the confidence intervals,
\[ \hat{L} \pm B s(\hat{L}) \]
where \(B= t_{(1-\alpha/(2g);N-a)}\) and g is the number of comparisons in the family.
Hypothesis testing
\[ \begin{aligned} &H_0: L = 0 \\ &H_a: L \neq 0 \end{aligned} \]
Let \(T= \frac{\hat{L}}{s(\hat{L})}\) and reject \(H_0\) if \(|T|>t_{1-\alpha/(2g),N-a}\)
Notes
- If all pairwise comparisons are of interest, the Tukey procedure is superior (narrower confidence intervals). If not, Bonferroni may be better.
- Bonferroni is better than Scheffe when the number of contrasts is about the same as the treatment levels (or less).
- Recommendation: compute all threes and pick the smallest.
- Bonferroni can’t be used for data snooping
21.1.2 Single Factor Random Effects Model
Also known as ANOVA Type II models.
Treatments are chosen at from from larger population. We extend inference to all treatments in the population and not restrict our inference to those treatments that happened to be selected for the study.
21.1.2.1 Random Cell Means
\[ Y_{ij} = \mu_i + \epsilon_{ij} \]
where
- \(\mu_i \sim N(\mu, \sigma^2_{\mu})\) and independent
- \(\epsilon_{ij} \sim N(0,\sigma^2)\) and independent
\(\mu_i\) and \(\epsilon_{ij}\) are mutually independent for \(i =1,...,a; j = 1,...,n\)
With all treatment sample sizes are equal
\[ \begin{aligned} E(Y_{ij}) &= E(\mu_i) = \mu \\ var(Y_{ij}) &= var(\mu_i) + var(\epsilon_i) = \sigma^2_{\mu} + \sigma^2 \end{aligned} \]
Since \(Y_{ij}\) are not independent
\[ \begin{aligned} cov(Y_{ij},Y_{ij'}) &= E(Y_{ij}Y_{ij'}) - E(Y_{ij})E(Y_{ij'}) \\ &= E(\mu_i^2 + \mu_i \epsilon_{ij'} + \mu_i \epsilon_{ij} + \epsilon_{ij}\epsilon_{ij'}) - \mu^2 \\ &= \sigma^2_{\mu} + \mu^2 - \mu^2 & \text{if} j \neq j' \\ &= \sigma^2_{\mu} & \text{if} j \neq j' \end{aligned} \]
\[ \begin{aligned} cov(Y_{ij},Y_{i'j'}) &= E(\mu_i \mu_{i'} + \mu_i \epsilon_{i'j'}+ \mu_{i'}\epsilon_{ij}+ \epsilon_{ij}\epsilon_{i'j'}) - \mu^2 \\ &= \mu^2 - \mu^2 & \text{if } i \neq i' \\ &= 0 \\ \end{aligned} \]
Hence,
- all observations have the same variance
- any two observations from the same treatment have covariance \(\sigma^2_{\mu}\)
- The correlation between any two responses from the same treatment:
\[ \begin{aligned} \rho(Y_{ij},Y_{ij'}) &= \frac{\sigma^2_{\mu}}{\sigma^2_{\mu}+ \sigma^2} && \text{$j \neq j'$} \end{aligned} \]
Inference
Intraclass Correlation Coefficient
\[ \frac{\sigma^2_{\mu}}{\sigma^2 + \sigma^2_{\mu}} \]
which measures the proportion of total variability of \(Y_{ij}\) accounted for by the variance of \(\mu_i\)
\[ \begin{aligned} &H_0: \sigma_{\mu}^2 = 0 \\ &H_a: \sigma_{\mu}^2 \neq 0 \end{aligned} \]
\(H_0\) implies \(\mu_i = \mu\) for all i, which can be tested by the F-test in ANOVA.
The understandings of the Single Factor Fixed Effects Model and the Single Factor Random Effects Model are different, the ANOVA is same for the one factor model. The difference is in the expected mean squares
Random Effects Model | Fixed Effects Model |
---|---|
\(E(MSE) = \sigma^2\) | \(E(MSE) = \sigma^2\) |
\(E(M STR) = \sigma^2 - n \sigma^2_\mu\) | \(E(MSTR) = \sigma^2 + \frac{ \sum_i n_i (\mu_i - \mu)^2}{a-1}\) |
If \(\sigma^2_\mu\), then MSE and MSTR have the same expectation (\(\sigma^2\)). Otherwise, \(E(MSTR) >E(MSE)\). Large values of the statistic
\[ F = \frac{MSTR}{MSE} \]
suggest we reject \(H_0\).
Since \(F \sim F_{(a-1,a(n-1))}\) when \(H_0\) holds. If \(F > f_{(1-\alpha;a-1,a(n-1))}\) we reject \(H_0\).
If sample sizes are not equal, \(F\)-test can still be used, but the df are \(a-1\) and \(N-a\).
21.1.2.1.1 Estimation of \(\mu\)
An unbiased estimator of \(E(Y_{ij})=\mu\) is the grand mean: \(\hat{\mu} = \hat{Y}_{..}\)
The variance of this estimator is
\[ \begin{aligned} var(\bar{Y}_{..}) &= var(\sum_i \bar{Y}_{i.}/a) \\ &= \frac{1}{a^2}\sum_ivar(\bar{Y}_{i.}) \\ &= \frac{1}{a^2}\sum_i(\sigma^2_\mu+\sigma^2/n) \\ &= \frac{1}{a^2}(\sigma^2_{\mu}+\sigma^2/n) \\ &= \frac{n\sigma^2_{\mu}+ \sigma^2}{an} \end{aligned} \]
An unbiased estimator of this variance is \(s^2(\bar{Y})=\frac{MSTR}{an}\). Thus \(\frac{\bar{Y}_{..}-\mu}{s(\bar{Y}_{..})} \sim t_{a-1}\)
A \(1-\alpha\) confidence interval is \(\bar{Y}_{..} \pm t_{(1-\alpha/2;a-1)}s(\bar{Y}_{..})\)
21.1.2.1.2 Estimation of \(\sigma^2_\mu/(\sigma^2_{\mu}+\sigma^2)\)
In the random and fixed effects model, MSTR and MSE are independent. When the sample sizes are equal (\(n_i = n\) for all i),
\[ \frac{\frac{MSTR}{n\sigma^2_\mu+ \sigma^2}}{\frac{MSE}{\sigma^2}} \sim f_{(a-1,a(n-1))} \]
\[ P(f_{(\alpha/2;a-1,a(n-1))}\le \frac{\frac{MSTR}{n\sigma^2_\mu+ \sigma^2}}{\frac{MSE}{\sigma^2}} \le f_{(1-\alpha/2;a-1,a(n-1))}) = 1-\alpha \]
\[ \begin{aligned} L &= \frac{1}{n}(\frac{MSTR}{MSE}(\frac{1}{f_{(1-\alpha/2;a-1,a(n-1))}})-1) \\ U &= \frac{1}{n}(\frac{MSTR}{MSE}(\frac{1}{f_{(\alpha/2;a-1,a(n-1))}})-1) \end{aligned} \]
The lower and upper \((L^*,U^*)\) confidence limits for \(\frac{\sigma^2_\mu}{\sigma^2_\mu + \sigma^2}\)
\[ \begin{aligned} L^* &= \frac{L}{1+L} \\ U^* &= \frac{U}{1+U} \end{aligned} \]
If the lower limit for \(\frac{\sigma^2_\mu}{\sigma^2}\) is negative, it is customary to set \(L = 0\).
21.1.2.1.3 Estimation of \(\sigma^2\)
\(a(n-1)MSE/\sigma^2 \sim \chi^2_{a(n-1)}\), the \((1-\alpha)\) confidence interval for \(\sigma^2\):
\[ \frac{a(n-1)MSE}{\chi^2_{1-\alpha/2;a(n-1)}} \le \sigma^2 \le \frac{a(n-1)MSE}{\chi^2_{\alpha/2;a(n-1)}} \]
can also be used in case sample sizes are not equal - then df is N-a.
21.1.2.1.4 Estimation of \(\sigma^2_\mu\)
\(E(MSE) = \sigma^2\) \(E(MSTR) = \sigma^2 + n\sigma^2_\mu\). Hence,
\[ \sigma^2_{\mu} = \frac{E(MSTR)- E(MSE)}{n} \]
An unbiased estimator of \(\sigma^2_\mu\) is given by
\[ s^2_\mu =\frac{MSTR-MSE}{n} \]
if \(s^2_\mu < 0\), set \(s^2_\mu = 0\)
If sample sizes are not equal,
\[ s^2_\mu = \frac{MSTR - MSE}{n'} \]
where \(n' = \frac{1}{a-1}(\sum_i n_i- \frac{\sum_i n^2_i}{\sum_i n_i})\)
no exact confidence intervals for \(\sigma^2_\mu\), but we can approximate intervals.
Satterthewaite Procedure can be used to construct approximate confidence intervals for linear combination of expected mean squares
A linear combination:
\[ \sigma^2_\mu = \frac{1}{n} E(MSTR) + (-\frac{1}{n}) E(MSE) \]
\[ S = d_1 E(MS_1) + ..+ d_h E(MS_h) \]
where \(d_i\) are coefficients.
An unbiased estimator of S is
\[ \hat{S} = d_1 MS_1 + ...+ d_h MS_h \]
Let \(df_i\) be the degrees of freedom associated with the mean square \(MS_i\). The Satterthwaite approximation:
\[ \frac{(df)\hat{S}}{S} \sim \chi^2_{df} \]
where
\[ df = \frac{(d_1MS_1+...+d_hMS_h)^2}{(d_1MS_1)^2/df_1 + ...+ (d_hMS_h)^2/df_h} \]
An approximate \(1-\alpha\) confidence interval for S:
\[ \frac{(df)\hat{S}}{\chi^2_{1-\alpha/2;df}} \le S \le \frac{(df)\hat{S}}{\chi^2_{\alpha/2;df}} \]
For the single factor random effects model
\[ \frac{(df)s^2_\mu}{\chi^2_{1-\alpha/2;df}} \le \sigma^2_\mu \le \frac{(df)s^2_\mu}{\chi^2_{\alpha/2;df}} \]
where
\[ df = \frac{(sn^2_\mu)^2}{\frac{(MSTR)^2}{a-1}+ \frac{(MSE)^2}{a(n-1)}} \]
21.1.2.2 Random Treatment Effects Model
\[ \tau_i = \mu_i - E(\mu_i) = \mu_i - \mu \]
we have \(\mu_i = \mu + \tau_i\) and
\[ Y_{ij} = \mu + \tau_i + \epsilon_{ij} \]
where
- \(\mu\) = constant, common to all observations
- \(\tau_i \sim N(0,\sigma^2_\tau)\) independent (random variables)
- \(\epsilon_{ij} \sim N(0,\sigma^2)\) independent.
- \(\tau_{i}, \epsilon_{ij}\) are independent (i=1,…,a; j =1,..,n)
- our model is concerned with only balanced single factor ANOVA.
Diagnostics Measures
- Non-constant error variance (plots, Levene test, Hartley test).
- Non-independence of errors (plots, Durban-Watson test).
- Outliers (plots, regression methods).
- Non-normality of error terms (plots, Shapiro-Wilk, Anderson-Darling).
- Omitted Variable Bias (plots)
Remedial
- Weighted Least Squares
- Transformations
- Non-parametric Procedures.
Note
Fixed effect ANOVA is relatively robust to
- non-normality
- unequal variances when sample sizes are approximately equal; at least the F-test and multiple comparisons. However, single comparisons of treatment means are sensitive to unequal variances.
Lack of independence can seriously affect both fixed and random effect ANVOA.
21.1.3 Two Factor Fixed Effect ANOVA
The multi-factor experiment is
- more efficient
- provides more info
- gives more validity to the findings.
21.1.3.1 Balanced
Assumption:
- All treatment sample sizes are equal
- All treatment means are of equal importance
Assume:
- Factor \(A\) has
a
levels and Factor \(B\) hasb
levels. All \(a \times b\) factor levels are considered. - The number of treatments for each level is n. \(N = abn\) observations in the study.
21.1.3.1.1 Cell Means Model
\[ Y_{ijk} = \mu_{ij} + \epsilon_{ijk} \]
where
- \(\mu_{ij}\) are fixed parameters (cell means)
- \(i = 1,...,a\) = the levels of Factor A
- \(j = 1,...,b\) = the levels of Factor B.
- \(\epsilon_{ijk} \sim \text{indep } N(0,\sigma^2)\) for \(i = 1,...,a\), \(j = 1,..,b\) and \(k = 1,..,n\)
And
\[ \begin{aligned} E(Y_{ijk}) &= \mu_{ij} \\ var(Y_{ijk}) &= var(\epsilon_{ijk}) = \sigma^2 \end{aligned} \]
Hence,
\[ Y_{ijk} \sim \text{indep } N(\mu_{ij},\sigma^2) \]
And the model is
\[ \mathbf{Y} = \mathbf{X} \beta + \epsilon \]
Thus,
\[ \begin{aligned} E(\mathbf{Y}) &= \mathbf{X}\beta \\ var(\mathbf{Y}) &= \sigma^2 \mathbf{I} \end{aligned} \]
Interaction
\[ (\alpha \beta)_{ij} = \mu_{ij} - (\mu_{..}+ \alpha_i + \beta_j) \]
where
- \(\mu_{..} = \sum_i \sum_j \mu_{ij}/ab\) is the grand mean
- \(\alpha_i = \mu_{i.}-\mu_{..}\) is the main effect for factor \(A\) at the \(i\)-th level
- \(\beta_j = \mu_{.j} - \mu_{..}\) is the main effect for factor \(B\) at the \(j\)-th level
- \((\alpha \beta)_{ij}\) is the interaction effect when factor \(A\) is at the \(i\)-th level and factor \(B\) is at the \(j\)-th level.
- \((\alpha \beta)_{ij} = \mu_{ij} - \mu_{i.}-\mu_{.j}+ \mu_{..}\)
Examine interactions:
- Examine whether all \(\mu_{ij}\) can be expressed as the sums \(\mu_{..} + \alpha_i + \beta_j\)
- Examine whether the difference between the mean responses for any two levels of factor \(B\) is the same for all levels of factor \(A\).
- Examine whether the difference between the mean response for any two levels of factor \(A\) is the same for all levels of factor \(B\)
- Examine whether the treatment mean curves for the different factor levels in a treatment plot are parallel.
For \(j = 1,...,b\)
\[ \begin{aligned} \sum_i(\alpha \beta)_{ij} &= \sum_i (\mu_{ij} - \mu_{..} - \alpha_i - \beta_j) \\ &= \sum_i \mu_{ij} - a \mu_{..} - \sum_i \alpha_i - a \beta_j \\ &= a \mu_{.j} - a \mu_{..}- \sum_i (\mu_{i.} - \mu_{..}) - a(\mu_{.j}-\mu_{..}) \\ &= a \mu_{.j} - a \mu_{..} - a \mu_{..}+ a \mu_{..} - a (\mu_{.j} - \mu_{..}) \\ &= 0 \end{aligned} \]
Similarly, \(\sum_j (\alpha \beta) = 0, i = 1,...,a\) and \(\sum_i \sum_j (\alpha \beta)_{ij} =0\), \(\sum_i \alpha_i = 0\), \(\sum_j \beta_j = 0\)
21.1.3.1.2 Factor Effects Model
\[ \begin{aligned} \mu_{ij} &= \mu_{..} + \alpha_i + \beta_j + (\alpha \beta)_{ij} \\ Y_{ijk} &= \mu_{..} + \alpha_i + \beta_j + (\alpha \beta)_{ij} + \epsilon_{ijk} \end{aligned} \]
where
- \(\mu_{..}\) is a constant
- \(\alpha_i\) are constants subject to the restriction \(\sum_i \alpha_i=0\)
- \(\beta_j\) are constants subject to the restriction \(\sum_j \beta_j = 0\)
- \((\alpha \beta)_{ij}\) are constants subject to the restriction \(\sum_i(\alpha \beta)_{ij} = 0\) for \(j=1,...,b\) and \(\sum_j(\alpha \beta)_{ij} = 0\) for \(i = 1,...,a\)
- \(\epsilon_{ijk} \sim \text{indep } N(0,\sigma^2)\) for \(k = 1,..,n\)
We have
\[ \begin{aligned} E(Y_{ijk}) &= \mu_{..} + \alpha_i + \beta_j + (\alpha \beta)_{ij}\\ var(Y_{ijk}) &= \sigma^2 \\ Y_{ijk} &\sim N (\mu_{..} + \alpha_i + \beta_j + (\alpha \beta)_{ij}, \sigma^2) \end{aligned} \]
We have \(1+a+b+ab\) parameters. But there are \(ab\) parameters in the Cell Means Model. In the Factor Effects Model, the restrictions limit the number of parameters that can be estimated:
\[ \begin{aligned} 1 &\text{ for } \mu_{..} \\ (a-1) &\text{ for } \alpha_i \\ (b-1) &\text{ for } \beta_j \\ (a-1)(b-1) &\text{ for } (\alpha \beta)_{ij} \end{aligned} \]
Hence, there are
\[ 1 + a - 1 + b - 1 + ab - a- b + 1 = ab \]
parameters in the model.
We can have several restrictions when considering the model in the form \(\mathbf{Y} = \mathbf{X} \beta + \epsilon\)
One way:
\[ \begin{aligned} \alpha_a &= \alpha_1 - \alpha_2 - ... - \alpha_{a-1} \\ \beta_b &= -\beta_1 - \beta_2 - ... - \beta_{b-1} \\ (\alpha \beta)_{ib} &= -(\alpha \beta)_{i1} -(\alpha \beta)_{i2} -...-(\alpha \beta)_{i,b-1} ; i = 1,..,a \\ (\alpha \beta)_{aj}& = -(\alpha \beta)_{1j}-(\alpha \beta)_{2j} - ... -(\alpha \beta)_{a-1,j}; j = 1,..,b \end{aligned} \]
We can fit the model by least squares or maximum likelihood
Cell Means Model
minimize
\[ Q = \sum_i \sum_j \sum_k (Y_{ijk}-\mu_{ij})^2 \]
estimators
\[ \begin{aligned} \hat{\mu}_{ij} &= \bar{Y}_{ij} \\ \hat{Y}_{ijk} &= \bar{Y}_{ij} \\ e_{ijk} = Y_{ijk} - \hat{Y}_{ijk} &= Y_{ijk} - \bar{Y}_{ij} \end{aligned} \]
Factor Effects Model
\[ Q = \sum_i \sum_j \sum_k (Y_{ijk} - \mu_{..}-\alpha_i = \beta_j - (\alpha \beta)_{ij})^2 \]
subject to the restrictions
\[ \begin{aligned} \sum_i \alpha_i &= 0 \\ \sum_j \beta_j &= 0 \\ \sum_i (\alpha \beta)_{ij} &= 0 \\ \sum_j (\alpha \beta)_{ij} &= 0 \end{aligned} \]
estimators
\[ \begin{aligned} \hat{\mu}_{..} &= \bar{Y}_{...} \\ \hat{\alpha}_i &= \bar{Y}_{i..} - \bar{Y}_{...} \\ \hat{\beta}_j &= \bar{Y}_{.j.}-\bar{Y}_{...} \\ (\hat{\alpha \beta})_{ij} &= \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.}+ \bar{Y}_{...} \end{aligned} \]
The fitted values
\[ \hat{Y}_{ijk} = \bar{Y}_{...}+ (\bar{Y}_{i..}- \bar{Y}_{...})+ (\bar{Y}_{.j.}- \bar{Y}_{...}) + (\bar{Y}_{ij.} - \bar{Y}_{i..}-\bar{Y}_{.j.}+\bar{Y}_{...}) = \bar{Y}_{ij.} \]
where
\[ \begin{aligned} e_{ijk} &= Y_{ijk} - \bar{Y}_{ij.} \\ e_{ijk} &\sim \text{ indep } (0,\sigma^2) \end{aligned} \]
and
\[ \begin{aligned} s^2_{\hat{\mu}..} &= \frac{MSE}{nab} \\ s^2_{\hat{\alpha}_i} &= MSE(\frac{1}{nb} - \frac{1}{nab}) \\ s^2_{\hat{\beta}_j} &= MSE(\frac{1}{na} - \frac{1}{nab}) \\ s^2_{(\hat{\alpha\beta})_{ij}} &= MSE (\frac{1}{n} - \frac{1}{na}- \frac{1}{nb} + \frac{1}{nab}) \end{aligned} \]
21.1.3.1.2.1 Partitioning the Total Sum of Squares
\[ Y_{ijk} - \bar{Y}_{...} = \bar{Y}_{ij.} - \bar{Y}_{...} + Y_{ijk} - \bar{Y}_{ij.} \]
\(Y_{ijk} - \bar{Y}_{...}\): Total deviation
\(\bar{Y}_{ij.} - \bar{Y}_{...}\): Deviation of treatment mean from overall mean
\(Y_{ijk} - \bar{Y}_{ij.}\): Deviation of observation around treatment mean (residual).
\[ \begin{aligned} \sum_i \sum_j \sum_k (Y_{ijk} - \bar{Y}_{...})^2 &= n \sum_i \sum_j (\bar{Y}_{ij.}- \bar{Y}_{...})^2+ \sum_i \sum_j sum_k (Y_{ijk} - \bar{ij.})^2 \\ SSTO &= SSTR + SSE \end{aligned} \]
(cross product terms are 0)
\[ \bar{Y}_{ij.}- \bar{Y}_{...} = \bar{Y}_{i..}-\bar{Y}_{...} + \bar{Y}_{.j.}-\bar{Y}_{...} + \bar{Y}_{ij.} - \bar{Y}_{i..} - \bar{Y}_{.j.} + \bar{Y}_{...} \]
squaring and summing:
\[ \begin{aligned} n\sum_i \sum_j (\bar{Y}_{ij.}-\bar{Y}_{...})^2 &= nb\sum_i (\bar{Y}_{i..}-\bar{Y}_{...})^2 + na \sum_j (\bar{Y}_{.j.}-\bar{Y}_{...})^2 \\ &+ n \sum_i \sum_j (\bar{Y}_{ij.}-\bar{Y}_{i..}- \bar{Y}_{.j.}+ \bar{Y}_{...})^2 \\ SSTR &= SSA + SSB + SSAB \end{aligned} \]
The interaction term from
\[ \begin{aligned} SSAB &= SSTO - SSE - SSA - SSB \\ SSAB &= SSTR - SSA - SSB \end{aligned} \]
where
- \(SSA\) is the factor \(A\) sum of squares (measures the variability of the estimated factor \(A\) level means \(\bar{Y}_{i..}\))- the more variable, the larger \(SSA\)
- \(SSB\) is the factor \(B\) sum of squares
- \(SSAB\) is the interaction sum of squares, measuring the variability of the estimated interactions.
21.1.3.1.2.2 Partitioning the df
\(N = abn\) cases and \(ab\) treatments.
For one-way ANOVA and regression, the partition has df:
\[ SS: SSTO = SSTR + SSE \]
\[ df: N-1 = (ab-1) + (N-ab) \]
we must further partition the \(ab-1\) df with SSTR
\[ SSTR = SSA + SSB + SSAB \]
\[ ab-1 = (a-1) + (b-1) + (a-1)(b-1) \]
- \(df_{SSA} = a-1\): a treatment deviations but 1 df is lost due to the restriction \(\sum (\bar{Y}_{i..}- \bar{Y}_{...})=0\)
- \(df_{SSB} = b-1\): b treatment deviations but 1 df is lost due to the restriction \(\sum (\bar{Y}_{.j.}- \bar{Y}_{...})=0\)
- \(df_{SSAB} = (a-1)(b-1)= (ab-1)-(a-1)-(b-1)\): ab interactions, there are (a+b-1) restrictions, so df = ab-a-(b-1)= (a-1)(b-1)
21.1.3.1.2.3 Mean Squares
\[ \begin{aligned} MSA &= \frac{SSA}{a-1}\\ MSB &= \frac{SSB}{b-1}\\ MSAB &= \frac{SSAB}{(a-1)(b-1)} \end{aligned} \]
The expected mean squares are
\[ \begin{aligned} E(MSE) &= \sigma^2 \\ E(MSA) &= \sigma^2 + nb \frac{\sum \alpha_i^2}{a-1} = \sigma^2 + nb \frac{\sum(\sum_{i.}-\mu_{..})^2}{a-1} \\ E(MSB) &= \sigma^2 + na \frac{\sum \beta_i^2}{b-1} = \sigma^2 + na \frac{\sum(\sum_{.j}-\mu_{..})^2}{b-1} \\ E(MSAB) &= \sigma^2 + n \frac{\sum \sum (\alpha \beta)_{ij}^2}{(a-1)(b-1)} = \sigma^2 + n \frac{\sum (\mu_{ij}- \mu_{i.}- \mu_{.j}+ \mu_{..} )^2}{(a-1)(b-1)} \end{aligned} \]
If there are no factor A main effects (all \(\mu_{i.} = 0\) or \(\alpha_i = 0\)) the MSA and MSE have the same expectation; otherwise MSA > MSE. Same for factor B, and interaction effects. which case we can examine F-statistics.
Interaction
\[ \begin{aligned} H_0: \mu_{ij}- \mu_{i.} - \mu_{.j} + \mu_{..} = 0 && \text{for all i,j} \\ H_a: \mu_{ij}- \mu_{i.} - \mu_{.j} + \mu_{..} \neq 0 && \text{for some i,j} \end{aligned} \]
or
\[ \begin{aligned} &H_0: \text{All}(\alpha \beta)_{ij} = 0 \\ &H_a: \text{Not all} (\alpha \beta) = 0 \end{aligned} \]
Let \(F = \frac{MSAB}{MSE}\). When \(H_0\) is true \(F \sim f_{((a-1)(b-1),ab(n-1))}\). So reject \(H_0\) when \(F > f_{((a-1)(b-1),ab(n-1))}\)
Factor A main effects:
\[ \begin{aligned} &H_0: \mu_{1.} = \mu_{2.} = ... = \mu_{a.} \\ &H_a: \text{Not all $\mu_{i.}$ are equal} \end{aligned} \]
or
\[ \begin{aligned} &H_0: \alpha_1 = ... = \alpha_a = 0 \\ &H_a: \text{Not all $\alpha_i$ are equal to 0} \end{aligned} \]
\(F= \frac{MSA}{MSE}\) and reject \(H_0\) if \(F>f_{(1-\alpha;a-1,ab(n-1))}\)
21.1.3.1.2.4 Two-way ANOVA
Source of Variation | SS | df | MS | F |
---|---|---|---|---|
Factor A | \(SSA\) | \(a-1\) | \(MSA = SSA/(a-1)\) | \(MSA/MSE\) |
Factor B | \(SSB\) | \(b-1\) | \(MSB = SSB/(b-1)\) | \(MSB/MSE\) |
AB interactions | \(SSAB\) | \((a-1)(b-1)\) | \(MSAB = SSAB /MSE\) | |
Error | \(SSE\) | \(ab(n-1)\) | \(MSE = SSE/ab(n-1)\) | |
Total (corrected) | \(SSTO\) | \(abn - 1\) |
Doing 2-way ANOVA means you always check interaction first, because if there are significant interactions, checking the significance of the main effects becomes moot.
The main effects concern the mean responses for levels of one factor averaged over the levels of the other factor. When interaction is present, we can’t conclude that a given factor has no effect, even if these averages are the same. It means that the effect of the factor depends on the level of the other factor.
On the other hand, if you can establish that there is no interaction, then you can consider inference on the factor main effects, which are then said to be additive.
And we can also compare factor means like the Single Factor Fixed Effects Model using Tukey, Scheffe, Bonferroni.
We can also consider contrasts in the 2-way model
\[ L = \sum c_i \mu_i \]
where \(\sum c_i =0\)
which is estimated by
\[ \hat{L} = \sum c_i \bar{Y}_{i..} \]
with variance
\[ \sigma^2(\hat{L}) = \frac{\sigma^2}{bn} \sum c_i^2 \]
and variance estimate
\[ \frac{MSE}{bn} \sum c_i^2 \]
Orthogonal Contrasts
\[ \begin{aligned} L_1 &= \sum c_i \mu_i, \sum c_i = 0 \\ L_2 &= \sum d_i \mu_i , \sum d_i = 0 \end{aligned} \]
these contrasts are said to be orthogonal if
\[ \sum \frac{c_i d_i}{n_i} = 0 \]
in balanced case \(\sum c_i d_i =0\)
\[ \begin{aligned} cov(\hat{L}_1, \hat{L}_2) &= cov(\sum_i c_i \bar{Y}_{i..}, \sum_l d_l \bar{Y}_{l..}) \\ &= \sum_i \sum_l c_i d_l cov(\bar{Y}_{i..},\bar{Y}_{l..}) \\ &= \sum_i c_i d_i \frac{\sigma^2}{bn} = 0 \end{aligned} \]
Orthogonal contrasts can be used to further partition the model sum of squares. There are many sets of orthogonal contrasts and thus, many ways to partition the sum of squares.
A special set of orthogonal contrasts that are used when the levels of a factor can be assigned values on a metric scale are called orthogonal polynomials
Coefficients can be found for the special case of
- equal spaced levels (e.g., (0 15 30 45 60))
- equal sample sizes (\(n_1 = n_2 = ... = n_{ab}\))
We can define the SS for a given contrast:
\[ SS_L = \frac{\hat{L}^2}{\sum_{i=1}^a (c^2_i/bn_i)} \]
\[ T = \frac{\hat{L}}{\sqrt{MSE\sum_{i=1}^a(c_i^2/bn_i)}} \sim t \]
Moreover,
\[ t^2_{(1-\alpha/2;df)}=F_{(1-\alpha;1,df)} \]
So,
\[ \frac{SS_L}{MSE} \sim F_{(1-\alpha;1,df_{MSE})} \]
all contrasts have d.f = 1
21.1.3.2 Unbalanced
We could have unequal numbers of replications for all treatment combinations:
- Observational studies
- Dropouts in designed studies
- Larger sample sizes for inexpensive treatments
- Sample sizes to match population makeup.
Assume that each factor combination has at least 1 observation (no empty cells)
Consider the same model as:
\[ Y_{ijk} = \mu_{..} + \alpha_i + \beta_j + (\alpha \beta)_{ij} + \epsilon_{ijk} \]
where sample sizes are: \(n_{ij}\):
\[ \begin{aligned} n_{i.} &= \sum_j n_{ij} \\ n_{.j} &= \sum_i n_{ij} \\ n_T &= \sum_i \sum_j n_{ij} \end{aligned} \]
Problem here is that
\[ SSTO \neq SSA + SSB + SSAB + SSE \]
(the design is non-orthogonal)
- For \(i = 1,...,a-1,\)
\[ u_i = \begin{cases} +1 & \text{if the obs is from the i-th level of Factor 1} \\ -1 & \text{if the obs is from the a-th level of Factor 1} \\ 0 & \text{otherwise} \\ \end{cases} \]
- For \(j=1,...,b-1\)
\[ v_i = \begin{cases} +1 & \text{if the obs is from the j-th level of Factor 1} \\ -1 & \text{if the obs is from the b-th level of Factor 1} \\ 0 & \text{otherwise} \\ \end{cases} \]
We can use these indicator variables as predictor variables and \(\mu_{..}, \alpha_i ,\beta_j, (\alpha \beta)_{ij}\) as unknown parameters.
\[ Y = \mu_{..} + \sum_{i=1}^{a-1} \alpha_i u_i + \sum_{j=1}^{b-1} \beta_j v_j + \sum_{i=1}^{a-1} \sum_{j=1}^{b-1}(\alpha \beta)_{ij} u_i v_j + \epsilon \]
To test hypotheses, we use the extra sum of squares idea.
For interaction effects
\[ \begin{aligned} &H_0: all (\alpha \beta)_{ij} = 0 \\ &H_a: \text{not all }(\alpha \beta)_{ij} =0 \end{aligned} \]
Or to test
\[ \begin{aligned} &H_0: \beta_1 = \beta_2 = \beta_3 = 0 \\ &H_a: \text{not all } \beta_j = 0 \end{aligned} \]
Analysis of Factor Means
(e.g., contrasts) is analogous to the balanced case, with modifications in the formulas for means and standard errors to account for unequal sample sizes.
Or , we can fit the cell means model and consider it from a regression perspective
If you have empty cells (i.e., some factor combinations have no observation), then the equivalent regression approach can’t be used. But you can still do partial analyses
21.1.4 Two-Way Random Effects ANOVA
\[ Y_{ijk} = \mu_{..} + \alpha_i + \beta_j + (\alpha \beta)_{ij} + \epsilon_{ij} \]
where
- \(\mu_{..}\): constant
- \(\alpha_i \sim N(0,\sigma^2_{\alpha}), i = 1,..,a\) (independent)
- \(\beta_j \sim N(0,\sigma^2_{\beta}), j = 1,..,b\) (independent)
- \((\alpha \beta)_{ij} \sim N(0,\sigma^2_{\alpha \beta}),i=1,...,a,j=1,..,b\) (independent)
- \(\epsilon_{ijk} \sim N(0,\sigma^2)\) (independent)
All \(\alpha_i, \beta_j, (\alpha \beta)_{ij}\) are pairwise independent
Theoretical means, variances, and covariances are
\[ \begin{aligned} E(Y_{ijk}) &= \mu_{..} \\ var(Y_{ijk}) &= \sigma^2_Y= \sigma^2_\alpha + \sigma^2_\beta + \sigma^2_{\alpha \beta} + \sigma^2 \end{aligned} \]
So
\(Y_{ijk} \sim N(\mu_{..},\sigma^2_\alpha + \sigma^2_\beta + \sigma^2_{\alpha \beta} + \sigma^2)\)
\[ \begin{aligned} cov(Y_{ijk},Y_{ij'k'}) &= \sigma^2_{\alpha}, j \neq j' \\ cov(Y_{ijk},Y_{i'jk'}) &= \sigma^2_{\beta}, i \neq i'\\ cov(Y_{ijk},Y_{ijk'}) &= \sigma^2_\alpha + \sigma^2_{\beta} + \sigma^2_{\alpha \beta}, k \neq k' \\ cov(Y_{ijk},Y_{i'j'k'}) &= , i \neq i', j \neq j' \end{aligned} \]
21.1.5 Two-Way Mixed Effects ANOVA
21.1.5.1 Balanced
One fixed factor, while other is random treatment levels, we have a mixed effects model or a mixed model
Restricted mixed model for 2-way ANOVA:
\[ Y_{ijk} = \mu_{..} + \alpha_i + \beta_j + (\alpha \beta)_{ij} + \epsilon_{ijk} \]
where
- \(\mu_{..}\): constant
- \(\alpha_i\): fixed effects with constraints subject to restriction \(\sum \alpha_i = 0\)
- \(\beta_j \sim indep N(0,\sigma^2_\beta)\)
- \((\alpha \beta)_{ij} \sim N(0,\frac{a-1}{a}\sigma^2_{\alpha \beta})\) subject to restriction \(\sum_i (\alpha \beta)_{ij} = 0\) for all j, the variance here is written as the proportion for convenience; it makes the expected mean squares simpler (other assumed \(var((\alpha \beta)_{ij}= \sigma^2_{\alpha \beta}\))
- \(cov((\alpha \beta)_{ij},(\alpha \beta)_{i'j'}) = - \frac{1}{a} \sigma^2_{\alpha \beta}, i \neq i'\)
- \(\epsilon_{ijk}\sim indepN(0,\sigma^2)\)
- \(\beta_j, (\alpha \beta)_{ij}, \epsilon_{ijk}\) are pairwise independent
Two-way mixed models are written in an “unrestricted” form, with no restrictions on the interaction effects \((\alpha \beta)_{ij}\), they are pairwise independent.
Let \(\beta^*, (\alpha \beta)^*_{ij}\) be the unrestricted random effects, and \((\bar{\alpha \beta})_{ij}^*\) the means averaged over the fixed factor for each level of random factor B.
\[ \begin{aligned} \beta_j &= \beta_j^* + (\bar{\alpha \beta})_{ij}^* \\ (\alpha \beta)_{ij} &= (\alpha \beta)_{ij}^* - (\bar{\alpha \beta})_{ij}^* \end{aligned} \]
Some consider the restricted model to be more general. but here we consider the restricted form.
\[ \begin{aligned} E(Y_{ijk}) &= \mu_{..} + \alpha_i \\ var(Y_{ijk}) &= \sigma^2_\beta + \frac{a-1}{a} \sigma^2_{\alpha \beta} + \sigma^2 \end{aligned} \]
Responses from the same random factor \((B)\) level are correlated
\[ \begin{aligned} cov(Y_{ijk},Y_{ijk'}) &= E(Y_{ijk}Y_{ijk'}) - E(Y_{ijk})E(Y_{ijk'}) \\ &= \sigma^2_\beta + \frac{a-1}{a} \sigma^2_{\alpha \beta} , k \neq k' \end{aligned} \]
Similarly,
\[ \begin{aligned} cov(Y_{ijk},Y_{i'jk'}) &= \sigma^2_\beta - \frac{1}{a} \sigma^2_{\alpha\ \beta}, i \neq i' \\ cov(Y_{ijk},Y_{i'j'k'}) &= 0, j \neq j' \end{aligned} \]
Hence, you can see that the only way you don’t have dependence in the \(Y\) is when they don’t share the same random effect.
An advantage of the restricted mixed model is that 2 observations from the same random factor b level can be positively or negatively correlated. In the unrestricted model, they can only be positively correlated.
Mean Square | Fixed ANOVA (A, B Fixed) |
Random ANOVA (A,B random) |
Mixed ANVOA (A fixed, B random) |
---|---|---|---|
MSA | a - 1 | \(\sigma ^2+ n b \frac{\sum\alpha_i^2}{a-1}\) | \(\sigma^2 + nb\sigma^ 2_ \alpha +n \sigma^ 2_{\alpha \beta}\) |
MSB | b-1 | \(\sigma^2 + n a \frac{\sum\beta ^2_j}{b-1}\) | \(\sigma^ 2 + na\sigma^2_ \beta +n \sigma^ 2_{\alpha \beta}\) |
MSAB | ( a-1)(b-1) | \(\sigma^2 + n \frac{\sum \sum(\alpha \beta )^2_ {ij}} { ( a-1)(b-1)}\) | \(\sigma^2+n \sigma^2_{\alpha \beta}\) |
MSE | (n-1)ab | \(\sigma^2\) | \(\sigma^2\) |
For fixed, random, and mixed models (balanced), the ANOVA table sums of squares calculations are identical. (also true for df and mean squares). The only difference is with the expected mean squares, thus the test statistics.
In Random ANOVA, we test
\[ \begin{aligned} &H_0: \sigma^2 = 0 \\ &H_a: \sigma^2 > 0 \end{aligned} \]
by considering \(F= \frac{MSA}{MSAB} \sim F_{a-1;(a-1)(b-1)}\)
The same test statistic is used for mixed models, but in that case we are testing null hypothesis that all of the \(\alpha_i = 0\)
The test statistic different for the same null hypothesis under the fixed effects model.
Test for effects of | Fixed ANOVA (A&B fixed) |
Random ANOVA (A&B random) |
Mixed ANOVA (A fixed, B random) |
---|---|---|---|
Factor A | \(\frac{MSA}{MSE}\) | \(\frac{MSA}{MSAB}\) | \(\frac{MSA}{MSAB}\) |
Factor B | \(\frac{MSB}{MSE}\) | \(\frac{MSB}{MSAB}\) | \(\frac{MSB}{MSE}\) |
AB interactions | \(\frac{MSAB}{MSE}\) | \(\frac{MSAB}{MSE}\) | \(\frac{MSAB}{MSE}\) |
Estimation Of Variance Components
In random and mixed effects models, we are interested in estimating the variance components
Variance component \(\sigma^2_\beta\) in the mixed ANOVA.
\[ E(\sigma^2_\beta) = \frac{E(MSB)-E(MSE)}{na} = \frac{\sigma^2 + na \sigma^2_\beta - \sigma^2}{na} = \sigma^2_\beta \]
which can be estimated with
\[ \hat{\sigma}^2_\beta = \frac{MSB - MSE}{na} \]
Confidence intervals for variance components can be constructed (approximately) by using the Satterthwaite procedure or the MLS procedure (like the 1-way random effects)
Estimation of Fixed Effects in Mixed Models
\[ \begin{aligned} \hat{\alpha}_i &= \bar{Y}_{i..} - \bar{Y}_{...} \\ \hat{\mu}_{i.} &= \bar{Y}_{...} + (\bar{Y}_{i..}- \bar{Y}_{...}) = \bar{Y}_{i..} \\ \sigma^2(\hat{\alpha}_i) &= \frac{\sigma^2 + n \sigma^2_{\alpha \beta}}{bn} = \frac{E(MSAB)}{bn} \\ s^2(\hat{\alpha}_i) &= \frac{MSAB}{bn} \end{aligned} \]
Contrasts on the Fixed Effects
\[ \begin{aligned} L &= \sum c_i \alpha_i \\ \sum c_i &= 0 \\ \hat{L} &= \sum c_i \hat{\alpha}_i \\ \sigma^2(\hat{L}) &= \sum c^2_i \sigma^2 (\hat{\alpha}_i) \\ s^2(\hat{L}) &= \frac{MSAB}{bn} \sum c^2_i \end{aligned} \]
Confidence intervals and tests can be constructed as usual
21.1.5.2 Unbalanced
For a mixed model with a = 2, b = 4
\[ \begin{aligned} Y_{ijk} &= \mu_{..} + \alpha_i + \beta_j + (\alpha \beta)_{ij} + \epsilon_{ijk} \\ var(\beta_j)&= \sigma^2_\beta \\ var((\alpha \beta)_{ij})&= \frac{2-1}{2}\sigma^2_{\alpha \beta} = \frac{\sigma^2_{\alpha \beta}}{2} \\ var(\epsilon_{ijk}) &= \sigma^2 \\ E(Y_{ijk}) &= \mu_{..} + \alpha_i \\ var(Y_{ijk}) &= \sigma^2_{\beta} + \frac{\sigma^2_{\alpha \beta}}{2} + \sigma^2 \\ cov(Y_{ijk},Y_{ijk'}) &= \sigma^2 + \frac{\sigma^2_{\alpha \beta}}{2}, k \neq k' \\ cov(Y_{ijk},Y_{i'jk'}) &= \sigma^2_{\beta} - \frac{\sigma^2_{\alpha \beta}}{2}, i \neq i' \\ cov(Y_{ijk},Y_{i'j'k'}) &= 0, j \neq j' \end{aligned} \]
assume
\[ \mathbf{Y} \sim N(\mathbf{X}\beta, M) \]
where \(M\) is block diagonal
density function
\[ f(\mathbf{Y}) = \frac{1}{(2\pi)^{N/2}|M|^{1/2}}exp(-\frac{1}{2}\mathbf{(Y - X \beta)' M^{-1}(Y-X\beta)}) \]
if we knew the variance components, we could use GLS:
\[ \hat{\beta}_{GLS} = \mathbf{(X'M^{-1}X)^{-1}X'M^{-1}Y} \]
but we usually don’t know the variance components \(\sigma^2, \sigma^2_\beta, \sigma^2_{\alpha \beta}\) that make up \(M\)
Another way to get estimates is by Maximum likelihood estimation
we try to maximize its log
\[ \ln L = - \frac{N}{2} \ln (2\pi) - \frac{1}{2}\ln|M| - \frac{1}{2} \mathbf{(Y-X \beta)'\Sigma^{-1}(Y-X\beta)} \]