Chapter 6 Week 8 - Multiple Treatment Comparisons and LSD

Outline:

Analysis Of Variance Continued

Multiple Comparisons of Treatment Means

Introduction

The Protected (Extended) t test

Least Significant Differences - LSD

Worked ANOVA Examples

Using R

See lecture notes in week 7.

Accompanying Workshop - done in week 9

The analysis of variance process and multiple comparisons of means - when the ANOVA rejects \(H_0\)

Workshop for week 8

Based on lectures in week 7

Project Requirements for Week 8

Nil.

Assessment for Week 8

Your second quiz worth 7% is this week.

6.1 Multiple Comparisons of Treatment Means

6.1.1 Introduction

In the ANOVA, the F-test is used to test the overall hypothesis of equality of all treatment means.

If \(H_0\) is false, i.e. some difference (or differences) do exist, interest lies in determining where the differences do occur; which treatment means are different.

A number of tests exist for comparisons of multiple treatment means, the most common of which is the extended (or protected) t-test. Other tests in common usage include:

Tukey’s q
Student-Newman-Keuls Multiple Range Test
Scheffé’s Test
Bonferroni
Duncan’s Multiple Range Test

6.1.2 The Protected (Extended) t-test

The original t-test was designed to compare two treatment means. If this test is simply extended and used to carry out all possible pairwise comparisons between more than two treatments, spurious significance may be found simply because so many of the tests are done - each test may be at a prescribed level of significance related to its specific type I error probability, but over all the possible tests this probability of error (the experimentwise error) may be quite different.
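To get a feel for how quickly the experimentwise error can grow, the sketch below computes the chance of at least one false positive across all pairwise tests. It makes the simplifying assumption that the tests are independent, which is not strictly true when comparisons share a mean, so the figures are only a rough guide. The names `k`, `m` and `experimentwise` are illustrative.

```r
alpha <- 0.05          # type I error rate for each individual t-test
k <- 3:6               # number of treatments in the experiment
m <- choose(k, 2)      # number of possible pairwise comparisons

# Approximate probability of at least one type I error,
# ASSUMING the m tests are independent (a simplification)
experimentwise <- 1 - (1 - alpha)^m

data.frame(treatments = k,
           comparisons = m,
           experimentwise = round(experimentwise, 3))
```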

To overcome this problem, a requirement is imposed that the F-test in the ANOVA must be significant before any t-tests are carried out. If the overall test says that there are no significant differences, then no further testing is carried out. The t-test with this conditioning on the outcome of the F-test is known as the protected t-test.

Even though significant differences may occur in pairwise t-testing, if the F-test in the ANOVA is not significant, the null hypothesis that all treatment means are equal cannot be rejected.

Provided the F-test is significant, at least two treatment means will usually be detected as significantly different when the t-tests are done.

The treatment means are considered in pairs and each pair is tested using the standard t-test, except that the standard deviation used in each test is the one obtained from the error mean square in the ANOVA.

Similarly, the degrees of freedom appropriate for each individual t-test are the error degrees of freedom in the ANOVA.

IMPORTANT NOTE

Remember the extended t-test as described above must only be applied after a significant F-test has been found. This proviso gives a “protection” to the test to prevent the detection of false significant differences which can arise simply by comparing the highest and lowest of a number of means.

Example Wing Thickness of Butterflies

In the week 8 notes the following ANOVA was given for the wing thickness of butterfly species:

Source	DF	Sums of Squares	Mean Squares	Variance Ratio
Between Species	2	16.7000	8.3500	17.822
Within Species	9	4.2167	0.4685
Total	11	20.9167

On the basis of the \(F\)-test (variance ratio 17.822), we rejected the null hypothesis and concluded that the mean wing thicknesses of the 3 butterfly species were not all the same (\(p < 0.05\)).
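The variance ratio and its significance can be checked directly from the sums of squares in the table. This is a small sketch using only the tabled values; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom from the ANOVA table
ss_between <- 16.7000; df_between <- 2
ss_within  <- 4.2167;  df_within  <- 9

ms_between <- ss_between / df_between   # treatment mean square
ms_within  <- ss_within / df_within     # error mean square (EMS)
vr <- ms_between / ms_within            # variance ratio (F statistic)

f_crit  <- qf(0.95, df_between, df_within)                     # 5% critical value
p_value <- pf(vr, df_between, df_within, lower.tail = FALSE)   # upper-tail p-value

c(VR = vr, critical = f_crit, p = p_value)
```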

The actual means were:

Species	1	2	3
Mean Wing Thickness	4.67	6.80	7.75
Number of Replicates	3	5	4

Since the F-test was significant in the ANOVA, further pairwise t-testing on the 3 means, to isolate the specific differences, will be valid.

The general hypotheses will be:

\[\begin{align*} H_0:& \mu_i = \mu_j \\ H_1:& \mu_i \neq \mu_j \end{align*}\]

where \(\mu_i\) and \(\mu_j\) are the population mean wing thicknesses for the two species being compared.

The test statistic for each comparison will be

\[ T = \frac{\bar{X}_i - \bar{X}_j}{s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}} \]

where the symbols are as defined in the notes on independent t-testing and \(s = \sqrt{\text{EMS}}\) from the ANOVA table.

Under \(H_0\): \(T \sim t_9\). From tables, \(t_9(0.975) = 2.262\).

Note that the degrees of freedom is always 9 in this example (error df from ANOVA) regardless of which pair of means is being tested.

Thus for each pairwise test the critical region will be: \(T < -2.262\) or \(T > 2.262\) (alternatively, we can write these two regions as \(|T| > 2.262\)).

(i) Species 1 vs Species 2

\[ T = \frac{4.67 - 6.80}{\sqrt{0.4685}\sqrt{\frac{1}{3} + \frac{1}{5}}} = \frac{-2.13}{0.6845\sqrt{0.5333}} = \frac{-2.13}{0.4999} = -4.261. \]

The calculated \(T\) lies in the critical region and thus we reject \(H_0\) in favour of \(H_1\). We conclude that the mean wing thicknesses of species 1 (4.67) and 2 (6.80) are significantly different (\(p < 0.05\)).

(ii) Species 1 vs Species 3

\[ T = \frac{4.67 - 7.75}{\sqrt{0.4685}\sqrt{\frac{1}{3} + \frac{1}{4}}} = \frac{-3.08}{0.6845\sqrt{0.5833}} = \frac{-3.08}{0.5228} = -5.891. \]

The calculated \(T\) lies in the critical region and thus we reject \(H_0\) in favour of \(H_1\). We conclude that the mean wing thicknesses of species 1 (4.67) and 3 (7.75) are significantly different (\(p < 0.05\)).

(iii) Species 2 vs Species 3

\[ T = \frac{6.80 - 7.75}{\sqrt{0.4685}\sqrt{\frac{1}{5} + \frac{1}{4}}} = \frac{-0.95}{0.6845\sqrt{0.4500}} = \frac{-0.95}{0.4592} = -2.069. \]

The calculated \(T\) does not lie in the critical region and thus we cannot reject \(H_0\). We conclude that there is insufficient evidence in the data to suggest that the mean wing thicknesses of species 2 (6.80) and 3 (7.75) are significantly different (\(p \geq 0.05\)).

The overall conclusion is that there is no evidence that species 2 and 3 differ with respect to mean wing thickness. Species 1 butterflies have a mean wing thickness significantly less than that of butterflies of species 2 and 3 (\(\alpha = 0.05\)).
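The three hand calculations can be reproduced in one pass from the summary statistics. The sketch below uses only the means, replicate numbers, EMS and error degrees of freedom given in this example; the object names are illustrative. Because R does not round intermediate values, the last decimal place may differ slightly from the hand working.

```r
means    <- c(4.67, 6.80, 7.75)   # species means
n        <- c(3, 5, 4)            # replicates per species
ems      <- 0.4685                # error mean square from the ANOVA
df_error <- 9                     # error degrees of freedom from the ANOVA

s      <- sqrt(ems)               # pooled standard deviation
t_crit <- qt(0.975, df_error)     # two-tailed 5% critical value

pairs <- combn(3, 2)              # columns are the pairs (1,2), (1,3), (2,3)
i <- pairs[1, ]
j <- pairs[2, ]

# Protected t statistic for every pair, all using the ANOVA error term
T_stat <- (means[i] - means[j]) / (s * sqrt(1 / n[i] + 1 / n[j]))

data.frame(pair = paste(i, "vs", j),
           T_stat = round(T_stat, 3),
           significant = abs(T_stat) > t_crit)
```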


6.1.3 Least Significant Differences - LSD

When a number of pairwise \(t\)-tests are carried out following an ANOVA, much of each calculation is common to all tests. This is especially true if there are equal replicates for some of the treatments.

The standard deviation, \(s\), and the critical region are always the same as they use the ANOVA error results. In the case of equal replications for each treatment, the standard error of the difference between the two means will be the same for all pairs of comparisons:

\[ SE_{\bar{y}_i - \bar{y}_j} = s\sqrt{\frac{2}{n}} \]

where \(s = \sqrt{\text{EMS}}\) from the ANOVA and \(n\) is the number of (equal) reps in each treatment.

Instead of evaluating the test statistic for all possible treatment pairs, the test statistic formula can be rearranged to find the smallest difference that must exist between any two means for significance to be reached. (Note that this is similar to the approach taken to find a confidence interval).

Using the traditional level of significance of 0.05, the critical value is \(t_{\nu}(0.975)\), where \(\nu\) is the error degrees of freedom from the ANOVA (these comparisons are always two-tailed). Substituting this into the equation for the test statistic \(T\) gives:

\[ T = \frac{|\bar{y}_i - \bar{y}_j|^{\text{LSD}}}{s\sqrt{\frac{1}{n_i} + \frac{1}{n_j}}} > t_{\nu}(0.975). \]

Rearranging and solving for \(|\bar{y}_i - \bar{y}_j|^{\text{LSD}}\) gives:

\[ |\bar{y}_i - \bar{y}_j|^{\text{LSD}} > t_{\nu}(0.975) \times s \times \sqrt{\frac{1}{n_i} + \frac{1}{n_j}} \]

where \(|\bar{y}_i - \bar{y}_j|^{\text{LSD}}\) is the difference that must exist between means \(i\) and \(j\) if the test statistic is to just reach significance; that is, for \(T\) to be just larger than \(t_{\nu}(0.975)\).

When all treatments have the same number of replicates, say \(n\), only one LSD needs to be found for all the pairwise comparisons. If the numbers of replicates differ, a separate LSD must be calculated for each distinct pair of replicate numbers (this situation is not considered in this course).
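As a rough illustration of the formula, the LSD can be wrapped in a small function. `lsd()` is a hypothetical helper written for these notes, not a built-in R function. The butterfly summary values from the previous section are used to show why unequal replication needs a separate LSD for each pair.

```r
# Hypothetical helper: least significant difference for one pair of treatments
lsd <- function(ems, df_error, n_i, n_j, alpha = 0.05) {
  qt(1 - alpha / 2, df_error) * sqrt(ems) * sqrt(1 / n_i + 1 / n_j)
}

# Butterfly example (EMS = 0.4685 on 9 df, replicates 3, 5 and 4)
lsd(0.4685, 9, 3, 5)   # species 1 vs 2, observed difference 2.13
lsd(0.4685, 9, 3, 4)   # species 1 vs 3, observed difference 3.08
lsd(0.4685, 9, 5, 4)   # species 2 vs 3, observed difference 0.95
```

Comparing each observed difference with its LSD leads to the same decisions as the three t-tests done by hand.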

The next step is to find all differences between the means and compare them with the relevant LSD.

Table of Mean Differences

The most efficient way of looking at the differences between the means is to construct a table of mean differences as follows:

	Means Ranked in Ascending Order \(\rightarrow\)
Means Ranked in Descending Order \(\downarrow\)	matrix of differences between treatment means

Example: Fuel Efficiency

An experiment was carried out to compare the fuel efficiency (measured as a percentage) of petrol engines using different fuels. Five fuel types were used, each replicated four times.

The statistical hypotheses are:

\[\begin{align*} H_0:& \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 \\ H_1:& \text{The mean fuel efficiencies are not the same for all five fuel types} \end{align*}\]

where \(\mu_i\) is the mean fuel efficiency of fuel type \(i\).

Analysis of the results gave the following ANOVA table which will be completed in lectures:

Source	DF	Sums of Squares	Mean Squares	Variance Ratio
Fuel Type			106.8	8.26
Error
Total		621.15

Means Table:

Fuel Type	A	B	C	D	E
Mean Efficiency (%)	93.0	84.3	64.3	92.6	88.4
\(n_i\)	4	4	4	4	4

From the tables \(F_{4,15}(0.95) = 3.06\).

The VR of 8.26 lies in the critical (rejection) region, thus we reject \(H_0\). We therefore conclude that the five fuel types do not all produce the same mean efficiency (\(p < 0.05\)).
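After attempting the table by hand, the missing entries can be checked with a few lines of R. This sketch assumes a one-way layout with 5 fuel types and 4 replicates each, and starts only from the two values given in the table (the fuel type mean square and the total sum of squares). All object names are illustrative.

```r
k <- 5                      # number of fuel types
n <- 4                      # replicates per fuel type
ms_trt   <- 106.8           # fuel type mean square (given)
ss_total <- 621.15          # total sum of squares (given)

df_trt   <- k - 1           # treatment degrees of freedom
df_err   <- k * (n - 1)     # error degrees of freedom
df_total <- k * n - 1       # total degrees of freedom

ss_trt <- ms_trt * df_trt   # treatment sum of squares
ss_err <- ss_total - ss_trt # error sum of squares, by subtraction
ms_err <- ss_err / df_err   # error mean square (EMS)
vr     <- ms_trt / ms_err   # variance ratio

f_crit <- qf(0.95, df_trt, df_err)   # 5% critical value

round(c(SS_trt = ss_trt, SS_err = ss_err, EMS = ms_err,
        VR = vr, s = sqrt(ms_err), F_crit = f_crit), 3)
```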

To determine which of the fuel types differ, t-testing can be carried out using the LSD.

\[\begin{align*} \text{LSD}(5\%, 4, 4) &= t_{15}(0.975) \times s \times \sqrt{\frac{1}{4} + \frac{1}{4}} \\ &= 2.131 \times 3.596 \times 0.7071 \\ &= 5.419. \end{align*}\]

Thus any pair of means that differs by at least 5.419 will be significantly different at the 0.05 significance level.
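The same LSD can be obtained in R. The sketch takes \(s = 3.596\) and the 15 error degrees of freedom from the calculation above. Because `qt()` returns the critical value to more decimal places than the printed tables, the result may differ from the hand calculation in the third decimal place.

```r
df_error <- 15       # error degrees of freedom
s        <- 3.596    # square root of the error mean square
n        <- 4        # replicates per fuel type

# LSD at the 5% level for equal replication: t * s * sqrt(2 / n)
lsd_fuel <- qt(0.975, df_error) * s * sqrt(2 / n)
round(lsd_fuel, 3)
```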

Table of Mean Differences:

	Fuel Type \(\rightarrow\)	C	B	E	D	A
Fuel Type \(\downarrow\)		64.3	84.3	88.4	92.6	93.0
A	93.0	28.7	8.7	4.6	0.4	0
D	92.6	28.3	8.3	4.2	0
E	88.4	24.1	4.1	0
B	84.3	20.0	0
C	64.3	0

The first entry in the table, 28.7, is the difference between the largest mean (93.0 for fuel type A) and the smallest mean (64.3 for fuel type C). Any difference in the table greater than the calculated LSD = 5.419 indicates a significant difference between those means at the 0.05 level of significance.
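A table of mean differences like this one can be built with `outer()`. The sketch below uses the five fuel means and the LSD of 5.419 from this example. The object names are illustrative, and negative differences are blanked out so that only the triangular part of the table remains.

```r
means <- c(A = 93.0, B = 84.3, C = 64.3, D = 92.6, E = 88.4)
lsd   <- 5.419                            # LSD calculated above

asc  <- sort(means)                       # columns: ascending order
desc <- sort(means, decreasing = TRUE)    # rows: descending order

# Row mean minus column mean for every combination of fuel types
diffs <- outer(desc, asc, "-")
diffs[diffs < 0] <- NA                    # keep each pair once (row mean >= column mean)
round(diffs, 1)

# TRUE where the difference exceeds the LSD (significant at the 5% level)
sig <- diffs > lsd
sig
```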

A useful way of presenting the results is as follows:

Significant Differences:

A > C, B *
D > C, B *
E > C *
B > C *

(where * indicates significance at the 5% level).

The general symbols for other levels of significance are: 1% **, 0.1% ***.

Conclusions

The mean efficiency of fuel type C is significantly lower than the mean efficiency of all the other fuel types (p < 0.05). Fuel types A and D have greater efficiency on average than do fuel types C and B (p < 0.05). There is no evidence that fuel types A, D and E differ in mean efficiency, nor that fuel types E and B differ (p > 0.05).


Example: Harvester Example Revisited

Do this by hand yourself, then check your results using R.

How close are the estimates of the treatment means and the standard deviation to the values we started with? Remember that we assumed these values and then constructed (simulated) the data:

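# Four harvesting systems, five replicates each
harvesting.system <- factor(rep(c("nil", "CS1", "CS2", "new"), each = 5))
# Start every observation at the assumed overall mean of 55
observations <- rep(55, 20)
dat <- data.frame(harvesting.system, observations)
# Add random error with the assumed standard deviation of 10
# (no seed is set, so the values change on every run)
dat$observations <- dat$observations + rnorm(20, mean = 0, sd = 10)
# Add the assumed treatment effects: nil +35, CS1 +5, CS2 -5, new -35
sys.effect <- rep(c(35, 5, -5, -35), each = 5)
dat$observations <- dat$observations + sys.effect
dat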

##    harvesting.system observations
## 1                nil    85.969026
## 2                nil    77.123884
## 3                nil    92.182093
## 4                nil   100.802590
## 5                nil    94.597004
## 6                CS1    73.923636
## 7                CS1    58.127795
## 8                CS1    54.190692
## 9                CS1    85.859914
## 10               CS1    61.471669
## 11               CS2    59.124974
## 12               CS2    35.360924
## 13               CS2    30.576036
## 14               CS2    57.576205
## 15               CS2    42.240195
## 16               new    30.174202
## 17               new    30.498174
## 18               new    23.107933
## 19               new    36.150304
## 20               new     1.831334
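One possible way to check the hand calculation is sketched below. Because the simulation above sets no seed, the printed observations are typed in directly so that the analysis is reproducible. The simulation assumed treatment means of 90, 60, 50 and 20 (nil, CS1, CS2, new) and a standard deviation of 10. `pairwise.t.test()` with a pooled standard deviation and no p-value adjustment is used here as a stand-in for the protected t-test, which is only valid if the F-test is significant.

```r
# Observations as printed above (the simulation itself is not reproducible)
dat <- data.frame(
  harvesting.system = factor(rep(c("nil", "CS1", "CS2", "new"), each = 5)),
  observations = c(85.969026, 77.123884, 92.182093, 100.802590, 94.597004,
                   73.923636, 58.127795, 54.190692, 85.859914, 61.471669,
                   59.124974, 35.360924, 30.576036, 57.576205, 42.240195,
                   30.174202, 30.498174, 23.107933, 36.150304, 1.831334)
)

# One-way ANOVA: the overall F-test comes first
fit <- aov(observations ~ harvesting.system, data = dat)
summary(fit)

# Estimated treatment means (compare with 90, 60, 50 and 20)
trt_means <- tapply(dat$observations, dat$harvesting.system, mean)
trt_means

# Estimated standard deviation from the error mean square (compare with 10)
ems   <- deviance(fit) / df.residual(fit)
s_hat <- sqrt(ems)
s_hat

# LSD at the 5% level with 5 replicates per system
lsd_harvest <- qt(0.975, df.residual(fit)) * s_hat * sqrt(2 / 5)
lsd_harvest

# Unadjusted pairwise t-tests using the pooled (ANOVA) standard deviation
pairwise.t.test(dat$observations, dat$harvesting.system,
                p.adjust.method = "none")
```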

Example: Growth Curves - Marine Birds

An ESC researcher is studying the growth rates of young marine birds on the Great Barrier Reef. The growth curve of these birds is known to be logistic in nature with a functional form as follows:

\[ W = \frac{K}{1 + \exp(-r(t - t_m))}, \]

where:

\(W\) is the weight in grams of the individual bird at time \(t\) days;
\(K\) is the asymptotic weight of the individual bird (its adult weight);
\(r\) is the growth constant for the individual bird;
\(t\) is the time from birth in days;
\(t_m\) is the time in days to reach a weight of \(K/2\) grams.
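The growth curve is easy to explore numerically. The function below is a direct transcription of the formula. `logistic_weight()` is an illustrative name, and the parameter values are those of the first Bridled Tern in the data table for this example.

```r
# Logistic growth curve: weight (g) at age t (days)
logistic_weight <- function(t, K, r, t_m) {
  K / (1 + exp(-r * (t - t_m)))
}

K   <- 105.0    # asymptotic (adult) weight in grams
r   <- 0.1594   # growth constant
t_m <- 9.7      # age in days at which the weight is K / 2

logistic_weight(t_m, K, r, t_m)                    # equals K / 2 by definition
logistic_weight(c(0, 10, 20, 40, 80), K, r, t_m)   # weight approaches K as t grows
```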

It is believed that different species have different growth patterns, which are reflected in different values for the coefficients \(K\), \(r\) and \(t_m\). The growth curves of six individual birds from each of three species were studied. The species involved and the resulting coefficients are given below.

Bridled Tern			Black Noddy			Crested Tern
\(K\)	\(t_m\)	\(r\)	\(K\)	\(t_m\)	\(r\)	\(K\)	\(t_m\)	\(r\)
105.0	9.7	0.1594	96.4	8.3	0.2171	280.4	12.8	0.1474
114.0	15.5	0.1193	103.4	7.1	0.1751	290.5	13.9	0.1447
124.7	14.9	0.1087	104.4	7.6	0.1707	291.5	17.2	0.1453
127.1	13.0	0.1097	115.2	10.1	0.1346	290.7	13.8	0.1342
127.7	13.7	0.0920	116.7	9.9	0.1672	292.2	12.2	0.1193
130.9	15.1	0.1242	117.7	10.1	0.1808	326.4	19.5	0.0821

Analyse these data to determine in what way (if any) the growth patterns differ between the three species. Initially consider \(t_m\), the time to reach half the adult weight.
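A possible starting point for the \(t_m\) analysis is sketched below. The values are typed in from the table above, and the object names are illustrative. The LSD is computed at the end, but it should only be used to compare species means if the F-test in the ANOVA is significant.

```r
# t_m values from the table: six birds from each of three species
species <- factor(rep(c("Bridled Tern", "Black Noddy", "Crested Tern"), each = 6))
t_m <- c( 9.7, 15.5, 14.9, 13.0, 13.7, 15.1,    # Bridled Tern
          8.3,  7.1,  7.6, 10.1,  9.9, 10.1,    # Black Noddy
         12.8, 13.9, 17.2, 13.8, 12.2, 19.5)    # Crested Tern
birds <- data.frame(species, t_m)

# One-way ANOVA on t_m
fit_tm <- aov(t_m ~ species, data = birds)
summary(fit_tm)

# Species means
tm_means <- tapply(birds$t_m, birds$species, mean)
tm_means

# LSD at the 5% level (equal replication, n = 6); use only after a significant F-test
ems_tm <- deviance(fit_tm) / df.residual(fit_tm)
lsd_tm <- qt(0.975, df.residual(fit_tm)) * sqrt(ems_tm) * sqrt(2 / 6)
lsd_tm
```

The same steps can then be repeated for \(K\) and \(r\).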

6.2 Using R

Refer to R section in week 7 lecture notes.