21.2 Nonparametric ANOVA
21.2.1 Kruskal-Wallis
The Kruskal-Wallis test generalizes the Wilcoxon rank-sum test for two independent samples to several independent samples, just as the F-test of one-way ANOVA generalizes the two-sample t-test.
Consider the one-way case:
We have
- \(a\ge2\) treatments
- \(n_i\) is the sample size for the \(i\)-th treatment
- \(Y_{ij}\) is the \(j\)-th observation from the \(i\)-th treatment.
- we make no assumption of normality
- We only assume that the observations on the \(i\)-th treatment are a random sample from a continuous CDF \(F_i\), \(i = 1,..,a\), and that all observations are mutually independent.
\[ \begin{aligned} &H_0: F_1 = F_2 = ... = F_a \\ &H_a: F_i < F_j \text{ for some } i \neq j \end{aligned} \]
(or, if the distributions belong to the same location-scale family, \(H_0: \theta_1 = \theta_2 = ... = \theta_a\))
Procedure
- Rank all \(N = \sum_{i=1}^a n_i\) observations in ascending order. Let \(r_{ij} = \text{rank}(Y_{ij})\), and note that \(\sum_i \sum_j r_{ij} = 1 + 2 + ... + N = \frac{N(N+1)}{2}\)
- Calculate the rank sums and averages:
\[ r_{i.} = \sum_{j=1}^{n_i} r_{ij} \] and \[ \bar{r}_{i.} = \frac{r_{i.}}{n_i}, i = 1,..,a \]
- Calculate the test statistic on the ranks:
\[ \chi_{KW}^2 = \frac{SSTR}{\frac{SSTO}{N-1}} \] where \(SSTR = \sum_i n_i (\bar{r}_{i.}- \bar{r}_{..})^2\), \(SSTO = \sum_i \sum_j (r_{ij}- \bar{r}_{..})^2\), and \(\bar{r}_{..} = \frac{N+1}{2}\) is the overall mean rank.
- For large \(n_i\) (\(\ge 5\) observations per treatment), the Kruskal-Wallis statistic approximately follows a \(\chi^2_{a-1}\) distribution under \(H_0\). Hence, reject \(H_0\) if \(\chi^2_{KW} > \chi^2_{(1-\alpha;a-1)}\).
- If sample sizes are small, one can exhaustively work out all possible distinct ways of assigning the \(N\) ranks to the observations from the \(a\) treatments and calculate the value of the KW statistic in each case (\(\frac{N!}{n_1!..n_a!}\) possible assignments). Under \(H_0\) all of these assignments are equally likely.
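The procedure above can be sketched in Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names (`midranks`, `kw_from_ranks`, `kruskal_wallis`, `exact_p_value`) and the sample data are hypothetical, and the use of mean ranks for tied values is an added assumption, since the text above assumes continuous distributions (no ties).

```python
from itertools import combinations


def midranks(values):
    """Ascending ranks; tied values share the mean of the ranks they occupy
    (an assumption; with continuous data there are no ties)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def kw_from_ranks(rank_groups):
    """chi^2_KW = SSTR / (SSTO / (N - 1)), computed on the ranks."""
    all_ranks = [r for g in rank_groups for r in g]
    N = len(all_ranks)
    grand = sum(all_ranks) / N  # equals (N + 1) / 2
    sstr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in rank_groups)
    ssto = sum((r - grand) ** 2 for r in all_ranks)
    return sstr / (ssto / (N - 1))


def kruskal_wallis(groups):
    """Rank the pooled observations, then compute the statistic."""
    ranks = midranks([y for g in groups for y in g])
    rank_groups, pos = [], 0
    for g in groups:
        rank_groups.append(ranks[pos:pos + len(g)])
        pos += len(g)
    return kw_from_ranks(rank_groups)


def exact_p_value(groups):
    """Enumerate all N! / (n_1! .. n_a!) assignments of ranks to treatments
    and return (share with statistic >= observed, number of assignments)."""
    ranks = midranks([y for g in groups for y in g])
    sizes = [len(g) for g in groups]
    observed = kruskal_wallis(groups)
    count = total = 0

    def assign(remaining, k, chosen):
        nonlocal count, total
        if k == len(sizes) - 1:  # the last treatment takes what is left
            stat = kw_from_ranks(chosen + [[ranks[i] for i in remaining]])
            total += 1
            count += stat >= observed - 1e-12  # tolerance for float round-off
            return
        for pick in combinations(remaining, sizes[k]):
            rest = [i for i in remaining if i not in pick]
            assign(rest, k + 1, chosen + [[ranks[i] for i in pick]])

    assign(list(range(len(ranks))), 0, [])
    return count / total, total


# Hypothetical data: a = 3 treatments with n_i = 3 observations each.
groups = [[1.1, 2.3, 3.0], [4.2, 5.5, 6.1], [7.4, 8.8, 9.9]]
print(kruskal_wallis(groups))
print(exact_p_value(groups))
```

For these hypothetical data the three samples are perfectly separated, so the statistic equals 7.2 and only 6 of the 1680 equally likely assignments reach that value. Full enumeration is only practical for small \(N\), since the number of assignments grows very quickly.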
21.2.2 Friedman Test
When the responses \(Y_{ij}\), \(i = 1,..,n\), \(j = 1,..,r\), in a randomized complete block design (with \(n\) blocks and \(r\) treatments) are not normally distributed (or do not have constant variance), a nonparametric test is more appropriate.
A distribution-free rank-based test for comparing the treatments in this setting is the Friedman test. Let \(F_{ij}\) be the CDF of the random variable \(Y_{ij}\), corresponding to the observed value \(y_{ij}\).
Under the null hypothesis, the \(F_{ij}\) are identical across all treatments \(j\) within each block \(i\).
\[ \begin{aligned} &H_0: F_{i1} = F_{i2} = ... = F_{ir} \text{ for all i} \\ &H_a: F_{ij} < F_{ij'} \text{ for some } j \neq j' \text{ for all } i \end{aligned} \]
For location parameter distributions, treatment effects can be tested:
\[ \begin{aligned} &H_0: \tau_1 = \tau_2 = ... = \tau_r \\ &H_a: \tau_j > \tau_{j'} \text{ for some } j \neq j' \end{aligned} \]
Procedure
- Rank observations from the r treatments separately within each block (in ascending order; if ties, each tied observation is given the mean of ranks involved). Let the ranks be called \(r_{ij}\)
- Calculate the Friedman test statistic
\[ \chi^2_F = \frac{SSTR}{\frac{SSTR + SSE}{n(r-1)}} \] where \[ \begin{aligned} SSTR &= n \sum (\bar{r}_{.j}-\bar{r}_{..})^2 \\ SSE &= \sum \sum (r_{ij} - \bar{r}_{.j})^2 \\ \bar{r}_{.j} &= \frac{\sum_i r_{ij}}{n}\\ \bar{r}_{..} &= \frac{r+1}{2} \end{aligned} \]
If there are no ties, it can be rewritten in terms of the treatment rank sums \(r_{.j} = \sum_i r_{ij}\) as
\[ \chi^2_{F} = [\frac{12}{nr(r+1)}\sum_j r_{.j}^2] - 3n(r+1) \]
With a large number of blocks, \(\chi^2_F\) is approximately \(\chi^2_{r-1}\) under \(H_0\). Hence, we reject \(H_0\) if \(\chi^2_F > \chi^2_{(1-\alpha;r-1)}\).
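As a hedged illustration of the Friedman procedure, the sketch below ranks within each block and computes the statistic both from the sums of squares and from the no-ties shortcut based on treatment rank sums. The function names and the data are hypothetical; `data[i][j]` is assumed to hold the response for block \(i\) and treatment \(j\).

```python
def midranks(values):
    """Ascending ranks within one block; ties get the mean of the ranks involved."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def friedman(data):
    """chi^2_F = SSTR / ((SSTR + SSE) / (n (r - 1))) on within-block ranks."""
    n, r = len(data), len(data[0])
    ranks = [midranks(row) for row in data]
    grand = (r + 1) / 2  # overall mean rank
    col_means = [sum(ranks[i][j] for i in range(n)) / n for j in range(r)]
    sstr = n * sum((m - grand) ** 2 for m in col_means)
    sse = sum((ranks[i][j] - col_means[j]) ** 2
              for i in range(n) for j in range(r))
    return sstr / ((sstr + sse) / (n * (r - 1)))


def friedman_no_ties(data):
    """Shortcut form using treatment rank sums; valid only without ties."""
    n, r = len(data), len(data[0])
    ranks = [midranks(row) for row in data]
    col_sums = [sum(ranks[i][j] for i in range(n)) for j in range(r)]
    return 12 / (n * r * (r + 1)) * sum(s ** 2 for s in col_sums) - 3 * n * (r + 1)


# Hypothetical data: n = 4 blocks (rows), r = 3 treatments (columns), no ties.
data = [
    [9.1, 7.3, 6.0],
    [8.4, 8.0, 5.2],
    [7.7, 6.9, 7.1],
    [9.5, 8.8, 6.6],
]
print(friedman(data), friedman_no_ties(data))
```

For these hypothetical data the treatment rank sums are 12, 7 and 5, and both forms give 6.5. The two functions agree only when no block contains ties; with ties, the sums-of-squares form is the one to use.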
The exact null distribution for \(\chi^2_F\) can be derived since there are \(r!\) possible ways of assigning the ranks 1,2,…,r to the \(r\) observations within each block. There are \(n\) blocks and thus \((r!)^n\) possible assignments of the ranks, which are equally likely when \(H_0\) is true.
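The enumeration described above can be sketched as follows, assuming no ties so that each block carries a permutation of the ranks 1,..,r. The function names and the rank matrix are hypothetical, and the statistic is computed with the standard no-ties form \(\frac{12}{nr(r+1)}\sum_j r_{.j}^2 - 3n(r+1)\).

```python
from itertools import permutations, product


def friedman_from_ranks(ranks):
    """No-ties Friedman statistic from an n x r matrix of within-block ranks."""
    n, r = len(ranks), len(ranks[0])
    col_sums = [sum(row[j] for row in ranks) for j in range(r)]
    return 12 / (n * r * (r + 1)) * sum(s * s for s in col_sums) - 3 * n * (r + 1)


def friedman_exact_p(ranks):
    """Enumerate all (r!)^n equally likely rank assignments under H0 and
    return (share with statistic >= observed, number of assignments)."""
    n, r = len(ranks), len(ranks[0])
    observed = friedman_from_ranks(ranks)
    block_perms = list(permutations(range(1, r + 1)))  # r! orderings per block
    count = total = 0
    for assignment in product(block_perms, repeat=n):
        total += 1
        if friedman_from_ranks(assignment) >= observed - 1e-12:
            count += 1
    return count / total, total


# Hypothetical within-block ranks: n = 4 blocks, r = 3 treatments.
observed_ranks = [[3, 2, 1], [3, 2, 1], [3, 1, 2], [3, 2, 1]]
print(friedman_from_ranks(observed_ranks))
print(friedman_exact_p(observed_ranks))
```

With these hypothetical ranks the statistic is 6.5, and 54 of the \((3!)^4 = 1296\) assignments give a value at least that large. Because the count grows as \((r!)^n\), full enumeration is feasible only for small designs.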