## 21.2 Nonparametric ANOVA

### 21.2.1 Kruskal-Wallis

The Kruskal-Wallis test generalizes the Wilcoxon rank-sum test for 2 independent samples to several independent samples (just as the F-test of one-way ANOVA generalizes the two-sample t-test to several independent samples).

Consider the one-way case:

We have

• $$a\ge2$$ treatments
• $$n_i$$ is the sample size for the $$i$$-th treatment
• $$Y_{ij}$$ is the $$j$$-th observation from the $$i$$-th treatment.
• We make no assumption of normality.
• We only assume that the observations on the $$i$$-th treatment are a random sample from a continuous CDF $$F_i$$, $$i = 1,..,a$$, and that all observations are mutually independent.

\begin{aligned} &H_0: F_1 = F_2 = ... = F_a \\ &H_a: F_i < F_j \text{ for some } i \neq j \end{aligned}

Or, if the distributions are from the location-scale family, $$H_0: \theta_1 = \theta_2 = ... = \theta_a$$.

Procedure

• Rank all $$N = \sum_{i=1}^a n_i$$ observations in ascending order. Let $$r_{ij} = rank(Y_{ij})$$, and note that $$\sum_i \sum_j r_{ij} = 1 + 2 + ... + N = \frac{N(N+1)}{2}$$
• Calculate the rank sums and averages:
$r_{i.} = \sum_{j=1}^{n_i} r_{ij}$ and $\bar{r}_{i.} = \frac{r_{i.}}{n_i}, i = 1,..,a$
• Calculate the test statistic on the ranks: $\chi_{KW}^2 = \frac{SSTR}{\frac{SSTO}{N-1}}$ where $$SSTR = \sum_i n_i (\bar{r}_{i.}- \bar{r}_{..})^2$$, $$SSTO = \sum_i \sum_j (r_{ij}- \bar{r}_{..})^2$$, and $$\bar{r}_{..} = \frac{N+1}{2}$$ is the overall mean rank
• For large $$n_i$$ ($$\ge 5$$ observations per treatment), the Kruskal-Wallis statistic approximately follows a $$\chi^2_{a-1}$$ distribution under $$H_0$$. Hence, reject $$H_0$$ if $$\chi^2_{KW} > \chi^2_{(1-\alpha;a-1)}$$.
• If sample sizes are small, one can exhaustively work out all possible distinct ways of assigning the $$N$$ ranks to the observations from the $$a$$ treatments and calculate the value of the KW statistic in each case (there are $$\frac{N!}{n_1!..n_a!}$$ possible assignments). Under $$H_0$$ all of these assignments are equally likely.
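
As a rough illustration of the procedure above, the following sketch computes the rank-based statistic $$\chi^2_{KW} = SSTR / (SSTO/(N-1))$$ in plain Python. The function names `midranks` and `kruskal_wallis_statistic` and the sample data are hypothetical and chosen only for this example. Giving tied values the mean of their ranks is an assumption here, since the text assumes continuous distributions (no ties).

```python
from itertools import chain


def midranks(values):
    """Ascending ranks 1..N; tied values share the mean of their ranks (assumption)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_statistic(groups):
    """Return SSTR / (SSTO / (N - 1)) computed on the pooled ranks."""
    pooled = list(chain.from_iterable(groups))
    N = len(pooled)
    ranks = midranks(pooled)
    grand_mean = (N + 1) / 2  # mean of the ranks 1..N

    sstr = 0.0
    pos = 0
    for g in groups:
        n_i = len(g)
        mean_i = sum(ranks[pos:pos + n_i]) / n_i  # average rank of treatment i
        sstr += n_i * (mean_i - grand_mean) ** 2
        pos += n_i

    ssto = sum((r - grand_mean) ** 2 for r in ranks)
    return sstr / (ssto / (N - 1))


# Illustrative (made-up) data: a = 3 treatments with n_i = 5 observations each.
example_groups = [
    [6.1, 7.3, 5.8, 6.9, 7.0],
    [8.2, 7.9, 9.1, 8.5, 7.7],
    [6.5, 7.1, 8.0, 6.6, 7.4],
]
print(kruskal_wallis_statistic(example_groups))
```

The printed value would be compared with the $$\chi^2_{(1-\alpha;a-1)}$$ critical value. In practice a library routine such as `scipy.stats.kruskal` is probably the more convenient choice; the sketch is only meant to show how the statistic follows from the rank sums of squares.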

### 21.2.2 Friedman Test

When the responses $$Y_{ij}$$, $$i = 1,..,n$$ (blocks), $$j = 1,..,r$$ (treatments), in a randomized complete block design are not normally distributed (or do not have constant variance), a nonparametric test is more appropriate.

A distribution-free rank-based test for comparing the treatments in this setting is the Friedman test. Let $$F_{ij}$$ be the CDF of the random variable $$Y_{ij}$$, corresponding to the observed value $$y_{ij}$$.

Under the null hypothesis, $$F_{ij}$$ are identical for all treatments j separately for each block i.

\begin{aligned} &H_0: F_{i1} = F_{i2} = ... = F_{ir} \text{ for all i} \\ &H_a: F_{ij} < F_{ij'} \text{ for some } j \neq j' \text{ for all } i \end{aligned}

For location parameter distributions, treatment effects can be tested:

\begin{aligned} &H_0: \tau_1 = \tau_2 = ... = \tau_r \\ &H_a: \tau_j > \tau_{j'} \text{ for some } j \neq j' \end{aligned}

Procedure

• Rank the observations from the r treatments separately within each block (in ascending order; if there are ties, each tied observation is given the mean of the ranks involved). Let the ranks be called $$r_{ij}$$
• Calculate the Friedman test statistic
$\chi^2_F = \frac{SSTR}{\frac{SSTR + SSE}{n(r-1)}}$ where \begin{aligned} SSTR &= n \sum (\bar{r}_{.j}-\bar{r}_{..})^2 \\ SSE &= \sum \sum (r_{ij} - \bar{r}_{.j})^2 \\ \bar{r}_{.j} &= \frac{\sum_i r_{ij}}{n}\\ \bar{r}_{..} &= \frac{r+1}{2} \end{aligned}

If there are no ties, it can be rewritten as

$\chi^2_{F} = \left[\frac{12}{nr(r+1)}\sum_j r_{.j}^2\right] - 3n(r+1)$

where $$r_{.j} = \sum_i r_{ij}$$ is the rank sum for treatment $$j$$.

With a large number of blocks, $$\chi^2_F$$ is approximately $$\chi^2_{r-1}$$ under $$H_0$$. Hence, we reject $$H_0$$ if $$\chi^2_F > \chi^2_{(1-\alpha;r-1)}$$.
The exact null distribution for $$\chi^2_F$$ can be derived since there are r! possible ways of assigning ranks 1,2,…,r to the r observations within each block. There are n blocks and thus $$(r!)^n$$ possible assignments to the ranks, which are equally likely when $$H_0$$ is true.
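
The Friedman procedure and its exact null distribution can be sketched in plain Python as follows. The names `midranks`, `friedman_statistic`, and `friedman_exact_pvalue`, as well as the data, are hypothetical and used only for illustration. The enumeration assumes there are no ties within blocks, and because $$(r!)^n$$ grows very quickly it is only practical for small designs.

```python
from itertools import permutations, product


def midranks(values):
    """Ascending ranks; tied values get the mean of the ranks involved."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            ranks[k] = (start + end) / 2 + 1  # sorted positions are 0-based
        start = end + 1
    return ranks


def friedman_statistic(rank_rows):
    """rank_rows[i][j] is the within-block rank of treatment j in block i."""
    n, r = len(rank_rows), len(rank_rows[0])
    grand_mean = (r + 1) / 2
    col_means = [sum(row[j] for row in rank_rows) / n for j in range(r)]
    sstr = n * sum((m - grand_mean) ** 2 for m in col_means)
    sse = sum((row[j] - col_means[j]) ** 2 for row in rank_rows for j in range(r))
    return sstr / ((sstr + sse) / (n * (r - 1)))


def friedman_exact_pvalue(data):
    """Observed statistic and exact p-value by enumerating all (r!)^n rank tables.

    Assumes no ties within blocks, so every block's ranks are a
    permutation of 1..r and all assignments are equally likely under H0.
    """
    n, r = len(data), len(data[0])
    observed = friedman_statistic([midranks(row) for row in data])
    hits = total = 0
    for table in product(permutations(range(1, r + 1)), repeat=n):
        total += 1
        # Small tolerance guards against floating-point noise in the comparison.
        if friedman_statistic(table) >= observed - 1e-12:
            hits += 1
    return observed, hits / total


# Illustrative (made-up) data: n = 4 blocks (rows), r = 3 treatments (columns).
example = [
    [1.2, 2.5, 3.1],
    [0.9, 1.8, 1.6],
    [4.0, 4.4, 5.7],
    [2.3, 2.1, 3.0],
]
stat, p_exact = friedman_exact_pvalue(example)
print(stat, p_exact)
```

For larger $$n$$ the $$\chi^2_{r-1}$$ approximation, or a library routine such as `scipy.stats.friedmanchisquare`, is likely the more practical route.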