24.2 Nonparametric ANOVA

When the assumptions of normality and equal variance are not satisfied, we can use nonparametric ANOVA tests, which operate on the ranks of the data rather than the raw values.

24.2.1 Kruskal-Wallis Test (One-Way Nonparametric ANOVA)

The Kruskal-Wallis test is a generalization of the Wilcoxon rank-sum test to more than two independent samples. It is an alternative to one-way ANOVA when normality is not assumed.

Setup

$a \geq 2$ independent treatments.
$n_i$ is the sample size for the $i$-th treatment.
$Y_{ij}$ is the $j$-th observation from the $i$-th treatment.
No assumption of normality.
Assume observations are independent random samples from continuous CDFs $F_1, F_2, \dots, F_a$.

Hypotheses

$\begin{aligned} &H_0: F_1 = F_2 = \dots = F_a \quad \text{(All distributions are identical)} \\ &H_a: F_i < F_j \text{ for some } i \neq j \end{aligned}$

If the data come from a location-scale family, the hypothesis simplifies to:

$H_0: \theta_1 = \theta_2 = \dots = \theta_a$

Procedure

Rank all $N = \sum_{i=1}^a n_i$ observations in ascending order.
Let $r_{ij} = \text{rank}(Y_{ij})$.
The sum of ranks must satisfy:

$\sum_i \sum_j r_{ij} = \frac{N(N+1)}{2}$
Compute rank sums and averages: $r_{i.} = \sum_{j=1}^{n_i} r_{ij}, \quad \bar{r}_{i.} = \frac{r_{i.}}{n_i}$
Calculate the test statistic:

$\chi_{KW}^2 = \frac{SSTR}{\frac{SSTO}{N-1}}$

where:
- Treatment Sum of Squares: $SSTR = \sum_{i=1}^a n_i (\bar{r}_{i.} - \bar{r}_{..})^2$
- Total Sum of Squares: $SSTO = \sum_i \sum_j (r_{ij} - \bar{r}_{..})^2$
- Overall Mean Rank: $\bar{r}_{..} = \frac{N+1}{2}$
Compare to a chi-square distribution:
- When every $n_i \geq 5$, $\chi^2_{KW}$ approximately follows a $\chi^2_{a-1}$ distribution under $H_0$.
- Reject $H_0$ if: $\chi^2_{KW} > \chi^2_{(1-\alpha; a-1)}$
Exact Test for Small Samples:
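- Enumerate all possible assignments of the $N$ ranks to the $a$ treatments:
  $\frac{N!}{n_1! n_2! \dots n_a!}$
- Evaluate the Kruskal-Wallis statistic for each assignment. The exact p-value is the proportion of assignments whose statistic is at least as large as the observed value.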
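
The procedure above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`average_ranks`, `rank_groups`, `kw_from_ranks`, `kruskal_wallis`, `exact_p_value`) and the sample data are hypothetical, and tied values are given average ranks as a convenience even though the setup assumes continuous distributions.

```python
from itertools import combinations


def average_ranks(values):
    """Ascending 1-based ranks; tied values share the average of their ranks."""
    return [
        sum(1 for y in values if y < x) + (sum(1 for y in values if y == x) + 1) / 2
        for x in values
    ]


def rank_groups(groups):
    """Rank the pooled sample, then split the ranks back by treatment."""
    ranks = average_ranks([y for g in groups for y in g])
    out, pos = [], 0
    for g in groups:
        out.append(ranks[pos:pos + len(g)])
        pos += len(g)
    return out


def kw_from_ranks(ranked):
    """chi^2_KW = SSTR / (SSTO / (N - 1)), computed from per-treatment ranks."""
    pooled = [r for g in ranked for r in g]
    n_total = len(pooled)
    grand_mean = (n_total + 1) / 2  # overall mean rank
    sstr = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in ranked)
    ssto = sum((r - grand_mean) ** 2 for r in pooled)
    return sstr / (ssto / (n_total - 1))


def kruskal_wallis(groups):
    """Kruskal-Wallis statistic for a list of samples (one list per treatment)."""
    return kw_from_ranks(rank_groups(groups))


def exact_p_value(groups, tol=1e-9):
    """Enumerate every assignment of the pooled ranks to groups of sizes n_1..n_a."""
    ranked = rank_groups(groups)
    observed = kw_from_ranks(ranked)
    sizes = [len(g) for g in ranked]
    pooled = [r for g in ranked for r in g]

    def splits(indices, sizes_left):
        # Choose which positions go to the first group, then recurse on the rest.
        if len(sizes_left) == 1:
            yield [list(indices)]
            return
        for chosen in combinations(indices, sizes_left[0]):
            rest = [k for k in indices if k not in chosen]
            for tail in splits(rest, sizes_left[1:]):
                yield [list(chosen)] + tail

    total = extreme = 0
    for split in splits(list(range(len(pooled))), sizes):
        stat = kw_from_ranks([[pooled[k] for k in part] for part in split])
        total += 1
        if stat >= observed - tol:  # tolerance guards against float round-off
            extreme += 1
    return extreme / total, total


# Made-up observations for a = 3 treatments with n_i = 2 each.
data = [[1.1, 2.3], [3.4, 4.8], [5.6, 6.2]]
stat = kruskal_wallis(data)
p_exact, n_assignments = exact_p_value(data)
print(round(stat, 4), n_assignments, round(p_exact, 4))
```

With these made-up data the ranks are perfectly separated by treatment, so the statistic is $32/7 \approx 4.57$, and only 6 of the $6!/(2!\,2!\,2!) = 90$ assignments are at least as extreme, giving an exact p-value of $6/90 \approx 0.067$. Full enumeration grows quickly with $N$, so it is only practical for small samples.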

24.2.2 Friedman Test (Nonparametric Two-Way ANOVA)

The Friedman test is a distribution-free alternative to two-way ANOVA when data are measured in a randomized complete block design and normality cannot be assumed.

Setup

$Y_{ij}$ is the response for treatment $j$ in block $i$, where $i = 1, \dots, n$ indexes blocks and $j = 1, \dots, r$ indexes treatments.
No assumption of normality or homogeneity of variance.
Let $F_{ij}$ be the CDF of $Y_{ij}$.

Hypotheses

$\begin{aligned} &H_0: F_{i1} = F_{i2} = \dots = F_{ir} \quad \forall i \quad \text{(Identical distributions within each block)} \\ &H_a: F_{ij} < F_{ij'} \text{ for some } j \neq j' \quad \forall i \end{aligned}$

For location-scale families, the hypothesis simplifies to:

$\begin{aligned} &H_0: \tau_1 = \tau_2 = \dots = \tau_r \\ &H_a: \tau_j > \tau_{j'} \text{ for some } j \neq j' \end{aligned}$

Procedure

Rank observations within each block separately (ascending order).
- If there are ties, assign average ranks.
Compute test statistic:

$\chi^2_F = \frac{SSTR}{\frac{SSTR + SSE}{n(r-1)}}$

where:
- Treatment Sum of Squares: $SSTR = n \sum_{j=1}^r (\bar{r}_{.j} - \bar{r}_{..})^2$
- Error Sum of Squares: $SSE = \sum_i \sum_j (r_{ij} - \bar{r}_{.j})^2$
- Mean Ranks: $\bar{r}_{.j} = \frac{\sum_i r_{ij}}{n}, \quad \bar{r}_{..} = \frac{r+1}{2}$
Alternative Formula (No Ties):

If there are no ties, Friedman’s statistic simplifies to:

$\chi^2_F = \left[\frac{12}{nr(r+1)} \sum_j r_{.j}^2\right] - 3n(r+1)$

where $r_{.j} = \sum_i r_{ij}$ is the rank sum for treatment $j$.
Compare to a chi-square distribution:
- For large $n$, $\chi^2_F$ approximately follows a $\chi^2_{r-1}$ distribution under $H_0$.
- Reject $H_0$ if: $\chi^2_F > \chi^2_{(1-\alpha; r-1)}$
Exact Test for Small Samples:
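- Enumerate all possible within-block ranking permutations: $(r!)^n$
- Evaluate the Friedman statistic for each arrangement. The exact p-value is the proportion of arrangements whose statistic is at least as large as the observed value.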
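
The Friedman procedure can be sketched the same way in plain Python. Again this is only an illustration: the function names (`block_ranks`, `friedman_stat`, `friedman_no_ties`, `exact_p_value`) and the data are hypothetical. The no-ties shortcut is written with $nr(r+1)$ in the denominator, which is the form that agrees with the sum-of-squares definition.

```python
from itertools import permutations, product


def block_ranks(block):
    """Ascending 1-based ranks within one block; ties get average ranks."""
    return [
        sum(1 for y in block if y < x) + (sum(1 for y in block if y == x) + 1) / 2
        for x in block
    ]


def friedman_stat(rank_rows):
    """chi^2_F = SSTR / ((SSTR + SSE) / (n (r - 1))) from an n x r table of ranks."""
    n, r = len(rank_rows), len(rank_rows[0])
    grand_mean = (r + 1) / 2  # overall mean rank
    col_means = [sum(row[j] for row in rank_rows) / n for j in range(r)]
    sstr = n * sum((m - grand_mean) ** 2 for m in col_means)
    sse = sum((row[j] - col_means[j]) ** 2 for row in rank_rows for j in range(r))
    return sstr / ((sstr + sse) / (n * (r - 1)))


def friedman_no_ties(rank_rows):
    """Shortcut formula based on treatment rank sums; valid only without ties."""
    n, r = len(rank_rows), len(rank_rows[0])
    col_sums = [sum(row[j] for row in rank_rows) for j in range(r)]
    return 12 / (n * r * (r + 1)) * sum(s ** 2 for s in col_sums) - 3 * n * (r + 1)


def exact_p_value(rank_rows, tol=1e-9):
    """Enumerate all (r!)^n within-block permutations of the ranks."""
    observed = friedman_stat(rank_rows)
    total = extreme = 0
    for arrangement in product(*(permutations(row) for row in rank_rows)):
        total += 1
        if friedman_stat(arrangement) >= observed - tol:  # float round-off guard
            extreme += 1
    return extreme / total, total


# Made-up responses: n = 4 blocks (rows), r = 3 treatments (columns).
data = [
    [8.2, 9.1, 10.4],
    [7.5, 8.8, 9.9],
    [6.9, 9.4, 8.7],
    [7.7, 8.1, 9.6],
]
ranks = [block_ranks(block) for block in data]
stat = friedman_stat(ranks)
shortcut = friedman_no_ties(ranks)
p_exact, n_arrangements = exact_p_value(ranks)
print(stat, shortcut, n_arrangements, round(p_exact, 4))
```

For these made-up data the treatment rank sums are 4, 9, and 11, so both formulas give $\chi^2_F = 6.5$. Of the $(3!)^4 = 1296$ arrangements, 54 have a statistic at least this large, so the exact p-value is $54/1296 \approx 0.042$.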