10 Nonparametric Methods
Nonparametric methods are statistical techniques that do not rely on strict distributional assumptions, such as normality or known population parameters. These methods are particularly useful when dealing with small samples, ordinal or categorical data, or data that contain outliers or are skewed [1], [2].
Within the framework of statistical inference (see Figure 10.1), nonparametric methods allow researchers to draw valid conclusions even when classical parametric assumptions are violated. As a result, they are widely applied in data science, business analytics, engineering, health sciences, and social research [3].
10.1 Role of Nonparametric Inference
Statistical inference aims to draw conclusions about a population based on sample data while accounting for uncertainty. Traditional parametric inference relies on assumptions about population parameters, such as the mean and variance. When these assumptions are questionable, nonparametric inference provides a robust alternative [1].
Instead of focusing on parameters like the mean (\(\mu\)), nonparametric methods often emphasize:
- Medians
- Distributional equality
- Ranks or signs
- Frequencies
This shift allows inference to remain valid under broader conditions.
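For example (with made-up values), a single extreme observation distorts the mean but leaves the median untouched:
# Hypothetical response times with one extreme outlier
x <- c(2.1, 2.4, 2.2, 2.5, 2.3, 30.0)
mean(x)    # pulled toward the outlier: about 6.92
median(x)  # unaffected by the extreme value: 2.35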
10.2 When to Use?
Nonparametric methods are recommended when:
- The data distribution is unknown or non-normal
- Sample size is small
- Data contain outliers
- Measurement scale is ordinal or nominal
- Variance homogeneity assumptions are violated
Parametric vs Nonparametric Methods:
| Aspect | Parametric Methods | Nonparametric Methods |
|---|---|---|
| Distribution assumption | Required | Not required |
| Sensitivity to outliers | High | Low |
| Data scale | Interval / Ratio | Ordinal / Nominal |
| Statistical power | Higher (if assumptions met) | Lower but more robust |
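As a quick illustration of this trade-off, the following sketch (using simulated, hypothetical data) applies both a t-test and its rank-based counterpart to skewed samples; on such data the rank-based test's assumptions hold while the t-test's do not:
# Simulated skewed samples (hypothetical data, for illustration only)
set.seed(123)
group1 <- rexp(15, rate = 1.0)   # exponential: strongly skewed
group2 <- rexp(15, rate = 0.5)
t.test(group1, group2)$p.value       # parametric: assumes approximate normality
wilcox.test(group1, group2)$p.value  # nonparametric: uses ranks only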
10.3 Nonparametric Hypotheses
As in parametric inference, nonparametric testing is based on statistical hypotheses:
- Null Hypothesis (H₀): No difference, no effect, or no association
- Alternative Hypothesis (H₁): A difference, effect, or association exists
However, these hypotheses typically concern medians, distributions, or ranks, rather than means [2].
Example:
\[ H_0: \text{Median satisfaction score is the same across services} \]
\[ H_1: \text{Median satisfaction score differs across services} \]
10.4 Common Nonparametric Tests
10.4.1 Sign Test
The Sign Test is one of the simplest nonparametric tests and is used to test hypotheses about a population median or the median of paired differences. Unlike parametric tests, it does not require assumptions about the underlying data distribution and is highly robust to outliers.
The Sign Test is appropriate when:
- Observations are paired (before–after or matched samples)
- The distribution of differences is unknown or highly skewed
- Data contain outliers
- Only the direction of change is reliable
Key Characteristics:
- Uses only the sign (+ or −) of differences
- Extremely robust to non-normality and outliers
- Has relatively low statistical power
Hypotheses
\[ H_0: \text{Median difference} = 0 \]
\[ H_1: \text{Median difference} \neq 0 \]
Real-World Case: Manufacturing Quality Control
A manufacturing plant introduces a new machine calibration procedure aimed at reducing product defects. For each production batch, the number of defective items is recorded before and after calibration. Due to occasional machine failures, the defect counts include extreme values and are not normally distributed.
Because the magnitude of changes is unreliable, the Sign Test is applied to determine whether the median change in defect counts differs from zero.
Step 1: Compute Paired Differences
For each batch:
\[ d_i = \text{Defects}_{\text{after}} - \text{Defects}_{\text{before}} \]
# Defect counts before and after calibration
before <- c(12, 15, 10, 18, 20, 14, 16, 22, 11, 19)
after <- c(10, 14, 11, 15, 18, 13, 15, 20, 11, 17)
# Compute paired differences
diff <- after - before
diff
[1] -2 -1  1 -3 -2 -1 -1 -2  0 -2
Step 2: Assign Signs
- \(d_i > 0\): Positive sign (+)
- \(d_i < 0\): Negative sign (−)
- \(d_i = 0\): Discard the observation
Let \(n\) be the number of non-zero differences.
# Remove zero differences
diff_nonzero <- diff[diff != 0]
# Assign signs
signs <- ifelse(diff_nonzero > 0, "+", "-")
data.frame(Difference = diff_nonzero, Sign = signs)
  Difference Sign
1 -2 -
2 -1 -
3 1 +
4 -3 -
5 -2 -
6 -1 -
7 -1 -
8 -2 -
9 -2 -
Step 3: Count Signs
- Number of positive signs: \(n_+\)
- Number of negative signs: \(n_-\)
n_pos <- sum(diff_nonzero > 0)
n_neg <- sum(diff_nonzero < 0)
n_pos
[1] 1
n_neg
[1] 8
Under \(H_0\), positive and negative signs are equally likely.
Step 4: Test Statistic
The test statistic is defined as:
\[ X = \min(n_+, n_-) \]
# Test statistic
X <- min(n_pos, n_neg)
X
[1] 1
Under the null hypothesis:
\[ X \sim \text{Binomial}(n, 0.5) \]
# Perform Sign Test using binom.test()
binom.test(n_pos, n_pos + n_neg, p = 0.5, alternative = "two.sided")
Exact binomial test
data: n_pos and n_pos + n_neg
number of successes = 1, number of trials = 9, p-value = 0.03906
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
0.002809137 0.482496515
sample estimates:
probability of success
0.1111111
Step 5: Decision Rule
- Compute the p-value using the binomial distribution
- Reject \(H_0\) if:
\[ \text{p-value} < \alpha \]
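Equivalently, the exact two-sided p-value can be computed directly from the binomial distribution; this short sketch reproduces the binom.test() result obtained above:
# Exact two-sided p-value: double the smaller tail probability (capped at 1)
p_value <- min(1, 2 * pbinom(X, n_pos + n_neg, 0.5))
p_value
[1] 0.0390625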
Interpretation:
- Reject \(H_0\): There is sufficient evidence that the median difference is not zero, indicating that the calibration has a significant effect on defect counts.
- Fail to reject \(H_0\): There is insufficient evidence to conclude that the calibration affects product defects.
The Sign Test is a reliable and robust nonparametric method for analyzing paired data when distributional assumptions are violated. Although it has lower power than alternatives such as the Wilcoxon Signed-Rank Test, it remains valuable in applications where only the direction of change can be trusted.
10.4.2 Wilcoxon Signed-Rank Test
The Wilcoxon Signed-Rank Test is a nonparametric test used to compare paired samples and test hypotheses about the median of paired differences [1]. Unlike the Sign Test, it considers both the direction and magnitude of differences, making it more powerful while still avoiding strict distributional assumptions.
Wilcoxon Signed-Rank Test
The Wilcoxon Signed-Rank Test is commonly regarded as the nonparametric alternative to the paired t-test.
The Wilcoxon Signed-Rank Test is appropriate when:
- Observations are paired (before–after or matched samples)
- The distribution of differences is not normal
- Data are at least ordinal
- The distribution of differences is approximately symmetric
- The magnitude of changes is meaningful and reliable
Key Characteristics:
- Uses both the sign and rank of paired differences
- More powerful than the Sign Test
- Does not assume normality
- Sensitive to strong asymmetry
Hypotheses
\[ H_0: \text{Median difference} = 0 \]
\[ H_1: \text{Median difference} \neq 0 \]
Real-World Case: Manufacturing Quality Control
A manufacturing plant evaluates the effectiveness of a new machine calibration procedure designed to reduce product defects. For each production batch, the number of defective items is recorded before and after calibration.
Although the defect data are not normally distributed, the magnitude of change in defect counts is reliable and meaningful. Therefore, the Wilcoxon Signed-Rank Test is applied to assess whether the median change in defect counts differs significantly from zero.
Step 1: Compute Paired Differences
For each batch:
\[ d_i = \text{Defects}_{\text{after}} - \text{Defects}_{\text{before}} \]
# Defect counts before and after calibration
before <- c(12, 15, 10, 18, 20, 14, 16, 22, 11, 19)
after <- c(10, 14, 11, 15, 18, 13, 15, 20, 11, 17)
# Compute paired differences
diff <- after - before
diff
[1] -2 -1  1 -3 -2 -1 -1 -2  0 -2
Step 2: Remove Zero Differences
- If \(d_i = 0\), discard the observation
- Let \(n\) be the number of non-zero differences
# Remove zero differences
diff_nonzero <- diff[diff != 0]
diff_nonzero
[1] -2 -1  1 -3 -2 -1 -1 -2 -2
Step 3: Rank the Absolute Differences
- Compute \(|d_i|\) for each remaining pair
- Rank the values from smallest to largest
- If ties occur, assign average ranks
# Absolute differences
abs_diff <- abs(diff_nonzero)
# Rank absolute differences
ranks <- rank(abs_diff)
data.frame(
Difference = diff_nonzero,
AbsDifference = abs_diff,
Rank = ranks
)
  Difference AbsDifference Rank
1 -2 2 6.5
2 -1 1 2.5
3 1 1 2.5
4 -3 3 9.0
5 -2 2 6.5
6 -1 1 2.5
7 -1 1 2.5
8 -2 2 6.5
9 -2 2 6.5
Step 4: Assign Signs to Ranks
- If \(d_i > 0\), assign a positive rank
- If \(d_i < 0\), assign a negative rank
# Assign signed ranks
signed_ranks <- ifelse(diff_nonzero > 0, ranks, -ranks)
data.frame(
Difference = diff_nonzero,
Rank = ranks,
SignedRank = signed_ranks
)
  Difference Rank SignedRank
1 -2 6.5 -6.5
2 -1 2.5 -2.5
3 1 2.5 2.5
4 -3 9.0 -9.0
5 -2 6.5 -6.5
6 -1 2.5 -2.5
7 -1 2.5 -2.5
8 -2 6.5 -6.5
9 -2 6.5 -6.5
Step 5: Compute Test Statistic
Let:
- \(W^+\) = sum of positive ranks
- \(W^-\) = sum of negative ranks
The test statistic is defined as:
\[ W = \min(W^+, W^-) \]
# Sum of positive and negative ranks
W_pos <- sum(ranks[diff_nonzero > 0])
W_neg <- sum(ranks[diff_nonzero < 0])
W_pos
[1] 2.5
W_neg
[1] 42.5
# Test statistic
W <- min(W_pos, W_neg)
W
[1] 2.5
For large samples (\(n > 20\)), the statistic may be approximated by a normal distribution.
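Under \(H_0\), \(W^+\) has mean \(n(n+1)/4\) and variance \(n(n+1)(2n+1)/24\). A minimal sketch of the approximation, ignoring tie corrections, is shown below; with only nine non-zero differences it is for illustration only:
# Normal approximation to the signed-rank statistic (no tie correction)
n_w <- length(diff_nonzero)                           # number of non-zero differences
mu_W <- n_w * (n_w + 1) / 4                           # mean of W+ under H0
sigma_W <- sqrt(n_w * (n_w + 1) * (2 * n_w + 1) / 24) # standard deviation under H0
z <- (W - mu_W) / sigma_W
2 * pnorm(-abs(z))   # about 0.018, close to the wilcox.test() result below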
Step 6: Decision Rule
Compute the p-value using the Wilcoxon distribution or its normal approximation
# Perform Wilcoxon Signed-Rank Test
wilcox.test(
after,
before,
paired = TRUE,
alternative = "two.sided",
exact = TRUE
)
Wilcoxon signed rank test with continuity correction
data: after and before
V = 2.5, p-value = 0.01868
alternative hypothesis: true location shift is not equal to 0
Note that although exact = TRUE is requested, the differences contain ties and a zero, so R cannot compute an exact p-value; it falls back to the normal approximation with a continuity correction, as the output header shows. The reported V = 2.5 is the sum of the positive ranks, \(W^+\).
Reject \(H_0\) if:
\[ \text{p-value} < \alpha \]
Interpretation:
- Reject \(H_0\): There is sufficient evidence that the median difference is not zero, indicating that the calibration significantly affects defect counts.
- Fail to reject \(H_0\): There is insufficient evidence to conclude that the calibration has a significant effect.
The Wilcoxon Signed-Rank Test provides a balance between robustness and efficiency, making it a preferred nonparametric method for paired data when normality assumptions are violated.
10.4.3 Mann–Whitney U Test
The Mann–Whitney U Test is a nonparametric test used to compare two independent samples and assess whether they come from populations with the same central tendency. It is commonly regarded as the nonparametric alternative to the independent two-sample t-test.
Mann–Whitney U Test
Rather than comparing means, the Mann–Whitney U Test evaluates whether observations from one group tend to be larger or smaller than those from the other group based on their ranks.
The Mann–Whitney U Test is appropriate when:
- Two samples are independent
- Data are at least ordinal
- The population distributions are not normal
- Sample sizes may be small or unequal
- The shapes of the two distributions are similar
Key Characteristics:
- Uses ranks instead of raw data
- Does not assume normality
- Robust to outliers
- Tests differences in distribution location
Hypotheses
\[ H_0: \text{The two populations have the same distribution} \]
\[ H_1: \text{The two populations have different distributions} \]
(If distribution shapes are similar, this is often interpreted as a test of median equality.)
Real-World Case: Business and Marketing Analytics
A company wants to compare customer satisfaction scores between two independent marketing strategies (Strategy A and Strategy B). Survey responses are collected using a Likert scale, producing ordinal data that do not satisfy normality assumptions.
Because the two customer groups are independent and the data are ordinal, the Mann–Whitney U Test is used to determine whether customer satisfaction differs significantly between the two strategies.
Step 1: Combine and Rank the Data
- Combine observations from both groups
- Rank all observations from smallest to largest
- Assign average ranks in the presence of ties
# Customer satisfaction scores (Likert scale: 1–5)
strategy_A <- c(3, 4, 4, 5, 3, 4, 5, 4)
strategy_B <- c(2, 3, 3, 4, 2, 3, 4, 3)
# Combine data
scores <- c(strategy_A, strategy_B)
group <- factor(c(rep("A", length(strategy_A)),
rep("B", length(strategy_B))))
# Rank combined data
ranks <- rank(scores)
data.frame(
Score = scores,
Group = group,
Rank = ranks
)
   Score Group Rank
1 3 A 5.5
2 4 A 11.5
3 4 A 11.5
4 5 A 15.5
5 3 A 5.5
6 4 A 11.5
7 5 A 15.5
8 4 A 11.5
9 2 B 1.5
10 3 B 5.5
11 3 B 5.5
12 4 B 11.5
13 2 B 1.5
14 3 B 5.5
15 4 B 11.5
16 3 B 5.5
Step 2: Compute Rank Sums
Let:
- \(R_1\) = sum of ranks for Group 1
- \(R_2\) = sum of ranks for Group 2
# Rank sums
R1 <- sum(ranks[group == "A"])
R2 <- sum(ranks[group == "B"])
R1
[1] 88
R2
[1] 48
Step 3: Compute the U Statistics
For sample sizes \(n_1\) and \(n_2\):
\[ U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1 \]
\[ U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2 \]
# Sample sizes
n1 <- length(strategy_A)
n2 <- length(strategy_B)
# Compute U statistics
U1 <- n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 <- n1 * n2 + n2 * (n2 + 1) / 2 - R2
U1
[1] 12
U2
[1] 52
Step 4: Test Statistic
The test statistic is:
\[ U = \min(U_1, U_2) \]
# Test statistic
U <- min(U1, U2)
U
[1] 12
For large samples, \(U\) can be approximated by a normal distribution.
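Under \(H_0\), \(U\) has mean \(n_1 n_2 / 2\) and variance \(n_1 n_2 (n_1 + n_2 + 1)/12\). A minimal sketch of the approximation, again ignoring tie corrections:
# Normal approximation to U (no tie correction; illustrative only)
mu_U <- n1 * n2 / 2                            # mean of U under H0
sigma_U <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard deviation under H0
z <- (U - mu_U) / sigma_U
2 * pnorm(-abs(z))   # about 0.036; wilcox.test() below additionally corrects for ties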
Step 5: Decision Rule
Compute the p-value from the Mann–Whitney distribution or its normal approximation
# Mann–Whitney U Test using R
wilcox.test(
strategy_A,
strategy_B,
alternative = "two.sided",
exact = TRUE
)
Wilcoxon rank sum test with continuity correction
data: strategy_A and strategy_B
W = 52, p-value = 0.03033
alternative hypothesis: true location shift is not equal to 0
As in the previous section, ties prevent an exact p-value, so R uses the normal approximation with a continuity correction despite exact = TRUE. R reports \(W = R_1 - n_1(n_1+1)/2 = 88 - 36 = 52\), which here coincides with \(U_2\).
Reject \(H_0\) if:
\[ \text{p-value} < \alpha \]
Interpretation:
- Reject \(H_0\): There is sufficient evidence that the two groups differ in central tendency or distribution location.
- Fail to reject \(H_0\): There is insufficient evidence to conclude a difference between the two groups.
The Mann–Whitney U Test is a powerful and flexible nonparametric method for comparing two independent groups when parametric assumptions are violated or data are ordinal.
10.4.4 Kruskal–Wallis Test
The Kruskal–Wallis Test is a nonparametric test used to compare three or more independent groups and determine whether they originate from the same population distribution. It is commonly regarded as the nonparametric alternative to one-way ANOVA.
Kruskal–Wallis Test
Instead of comparing group means, the Kruskal–Wallis Test evaluates differences in central tendency by comparing the ranks of observations across groups.
The Kruskal–Wallis Test is appropriate when:
- There are three or more independent samples
- Data are at least ordinal
- The population distributions are not normal
- Sample sizes may be unequal
- The shapes of the group distributions are similar
Key Characteristics:
- Uses ranks instead of raw data
- Does not assume normality
- Robust to outliers
- Suitable for small sample sizes
Hypotheses
\[ H_0: \text{All populations have the same distribution} \]
\[ H_1: \text{At least one population has a different distribution} \]
(If distribution shapes are similar, this is often interpreted as a test of median equality.)
Real-World Case: Engineering and Quality Control
A manufacturing company wants to compare product defect rates across three different production machines (Machine A, B, and C). The defect counts are skewed and contain outliers due to occasional machine malfunctions. Because the data are non-normal and the machines operate independently, the Kruskal–Wallis Test is applied to determine whether there are statistically significant differences in defect rates among the machines.
Step 1: Combine and Rank All Observations
- Combine observations from all groups into a single dataset
- Rank all values from smallest to largest
- Assign average ranks in the presence of ties
# Defect counts for each machine
machine_A <- c(5, 7, 6, 8, 9, 6, 7)
machine_B <- c(10, 12, 11, 9, 13, 10, 14)
machine_C <- c(4, 5, 6, 4, 5, 7, 6)
# Combine data
defects <- c(machine_A, machine_B, machine_C)
machine <- factor(c(rep("A", length(machine_A)),
rep("B", length(machine_B)),
rep("C", length(machine_C))))
# Rank all observations
ranks <- rank(defects)
data.frame(
Defects = defects,
Machine = machine,
Rank = ranks
)
   Defects Machine Rank
1 5 A 4.0
2 7 A 11.0
3 6 A 7.5
4 8 A 13.0
5 9 A 14.5
6 6 A 7.5
7 7 A 11.0
8 10 B 16.5
9 12 B 19.0
10 11 B 18.0
11 9 B 14.5
12 13 B 20.0
13 10 B 16.5
14 14 B 21.0
15 4 C 1.5
16 5 C 4.0
17 6 C 7.5
18 4 C 1.5
19 5 C 4.0
20 7 C 11.0
21 6 C 7.5
Step 2: Compute Rank Sums for Each Group
Let:
- \(R_j\) = sum of ranks for group \(j\)
- \(n_j\) = sample size of group \(j\)
- \(k\) = number of groups
- \(N = \sum_{j=1}^{k} n_j\)
# Sample sizes
n_A <- length(machine_A)
n_B <- length(machine_B)
n_C <- length(machine_C)
# Rank sums
R_A <- sum(ranks[machine == "A"])
R_B <- sum(ranks[machine == "B"])
R_C <- sum(ranks[machine == "C"])
R_A
[1] 68.5
R_B
[1] 125.5
R_C
[1] 37
Step 3: Compute the Test Statistic
The Kruskal–Wallis test statistic is:
\[ H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1) \]
# Total sample size
N <- length(defects)
# Compute H statistic
H <- (12 / (N * (N + 1))) * (
(R_A^2 / n_A) +
(R_B^2 / n_B) +
(R_C^2 / n_C)
) - 3 * (N + 1)
H
[1] 14.93321
Step 4: Sampling Distribution
For sufficiently large samples, the test statistic follows a chi-square distribution:
\[ H \sim \chi^2_{k-1} \]
# Degrees of freedom
df <- 3 - 1
# Compute p-value
p_value <- 1 - pchisq(H, df)
p_value
[1] 0.0005718666
Step 5: Decision Rule
Compute the p-value from the chi-square distribution
# Kruskal–Wallis test using R
kruskal.test(defects ~ machine)
Kruskal-Wallis rank sum test
data: defects by machine
Kruskal-Wallis chi-squared = 15.14, df = 2, p-value = 0.0005158
The built-in kruskal.test() applies a correction for ties, which is why its statistic (15.14) is slightly larger than the uncorrected value computed manually above (14.93); the conclusion is unchanged.
Reject \(H_0\) if:
\[ \text{p-value} < \alpha \]
Interpretation:
- Reject \(H_0\): There is sufficient evidence that at least one group differs in central tendency or distribution location.
- Fail to reject \(H_0\): There is insufficient evidence to conclude a difference among the groups.
Post-Hoc Analysis
If \(H_0\) is rejected, post-hoc tests such as Dunn’s test may be conducted to identify which specific groups differ. The Kruskal–Wallis Test provides a robust and flexible approach for comparing multiple independent groups when parametric assumptions are violated.
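Dunn's test itself is provided by add-on packages (e.g., dunn.test or FSA) rather than base R; as a minimal base-R sketch, pairwise Wilcoxon rank-sum tests with an adjustment for multiple comparisons can serve a similar screening purpose:
# Post-hoc sketch: pairwise rank-sum tests with Holm adjustment
# (ties in the defect counts trigger warnings about exact p-values)
pairwise.wilcox.test(defects, machine, p.adjust.method = "holm")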
10.4.5 Friedman Test
The Friedman Test is a nonparametric statistical test used to detect differences among three or more related (paired) groups. It is commonly regarded as the nonparametric alternative to the one-way repeated-measures ANOVA.
Friedman Test
Rather than comparing means, the Friedman Test evaluates differences in central tendency by comparing within-subject ranks across treatments or conditions.
The Friedman Test is appropriate when:
- The same subjects are measured under three or more conditions
- Observations are paired or repeated measures
- Data are at least ordinal
- The normality assumption for repeated-measures ANOVA is violated
- The magnitude of measurements is comparable across conditions
Key Characteristics:
- Uses ranks within each subject/block
- Does not assume normality
- Controls for subject-to-subject variability
- Suitable for small sample sizes
Hypotheses
\[ H_0: \text{All treatments have the same distribution} \]
\[ H_1: \text{At least one treatment has a different distribution} \]
(If distribution shapes are similar, this is often interpreted as a test of median equality across treatments.)
Real-World Case: Human Performance Evaluation
A company evaluates employee productivity under three different work schedules: fixed hours, flexible hours, and remote work. The same employees are evaluated under each schedule over separate periods.
Because the productivity scores are not normally distributed and measurements are repeated on the same individuals, the Friedman Test is used to determine whether productivity differs significantly across the three work conditions.
Step 1: Organize Data into Blocks
- Each row represents a subject (block)
- Each column represents a treatment or condition
# Productivity scores for each employee under three schedules
schedule_A <- c(8, 7, 9, 6, 10)   # fixed hours
schedule_B <- c(6, 5, 7, 5, 8)    # flexible hours
schedule_C <- c(9, 8, 10, 7, 11)  # remote work
# Create data frame (blocks = employees)
productivity <- data.frame(
  Employee = factor(1:5),
  A = schedule_A,
  B = schedule_B,
  C = schedule_C
)
productivity
  Employee  A B  C
1        1  8 6  9
2        2  7 5  8
3        3  9 7 10
4        4  6 5  7
5        5 10 8 11
Step 2: Rank Data Within Each Block
- Rank the values within each subject from smallest to largest
- Assign average ranks in case of ties
# Rank within each employee (block)
ranks <- t(apply(productivity[, -1], 1, rank))
colnames(ranks) <- c("A", "B", "C")
ranks
     A B C
[1,] 2 1 3
[2,] 2 1 3
[3,] 2 1 3
[4,] 2 1 3
[5,] 2 1 3
Step 3: Compute Rank Sums for Each Treatment
Let:
- \(R_j\) = sum of ranks for treatment \(j\)
- \(n\) = number of subjects (blocks)
- \(k\) = number of treatments
# Rank sums
R <- colSums(ranks)
n <- nrow(productivity)  # number of blocks (employees)
k <- ncol(ranks)         # number of treatments (schedules)
R
 A  B  C
10  5 15
Step 4: Compute the Test Statistic
The Friedman test statistic is:
\[ Q = \frac{12}{n k (k+1)} \sum_{j=1}^{k} R_j^2 - 3n(k+1) \]
# Compute Friedman statistic manually
Q <- (12 / (n * k * (k + 1))) * sum(R^2) - 3 * n * (k + 1)
Q
[1] 10
Step 5: Sampling Distribution
For sufficiently large samples, the test statistic follows a chi-square distribution:
\[ Q \sim \chi^2_{k-1} \]
# Degrees of freedom
df <- k - 1
# Compute p-value
p_value <- 1 - pchisq(Q, df)
p_value
[1] 0.006737947
Step 6: Decision Rule
Compute the p-value from the chi-square distribution
# Friedman test using R
friedman.test(as.matrix(productivity[, -1]))
Friedman rank sum test
data: as.matrix(productivity[, -1])
Friedman chi-squared = 10, df = 2, p-value = 0.006738
Reject \(H_0\) if:
\[ \text{p-value} < \alpha \]
Interpretation:
- Reject \(H_0\): There is sufficient evidence that at least one treatment differs in central tendency.
- Fail to reject \(H_0\): There is insufficient evidence to conclude a difference among treatments.
Post-Hoc Analysis
If \(H_0\) is rejected, post-hoc procedures such as the Nemenyi test or pairwise Wilcoxon signed-rank tests with adjustment may be used to identify which treatments differ.
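The Nemenyi test requires an add-on package (e.g., PMCMRplus); a minimal base-R sketch using pairwise Wilcoxon signed-rank tests with a Bonferroni adjustment is:
# Pairwise paired comparisons across schedules (Bonferroni-adjusted)
# With only five blocks the attainable p-values are coarse, so power is very limited
scores <- c(productivity$A, productivity$B, productivity$C)
schedule <- factor(rep(c("A", "B", "C"), each = nrow(productivity)))
pairwise.wilcox.test(scores, schedule, paired = TRUE,
                     p.adjust.method = "bonferroni")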
The Friedman Test is a powerful nonparametric method for analyzing repeated-measures or blocked data when parametric assumptions are violated.
10.4.6 Chi-Square Test
The Chi-Square Test is a nonparametric statistical test used to examine relationships between categorical variables. It is widely applied to determine whether observed frequencies differ significantly from expected frequencies under a specified hypothesis.
Chi-Square Test
The Chi-Square Test is commonly used for independence testing and goodness-of-fit analysis.
The Chi-Square Test is appropriate when:
- Data are categorical
- Observations are independent
- Frequencies (counts) are analyzed, not percentages
- Expected frequencies in each cell are sufficiently large (typically ≥ 5)
Key Characteristics:
- Based on frequency counts
- Does not assume normality
- Simple to compute and interpret
- Sensitive to sample size
Types of Chi-Square Tests
- Chi-Square Test of Independence: examines whether two categorical variables are associated.
- Chi-Square Goodness-of-Fit Test: determines whether an observed distribution matches a theoretical distribution.
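The goodness-of-fit variant compares observed counts against hypothesized proportions. A minimal sketch with made-up counts of 60 die rolls, testing whether all six faces are equally likely:
# Hypothetical die-roll counts; H0: each face has probability 1/6
rolls <- c(8, 12, 9, 11, 10, 10)
chisq.test(rolls, p = rep(1/6, 6))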
Hypotheses (Independence Test)
\[ H_0: \text{The two categorical variables are independent} \]
\[ H_1: \text{The two categorical variables are not independent} \]
Real-World Case: Social and Behavioral Sciences
A university wants to examine whether students’ study programs (Science, Engineering, Social Sciences) are associated with their preferred learning mode (online, hybrid, in-person).
Because both variables are categorical and the data consist of frequency counts, the Chi-Square Test of Independence is applied.
Step 1: Construct a Contingency Table
| Program / Learning Mode | Online | Hybrid | In-Person |
|---|---|---|---|
| Science | \(O_{11}\) | \(O_{12}\) | \(O_{13}\) |
| Engineering | \(O_{21}\) | \(O_{22}\) | \(O_{23}\) |
| Social Sciences | \(O_{31}\) | \(O_{32}\) | \(O_{33}\) |
# Observed frequencies
observed <- matrix(
c(40, 35, 25, # Science
30, 45, 25, # Engineering
50, 30, 20), # Social Sciences
nrow = 3,
byrow = TRUE
)
colnames(observed) <- c("Online", "Hybrid", "In-Person")
rownames(observed) <- c("Science", "Engineering", "Social Sciences")
observed
                Online Hybrid In-Person
Science 40 35 25
Engineering 30 45 25
Social Sciences 50 30 20
Step 2: Compute Expected Frequencies
For each cell:
\[ E_{ij} = \frac{(\text{Row Total})_i \times (\text{Column Total})_j}{\text{Grand Total}} \]
# Compute expected frequencies
expected <- chisq.test(observed)$expected
expected
                Online Hybrid In-Person
Science 40 36.66667 23.33333
Engineering 40 36.66667 23.33333
Social Sciences 40 36.66667 23.33333
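The same expected counts can be reproduced directly from the marginal totals, which verifies the formula above:
# Manual computation from row and column totals
expected_manual <- outer(rowSums(observed), colSums(observed)) / sum(observed)
all.equal(expected_manual, expected)   # TRUE: matches chisq.test(observed)$expected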
Step 3: Compute the Test Statistic
The Chi-Square statistic is:
\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \]
where:
- \(O_{ij}\) = observed frequency
- \(E_{ij}\) = expected frequency
- \(r\) = number of rows
- \(c\) = number of columns
# Chi-square statistic
chisq_stat <- sum((observed - expected)^2 / expected)
chisq_stat
[1] 8.896104
Step 4: Degrees of Freedom
\[ df = (r - 1)(c - 1) \]
# Degrees of freedom
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
df
[1] 4
Step 5: Decision Rule
Compute the p-value from the chi-square distribution
# Chi-Square Test of Independence using R
chisq.test(observed)
Pearson's Chi-squared test
data: observed
X-squared = 8.8961, df = 4, p-value = 0.06375
Reject \(H_0\) if:
\[ \text{p-value} < \alpha \]
Interpretation
- Reject \(H_0\): There is a significant association between the categorical variables.
- Fail to reject \(H_0\): There is insufficient evidence to conclude an association.
Effect Size (Optional)
For contingency tables, effect size can be measured using Cramér’s V:
\[ V = \sqrt{\frac{\chi^2}{N(k - 1)}} \]
where \(k\) is the smaller of \(r\) or \(c\).
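A minimal sketch computing Cramér's V from the statistic obtained above:
# Cramér's V for the 3 x 3 contingency table above
N_total <- sum(observed)                       # grand total (300)
k_min <- min(nrow(observed), ncol(observed))   # smaller table dimension (3)
V <- sqrt(chisq_stat / (N_total * (k_min - 1)))
V   # about 0.12, indicating a weak association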
The Chi-Square Test is a fundamental tool for analyzing categorical data and identifying relationships between qualitative variables in many applied research fields.
10.5 Advantages and Limitations
The following table presents a summary of the advantages and limitations of nonparametric statistical methods, which may be considered when selecting an appropriate analytical approach based on data characteristics and research objectives.
| Method | Advantages | Disadvantages |
|---|---|---|
| Sign Test | Fewer assumptions; extremely robust to outliers | Very low statistical power; ignores magnitude of differences |
| Wilcoxon Signed-Rank Test | More powerful than Sign Test; uses magnitude and direction | Requires symmetric distribution of differences |
| Mann–Whitney U Test | Suitable for ordinal data; robust to non-normality | Tests distributional shift rather than mean difference |
| Kruskal–Wallis Test | Extends Mann–Whitney to multiple groups; no normality assumption | Does not indicate which groups differ without post-hoc tests |
| Friedman Test | Suitable for repeated measures; controls subject variability | Less powerful than parametric repeated-measures ANOVA |
| Chi-Square Test | Ideal for categorical data; simple and intuitive | Sensitive to small expected frequencies; no direction of association |
10.6 Nonparametric Case Studies
This section presents several real-world case studies illustrating the application of nonparametric statistical methods in different fields. Each case highlights the characteristics of the data, the rationale for choosing a nonparametric approach, and the appropriate statistical test used to address the research question.
10.6.1 Case Study 1
Manufacturing Quality Control (Sign Test):
A manufacturing plant investigates whether a new machine calibration procedure reduces the number of defective products. Defect counts are recorded before and after calibration for the same production batches. The data contain extreme values due to occasional machine failures and do not satisfy normality assumptions. Objective: Test whether the median change in defect counts differs from zero.
10.6.2 Case Study 2
Medical Treatment Evaluation (Wilcoxon Signed-Rank Test):
A clinical researcher examines whether a new therapy reduces patient pain scores measured on an ordinal scale. Pain levels are recorded for the same patients before and after treatment. The distribution of differences is non-normal, but the magnitude of change is meaningful. Objective: Determine whether the median pain score after treatment differs from before treatment.
10.6.3 Case Study 3
Marketing Strategy Comparison (Mann–Whitney U Test):
A company compares customer satisfaction ratings between two independent marketing strategies. Survey responses are collected using a Likert scale from two separate customer groups. Objective: Assess whether customer satisfaction differs between the two strategies.
10.6.4 Case Study 4
Production Line Performance (Kruskal–Wallis Test):
An engineering team evaluates defect rates across three independent production machines. The defect data are skewed and contain outliers. Objective: Identify whether at least one machine has a different defect rate distribution.
10.6.5 Case Study 5
Human Performance Analysis (Friedman Test):
An organization studies employee productivity under three different work conditions (on-site, hybrid, remote). Productivity scores are measured for the same employees under each condition. Objective: Determine whether productivity differs across work conditions.
10.6.6 Case Study 6
Education and Learning Preferences (Chi-Square Test):
A university analyzes the relationship between students’ study programs and their preferred learning modes (online, hybrid, in-person). Data are collected as frequency counts. Objective: Examine whether learning preferences are associated with study programs.
These case studies demonstrate how nonparametric methods provide flexible and robust solutions when data violate parametric assumptions or involve ordinal and categorical measurements.
