4.5 Categorical Data Analysis

Categorical Data Analysis is used when the outcome variables are categorical.

Nominal Variables: Categories have no logical order (e.g., sex: male, female).
Ordinal Variables: Categories have a logical order, but the relative distances between values are not well defined (e.g., small, medium, large).

In categorical data, we often analyze how the distribution of one variable changes with the levels of another variable. For example, row percentages may differ across columns in a contingency table.

4.5.1 Association Tests

4.5.1.1 Small Samples

4.5.1.1.1 Fisher’s Exact Test

For small samples, approximate tests based on the asymptotic normality of $\hat{p}_1 - \hat{p}_2$ (the difference in proportions) are unreliable. In such cases, we use Fisher’s Exact Test to evaluate:

Null Hypothesis ( $H_0$ ): $p_1 = p_2$ (no association between variables),
Alternative Hypothesis ( $H_a$ ): $p_1 \neq p_2$ (an association exists).

Assumptions

$X_1$ and $X_2$ are independent Binomial random variables:
- $X_1 \sim \text{Binomial}(n_1, p_1)$ ,
- $X_2 \sim \text{Binomial}(n_2, p_2)$ .
$x_1$ and $x_2$ are the observed values (successes in each sample).
Total sample size is $n = n_1 + n_2$ .
Total successes are $m = x_1 + x_2$ .

Under $H_0$ , by conditioning on $m$ , the total number of successes, the number of successes in sample 1 follows a Hypergeometric distribution that does not depend on the unknown common proportion.
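
Written out in the notation above, this conditional distribution is the standard Hypergeometric probability mass function (stated here for reference, assuming $H_0$ holds):

$P(X_1 = x_1 \mid X_1 + X_2 = m) = \frac{\binom{n_1}{x_1} \binom{n_2}{m - x_1}}{\binom{n}{m}}, \quad \max(0, m - n_2) \le x_1 \le \min(n_1, m).$

Fisher’s Exact Test sums these probabilities over the tables that are at least as extreme as the observed one.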

Test Statistic

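For comparison, the large-sample test of $H_0: p_1 = p_2$ against $H_a: p_1 \neq p_2$ uses the statistic below, which approximately follows a $\chi^2_1$ distribution under $H_0$ ; we reject $H_0$ when $Z^2 > \chi^2_{1, \alpha}$ . Fisher’s Exact Test replaces this approximation with exact Hypergeometric probabilities.

$Z^2 = \left( \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}} \right)^2$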

where:

$\hat{p}_1$ and $\hat{p}_2$ are the observed proportions of successes in samples 1 and 2,
$\hat{p}$ is the pooled proportion: $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2},$
$\chi^2_{1, \alpha}$ is the upper $\alpha$ critical value of the Chi-squared distribution with 1 degree of freedom.

Fisher’s Exact Test can be extended to a contingency table setting to test whether the observed frequencies differ significantly from the expected frequencies under the null hypothesis of no association.

# Create a 2x2 contingency table
data_table <- matrix(c(8, 2, 1, 5), nrow = 2, byrow = TRUE)
colnames(data_table) <- c("Success", "Failure")
rownames(data_table) <- c("Group 1", "Group 2")

# Display the table
data_table
#>         Success Failure
#> Group 1       8       2
#> Group 2       1       5

# Perform Fisher's Exact Test
fisher_result <- fisher.test(data_table)

# Display the results
fisher_result
#> 
#>  Fisher's Exact Test for Count Data
#> 
#> data:  data_table
#> p-value = 0.03497
#> alternative hypothesis: true odds ratio is not equal to 1
#> 95 percent confidence interval:
#>     1.008849 1049.791446
#> sample estimates:
#> odds ratio 
#>   15.46969

The output of fisher.test() includes:

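p-value: The probability, under the null hypothesis and given the observed margins, of a table at least as extreme as the one observed.
Alternative Hypothesis: Indicates whether the test is two-sided or one-sided.
Odds ratio and confidence interval: The conditional maximum likelihood estimate of the odds ratio, together with its confidence interval.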
If the p-value is less than $\alpha$ , reject $H_0$ and conclude that there is a significant association between the two variables.
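
As a cross-check, the two-sided p-value above can be reproduced directly from the Hypergeometric distribution. The sketch below assumes the usual two-sided rule of summing the probabilities of all tables that are no more likely than the observed one; the variable names are illustrative.

```r
# Margins and observed count from the 2x2 example above
n1 <- 10   # Group 1 size (row total)
n2 <- 6    # Group 2 size (row total)
m  <- 9    # total number of successes (column total)
x1 <- 8    # observed successes in Group 1

# All values of X1 that are possible given the margins
support <- max(0, m - n2):min(m, n1)

# Hypergeometric probability of each possible table under H0
probs <- dhyper(support, n1, n2, m)

# Two-sided p-value: sum over tables no more probable than the observed one
# (the small relative tolerance guards against floating-point ties)
p_obs <- dhyper(x1, n1, n2, m)
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_two_sided
```

The result should agree with the p-value printed by fisher.test() for the same table.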

4.5.1.1.2 Exact Chi-Square Test

For small samples where the normal approximation does not apply, we can compute the exact Chi-Square test by using Fisher’s Exact Test or Monte Carlo simulation methods.

The Chi-Square test statistic for an $r \times c$ table (with $r = c = 2$ in the 2x2 case) is:

$\chi^2 = \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$

where:

$O_{ij}$ : Observed frequency in cell $(i, j)$ ,
$E_{ij}$ : Expected frequency under the null hypothesis,
$r$ : Number of rows,
$c$ : Number of columns.
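
One way to obtain a p-value without the large-sample approximation is the Monte Carlo option of chisq.test(). The sketch below reuses the small 2x2 table from the Fisher example purely for illustration; the number of replicates `B` and the seed are arbitrary choices.

```r
# Small 2x2 table (reused from the Fisher example for illustration)
tab <- matrix(c(8, 2, 1, 5), nrow = 2, byrow = TRUE)

# Monte Carlo p-values vary slightly from run to run, so fix the seed
set.seed(1)
mc_result <- chisq.test(tab, simulate.p.value = TRUE, B = 10000)

# Pearson statistic (no continuity correction is applied when simulating)
mc_result$statistic

# Simulated p-value, based on random tables with the observed margins
mc_result$p.value
```

Because the reference distribution is simulated conditional on the margins, the degrees of freedom are reported as NA.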

4.5.1.2 Large Samples

4.5.1.2.1 Pearson Chi-Square Test

The Pearson Chi-Square Test is commonly used to test whether there is an association between two categorical variables. It compares the observed counts in a contingency table to the expected counts under the null hypothesis.

The test statistic is:

$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

The test is applied in settings where multiple proportions or frequencies are compared across independent surveys or experiments.

Null Hypothesis ( $H_0$ ): The observed data are consistent with the expected values (no association or no deviation from a model).
Alternative Hypothesis ( $H_a$ ): The observed data differ significantly from the expected values.

Characteristics of the Test

Validation of Models:
In some cases, $H_0$ represents the model whose validity is being tested. The goal is not necessarily to reject the model but to check whether the data are consistent with it. Deviations may be due to random chance.
Strength of Association:
The Chi-Square Test detects whether an association exists but does not measure the strength of the association. For measuring strength, metrics like Cramér’s V or the Phi coefficient should be used.
Effect of Sample Size:
- The Chi-Square statistic reflects sample size. If the sample size is doubled (e.g., duplicating observations), the $\chi^2$ statistic will also double, even though the strength of the association remains unchanged.
- This sensitivity can sometimes lead to detecting significant results that are not practically meaningful.
Expected Cell Frequencies:
- As a common rule of thumb, the $\chi^2$ approximation is unreliable if more than 20% of the cells in a contingency table have expected frequencies less than 5.
- For small sample sizes, Fisher’s Exact Test or exact p-values should be used instead.
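
The effect of sample size can be checked numerically. The sketch below uses a hypothetical 2x3 table (the counts are invented for illustration) and a small helper, `cramers_v()`, that is defined here and is not part of base R.

```r
# Hypothetical 2x3 table of counts (illustrative only)
tab <- matrix(c(20, 30, 50,
                30, 30, 40), nrow = 2, byrow = TRUE)

# Cramér's V: sqrt(chi^2 / (n * min(r - 1, c - 1)))
cramers_v <- function(x) {
  chi2 <- unname(chisq.test(x, correct = FALSE)$statistic)
  sqrt(chi2 / (sum(x) * (min(dim(x)) - 1)))
}

# Chi-square statistic for the original table and for the doubled table
chi2_original <- unname(chisq.test(tab, correct = FALSE)$statistic)
chi2_doubled  <- unname(chisq.test(2 * tab, correct = FALSE)$statistic)

# Ratio of the two statistics
chi2_ratio <- chi2_doubled / chi2_original
chi2_ratio

# Effect size for the original and the doubled table
v_values <- c(original = cramers_v(tab), doubled = cramers_v(2 * tab))
v_values
```

Doubling every count doubles the statistic, while Cramér’s V is the same for both tables.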

Test for a Single Proportion
We test whether the observed proportion of successes equals 0.5.

$\begin{aligned} H_0&: p_J = 0.5 \\ H_a&: p_J < 0.5 \end{aligned}$

# Observed data
july.x <- 480
july.n <- 1000
# Test for single proportion
prop.test(
  x = july.x,
  n = july.n,
  p = 0.5,
  alternative = "less",
  correct = FALSE
)
#> 
#>  1-sample proportions test without continuity correction
#> 
#> data:  july.x out of july.n, null probability 0.5
#> X-squared = 1.6, df = 1, p-value = 0.103
#> alternative hypothesis: true p is less than 0.5
#> 95 percent confidence interval:
#>  0.0000000 0.5060055
#> sample estimates:
#>    p 
#> 0.48
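
The reported statistic can be reproduced by hand. This is a minimal sketch of the one-sample z test, using the standard error under $H_0$ ; the variable names are illustrative.

```r
x  <- 480    # observed successes
n  <- 1000   # sample size
p0 <- 0.5    # proportion under H0

p_hat <- x / n

# z statistic with the standard error evaluated under H0
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# The square of z is the X-squared value reported by prop.test()
z^2

# Lower-tail p-value, matching alternative = "less"
p_lower <- pnorm(z)
p_lower
```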

Test for Equality of Proportions Between Two Groups
We test whether the proportions of successes in July and September are equal.

$\begin{aligned} H_0&: p_J = p_S \\ H_a&: p_J \neq p_S \end{aligned}$

# Observed data for two groups
sept.x <- 704
sept.n <- 1600
# Test for equality of proportions
prop.test(
  x = c(july.x, sept.x),
  n = c(july.n, sept.n),
  correct = FALSE
)
#> 
#>  2-sample test for equality of proportions without continuity correction
#> 
#> data:  c(july.x, sept.x) out of c(july.n, sept.n)
#> X-squared = 3.9701, df = 1, p-value = 0.04632
#> alternative hypothesis: two.sided
#> 95 percent confidence interval:
#>  0.0006247187 0.0793752813
#> sample estimates:
#> prop 1 prop 2 
#>   0.48   0.44
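
The X-squared value above is the squared two-sample z statistic with a pooled proportion. A minimal sketch of that calculation, with illustrative variable names:

```r
x <- c(480, 704)     # successes in July and September
n <- c(1000, 1600)   # sample sizes

p_hat  <- x / n             # sample proportions
p_pool <- sum(x) / sum(n)   # pooled proportion under H0

# Squared z statistic using the pooled standard error
z2 <- (p_hat[1] - p_hat[2])^2 / (p_pool * (1 - p_pool) * sum(1 / n))
z2

# Two-sided p-value from the chi-square distribution with 1 df
p_value <- pchisq(z2, df = 1, lower.tail = FALSE)
p_value
```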

Comparison of Proportions for Multiple Groups
	Experiment 1	Experiment 2	…	Experiment k
Number of successes	$x_1$	$x_2$	…	$x_k$
Number of failures	$n_1 - x_1$	$n_2 - x_2$	…	$n_k - x_k$
Total	$n_1$	$n_2$	…	$n_k$

We test the null hypothesis:

$H_0: p_1 = p_2 = \dots = p_k$

against the alternative that at least one proportion differs.

Pooled Proportion

Assuming $H_0$ is true, we estimate the common value of the probability of success as:

$\hat{p} = \frac{x_1 + x_2 + \dots + x_k}{n_1 + n_2 + \dots + n_k}.$

The expected counts under $H_0$ are:

Expected counts for a goodness-of-fit test with estimated proportion
	Experiment 1	Experiment 2	…	Experiment k
Success	$n_1 \hat{p}$	$n_2 \hat{p}$	…	$n_k \hat{p}$
Failure	$n_1(1-\hat{p})$	$n_2(1-\hat{p})$	…	$n_k(1-\hat{p})$
Total	$n_1$	$n_2$	…	$n_k$

The test statistic is:

$\chi^2 = \sum_{\text{all cells}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$

with $k - 1$ degrees of freedom.
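
The calculation can be sketched for $k = 3$ groups as follows. The counts are hypothetical and chosen only for illustration; prop.test() is used at the end as a check, since it carries out the same test when given more than two groups.

```r
# Hypothetical successes and sample sizes for k = 3 experiments
x <- c(45, 60, 30)
n <- c(100, 120, 80)

# Pooled proportion under H0
p_pool <- sum(x) / sum(n)

# Observed and expected counts (row 1: successes, row 2: failures)
observed <- rbind(x, n - x)
expected <- rbind(n * p_pool, n * (1 - p_pool))

# Pearson statistic, degrees of freedom, and p-value
chi2    <- sum((observed - expected)^2 / expected)
df      <- length(x) - 1
p_value <- pchisq(chi2, df, lower.tail = FALSE)
c(chi2 = chi2, df = df, p_value = p_value)

# Built-in equivalent for comparison
builtin <- prop.test(x, n)
builtin$statistic
```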

Two-Way Contingency Tables

When categorical data are cross-classified, we create a two-way table of observed counts.

General form contingency table with marginal totals
	1	2	…	j	…	c	Row Total
1	$n_{11}$	$n_{12}$	…	$n_{1j}$	…	$n_{1c}$	$n_{1.}$
2	$n_{21}$	$n_{22}$	…	$n_{2j}$	…	$n_{2c}$	$n_{2.}$
…	…	…	…	…	…	…	…
r	$n_{r1}$	$n_{r2}$	…	$n_{rj}$	…	$n_{rc}$	$n_{r.}$
Column Total	$n_{.1}$	$n_{.2}$	…	$n_{.j}$	…	$n_{.c}$	$n_{..}$

Sampling Designs

Design 1: Total Sample Size Fixed
- A single random sample of size $n$ is drawn from the population.
- Units are cross-classified into $r$ rows and $c$ columns. Both row and column totals are random variables.
- The cell counts $n_{ij}$ follow a multinomial distribution with probabilities $p_{ij}$ such that: $\sum_{i=1}^r \sum_{j=1}^c p_{ij} = 1.$
- Let $p_{ij} = P(X = i, Y = j)$ be the joint probability, where $X$ is the row variable and $Y$ is the column variable.
- Null Hypothesis of Independence: $H_0: p_{ij} = p_{i.} p_{.j}, \quad \text{where } p_{i.} = P(X = i) \text{ and } p_{.j} = P(Y = j).$
- Alternative Hypothesis: $H_a: p_{ij} \neq p_{i.} p_{.j} \quad \text{for at least one pair } (i, j).$
Design 2: Row Totals Fixed
- Random samples of sizes $n_1, n_2, \dots, n_r$ are drawn independently from $r$ row populations.
- The row totals $n_{i.}$ are fixed, but column totals are random.
- Counts in each row follow independent multinomial distributions.
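- In this design, $p_{ij}$ denotes the conditional probability $P(Y = j | X = i)$ rather than a joint probability. The null hypothesis assumes that these conditional probabilities of the column variable $Y$ are the same across all rows: $H_0: p_{ij} = P(Y = j | X = i) = p_j \quad \text{for all } i \text{ and } j.$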
- Alternatively: $H_0: (p_{i1}, p_{i2}, \dots, p_{ic}) = (p_1, p_2, \dots, p_c) \quad \text{for all } i.$
- Alternative Hypothesis: $H_a: (p_{i1}, p_{i2}, \dots, p_{ic}) \text{ are not the same for all } i.$

Comparison of fixed total sample size vs. fixed row total designs in categorical data analysis
Design	Total Sample Size Fixed	Row Totals Fixed
Scenario	A single dataset or experiment where all observations are collected together as one sample.	Observations are collected separately for each row, with fixed totals for each row population.
Example	Survey with 100 respondents randomly selected, recording responses based on two categorical variables (e.g., age group and gender).	Stratified survey with specific numbers of individuals sampled from predefined groups (e.g., 30 males, 40 females, 30 non-binary).
Why This Design?	Models situations where the total number of observations is fixed. Both row and column categories emerge randomly. Tests for independence between two categorical variables (row and column).	Models scenarios where sampling occurs independently within predefined strata or groups. Tests for homogeneity of column proportions across rows, ignoring differences in total counts between rows.
Practical Use Case	Market Research: Do customer demographics (rows) and purchase behavior (columns) show a dependence? Biology: Is there an association between species (rows) and habitat types (columns)?	Public Health: Are smoking rates (columns) consistent across age groups (rows)? Education: Do pass rates (columns) differ across schools (rows), controlling for the number of students in each school?

Why Both Designs?

Real-World Sampling Constraints:
- Sometimes, you have control over row totals (e.g., fixed group sizes in stratified sampling).
- Other times, you collect data without predefined group sizes, and totals emerge randomly.
Different Null Hypotheses:
- Design 1 tests whether two variables are independent (e.g., does one variable predict the other?).
- Design 2 tests whether column proportions are homogeneous across groups (e.g., are the groups similar?).

# Sampling Design 1: Total Sample Size Fixed
# Parameters for the multinomial distribution
r <- 3  # Number of rows
c <- 4  # Number of columns
n <- 100  # Total sample size
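# Cell probabilities; these sum to 1.05, and rmultinom() rescales them to sum to 1
p <- matrix(c(0.1, 0.2, 0.1, 0.1,
              0.05, 0.15, 0.05, 0.1,
              0.05, 0.05, 0.025, 0.075), nrow = r, byrow = TRUE)

# Generate a single random sample
set.seed(123)  # For reproducibility
# Note: as.vector(p) flattens p column by column, whereas the reshape below
# fills by row, so the simulated cells do not line up with the positions in p
n_ij <- rmultinom(1, size = n, prob = as.vector(p))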

# Reshape into a contingency table
contingency_table_fixed_total <-
    matrix(n_ij,
           nrow = r,
           ncol = c,
           byrow = TRUE)
rownames(contingency_table_fixed_total) <- paste0("Row", 1:r)
colnames(contingency_table_fixed_total) <- paste0("Col", 1:c)

# Hypothesis testing (Chi-squared test of independence)
chisq_test_fixed_total <- chisq.test(contingency_table_fixed_total)

# Display results
print("Contingency Table (Total Sample Size Fixed):")
#> [1] "Contingency Table (Total Sample Size Fixed):"
print(contingency_table_fixed_total)
#>      Col1 Col2 Col3 Col4
#> Row1    8    6    4   24
#> Row2   18    1    9    7
#> Row3    2    7    5    9
print("Chi-squared Test Results:")
#> [1] "Chi-squared Test Results:"
print(chisq_test_fixed_total)
#> 
#>  Pearson's Chi-squared test
#> 
#> data:  contingency_table_fixed_total
#> X-squared = 28.271, df = 6, p-value = 8.355e-05

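All counts in the contingency table come from a single multinomial sample where both row and column totals are random.
Conclusion: Reject the null. The data suggest significant dependence between the row and column variables. This is expected here, because the cell probabilities used in the simulation do not factor into row and column marginals.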

# Sampling Design 2: Row Totals Fixed
# Parameters for the fixed row totals
n_row <- c(30, 40, 30)  # Row totals
c <- 4  # Number of columns
p_col <- c(0.25, 0.25, 0.25, 0.25)  # Common column probabilities under H0

# Generate independent multinomial samples for each row
set.seed(123)  # For reproducibility
row_samples <- lapply(n_row, function(size)
    t(rmultinom(1, size, prob = p_col)))

# Combine into a contingency table
contingency_table_fixed_rows <- do.call(rbind, row_samples)
rownames(contingency_table_fixed_rows) <- paste0("Row", 1:length(n_row))
colnames(contingency_table_fixed_rows) <- paste0("Col", 1:c)

# Hypothesis testing (Chi-squared test of homogeneity)
chisq_test_fixed_rows <- chisq.test(contingency_table_fixed_rows)

# Display results
print("Contingency Table (Row Totals Fixed):")
#> [1] "Contingency Table (Row Totals Fixed):"
print(contingency_table_fixed_rows)
#>      Col1 Col2 Col3 Col4
#> Row1    6   10    7    7
#> Row2   13   13    4   10
#> Row3    8   10    6    6
print("Chi-squared Test Results:")
#> [1] "Chi-squared Test Results:"
print(chisq_test_fixed_rows)
#> 
#>  Pearson's Chi-squared test
#> 
#> data:  contingency_table_fixed_rows
#> X-squared = 3.2069, df = 6, p-value = 0.7825

Row totals are fixed, and column counts within each row follow independent multinomial distributions.
Conclusion: Fail to reject the null. The data do not provide evidence of differences in column probabilities across rows. This is expected here, because every row was simulated from the same column probabilities.

Why Are the Results Different?

Data Generation Differences:
- In Design 1, the whole table is a single multinomial sample drawn from cell probabilities that do not factor into row and column marginals, so dependence is built into the simulated data.
- In Design 2, each row is an independent multinomial sample drawn from the same column probabilities, so the data are simulated under $H_0$ .
Null Hypotheses:
- Design 1 tests independence between the row and column variables.
- Design 2 tests homogeneity of column probabilities across rows.
- For a given table, both tests yield the same $\chi^2$ statistic, degrees of freedom, and p-value; only the interpretation differs.

Interpretation

The two results are not directly comparable because the simulated datasets and the null hypotheses differ:
- Design 1 focuses on whether rows and columns are independent across the entire table.
- Design 2 focuses on whether column distributions are consistent across rows.
Real-World Implication:
- If you are testing for independence (e.g., whether two variables are unrelated), use Design 1.
- If you are testing for consistency across groups (e.g., whether proportions are the same across categories), use Design 2.

Takeaways

The tests use the same statistical machinery (Chi-squared test), but their interpretations differ based on the experimental design and null hypothesis.
For the same table, the $\chi^2$ statistic and p-value are identical under both designs; the design determines whether the result is read as a statement about independence or about homogeneity.

4.5.1.2.2 Chi-Square Test for Independence

The expected frequencies $\hat{e}_{ij}$ under the null hypothesis are:

$\hat{e}_{ij} = \frac{n_{i.} n_{.j}}{n_{..}},$

where $n_{i.}$ and $n_{.j}$ are the row and column totals, respectively, and $n_{..}$ is the total sample size.

The test statistic is:

$\chi^2 = \sum_{i=1}^r \sum_{j=1}^c \frac{(n_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}} \sim \chi^2_{(r-1)(c-1)}.$

We reject $H_0$ at significance level $\alpha$ if:

$\chi^2 > \chi^2_{(r-1)(c-1), \alpha}.$
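
These formulas can be applied by hand to the table printed in the row-totals-fixed example above. A minimal sketch, with $\alpha = 0.05$ as an assumed significance level:

```r
# Table printed in the row-totals-fixed example
tab <- matrix(c( 6, 10, 7,  7,
                13, 13, 4, 10,
                 8, 10, 6,  6), nrow = 3, byrow = TRUE)

# Expected counts: (row total x column total) / grand total
e_hat <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Pearson statistic and its degrees of freedom
chi2 <- sum((tab - e_hat)^2 / e_hat)
df   <- (nrow(tab) - 1) * (ncol(tab) - 1)

# Upper-alpha critical value of the chi-square distribution
alpha    <- 0.05
critical <- qchisq(1 - alpha, df)

c(chi2 = chi2, df = df, critical = critical)

# TRUE would mean rejecting H0 at level alpha
reject <- chi2 > critical
reject
```

The statistic and degrees of freedom should match the chisq.test() output shown for that table.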

Notes on the Pearson Chi-Square Test

Purpose: Test for association or independence between two categorical variables.
Sensitivity to Sample Size: The $\chi^2$ statistic is proportional to sample size. Doubling the sample size doubles $\chi^2$ even if the strength of the association remains unchanged.
Assumption on Expected Frequencies: As a common rule of thumb, the $\chi^2$ approximation is unreliable when more than 20% of the expected cell counts are less than 5. In such cases, exact tests are preferred.

# Create a contingency table
data_table <- matrix(c(30, 10, 20, 40), nrow = 2, byrow = TRUE)
colnames(data_table) <- c("Category 1", "Category 2")
rownames(data_table) <- c("Group 1", "Group 2")

# Display the table
print(data_table)
#>         Category 1 Category 2
#> Group 1         30         10
#> Group 2         20         40

# Perform Chi-Square Test
chi_result <- chisq.test(data_table)

# Display results
chi_result
#> 
#>  Pearson's Chi-squared test with Yates' continuity correction
#> 
#> data:  data_table
#> X-squared = 15.042, df = 1, p-value = 0.0001052

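The result includes:

Chi-Square Statistic ( $\chi^2$ ): The test statistic measuring the deviation between observed and expected counts. For 2x2 tables, chisq.test() applies Yates’ continuity correction by default (correct = TRUE), as the output header indicates.
p-value: The probability, under $H_0$ , of a deviation at least as large as the one observed.
Degrees of Freedom: $(r-1)(c-1)$ for an $r \times c$ table.
Expected Frequencies: The table of expected counts under $H_0$ , stored in chi_result$expected rather than printed.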
If the p-value is less than $\alpha$ , reject $H_0$ and conclude that there is a significant association between the row and column variables.
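
The expected counts and the effect of the continuity correction can be inspected directly. The sketch below rebuilds the same table so that it runs on its own.

```r
# Same 2x2 table as in the example above
data_table <- matrix(c(30, 10, 20, 40), nrow = 2, byrow = TRUE)

# Default call: Yates' continuity correction is applied for 2x2 tables
chi_result <- chisq.test(data_table)

# Expected counts under H0 are stored in the result object
chi_result$expected

# Same test without the continuity correction
chi_uncorrected <- chisq.test(data_table, correct = FALSE)
chi_uncorrected$statistic
```

The uncorrected statistic is larger than the corrected one, because the correction shrinks each absolute deviation by 0.5 before squaring.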

4.5.1.3 Key Takeaways

Comparison of tests for categorical association
Test	Purpose	Key Features	Sample Size Suitability	Statistical Assumptions
Fisher’s Exact Test	Tests association between two categorical variables in a 2x2 table.	Computes exact $p$ -values. Does not rely on asymptotic assumptions. Handles small sample sizes.	Small sample sizes	Observations are independent. Fixed marginal totals. No normality assumption.
Exact Chi-Square Test	Tests association in larger contingency tables using exact methods.	Generalization of Fisher’s Exact Test. Avoids asymptotic assumptions. Suitable for small to medium datasets.	Small to medium sample sizes	Observations are independent. Marginal totals may not be fixed. No normality assumption.
Pearson Chi-Square Test	Tests discrepancies between observed and expected frequencies.	Most common chi-square-based test. Includes independence and goodness-of-fit tests. Relies on asymptotic assumptions.	Large sample sizes	Observations are independent. Expected cell frequencies $\ge 5$ . Test statistic follows a chi-square distribution asymptotically.
Chi-Square Test for Independence	Tests independence between two categorical variables in a contingency table.	Application of Pearson Chi-Square Test. Same assumptions as asymptotic chi-square tests. Often used for larger contingency tables.	Medium to large sample sizes	Observations are independent. Expected cell frequencies $\ge 5$ . Random sampling.

Fisher’s Exact Test is specialized for small samples and fixed margins (2x2 tables).
Exact Chi-Square Test is a broader version of Fisher’s Exact Test for larger tables that likewise avoids asymptotic approximations.
Pearson Chi-Square Test is the general framework, and its applications include:
- Goodness-of-fit testing.
- Testing independence (same as the Chi-Square Test for Independence).
Chi-Square Test for Independence is a specific application of the Pearson Chi-Square Test.

In essence:

Fisher’s Exact Test and Exact Chi-Square Test are precise methods for small datasets.
Pearson Chi-Square Test and Chi-Square Test for Independence are interchangeable terms in many contexts, focusing on larger datasets.

4.5.2 Ordinal Association

Ordinal association refers to a relationship between two variables where the levels of one variable exhibit a consistent pattern of increase or decrease in response to the levels of the other variable. This type of association is particularly relevant when dealing with ordinal variables, which have naturally ordered categories, such as ratings (“poor”, “fair”, “good”, “excellent”) or income brackets (“low”, “medium”, “high”).

For example:

As customer satisfaction ratings increase from “poor” to “excellent,” the likelihood of recommending a product may also increase (positive ordinal association).
Alternatively, as stress levels move from “low” to “high,” job performance may tend to decrease (negative ordinal association).

Key Characteristics of Ordinal Association

Logical Ordering of Levels: The levels of both variables must follow a logical sequence. For instance, “small,” “medium,” and “large” are logically ordered, whereas categories like “blue,” “round,” and “tall” lack inherent order and are unsuitable for ordinal association.
Monotonic Trends: The association is typically monotonic, meaning that as one variable moves in a specific direction, the other variable tends to move in a consistent direction (either increasing or decreasing).
Tests for Ordinal Association: Specialized statistical tests assess ordinal association, focusing on how the rankings of one variable relate to those of the other. These tests require the data to respect the ordinal structure of both variables.

Practical Considerations

When using these tests, keep in mind:

Ordinal Data Handling: Ensure that the data respects the ordinal structure (e.g., categories are correctly ranked and coded).
Sample Size: Larger sample sizes provide more reliable estimates and stronger test power.
Contextual Relevance: Interpret results within the context of the data and the research question. For example, a significant Spearman’s correlation does not imply causation but rather a consistent trend.

4.5.2.1 Mantel-Haenszel Chi-square Test

The Mantel-Haenszel Chi-square Test is a statistical tool for evaluating associations when the data consist of multiple $2 \times 2$ contingency tables that examine the same association under varying conditions or strata. Unlike measures of association such as correlation coefficients, this test does not quantify the strength of the association but rather evaluates whether an association exists after controlling for stratification.

The Mantel-Haenszel Test is applicable to $2 \times 2 \times K$ contingency tables, where $K$ represents the number of strata. Each stratum is a $2 \times 2$ table corresponding to different conditions or subgroups.

For each stratum $k$ , let the marginal totals of the table be:

$n_{.1k}$ : Total observations in column 1
$n_{.2k}$ : Total observations in column 2
$n_{1.k}$ : Total observations in row 1
$n_{2.k}$ : Total observations in row 2
$n_{..k}$ : Total observations in the entire table

The observed cell count in row 1 and column 1 is denoted $n_{11k}$ . Given the marginal totals, the sampling distribution of $n_{11k}$ follows a hypergeometric distribution.

Under the assumption of conditional independence:

The expected value of $n_{11k}$ is: $m_{11k} = E(n_{11k}) = \frac{n_{1.k} n_{.1k}}{n_{..k}}$
The variance of $n_{11k}$ is: $var(n_{11k}) = \frac{n_{1.k} n_{2.k} n_{.1k} n_{.2k}}{n_{..k}^2 (n_{..k} - 1)}$

Mantel and Haenszel proposed the test statistic:

$M^2 = \frac{\left(|\sum_k n_{11k} - \sum_k m_{11k}| - 0.5\right)^2}{\sum_k var(n_{11k})} \sim \chi^2_{1}$

where

The 0.5 adjustment, known as a continuity correction, improves the approximation to the $\chi^2$ distribution.
The test statistic follows a $\chi^2$ distribution with 1 degree of freedom under the null hypothesis of conditional independence.

This method can be extended to general $I \times J \times K$ contingency tables, where $I$ and $J$ represent the number of rows and columns, respectively, and $K$ is the number of strata.

Null Hypothesis ( $H_0$ ):

There is no association between the two variables of interest across all strata, after controlling for the confounder.
In mathematical terms:

$H_0: \text{Odds Ratio (OR)} = 1 \; \text{or} \; \text{Risk Ratio (RR)} = 1$

Alternative Hypothesis ( $H_a$ ):

There is an association between the two variables of interest across all strata, after controlling for the confounder.
In mathematical terms:

$H_a: \text{Odds Ratio (OR)} \neq 1 \; \text{or} \; \text{Risk Ratio (RR)} \neq 1$

Let’s consider a scenario where a business wants to evaluate the relationship between customer satisfaction (Satisfied vs. Not Satisfied) and the likelihood of repeat purchases (Yes vs. No) across different regions (e.g., North, South, and West). The goal is to determine whether satisfaction and repeat purchases are associated once region is taken into account.

# Create a 2 x 2 x 3 contingency table
CustomerData = array(
    c(40, 30, 200, 300, 35, 20, 180, 265, 50, 25, 250, 275),
    dim = c(2, 2, 3),
    dimnames = list(
        Satisfaction = c("Satisfied", "Not Satisfied"),
        RepeatPurchase = c("Yes", "No"),
        Region = c("North", "South", "West")
    )
)

# View marginal table (summarized across regions)
margin.table(CustomerData, c(1, 2))
#>                RepeatPurchase
#> Satisfaction    Yes  No
#>   Satisfied     125 630
#>   Not Satisfied  75 840

Calculate the overall odds ratio (ignoring strata):

library(samplesizeCMH)
marginal_table = margin.table(CustomerData, c(1, 2))
odds.ratio(marginal_table)
#> [1] 2.222222

Calculate the conditional odds ratios for each region:

apply(CustomerData, 3, odds.ratio)
#>    North    South     West 
#> 2.000000 2.576389 2.200000

The Mantel-Haenszel Test evaluates whether customer satisfaction and repeat purchases are associated after controlling for region:

mantelhaen.test(CustomerData, correct = TRUE)
#> 
#>  Mantel-Haenszel chi-squared test with continuity correction
#> 
#> data:  CustomerData
#> Mantel-Haenszel X-squared = 26.412, df = 1, p-value = 2.758e-07
#> alternative hypothesis: true common odds ratio is not equal to 1
#> 95 percent confidence interval:
#>  1.637116 3.014452
#> sample estimates:
#> common odds ratio 
#>          2.221488
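
The statistic and the common odds ratio reported above can be reproduced from the formulas for $m_{11k}$ and $var(n_{11k})$ . The sketch below rebuilds the array without dimnames so that it runs on its own; the Mantel-Haenszel estimator of the common odds ratio, $\sum_k (n_{11k} n_{22k} / n_{..k}) / \sum_k (n_{12k} n_{21k} / n_{..k})$ , is a standard formula that is not stated in the text above.

```r
# Same 2 x 2 x 3 array as above (dimnames omitted for brevity)
CustomerData <- array(
  c(40, 30, 200, 300, 35, 20, 180, 265, 50, 25, 250, 275),
  dim = c(2, 2, 3)
)

# Observed n_11k and marginal totals for each stratum k
n11   <- CustomerData[1, 1, ]
row1  <- apply(CustomerData, 3, function(tab) sum(tab[1, ]))
row2  <- apply(CustomerData, 3, function(tab) sum(tab[2, ]))
col1  <- apply(CustomerData, 3, function(tab) sum(tab[, 1]))
col2  <- apply(CustomerData, 3, function(tab) sum(tab[, 2]))
total <- apply(CustomerData, 3, sum)

# Expected value and variance of n_11k under conditional independence
m11 <- row1 * col1 / total
v11 <- row1 * row2 * col1 * col2 / (total^2 * (total - 1))

# Mantel-Haenszel statistic with the 0.5 continuity correction
M2 <- (abs(sum(n11) - sum(m11)) - 0.5)^2 / sum(v11)
M2

# Mantel-Haenszel estimate of the common odds ratio
or_mh <- sum(CustomerData[1, 1, ] * CustomerData[2, 2, ] / total) /
  sum(CustomerData[1, 2, ] * CustomerData[2, 1, ] / total)
or_mh
```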

Interpretation

Overall Odds Ratio: This provides an estimate of the overall association between satisfaction and repeat purchases, ignoring regional differences.
Conditional Odds Ratios: These show whether the odds of repeat purchases given satisfaction are similar across regions.
Mantel-Haenszel Test: A significant test result (e.g., $p < 0.05$ ) indicates that satisfaction and repeat purchases are associated after controlling for region. A non-significant result means the data do not provide evidence of such an association. The test does not itself assess whether the stratum-specific odds ratios are equal; that requires a separate homogeneity test (e.g., the Breslow-Day test). Comparing the conditional odds ratios helps a business judge whether a marketing or customer retention strategy can be applied uniformly or should be customized to account for regional variations.
1. There is strong evidence ( $p = 2.758 \times 10^{-7}$ ) that satisfaction and repeat purchases are associated across the strata (North, South, and West), even after accounting for potential confounding by region.
2. The common odds ratio of approximately $2.22$ indicates a substantial association: the odds of a repeat purchase are about 2.2 times higher for satisfied customers than for customers who are not satisfied.
3. The stratum-specific odds ratios (2.00, 2.58, and 2.20) suggest that the strength of the association may differ slightly by region, but the common odds ratio estimate assumes the association is consistent (homogeneous) across strata.

4.5.2.2 McNemar’s Test

McNemar’s Test is a special case of the Mantel-Haenszel Chi-square Test, designed for paired nominal data. It is particularly useful for evaluating changes in categorical responses before and after a treatment or intervention, or for comparing paired responses in matched samples. Unlike the Mantel-Haenszel Test, which handles stratified data, McNemar’s Test is tailored to situations with a single $2 \times 2$ table derived from paired observations.

McNemar’s Test assesses whether the proportions of discordant pairs (off-diagonal elements in a $2 \times 2$ table) are significantly different. Specifically, it tests the null hypothesis that the probabilities of transitioning from one category to another are equal.

Null Hypothesis ( $H_0$ ): $P(\text{Switch from A to B}) = P(\text{Switch from B to A})$ This implies that the probabilities of transitioning from one category to the other are equal, or equivalently, that the off-diagonal cell probabilities are symmetric: $H_0: p_{12} = p_{21}$ , where $p_{ij}$ is the probability that a pair falls in cell $(i, j)$ .
Alternative Hypothesis ( $H_A$ ): $P(\text{Switch from A to B}) \neq P(\text{Switch from B to A})$ This suggests that the probabilities of transitioning between categories are not equal, or equivalently, that the off-diagonal cell probabilities are asymmetric: $H_A: p_{12} \neq p_{21}$

For example, consider a business analyzing whether a new advertising campaign influences customer preference for two products (A and B). Each customer is surveyed before and after the campaign, resulting in the following $2 \times 2$ contingency table:

Before rows: Preference for Product A or B before the campaign.
After columns: Preference for Product A or B after the campaign.

Let the table structure be:

Customer switching behavior between Product A and Product B
	After A	After B
Before A	$n_{11}$	$n_{12}$
Before B	$n_{21}$	$n_{22}$

$n_{12}$ : Customers who switched from Product A to B.
$n_{21}$ : Customers who switched from Product B to A.

The test focuses on $n_{12}$ and $n_{21}$ , as they represent the discordant pairs.

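The McNemar’s Test statistic is: $M^2 = \frac{(|n_{12} - n_{21}| - 1)^2}{n_{12} + n_{21}}$ where

The 1 is a continuity correction, which matters most when the number of discordant pairs is small. This is the form computed by mcnemar.test() with correct = TRUE, and it corresponds to the 0.5 correction in the Mantel-Haenszel statistic when each pair is treated as its own stratum.
Under the null hypothesis of no preference change, $M^2$ approximately follows a $\chi^2$ distribution with 1 degree of freedom.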

Let’s analyze a voting behavior study where participants were surveyed before and after a campaign. The table represents:

Rows: Voting preference before the campaign (Yes, No).
Columns: Voting preference after the campaign (Yes, No).

# Voting preference before and after a campaign
vote = matrix(c(682, 22, 86, 810), nrow = 2, byrow = TRUE,
              dimnames = list(
                "Before" = c("Yes", "No"),
                "After" = c("Yes", "No")
              ))

# Perform McNemar's Test with continuity correction
mcnemar_result <- mcnemar.test(vote, correct = TRUE)
mcnemar_result
#> 
#>  McNemar's Chi-squared test with continuity correction
#> 
#> data:  vote
#> McNemar's chi-squared = 36.75, df = 1, p-value = 1.343e-09
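
The reported statistic depends only on the two discordant counts and can be reproduced by hand. In the sketch below, mcnemar.test() with correct = TRUE subtracts 1 from $|n_{12} - n_{21}|$ before squaring; the exact binomial test at the end is an optional alternative, added here for illustration, that is often preferred when the number of discordant pairs is small.

```r
n12 <- 22   # Yes before, No after
n21 <- 86   # No before, Yes after

# Continuity-corrected statistic as computed by mcnemar.test(correct = TRUE)
M2 <- (abs(n12 - n21) - 1)^2 / (n12 + n21)

# p-value from the chi-square distribution with 1 df
p_value <- pchisq(M2, df = 1, lower.tail = FALSE)
c(M2 = M2, p_value = p_value)

# Exact alternative: under H0, each discordant pair is equally likely
# to fall in either off-diagonal cell
exact_p <- binom.test(n12, n12 + n21, p = 0.5)$p.value
exact_p
```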

The test provides:

Test statistic ( $M^2$ ): Quantifies the asymmetry in discordant pairs.
p-value: Indicates whether there is a significant difference in the discordant proportions.

Interpretation

Test Statistic: A large $M^2$ value suggests significant asymmetry in the discordant pairs.
p-value:
- A low p-value (e.g., $p < 0.05$ ) leads us to reject the null hypothesis, indicating that the proportion of participants switching in one direction (e.g., from Yes to No) differs significantly from the proportion switching in the opposite direction (e.g., from No to Yes). In the voting example, 22 participants switched from Yes to No and 86 switched from No to Yes, giving $p = 1.343 \times 10^{-9}$ .
- A high p-value means we fail to reject the null hypothesis, suggesting no evidence of a net change in preference.

McNemar’s Test is widely used in business and other fields:

Marketing Campaigns: Evaluating whether a campaign shifts consumer preferences or purchase intentions.
Product Testing: Determining if a new feature or redesign changes customer ratings.
Healthcare Studies: Analyzing treatment effects in paired medical trials.

4.5.2.3 McNemar-Bowker Test

The McNemar-Bowker Test is an extension of McNemar’s Test, designed for analyzing paired nominal data with more than two categories. It evaluates the symmetry of the full contingency table by comparing the off-diagonal elements across all categories. This test is particularly useful for understanding whether changes between categories are uniformly distributed or whether significant asymmetries exist.

Let the data be structured in an $r \times r$ square contingency table, where $r$ is the number of categories, and the off-diagonal elements represent transitions between categories.

The hypotheses for the McNemar-Bowker Test are:

Null Hypothesis ( $H_0$ ): $\begin{aligned} P(\text{Switch from Category } i \text{ to Category } j) &= P(\text{Switch from Category } j \text{ to Category } i) \\ & \forall i \neq j \end{aligned}$ This implies that the off-diagonal elements are symmetric, and there is no directional preference in category transitions.
Alternative Hypothesis ( $H_A$ ): $\begin{aligned} P(\text{Switch from Category } i \text{ to Category } j) &\neq P(\text{Switch from Category } j \text{ to Category } i) \\ & \text{for at least one pair } (i, j) \end{aligned}$ This suggests that the off-diagonal elements are not symmetric, indicating a directional preference in transitions between at least one pair of categories.

The McNemar-Bowker Test statistic is: $B^2 = \sum_{i < j} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}}$

where

$n_{ij}$ : Observed count of transitions from category $i$ to category $j$ .
$n_{ji}$ : Observed count of transitions from category $j$ to category $i$ .

Under the null hypothesis, the test statistic $B^2$ approximately follows a $\chi^2$ distribution with $\frac{r(r-1)}{2}$ degrees of freedom (corresponding to the number of unique pairs of categories).

For example, a company surveys customers about their satisfaction before and after implementing a new policy. Satisfaction is rated on a scale of 1 to 3 (1 = Low, 2 = Medium, 3 = High). The paired responses are summarized in the following $3 \times 3$ contingency table.

# Satisfaction ratings before and after the intervention
satisfaction_table <- matrix(c(
    30, 10, 5,  # Before: Low
    8, 50, 12,  # Before: Medium
    6, 10, 40   # Before: High
), nrow = 3, byrow = TRUE,
dimnames = list(
    "Before" = c("Low", "Medium", "High"),
    "After" = c("Low", "Medium", "High")
))

# Function to perform McNemar-Bowker Test
mcnemar_bowker_test <- function(table) {
  if (!all(dim(table)[1] == dim(table)[2])) {
    stop("Input must be a square matrix.")
  }
  
  # Extract off-diagonal elements
  n <- nrow(table)
  stat <- 0
  df <- 0
  
  for (i in 1:(n - 1)) {
    for (j in (i + 1):n) {
      nij <- table[i, j]
      nji <- table[j, i]
      stat <- stat + (nij - nji)^2 / (nij + nji)
      df <- df + 1
    }
  }
  
  p_value <- pchisq(stat, df = df, lower.tail = FALSE)
  return(list(statistic = stat, df = df, p_value = p_value))
}

# Run the test
result <- mcnemar_bowker_test(satisfaction_table)

# Print results
cat("McNemar-Bowker Test Results:\n")
#> McNemar-Bowker Test Results:
cat("Test Statistic (B^2):", result$statistic, "\n")
#> Test Statistic (B^2): 0.4949495
cat("Degrees of Freedom:", result$df, "\n")
#> Degrees of Freedom: 3
cat("p-value:", result$p_value, "\n")
#> p-value: 0.9199996

The output includes:

Test Statistic ( $B^2$ ): A measure of the asymmetry in the off-diagonal elements.
p-value: The probability, under the null hypothesis of symmetry, of observing a test statistic at least as large as the one computed from the data.

Interpretation

Test Statistic: A large $B^2$ value suggests substantial asymmetry in transitions between categories.
p-value:
- If the p-value is less than the significance level (e.g., $p < 0.05$), we reject the null hypothesis, indicating significant asymmetry in the transitions between at least one pair of categories.
- If the p-value is greater than the significance level, we fail to reject the null hypothesis: the data provide no evidence of asymmetry in the category transitions.
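
Base R's `mcnemar.test()` also accepts square tables larger than $2 \times 2$ , in which case it reports the symmetry statistic with $r(r-1)/2$ degrees of freedom. The sketch below rebuilds the satisfaction table from the example and can serve as a cross-check on a hand-written implementation; the object name `bowker` is illustrative.

```r
# Satisfaction ratings before (rows) and after (columns) the intervention
satisfaction_table <- matrix(
  c(30, 10, 5,
    8, 50, 12,
    6, 10, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Before = c("Low", "Medium", "High"),
    After = c("Low", "Medium", "High")
  )
)

# For r x r tables, mcnemar.test() performs the symmetry (Bowker) test;
# the continuity correction is only applied to 2 x 2 tables
bowker <- mcnemar.test(satisfaction_table)

bowker$statistic  # symmetry chi-squared statistic
bowker$parameter  # degrees of freedom, r * (r - 1) / 2
bowker$p.value
```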

The McNemar-Bowker Test has broad applications in business and other fields:

Customer Feedback Analysis: Evaluating changes in customer satisfaction levels before and after interventions.
Marketing Campaigns: Assessing shifts in brand preferences across multiple brands in response to an advertisement.
Product Testing: Understanding how user preferences among different product features change after a redesign.

4.5.2.4 Stuart-Maxwell Test

The Stuart-Maxwell Test is used for analyzing changes in paired categorical data with more than two categories. It is a generalization of McNemar’s Test, applied to square contingency tables where the off-diagonal elements represent transitions between categories. Unlike the McNemar-Bowker Test, which tests for symmetry across all pairs, the Stuart-Maxwell Test focuses on overall marginal homogeneity.

The test evaluates whether the marginal distributions of paired data are consistent across categories. This is particularly useful when investigating whether the distribution of responses has shifted between two conditions, such as before and after an intervention.

Hypotheses for the Stuart-Maxwell Test

Null Hypothesis ( $H_0$ ): $\text{The marginal distributions of the paired data are homogeneous (no difference).}$
Alternative Hypothesis ( $H_A$ ): $\text{The marginal distributions of the paired data are not homogeneous (there is a difference).}$

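The Stuart-Maxwell Test statistic is calculated as: $M^2 = \mathbf{b}' \mathbf{V}^{-1} \mathbf{b}$ where:

$\mathbf{b}$ : Vector of differences between the row and column marginal totals, $b_i = n_{i\cdot} - n_{\cdot i}$ , for any $r - 1$ of the $r$ categories. The differences sum to zero, so one category is redundant and must be dropped for $\mathbf{V}$ to be invertible.
$\mathbf{V}$ : Estimated covariance matrix of $\mathbf{b}$ under the null hypothesis, with $V_{ii} = n_{i\cdot} + n_{\cdot i} - 2 n_{ii}$ and $V_{ij} = -(n_{ij} + n_{ji})$ for $i \neq j$ .

Under the null hypothesis, the test statistic $M^2$ approximately follows a $\chi^2$ distribution with $(r - 1)$ degrees of freedom, where $r$ is the number of categories.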

A company surveys employees about their satisfaction levels (Low, Medium, High) before and after implementing a new workplace policy. The results are summarized in the following $3 \times 3$ contingency table.

# Employee satisfaction data before and after a policy change
satisfaction_table <- matrix(c(
    40, 10, 5,  # Before: Low
    8, 50, 12,  # Before: Medium
    6, 10, 40   # Before: High
), nrow = 3, byrow = TRUE,
dimnames = list(
    "Before" = c("Low", "Medium", "High"),
    "After" = c("Low", "Medium", "High")
))

# Function to perform the Stuart-Maxwell Test
stuart_maxwell_test <- function(table) {
  if (nrow(table) != ncol(table)) {
    stop("Input must be a square matrix.")
  }
  r <- nrow(table)
  
  # Marginal totals for each category
  row_totals <- rowSums(table)
  col_totals <- colSums(table)
  
  # Vector of differences between row and column marginal totals
  b <- row_totals - col_totals
  
  # Covariance matrix under the null hypothesis:
  # V[i, i] = n_i. + n_.i - 2 * n_ii,  V[i, j] = -(n_ij + n_ji)
  V <- -(table + t(table))
  diag(V) <- row_totals + col_totals - 2 * diag(table)
  
  # The differences sum to zero, so V is singular: drop the last category
  keep <- seq_len(r - 1)
  
  # Calculate the test statistic
  M2 <- drop(t(b[keep]) %*% solve(V[keep, keep, drop = FALSE]) %*% b[keep])
  df <- r - 1
  p_value <- pchisq(M2, df = df, lower.tail = FALSE)
  
  return(list(statistic = M2, df = df, p_value = p_value))
}

# Run the Stuart-Maxwell Test
result <- stuart_maxwell_test(satisfaction_table)

# Print the results
cat("Stuart-Maxwell Test Results:\n")
#> Stuart-Maxwell Test Results:
cat("Test Statistic (M^2):", result$statistic, "\n")
#> Test Statistic (M^2): 0.04784689
cat("Degrees of Freedom:", result$df, "\n")
#> Degrees of Freedom: 2
cat("p-value:", result$p_value, "\n")
#> p-value: 0.9763605

Interpretation

Test Statistic: Measures the extent of marginal differences in the table.
p-value:
- A low p-value (e.g., $p < 0.05$ ) indicates significant differences between the marginal distributions, suggesting a change in the distribution of responses.
- A high p-value suggests no evidence of marginal differences, meaning the distribution is consistent across conditions.
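
One way to sanity-check a Stuart-Maxwell implementation is the $2 \times 2$ case, where marginal homogeneity and symmetry coincide and the statistic should reduce to McNemar's statistic without continuity correction. The sketch below assumes the covariance entries $V_{ii} = n_{i\cdot} + n_{\cdot i} - 2 n_{ii}$ and $V_{ij} = -(n_{ij} + n_{ji})$ ; the helper `stuart_maxwell_stat()` is a hypothetical name, not a package function, and the counts are the voting data from the McNemar example.

```r
# Minimal Stuart-Maxwell statistic (illustrative helper)
stuart_maxwell_stat <- function(tab) {
  r <- nrow(tab)
  d <- rowSums(tab) - colSums(tab)
  V <- -(tab + t(tab))
  diag(V) <- rowSums(tab) + colSums(tab) - 2 * diag(tab)
  keep <- seq_len(r - 1)  # drop one redundant category
  drop(t(d[keep]) %*% solve(V[keep, keep, drop = FALSE]) %*% d[keep])
}

# Voting preference before (rows) and after (columns) a campaign
vote <- matrix(c(682, 22, 86, 810), nrow = 2, byrow = TRUE)

sm <- stuart_maxwell_stat(vote)
mc <- unname(mcnemar.test(vote, correct = FALSE)$statistic)

all.equal(sm, mc)
#> [1] TRUE
```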

Practical Applications of the Stuart-Maxwell Test

Employee Surveys: Analyzing shifts in satisfaction levels before and after policy changes.
Consumer Studies: Evaluating changes in product preferences before and after a marketing campaign.
Healthcare Research: Assessing changes in patient responses to treatments across categories.

4.5.2.5 Cochran-Mantel-Haenszel (CMH) Test

The Cochran-Mantel-Haenszel (CMH) Test is a generalization of the Mantel-Haenszel Chi-square Test. It evaluates the association between two variables while controlling for the effect of a third stratifying variable. This test is particularly suited for ordinal data, allowing researchers to detect trends and associations across strata.

The CMH Test addresses scenarios where:

Two variables (e.g., exposure and outcome) are ordinal or nominal.
A third variable (e.g., a demographic or environmental factor) stratifies the data into $K$ independent groups.

The test answers: Is there a consistent association between the two variables across the strata defined by the third variable?

The CMH Test has three main variations depending on the nature of the data:

Correlation Test for Ordinal Data: Assesses whether there is a linear association between two ordinal variables across strata.
General Association Test: Tests for any association (not necessarily ordinal) between two variables while stratifying by a third.
Homogeneity Test: Checks whether the strength of the association between the two variables is consistent across strata.

Hypotheses

Null Hypothesis ( $H_0$ ): There is no association between the two variables across all strata, or the strength of the association is consistent across strata.
Alternative Hypothesis ( $H_A$ ): There is an association between the two variables in at least one stratum, or the strength of the association varies across strata.

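For $2 \times 2 \times K$ tables, the CMH test statistic is: $CMH = \frac{\left( \sum_{k} \left(O_k - E_k \right)\right)^2}{\sum_{k} V_k}$ Where:

$O_k$ : Observed count in a reference cell (e.g., $n_{11k}$ ) of stratum $k$ .
$E_k$ : Expected count in that cell in stratum $k$ , calculated under the null hypothesis.
$V_k$ : Variance of the observed count in stratum $k$ under the null hypothesis.

For $2 \times 2 \times K$ tables, the test statistic follows a $\chi^2$ distribution with 1 degree of freedom under the null hypothesis. For larger $I \times J \times K$ tables, the general association version replaces the single cell with a vector of cell counts and has $(I - 1)(J - 1)$ degrees of freedom.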
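
To make the formula concrete, the sketch below computes the statistic by hand for a small $2 \times 2 \times 2$ table and compares it with `mantelhaen.test()` from base R. The counts are made up for illustration, $O_k$ is taken to be the count in the $(1, 1)$ cell, and the hypergeometric expressions for $E_k$ and $V_k$ are the usual ones for a $2 \times 2$ table with fixed margins.

```r
# Hypothetical 2 x 2 x 2 table: Treatment x Outcome x Clinic (made-up counts)
tab <- array(
  c(20, 10, 15, 25,   # Clinic A
    30, 20, 10, 30),  # Clinic B
  dim = c(2, 2, 2),
  dimnames = list(
    Treatment = c("Drug", "Placebo"),
    Outcome = c("Improved", "Not improved"),
    Clinic = c("A", "B")
  )
)

# Per-stratum margins
n <- apply(tab, 3, sum)       # stratum totals
row1 <- colSums(tab[1, , ])   # first-row totals by stratum
row2 <- colSums(tab[2, , ])   # second-row totals by stratum
col1 <- colSums(tab[, 1, ])   # first-column totals by stratum
col2 <- colSums(tab[, 2, ])   # second-column totals by stratum

# Observed, expected, and variance of the (1, 1) cell in each stratum
O <- tab[1, 1, ]
E <- row1 * col1 / n
V <- row1 * row2 * col1 * col2 / (n^2 * (n - 1))

cmh_manual <- sum(O - E)^2 / sum(V)
cmh_builtin <- unname(mantelhaen.test(tab, correct = FALSE)$statistic)

all.equal(cmh_manual, cmh_builtin)
#> [1] TRUE
```

Summing $O_k - E_k$ before squaring is what lets small but consistent effects across strata accumulate; effects in opposite directions would instead cancel.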

A company evaluates whether sales performance (Low, Medium, High) is associated with product satisfaction (Low, Medium, High) across three experience levels (Junior, Mid-level, Senior). The data is organized into a $3 \times 3 \times 3$ contingency table.

# Sales performance data
sales_data <- array(
    c(20, 15, 10, 12, 18, 15, 8, 12, 20,   # Junior
      25, 20, 15, 20, 25, 30, 10, 15, 20,  # Mid-level
      30, 25, 20, 28, 32, 35, 15, 20, 30), # Senior
    dim = c(3, 3, 3),
    dimnames = list(
        SalesPerformance = c("Low", "Medium", "High"),
        Satisfaction = c("Low", "Medium", "High"),
        ExperienceLevel = c("Junior", "Mid-level", "Senior")
    )
)

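# mantelhaen.test() is part of base R's stats package, so no extra package is needed

# Perform CMH Test (for tables larger than 2 x 2 x K this is the generalized
# CMH test of general association; `correct` only affects 2 x 2 x K tables)
cmh_result <- mantelhaen.test(sales_data, correct = FALSE)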
cmh_result
#> 
#>  Cochran-Mantel-Haenszel test
#> 
#> data:  sales_data
#> Cochran-Mantel-Haenszel M^2 = 22.454, df = 4, p-value = 0.0001627

Interpretation

Test Statistic: A large CMH statistic suggests a significant association between sales performance and satisfaction after accounting for experience level.
p-value:
- A low p-value (e.g., $p < 0.05$ ) indicates a significant association between the two variables across strata.
- A high p-value suggests no evidence of an association between the two variables once the stratifying variable is accounted for.

Practical Applications of the CMH Test

Business Performance Analysis: Investigating the relationship between customer satisfaction and sales performance across different demographic groups.
Healthcare Studies: Assessing the effect of treatment (e.g., dosage) on outcomes while controlling for patient characteristics (e.g., age groups).
Educational Research: Analyzing the relationship between test scores and study hours, stratified by teaching method.

4.5.2.6 Summary Table of Tests

The following table provides a concise guide on when and why to use each test:

Tests for categorical association
Test Name	When to Use	Key Question Addressed	Data Requirements
Mantel-Haenszel Chi-square Test	When testing for association between two binary variables across multiple strata.	Is there a consistent association across strata?	Binary variables in $2 \times 2 \times K$ tables.
McNemar’s Test	When analyzing marginal symmetry in paired binary data.	Are the proportions of discordant pairs equal?	Paired binary responses ( $2 \times 2$ table).
McNemar-Bowker Test	When testing for symmetry in paired nominal data with more than two categories.	Are the off-diagonal elements symmetric across all categories?	Paired nominal data in $r \times r$ tables.
Cochran-Mantel-Haenszel (CMH) Test	When testing ordinal or general associations while controlling for a stratifying variable.	Is there an association between two variables after stratification?	Ordinal or nominal data in $I \times J \times K$ tables.
Stuart-Maxwell Test	When analyzing marginal homogeneity in paired nominal data with more than two categories.	Are the marginal distributions of paired data homogeneous?	Paired nominal data in $r \times r$ tables.

How to Choose the Right Test

Paired vs. Stratified Data:
- Use McNemar’s Test or McNemar-Bowker Test for paired data.
- Use Mantel-Haenszel Chi-square Test or Cochran-Mantel-Haenszel (CMH) Test for stratified data.
Binary vs. Multi-category Variables:
- Use McNemar’s Test for binary data.
- Use McNemar-Bowker Test or Stuart-Maxwell Test for multi-category data.
Ordinal Trends:
- Use the Cochran-Mantel-Haenszel (CMH) Test if testing for ordinal associations while controlling for a stratifying variable.

4.5.3 Ordinal Trend

When analyzing ordinal data, it is often important to determine whether a consistent trend exists between variables. Tests for trend are specifically designed to detect monotonic relationships where changes in one variable are systematically associated with changes in another. These tests are widely used in scenarios involving ordered categories, such as customer satisfaction ratings, income brackets, or educational levels.

The primary objectives of trend tests are:

To detect monotonic relationships: Determine if higher or lower categories of one variable are associated with higher or lower categories of another variable.
To account for ordinal structure: Leverage the inherent order in the data to provide more sensitive and interpretable results compared to tests designed for nominal data.

Key Considerations for Trend Tests

Data Structure:
- Ensure that the variables have a natural order and are treated as ordinal.
- Verify that the trend test chosen matches the data structure (e.g., binary outcome vs. multi-level ordinal variables).
Assumptions:
- Many tests assume monotonic trends, meaning that the relationship should not reverse direction.
Interpretation:
- A significant result indicates the presence of a trend but does not imply causality.
- The direction and strength of the trend should be carefully interpreted in the context of the data.

4.5.3.1 Cochran-Armitage Test

The Cochran-Armitage Test for Trend is a statistical method designed to detect a linear trend in proportions across ordered categories of a predictor variable. It is particularly useful in $2 \times J$ contingency tables, where there is a binary outcome (e.g., success/failure) and an ordinal predictor variable with $J$ ordered levels.

The Cochran-Armitage Test evaluates whether the proportion of a binary outcome changes systematically across the levels of an ordinal predictor. This test leverages the ordinal nature of the predictor to enhance sensitivity and power compared to general chi-square tests.

Hypotheses

Null Hypothesis ( $H_0$ ): $\text{The proportion of the binary outcome is constant across the levels of the ordinal predictor.}$
Alternative Hypothesis ( $H_A$ ): $\text{There is a linear trend in the proportion of the binary outcome across the levels of the ordinal predictor.}$

The Cochran-Armitage Test statistic is calculated as:

$Z = \frac{\sum_{j=1}^{J} w_j (n_{1j} - N_j \hat{p})}{\sqrt{\hat{p} (1 - \hat{p}) \sum_{j=1}^{J} N_j (w_j - \bar{w})^2}}$

Where:

$n_{1j}$ : Count of the binary outcome (e.g., “success”) in category $j$ .
$N_j$ : Total number of observations in category $j$ .
$\hat{p}$ : Overall proportion of the binary outcome, calculated as: $\hat{p} = \frac{\sum_{j=1}^{J} n_{1j}}{\sum_{j=1}^{J} N_j}$
$w_j$ : Score assigned to the $j$ th category of the ordinal predictor, often set to $j$ for equally spaced levels.
$\bar{w}$ : Weighted mean score, calculated as: $\bar{w} = \frac{\sum_{j=1}^{J} N_j w_j}{\sum_{j=1}^{J} N_j}$

The test statistic $Z$ approximately follows a standard normal distribution under the null hypothesis when the sample is large.

Key Assumptions

Ordinal Predictor: The categories of the predictor variable must have a natural order.
Binary Outcome: The response variable must be dichotomous (e.g., success/failure).
Independent Observations: Observations within and across categories are independent.

Let’s consider a study examining whether the success rate of a marketing campaign varies across three income levels (Low, Medium, High). The data is structured in a $2 \times 3$ contingency table:

Marketing campaign success by income level in a contingency table
Income Level	Success	Failure	Total
Low	20	30	50
Medium	35	15	50
High	45	5	50

# Data: Success and Failure counts by Income Level
income_levels <- c("Low", "Medium", "High")
success <- c(20, 35, 45)
failure <- c(30, 15, 5)
total <- success + failure

# Scores for ordinal levels (can be custom weights)
scores <- 1:length(income_levels)

# Cochran-Armitage Test
# Function to calculate Z statistic
cochran_armitage_test <- function(success, failure, scores) {
  N <- success + failure
  p_hat <- sum(success) / sum(N)
  
  # Numerator: score-weighted deviations of observed from expected successes
  numerator <- sum(scores * (success - N * p_hat))
  
  # Denominator: scores are centered at their weighted mean
  w_bar <- sum(N * scores) / sum(N)
  denominator <- sqrt(p_hat * (1 - p_hat) * sum(N * (scores - w_bar)^2))
  
  # Z statistic and two-sided p-value
  Z <- numerator / denominator
  p_value <- 2 * pnorm(-abs(Z))
  
  return(list(Z_statistic = Z, p_value = p_value))
}

# Perform the test
result <- cochran_armitage_test(success, failure, scores)

# Print results
cat("Cochran-Armitage Test for Trend Results:\n")
#> Cochran-Armitage Test for Trend Results:
cat("Z Statistic:", result$Z_statistic, "\n")
#> Z Statistic: 5.303301
cat("p-value:", signif(result$p_value, 3), "\n")
#> p-value: 1.14e-07

Interpretation

Test Statistic ( $Z$ ):
- The $Z$ value indicates the strength and direction of the trend.
- Positive $Z$ : Proportions increase with higher categories.
- Negative $Z$ : Proportions decrease with higher categories.
p-value:
- A low p-value (e.g., $p < 0.05$ ) rejects the null hypothesis, indicating a significant linear trend.
- A high p-value fails to reject the null hypothesis, suggesting no evidence of a trend.
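
The trend statistic can be cross-checked against base R's `prop.trend.test()`. The sketch below assumes the variance is computed with scores centered at their weighted mean, in which case $Z^2$ should equal the chi-squared statistic that function reports. It reuses the income-level counts from the example; the object names are illustrative.

```r
# Successes and group sizes by income level (Low, Medium, High)
success <- c(20, 35, 45)
total <- c(50, 50, 50)
scores <- 1:3

# Z statistic with scores centered at their weighted mean
p_hat <- sum(success) / sum(total)
w_bar <- sum(total * scores) / sum(total)
z <- sum(scores * (success - total * p_hat)) /
  sqrt(p_hat * (1 - p_hat) * sum(total * (scores - w_bar)^2))

# Base R chi-squared test for trend in proportions
trend <- prop.trend.test(success, total, score = scores)

all.equal(z^2, unname(trend$statistic))
#> [1] TRUE
```

Because `prop.trend.test()` reports a chi-squared statistic, it does not indicate the direction of the trend; the sign of `z` does.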

Practical Applications

Marketing: Analyzing whether customer success rates vary systematically across income levels or demographics.
Healthcare: Evaluating the dose-response relationship between medication levels and recovery rates.
Education: Studying whether pass rates improve with higher levels of educational support.

4.5.3.2 Jonckheere-Terpstra Test

The Jonckheere-Terpstra Test is a nonparametric test designed to detect ordered differences between groups. It is particularly suited for ordinal data where both the predictor and response variables exhibit a monotonic trend. Unlike general nonparametric tests like the Kruskal-Wallis test, which assess any differences between groups, the Jonckheere-Terpstra Test specifically evaluates whether the data follows a prespecified ordering.

The Jonckheere-Terpstra Test determines whether:

There is a monotonic trend in the response variable across ordered groups of the predictor.
The data aligns with an a priori hypothesized order (e.g., group 1 < group 2 < group 3).

Hypotheses

Null Hypothesis ( $H_0$ ): $\text{There is no trend in the response variable across the ordered groups.}$
Alternative Hypothesis ( $H_A$ ): $\text{The response variable exhibits a monotonic trend across the ordered groups.}$

The trend can be increasing, decreasing, or as otherwise hypothesized.

The Jonckheere-Terpstra Test statistic is based on the number of pairwise comparisons between observations in different groups that are consistent with the hypothesized trend. For $k$ groups:

Compare all possible pairs of observations across groups.
Count the number of pairs where the values are consistent with the hypothesized order.

The test statistic $T$ is the sum of all pairwise comparisons: $T = \sum_{i < j} T_{ij}$ Where $T_{ij}$ is the number of concordant pairs between groups $i$ and $j$ .

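Under the null hypothesis, and in the absence of ties, $T$ is approximately normally distributed with:

Mean: $\mu_T = \frac{N^2 - \sum_{i=1}^{k} n_i^2}{4}$
Variance: $\sigma_T^2 = \frac{N^2 (2N + 3) - \sum_{i=1}^{k} n_i^2 (2 n_i + 3)}{72}$ Where $N$ is the total number of observations and $n_i$ is the number of observations in group $i$ .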

The standardized test statistic is: $Z = \frac{T - \mu_T}{\sigma_T}$

Key Assumptions

Ordinal or Interval Data: The response variable must be at least ordinal, and the groups must have a logical order.
Independent Groups: Observations within and between groups are independent.
Consistent Hypothesis: The trend (e.g., increasing or decreasing) must be specified in advance.

Let’s consider a study analyzing whether customer satisfaction ratings (on a scale of 1 to 5) improve with increasing levels of service tiers (Basic, Standard, Premium). The data is grouped by service tier, and we hypothesize that satisfaction ratings increase with higher service tiers.

# Example Data: Customer Satisfaction Ratings by Service Tier
satisfaction <- list(
  Basic = c(2, 3, 2, 4, 3),
  Standard = c(3, 4, 3, 5, 4),
  Premium = c(4, 5, 4, 5, 5)
)

# Prepare data (levels are set explicitly so the groups keep their tier order;
# factor() would otherwise sort them alphabetically)
ratings <- unlist(satisfaction)
groups <- factor(rep(names(satisfaction), times = sapply(satisfaction, length)),
                 levels = names(satisfaction))

# Calculate pairwise comparisons
manual_jonckheere <- function(ratings, groups) {
  n_groups <- nlevels(groups)
  pairwise_comparisons <- 0
  
  # Iterate over group pairs
  for (i in 1:(n_groups - 1)) {
    for (j in (i + 1):n_groups) {
      group_i <- ratings[groups == levels(groups)[i]]
      group_j <- ratings[groups == levels(groups)[j]]
      
      # Count concordant pairs (ties count as 0.5)
      for (x in group_i) {
        for (y in group_j) {
          if (x < y) pairwise_comparisons <- pairwise_comparisons + 1
          if (x == y) pairwise_comparisons <- pairwise_comparisons + 0.5
        }
      }
    }
  }
  
  # Compute test statistic with the null mean and variance for untied data;
  # with many ties, as here, the resulting p-value is only approximate
  T_stat <- pairwise_comparisons
  N <- length(ratings)
  n_i <- as.vector(table(groups))
  mu_T <- (N^2 - sum(n_i^2)) / 4
  sigma_T <- sqrt((N^2 * (2 * N + 3) - sum(n_i^2 * (2 * n_i + 3))) / 72)
  
  Z <- (T_stat - mu_T) / sigma_T
  p_value <- 2 * pnorm(-abs(Z))
  
  return(list(T_statistic = T_stat, Z_statistic = Z, p_value = p_value))
}

# Perform the test
result <- manual_jonckheere(ratings, groups)

# Print results
cat("Jonckheere-Terpstra Test Results:\n")
#> Jonckheere-Terpstra Test Results:
cat("T Statistic (Sum of Concordant Pairs):", result$T_statistic, "\n")
#> T Statistic (Sum of Concordant Pairs): 63.5
cat("Z Statistic:", round(result$Z_statistic, 3), "\n")
#> Z Statistic: 2.747
cat("p-value:", round(result$p_value, 4), "\n")
#> p-value: 0.006

Interpretation

Test Statistic ( $T$ ):
- Represents the sum of all pairwise comparisons consistent with the hypothesized order.
- Includes 0.5 for tied pairs.
$Z$ Statistic:
- A standardized measure of the strength of the trend.
- Calculated using $T$ , the expected value of $T$ under the null hypothesis ( $\mu_T$ ), and the variance of $T$ ( $\sigma_T^2$ ).
p-value:
- A low p-value (e.g., $p < 0.05$ ) rejects the null hypothesis, indicating a significant trend in the response variable across ordered groups.
- A high p-value fails to reject the null hypothesis, suggesting no evidence of a trend.
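
Because ratings on a 1 to 5 scale are heavily tied, a normal approximation to the null distribution of $T$ is rough. One alternative is a permutation p-value, sketched below under the assumption that group labels are exchangeable under the null hypothesis. The helper `jt_stat()` is a hypothetical name, the seed and number of permutations are arbitrary, and the group levels are set explicitly in tier order (Basic, Standard, Premium) because `factor()` would otherwise sort them alphabetically.

```r
satisfaction <- list(
  Basic = c(2, 3, 2, 4, 3),
  Standard = c(3, 4, 3, 5, 4),
  Premium = c(4, 5, 4, 5, 5)
)
ratings <- unlist(satisfaction, use.names = FALSE)
groups <- factor(rep(names(satisfaction), lengths(satisfaction)),
                 levels = names(satisfaction))

# JT statistic: concordant pairs across ordered groups, ties counted as 0.5
jt_stat <- function(y, g) {
  k <- nlevels(g)
  total <- 0
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      diffs <- outer(y[g == levels(g)[i]], y[g == levels(g)[j]], "-")
      total <- total + sum(diffs < 0) + 0.5 * sum(diffs == 0)
    }
  }
  total
}

observed <- jt_stat(ratings, groups)

# Permutation null: shuffle the group labels and recompute the statistic
set.seed(123)  # arbitrary seed, for reproducibility only
perm <- replicate(5000, jt_stat(ratings, sample(groups)))

# One-sided p-value for an increasing trend (add-one correction)
p_perm <- (1 + sum(perm >= observed)) / (1 + length(perm))
p_perm
```

The one-sided form matches a hypothesis stated in advance as "ratings increase with tier"; a two-sided version would compare the absolute deviation of each permuted statistic from the permutation mean.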

Practical Applications

Customer Experience Analysis: Assessing whether customer satisfaction increases with higher service levels or product tiers.
Healthcare Studies: Testing whether recovery rates improve with increasing doses of a treatment.
Education Research: Analyzing whether test scores improve with higher levels of educational intervention.

4.5.3.3 Mantel Test for Trend

The Mantel Test for Trend is a statistical method designed to detect a linear association between two ordinal variables. It is an extension of the Mantel-Haenszel Chi-square Test and is particularly suited for analyzing trends in ordinal contingency tables, such as $I \times J$ tables where both variables are ordinal.

The Mantel Test for Trend evaluates whether an increasing or decreasing trend exists between two ordinal variables. It uses the ordering of categories to assess linear relationships, making it more sensitive to trends compared to general association tests like chi-square.

Hypotheses

Null Hypothesis ( $H_0$ ): $\text{There is no linear association between the two ordinal variables.}$
Alternative Hypothesis ( $H_A$ ): $\text{There is a significant linear association between the two ordinal variables.}$

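The Mantel Test is based on the Pearson correlation $r$ between the row and column scores in an ordinal contingency table: $r = \frac{\sum_{i} \sum_{j} (u_i - \bar{u})(v_j - \bar{v}) n_{ij}}{\sqrt{\sum_{i} (u_i - \bar{u})^2 n_{i\cdot} \sum_{j} (v_j - \bar{v})^2 n_{\cdot j}}}$ The test statistic is: $M = \sqrt{n - 1} \, r$

Where:

$n_{ij}$ : Observed frequency in cell $(i, j)$ , with $n = \sum_{i} \sum_{j} n_{ij}$ .
$n_{i\cdot}$ : Row marginal total for row $i$ .
$n_{\cdot j}$ : Column marginal total for column $j$ .
$u_i$ : Score for the $i$ th row, with weighted mean $\bar{u} = \sum_{i} u_i n_{i\cdot} / n$ .
$v_j$ : Score for the $j$ th column, with weighted mean $\bar{v} = \sum_{j} v_j n_{\cdot j} / n$ .

The test statistic $M$ is asymptotically standard normal under the null hypothesis; equivalently, $M^2 = (n - 1) r^2$ asymptotically follows a $\chi^2$ distribution with 1 degree of freedom.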

Key Assumptions

Ordinal Variables: Both variables must have a natural order.
Linear Trend: Assumes a linear relationship between the scores assigned to the rows and columns.
Independence: Observations must be independent.

Let’s consider a marketing study evaluating whether customer satisfaction levels (Low, Medium, High) are associated with increasing purchase frequency (Low, Medium, High).

# Customer satisfaction and purchase frequency data
data <- matrix(
  c(10, 5, 2, 15, 20, 8, 25, 30, 12), 
  nrow = 3, 
  byrow = TRUE,
  dimnames = list(
    Satisfaction = c("Low", "Medium", "High"),
    Frequency = c("Low", "Medium", "High")
  )
)

# Assign scores for rows and columns
row_scores <- 1:nrow(data)
col_scores <- 1:ncol(data)

# Compute Mantel statistic manually
mantel_test_manual <- function(data, row_scores, col_scores) {
  n <- sum(data)
  row_marginals <- rowSums(data)
  col_marginals <- colSums(data)
  
  # Center the scores at their weighted means
  row_centered <- row_scores - sum(row_scores * row_marginals) / n
  col_centered <- col_scores - sum(col_scores * col_marginals) / n
  
  # Pearson correlation between row and column scores
  numerator <- sum(outer(row_centered, col_centered, "*") * data)
  row_variance <- sum(row_centered^2 * row_marginals)
  col_variance <- sum(col_centered^2 * col_marginals)
  r <- numerator / sqrt(row_variance * col_variance)
  
  # Signed statistic: M = sqrt(n - 1) * r, standard normal under H0
  M <- sqrt(n - 1) * r
  p_value <- 2 * pnorm(-abs(M)) # Two-tailed test
  
  return(list(Mantel_statistic = M, p_value = p_value))
}

# Perform the Mantel Test
result <- mantel_test_manual(data, row_scores, col_scores)

# Display results
cat("Mantel Test for Trend Results:\n")
#> Mantel Test for Trend Results:
cat("Mantel Statistic (M):", round(result$Mantel_statistic, 3), "\n")
#> Mantel Statistic (M): 1.039
cat("p-value:", round(result$p_value, 3), "\n")
#> p-value: 0.299

Interpretation

Test Statistic ( $M$ ):
- Represents the strength and direction of the linear association.
- Positive $M$ : Increasing trend.
- Negative $M$ : Decreasing trend.
p-value:
- A low p-value (e.g., $p < 0.05$ ) indicates a significant linear association.
- A high p-value suggests no evidence of a trend.
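
Since the statistic is built on a Pearson correlation between scores, it can also be obtained by expanding the table to one pair of scores per observation and calling `cor()`. The sketch below assumes the $(n - 1) r^2$ form of the statistic with 1 degree of freedom and equally spaced scores 1, 2, 3; the object names are illustrative.

```r
# Satisfaction (rows) by purchase frequency (columns)
tab <- matrix(c(10, 5, 2, 15, 20, 8, 25, 30, 12), nrow = 3, byrow = TRUE)

# One (row score, column score) pair per cell; expand.grid() varies the row
# index fastest, which matches the column-major order of as.vector(tab)
cells <- expand.grid(row = 1:3, col = 1:3)
counts <- as.vector(tab)

# Repeat each pair of scores once per observation in the cell
u <- rep(cells$row, times = counts)
v <- rep(cells$col, times = counts)

r <- cor(u, v)
n <- length(u)
m2 <- (n - 1) * r^2
p_value <- pchisq(m2, df = 1, lower.tail = FALSE)

round(m2, 2)
#> [1] 1.08
```

The sign of `r` gives the direction of the trend, which the squared statistic alone does not.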

Practical Applications

Marketing Analysis: Investigating whether satisfaction levels are associated with purchase behavior or loyalty.
Healthcare Research: Testing for a dose-response relationship between treatment levels and outcomes.
Social Sciences: Analyzing trends in survey responses across ordered categories.

4.5.3.4 Chi-square Test for Linear Trend

The Chi-square Test for Linear Trend is a statistical method used to detect a linear relationship between an ordinal predictor and a binary outcome. It is an extension of the chi-square test, designed specifically for ordered categories, making it more sensitive to linear trends in proportions compared to a general chi-square test of independence.

The Chi-square Test for Linear Trend evaluates whether the proportions of a binary outcome (e.g., success/failure) change systematically across ordered categories of a predictor variable. It is widely used in situations such as analyzing dose-response relationships or evaluating trends in survey responses.

Hypotheses

Null Hypothesis ( $H_0$ ): There is no linear trend in the proportions of the binary outcome across ordered categories.
Alternative Hypothesis ( $H_A$ ): There is a significant linear trend in the proportions of the binary outcome across ordered categories.

The test statistic is:

$X^2_{\text{trend}} = \frac{\left( \sum_{j=1}^J w_j (p_j - \bar{p}) N_j \right)^2}{\bar{p} (1 - \bar{p}) \sum_{j=1}^J N_j (w_j - \bar{w})^2}$

Where:

$J$ : Number of ordered categories.
$w_j$ : Score assigned to the $j$ th category (typically $w_j = j$ for $j = 1, 2, \dots, J$ ).
$\bar{w}$ : Weighted mean score, $\bar{w} = \sum_{j=1}^J N_j w_j / \sum_{j=1}^J N_j$ .
$p_j$ : Proportion of success in the $j$ th category.
$\bar{p}$ : Overall proportion of success across all categories.
$N_j$ : Total number of observations in the $j$ th category.

The test statistic approximately follows a chi-square distribution with 1 degree of freedom under the null hypothesis.

Key Assumptions

Binary Outcome: The response variable must be binary (e.g., success/failure).
Ordinal Predictor: The predictor variable must have a natural order.
Independent Observations: Data across categories must be independent.

Let’s consider a study analyzing whether the proportion of customers who recommend a product increases with customer satisfaction levels (Low, Medium, High).

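# Example Data: Customer Satisfaction and Recommendation
satisfaction_levels <- c("Low", "Medium", "High")
success <- c(20, 35, 50)  # Number of customers who recommend the product
failure <- c(30, 15, 10)  # Number of customers who do not recommend the product
total <- success + failure

# Assign ordinal scores
scores <- 1:length(satisfaction_levels)

# Calculate overall proportion of success
p_hat <- sum(success) / sum(total)

# Trend statistic: scores are centered at their weighted mean
w_bar <- sum(total * scores) / sum(total)
numerator <- sum(scores * (success - total * p_hat))^2
denominator <- p_hat * (1 - p_hat) * sum(total * (scores - w_bar)^2)
x2_trend <- numerator / denominator

# p-value from the chi-square distribution with 1 degree of freedom
p_value <- pchisq(x2_trend, df = 1, lower.tail = FALSE)
cat("Chi-square for trend:", x2_trend, "\n")
cat("p-value:", p_value, "\n")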

Interpretation

Chi-square Statistic ( $X^2_{\text{trend}}$ ):
- Indicates the strength of the linear trend in the proportions.
p-value:
- A low p-value (e.g., $p < 0.05$ ) rejects the null hypothesis, indicating a significant linear trend.
- A high p-value suggests no evidence of a linear trend.
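
A useful companion to the trend statistic is what remains of the overall chi-square once the linear component is removed. For a $2 \times J$ table, the Pearson chi-square ( $J - 1$ degrees of freedom) splits into the trend component (1 degree of freedom) and a departure-from-linearity component ( $J - 2$ degrees of freedom). The sketch below illustrates this with the recommendation data, assuming scores centered at their weighted mean; the object names are illustrative.

```r
success <- c(20, 35, 50)  # customers who recommend, by satisfaction level
failure <- c(30, 15, 10)  # customers who do not recommend
total <- success + failure
scores <- 1:3

# Overall Pearson chi-square for the 2 x 3 table (J - 1 = 2 df)
x2_total <- unname(
  chisq.test(rbind(success, failure), correct = FALSE)$statistic
)

# Linear-trend component (1 df)
p_bar <- sum(success) / sum(total)
w_bar <- sum(total * scores) / sum(total)
x2_trend <- sum(scores * (success - total * p_bar))^2 /
  (p_bar * (1 - p_bar) * sum(total * (scores - w_bar)^2))

# Remainder: departure from a linear trend (J - 2 = 1 df)
x2_departure <- x2_total - x2_trend
p_departure <- pchisq(x2_departure, df = length(scores) - 2,
                      lower.tail = FALSE)

c(total = x2_total, trend = x2_trend, departure = x2_departure)
```

A large trend component with a small departure component suggests that a straight-line change in proportions across the scores describes the data well.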

Practical Applications

Marketing: Analyzing whether customer satisfaction levels predict product recommendations or repurchase intentions.
Healthcare: Evaluating dose-response relationships in clinical trials.
Education: Testing whether higher levels of intervention improve success rates.

4.5.3.5 Key Takeaways

Test	Purpose	Key Assumptions	Use Cases
Cochran-Armitage Test	Tests for a linear trend in proportions across ordinal categories.	Binary response variable. Predictor variable is ordinal.	Evaluating dose-response relationships, comparing proportions across ordinal groups.
Jonckheere-Terpstra Test	Tests for a monotonic trend in a response variable across ordered groups.	Response variable is continuous or ordinal. Predictor variable is ordinal.	Comparing medians or distributions across ordinal groups, e.g., treatment levels.
Mantel Test for Trend	Evaluates a linear association between an ordinal predictor and response.	Ordinal variables. Linear trend expected.	Determining trends in stratified or grouped data.
Chi-square Test for Linear Trend	Tests for linear trends in categorical data using contingency tables.	Contingency table with ordinal predictor. Sufficient sample size (expected frequencies $> 5$ ).	Analyzing trends in frequency data, e.g., examining disease prevalence by age groups.