16.3 False Discovery Rate
When conducting multiple hypothesis tests simultaneously, the probability of at least one false positive (Type I error) grows with the number of tests. Traditional corrections such as the Bonferroni method control this probability but are often too conservative, reducing statistical power.
The False Discovery Rate (FDR), introduced by Benjamini and Hochberg (1995), is a more flexible and powerful approach that controls the proportion of false discoveries (incorrect rejections of the null hypothesis) while maintaining a reasonable chance of detecting true effects.
Suppose we perform m independent hypothesis tests, each with a significance level α. The probability of making at least one Type I error (false positive) is:
P(\text{at least one false positive}) = 1 - (1 - \alpha)^m
For example, with α=0.05 and m=20 tests:
P(\text{at least one false positive}) = 1 - (0.95)^{20} \approx 0.64
Thus, if we do not adjust for multiple testing, we are highly likely to reject at least one true null hypothesis just by chance.
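This relationship is easy to verify numerically; a quick sketch in base R:

```r
# Probability of at least one false positive among m independent tests
alpha <- 0.05
m <- 20
1 - (1 - alpha)^m
#> [1] 0.6415141
```

With 100 tests the probability climbs above 0.99, which is why some correction is essential.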
Family-Wise Error Rate vs. False Discovery Rate
Approach | Controls | Method | Pros | Cons |
---|---|---|---|---|
Bonferroni Correction | FWER (probability of ≥1 false positive) | Adjusts α: α/m | Strong guarantee against any false positive | Very conservative; low power, more false negatives |
False Discovery Rate | Expected proportion of false discoveries | Adjusts p-values dynamically | Higher power, fewer false negatives | Some false positives allowed |
Why FDR?
- FWER control (Bonferroni, Holm) is too strict, reducing true discoveries.
- FDR control allows a small fraction of false positives while keeping most discoveries valid.
Let:
m = total number of hypotheses tested
V = number of false discoveries (Type I errors)
R = total number of rejected null hypotheses
Then, the False Discovery Rate is:
\text{FDR} = E\left[\frac{V}{\max(R, 1)}\right]
- If no null hypotheses are rejected (R = 0), we define FDR = 0.
- Unlike FWER, which controls the probability of any false positives, FDR controls the expected proportion of false positives.
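A small simulation makes the quantities V, R, and V/\max(R, 1) concrete. The setup below (900 true nulls, 100 real effects with exponentially distributed small p-values) is an assumption chosen purely for illustration:

```r
# Illustrative simulation (assumed setup): 900 true nulls, 100 real effects
set.seed(1)
m <- 1000
is_null <- c(rep(TRUE, 900), rep(FALSE, 100))
p_sim <- ifelse(is_null,
                runif(m),                      # true nulls: uniform p-values
                pmin(rexp(m, rate = 50), 1))   # real effects: small p-values

rejected <- p.adjust(p_sim, method = "BH") < 0.05
V <- sum(rejected & is_null)   # false discoveries
R <- sum(rejected)             # total rejections
V / max(R, 1)                  # realized false discovery proportion
```

The realized proportion fluctuates from run to run; it is its expectation that BH keeps at or below \alpha.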
16.3.1 Benjamini-Hochberg Procedure
The Benjamini-Hochberg (BH) procedure is the most widely used FDR-controlling method (Benjamini and Hochberg 1995). It works as follows:
Step-by-Step Algorithm:
1. Perform m hypothesis tests and obtain p-values: p_1, p_2, \dots, p_m.
2. Rank the p-values in ascending order: p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(m)}.
3. Compute the Benjamini-Hochberg critical value for each rank: \frac{i}{m} \alpha.
4. Find the largest i such that p_{(i)} \leq \frac{i}{m} \alpha; call it k.
5. Reject the hypotheses corresponding to the k smallest p-values, p_{(1)}, \dots, p_{(k)}.
Interpretation:
- This ensures that the expected proportion of false discoveries is controlled at level \alpha.
- The BH procedure guarantees FDR control when the tests are independent (or positively dependent). Because it controls only the expected proportion of false discoveries, rather than the chance of any, it is more powerful than Bonferroni.
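The step-up algorithm above can be implemented by hand. In this sketch, the function name bh_reject is ours; it reproduces the rejections obtained from p.adjust():

```r
# Hand-rolled BH step-up procedure (function name bh_reject is ours)
bh_reject <- function(p, alpha = 0.05) {
  m <- length(p)
  o <- order(p)                               # indices sorted by p-value
  passes <- p[o] <= (seq_len(m) / m) * alpha  # compare to i/m * alpha
  k <- if (any(passes)) max(which(passes)) else 0
  reject <- logical(m)
  if (k > 0) reject[o[seq_len(k)]] <- TRUE    # reject the k smallest p-values
  reject
}

set.seed(123)
p <- runif(20, 0, 0.1)
identical(bh_reject(p), p.adjust(p, method = "BH") <= 0.05)
#> [1] TRUE
```

Rejecting whenever the BH-adjusted p-value is at most \alpha is mathematically equivalent to the step-up rule, which is why the two agree.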
16.3.1.1 Example 1: FDR Correction on Simulated Data
set.seed(123)
# Generate 20 random p-values
p_values <- runif(20, 0, 0.1)  # Simulating p-values from multiple tests
# Apply FDR correction (Benjamini-Hochberg)
adjusted_p <- p.adjust(p_values, method = "BH")
# Compare raw and adjusted p-values
data.frame(Raw_p = round(p_values, 3),
Adjusted_p = round(adjusted_p, 3)) |> head()
#> Raw_p Adjusted_p
#> 1 0.029 0.095
#> 2 0.079 0.096
#> 3 0.041 0.095
#> 4 0.088 0.096
#> 5 0.094 0.096
#> 6 0.005 0.046
Adjusted p-values control the expected proportion of false discoveries.
If an adjusted p-value is below \alpha, we reject the null hypothesis.
16.3.1.2 Example 2: FDR Correction in Gene Expression Analysis
FDR is widely used in genomics, where thousands of genes are tested for differential expression.
# Simulated p-values from a differential expression screen of 100 genes
set.seed(42)
p_values <- runif(100, 0, 0.1)
# Apply different multiple testing corrections
# Bonferroni (very strict)
p_bonf <- p.adjust(p_values, method = "bonferroni")
# Holm's method
p_holm <- p.adjust(p_values, method = "holm")
# Benjamini-Hochberg (FDR)
p_fdr <- p.adjust(p_values, method = "BH")
# Compare significance rates
sum(p_bonf < 0.05) # Strictest correction
#> [1] 3
sum(p_holm < 0.05) # Moderately strict
#> [1] 3
sum(p_fdr < 0.05) # Most discoveries, controlled FDR
#> [1] 5
Bonferroni results in few discoveries (low power).
Benjamini-Hochberg allows more discoveries, while controlling the proportion of false positives.
Use FDR when:
- You perform many hypothesis tests (e.g., genomics, finance, A/B testing).
- You want to balance false positives and false negatives.
- Bonferroni is too strict, leading to low power.
Do not use FDR if:
- Strict control of any false positives is required (e.g., drug approval studies).
- There are only a few tests, where Bonferroni is appropriate.
16.3.2 Benjamini-Yekutieli Procedure
The Benjamini-Yekutieli (BY) method modifies the Benjamini-Hochberg (BH) procedure to account for correlated test statistics (Benjamini and Yekutieli 2001).
The key adjustment is a stricter (smaller) critical value for significance, making it more conservative than BH.
Similar to BH, the BY procedure ranks m p-values in ascending order:
p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(m)}
Instead of using \alpha \frac{i}{m} as in BH, BY introduces a correction factor:
p_{(i)} \leq \frac{i}{m C(m)} \alpha
where:
C(m) = \sum_{j=1}^{m} \frac{1}{j} \approx \ln(m) + 0.577
- C(m) is the harmonic series correction (ensures control under dependence).
- Because C(m) > 1, the BY threshold is smaller (stricter) than the BH threshold, reducing false positives at the cost of power.
- Recommended when the dependence structure among tests is unknown or arbitrary (e.g., in fMRI, finance); under independence or positive dependence, BH already suffices.
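The correction factor is straightforward to compute. This sketch evaluates C(m) for m = 50 tests and shows how it shrinks the BH critical values:

```r
# Harmonic correction factor for m = 50 tests
m <- 50
alpha <- 0.05
C_m <- sum(1 / seq_len(m))   # C(m), approximately log(m) + 0.577
C_m
#> [1] 4.499205

# BY critical values are the BH ones divided by C(m)
i <- seq_len(m)
bh_crit <- (i / m) * alpha
by_crit <- (i / (m * C_m)) * alpha
```

With 50 tests, every BY critical value is roughly 4.5 times smaller than its BH counterpart, which explains the much larger adjusted p-values below.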
We can apply the BY correction using the built-in p.adjust() function.
set.seed(123)
# Simulate 50 random p-values
p_values <- runif(50, 0, 0.1)
# Apply BY correction
p_by <- p.adjust(p_values, method = "BY")
# Compare raw and adjusted p-values
data.frame(Raw_p = round(p_values, 3),
Adjusted_BY = round(p_by, 3)) |> head()
#> Raw_p Adjusted_BY
#> 1 0.029 0.430
#> 2 0.079 0.442
#> 3 0.041 0.430
#> 4 0.088 0.442
#> 5 0.094 0.442
#> 6 0.005 0.342
Comparison: BH vs. BY
p_bh <- p.adjust(p_values, method = "BH") # BH correction
# Visualize differences
data.frame(Raw_p = round(p_values, 3),
BH_Adjusted = round(p_bh, 3),
BY_Adjusted = round(p_by, 3)) |> head()
#> Raw_p BH_Adjusted BY_Adjusted
#> 1 0.029 0.096 0.430
#> 2 0.079 0.098 0.442
#> 3 0.041 0.096 0.430
#> 4 0.088 0.098 0.442
#> 5 0.094 0.098 0.442
#> 6 0.005 0.076 0.342
Benjamini-Yekutieli is more conservative than Benjamini-Hochberg.
Discoveries flagged by BH but not by BY rest on BH's independence (or positive dependence) assumption; BY's stricter threshold remains valid under arbitrary correlation among the tests.
16.3.3 Storey’s q-value Approach
Storey’s q-value method directly estimates the False Discovery Rate rather than adjusting individual p-values (Storey 2002).
Unlike BH/BY, it does not fix a target level \alpha in advance; instead, it estimates the FDR incurred at each possible significance threshold directly from the data.
Define:
- m_0: the number of true null hypotheses.
- \pi_0 = m_0 / m: the proportion of true nulls in the dataset.
Storey’s q-value adjusts p-values based on: q(p) = \frac{\pi_0 \cdot m \cdot p}{\sum_{i=1}^{m} 1_{p_i \leq p}} where:
- \pi_0 is estimated from the distribution of large p-values, which are dominated by true nulls.
- Unlike BH/BY, which implicitly assume \pi_0 = 1, Storey’s method estimates the null proportion from the data, which can make it less conservative.
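The null-proportion estimate can be sketched by hand: since null p-values are uniform, the density of p-values above a tuning threshold \lambda reflects mostly nulls. The mixture below and the choice \lambda = 0.5 are assumptions for illustration:

```r
# Hypothetical mixture: 800 uniform null p-values + 200 small "signal" p-values
set.seed(7)
p_mix <- c(runif(800), rbeta(200, 1, 20))

# Storey-style pi0 estimator: p-values above lambda are (almost) all nulls
lambda <- 0.5
pi0_hat <- mean(p_mix > lambda) / (1 - lambda)
pi0_hat   # should land near the true proportion, 0.8
```

The qvalue package refines this idea by smoothing the estimate across a grid of \lambda values rather than relying on a single choice.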
# devtools::install_github("jdstorey/qvalue")
library(qvalue)
# Simulated data: 1000 hypothesis tests
set.seed(123)
p_values <- runif(1000, 0, 0.1)
# Compute q-values; qvalue_truncp() is used because the simulated
# p-values do not span the full [0, 1] interval
qvals <- qvalue_truncp(p_values)$qvalues
# Summary of q-values
summary(qvals)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.3126 0.9689 0.9771 0.9684 0.9847 1.0000
Comparison: BH vs. BY vs. Storey
# Apply multiple corrections
p_bh <- p.adjust(p_values, method = "BH") # Benjamini-Hochberg
p_by <- p.adjust(p_values, method = "BY") # Benjamini-Yekutieli
q_vals <- qvalue_truncp(p_values)$qvalues # Storey's q-value
# Compare significance rates
data.frame(
Raw_p = round(p_values[1:10], 3),
BH = round(p_bh[1:10], 3),
BY = round(p_by[1:10], 3),
q_value = round(q_vals[1:10], 3)
)
#> Raw_p BH BY q_value
#> 1 0.029 0.097 0.725 0.969
#> 2 0.079 0.098 0.737 0.985
#> 3 0.041 0.097 0.725 0.969
#> 4 0.088 0.099 0.740 0.989
#> 5 0.094 0.100 0.746 0.998
#> 6 0.005 0.094 0.700 0.936
#> 7 0.053 0.098 0.737 0.985
#> 8 0.089 0.099 0.740 0.989
#> 9 0.055 0.098 0.737 0.985
#> 10 0.046 0.098 0.731 0.977
16.3.4 Summary: False Discovery Rate Methods
FDR control methods balance Type I and Type II errors, making them more powerful than conservative Family-Wise Error Rate (FWER) methods like Bonferroni.
Method | Type | Strength | Best for |
---|---|---|---|
Benjamini-Hochberg | Adjusted p-values | Most powerful | Independent tests (e.g., surveys, psychology) |
Benjamini-Yekutieli | Adjusted p-values | More conservative | Correlated tests (e.g., fMRI, finance) |
Storey’s q-value | Direct FDR estimation | Most flexible | Large-scale studies (e.g., genomics, proteomics) |
Bonferroni | Family-Wise Error Rate (FWER) | Very conservative | Small number of tests, strict control |
Holm’s Method | FWER (less strict than Bonferroni) | Moderate | Moderately strict correction |