16.3 False Discovery Rate
When conducting multiple hypothesis tests simultaneously, the probability of at least one false positive (Type I error) grows with the number of tests. Traditional corrections such as the Bonferroni method control this probability but are often too conservative, reducing statistical power.
The False Discovery Rate (FDR), introduced by Benjamini and Hochberg (1995), is a more flexible and powerful approach that controls the proportion of false discoveries (incorrect rejections of the null hypothesis) while maintaining a reasonable chance of detecting true effects.
Suppose we perform m independent hypothesis tests, each with a significance level α. The probability of making at least one Type I error (false positive) is:
P(\text{at least one false positive}) = 1 - (1 - \alpha)^m
For example, with α=0.05 and m=20 tests:
P(\text{at least one false positive}) = 1 - (0.95)^{20} \approx 0.64
Thus, if we do not adjust for multiple testing, we are highly likely to reject at least one true null hypothesis just by chance.
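This relationship is easy to verify numerically; a quick sketch in base R:

```r
# Probability of at least one false positive among m independent tests
alpha <- 0.05
m <- 20
1 - (1 - alpha)^m
#> [1] 0.6415141
```

With 100 tests the probability climbs above 0.99, which is why some correction is essential.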
Family-Wise Error Rate vs. False Discovery Rate
Approach | Controls | Method | Pros | Cons |
---|---|---|---|---|
Bonferroni Correction | FWER (probability of ≥1 false positive) | Adjusts α: α/m | Strong guarantee against any false positive | Very conservative; low power, more false negatives |
False Discovery Rate | Expected proportion of false discoveries | Adjusts p-values dynamically | Higher power, fewer false negatives | Some false positives allowed |
Why FDR?
- FWER control (Bonferroni, Holm) is too strict, reducing true discoveries.
- FDR control allows a small fraction of false positives while keeping most discoveries valid.
Let:
m = total number of hypotheses tested
V = number of false discoveries (Type I errors)
R = total number of rejected null hypotheses
Then, the False Discovery Rate is:
\text{FDR} = E\left[\frac{V}{\max(R, 1)}\right]
- If no null hypotheses are rejected (R = 0), we define FDR = 0.
- Unlike FWER, which controls the probability of any false positives, FDR controls the expected proportion of false positives.
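A small simulation makes the quantities V, R, and V/\max(R, 1) concrete. The setup below (900 true nulls, 100 real effects with exponentially distributed small p-values) is an assumption chosen purely for illustration:

```r
# Illustrative simulation (assumed setup): 900 true nulls, 100 real effects
set.seed(1)
m <- 1000
is_null <- c(rep(TRUE, 900), rep(FALSE, 100))
p_sim <- ifelse(is_null,
                runif(m),                      # true nulls: uniform p-values
                pmin(rexp(m, rate = 50), 1))   # real effects: small p-values

rejected <- p.adjust(p_sim, method = "BH") < 0.05
V <- sum(rejected & is_null)   # false discoveries
R <- sum(rejected)             # total rejections
V / max(R, 1)                  # realized false discovery proportion
```

The realized proportion fluctuates from run to run; it is its expectation that BH keeps at or below \alpha.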
16.3.1 Benjamini-Hochberg Procedure
The Benjamini-Hochberg (BH) procedure is the most widely used FDR-controlling method (Benjamini and Hochberg 1995). It works as follows:
Step-by-Step Algorithm:
1. Perform m hypothesis tests and obtain p-values: p_1, p_2, \dots, p_m.
2. Rank the p-values in ascending order: p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(m)}.
3. Compute the Benjamini-Hochberg critical value for each rank: \frac{i}{m} \alpha.
4. Find the largest i such that p_{(i)} \leq \frac{i}{m} \alpha; call it k.
5. Reject the hypotheses corresponding to the k smallest p-values, p_{(1)}, \dots, p_{(k)}.
Interpretation:
- This ensures that the expected proportion of false discoveries is controlled at level \alpha.
- The BH procedure guarantees FDR control when the tests are independent (or positively dependent). Because it controls only the expected proportion of false discoveries, rather than the chance of any, it is more powerful than Bonferroni.
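The step-up algorithm above can be implemented by hand. In this sketch, the function name bh_reject is ours; it reproduces the rejections obtained from p.adjust():

```r
# Hand-rolled BH step-up procedure (function name bh_reject is ours)
bh_reject <- function(p, alpha = 0.05) {
  m <- length(p)
  o <- order(p)                               # indices sorted by p-value
  passes <- p[o] <= (seq_len(m) / m) * alpha  # compare to i/m * alpha
  k <- if (any(passes)) max(which(passes)) else 0
  reject <- logical(m)
  if (k > 0) reject[o[seq_len(k)]] <- TRUE    # reject the k smallest p-values
  reject
}

set.seed(123)
p <- runif(20, 0, 0.1)
identical(bh_reject(p), p.adjust(p, method = "BH") <= 0.05)
#> [1] TRUE
```

Rejecting whenever the BH-adjusted p-value is at most \alpha is mathematically equivalent to the step-up rule, which is why the two agree.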
16.3.1.1 Example 1: FDR Correction on Simulated Data
set.seed(123)
# Generate 20 random p-values
p_values <- runif(20, 0, 0.1)  # Simulating p-values from multiple tests
# Apply FDR correction (Benjamini-Hochberg)
adjusted_p <- p.adjust(p_values, method = "BH")
# Compare raw and adjusted p-values
data.frame(Raw_p = round(p_values, 3),
Adjusted_p = round(adjusted_p, 3)) |> head()
#> Raw_p Adjusted_p
#> 1 0.029 0.095
#> 2 0.079 0.096
#> 3 0.041 0.095
#> 4 0.088 0.096
#> 5 0.094 0.096
#> 6 0.005 0.046
Adjusted p-values control the expected proportion of false discoveries.
If an adjusted p-value is below \alpha, we reject the null hypothesis.
16.3.1.2 Example 2: FDR Correction in Gene Expression Analysis
FDR is widely used in genomics, where thousands of genes are tested for differential expression.
# Simulated p-values from a differential expression screen of 100 genes
set.seed(42)
p_values <- runif(100, 0, 0.1)
# Apply different multiple testing corrections
# Bonferroni (very strict)
p_bonf <- p.adjust(p_values, method = "bonferroni")
# Holm's method
p_holm <- p.adjust(p_values, method = "holm")
# Benjamini-Hochberg (FDR)
p_fdr <- p.adjust(p_values, method = "BH")
# Compare significance rates
sum(p_bonf < 0.05) # Strictest correction
#> [1] 3
sum(p_holm < 0.05) # Moderately strict
#> [1] 3
sum(p_fdr < 0.05) # Most discoveries, controlled FDR
#> [1] 5
Bonferroni results in few discoveries (low power).
Benjamini-Hochberg allows more discoveries, while controlling the proportion of false positives.
Use FDR when:
- You perform many hypothesis tests (e.g., genomics, finance, A/B testing).
- You want to balance false positives and false negatives.
- Bonferroni is too strict, leading to low power.
Do not use FDR if:
- Strict control of any false positives is required (e.g., drug approval studies).
- There are only a few tests, where Bonferroni is appropriate.
16.3.2 Benjamini-Yekutieli Procedure
The Benjamini-Yekutieli (BY) method modifies the Benjamini-Hochberg (BH) procedure to account for correlated test statistics (Benjamini and Yekutieli 2001).
The key adjustment is a stricter (smaller) critical value for significance, making it more conservative than BH.
Similar to BH, the BY procedure ranks m p-values in ascending order:
p_{(1)} \leq p_{(2)} \leq \dots \leq p_{(m)}
Instead of using \alpha \frac{i}{m} as in BH, BY introduces a correction factor:
p_{(i)} \leq \frac{i}{m C(m)} \alpha
where:
C(m) = \sum_{j=1}^{m} \frac{1}{j} \approx \ln(m) + 0.577
- C(m) is the harmonic series correction (ensures control under dependence).
- Because C(m) > 1, the BY threshold is smaller (stricter) than the BH threshold, reducing false positives at the cost of power.
- Recommended when the dependence structure among tests is unknown or arbitrary (e.g., in fMRI, finance); under independence or positive dependence, BH already suffices.
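The correction factor is straightforward to compute. This sketch evaluates C(m) for m = 50 tests and shows how it shrinks the BH critical values:

```r
# Harmonic correction factor for m = 50 tests
m <- 50
alpha <- 0.05
C_m <- sum(1 / seq_len(m))   # C(m), approximately log(m) + 0.577
C_m
#> [1] 4.499205

# BY critical values are the BH ones divided by C(m)
i <- seq_len(m)
bh_crit <- (i / m) * alpha
by_crit <- (i / (m * C_m)) * alpha
```

With 50 tests, every BY critical value is roughly 4.5 times smaller than its BH counterpart, which explains the much larger adjusted p-values below.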
We can apply the BY correction using the built-in p.adjust() function.
set.seed(123)
# Simulate 50 random p-values
p_values <- runif(50, 0, 0.1)
# Apply BY correction
p_by <- p.adjust(p_values, method = "BY")
# Compare raw and adjusted p-values
data.frame(Raw_p = round(p_values, 3),
Adjusted_BY = round(p_by, 3)) |> head()
#> Raw_p Adjusted_BY
#> 1 0.029 0.430
#> 2 0.079 0.442
#> 3 0.041 0.430
#> 4 0.088 0.442
#> 5 0.094 0.442
#> 6 0.005 0.342
Comparison: BH vs. BY
p_bh <- p.adjust(p_values, method = "BH") # BH correction
# Visualize differences
data.frame(Raw_p = round(p_values, 3),
BH_Adjusted = round(p_bh, 3),
BY_Adjusted = round(p_by, 3)) |> head()
#> Raw_p BH_Adjusted BY_Adjusted
#> 1 0.029 0.096 0.430
#> 2 0.079 0.098 0.442
#> 3 0.041 0.096 0.430
#> 4 0.088 0.098 0.442
#> 5 0.094 0.098 0.442
#> 6 0.005 0.076 0.342
Benjamini-Yekutieli is more conservative than Benjamini-Hochberg.
Discoveries flagged by BH but not by BY rest on BH's independence (or positive dependence) assumption; BY's stricter threshold remains valid under arbitrary correlation among the tests.
16.3.3 Storey’s q-value Approach
Storey’s q-value method directly estimates the False Discovery Rate rather than adjusting individual p-values (Storey 2002).
Unlike BH/BY, it does not fix a target level \alpha in advance; instead, it estimates the FDR incurred at each possible significance threshold directly from the data.
Define:
- m_0: the number of true null hypotheses.
- \pi_0 = m_0 / m: the proportion of true nulls in the dataset.
Storey’s q-value adjusts p-values based on: q(p) = \frac{\pi_0 \cdot m \cdot p}{\sum_{i=1}^{m} 1_{p_i \leq p}} where:
- \pi_0 is estimated from the distribution of large p-values, which are dominated by true nulls.
- Unlike BH/BY, which implicitly assume \pi_0 = 1, Storey’s method estimates the null proportion from the data, which can make it less conservative.
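The null-proportion estimate can be sketched by hand: since null p-values are uniform, the density of p-values above a tuning threshold \lambda reflects mostly nulls. The mixture below and the choice \lambda = 0.5 are assumptions for illustration:

```r
# Hypothetical mixture: 800 uniform null p-values + 200 small "signal" p-values
set.seed(7)
p_mix <- c(runif(800), rbeta(200, 1, 20))

# Storey-style pi0 estimator: p-values above lambda are (almost) all nulls
lambda <- 0.5
pi0_hat <- mean(p_mix > lambda) / (1 - lambda)
pi0_hat   # should land near the true proportion, 0.8
```

The qvalue package refines this idea by smoothing the estimate across a grid of \lambda values rather than relying on a single choice.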
# devtools::install_github("jdstorey/qvalue")
library(qvalue)
# Simulated data: 1000 hypothesis tests
set.seed(123)
p_values <- runif(1000, 0, 0.1)
# Compute q-values; qvalue_truncp() is used because the simulated
# p-values do not span the full [0, 1] interval
qvals <- qvalue_truncp(p_values)$qvalues
# Summary of q-values
summary(qvals)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.3126 0.9689 0.9771 0.9684 0.9847 1.0000
Comparison: BH vs. BY vs. Storey
# Apply multiple corrections
p_bh <- p.adjust(p_values, method = "BH") # Benjamini-Hochberg
p_by <- p.adjust(p_values, method = "BY") # Benjamini-Yekutieli
q_vals <- qvalue_truncp(p_values)$qvalues # Storey's q-value
# Compare significance rates
data.frame(
Raw_p = round(p_values[1:10], 3),
BH = round(p_bh[1:10], 3),
BY = round(p_by[1:10], 3),
q_value = round(q_vals[1:10], 3)
)
#> Raw_p BH BY q_value
#> 1 0.029 0.097 0.725 0.969
#> 2 0.079 0.098 0.737 0.985
#> 3 0.041 0.097 0.725 0.969
#> 4 0.088 0.099 0.740 0.989
#> 5 0.094 0.100 0.746 0.998
#> 6 0.005 0.094 0.700 0.936
#> 7 0.053 0.098 0.737 0.985
#> 8 0.089 0.099 0.740 0.989
#> 9 0.055 0.098 0.737 0.985
#> 10 0.046 0.098 0.731 0.977
16.3.4 Summary: False Discovery Rate Methods
FDR control methods balance Type I and Type II errors, making them more powerful than conservative Family-Wise Error Rate (FWER) methods like Bonferroni.
Method | Type | Strength | Best for |
---|---|---|---|
Benjamini-Hochberg | Adjusted p-values | Most powerful | Independent tests (e.g., surveys, psychology) |
Benjamini-Yekutieli | Adjusted p-values | More conservative | Correlated tests (e.g., fMRI, finance) |
Storey’s q-value | Direct FDR estimation | Most flexible | Large-scale studies (e.g., genomics, proteomics) |
Bonferroni | Family-Wise Error Rate (FWER) | Very conservative | Small number of tests, strict control |
Holm’s Method | FWER (less strict than Bonferroni) | Moderate | Moderately strict correction |