20 Multiple Testing
20.1 An example from genetics
We consider a gene expression dataset available from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE7621). The corresponding paper was published with PLOS Genetics (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.0030098). In the experiment, substantia nigra brain tissue from normal subjects (9 of them) and Parkinson diseased subjects (16 of them) were used for RNA extraction and hybridization on Affymetrix microarrays. Both cohorts included males and females. (We only look at the first 50000 locuses for the sake of simplicity. Some the remaining ones have missing values.)
load("data/parkinson.rda")
= 50000
m = as.matrix(parkinson[1:m, -1]) # remove the ID
Data = matrix(as.numeric(unlist(Data)), nrow = m) Data
We perform a Student–Welch test for each gene/locus.
= numeric(m)
pval for (i in 1:m) {
= t.test(Data[i, 1:9], Data[i, -(1:9)])$p.value
pval[i] }
20.1.1 Adjusted p-values
Here are the corrected p-values:
= p.adjust(pval, "bon") # Bonferroni
pval.bon = p.adjust(pval, "holm") # Holm
pval.holm = p.adjust(pval, "hoch") # Hochberg
pval.hoch = p.adjust(pval, "BH") # Benjamini-- Hochberg
pval.bh = p.adjust(pval, "BY") # Benjamini--Yekutieli pval.by
20.1.2 Rejections
We start with rejections at the 10% level without adjustment for multiple testing.
= (pval <= 0.10)
reject = sum(reject) # total number of rejections
R R
[1] 9067
We now consider rejections while controlling FWER at 10%. (Note that the last one does not guarantee control since an assumption of independence is dubious here.) In this particular example, very few rejections are made and they happen to be the same for all three procedures.
= (pval.bon <= 0.10)
reject.bon = sum(reject.bon)
R.bon = (pval.holm <= 0.10)
reject.holm = sum(reject.holm)
R.holm = (pval.hoch <= 0.10)
reject.hoch = sum(reject.hoch)
R.hoch c(R.bon, R.holm, R.hoch)
[1] 3 3 3
We finally consider rejections while controlling FDR at 10%. (Note that the first one does not guarantee control since an assumption of independence is dubious here.)
= (pval.bh <= 0.10)
reject.bh = sum(reject.bh) # many more rejections
R.bh = (pval.by <= 0.10)
reject.by = sum(reject.by)
R.by c(R.bh, R.by)
[1] 202 3
The BH method produces many more rejections compared to all the other methods, although there is no guarantee that the FDR is controlled at the desired level due to possible dependencies.
20.2 An example from meta-analysis
We use the dataset described in Table 3 of “Controlled low protein diets in chronic renal insufficiency” published in the BMJ in 1992 (https://www.bmj.com/content/304/6821/216.short).
= array(c(95, 110, 15, 8, 2, 5, 7, 5, 194, 209, 32, 21, 8, 14, 17, 11, 25, 30, 13, 4, 21, 21, 11, 12), dim = c(2, 2, 6), dimnames = list(diet = c("control", "treatment"), outcome = c("survived", "died"), study = 1:6)) Data
20.2.1 Cochran–Mantel–Haenszel test
mantelhaen.test(Data) # p-value relies on asymptotic theory
Mantel-Haenszel chi-squared test with continuity correction
data: Data
Mantel-Haenszel X-squared = 10.407, df = 1, p-value = 0.001255
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
0.3619622 0.7727168
sample estimates:
common odds ratio
0.5288613
20.2.2 Combination tests
It is also appropriate to take a multiple testing stance, applying a test to each contingency table (we use Fisher’s exact test), and then applying a combination test to the resulting p-values.
= apply(Data, 3, fisher.test)
out = lapply(out, "[[", 1)
pval = unlist(pval, use.names = FALSE) pval
We first apply the Fisher combination test, which we code by hand
= function(p) {
fisher.combine = -2*sum(log(p))
stat = 2*length(p)
df return(pchisq(stat, df, lower.tail = FALSE))
}fisher.combine(pval) # some significance
[1] 0.04301827
Next, we apply the Simes combination test, which we also code by hand
= function(p) {
simes.combine = length(p)
m = m * min(sort(p)/(1:m))
stat return(punif(stat))
}simes.combine(pval) # no much significance
[1] 0.1770333