20 Multiple Testing

20.1 An example from genetics

We consider a gene expression dataset available from the Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE7621). The corresponding paper was published with PLOS Genetics (https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.0030098). In the experiment, substantia nigra brain tissue from normal subjects (9 of them) and Parkinson diseased subjects (16 of them) were used for RNA extraction and hybridization on Affymetrix microarrays. Both cohorts included males and females. (We only look at the first 50000 locuses for the sake of simplicity. Some the remaining ones have missing values.)

load("data/parkinson.rda")
m = 50000
Data = as.matrix(parkinson[1:m, -1]) # remove the ID
Data = matrix(as.numeric(unlist(Data)), nrow = m)

We perform a Student–Welch test for each gene/locus.

pval = numeric(m)
for (i in 1:m) {
  pval[i] = t.test(Data[i, 1:9], Data[i, -(1:9)])$p.value
}

20.1.1 Adjusted p-values

Here are the corrected p-values:

pval.bon = p.adjust(pval, "bon") # Bonferroni
pval.holm = p.adjust(pval, "holm") # Holm
pval.hoch = p.adjust(pval, "hoch") # Hochberg
pval.bh = p.adjust(pval, "BH") # Benjamini-- Hochberg
pval.by = p.adjust(pval, "BY") # Benjamini--Yekutieli

20.1.2 Rejections

We start with rejections at the 10% level without adjustment for multiple testing.

reject = (pval <= 0.10)
R = sum(reject) # total number of rejections
R
[1] 9067

We now consider rejections while controlling FWER at 10%. (Note that the last one does not guarantee control since an assumption of independence is dubious here.) In this particular example, very few rejections are made and they happen to be the same for all three procedures.

reject.bon = (pval.bon <= 0.10)
R.bon = sum(reject.bon) 
reject.holm = (pval.holm <= 0.10)
R.holm = sum(reject.holm) 
reject.hoch = (pval.hoch <= 0.10)
R.hoch = sum(reject.hoch) 
c(R.bon, R.holm, R.hoch)
[1] 3 3 3

We finally consider rejections while controlling FDR at 10%. (Note that the first one does not guarantee control since an assumption of independence is dubious here.)

reject.bh = (pval.bh <= 0.10)
R.bh = sum(reject.bh) # many more rejections
reject.by = (pval.by <= 0.10)
R.by = sum(reject.by)
c(R.bh, R.by)
[1] 202   3

The BH method produces many more rejections compared to all the other methods, although there is no guarantee that the FDR is controlled at the desired level due to possible dependencies.

20.2 An example from meta-analysis

We use the dataset described in Table 3 of “Controlled low protein diets in chronic renal insufficiency” published in the BMJ in 1992 (https://www.bmj.com/content/304/6821/216.short).

Data = array(c(95, 110, 15, 8, 2, 5, 7, 5, 194, 209, 32, 21, 8, 14, 17, 11, 25, 30, 13, 4, 21, 21, 11, 12), dim = c(2, 2, 6), dimnames = list(diet = c("control", "treatment"), outcome = c("survived", "died"), study = 1:6))

20.2.1 Cochran–Mantel–Haenszel test

mantelhaen.test(Data) # p-value relies on asymptotic theory

    Mantel-Haenszel chi-squared test with continuity correction

data:  Data
Mantel-Haenszel X-squared = 10.407, df = 1, p-value = 0.001255
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.3619622 0.7727168
sample estimates:
common odds ratio 
        0.5288613 

20.2.2 Combination tests

It is also appropriate to take a multiple testing stance, applying a test to each contingency table (we use Fisher’s exact test), and then applying a combination test to the resulting p-values.

out = apply(Data, 3, fisher.test)
pval = lapply(out, "[[", 1)
pval = unlist(pval, use.names = FALSE)

We first apply the Fisher combination test, which we code by hand

fisher.combine = function(p) {
  stat = -2*sum(log(p))
  df = 2*length(p)
  return(pchisq(stat, df, lower.tail = FALSE))
}
fisher.combine(pval) # some significance
[1] 0.04301827

Next, we apply the Simes combination test, which we also code by hand

simes.combine = function(p) {
  m = length(p)
  stat = m * min(sort(p)/(1:m))
  return(punif(stat))
}
simes.combine(pval) # no much significance
[1] 0.1770333