23.1 Natural Experiments
Reusing the same natural experiment across studies, particularly when identical methods are used to estimate the treatment effect in a given setting, poses problems for hypothesis testing: many outcomes end up being tested against the same source of variation.
Simulations show that when \(N_{\text{Outcome}} \gg N_{\text{True effect}}\), more than 50% of statistically significant findings may be false positives (Heath et al. 2023, 2331).
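The mechanism is easy to reproduce. Below is a minimal simulation sketch (the sample size, outcome counts, and effect size are illustrative assumptions, not Heath et al.'s design): with 100 outcomes tested against the same experiment but only 5 true effects, roughly as many false positives as true positives clear the 5% threshold.

set.seed(123)
n_obs     <- 200  # observations per outcome
n_outcome <- 100  # outcomes tested against the same natural experiment
n_true    <- 5    # outcomes with a genuine treatment effect
treat     <- rbinom(n_obs, 1, 0.5)

p_vals <- sapply(seq_len(n_outcome), function(j) {
  effect <- if (j <= n_true) 0.5 else 0  # true effect only for the first n_true outcomes
  y <- effect * treat + rnorm(n_obs)
  summary(lm(y ~ treat))$coefficients["treat", "Pr(>|t|)"]
})

sig <- p_vals < 0.05
mean(seq_len(n_outcome)[sig] > n_true)  # share of significant findings that are false positives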
Solutions:
- Bonferroni correction
- Romano and Wolf (2005) and Romano and Wolf (2016) correction (recommended)
- Benjamini and Yekutieli (2001) correction
Alternatively, refer to the rules of thumb from Table AI (Heath et al. 2023, 2356).
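For the Bonferroni and Benjamini-Yekutieli corrections above, base R's stats::p.adjust already suffices; a minimal sketch with hypothetical p-values:

# Bonferroni and Benjamini-Yekutieli corrections in base R
p <- c(0.001, 0.01, 0.02, 0.04, 0.20)  # hypothetical raw p-values
p.adjust(p, method = "bonferroni")
p.adjust(p, method = "BY")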
When applying multiple testing corrections, we can order the outcomes in either of two ways (the two orderings give similar results in practice (Heath et al. 2023, 2335)); a sketch of the first follows this list:
- Chronological Sequencing: Outcomes are ordered by the date they were first reported, and the correction is applied in that sequence. The statistical significance threshold therefore tightens as more outcomes are studied over time.
- Best Foot Forward Policy: Outcomes are ordered from most to least likely to be rejected based on experimental data. Used primarily in clinical trials, this approach gives priority to intended treatment effects, which face less stringent statistical requirements. New outcomes are added to the sequence as they are linked to the primary treatment effect.
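A minimal sketch of chronological sequencing under a Bonferroni-style rule (the data frame, dates, and the \(\alpha / k\) thresholds are illustrative assumptions rather than Heath et al.'s exact procedure); best foot forward would simply replace the date ordering with the planned priority ranking:

# Hypothetical outcomes with the dates they were first reported
outcomes <- data.frame(
  outcome        = c("y1", "y2", "y3", "y4"),
  first_reported = as.Date(c("2010-01-01", "2013-06-01", "2016-03-01", "2020-09-01")),
  p_value        = c(0.030, 0.004, 0.020, 0.001)
)

# Order by first-report date and tighten the threshold as outcomes accumulate
outcomes <- outcomes[order(outcomes$first_reported), ]
outcomes$threshold   <- 0.05 / seq_len(nrow(outcomes))  # stricter with each added outcome
outcomes$significant <- outcomes$p_value < outcomes$threshold
outcomes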
# Romano-Wolf correction
library(fixest)
library(wildrwolf)
head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
fit1 <- feols(Sepal.Width ~ Sepal.Length, data = iris)
fit2 <- feols(Petal.Length ~ Sepal.Length, data = iris)
fit3 <- feols(Petal.Width ~ Sepal.Length, data = iris)
res <- rwolf(
  models = list(fit1, fit2, fit3),
  param  = "Sepal.Length",
  B      = 500
)
res
#>   model   Estimate Std. Error   t value     Pr(>|t|) RW Pr(>|t|)
#> 1     1 -0.0618848 0.04296699 -1.440287    0.1518983 0.139720559
#> 2     2   1.858433 0.08585565  21.64602 1.038667e-47 0.001996008
#> 3     3  0.7529176 0.04353017  17.29645 2.325498e-37 0.001996008
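Comparing the last two columns shows what the correction does: the raw p-values for models 2 and 3 are vanishingly small, yet their Romano-Wolf adjusted p-values both equal 0.001996 = 1/501 = 1/(B + 1), the smallest adjusted p-value a bootstrap with B = 500 draws can resolve. A larger B is needed to distinguish very strong results from one another.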
For all other tests, one can use multtest::mt.rawp2adjp, which includes:
- Bonferroni
- Holm (1979)
- Šidák (1967)
- Hochberg (1988)
- Benjamini and Hochberg (1995)
- Benjamini and Yekutieli (2001)
- Adaptive Benjamini and Hochberg (2000)
- Two-stage Benjamini, Krieger, and Yekutieli (2006)
The first four procedures control the family-wise error rate, while the BH-family procedures control the false discovery rate; the package also offers permutation adjusted p-values for these simple multiple testing procedures.
# BiocManager::install("multtest")
library(multtest)
procs <- c("Bonferroni", "Holm", "Hochberg", "SidakSS", "SidakSD",
           "BH", "BY", "ABH", "TSBH")

mt.rawp2adjp(
  runif(10), # raw p-values (simulated for illustration)
  procs
) |> causalverse::nice_tab()
#>    adjp.rawp adjp.Bonferroni adjp.Holm adjp.Hochberg adjp.SidakSS adjp.SidakSD
#> 1       0.12               1         1          0.75         0.72         0.72
#> 2       0.22               1         1          0.75         0.92         0.89
#> 3       0.24               1         1          0.75         0.94         0.89
#> 4       0.29               1         1          0.75         0.97         0.91
#> 5       0.36               1         1          0.75         0.99         0.93
#> 6       0.38               1         1          0.75         0.99         0.93
#> 7       0.44               1         1          0.75         1.00         0.93
#> 8       0.59               1         1          0.75         1.00         0.93
#> 9       0.65               1         1          0.75         1.00         0.93
#> 10      0.75               1         1          0.75         1.00         0.93
#>    adjp.BH adjp.BY adjp.ABH adjp.TSBH_0.05 index h0.ABH h0.TSBH
#> 1     0.63       1     0.63           0.63     2     10      10
#> 2     0.63       1     0.63           0.63     6     10      10
#> 3     0.63       1     0.63           0.63     8     10      10
#> 4     0.63       1     0.63           0.63     3     10      10
#> 5     0.63       1     0.63           0.63    10     10      10
#> 6     0.63       1     0.63           0.63     1     10      10
#> 7     0.63       1     0.63           0.63     7     10      10
#> 8     0.72       1     0.72           0.72     9     10      10
#> 9     0.72       1     0.72           0.72     5     10      10
#> 10    0.75       1     0.75           0.75     4     10      10
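For the procedures that overlap with base R (Bonferroni, Holm, Hochberg, BH, BY), stats::p.adjust gives the same adjustments without the Bioconductor dependency; a quick cross-check sketch on simulated p-values:

# Cross-check the overlapping procedures against base R
p <- runif(10)  # raw p-values (simulated)
sapply(c("bonferroni", "holm", "hochberg", "BH", "BY"),
       function(m) p.adjust(p, method = m))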