2.1 Good practices in data analysis (X)
2.1.1 Why reproducibility?
- Terminology: “Replication, Replication” (King 1995)
- But distinguish replication from reproduction (terminology!)
- Errors
- A crisis… (e.g. Open Science Collaboration 2015) that should be avoided (e.g. Psychoticism)
- Manual steps (e.g. manual copy/paste) introduce errors
- Reproducible documents allow for automation (counterargument?)
- Access:
- Taxpayers (= researchers) pay for research → should have access
- Better: all humans should have access → human progress! (Sci-Hub controversy)
- Implies relying on open-source software
- Access in 100 years: will Stata still exist?
- Memory
- You will forget what you did; think of others, too
- A reproducible document helps you trace your steps
- Ideally covers all stages of the workflow
- Efficiency
- Automation → much faster paper revisions
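As a rough illustration of the points on manual steps and automation above, the following Python sketch computes a statistic and writes it straight into the report text, so no number is ever copied by hand. The data values and the function name `render_sentence` are hypothetical and chosen only for illustration; in an R Markdown document, inline R code plays the same role.

```python
from statistics import mean, stdev


def render_sentence(values):
    """Build a report sentence directly from the data (no manual copying)."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation
    return f"The mean is {m:.2f} (SD = {s:.2f}, n = {len(values)})."


# Hypothetical data: if the data are revised, rerunning the script
# updates the sentence automatically.
scores = [2.0, 4.0, 4.0, 5.0, 7.0]
print(render_sentence(scores))
```

If the underlying data change during a paper revision, rerunning the script regenerates the sentence, which is where the efficiency gain comes from.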
2.1.2 Reproducibility: My current approach
- Every researcher has their own optimized setup
- Mine is summarized in this template: Writing a reproducible paper with R Markdown and Pagedown
- Please use this template for your term paper and follow the corresponding recommendations
- See also (P. Bauer 2018)
- Tools: R, R Markdown and Pagedown
- Final product (e.g. scientific article, statistical report) is produced by a single .Rmd file
- Potentially reproducible in 100 years! (open-source)
- Ideally encompasses all stages of the workflow (not always possible)
- Cache estimations (some contain randomness)!
- The criteria of “good” evidence change [3]
- Initiatives such as…
- ROpenSci
- Center for Open Science + Open Science Framework
- Pre-registration (Pros & Cons)
- Harvard dataverse
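The earlier advice to cache estimations that involve randomness can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the template's actual mechanism: the functions `estimate` and `cached_estimate`, the cache file name, and the seed value are all hypothetical. In R Markdown, knitr's `cache` chunk option serves a comparable purpose.

```python
import json
import os
import random
import tempfile


def estimate(seed, n_draws=1000):
    """Toy simulation-based estimate; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]
    return sum(draws) / n_draws


def cached_estimate(cache_path, seed):
    """Return the cached result if present, otherwise compute and store it."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)["estimate"]
    value = estimate(seed)
    with open(cache_path, "w") as f:
        # Store the seed next to the result so the run can be traced later.
        json.dump({"seed": seed, "estimate": value}, f)
    return value


# Hypothetical cache location in a temporary directory.
cache_file = os.path.join(tempfile.mkdtemp(), "estimate_cache.json")
first = cached_estimate(cache_file, seed=42)   # computed and written to disk
second = cached_estimate(cache_file, seed=42)  # read back from the cache
print(first == second)
```

Note that this simple cache does not check whether the seed or the code has changed since the result was stored; a real setup would need to invalidate the cache in that case.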
References
Acharya, Avidit, Matthew Blackwell, and Maya Sen. 2016. “Explaining Causal Findings Without Bias: Detecting and Assessing Direct Effects.” American Political Science Review 110 (3): 512–29.
Bauer, Paul. 2018. “Writing a Reproducible Paper in R Markdown,” May.
Gill, Jeff. 1999. “The Insignificance of Null Hypothesis Significance Testing.” Political Research Quarterly 52 (3): 647–74.
King, Gary. 1995. “Replication, Replication.” PS, Political Science & Politics 28 (3): 444–52.
———. 2010. “A Hard Unsolved Problem? Post-Treatment Bias in Big Social Science Questions.” In “Hard Problems in Social Science” Symposium, Harvard University. scholar.harvard.edu.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716.
[3] See, for instance, the discussion surrounding the use of p-values/statistical significance (e.g. Gill 1999) and the current discussion about post-treatment bias (e.g. King 2010; Acharya, Blackwell, and Sen 2016).