Reproducibility in practice
- One folder for all files of a project
  - …the folder can then be zipped and shared/uploaded
- I prefer no subfolders
- Filenames should be logical
  - Main file with text/code: “paper.rmd”, “report.rmd”
  - Data files: “data_xxxxxx.*”
  - Image files: “fig_xxxxxx.*”
  - Table files: “table_xxxx.*”
  - etc.
- Important: Use the document outline in RStudio: Ctrl + Shift + O
- Name R chunks according to what they do or produce
  - “fig-…” for chunks producing figures
  - “table-…” for chunks producing tables
  - “model-…” for chunks producing model estimates
  - “import-…” for chunks importing data
  - “recoding-…” for chunks in which data is recoded
- Use “really” informative variable names
  - Q: What do you think the variable trstep measures?
  - What could we call this variable instead?
  - In part this happens automatically, because those names are used in tables etc.
- Use unique identifiers in the final document, e.g. for models “M1”, “M2” etc.
  - These should also appear in the published paper
  - …it will help others if you do the same for figures, tables etc.
- ALWAYS store the raw data, even if you scrape it from websites (they might disappear)
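
The conventions above can be sketched in a few lines of base R. This is only an illustration, not a prescribed workflow: the folder name, the file names after the prefixes, the chunk labels in the comments, the made-up values, and the new variable name are all assumptions. In particular, the sketch assumes that trstep records trust in the European Parliament, as it does in the European Social Survey.

```r
# Hypothetical project folder; tempdir() is used so the sketch runs anywhere.
project_dir <- file.path(tempdir(), "paper_project")
dir.create(project_dir, showWarnings = FALSE)

# In paper.rmd this step would sit in a chunk labelled e.g. "import-survey".
# Stand-in for data that was just downloaded or scraped (made-up values).
raw <- data.frame(trstep = c(3, 7, 5, 8, 2, 6),
                  age    = c(25, 41, 33, 58, 19, 47))

# Store the raw data untouched, using the "data_" prefix in the file name.
raw_file <- file.path(project_dir, "data_raw_survey.csv")
write.csv(raw, raw_file, row.names = FALSE)

# Chunk "recoding-survey": work on a copy and give it an informative name
# (assumed meaning of trstep, see the note above).
survey <- read.csv(raw_file)
names(survey)[names(survey) == "trstep"] <- "trust_eu_parliament"

# Chunk "model-trust": keep models under the identifiers used in the paper.
models <- list(
  M1 = lm(trust_eu_parliament ~ 1, data = survey),
  M2 = lm(trust_eu_parliament ~ age, data = survey)
)

# Chunk "table-coefficients": save the estimates using the "table_" prefix.
coef_table <- data.frame(model    = "M2",
                         term     = names(coef(models$M2)),
                         estimate = unname(coef(models$M2)))
table_file <- file.path(project_dir, "table_model_coefficients.csv")
write.csv(coef_table, table_file, row.names = FALSE)
```

Because the raw file is written once and only read afterwards, every later step can be rerun from the stored copy even if the original source disappears.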