3.4 Engineering tools and tips

3.4.1 Using R notebooks and RMarkdown

A problem with writing lots of small, individual R scripts is that it can be hard to make sense of what they are all about, even if we insert a few comments. For this reason RStudio offers a powerful alternative in the form of RMarkdown, which lets us interleave explanatory text, R code and the output of that code in a single document. We use RMarkdown as our primary way of writing and sharing R code for this module and strongly recommend it for all data analysis.

RMarkdown is a specialised form of Markdown, a simple text-based markup language (compare HyperText Markup Language, or HTML). Three features make it significant:

  1. the formatting markup is simple and human-readable
  2. code in many programming languages, including R, can be embedded
  3. it can be rendered into many formats, including HTML and PDF

RMarkdown files use the extension .Rmd, for example filename.Rmd.
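To make the three features concrete, here is a minimal sketch of our own (not taken from the module materials). It writes a small .Rmd file from R and then renders it to HTML. The file name example.Rmd, its contents and the chunk label mean-example are all hypothetical. The sketch assumes the {rmarkdown} package and pandoc are installed, and it skips rendering if either is missing.

```r
# Minimal sketch: write a small RMarkdown (.Rmd) file from R, then render it.
# The file name "example.Rmd" and its contents are hypothetical examples.
rmd_lines <- c(
  "---",
  "title: \"A minimal RMarkdown example\"",
  "output: html_document",
  "---",
  "",
  "Plain **Markdown** text, with a bulleted list:",
  "",
  "- first point",
  "- second point",
  "",
  "```{r mean-example}",
  "x <- c(2, 4, 6)  # R code embedded in the document",
  "mean(x)          # its output appears in the rendered file",
  "```"
)

rmd_file <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_file)

# Rendering needs the {rmarkdown} package and pandoc (which RStudio bundles),
# so check for both before trying.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file,
                                 output_format = "html_document",
                                 envir = new.env(), quiet = TRUE)
  print(html_file)  # path of the generated HTML file
}
```

The file has three parts: a YAML header between the `---` lines, ordinary Markdown text, and an R code chunk fenced by ```` ```{r} ```` and ```` ``` ````. To produce a PDF from the same file, change `output_format` to `"pdf_document"`. This requires a LaTeX installation such as TinyTeX. Inside RStudio, the Knit button performs the same rendering step.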

To give an idea of the power and flexibility of Markdown, this book is written entirely in Bookdown, an extension of RMarkdown provided by the {bookdown} package, within RStudio.

3.4.2 Version control and backup

As you can imagine, it is extremely easy to become overwhelmed by the many files and versions of your data, and of the code that analyses it. Without care this can quickly get out of control. The latest versions can be overwritten by obsolete ones or, where a team of data scientists is involved, different people can end up using inconsistent versions of the same file.

Clearly the argument for some kind of version control system is overwhelming. The most popular (and free) tool for this is git, usually combined with GitHub, a website for hosting and sharing git repositories. Git can appear a little daunting to the novice and comes with its own seemingly arcane terminology (e.g., repo, commit, pull, fork), but it is well worth the effort to master. It provides discipline and control for any project, an effective means of sharing via the GitHub website and, once your work is pushed to GitHub, an off-site backup in the event of disaster.
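The core git cycle is short: create a repository, stage files, commit them, and inspect the history. The sketch below illustrates this cycle. It drives git from R using the {gert} package. That package choice is our own assumption, since the text does not prescribe how git should be called. The folder name, file name and author details are hypothetical placeholders, and the sketch does nothing if {gert} is not installed.

```r
# Minimal sketch of the basic git cycle (create a repository, stage a file,
# commit it, inspect the history) run from R with the {gert} package.
# The folder name, file name and author details are hypothetical placeholders.
if (requireNamespace("gert", quietly = TRUE)) {
  repo <- file.path(tempdir(), "my-analysis")
  dir.create(repo, showWarnings = FALSE)
  gert::git_init(repo)  # equivalent to `git init` in a terminal

  # Put an R script under version control
  writeLines(c("x <- c(2, 4, 6)", "mean(x)"), file.path(repo, "analysis.R"))
  gert::git_add("analysis.R", repo = repo)  # stage the file (`git add`)

  # Record a snapshot with a message (`git commit`); normally the author
  # comes from your git configuration, here it is given explicitly
  me <- gert::git_signature("A. Analyst", "analyst@example.com")
  gert::git_commit("Add first analysis script", author = me, repo = repo)

  # The history lists every commit, so earlier versions can be recovered
  print(gert::git_log(repo = repo)[, c("commit", "author", "message")])
}
```

From a terminal, the same steps are `git init`, `git add analysis.R` and `git commit -m "Add first analysis script"`. RStudio's Git pane offers the same actions through buttons. Sharing and off-site backup need one more step: pushing to a GitHub repository. That step requires a configured remote and credentials, so it is not shown here.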

There are many good introductory resources including:
- some useful introductory slides
- a brief introduction and tutorial on git and GitHub
- how to use GitHub from RStudio
- a GitHub Handbook
- a GitHub cheatsheet
For an extremely thorough book on Git and GitHub, see Chacon and Straub (2014).

References

Chacon, Scott, and Ben Straub. 2014. Pro Git. 2nd ed. https://git-scm.com/book/en/v2.