good practice
Apply best practices when coding!
- Write readable code that is easy to maintain (see below).
- Use version control with a single source of truth (SSOT).
- For reports and presentations, use Quarto instead of juggling separate files (scripts, Jupyter notebooks, images, code outputs, Word documents).
organisation / workflow
- Use RStudio projects (File -> New project). They set the working directory and manage settings and open scripts.
- Never use setwd(): others don't have that path, and neither do you after rearranging folders.
- Use relative path names, e.g. read.table("datafolder/file.txt") instead of "C:/Users/berry/Desktop/Project/datafolder/file.txt".
- Put your functions in a separate script and call source("functions.R") in your main (Quarto) script, or write your own package.
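As a rough self-check for the path rules above, here is a minimal sketch of a hypothetical helper, check_paths() (not part of any package), that returns the numbers of script lines calling setwd() or containing a quoted absolute path such as "C:/..." or "/Users/...". It is a crude regex heuristic and will also flag harmless strings that happen to start with "/".

```r
# Hypothetical helper (not from any package): flag script lines that call
# setwd() or contain a quoted absolute path
check_paths <- function(lines) {
  uses_setwd <- grepl("setwd\\(", lines)
  # a quote followed by a drive letter ("C:/" or "C:\"), "~/" or a leading "/"
  absolute <- grepl("[\"'](?:[A-Za-z]:[/\\\\]|~/|/)", lines, perl = TRUE)
  which(uses_setwd | absolute)
}

script <- c(
  'setwd("C:/Users/berry/Desktop/Project")',
  'dat <- read.table("datafolder/file.txt")',
  'dat <- read.table("C:/Users/berry/Desktop/Project/datafolder/file.txt")',
  'source("functions.R")'
)
check_paths(script)
# [1] 1 3

# On a real script (hypothetical file name): check_paths(readLines("main.R"))
```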
code format
- Follow a style guide consistently (example).
- Choose short but descriptive object names.
- Use expressive verbs for function names. Functions do something.
- Functions should call each other, instead of being one big multi-purpose monster.
- Use RStudio script sections such as # 1 clean data ---- to get an outline (CTRL+SHIFT+O).
- Use line breaks to avoid horizontal scrolling (margin settings).
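A small sketch of a script following these rules, with hypothetical function and column names: sections ending in ---- show up in the RStudio outline, function names are verbs, lines stay short, and one short function calls two others instead of doing everything itself.

```r
# 1 clean data ----
remove_missing <- function(df) {
  df[complete.cases(df), , drop = FALSE]
}

# 2 summarise ----
summarise_temp <- function(df) {
  c(mean = mean(df$temp),
    max  = max(df$temp))
}

# 3 workflow ----
# a short function that calls the two helpers above
analyse_station <- function(df) {
  summarise_temp(remove_missing(df))
}

station <- data.frame(temp = c(12.1, NA, 14.3, 13.0))
analyse_station(station)
```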
code quality
- Vectorize code whenever possible.
- If not, use lapply/sapply instead of for loops (lessons 4.3 and 8.3).
- DRY: don't repeat yourself.
- Write defensive code that checks inputs (lesson 8.1).
- Use arrays for all-numeric data (lesson 4.4).
- Do not load more than two packages with library(); instead, call functions as pack::fun.
- Install packages conditionally, i.e. only if they are missing.
- Do not create more objects than needed; clean up with rm().
- Make sure your code runs in a clean session: restart R with a clean workspace (CTRL+SHIFT+F10, see the .RData settings), then source() the entire script (CTRL+SHIFT+S).
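A sketch illustrating several of these bullets on made-up data: the same computation as a for loop, with sapply and fully vectorized; a function that checks its inputs before working (scale_to_range() is a hypothetical name); and a conditional install (install_if_missing(), also hypothetical) that only calls install.packages() for packages requireNamespace() cannot find.

```r
x <- c(3, -1, 4, -1, 5)

# for loop: index bookkeeping and a preallocated result
res_loop <- numeric(length(x))
for (i in seq_along(x)) res_loop[i] <- abs(x[i]) * 2

# sapply: no index, but still one function call per element
res_sapply <- sapply(x, function(v) abs(v) * 2)

# vectorized: abs() and * already work on whole vectors
res_vec <- abs(x) * 2
stopifnot(identical(res_loop, res_vec), identical(res_sapply, res_vec))

# defensive code: fail early with an informative message
scale_to_range <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric, not ", class(x)[1])
  if (anyNA(x)) stop("x must not contain NA values")
  if (length(unique(x)) < 2) stop("x needs at least two distinct values")
  rng <- range(x)
  (x - rng[1]) / (rng[2] - rng[1])
}
scale_to_range(c(2, 4, 6))

# install packages conditionally, then call functions as pack::fun
install_if_missing <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (any(!installed)) install.packages(pkgs[!installed])
  invisible(pkgs[!installed])
}
install_if_missing("stats")   # base package: nothing to install
stats::median(x)
```

The vectorized line is both the shortest and usually the fastest; sapply is the fallback when no vectorized function exists.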
To practice writing good R code, improve the examples in elegant code.
save / load
Store the results of long-running computations on disk.
The next time a script is run, they are loaded quickly.
if( file.exists("objects.Rdata") )
{
load("objects.Rdata") # load previously saved objects
} else
{
obj1 <- mean(rnorm(2e7)) # in the first run,
obj2 <- median(rnorm(2e7)) # compute the objects
save(obj1, obj2, file="objects.Rdata") # and write them to disk
}
If you need to rerun an analysis only when the last run is older than 6 hours, base the condition on the modification time of the saved file.
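One possible condition, as a minimal sketch: it assumes the time of the last run is the modification time of objects.Rdata (the file from the example above) and wraps the check in a hypothetical helper, cache_is_fresh(). The file lives in tempdir() only so the sketch runs anywhere; in a project you would use a relative path.

```r
cache_file <- file.path(tempdir(), "objects.Rdata")

# Hypothetical helper: TRUE if `file` exists and was written
# less than `max_age_hours` ago
cache_is_fresh <- function(file, max_age_hours = 6) {
  if (!file.exists(file)) return(FALSE)
  age <- difftime(Sys.time(), file.mtime(file), units = "hours")
  as.numeric(age) < max_age_hours
}

if (cache_is_fresh(cache_file)) {
  load(cache_file)                      # recent enough: reuse saved objects
} else {
  obj1 <- mean(rnorm(2e5))              # missing or older than 6 hours:
  obj2 <- median(rnorm(2e5))            # recompute (smaller n for speed)
  save(obj1, obj2, file = cache_file)
}
```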
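For a single object, saveRDS() and readRDS() are a good alternative to save() and load(): readRDS() returns the object, so you choose its name when reading it back.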
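A minimal sketch of the single-object pattern with saveRDS() and readRDS(), wrapped in a hypothetical helper, cache_rds(). Because R evaluates function arguments lazily, the expensive expression is only computed when no cached file exists yet.

```r
# Hypothetical helper: return the object stored in `file` if it exists,
# otherwise evaluate `expr`, store the result with saveRDS() and return it
cache_rds <- function(file, expr) {
  if (file.exists(file)) return(readRDS(file))  # expr is never evaluated
  value <- expr                                 # forces the lazy argument
  saveRDS(value, file)
  value
}

rds_file <- tempfile(fileext = ".rds")  # use a project-relative path in practice
obj1 <- cache_rds(rds_file, mean(rnorm(2e5)))  # first call: computes and saves
obj1_again <- cache_rds(rds_file, stop("not recomputed"))  # reads the file
identical(obj1, obj1_again)
# [1] TRUE
```

Unlike load(), which restores objects under the names they were saved with, readRDS() just returns the value, so the calling code decides what to call it.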
More on this topic from Rcrastinate