good practice

Apply best practices when coding!

  • Write readable code that is easy to maintain (see below).
  • Use version control with a single source of truth (SSOT).
  • For reports and presentations, use Quarto instead of separate files for scripts, Jupyter notebooks, images, code outputs and Word documents.

good code

  • clear
  • simple (one job per function)
  • documented
  • performant
  • bug-free
  • well tested

organisation / workflow

  • Use RStudio projects (File -> New project). They set the working directory and manage settings and opened scripts.
  • Never use setwd(): others don’t have that path and neither do you, after rearranging folders.
  • Use relative path names, e.g. read.table("datafolder/file.txt") instead of "C:/Users/berry/Desktop/Project/datafolder/file.txt"
  • Put source("functions.R") in your main (quarto) script, or write your own package.

code format

  • Follow a style guide consistently (example).
  • Choose short but descriptive object names.
  • Use expressive verbs for function names. Functions do something.
  • Functions should call each other, instead of being one big multi-purpose monster.
  • Use RStudio script sections # 1 clean data ---- for an outline (CTRL+SHIFT+O).
  • Use line breaks to avoid horizontal scrolling (margin settings).
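
As a minimal sketch of the naming and structure points above, a script could be organised into RStudio sections with small, verb-named functions that call each other. The function names and the sample data are made up for illustration.

```r
# 1 functions ----

# Remove missing values from a numeric vector (hypothetical helper)
remove_missing <- function(x) {
  x[!is.na(x)]
}

# Rescale a numeric vector to the range 0..1 (hypothetical helper)
rescale_unit <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Combine the two single-purpose helpers instead of writing one big function
clean_and_rescale <- function(x) {
  rescale_unit(remove_missing(x))
}

# 2 analysis ----

rainfall_mm <- c(12, NA, 30, 21)   # made-up example data
rainfall_scaled <- clean_and_rescale(rainfall_mm)
print(rainfall_scaled)
```

Each helper can be tested and reused on its own, and the section comments show up in the RStudio outline.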

code quality

  • Vectorize code whenever possible.
  • If not, use lapply/sapply instead of for loops (lessons 4.3 and 8.3).
  • DRY: don’t repeat yourself
  • Write defensive code that checks inputs (lesson 8.1).
  • Use arrays for all-numeric data (lesson 4.4).
  • Do not attach more than two packages with library(); instead, call functions as pack::fun.
  • Install packages conditionally.
  • Do not create more objects than needed, clean up with rm.
  • Make sure your code runs in a clean session:
    • CTRL+SHIFT+F10 to restart R with a clean workspace (Rdata settings)
    • source() the entire script with CTRL+SHIFT+S
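
Several of the points above can be sketched in a few lines. This is an illustration only: compute_cv and the sample values are hypothetical, and the tools package was chosen because it ships with R, so nothing gets installed when the example runs.

```r
# Vectorized: operate on the whole vector at once, no loop needed
values <- c(1, 4, 9, 16)
roots <- sqrt(values)

# Where a per-element function is needed, use sapply on the list
# instead of growing a result inside a for loop
samples <- list(a = 1:3, b = c(2, 8), c = 10)
sample_means <- sapply(samples, mean)

# Defensive code: check inputs early and fail with a clear message
# (compute_cv is a hypothetical example function)
compute_cv <- function(x) {
  if (!is.numeric(x)) stop("x must be numeric, not ", class(x)[1])
  if (length(x) < 2) stop("x must have at least 2 values, not ", length(x))
  sd(x) / mean(x)  # coefficient of variation
}

# Conditional installation: only install if the package is missing
if (!requireNamespace("tools", quietly = TRUE)) install.packages("tools")

# Call a function as pack::fun without attaching the package
extension <- tools::file_ext("datafolder/file.txt")

print(roots)
print(sample_means)
print(compute_cv(c(2, 4, 6)))
print(extension)
```

Using pack::fun also documents where each function comes from and avoids name clashes between attached packages.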

To practice writing good R code, improve the examples in elegant code.

saveload

Store the results of long-running computations on disc.
The next time a script is run, they are loaded quickly.

if (file.exists("objects.Rdata")) {
  load("objects.Rdata")                     # load previously saved objects
} else {
  obj1 <- mean(rnorm(2e7))                  # in the first run,
  obj2 <- median(rnorm(2e7))                # compute the objects
  save(obj1, obj2, file = "objects.Rdata")  # and write them to disc
}

If you need to rerun an analysis whenever the last run is older than 6 hours, this could be the condition:

difftime(Sys.time(), file.mtime("objects.Rdata"), units = "hours") > 6
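
The existence check and the age check can be combined into one condition. The following is a sketch under assumptions: cache_file, max_age_hours and cache_is_fresh are names introduced here, and the cache is written to a temporary folder so that the example does not touch the project folder.

```r
# Hypothetical cache location; in a project, a relative path like
# "objects.Rdata" would be used instead of a temporary folder
cache_file <- file.path(tempdir(), "objects.Rdata")
max_age_hours <- 6

# TRUE if the cache exists and was written less than max_age_hours ago.
# && only evaluates the age check if the file exists.
cache_is_fresh <- file.exists(cache_file) &&
  difftime(Sys.time(), file.mtime(cache_file), units = "hours") < max_age_hours

if (cache_is_fresh) {
  load(cache_file)                      # reuse the stored objects
} else {
  obj1 <- mean(rnorm(1e5))              # (re)compute; smaller n than above
  obj2 <- median(rnorm(1e5))            # to keep the example quick
  save(obj1, obj2, file = cache_file)   # refresh the cache
}
```

Note that the comparison is inverted relative to the rerun condition: the cache is used while it is younger than the limit, and recomputed otherwise.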

For a single object, a good alternative to save and load is:

saveRDS(one_single_object, "object.rds")
explicit_object_name <- readRDS("object.rds")
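
A small round trip shows the idea. The object and the file name are made up for illustration, and the file is written to a temporary folder.

```r
# Hypothetical result object and file name, for illustration only
model_summary <- list(n = 100, mean = 4.2, label = "run A")
rds_file <- file.path(tempdir(), "model_summary.rds")

saveRDS(model_summary, rds_file)   # stores the object without its name
restored <- readRDS(rds_file)      # the name is chosen when reading

# Check that the restored object matches the original
print(identical(model_summary, restored))
```

Because readRDS returns the object instead of placing it in the workspace under its old name, it cannot silently overwrite an existing object the way load can.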

More on this topic from Rcrastinate