good practice
I review a lot of code from students, clients or colleagues and keep giving the same basic recommendations over and over again.
Now I collect important things here so I can easily point to them. Expect this to be expanded :).
source scripts
Before you send someone any code whatsoever, do the following:
- CTRL+SHIFT+F10 to restart R with a clean workspace. Remember to have the Rstudio settings at Tools - Global Options - General set to: Restore .RData into workspace at startup OFF and Save workspace to .RData on exit NEVER
- copy the script to some other folder than your working place
- source() the entire script with CTRL+SHIFT+S
- If something fails, fix it! E.g. stop messing with setwd (see below), copy data files to be read, rename all instances of a renamed object, …
- Only now send the file (or folder with scripts, data files, etc. as zip file)
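The manual check above can also be approximated from the R console. This is only a rough sketch: `check_script()` is a hypothetical helper (not part of base R or Rstudio) that copies a script and any data files into a fresh temporary folder and sources it in a new, clean R process. It assumes all files can sit flat in one folder; scripts that read from subfolders would need the folder structure copied as well.

```r
# Hypothetical helper: run a script in a clean R process from a temporary folder.
# Returns TRUE if the script ran without error, FALSE otherwise.
check_script <- function(script, data_files = character()) {
  tmp <- tempfile("clean_run_")
  dir.create(tmp)
  # copy the script and its data files (flat, no subfolders) to the new folder
  stopifnot(all(file.copy(c(script, data_files), tmp)))
  copied <- normalizePath(file.path(tmp, basename(script)), winslash = "/")
  # chdir = TRUE makes source() use the script's folder as working directory
  expr <- sprintf("source('%s', chdir = TRUE)", copied)
  rscript <- file.path(R.home("bin"), "Rscript")
  # --vanilla: no saved workspace, no user profile, nothing saved on exit
  status <- system2(rscript, c("--vanilla", "-e", shQuote(expr)))
  status == 0
}

# Usage (file names are placeholders):
# check_script("analysis.R", data_files = "file.txt")
```

If this returns FALSE, the error message printed by the child process usually points to the missing file or object.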
Rproj
- Use Rstudio projects. They set the wd upon opening and keep projects separate.
- Never use setwd() because others won’t have that exact path. Even you yourself might not, after rearranging folders.
- Use relative path names, e.g. read.table("datafolder/file.txt") instead of "C:/Users/berry/Desktop/Project/datafolder/file.txt"
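To illustrate the relative-path advice, here is a minimal sketch. `read_project_table()` is a hypothetical wrapper (not an existing function) that builds the path with `file.path()` and, if the file is missing, reports the current working directory so the reader can see where R was looking. The `root` argument is an assumption added for illustration; in an Rstudio project the default `"."` is the project folder.

```r
# Hypothetical helper: read a table given by a path relative to the project folder
read_project_table <- function(..., root = ".", header = TRUE) {
  path <- file.path(root, ...)          # e.g. "./datafolder/file.txt"
  if (!file.exists(path))
    stop("File not found: ", path, "\nWorking directory is: ", getwd())
  read.table(path, header = header)
}

# Usage inside a project (assumes datafolder/file.txt exists and has a header row):
# dat <- read_project_table("datafolder", "file.txt")
```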
packages
Follow the package recommendations in the packages section.
saveload
Store the results of long-running computations on disc.
The next time a script is run, they are loaded quickly.
if( file.exists("objects.Rdata") )
{
  load("objects.Rdata")                   # load previously saved objects
} else
{
  obj1 <- mean(rnorm(2e7))                # in the first run,
  obj2 <- median(rnorm(2e7))              # compute the objects
  save(obj1, obj2, file="objects.Rdata")  # and write them to disc
}
If you need to rerun an analysis when the last run is older than 6 hours, the condition can additionally check the modification time of the saved file.
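A minimal sketch of such a condition, assuming the age of the last run can be read from the modification time of the saved file. `cache_is_fresh()` is a hypothetical helper name; `file.mtime()`, `difftime()` and `Sys.time()` are base R.

```r
# Hypothetical helper: TRUE if 'file' exists and was written less than
# 'max_age_hours' ago, so the saved objects can be loaded instead of recomputed
cache_is_fresh <- function(file, max_age_hours = 6) {
  if (!file.exists(file)) return(FALSE)
  age_hours <- as.numeric(difftime(Sys.time(), file.mtime(file), units = "hours"))
  age_hours < max_age_hours
}

# Usage, replacing the file.exists() condition from the example above:
# if( cache_is_fresh("objects.Rdata") )
# {
#   load("objects.Rdata")
# } else
# {
#   ...recompute and save() as before...
# }
```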
For a single object, a good alternative to save and load is saveRDS and readRDS.
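A minimal sketch of that pattern with `saveRDS()` and `readRDS()` from base R. The file name `obj1.rds` is an assumption, and the example writes to `tempdir()` only to avoid cluttering a real folder; in a project you would use a relative path instead. The computation is deliberately small.

```r
rds_file <- file.path(tempdir(), "obj1.rds")   # assumed file name

if( file.exists(rds_file) )
{
  obj1 <- readRDS(rds_file)      # returns the object, so any name can be assigned
} else
{
  obj1 <- mean(rnorm(1e5))       # compute once
  saveRDS(obj1, rds_file)        # and store this single object on disc
}
```

Unlike `load()`, which restores objects under the names they were saved with, `readRDS()` returns the object, so it cannot silently overwrite something already in the workspace.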
More on this topic from Rcrastinate