Session 11 Programming with style, and further topics.
Here are some examples of code that, arguably, is not well written. Can you say what’s wrong, and improve it?
11.1 Indentation and spacing
Let your code breathe.
Leave space around operators.
# BAD:
x<-2*3
# GOOD:
x <- 2 * 3
- Indent loops and if statements.
# opening brace on the same line as the for
# closing brace on its own line
# indent two spaces between
n <- 10
for(i in 1:n){
  print(i)
}
print("Done!")
# if / else like this:
if(x == 5){
  print("x is 5")
} else {
  print("x is not 5")
}
# this makes it easy to chain if/else statements
# but don't make the chains too long!
if(x == 5){
  print("x is 5")
} else if(x == 6){
  print("x is 6")
} else {
  print("x is not 5 or 6")
}
11.2 Naming functions and variables
- Use a name that concisely describes the function or variable
# BAD:
function1 <- function(a){
  a^2
}
# GOOD:
square <- function(a){
  a^2
}
- Naming convention: choose one, and avoid mixing: snake_case or camelCase, but avoid separating.with.dots.
11.3 Don’t repeat yourself
- If you copy/paste some code and ‘tweak’ it, that’s a sign you need to rewrite your code, e.g. using a loop or a function.
# BAD:
first <- df$result[1]^2
second <- df$result[2]^2
third <- df$result[3]^2
# GOOD:
results <- df$result[1:3]^2
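When the repeated step is more involved than a single vectorised operation, the same idea can be sketched with a small function. This is an illustrative sketch only: the data frame `df` below and the helper `standardise` are hypothetical, not part of the course material.

```r
# Hypothetical data frame, invented for illustration
df <- data.frame(result = c(2, 4, 6), score = c(10, 20, 30))

# BAD: the same calculation copied and tweaked for each column
result_std <- (df$result - mean(df$result)) / sd(df$result)
score_std <- (df$score - mean(df$score)) / sd(df$score)

# GOOD: write the calculation once, then apply it to every column
standardise <- function(x){
  (x - mean(x)) / sd(x)
}
standardised <- lapply(df, standardise)
```

If the calculation ever changes, only `standardise` needs editing, rather than every pasted copy.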
11.4 Avoid writing cryptic and unclear code
Prefer to use piping to give a step-by-step explanation of what code is doing.
Code will be read many more times than it is written.
# BAD:
TRUE * 1
# GOOD:
as.numeric(TRUE)
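As an illustration of the piping advice, a nested call can be rewritten so that it reads in the order the steps happen. This sketch assumes R 4.1 or later, which provides the native pipe `|>`; the vector `heights` is made up for the example.

```r
# Made-up data for illustration
heights <- c(1.62, 1.75, NA, 1.80)

# BAD: nested calls must be read from the inside out
nested <- round(mean(na.omit(heights)), 2)

# GOOD: each step is read in the order it happens
piped <- heights |>
  na.omit() |>
  mean() |>
  round(2)
```

Both versions compute the same value; the piped version simply makes the sequence of steps visible.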
11.6 Avoid hard coding
h <- c(1, 2, 3)
# BAD:
# Hard-coded values make code hard to reuse: 3 is only correct while h has three elements
h_mean <- sum(h) / 3
# GOOD:
h_mean <- sum(h) / length(h)
11.7 Other assorted advice, and further reading.
Functions should have one job. If a function does many jobs, it should be split into many functions.
Functions should be short and clear. Rule of thumb: if you can’t see a whole function on the screen, split into smaller functions.
Avoid ‘spaghetti code’. Keep functions separate from analysis code.
The Google style guide is a very good one: https://google.github.io/styleguide/Rguide.xml
There is also a free eBook for the tidyverse style: https://style.tidyverse.org/
Always remember: someone else is likely to need to read your code in the future. The most likely person is you in six months’ time. Six-months-ago-you doesn’t reply to emails.
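The ‘one job per function’ advice above can be sketched as follows. The function names and the data are hypothetical, chosen only to show the split.

```r
# BAD: one function cleans the data, summarises it and prints a report
report_scores <- function(scores){
  scores <- scores[!is.na(scores)]
  avg <- mean(scores)
  cat("Mean score:", avg, "\n")
}

# GOOD: each function has one job and can be reused or tested alone
remove_missing <- function(scores){
  scores[!is.na(scores)]
}

describe_mean <- function(avg){
  paste("Mean score:", avg)
}

# Analysis code is kept separate from the function definitions
scores <- c(4, NA, 8)
clean_scores <- remove_missing(scores)
summary_line <- describe_mean(mean(clean_scores))
print(summary_line)
```

Because `remove_missing` no longer prints anything, it can be reused wherever missing values need dropping, not only in this report.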
11.8 Further Topics
To keep this course brief, there are a number of topics that we omitted from the course. Some examples of these are below, just to give a flavour of what is possible. Further concepts in R, specific to statistics and machine learning, will be introduced on other modules.
Joining together tables with dplyr. It is often necessary to combine datasets from multiple sources. There may be information about the same person spread across multiple tables or sources, or information on different patients. Here you will explore the various joins available using dplyr (or any other way to merge datasets together). If you have past experience with SQL this will help for this project. Example Resource: https://stat545.com/bit001_dplyr-cheatsheet.html
Tidying data with tidyr. We often receive data in a format that is not useful for analysis, and we need to reshape our data frame or tibble. Here you will explore functions such as pivot_longer and pivot_wider to change data between wide and long formats. Example resource: https://r4ds.hadley.nz/data-tidy.html
Handling dates and times in R. It can be challenging to get R to recognise dates and times. There are various packages to help with this, such as the lubridate package. Here you will explore how you can help R to recognise dates and times, and why this is useful. Example resource: https://r4ds.hadley.nz/datetimes.html
Building interactive / web dashboards. When presenting analysis to an end user we can make nice, interactive visualisations that are presented as dashboards. The R package shiny is one tool that can do this. Example resource: https://shiny.rstudio.com/tutorial/
Obtaining data from websites. With R we can ‘scrape’ data from websites to use in our analyses. Example resource: https://www.analyticsvidhya.com/blog/2017/03/beginners-guide-on-web-scraping-in-r-using-rvest-with-hands-on-knowledge/
Building and presenting maps. R can visualise maps, and these can even be interactive. Example resource: https://rstudio.github.io/leaflet/
Efficient data manipulation with data.table. data.table is a similar package to dplyr, but is typically faster and more memory-efficient when dealing with larger datasets. Example resource: https://cloud.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
Writing more efficient R code. Writing code that runs quickly can be a challenge, but we can learn how to evaluate our code for speed (benchmarking) and pick up general tips and tricks for running code quickly. Example resource: https://csgillespie.github.io/efficientR/
Tools for text mining. Example resource: https://www.tidytextmining.com/
Network graphs, e.g. displaying a social network. Example resource: https://kateto.net/network-visualization
Embedding C++ code in R using Rcpp. Recommended only for those who already code in C++. Example resource: http://adv-r.had.co.nz/Rcpp.html
Making your own R package. An R package is a great way for you to collect, share and reuse your own R functions. Example resource: http://r-pkgs.had.co.nz/
Iteration using the purrr package. Example resource: https://jennybc.github.io/purrr-tutorial/
More on reproducible environments. See the Binder tutorial by Andrew Stewart.
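To give a flavour of the first topic in the list above, joining tables, here is a minimal sketch. It uses base R's `merge` rather than dplyr so that it runs without extra packages; the two data frames and their column names are invented for illustration.

```r
# Invented example tables sharing an "id" column
patients <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cat"))
visits <- data.frame(id = c(1, 1, 3), ward = c("A", "B", "A"))

# Inner join: keep only ids that appear in both tables
inner <- merge(patients, visits, by = "id")

# Left join: keep every patient, filling missing visit details with NA
left <- merge(patients, visits, by = "id", all.x = TRUE)

print(inner)
print(left)
```

In dplyr, the corresponding functions are `inner_join` and `left_join`.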
11.5 Comments should say why, not what
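A minimal sketch of this advice follows. The vector `temps` and the reason given in the comment are invented for illustration.

```r
# Invented readings; -999 is assumed to be the sensor's missing-value code
temps <- c(18.5, -999, 21.0)

# BAD: the comment repeats what the code already says
# replace -999 with NA
temps_bad <- replace(temps, temps == -999, NA)

# GOOD: the comment explains why the step is needed
# The sensor records -999 when it fails, which would distort the mean
temps_good <- replace(temps, temps == -999, NA)

mean_temp <- mean(temps_good, na.rm = TRUE)
```

The two lines of code are identical; only the second comment tells a future reader something the code does not.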