3.3 Literate programming and RMarkdown
The term “literate programming” was coined by Donald Knuth (Knuth 1984), based on the idea that a computer program should be documented in such a way that humans can read it. The idea has since gained considerable traction, not least because it is both powerful and deceptively simple.
Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want the computer to do.
– (Knuth 1984)
His ideas are encapsulated in three principles (although of course there is a lot more detail in his paper):
- move away from writing programs to ‘please’ the computer
- instead, focus on communication and understanding
- create a single document to integrate data analysis (executable code) with textual documentation, linking data, code, and explanation
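As a rough illustration of the third principle, the sketch below uses R to write out a minimal R Markdown file in which prose and an executable code chunk sit side by side. The title, chunk label, and choice of the built-in cars data set are hypothetical, picked only for illustration. Rendering the file requires the rmarkdown package, so that step is left commented out.

```r
# Build the lines of a minimal R Markdown document.
# The title and chunk label are hypothetical, chosen for illustration.
fence <- strrep("`", 3)  # three backticks open and close a code chunk

rmdLines <- c(
  "---",
  "title: \"Cars analysis\"",
  "output: html_document",
  "---",
  "",
  "We summarise stopping distances from the built-in cars data set.",
  "",
  paste0(fence, "{r distance-summary}"),
  "summary(cars$dist)",
  fence
)

# Write the document to a temporary file so nothing is left behind
rmdFile <- tempfile(fileext = ".Rmd")
writeLines(rmdLines, rmdFile)

# Rendering needs the rmarkdown package, so it is left commented out:
# rmarkdown::render(rmdFile)
```

When such a file is rendered, the explanation, the code, and the code's output all appear in one document, which is the link between data, code, and explanation described above.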
See Wickham (2012) for arguments as to why you should care about writing readable R code.
3.3.1 Structuring your code
Tips for writing readable code include (Wickham 2012):
- names
- comments
- layout, e.g., indentation and spacing
- prettyprinting
- user-defined functions
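To see how several of these tips work together, the sketch below writes the same calculation twice. The function names and the body-mass-index calculation are hypothetical, chosen only for illustration; the names follow the camelCase convention used in this chapter.

```r
# Harder to read: cryptic names, no spacing, no comments
f<-function(x,y){z<-x/(y/100)^2;z}

# Easier to read: descriptive names, consistent spacing and indentation,
# and a comment that explains the intent
bodyMassIndex <- function(weightKg, heightCm) {
  # Convert height from centimetres to metres before squaring
  heightM <- heightCm / 100
  weightKg / heightM^2
}

bodyMassIndex(80, 200)  # 20
```

Both versions compute the same value, but only the second tells a reader what the arguments mean and why the division by 100 is there.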
3.3.2 User-defined functions
One of the best ways to improve your reach as a data scientist is to write functions. Functions allow you to automate common tasks in a more powerful and general way than copy-and-pasting.
— (Grolemund and Wickham 2018)
User-defined functions offer many advantages, including:
- abstraction, i.e., the ability to hide details behind a descriptive name to aid understanding, e.g., checkInput()
- placing related code in one place
- reuse, which avoids a lot of copy-and-paste coding
- maintainability, i.e., a change needs to be made in only one place when an error is fixed or the required functionality is updated
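A minimal sketch of these advantages follows. The body of checkInput() is an assumption, since the text gives only its name, and columnMean() and columnRange() are hypothetical helpers introduced for illustration.

```r
# Hypothetical helper: the validation details live in one place (abstraction)
checkInput <- function(x) {
  if (!is.numeric(x) || anyNA(x)) {
    stop("input must be numeric with no missing values")
  }
  invisible(TRUE)
}

# Two functions reuse the same check instead of copying it (reuse)
columnMean <- function(x) {
  checkInput(x)
  sum(x) / length(x)
}

columnRange <- function(x) {
  checkInput(x)
  max(x) - min(x)
}

columnMean(c(2, 4, 6))   # 4
columnRange(c(2, 4, 6))  # 4
```

If the validation rule ever changes, only checkInput() needs editing, and both callers pick up the change.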
# An example of a user-defined function named myFunction
myFunction <- function(num) {
  # Multiply the argument by 3; the result is the function's return value
  num * 3
}
In R, the return value of a function is, by default, the value of the last expression that is evaluated. Here the last expression evaluated is the one that multiplies num by 3, so its result becomes the return value of the function.
Note that there is also a return() function that can be used to return a value from a function explicitly. If in doubt, we recommend using return(), since it makes the intentions of the function developer absolutely clear.
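The two styles can be compared in a short sketch. All three function names below are hypothetical, and safeDivide() is included only to show that return() also ends a function early.

```r
# Implicit return: the value of the last evaluated expression is returned
tripleImplicit <- function(num) {
  num * 3
}

# Explicit return: return() states the result directly
tripleExplicit <- function(num) {
  result <- num * 3
  return(result)
}

# return() also exits the function early, skipping the remaining code
safeDivide <- function(numerator, denominator) {
  if (denominator == 0) {
    return(NA_real_)
  }
  return(numerator / denominator)
}

tripleImplicit(4)  # 12
tripleExplicit(4)  # 12
safeDivide(6, 3)   # 2
safeDivide(1, 0)   # NA
```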
Finally, in calling or using the above function, the user must specify the value of the argument num. If it is not specified by the user, R will throw an error.
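The sketch below shows this behaviour, and one common way around it. The functions tripleValue() and tripleWithDefault() are hypothetical stand-ins for the example above, and the default of 1 is an arbitrary choice made for illustration.

```r
# No default value: num must be supplied by the caller
tripleValue <- function(num) {
  num * 3
}

# Calling without the argument raises an error;
# tryCatch() captures the error message instead of stopping the script
errorMessage <- tryCatch(
  tripleValue(),
  error = function(e) conditionMessage(e)
)
print(errorMessage)

# Giving the argument a default value makes it optional
tripleWithDefault <- function(num = 1) {
  num * 3
}

tripleWithDefault()   # 3
tripleWithDefault(5)  # 15
```

Whether a default is appropriate depends on the function: it is convenient when a sensible value exists, but an error is often the safer outcome when forgetting the argument would silently give a misleading result.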
References
Grolemund, Garrett, and Hadley Wickham. 2018. “R for Data Science.”
Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.
Wickham, Hadley. 2012. “Style Guide.” http://adv-r.had.co.nz/Style.html.