To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call.
We first encountered these slogans when illustrating the difference between data and functions in R. In this context, we emphasized that functions can be thought of as verbs that take data objects as their inputs and return other data objects as their outputs. Whereas other objects merely exist, it is functions that make things happen.
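Both slogans can be checked directly at the R console. The following is a minimal sketch; the object names x, sum_infix, and sum_prefix are made up for this illustration:

```r
# Everything that exists is an object: data and functions alike.
x <- c(2, 4, 6)
class(x)     # "numeric"
class(mean)  # "function"

# Everything that happens is a function call: even the infix operator +
# is a function that can be called by its name in prefix form.
sum_infix  <- 1 + 2
sum_prefix <- `+`(1, 2)
identical(sum_infix, sum_prefix)  # TRUE
```

Writing the addition in prefix form is rarely useful in practice, but it shows that an ordinary expression is a function call in disguise.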
Since then, we have been using functions in all chapters of this book. Many of these functions, like mean(), are included in R (so-called base R functions), and the others, like ggplot(), were defined in additional packages (e.g., functions for transforming data of dplyr and tidyr, functions for manipulating text and time data in stringr and lubridate, and the visualization functions of ggplot2).
The fact that R packages mostly provide additional functions shows that functions are a pretty big deal in R: they essentially are tools that allow us to solve computational tasks. Just like the tools from a hardware store (e.g., a hammer or pliers), new functions extend the scope of tasks that can be solved. To use a tool, we typically do not need to understand how it was designed and built, but rather for which task it is suited. It is no accident that asking the question “Which problem can be solved with this tool?” is sometimes described as a “functional” approach.
While we do not need to know how functions work in order to use them, we need to understand more about their internal structure if we want to create new functions for tackling new tasks. Thus, for creating new tools in R — and for making new things happen — we need to know how to write functions.
After working through this chapter, you should be able to
- explain what functions are and why they are useful,
- use base R to define new functions,
- describe and check functions,
- control the flow of information by using conditional statements.
11.1.2 The function of functions
What is a function? Mathematically, a function is a mapping between sets, or from a set of elements \(X\) to those of a set \(Y\). In the context of computer programming, a function maps inputs to outputs. To use a function, we only need to know its name and purpose, which inputs it takes, and what outputs it returns.[54] Unless we want to understand exactly how a function works or modify its behavior, we can learn new functions without ever seeing how they are defined. Thus, as long as we successfully use functions, we can think of them as black boxes that are defined by their purpose and input-output relations. Only when functions do not work as we wish (e.g., by not accepting certain arguments, getting sluggish, or yielding unexpected or erroneous results) are we prompted to look into their definitions and consider possible improvements.[55]
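The black-box view can be tried out with two base R functions. This is only a sketch; the vector v and the result m are hypothetical names used for illustration:

```r
# Using a function as a black box: we only need its name, inputs, and output.
v <- c(1, 3, 5, 7)
m <- mean(v)  # input: a numeric vector; output: its arithmetic mean
m             # 4

# Checking which inputs a function accepts, without reading its body:
args(sd)

# Opening the box: typing a function's name without parentheses
# prints its definition.
sd
```

For everyday use, the first two steps (plus the documentation available via ?mean or ?sd) are usually sufficient; printing the definition becomes relevant when a function behaves unexpectedly.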
When transitioning from using functions to writing them, we discover that functions are a great way to automate repeated tasks. As each function handles one task and can be understood solely by its mapping from inputs to outputs, functions are powerful tools of abstraction and encapsulation. In a language that supports a functional programming style (like R), functions are the basic problem-solving units: Each function deals with a task (i.e., performs an action, which is why we can think of functions as verbs). To use a new set of functions (e.g., a new R package), we primarily need to know its overall goal (i.e., which challenge or problem does it address?) and its key functions (i.e., which main functions does it provide, and what does each of them do?).
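To illustrate how a function automates a repeated task, consider rescaling numeric vectors to the range from 0 to 1. The following is a minimal sketch under stated assumptions: the function name rescale_01 and the example vectors a and b are hypothetical and were chosen only for this illustration.

```r
# Repeated task: rescaling a numeric vector to the range [0, 1].
# Without a function, the same expression is copied for every vector:
a <- c(10, 20, 30)
b <- c(5, 6, 9)
a_scaled <- (a - min(a)) / (max(a) - min(a))
b_scaled <- (b - min(b)) / (max(b) - min(b))

# With a function, the task is defined once and can be reused.
# (rescale_01 is a hypothetical name used for illustration.)
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)  # smallest and largest value of x
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale_01(a)  # 0.0 0.5 1.0
rescale_01(b)  # 0.00 0.25 1.00
```

The function encapsulates the computation: a user of rescale_01() only needs to know that it takes a numeric vector and returns a rescaled vector of the same length. Note that this simple version does not handle vectors in which all values are identical, as the denominator would be zero.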
11.1.3 Data used
As this chapter teaches how to write new functions, we do not need any particular dataset.
11.1.4 Getting ready
This chapter formerly assumed that you have read and worked through Chapter 19: Functions of the r4ds book (Wickham & Grolemund, 2017). It can now be read on its own, but reading Chapter 19 of r4ds is still recommended.
Please do the following to get started:
- Structure your document by inserting headings and empty lines between different parts. Here’s an example of how your initial file could look:
- Create an initial code chunk below the header of your .Rmd file that loads the R packages of the tidyverse (and see Section F.3.3 if you want to get rid of the messages and warnings of this chunk in your HTML output).
- Save your file (e.g., as 11_functions.Rmd in the R folder of your current project) and remember to save and knit it regularly as you keep adding content to it.
To start writing our first functions, we need to familiarize ourselves with the
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. Retrieved from http://r4ds.had.co.nz
[54] When some process is explained by its purpose without considering its content or mechanism, we call this a functional explanation.
[55] We can print the definition of any function by typing it at the prompt of the console (without parentheses). However, the fact that we have been using functions like gather() many times without ever looking up their definitions indicates that we typically rely on examples and documentation for understanding functions.