11.2 Essentials of functions and conditionals

11.2.1 Defining new functions

The basic structure for writing a new function looks as follows:
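As a minimal sketch of this structure, a function definition assigns a function with some <args> and a <body> to a name. The function add_one() and its argument x are illustrative assumptions, not taken from the text:

```r
# General pattern:  name <- function(<args>) { <body> }
# Hypothetical example: add_one() only serves as an illustration.
add_one <- function(x) {
  x + 1  # body: uses the input x; the value of the last expression is returned
}

add_one(2)
```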

Thus, writing a function initially requires 3 considerations:

  1. What would be a good name for the function? Function names should clearly convey their purpose (“Which task does this function solve?”), resemble verbs (ideally say what the function does, e.g., plot, or state its output, e.g., mean), and should be short (as they will be typed many times).

  2. Which inputs does the function take? Any input that is necessary for the function to perform its task must be supplied as a list of arguments <args>.

    • Arguments typically include data (e.g., scalars, vectors, tables, or anything else that the function uses to perform its task) and can contain additional details or parameters (e.g., instructions on how something is to be done: Which type of calculation is desired? How should NA values be treated? Which color should be used? etc.). Data arguments are typically listed first.

    • Any argument can be optional (by including a default value that is used when no argument is provided) or mandatory (by not including a default value). It is common to make data arguments mandatory, and use optional arguments for details.

    • If a function calls other functions and we want to allow users to pass arguments to these latter functions, we can add ... (dot-dot-dot) as a special argument. However, this requires that the functions used within the function can deal with the specific arguments that are provided later, and can cause problems and unexpected results.

  3. The <body> of a function uses the inputs provided by <args> to perform the task for which the function is created.

    • Although some functions are only used for their side-effects (e.g., load or save an object, create a graph, etc.), most functions are created to (also) return some sort of output (e.g., some computed result). This can be done by calling the special function return in the function <body> (e.g., return(result)), typically as its last statement.37 When the function does not contain an explicit return statement, it typically returns the result of the last expression evaluated in the function.

    • The output of a function can assume many different data types (scalar, vector, list, table, etc.). For instance, the result of 1:3 + 3:1 is a vector 4, 4, 4.

Thus, a more detailed template of a typical function is:
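One way to sketch such a template in runnable form is the following. The name my_fun() and the task it solves (summing its input) are assumptions chosen for illustration; only the argument names arg_1 and arg_2, the default TRUE, and the final return(result) follow the text:

```r
# Sketch of a function template; my_fun() and its task are hypothetical.
my_fun <- function(arg_1, arg_2 = TRUE) {

  # Check inputs (type and values):
  if (!is.numeric(arg_1)) {
    stop("arg_1 must be numeric")
  }
  if (!is.logical(arg_2) || length(arg_2) != 1 || is.na(arg_2)) {
    stop("arg_2 must be either TRUE or FALSE")
  }

  # Solve the task (here: sum arg_1, optionally ignoring NA values):
  result <- sum(arg_1, na.rm = arg_2)

  # Return the result:
  return(result)
}
```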

In this template, arg_1 is a mandatory argument (as no default is provided) and arg_2 is an optional argument (with a default value of TRUE). The arguments are checked (e.g., whether they have appropriate type and values) and then used to solve the task. Its result is then assigned to result and returned by return(result).

Typical uses of this function could look as follows:
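Calls of a function that follows this template might look like the sketch below. The function my_fun() is a hypothetical, compact instance of the template (without input checks) that is defined here only so that the calls can be run on their own:

```r
# Compact hypothetical instance of the template:
my_fun <- function(arg_1, arg_2 = TRUE) {
  result <- sum(arg_1, na.rm = arg_2)
  return(result)
}

my_fun(arg_1 = c(1, 2, NA))                 # uses the default arg_2 = TRUE
my_fun(arg_1 = c(1, 2, NA), arg_2 = FALSE)  # overrides the default
my_fun(c(1, 2, 3), FALSE)                   # arguments matched by position
```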

To render the interplay of functions and their arguments more concrete, let’s consider some examples of simple functions.

A power function

Consider the following definition of a power() function:
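A definition that is consistent with the explanation that follows can be sketched as:

```r
power <- function(x, exp = 1) {
  x^exp
}
```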

We can explicate the definition of this function as follows:

  • The function power() has 2 arguments: A mandatory argument x and an optional argument exp with a default value of 1.

  • The function computes x^exp and returns the result (as this is the final statement in the function’s body).

  • Thus, the task addressed by power is to raise x to the exp-th power.

  • Although the function does not verify anything, x and exp are assumed to be numeric.

  • Given what we know about R, the function probably also works for vector inputs of x. However, it is harder to guess what happens if exp is a vector, or if both arguments are vectors.

Checking a function

It is very important to run a range of checks after writing a new function. When doing these checks, examples that use unusual inputs (like NA, NULL, or arguments of different types) are particularly informative.

Here are some possible ways of checking how power works:
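The following calls are a sketch of such checks. The particular inputs are assumptions, and power() is restated (as described above) so that the calls can be run on their own:

```r
power <- function(x, exp = 1) { x^exp }  # restated from the description above

power(x = 3)               # 3 (default exp = 1)
power(x = 3, exp = 2)      # 9
power(x = 1:5, exp = 2)    # 1 4 9 16 25 (vector x)
power(x = 2, exp = 1:3)    # 2 4 8 (vector exp)
power(x = 1:3, exp = 1:3)  # 1 4 27 (both vectors, element by element)
power(x = NA)              # NA
power(x = NULL)            # numeric(0)
```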

When writing a new function, it is always a good idea to test its limits. Here are some calls that would result in warning or error messages:
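A sketch of such calls is shown here. The specific inputs are assumptions; try() and tryCatch() are only used so that the script continues after an error or warning:

```r
power <- function(x, exp = 1) { x^exp }  # restated from the description above

# Errors (wrapped in try() so that later lines are still evaluated):
try(power())          # mandatory argument x is missing
try(power(x = "A"))   # non-numeric argument

# Warning: the lengths of x and exp do not match (3 vs. 2 elements).
# tryCatch() returns the text of the warning instead of the result:
tryCatch(power(x = 1:3, exp = 1:2),
         warning = function(w) conditionMessage(w))
```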

It is not necessarily problematic when functions return warnings or errors — in fact, they can be very informative for understanding functions. As a function’s author or designer (i.e., programmer), you primarily need to decide whether returning an error is justified, given the intended use of the function. If a user enters no arguments or the wrong type of argument to a function, yielding an error can be the appropriate way of saying: This doesn’t work. But good programmers also try to view their functions from the perspective of their future users. Do the names of the function and its arguments clearly signal their purposes? Will users know which types and shapes of data are to be provided as inputs? A lot of this can be handled by choosing transparent names and providing good documentation to a function. And if you anticipate many misunderstandings for a function or its arguments, it may be polite to check user inputs for their type and issue a message to the user if something was missing or wrong.

Omitting argument names

In R, it is possible to omit the argument names of functions. If this is done, the values provided are assigned to the arguments in the order in which they are provided:
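For instance, using the power() function described above (restated here so that the calls can be run on their own), positional matching can be sketched as:

```r
power <- function(x, exp = 1) { x^exp }

power(3, 2)  # 9: x = 3, exp = 2
power(2, 3)  # 8: x = 2, exp = 3
```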

Although this saves typing, it is typically more informative to explicitly state the arguments. This makes it more transparent to future readers of your code (including yourself) which value is assigned to which argument and has the advantage that you can enter arguments in any order:
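A short sketch with the power() function described above (restated here so that the calls can be run on their own):

```r
power <- function(x, exp = 1) { x^exp }

power(x = 3, exp = 2)  # 9
power(exp = 2, x = 3)  # 9: the same result, despite the different order
```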

Thus, it is good practice to always provide argument names (as long as you want others to understand your code).


  1. Write a new function that computes the n-th root of a number (or vector of numbers) x, then check it and explore its limits.

  2. The first function we encountered in this book (in Chapter 1.2.2) was sum(). Incidentally, sum() is more complex than it first seemed, as its arguments can be a combination of vectors and scalars:
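The following calls are an assumed illustration of this point, as each of them adds up the same four numbers:

```r
sum(1, 2, 3, 4)        # 10: four scalars
sum(c(1, 2), c(3, 4))  # 10: two vectors
sum(c(1, 2), 3, 4)     # 10: a vector and two scalars
```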

We now have learned that values are assigned by their position to function arguments when we omit argument names. Explain why the following yield different results:
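One pair of calls that fits this exercise is sketched here. It is an assumed example, and the results are deliberately not shown:

```r
sum(1, 2, 3, NA, na.rm = TRUE)  # TRUE is assigned by name
sum(1, 2, 3, NA, TRUE)          # TRUE is provided without a name
```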

Hint: What do sum(1, 2, 3, NA) and sum(TRUE) evaluate to?

Explicit return

A more explicit version of our power function from above could look as follows:
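A sketch of such a version, assuming that the intermediate value is stored in an object named result:

```r
power <- function(x, exp = 1) {
  result <- x^exp
  return(result)
}

power(x = 2, exp = 3)  # 8
```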

Whereas the shorter version of the function relied on returning its last (and only) expression, this version makes it more explicit what is being computed and returned. As functions get larger and more complicated, it is generally a good idea to include explicit return statements. Importantly, a function can include multiple return statements and is exited as soon as a return is reached. For instance, the following variant would never print its final disclaimer:
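A sketch of such a variant follows. The wording of the disclaimer is an assumption:

```r
power <- function(x, exp = 1) {
  result <- x^exp
  return(result)  # the function is exited here
  print("Disclaimer: This line is never reached.")
}

power(x = 2, exp = 3)  # 8, and no disclaimer is printed
```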

11.2.2 Understanding functions

How can we understand a function? Even when we are completely clueless about a function, we can always try to understand it by using it with different inputs and see what happens. This treats the function as a black box and is exactly what we did to explore the plot_fn() and plot_fun() functions of ds4psy (in Section 1.2.2 and Exercise 1 of Chapter 1). But having progressed further in our R career, we now dare to look inside the black box and ask: How does this function transform its inputs into outputs? Asking and answering such how questions promotes a mechanistic understanding of a function that not only provides us with an idea about the function’s purpose (or “function”), but also enables us to criticize and improve it.


Let’s define a new function describe() and try to understand what it does by asking how it transforms its inputs into outputs:
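A sketch of such a function is given here. Its arguments, its input checks, and its treatment of NA values follow the description provided later in this section, but the particular choice of the 7 measures, their names and order, and the wording of the message are assumptions:

```r
describe <- function(v, rm_na = TRUE) {

  # (a) Check inputs:
  if (is.null(v) || all(is.na(v))) {
    return(v)  # NULL or NA: return the input unchanged
  }
  if (!is.numeric(v)) {
    message("describe: v must be numeric.")  # assumed wording
    return(v)
  }

  # (b) Compute measures (assumed selection of 7 measures):
  v_min <- min(v, na.rm = rm_na)
  v_q25 <- quantile(v, probs = .25, na.rm = TRUE)  # always removes NA values
  v_med <- median(v, na.rm = rm_na)
  v_mn  <- mean(v, na.rm = rm_na)
  v_q75 <- quantile(v, probs = .75, na.rm = TRUE)  # always removes NA values
  v_max <- max(v, na.rm = rm_na)
  v_sd  <- sd(v, na.rm = rm_na)

  # (c) Collect the measures in a named vector:
  out <- c(v_min, v_q25, v_med, v_mn, v_q75, v_max, v_sd)
  names(out) <- c("min", "q25", "median", "mean", "q75", "max", "sd")

  # (d) Return the result:
  return(out)
}

describe(c(1, 2, 3, 4, 5))
```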

This example illustrates the difference between using and understanding a function — and that the definition of a function can get long and complicated. As you can think of a function as a program to tackle a specific task, it is not uncommon for function bodies to stretch over dozens or hundreds of lines of code. The longer and more complicated a function gets, the more difficult it is to understand and — from a programmer’s perspective — to write and to debug. For this reason, programmers typically try to structure long and complex functions into smaller parts, which can then be delegated to shorter functions. But understanding a function that calls many other functions then implies that we also need to understand these other functions. By contrast, using a very long and complex function does not need to be difficult. In fact, it would be marvelous (and somewhat spooky) if we could simply call some do_stats() or write_paper() function and let the computer do our job. But as long as artificial intelligence has not yet solved these tasks, we need to learn how to use, write, and understand functions.


Before you continue reading, try to describe the describe() function (defined in the previous code chunk):

  • What types of inputs does it take?
  • What do its different parts (a) to (d) do?
  • What outputs will it return?
  • What is the purpose of this function?
  • Which calls will yield errors?


Gaining a mechanistic understanding of a function implies that we understand how its outputs depend on its inputs. Eventually, this should also indicate the function’s purpose, but first we simply describe what a function does with its inputs.

The describe() function could be described as follows:

  • The function describe() has 2 arguments: A mandatory argument v and an optional argument rm_na with a default value of TRUE.

  • The function first examines its input argument v:

    • Does v evaluate to NA or NULL? If so, it simply returns v (i.e., NA or NULL, respectively).
    • Is v non-numeric? If so, it prints a message to the user and returns v.
  • The function then computes 7 different statistical measures. This illustrates that functions can do multiple things and typically use other functions to do so. For some of these functions (e.g., mean), the describe function passes the value of its optional argument rm_na to another function’s na.rm argument. However, for another function (quantile), the describe function does not use its rm_na argument, but always provides na.rm = TRUE.

  • The function then creates a numeric vector out that includes the 7 computed measures in a specific order and adds names to the vector.

  • The function then returns the vector out.

  • Overall, the task addressed by describe() is to provide a range of descriptive statistics of a numeric vector v.

11.2.3 Checking functions

Once we have gained a basic understanding of a function, we can check both the function and our understanding of it by using it with a variety of arguments. Ideally, we should use our understanding to predict what happens when calling the function with a specific argument and then use the function to verify or falsify our prediction. As we saw above, the results of such checks are more informative if you use the function not only with its intended inputs. Using unusual and probably unintended inputs (like NA, NULL, or inputs of different data types) will show you the limits of a function.


  • Predict the result of describe(c(NULL, NA)). Then evaluate the expression and explain its result.

11.2.4 Issues of style

In any art or craft, issues of style are important, partially a matter of taste, and largely a matter of practice and experience. Just as the work of architects, designers, and authors tends to mature in an exchange with colleagues, and with more time and multiple revisions, programming tends to benefit from feedback and well-organized teams. But even for an individual programmer, writing good functions is a journey, rather than a destination, and a life-long aspiration.

The primary goal of programming new functions is providing some required functionality in a clear and transparent fashion. Here are some general guidelines towards achieving this goal:

  1. Goal/purpose: Be aware of the goal or purpose of your function: What does the function do?

  2. Arguments: Consider the requirements of your function: What does the function need to achieve its goal? Which arguments are necessary, which ones are optional?

  3. Naming: Make sure that you choose good names for both your functions and their arguments. Ideally, a function’s name should clearly state its purpose.

  4. Format: Code each function so that its structure is clear and it is easy to read and understand (i.e., use blank lines between different parts and automatic indentation).

  5. Documentation: Provide clear comments that explain what the function does, why it is doing it in a particular way, and anything else that may be difficult to understand.

Finally, programming functions also involves some psychology: Who is the audience of your function? Try to anticipate what needs and wishes the users of your functions will have. Will people be able to understand and use your function? How robust is your function when users provide different data types or may misinterpret its purpose or scope? Even when you are the primary user of your own function, anticipating possible misconceptions is an important aspect of ensuring that your function will be useful in the future.

11.2.5 Flow control

Whereas most of our scripts so far relied on being executed linearly (in a top-down, left-to-right, line-by-line fashion), using functions implies jumping around in large amounts of code.38 In addition, writing new functions often requires controlling the flow of information within the body of a function. We can distinguish between several ways in which this can be achieved:

  • Special functions (e.g., return, print, or stop) cause side-effects or skip code (e.g., by exiting the function).

  • Functions often incorporate iteration and loops, which are covered in the next chapter (i.e., Chapter 21: Iteration of r4ds).

  • Testing input arguments or distinguishing between several cases requires the conditional execution of code, which is discussed next.

11.2.6 Conditionals

In the definition of describe() above, we have seen that functions frequently require checking some properties of their inputs, distinguishing between cases, and controlling the flow of data processing based on test results. This is the job of conditional statements, which exist in many different forms. In this section, we only cover the most essential types.


A conditional statement conducts a test (which evaluates to either TRUE or FALSE) and executes additional code based on the value of the test. The simplest conditional in R is the if function, which implements the logic of if-then in the following if (test) {...} structure:
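A minimal sketch of this structure, assuming that an object test has been set to TRUE:

```r
test <- TRUE

if (test) {
  print("ok")
}
```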

Here, test must evaluate to a single Boolean value (i.e., either TRUE or FALSE). If test is TRUE the code in the subsequent {...} is executed (here: "ok" is printed to the Console) – otherwise the code in the subsequent {...} is skipped, as if it was not there or commented out:
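The skipped case can be sketched in the same way, now assuming that test is FALSE:

```r
test <- FALSE

if (test) {
  print("ok")  # skipped: nothing is printed
}
```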

Note that if test is a Boolean value, we do not need to ask for the condition test == TRUE.


If a test fails, we often want something else to happen. To accommodate this desire, a slightly more complicated form of if statement includes an additional {...} after an else statement:
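A minimal sketch of this if-else structure, assuming that test is FALSE:

```r
test <- FALSE

if (test) {
  print("case 1")
} else {
  print("case 2")
}
```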

Here, the truth value of test determines whether the 1st or the 2nd {...} is executed. As test must be either TRUE or FALSE, we either see “case 1” printed (if test is TRUE) or “case 2” printed (if test is FALSE).

The following sequence illustrates how tests work (and can fail to work):
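The sketch below is one possible sequence of this kind. The helper function classify() and its single test are hypothetical and only wrap an if-else statement so that it can be used with different inputs:

```r
# Hypothetical helper that wraps a simple if-else statement:
classify <- function(person) {
  if (person == "mother") {
    "female"
  } else {
    "male"
  }
}

classify("mother")       # "female"
classify("father")       # "male"
classify("grandmother")  # "male": the test was not designed for this input
try(classify(NA))        # error: the test evaluates to NA, not TRUE or FALSE
```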

Vectorized ifelse

A crucial limitation of R’s basic if statement is that its test only assumes a single TRUE or FALSE as its output. However, when writing functions, we often want to make them work with vectors of input values, rather than a single input. Testing multiple values at once is possible with the ifelse(test, yes, no) function that uses vectorized test, yes, and no arguments (which are recycled to the same length):
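A short sketch with assumed example values:

```r
x <- c(1, 5, 10)

ifelse(x > 3, "large", "small")     # "small" "large" "large"
ifelse(c(TRUE, FALSE, TRUE), 1, 0)  # 1 0 1
```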

Note that the yes and no values used with ifelse should typically be of the same type, and NA values remain NA:
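Both points can be sketched with assumed example values:

```r
ifelse(c(TRUE, NA, FALSE), "yes", "no")  # "yes" NA "no": NA remains NA
ifelse(c(TRUE, FALSE), 1, "no")          # "1" "no": mixed types are coerced to character
```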

More complex tests

The condition test of a conditional statement can contain multiple tests. If so, each individual test must evaluate to either TRUE or FALSE and the different tests are linked with && or ||, which work like the logical connectors & and |, but are evaluated sequentially (from left to right):
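The sequential evaluation can be sketched as follows. The objects x, y, and z are assumed example values:

```r
x <- NULL
# || stops as soon as the result is known: is.null(x) is TRUE,
# so that x > 0 is never evaluated:
if (is.null(x) || x > 0) {
  print("empty or positive")
}

y <- 5
# && requires both tests to be TRUE:
if (is.numeric(y) && y > 3) {
  print("a number above 3")
}

z <- "five"
is.numeric(z) && z > 3  # FALSE: the 1st test fails, the 2nd is not evaluated

# By contrast, & and | work element by element on vectors:
c(TRUE, FALSE) & c(TRUE, TRUE)  # TRUE FALSE
```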


Here’s a way to fix our problem from above (i.e., evaluating “grandmother” as “male”) by implementing a more comprehensive test:
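A sketch of such a test is shown here. The helper function classify() and the particular terms it checks are hypothetical:

```r
# Hypothetical helper with a more comprehensive test:
classify <- function(person) {
  if (person == "mother" || person == "grandmother" || person == "daughter") {
    "female"
  } else {
    "male"
  }
}

classify("grandmother")  # "female"
classify("father")       # "male"
```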

A vectorized version of this if-then-else statement can be written with ifelse(), but will still mis-classify anything not considered when designing the test (e.g., stepmothers, broomsticks, etc.):
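A sketch of such a vectorized version, using an assumed vector people and the element-wise connector |:

```r
people <- c("mother", "father", "grandmother", "stepmother", "broomstick")

ifelse(people == "mother" | people == "grandmother" | people == "daughter",
       "female", "male")
# "female" "male" "female" "male" "male"
```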

More cases

As we can replace any {...} in a conditional statement if (test) {...} else {...} by another conditional statement, we can distinguish more than 2 cases:
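A sketch of such a nested conditional follows. It is wrapped in a hypothetical function which_case(), so that different combinations of test_1 and test_2 can be tried:

```r
# Hypothetical wrapper around a conditional with 3 cases:
which_case <- function(test_1, test_2) {
  if (test_1) {
    print("case 1")
  } else if (test_2) {
    print("case 2")
  } else {
    print("else")
  }
}

which_case(TRUE, FALSE)   # "case 1"
which_case(FALSE, TRUE)   # "case 2"
which_case(FALSE, FALSE)  # "else"
```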

Here, 2 cases are contingent on their corresponding condition being TRUE, otherwise the final {...} is reached and "else" is printed. Thus, an “else case” often serves as a generic case that occurs when none of the earlier tests are true.

Note that the following variant of this conditional is different:
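A sketch of this variant, again wrapped in a hypothetical function (here named which_case_2()) so that different combinations of tests can be tried:

```r
# Hypothetical wrapper around a conditional whose final case has its own test:
which_case_2 <- function(test_1, test_2, test_3) {
  if (test_1) {
    print("case 1")
  } else if (test_2) {
    print("case 2")
  } else if (test_3) {
    print("else")
  }
}

which_case_2(FALSE, FALSE, TRUE)   # "else"
which_case_2(FALSE, FALSE, FALSE)  # nothing is printed
```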

Here, the final {...} is contingent on another test_3 being TRUE. Thus, "else" is only printed if test_1 and test_2 are both FALSE and test_3 is TRUE. If all 3 tests fail, none of the cases is reached and nothing is printed.

  • When a test evaluates to TRUE, the corresponding {...} is evaluated and any later instances of test and {...} are skipped. Thus, only a single case of {...} is evaluated, even if multiple tests would evaluate to TRUE.


A conditional nursery rhyme

Consider the following check_flow function:

The function appears to implement some nursery rhyme, but is really messy, unfortunately. Hence, we need to clean it up before trying to understand it.

  1. Format the function so that it becomes easier to read and parse.

A possible solution would indent commands, place any } on a new line, and generally introduce lots of white space, as follows:

  2. Describe and try to understand this function. What does it do and how does it do it?

  3. Answer the following questions:

    • Which cases does the 1st conditional statement distinguish?
    • When is the 1st switch statement reached? When is the 2nd switch statement reached?
    • What is the difference between the print and the return statements?
    • Under which conditions does the function return "raus bist du"?
    • What happens when you call check_flow() or check_flow(NA)?
  4. Test your predictions by evaluating the following calls:

  38. Strictly speaking, we have also been using statements that were parsed from right to left (e.g., assignments like x <- 1) or bottom-to-top (e.g., when assigning a multi-line pipe of dplyr statements to an object). Also, given that we have been using functions all along, we really have been jumping around in base R code since our very first session.