11.2 Essentials of functions

Seeing that functions are powerful tools (see Section 11.1) should whet our appetite for creating our own. Thus, this section first illustrates how new functions can be defined in R (Section 11.2.1). However, defining new functions is closely connected to our ability to understand functions (Section 11.2.2) and to check functions (Section 11.2.3). We conclude by mentioning some issues of style that matter for creating understandable and usable functions (Section 11.2.4).

11.2.1 Defining functions

The basic structure for writing a new function looks as follows:
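As a minimal sketch, the general pattern is shown as a comment and followed by one runnable instance. The name add_one is a hypothetical example and does not come from the text:

```r
# General pattern (the placeholders in angle brackets are not valid R code):
#   name <- function(<args>) {
#     <body>
#   }

# Runnable instance; add_one is a hypothetical name chosen for illustration:
add_one <- function(x) {
  x + 1  # the value of the last expression is returned
}

add_one(x = 2)  # returns 3
```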

Thus, writing a function initially requires 3 considerations:

  1. What would be a good name for the function? Function names should clearly convey their purpose (“Which task does this function solve?”), resemble verbs (ideally say what the function does, e.g., plot, or state its output, e.g., mean), and should be short (as they will be typed many times).

  2. Which inputs does the function take? Any input that is necessary for the function to perform its task must be supplied as a list of arguments <args>.

  • Arguments typically include data (e.g., scalars, vectors, tables, or anything else that the function uses to perform its task) and can contain additional details or parameters (e.g., instructions on how something is to be done: Which type of calculation is desired? How should NA values be treated? Which color should be used? etc.). Data arguments are typically listed first.

  • Any argument can be optional (by including a default value that is used when no argument is provided) or mandatory (by not including a default value). It is common to make data arguments mandatory, and use optional arguments for details.

  • If a function calls other functions and we want to allow users to pass arguments to these latter functions, we can add ... (dot-dot-dot) as a special argument. However, this only works if the functions called within the function can handle the specific arguments that are passed on to them. Otherwise, using ... can cause errors or unexpected results.

  3. The <body> of a function uses the inputs provided by <args> to perform the task for which the function is created.

  • Although some functions are only used for their side-effects (e.g., load or save an object, create a graph, etc.), most functions are created to (also) return some sort of output (e.g., some computed result). This can be done by calling the special function return in the function <body> (e.g., return(result)), typically as its last statement (see footnote 1). When the function does not contain an explicit return statement, it returns the value of the last expression evaluated in the function.

  • The output of a function can assume many different data types (scalar, vector, list, table, etc.). For instance, the result of 1:3 + 3:1 is a vector 4, 4, 4.
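The points about mandatory arguments, optional arguments, and ... can be sketched as follows. The function name avg and its digits argument are assumptions made for illustration only:

```r
# avg is a hypothetical function (not from the text):
# - x is mandatory (no default value),
# - digits is optional (default value of 2),
# - ... passes any further arguments on to mean().
avg <- function(x, digits = 2, ...) {
  result <- round(mean(x, ...), digits)
  return(result)
}

avg(c(1, 2, NA))                # NA, as mean() keeps NA values by default
avg(c(1, 2, NA), na.rm = TRUE)  # 1.5, as na.rm is passed on to mean()
avg(c(1, 2, 4), digits = 0)     # 2
```

Calling avg() without x yields an error, whereas omitting digits simply uses its default value.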

Thus, a more detailed template of a typical function is:
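The following is a sketch of such a template. The name my_function and the placeholder task (summing arg_1) are assumptions that were chosen only to make the template runnable:

```r
# Hypothetical template: my_function and its toy task are assumptions.
my_function <- function(arg_1, arg_2 = TRUE) {

  # Check inputs (type and values):
  if (!is.numeric(arg_1)) { stop("arg_1 must be numeric.") }
  if (!is.logical(arg_2) || length(arg_2) != 1 || is.na(arg_2)) {
    stop("arg_2 must be TRUE or FALSE.")
  }

  # Solve the task (placeholder: sum arg_1, optionally removing NA values):
  result <- sum(arg_1, na.rm = arg_2)

  # Return the result:
  return(result)
}
```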

In this template, arg_1 is a mandatory argument (as no default is provided) and arg_2 is an optional argument (with a default value of TRUE). The arguments are checked (e.g., whether they have appropriate type and values) and then used to solve the task. Its result is then assigned to result and returned by return(result).

Typical uses of this function could look as follows:
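As a sketch, the calls below use a compact stand-in that follows the template described above. The name my_function and its toy task (summing arg_1) are assumptions, and the function is defined here so that the calls can be run on their own:

```r
# Compact stand-in that follows the template (hypothetical name and task):
my_function <- function(arg_1, arg_2 = TRUE) {
  if (!is.numeric(arg_1)) { stop("arg_1 must be numeric.") }
  result <- sum(arg_1, na.rm = arg_2)
  return(result)
}

my_function(c(1, 2, NA))                         # 3 (arg_2 uses its default TRUE)
my_function(arg_1 = c(1, 2, NA), arg_2 = FALSE)  # NA
my_function(arg_2 = FALSE, arg_1 = 1:3)          # 6
```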

To render the interplay of a function and its arguments more concrete, let’s consider some examples of simple functions.

A power function

Consider the following definition of a power() function:
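A definition that matches the description given in this section could read as follows (its exact layout is an assumption):

```r
# power(): raise x to the exp-th power (exp defaults to 1)
power <- function(x, exp = 1) {
  x^exp
}
```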

We can explain the definition of this function as follows:

  • The function power() has 2 arguments: a mandatory argument x and an optional argument exp with a default value of 1.

  • The function computes x^exp and returns the result (as this is the final statement in the function’s body).

  • Thus, the task addressed by power is to raise x to the exp-th power.

  • Although the function does not verify anything, x and exp are assumed to be numeric.

  • Given what we know about R, the function probably also works for vector inputs of x. However, it is harder to guess what happens if exp is a vector, or if both arguments are vectors.

Checking our function

It is very important to run a range of checks after writing a new function. When doing these checks, examples that use unusual inputs (like NA, NULL, or arguments of different types) are particularly informative.

Here are some possible ways of checking how power works:
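A few such checks could look as follows. This is a sketch in which power() is defined again, so that the calls can be run on their own:

```r
power <- function(x, exp = 1) { x^exp }

# Scalar inputs:
power(x = 2)              # 2 (exp uses its default of 1)
power(x = 2, exp = 3)     # 8
power(x = 2, exp = -1)    # 0.5

# Vector inputs:
power(x = 1:3, exp = 2)   # 1 4 9
power(x = 2, exp = 1:3)   # 2 4 8
power(x = 1:3, exp = 1:3) # 1 4 27

# Unusual inputs:
power(x = NA)             # NA
power(x = NA, exp = 0)    # 1 (in R, any value to the power of 0 is 1)
power(x = NULL)           # numeric(0)
```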

When writing a new function, it always is a good idea to test its limits. Here are some calls that would result in warning or error messages:
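The following sketch shows some such calls. It defines power() again and wraps the calls that yield errors in try(), so that the script continues to run:

```r
power <- function(x, exp = 1) { x^exp }

# Errors (try() reports an error without stopping the script):
try(power())               # error: argument "x" is missing, with no default
try(power(x = "A"))        # error: non-numeric argument to binary operator

# Warnings (a result is returned, but may not be what we intended):
power(x = 1:2, exp = 1:3)  # 1 4 1, with a warning about object lengths
power(x = -2, exp = 1/2)   # NaN, with a warning that NaNs were produced
```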

It is not necessarily problematic when functions return warnings or errors — in fact, they can be very informative for understanding functions. As a function’s author or designer (i.e., programmer), you primarily need to decide whether returning an error is justified, given the intended use of the function. If a user enters no arguments or the wrong type of argument to a function, yielding an error can be the appropriate way of saying: This doesn’t work. But good programmers also try to view their functions from the perspective of their future users. Do the names of the function and its arguments clearly signal their purposes? Will users know which types and shapes of data are to be provided as inputs? A lot of this can be handled by choosing transparent names and providing good documentation to a function. And if you anticipate many misunderstandings for a function or its arguments, it may be polite to check user inputs for their type and issue a message to the user if something was missing or wrong.
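Checking user inputs and issuing a message could be sketched as follows. The name power_checked, the wording of its message, and the choice to return NA are assumptions made for illustration:

```r
# power_checked is a hypothetical variant of power() that verifies its inputs:
power_checked <- function(x, exp = 1) {

  # Check the type of both arguments:
  if (!is.numeric(x) || !is.numeric(exp)) {
    message("power_checked: x and exp must be numeric.")
    return(NA)
  }

  x^exp
}

power_checked(x = 2, exp = 3)  # 8
power_checked(x = "A")         # prints the message and returns NA
```

Whether a message with an NA result or an error (via stop()) is more appropriate depends on the intended use of the function.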

Omitting argument names

In R, it is possible to omit the argument names of functions. If this is done, the values provided are assigned to the arguments in the order in which they are provided:
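As a short sketch (with power() defined again, so that the calls can be run on their own):

```r
power <- function(x, exp = 1) { x^exp }

power(2, 3)  # 8: the 1st value is assigned to x, the 2nd to exp
power(3, 2)  # 9: swapping the values changes the result
```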

Although this saves typing, it is typically more informative to explicitly state the arguments. This makes it more transparent to future readers of your code (including yourself) which value is assigned to which argument and has the advantage that you can enter arguments in any order:
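As a short sketch (with power() defined again, so that the calls can be run on their own):

```r
power <- function(x, exp = 1) { x^exp }

power(x = 2, exp = 3)  # 8
power(exp = 3, x = 2)  # 8: with names, the order of arguments does not matter
```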

Thus, it is good practice to always provide argument names (as long as you want others to understand your code).

Practice

  1. Write a new function that computes the n-th root of a number (or vector of numbers) x, then check it and explore its limits.

  2. The first function we encountered in this book (in Chapter 1.2.2) was sum(). Incidentally, sum() is more complex than it first seemed, as its arguments can be a combination of vectors and scalars:
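For instance, the following calls (chosen here for illustration) all compute the same sum:

```r
sum(1, 2, 3, 4)        # 10: four scalars
sum(c(1, 2), c(3, 4))  # 10: two vectors
sum(1, c(2, 3), 4)     # 10: a mix of scalars and a vector
```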

We now have learned that values are assigned by their position to function arguments when we omit argument names. Explain why the following yield different results:
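One pair of calls that fits this exercise is shown below. The specific values are assumptions, and the results are left for you to work out:

```r
sum(1, 2, 3, NA, na.rm = TRUE)  # na.rm is provided by name
sum(1, 2, 3, NA, TRUE)          # TRUE is provided without a name
```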

Hint: What do sum(1, 2, 3, NA) and sum(TRUE) evaluate to?

Explicit return

A more explicit version of our power function from above could look as follows:
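A sketch of such a version, which assigns the computed value to an object named result (as assumed from the later reference to return(result)):

```r
power <- function(x, exp = 1) {
  result <- x^exp  # compute the result
  return(result)   # return it explicitly
}
```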

Whereas the shorter version of the function relied on returning its last (and only) expression, this version makes it more explicit what is being computed and returned. As functions get larger and more complicated, it is generally a good idea to include explicit return statements. Importantly, a function can include multiple return statements and is exited as soon as a return is reached. For instance, the following variant would never print its final disclaimer:
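A sketch of such a variant, named power_joke() as in the practice task that follows (its exact body is an assumption):

```r
power_joke <- function(x, exp = 1) {
  result <- x^exp
  return(result)  # the function is exited here
  "Just kidding"  # this line is never reached
}
```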

Practice

  • Test the power_joke() function (by evaluating it with various arguments) and try to obtain the "Just kidding" line as an output.

Solution

The following expressions are suited to explore the power_joke() function:
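As a sketch, with an assumed definition of power_joke() repeated so that the calls can be run on their own:

```r
power_joke <- function(x, exp = 1) {
  result <- x^exp
  return(result)
  "Just kidding"
}

power_joke(x = 2, exp = 3)    # 8
power_joke(x = 1:3, exp = 2)  # 1 4 9
power_joke(x = NA)            # NA
try(power_joke(x = "A"))      # error: non-numeric argument to binary operator
```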

These tests show that the final expression (i.e., the character string "Just kidding") is never reached.

  • In a 2nd step, comment out the line return(result) and re-run the same checks.

11.2.2 Understanding functions

How can we understand a function? Even when we are completely clueless about a function, we can always try to understand it by using it with different inputs and seeing what happens. This treats the function as a black box and is exactly what we did to explore the plot_fn() and plot_fun() functions of ds4psy (in Section 1.2.2 and Exercise 1 of Chapter 1). But having progressed further in our R career, we now dare to look inside the black box and ask: How does this function transform its inputs into outputs? Asking and answering such how questions promotes a mechanistic understanding of a function that not only provides us with an idea about the function’s purpose (or “function”), but also enables us to criticize and improve it.

Example

Let’s define a new function describe() and try to understand what it does by asking how it transforms its inputs into outputs:
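The following sketch is consistent with the description of describe() given in this section. The selection, order, and names of the 7 measures, as well as the wording of the message, are assumptions:

```r
describe <- function(v, rm_na = TRUE) {

  # (a) Check inputs:
  if (is.null(v) || all(is.na(v))) { return(v) }
  if (!is.numeric(v)) {
    message("describe: v must be numeric.")
    return(v)
  }

  # (b) Compute 7 measures (which 7 measures is an assumption):
  v_min <- min(v, na.rm = rm_na)
  v_q25 <- quantile(v, probs = .25, na.rm = TRUE)  # always removes NA values
  v_md  <- median(v, na.rm = rm_na)
  v_mn  <- mean(v, na.rm = rm_na)
  v_q75 <- quantile(v, probs = .75, na.rm = TRUE)  # always removes NA values
  v_max <- max(v, na.rm = rm_na)
  v_sd  <- sd(v, na.rm = rm_na)

  # (c) Combine the measures into a named vector:
  out <- c(v_min, v_q25, v_md, v_mn, v_q75, v_max, v_sd)
  names(out) <- c("min", "q25", "md", "mn", "q75", "max", "sd")

  # (d) Return the output:
  return(out)
}
```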

This example illustrates the difference between using and understanding a function — and that the definition of a function can get long and complicated. As you can think of a function as a program to tackle a specific task, it is not uncommon for function bodies to stretch over dozens or hundreds of lines of code. The longer and more complicated a function gets, the more difficult it is to understand and — from a programmer’s perspective — to write and to debug. For this reason, programmers typically try to structure long and complex functions into smaller parts, which can then be delegated to shorter functions. But understanding a function that calls many other functions then implies that we also need to understand these other functions.

By contrast, using a very long and complex function does not need to be difficult. In fact, when calling functions like mutate(), ggplot(), or summarise() we typically do not notice that we implicitly call upon the mighty machinery of the entire dplyr and ggplot2 packages. It is conceivable (either as a spooky dystopia, or as a marvelous feat of ‘artificial intelligence’) that we could simply run some do_stats() or write_paper() function and let the computer do our job. But as long as other programmers and machine learning have not yet solved these tasks, we need to learn how to use, write, and understand functions to address them.

Practice

Before reading on, describe the describe() function (defined in the previous code chunk):

  • What types of inputs does it take?
  • What do its different parts (a) to (d) do?
  • What outputs will it return?
  • What is the purpose of this function?
  • Which calls will yield errors?

Check your predictions by copying and calling the function with various arguments.

Solution

Gaining a mechanistic understanding of a function implies that we understand how its outputs depend on its inputs. Eventually, this should also indicate the function’s purpose, but first we simply describe what a function does with its inputs.

The describe() function could be described as follows:

  • The function describe() has 2 arguments: a mandatory argument v and an optional argument rm_na with a default value of TRUE.

  • The function first examines its input argument v:

    • Does v evaluate to NA or NULL? If so, it simply returns v (i.e., NA or NULL, respectively).
    • Is v non-numeric? If so, it prints a message to the user and returns v.

  • The function then computes 7 different statistical measures. This illustrates that functions can do multiple things and typically use other functions to do so. For some of these functions (e.g., mean), the describe function passes the value of its optional argument rm_na to another function’s na.rm argument. However, for another function (quantile), the describe function does not use its rm_na argument, but always provides na.rm = TRUE.

  • The function then creates a numeric vector out that includes the 7 computed measures in a specific order and adds names to the vector.

  • The function then returns the vector out.

  • Overall, the task addressed by describe() is to provide a range of descriptive statistics of a numeric vector v.

11.2.3 Checking functions

Once we have gained a basic understanding of a function, we can check both the function and our understanding of it by using it with a variety of arguments. Ideally, we should use our understanding to predict what happens when calling the function with a specific argument and then use the function to verify or falsify our prediction.

As we saw above, the results of such checks are more informative if you use the function not only with its intended inputs. Using unusual and probably unintended inputs (like NA, NULL, or inputs of different data types) will show you the limits of a function. And given the importance of vectors in R, a good question to ask about a new function is: Does this function only work with scalar inputs, or does it also work with vectors?

Example

Here are some possible ways of checking how the describe() function works:
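The checks below are a sketch. They repeat an assumed definition of describe() (the choice and names of its 7 measures are assumptions), so that the calls can be run on their own:

```r
# Assumed definition of describe(), repeated to keep this example self-contained:
describe <- function(v, rm_na = TRUE) {
  if (is.null(v) || all(is.na(v))) { return(v) }
  if (!is.numeric(v)) {
    message("describe: v must be numeric.")
    return(v)
  }
  out <- c(min(v, na.rm = rm_na),
           quantile(v, probs = .25, na.rm = TRUE),
           median(v, na.rm = rm_na),
           mean(v, na.rm = rm_na),
           quantile(v, probs = .75, na.rm = TRUE),
           max(v, na.rm = rm_na),
           sd(v, na.rm = rm_na))
  names(out) <- c("min", "q25", "md", "mn", "q75", "max", "sd")
  return(out)
}

# Intended inputs:
describe(1:5)
describe(c(1, 2, 3, NA))                 # NA values are removed by default
describe(c(1, 2, 3, NA), rm_na = FALSE)  # only the quantiles are not NA

# Unusual inputs:
describe(NA)                             # NA
describe(NULL)                           # NULL
describe("A")                            # prints a message and returns "A"
try(describe())                          # error: argument v is missing
```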

Actually, describe() is — apart from subtle differences — quite similar to the base R function summary():
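For comparison, summary() can be called on a numeric vector (the values used here are arbitrary):

```r
v <- c(1, 3, 5, 7, 100, NA)

# Returns the minimum, 1st quartile, median, mean, 3rd quartile, and maximum,
# plus a count of NA values (if there are any):
summary(v)
```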

Practice

  • Predict the result of describe(c(NULL, NA)). Then evaluate the expression and explain its result.

11.2.4 Issues of style

Creating new functions only makes sense when someone can understand and use them. Hence, writing a new function always needs to take into account the viewpoint of its users, even if those will mostly be our future selves.

In any art or craft, issues of style are important, partially a matter of taste, and largely a matter of practice and experience. Just as the work of architects, designers, and authors tends to mature in an exchange with colleagues, and with more time and multiple revisions, computer programming tends to benefit from feedback and well-organized teams. But even for an individual programmer, writing good functions is a journey, rather than a destination, and a life-long aspiration.

The primary goal of programming new functions is providing some required functionality in a clear and transparent fashion. Here are some general guidelines and questions that help to achieve this goal:

  1. Goal/purpose: Be aware of the goal and purpose of your function. Explicating these “functional” aspects involves answering a range of related questions:

    • Task: What task does the function perform?
    • Mechanism: How does the function solve this task?
    • Audience: For whom does the function solve this task?

  2. Arguments: Consider the requirements of your function:

    • What does the function need to achieve its goal?
    • Of what class or type are the objects that the function uses as its inputs?
    • Which arguments are necessary, which ones are optional?

  3. Result: Consider the output of your function:

    • What should the function return?
    • Of what class or type are the objects that the function provides as its outputs?
    • Are there any side-effects to consider?

  4. Naming: Make sure that you choose good names for both your functions and their arguments.

    • Do the names clearly convey the purpose of the function and its arguments?
    • Are all names succinct and easy to type?
    • Do all names correspond to the names of related functions?

  5. Format: Code each function so that its structure is clear and it is easy to read and understand (i.e., use blank lines between different parts and automatic indentation).

  6. Documentation: Provide clear comments that explain what the function does, why it is doing it in a particular way, and anything else that may be difficult to understand.
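As one possible illustration of these guidelines, the following sketch shows a small function with a descriptive name, a mandatory data argument, an optional detail argument, an input check, and explanatory comments. The function count_na and its argument names are hypothetical:

```r
# count_na: Count the missing (NA) values in a vector.
# Arguments:
#   v        A vector (mandatory).
#   as_prop  Return a proportion, rather than a count? Optional (default: FALSE).
# Returns: A single number (the count or proportion of NA values in v).
count_na <- function(v, as_prop = FALSE) {

  # Check inputs:
  if (!is.atomic(v)) { stop("count_na: v must be an atomic vector.") }

  # Count NA values:
  n_na <- sum(is.na(v))

  # Return a proportion or a count:
  if (as_prop) {
    return(n_na / length(v))
  }

  return(n_na)
}

count_na(c(1, NA, 3, NA))                  # 2
count_na(c(1, NA, 3, NA), as_prop = TRUE)  # 0.5
```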

Ultimately, programming functions always involves a considerable portion of psychology: Who is the audience of your function? Try to anticipate what needs, preferences, and wishes the users of your functions will have. Will people be able to understand and use your function? How robust is your function when users provide different data types or may misinterpret its purpose or scope? Although you probably are the primary user of your new function, anticipating possible misconceptions and responding to user feedback are important aspects of ensuring that it will remain useful in the future.


  1. In most functions, the return() statement is the final statement of the function <body>. See Chapter 19.6: Return values for special cases in which it makes sense to provide earlier and multiple return statements or provide invisible return values (e.g., to write pipeable functions).