## 11.2 Essentials of functions

Seeing that functions are powerful tools (see Section 11.1) should whet our appetite for creating our own functions. Thus, the next section illustrates how new functions can be defined in R (Section 11.2.1). However, the process of *defining* new functions is closely connected to our ability to *understand* functions (Section 11.2.2) and to *check* functions (Section 11.2.3).
We conclude by mentioning some issues of style that matter for creating understandable and usable functions (Section 11.2.4).

### 11.2.1 Defining functions

The basic structure for writing a new function looks as follows:

`name <- function(<args>) {<body>}`

Before going into any more detail, let’s pause for a moment and note some remarkable facts:
In R, functions are objects, just like any other object (e.g., data, parameters, or variables).
Consequently, a new function is defined, just like any other object, by choosing its `name` and assigning some content to it.
Thus, the key difference between simpler data objects and the “action object” of a function must be in its content:
Rather than only describing what an object *is*, defining a function must include an instruction on what to *do*.

Defining an instruction “what to do” immediately raises the question: Do *what* with *what*?
This question helps us disentangle the abstract structure of a function definition into its parts.
Writing a function requires three distinct considerations:

- *Do what?*: What is the task to be performed by the function? And what would be a good `name` for the function? Function names should clearly convey their purpose (“Which task does this function solve?”), resemble verbs (ideally say what the function does, e.g., `plot`, or state its output, e.g., `mean`), and should be short (as they will be typed many times).

- *With what?*: What does the function work with? Which *inputs* does it take? Any input that is needed for the function to perform its task must be supplied as an argument in `<args>`. Note that `<args>` are enclosed by round parentheses `(...)` and multiple arguments are separated by commas. We further characterize and distinguish between different types of arguments:

  - Arguments typically include *data* (e.g., scalars, vectors, tables, or anything else that the function uses to perform its task) and can contain additional *details* or *parameters* (e.g., instructions on how something is to be done: Which type of calculation is desired? How should `NA` values be treated? Which color should be used? etc.). Data arguments are typically listed first.

  - Any argument can be *optional* (by including a default value that is used when no argument is provided) or *mandatory* (by not including a default value). It is common to make data arguments mandatory, and use optional arguments for details.

  - If a function calls other functions and we want to allow users to pass arguments to these latter functions, we can add `...` (aka “dot-dot-dot”) as a special argument. However, this requires that the functions called within the function can deal with the specific arguments that are provided later, and it can cause problems and unexpected results.

- *Do how?*: Once we have identified what task is to be performed with what input, the key question remaining is: *How* is the task to be solved? The `<body>` of a function uses the *inputs* provided by `<args>` to perform the task for which the function is created. Thus, the part in the curly brackets `{...}` is typically the longest and most complicated part of a function. Again, this part can be split into multiple parts:

  - Although some functions are only used for their side-effects (e.g., load or save an object, create a graph, etc.), most functions are created to (also) return some sort of *output* (e.g., some computed `result`). This can be done by calling the special function `return` in the function `<body>` (e.g., `return(result)`), typically as its last statement.^{66} When the function does not contain an explicit `return` statement, it typically returns the result of the last expression evaluated in the function.

  - The *output* of a function can assume many different data types (scalar, vector, list, table, etc.). For instance, the result of `1:3 + 3:1` is a vector `4, 4, 4`.
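The distinction between mandatory, optional, and `...` arguments can be sketched with a small function. Note that `mean_rounded()` and its argument `digits` are hypothetical names chosen for this illustration; the sketch only relies on the **base** R functions `mean()` and `round()`:

```r
# Hypothetical example (not from the text): a rounded mean.
# - x:      data argument (mandatory, as it has no default)
# - digits: detail argument (optional, with a default of 1)
# - ...:    further arguments that are passed on to mean()
mean_rounded <- function(x, digits = 1, ...) {
  result <- round(mean(x, ...), digits = digits)
  return(result)
}

mean_rounded(c(1, 2, 4))                 # uses the default digits = 1
#> [1] 2.3
mean_rounded(c(1, 2, 4), digits = 2)     # overrides the default
#> [1] 2.33
mean_rounded(c(1, 2, NA), na.rm = TRUE)  # na.rm is passed on to mean() via ...
#> [1] 1.5
```

Here, `na.rm` is not an argument of `mean_rounded()` itself. It only works because `mean()` knows how to handle it, which illustrates both the convenience and the risk of using `...`.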

When taking these considerations into account, a more detailed template of a typical R function is:

```
task_name <- function(arg_1, arg_2 = FALSE) {
  # 1. Check inputs:
  {Verify that arg_1 and arg_2 are appropriate inputs.}
  # 2. Solve task by using arguments:
  {Use arg_1 and arg_2 to solve the task.}
  # 3. Collect and assign outputs:
  result <- {The task solution.}
  # 4. Return output:
  return(result)
}
```

In this template, a new function `task_name()` is being defined.
Its task is solved by accepting two arguments: `arg_1` is a mandatory argument (as no default is provided) and `arg_2` is an optional argument (with a default value of `FALSE`).
In the function body, the arguments are first checked (e.g., whether they have appropriate type and values) and then used to solve the task. The task solution is then assigned to an object `result` and then returned by `return(result)`.

Typical uses of this function `task_name()` could look as follows:

```
# Using this function:
task_name(arg_1 = x) # providing only required argument
task_name(arg_1 = y, arg_2 = FALSE) # providing both arguments
task_name(arg_2 = TRUE, arg_1 = z) # providing both arguments in reverse order
task_name(z, FALSE) # providing arguments by argument order
task_name(z) # providing only the mandatory (1st) argument
```
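To see the four steps of the template filled in with actual code, consider the following minimal sketch. The function `rescale_01()` and its argument `na_rm` are hypothetical and were invented for this illustration; the function maps a numeric vector onto the range from 0 to 1:

```r
# Hypothetical example (not from the text): rescale x to the range [0, 1].
rescale_01 <- function(x, na_rm = FALSE) {
  # 1. Check inputs:
  if (!is.numeric(x)) {
    stop("x must be numeric.")
  }
  # 2. Solve task by using arguments:
  x_min <- min(x, na.rm = na_rm)
  x_max <- max(x, na.rm = na_rm)
  # 3. Collect and assign outputs:
  result <- (x - x_min) / (x_max - x_min)
  # 4. Return output:
  return(result)
}

rescale_01(c(2, 4, 6))                 # providing only the mandatory argument
#> [1] 0.0 0.5 1.0
rescale_01(c(2, NA, 6), na_rm = TRUE)  # providing both arguments
#> [1]  0 NA  1
```

This sketch deliberately stays simple: It does not handle the case in which all values of `x` are identical (which would imply a division by zero).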

We see that, although the function `task_name()` was purely abstract, we can say quite a bit about its functionality by merely knowing its arguments.
To render the interplay of functions and their arguments more concrete, let’s consider some examples of simple functions.

#### A power function

Consider the following definition of a `power()` function:

```
power <- function(x, exp = 1) {
  x^exp
}
```

Note that the function’s definition only includes a single line of code.
Thus, our `power()` function really is just a wrapper for R’s arithmetic operator `^`, but still allows illustrating all key features of a function. We can explicate its definition as follows:

- The function `power()` has two arguments: A mandatory argument `x` and an optional argument `exp` with a default value of `1`.

- The function computes `x^exp` and returns the result (as this is the final statement in the function’s body).

- Thus, the task performed by `power` consists in raising `x` to the `exp`-th power.

Perhaps not quite as obvious are the following three observations about our `power()` function:

- Although the function does not check or verify its inputs, `x` and `exp` are assumed to be numeric. Providing other types of inputs may cause errors.

- Naming the function’s 2nd argument `exp` (for “exponent”) works, but is easily confused with the **base** R function `exp()` for computing exponential values (e.g., `exp(1)` \(= e \approx 2.718\)).

- Given what we know about R, the function probably also works for vector inputs of `x`. However, it is harder to guess what happens if `exp` is a vector, or if both arguments are vectors.

#### Checking a function

It is very important to run a range of checks after writing a new function.
When doing these checks, examples that use unusual inputs (like `NA`, `NULL`, or arguments of different types) are particularly informative.
Here are some possible ways of checking how `power()` works:

```
# Check:
power(x = 3)
#> [1] 3
power(x = 3, exp = 2)
#> [1] 9
# Note that the function also works for vector inputs:
power(x = 1:5, exp = 2)
#> [1] 1 4 9 16 25
power(x = 2, exp = 1:4)
#> [1] 2 4 8 16
# Note what happens when both x and exp are vectors:
power(x = 1:3, exp = 1:3)
#> [1] 1 4 27
power(x = 1:2, exp = 1:4)
#> [1] 1 4 1 16
# Note what happens with NA values:
power(x = NA)
#> [1] NA
power(x = 3, exp = NA)
#> [1] NA
# => NA values are 'contagious'.
```

When creating a new function, it is always a good idea to explore and test its limits. Here are some boundary cases that would result in warning or error messages:

```
# Warning:
power(x = 1:2, exp = 1:3)
# Errors:
power() # no argument(s)
power(x = "A") # x is non-numeric argument
power(x = 3, exp = "B") # exp is non-numeric argument
```
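As the calls above interrupt a script when they fail, it can be handy to capture their messages rather than to signal them. The following sketch uses the **base** R function `tryCatch()` for this purpose. The helper `show_condition()` is a hypothetical name introduced here, and the exact wording of the captured messages may differ between R versions:

```r
power <- function(x, exp = 1) {
  x^exp
}

# Hypothetical helper: Evaluate an expression, but return the message
# of a warning or error (as a character string) rather than signaling it.
show_condition <- function(expr) {
  tryCatch(expr,
           warning = function(w) paste("Warning:", conditionMessage(w)),
           error   = function(e) paste("Error:", conditionMessage(e)))
}

show_condition(power(x = 1:2, exp = 1:3))  # vector lengths do not match
show_condition(power())                    # x is missing
show_condition(power(x = "A"))             # x is non-numeric
show_condition(power(x = 3, exp = 2))      # no problem: returns the result
#> [1] 9
```

This works because R evaluates function arguments lazily: The expression passed as `expr` is only evaluated inside `tryCatch()`, where its warning or error can be intercepted.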

It is not necessarily problematic when functions return warnings or errors — in fact, they can be very informative for understanding functions.
As a function’s author or designer (i.e., programmer), we primarily need to decide whether returning an error is justified, given the intended use of the function. If a user enters no arguments or the wrong type of argument to a function, yielding an error can be the appropriate way of saying: *This is wrong and does not work.*
But good programmers also aim to see their functions from the perspectives of their future users.
Do the names of the function and its arguments clearly signal their purposes? Will users know which types and shapes of data are to be provided as inputs? What else may users want to do with this function?
Many misunderstandings can be avoided by choosing transparent names (for both the function and its arguments) and providing good documentation and examples for a function. And if you anticipate many unconventional uses of a function or its arguments, it may be polite to check user inputs for their shape or type, and issue messages or warnings to the user if something was unexpected, missing, or wrong.
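What such input checks could look like is sketched in the following variant of `power()`. Both the name `power_checked()` and the more transparent argument name `exponent` are assumptions made for this illustration, and stopping with an error is only one of several possible reactions to wrong inputs:

```r
# Hypothetical variant of power() that checks the type of its inputs:
power_checked <- function(x, exponent = 1) {
  # Check inputs (and stop with an informative message):
  if (!is.numeric(x)) {
    stop("x must be numeric, but is of type ", typeof(x), ".")
  }
  if (!is.numeric(exponent)) {
    stop("exponent must be numeric, but is of type ", typeof(exponent), ".")
  }
  # Solve task and return result:
  result <- x^exponent
  return(result)
}

power_checked(x = 3, exponent = 2)
#> [1] 9
power_checked(x = c(1, NA, 3), exponent = 2)  # numeric vectors with NA are ok
#> [1]  1 NA  9

# The following calls yield errors with informative messages:
# power_checked(x = "A")
# power_checked(x = NA)  # a bare NA is of type logical, not numeric
```

Note a side effect of being strict: Whereas `power(x = NA)` returned `NA`, the checked variant rejects a bare `NA`, as this value is of type logical. Whether this is desirable depends on the intended use of the function.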

#### Omitting argument names

In R, it is possible to omit the argument names of functions.
If this is done, the values provided are assigned to the arguments in the *order* in which they are provided:

```
# Omitting argument names:
power(3) # names can be omitted, and
#> [1] 3
power(3, 2) # arguments are used in order given.
#> [1] 9
power(1:5, 2)
#> [1] 1 4 9 16 25
power(2, 1:4)
#> [1] 2 4 8 16
```

Although omitting argument names saves typing, it is typically more informative to explicitly state the arguments. This makes it more transparent to future readers of your code (including yourself) which value is assigned to which argument and has the advantage that you can enter arguments in any order:

```
# When arguments are named:
power(exp = 3, x = 2) # order is irrelevant.
#> [1] 8
# => Recommendation: Always provide argument names!
```

Thus, it is good practice to always provide argument names (as long as you want others to understand your code).
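Named and unnamed arguments can also be mixed in a single call. The following sketch (which repeats the definition of `power()` to be self-contained) illustrates that R first matches the named arguments and then assigns the remaining values to the remaining arguments by position:

```r
power <- function(x, exp = 1) {
  x^exp
}

power(2, exp = 3)  # x is matched by position, exp by name
#> [1] 8
power(exp = 3, 2)  # exp is matched by name first; 2 is then assigned to x
#> [1] 8
power(3, 2)        # by position only: x = 3 and exp = 2
#> [1] 9
```

As the second call shows, mixing both styles works, but it can easily confuse human readers.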

#### Practice

Let’s try defining some first functions:

- Write a new function that computes the `n`-th root of a number (or vector of numbers) `x`, then check it and explore its limits.

**Hint:** The mathematical fact that \(\sqrt[n]{x} = x^{1/n}\) is helpful for solving this task.

- The first function we encountered in this book (in Chapter 1.2.5) was `sum()`. Incidentally, `sum()` is more complex than it first seemed, as its arguments can be a combination of vectors and scalars:

```
sum(1, 2, 3, 4)
#> [1] 10
sum(1, c(2, 3), 4)
#> [1] 10
sum(c(1, 2), c(3, 4))
#> [1] 10
```

We have now learned that values are assigned to function arguments by their position when we omit argument names. Explain why the following calls yield different results:

```
sum(1, 2, 3, NA, na.rm = TRUE)
#> [1] 6
sum(1, 2, 3, NA, TRUE)
#> [1] NA
```

**Hint:** What do `sum(1, 2, 3, NA)` and `sum(TRUE)` evaluate to?

#### Explicit `return()`

A more explicit version of our `power()` function from above could look as follows:

```
power <- function(x, exp = 1) {
  result <- x^exp
  return(result)
}
```

Whereas the shorter version of the function relied on returning its *last* (and only) expression, this version makes it more explicit what is being computed and returned. As functions get larger and more complicated, it is generally a good idea to include explicit `return()` statements. Importantly, a function can include multiple return statements and is exited as soon as a `return()` is reached.
For instance, the following variant would never print its final disclaimer:

```
power_joke <- function(x, exp = 1) {
  result <- x^exp
  return(result)
  "Just kidding"
}
```
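Exiting a function at its first `return()` is not only a source of jokes, but can be put to good use. A common pattern is to return early when an input is unsuitable, so that the remaining code only deals with the regular case. The following sketch uses a hypothetical function `root_or_na()` that was invented for this illustration:

```r
# Hypothetical example (not from the text): an early return() as a guard.
root_or_na <- function(x) {
  if (!is.numeric(x)) {
    return(NA)        # 1st exit: x is not numeric
  }
  result <- sqrt(x)   # only reached for numeric x
  return(result)      # 2nd exit: the regular result
}

root_or_na(16)
#> [1] 4
root_or_na("A")
#> [1] NA
```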

#### Practice

- Test the `power_joke()` function (by evaluating it with various arguments) and try to obtain the `"Just kidding"` line as an output.

#### Solution

The following expressions are suited to explore the `power_joke()` function:

```
# Checks: ------
power_joke(5)
power_joke(5, 2)
power_joke(1/2, 1/2) # non-integers
power_joke(1:5, 2) # vectors 1
power_joke(2, 1:5) # vectors 2
power_joke(1:5, 1:5) # vectors 1+2
power_joke(NA) # missing values
# Errors:
power_joke()
power_joke("A")
# => "Just kidding" is never reached.
```

These tests show that the final expression (i.e., the character string `"Just kidding"`) is never reached.

- In a 2nd step, comment out the line `return(result)` and re-run the same checks.

#### Solution

```
# Commenting out return(result): ------
power_joke <- function(x, exp = 1) {
  result <- x^exp
  # return(result)
  "Just kidding"
}
# Checks:
power_joke(5)
power_joke(5, 2)
power_joke(1/2, 1/2) # non-integers
power_joke(1:5, 2) # vectors 1
power_joke(2, 1:5) # vectors 2
power_joke(1:5, 1:5) # vectors 1+2
power_joke(NA) # missing values
# Errors:
power_joke()
power_joke("A")
```

Without `return(result)`, the function always returns `"Just kidding"`, unless an earlier error occurs.

### 11.2.2 Understanding functions

How can we understand a function?
Even when we are completely clueless about a function, we can always try to understand it by using it with different inputs and seeing what happens. This treats the function as a black box and is exactly what we did to explore the `plot_fn()` and `plot_fun()` functions of **ds4psy** (in Section 1.2.5 and Exercise 1 of Chapter 1).
But having progressed further in our R career, we now dare to look inside the black box and ask:
How does this function transform its inputs into outputs?
Asking and answering such *how* questions promotes a *mechanistic* understanding of a function, which not only provides us with an idea about the function’s *purpose* (or “function”), but also enables us to criticize and improve it.

#### Example

Let’s define a new function `describe()` and try to understand what it does by asking how it transforms its inputs into outputs:

```
describe <- function(v, rm_na = TRUE){
  # (a) Check v:
  if (all(is.na(v))) {return(v)}
  if (all(is.null(v))) {return(v)}
  if (!is.numeric(v)) {
    message("v must be numeric:")
    return(v)
  }
  # (b) Compute some metrics:
  mn <- mean(v, na.rm = rm_na)
  md <- median(v, na.rm = rm_na)
  min <- min(v, na.rm = rm_na)
  max <- max(v, na.rm = rm_na)
  q25 <- quantile(v, .25, na.rm = TRUE)
  q75 <- quantile(v, .75, na.rm = TRUE)
  nr_NA <- sum(is.na(v))
  # (c) Create output vector:
  out <- c(min, q25, md, mn, q75, max, nr_NA)
  names(out) <- c("min", "q25", "md", "mn", "q75", "max", "nr_NA")
  # (d) Return output:
  return(out)
}
```

This example illustrates the difference between *using* and *understanding* a function — and that the definition of a function can get long and complicated. As you can think of a function as a program to tackle a specific task, it is not uncommon for function bodies to stretch over dozens or hundreds of lines of code. The longer and more complicated a function gets, the more difficult it is to *understand* and — from a programmer’s perspective — to write and to debug. For this reason, programmers typically try to structure long and complex functions into smaller parts, which can then be delegated to shorter functions. But understanding a function that calls many other functions then implies that we also need to understand these other functions.

By contrast, *using* a very long and complex function does not need to be difficult. In fact, when calling functions like `mutate()`, `ggplot()`, or `summarise()` we typically do not notice that we implicitly call upon the mighty machinery of the entire **dplyr** and **ggplot2** packages. It is conceivable (either as a spooky dystopia, or as a marvelous feat of ‘artificial intelligence’) that we could simply run some `do_stats()` or `write_paper()` function and let the computer do our job. But as long as other programmers and machine learning have not yet solved these tasks, we need to learn how to use, write, and understand functions to address them.
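The strategy of structuring a longer function into smaller parts that are delegated to shorter functions can be sketched as follows. All three function names (`is_usable()`, `get_spread()`, and `describe_spread()`) are hypothetical and only serve to illustrate the principle:

```r
# Hypothetical helper 1: Is v a non-empty numeric vector?
is_usable <- function(v) {
  is.numeric(v) && length(v) > 0
}

# Hypothetical helper 2: Distance between the smallest and largest value.
get_spread <- function(v, rm_na = TRUE) {
  max(v, na.rm = rm_na) - min(v, na.rm = rm_na)
}

# Main function: Delegates checking and computing to the two helpers.
describe_spread <- function(v, rm_na = TRUE) {
  if (!is_usable(v)) {
    message("v must be a non-empty numeric vector.")
    return(NA)
  }
  result <- get_spread(v, rm_na = rm_na)
  return(result)
}

describe_spread(c(3, 9, 5))
#> [1] 6
describe_spread(c(3, NA, 9))
#> [1] 6
describe_spread("A")  # prints a message and returns NA
#> [1] NA
```

The main function is now short and easy to read, but fully understanding it requires that we also understand both helper functions.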

#### Practice

Before reading on, describe the `describe()` function (defined in the previous code chunk):

- What types of inputs does it take?
- What do its different parts (a) to (d) do?
- What outputs will it return?
- What is the task, goal, or purpose of this function?
- Which calls will yield errors?

Check your predictions by copying and calling the function with various arguments.

#### Solution

Gaining a mechanistic understanding of a function implies that we understand *how* its outputs depend on its inputs.
Eventually, this should also indicate the function’s *purpose*, but first we simply describe what a function does with its inputs.

The `describe()` function could be described as follows:

- The function `describe()` has two arguments: A mandatory argument `v` and an optional argument `rm_na` with a default value of `TRUE`.

- The function first examines its input argument `v`:

  - Does `v` evaluate to `NA` or `NULL`? If so, it simply returns `v` (i.e., `NA` or `NULL`, respectively).

  - Is `v` non-numeric? If so, it prints a message to the user and returns `v`.

- The function then computes seven different statistical measures. This illustrates that functions can do multiple things and typically use other functions to do so. For some of these functions (e.g., `mean`), the `describe` function passes the value of its optional argument `rm_na` to another function’s `na.rm` argument. However, for another function (`quantile`), the `describe` function does not use its `rm_na` argument, but always provides `na.rm = TRUE`.

- The function then creates a numeric vector `out` that includes the 7 computed measures in a specific order and adds names to the vector.

- The function then returns the vector `out`.

- Overall, the task addressed by `describe()` is to provide a range of descriptive statistics of a numeric vector `v`.
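Why does `describe()` treat `quantile()` differently from `mean()`? A plausible reason (which is an assumption here, as the function’s definition does not state it) is that the two **base** R functions react differently to missing values: Given `na.rm = FALSE`, `mean()` returns `NA`, whereas `quantile()` signals an error. The following sketch illustrates this difference:

```r
v <- c(1, 2, 3, NA)

# mean() tolerates missing values, but returns NA:
mean(v, na.rm = FALSE)
#> [1] NA

# quantile() signals an error when v contains NA and na.rm = FALSE
# (the error is caught here, so that the script continues):
res <- tryCatch(quantile(v, .25, na.rm = FALSE),
                error = function(e) "quantile() failed")
res
#> [1] "quantile() failed"

# Given na.rm = TRUE, quantile() ignores the missing value:
quantile(v, .25, na.rm = TRUE)
#> 25%
#> 1.5
```

Thus, always providing `na.rm = TRUE` to `quantile()` prevents `describe()` from failing when `v` contains missing values and `rm_na = FALSE`.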

### 11.2.3 Checking functions

Once we have gained a basic understanding of a function, we can check both the function and our understanding of it by using it with a variety of arguments. Ideally, we should use our understanding to *predict* what happens when calling the function with a specific argument and then use the function to verify or falsify our prediction.

As we saw above, the results of such checks are more informative if you use the function not only with its intended inputs.
Using unusual and probably unintended inputs (like `NA`, `NULL`, or inputs of different data types) will show you the limits of a function. And given the importance of vectors in R, a good question to ask about a new function is: Does this function only work with scalar inputs, or does it also work with vectors?
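One way of turning predictions into checks is to write them down as logical expressions and pass them to the **base** R function `stopifnot()`, which remains silent when all expressions are `TRUE` and signals an error otherwise. The following sketch does this for the `power()` function of Section 11.2.1 (whose definition is repeated here to be self-contained):

```r
power <- function(x, exp = 1) {
  x^exp
}

# Predictions that are verified (stopifnot() remains silent):
stopifnot(
  power(3) == 3,                                        # default exp = 1
  power(3, exp = 2) == 9,                               # scalar inputs
  isTRUE(all.equal(power(1:3, exp = 2), c(1, 4, 9))),   # vector input x
  is.na(power(3, exp = NA))                             # NA is 'contagious'
)

# A prediction that is falsified: NA values are not always 'contagious'.
power(NA, exp = 0)
#> [1] 1
```

The last call shows why checking is worthwhile: In R, raising any number to the power of `0` yields `1`, even if the number is missing.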

#### Example

Here are some possible ways of checking how the `describe()` function works:

```
# Check:
v <- 1:10
describe(v)
#> min q25 md mn q75 max nr_NA
#> 1.00 3.25 5.50 5.50 7.75 10.00 0.00
describe(c(NA, v, NA))
#> min q25 md mn q75 max nr_NA
#> 1.00 3.25 5.50 5.50 7.75 10.00 2.00
describe(v, rm_na = FALSE)
#> min q25 md mn q75 max nr_NA
#> 1.00 3.25 5.50 5.50 7.75 10.00 0.00
describe(c(v, NA), rm_na = FALSE)
#> min q25 md mn q75 max nr_NA
#> NA 3.25 NA NA 7.75 NA 1.00
# Note:
describe(NA)
#> [1] NA
describe(NULL)
#> NULL
describe(c(NA, NA, NA))
#> [1] NA NA NA
describe("A")
#> [1] "A"
describe(tibble::tibble(v = 1:10))
#> # A tibble: 10 × 1
#> v
#> <int>
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
#> 6 6
#> 7 7
#> 8 8
#> 9 9
#> 10 10
# Note: The following calls yield errors:
# describe()
# describe(x)
```

Actually, `describe()` is, apart from subtle differences, quite similar to the **base** R function `summary()`:

```
# Compare with base::summary function:
summary(v)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.00 3.25 5.50 5.50 7.75 10.00
summary(c(NA, v, NA))
#> Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
#> 1.00 3.25 5.50 5.50 7.75 10.00 2
# But note differences:
summary(NA)
#> Mode NA's
#> logical 1
summary(NULL)
#> Length Class Mode
#> 0 NULL NULL
summary("A")
#> Length Class Mode
#> 1 character character
summary(tibble::tibble(v = 1:10))
#> v
#> Min. : 1.00
#> 1st Qu.: 3.25
#> Median : 5.50
#> Mean : 5.50
#> 3rd Qu.: 7.75
#> Max. :10.00
# Error for:
# summary()
```

#### Practice

- Predict the result of `describe(c(NULL, NA))`. Then evaluate the expression and explain its result.

```
describe(c(NULL, NA))
# Hint: Check the result of
c(NULL, NA)
```

### 11.2.4 Issues of style

Creating new functions only makes sense when someone can understand and use them. Hence, writing a new function always needs to take into account the viewpoint of its users, even if those will mostly be our future selves.

In any art or craft, issues of style are important, partially a matter of taste, and largely a matter of practice and experience. Just as the work of architects, designers, and authors tends to mature in an exchange with colleagues, and with more time and multiple revisions, computer programming tends to benefit from feedback and well-organized teams. But even for an individual programmer, writing good functions is a journey, rather than a destination, and a life-long aspiration.

The primary goal of programming new functions is providing some required functionality in a clear and transparent fashion. Here are some general guidelines and questions that help to achieve this goal:

- *Task*/*goal*/*purpose*: Be aware of the *task*, *goal*, and *purpose* of a function. Explicating these “functional” aspects may involve answering a range of related questions:

  - *Task*: What *task* does the function perform?
  - *Mechanism*: *How* does the function solve this task?
  - *Audience*: *For whom* does the function solve this task?

- *Arguments*: Consider the *requirements* of a function:

  - What does the function *need* to achieve its goal?
  - Of what shape or type are the objects that the function uses as its *inputs*?
  - Which arguments are *necessary*, which ones are *optional*?

- *Result*: Consider the *output* of a function:

  - What should the function *return*?
  - Of what shape or type are the objects that the function provides as its *outputs*?
  - Are there any *side-effects* to consider?

- *Naming*: Make sure that you choose good *names* for both a function and its arguments.

  - Do the names clearly convey the *task*, *goal* and *purpose* of the function and its arguments?
  - Are all names *succinct* and can easily be typed?
  - Do all names correspond to the names of *related functions*?

- *Formatting*: Code each function so that its internal *structure* is clear and it is easy to read and understand (i.e., use blank lines between different parts and automatic indentation).

- *Documentation*: Provide clear *comments* that explain *what* the function does, *why* it is doing it in a particular way, and anything else that may be difficult to understand.
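The guidelines on naming, formatting, and documentation can be illustrated by a short sketch. The function `prop_na()` and its argument `digits` are hypothetical and were invented for this illustration; the layout of the comments is only one of many possible conventions:

```r
# prop_na(): Compute the proportion of missing (NA) values in a vector.
#
# Arguments:
# - v:      A vector (mandatory).
# - digits: Number of decimal places of the result (optional, default of 2).
#
# Returns: A number from 0 to 1, or NA if v is empty.

prop_na <- function(v, digits = 2) {

  # 1. Check inputs:
  if (length(v) == 0) {
    message("v is empty.")
    return(NA)
  }

  # 2. Compute proportion:
  # (The mean of a logical vector is its proportion of TRUE values.)
  result <- round(mean(is.na(v)), digits = digits)

  # 3. Return output:
  return(result)
}

prop_na(c(1, NA, 3, NA))
#> [1] 0.5
prop_na(c("a", "b", NA))
#> [1] 0.33
```

Here, the function name states what is being computed, the comments explain a step that may not be obvious (taking the mean of a logical vector), and blank lines separate the different parts of the function.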

Ultimately, programming functions always involves a considerable portion of psychology: Who is the audience of your function? Try to anticipate what needs, preferences, and wishes the users of your functions will have. Will people be able to understand and use your function? How robust is your function when users provide different data types or may misinterpret its purpose or scope? Although you probably are the primary user of your new function, anticipating possible misconceptions and responding to user feedback are important aspects of ensuring that it will remain useful in the future.

^{66} In most functions, the `return()` statement is the final statement of the function `<body>`. See Chapter 19.6: Return values for special cases in which it makes sense to provide earlier and multiple `return` statements or provide `invisible` return values (e.g., to write pipeable functions).