Chapter 4 Functions in R

4.1 Functions and their arguments

4.1.1 What are functions?

A key feature of R is functions. Functions are self-contained modules of code that accomplish a specific task. Functions usually take in some sort of data structure (value, vector, dataframe, etc.) as arguments, process it, and then return a result.

The general usage for a function is the name of the function followed by parentheses:

function_name(input)

The input(s) are called arguments, which can include:

the physical object (any data structure) on which the function carries out a task
specifications that alter the way the function operates (e.g. options)

Most functions can take several arguments. If you don't specify a required argument when calling the function, you will receive an error unless the function has set a default value for the argument.
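As a small illustration of required versus default arguments, the sketch below uses only base R functions. round() declares a default for its digits argument, while sqrt() has no default for its input. The tryCatch() wrapper is an addition for this sketch only, so that the error is captured rather than stopping the script.

```r
# round() has a default for `digits`, so supplying only the number works
rounded <- round(2.718)
print(rounded)  # 3

# sqrt() has no default for its input, so calling it with nothing is an error.
# tryCatch() captures the error message here instead of halting the script.
msg <- tryCatch(sqrt(), error = function(e) conditionMessage(e))
print(msg)
```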

4.1.2 Basic functions

We have already used a few basic functions in the previous lessons, e.g. c() and factor(). These functions are available as part of R's built-in capabilities, and we will explore a few more of these base functions below.

Many of the base functions in R involve mathematical operations. One example would be the function sqrt(). The input/argument must be a number, and the output is the square root of that number. Let's try finding the square root of 81:

sqrt(81)

Now what would happen if we called the function (i.e. ran the function) on a vector of values instead of a single value?

sqrt(glengths)

In this case the task was performed on each individual value of the vector glengths and the respective results were displayed.
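The same element-by-element behavior can be seen with a small stand-in vector. The name example_lengths and its values are made up for this sketch and are not the glengths vector from the earlier lessons.

```r
# A hypothetical numeric vector (not the glengths vector from the lessons)
example_lengths <- c(4, 9, 16)

# sqrt() is applied to each element and returns a vector of the same length
roots <- sqrt(example_lengths)
print(roots)  # 2 3 4
```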

Let's try another function, this time one whose options (arguments that change the behavior of the function) we can change, for example round():

round(3.14159)

We can see that we get 3. That's because the default is to round to the nearest whole number. What if we want to keep a specific number of decimal places? Let's first learn how to find the available arguments for a function.

4.1.3 Seeking help on arguments for functions

The best way of finding out this information is to type ? followed by the name of the function. Doing this will open up the help manual in the bottom right panel of RStudio, which provides a description of the function, its usage, arguments, details, and examples:

?round

Alternatively, if you are familiar with the function but just need to remind yourself of the names of the arguments, you can use:

args(round)

Even more useful is the example() function. This will allow you to run the examples section from the Online Help to see exactly how it works when executing the commands. Let's try that for round():

example("round")

In our example, we can change the number of decimal places returned by adding an argument. We can type digits=2 or however many we want:

round(3.14159, digits=2)
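Note that the digits argument of round() counts places after the decimal point. Base R has a separate function, signif(), that counts significant digits instead. The sketch below contrasts the two. The variable names are chosen only for illustration.

```r
# digits can be supplied by name or by position
by_name     <- round(3.14159, digits = 2)
by_position <- round(3.14159, 2)
print(by_name)      # 3.14
print(by_position)  # 3.14

# round() keeps decimal places; signif() keeps significant digits
rounded_places <- round(1234.5678, digits = 2)
rounded_signif <- signif(1234.5678, digits = 2)
print(rounded_places)  # 1234.57
print(rounded_signif)  # 1200
```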

Exercise

1. Use a base R function to calculate the mean value of the glengths vector. You might need to search online to find which function can perform this task.
2. Create a new vector test <- c(1, NA, 2, 3, NA, 4). Use the same base R function from exercise 1 (with the addition of the proper argument) to calculate the mean value of the test vector. The output should be 2.5.

NOTE: In R, missing values are represented by the symbol NA (not available). It's a way to make sure that users know they have missing data and make a conscious decision on how to deal with it. There are ways to ignore NA during statistical calculations, or to remove NA from the vector.
3. Another commonly used base function is sort(). Use this function to sort the glengths vector in descending order.
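The two approaches to NA mentioned in the NOTE can be sketched with sum(), a different base function from the one the exercises ask for. The vector readings and its values are hypothetical.

```r
# A hypothetical vector containing one missing value
readings <- c(10, NA, 30)

# By default, a missing value makes the result missing too
total_with_na <- sum(readings)
print(total_with_na)  # NA

# Option 1: ask the function to ignore NA during the calculation
total_ignoring_na <- sum(readings, na.rm = TRUE)
print(total_ignoring_na)  # 40

# Option 2: remove NA from the vector first, using is.na()
complete_readings <- readings[!is.na(readings)]
print(complete_readings)  # 10 30
```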

4.1.4 User-defined Functions

One of the great strengths of R is the user's ability to add functions. Sometimes there is a small task (or series of tasks) you need done and you find yourself having to repeat it multiple times. In these types of situations, it can be helpful to create your own custom function. The structure of a function is given below:

name_of_function <- function(argument1, argument2) {
    # statements or code that does something
    return(something)
}

First you give your function a name.
Then you assign a value to it, where the value is the function.

When defining the function you will want to provide the list of arguments required (inputs and/or options to modify behaviour of the function), and wrapped between curly brackets place the tasks that are being executed on/using those arguments. The argument(s) can be any type of object (like a scalar, a matrix, a dataframe, a vector, a logical, etc), and it’s not necessary to define what it is in any way.

Finally, you can “return” the value of an object from the function, meaning that its value is passed back to the code that called the function (for example, the global environment when you call the function from the console). The important idea behind functions is that objects created within the function are local to the environment of the function: they don’t exist outside of the function.
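To illustrate the local nature of objects created inside a function, here is a minimal sketch. The function add_one and the object result_inside are hypothetical names, and the final check assumes that no object called result_inside already exists in your workspace.

```r
# A hypothetical function that creates an object inside its body
add_one <- function(x) {
    result_inside <- x + 1   # exists only while the function is running
    return(result_inside)    # only the value is handed back
}

returned_value <- add_one(2)
print(returned_value)  # 3

# The object created inside the function is not visible out here
print(exists("result_inside"))  # FALSE
```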

Let's try creating a simple example function. This function will take in a numeric value as input, and return the squared value.

square_it <- function(x) {
    square <- x * x
    return(square)
}

Once you run the code, you should see a function named square_it in the Environment panel (located at the top right of the RStudio interface). Now, we can use this function like any base R function. We type out the name of the function, and inside the parentheses we provide a numeric value x:

square_it(5)
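The returned value can also be stored in a variable, and because x * x works element by element, the same function accepts a numeric vector. The sketch below repeats the definition so that it runs on its own. The variable names squared_value and squared_vector are illustrative.

```r
# Same definition as above, repeated so this block is self-contained
square_it <- function(x) {
    square <- x * x
    return(square)
}

# Store the returned value instead of just printing it
squared_value <- square_it(5)
print(squared_value)  # 25

# A vector input is squared element by element
squared_vector <- square_it(c(1, 2, 3))
print(squared_vector)  # 1 4 9
```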

Pretty simple, right? In this case, the function body only performed a single calculation, but in theory you could have many lines of code to obtain the final result that you want to "return" to the user.
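As a sketch of a function with several steps, the hypothetical center_and_scale() below subtracts the mean, divides by the standard deviation, and rounds the result. Its name, its steps, and its default of digits = 2 are assumptions made for illustration, not part of this lesson's materials.

```r
# A hypothetical multi-step function with a default value for `digits`
center_and_scale <- function(x, digits = 2) {
    centered <- x - mean(x)        # step 1: subtract the mean
    scaled <- centered / sd(x)     # step 2: divide by the standard deviation
    return(round(scaled, digits = digits))  # step 3: round and return
}

# Uses the default digits = 2
scaled_default <- center_and_scale(c(1, 2, 3))
print(scaled_default)  # -1 0 1

# Overrides the default by naming the argument
scaled_one_digit <- center_and_scale(c(2, 4, 9), digits = 1)
print(scaled_one_digit)
```

Only the final rounded vector is returned. The intermediate objects centered and scaled stay inside the function.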

We have only scratched the surface here when it comes to creating functions! If you are interested, you can also find more detailed information on writing functions on the R-bloggers site.

Exercise

Write a function called multiply_it, which takes two inputs: a numeric value x, and a numeric value y. The function will return the product of these two numeric values, which is x * y. For example, multiply_it(x=4, y=6) will return output 24.