Section 5 Functions
So far, we’ve used a few built-in tools like mean(). If you’re coming from a language like SPSS you might think of these tools as “commands”, but in R we call them functions. Functions are basically reusable chunks of code that take a certain set of inputs (also called arguments) and either produce an output (a return value) or just do a task like showing a plot. When you plug a specific set of inputs into the function and “run” it, we say that you’re calling the function.
The mean() function in R can take a vector of numbers as an input and return a single number as an output:
## [1] 4.833333
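As a sketch of such a call, here is mean() applied to a made-up vector of six numbers. The values and the name `values` are hypothetical, chosen only for illustration:

```r
# Hypothetical data: six made-up numbers
values = c(1, 3, 5, 6, 6, 8)

# mean() takes the whole vector as input and returns a single number
mean(values)
## [1] 4.833333
```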
5.1 Arguments
The arguments of a function are the set of inputs it accepts. Some of the inputs will be used to calculate the output, while some might be different options that affect how the calculation happens.
If we look at the arguments for the default mean() function in R, accessed by entering ?mean in the console, we see:
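The Usage section of the help page lists the arguments of the default method. As a quick sketch, the same argument list can also be printed directly in the console with args():

```r
# Print the argument list of the default mean() method
args(mean.default)
## function (x, trim = 0, na.rm = FALSE, ...) 
## NULL
```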
Since the first argument x appears on its own, it’s a mandatory argument. You have to provide a value for x, otherwise you get an error:
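A minimal sketch of what happens when x is left out. The call is wrapped in tryCatch() here only so that a script keeps running and the message can be captured; typing mean() on its own in the console stops with the same error. The name `result` is just for illustration:

```r
# Calling mean() with no arguments fails because x has no default value
result = tryCatch(
  mean(),
  error = function(e) conditionMessage(e)
)

# Show the captured error message
cat(result, "\n")
## argument "x" is missing, with no default
```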
Arguments like trim = 0 are optional when you’re calling the function: the value after the = is the default value that will be used if you don’t supply one. The default value usually indicates what type of input that argument accepts (numeric, logical, character, etc.), but it’s also good to read the information on the function’s help page for more detail.
## [1] 23.5
## [1] 23.5
## [1] 23.3125
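Output like this can come from calling mean() with and without the optional trim argument. The sketch below uses ten made-up values (hypothetical, chosen for illustration): leaving trim out and setting it to its default of 0 give the same result, while trim = 0.1 drops the lowest and highest 10% of values before averaging:

```r
# Hypothetical data: ten made-up measurements
x = c(8.5, 15, 18, 20, 22, 24, 26, 29, 32.5, 40)

# trim not supplied, so the default (trim = 0) is used
mean(x)
## [1] 23.5

# Supplying the default value explicitly changes nothing
mean(x, trim = 0)
## [1] 23.5

# Drop the lowest and highest 10% of values (one from each end here)
mean(x, trim = 0.1)
## [1] 23.3125
```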
5.1.1 Positional or Named Arguments
If you provide arguments for a function without specifying an argument name, they will be used in order, left to right, i.e. by position. R has a plot(x, y) function, and if you call it with plot(1:5, 10:6), 1:5 will be matched up with x and 10:6 will be matched up with y. Most functions take a vector or dataframe as their first argument, so you often end up passing the first argument by position.
If you specify the argument name, you are passing the argument by name. It doesn’t matter which order you pass named arguments in, and you can skip over any arguments you don’t want to provide. Because the order doesn’t matter, both of these are equivalent to plot(1:5, 10:6):
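For example, two such calls might look like this (a sketch, using the argument names x and y from plot(x, y)):

```r
# Named arguments in the same order as the function definition
plot(x = 1:5, y = 10:6)

# Named arguments in the opposite order: the plot is the same,
# because each value is matched to its argument by name
plot(y = 10:6, x = 1:5)
```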
You can provide arguments by both position and name. Usually, you provide one or two positional arguments first, and then as many named arguments as you need:
# First two arguments by position, the rest named
plot(1:5, 10:6, type = "l", main = "A line plot")
If in doubt, just name all the arguments.
5.2 Writing your own functions
You can write your own functions in R. In fact, there’s no real difference between functions you write yourself and those that are built-in or provided by other packages. Generally, using existing functions saves a lot of time (and debugging), but there’s nothing stopping you from creating your own tools if the existing ones don’t quite give you what you need.
The basic template for creating a function is:
# Just a template, not runnable code
function(arg, optional_arg = TRUE) {
    result = arg + 3
    if (optional_arg) {
        result = result + 1
    }
    return(result)
}
We set out what arguments are going to be used within the parentheses () after function. Then, within the braces {} (called the body of the function), we write code to transform the inputs into whatever form we need. return(result) specifies that we want the output of the function to be the value of the result variable.
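To make the template callable, it can be assigned to a name. The sketch below uses the hypothetical name my_function (not from the original) and calls it with and without the optional argument:

```r
# my_function is a hypothetical name given to the template above
my_function = function(arg, optional_arg = TRUE) {
    result = arg + 3
    if (optional_arg) {
        result = result + 1
    }
    return(result)
}

# optional_arg not supplied, so its default (TRUE) is used: 1 + 3 + 1
my_function(1)
## [1] 5

# Overriding the default skips the extra step: 1 + 3
my_function(1, optional_arg = FALSE)
## [1] 4
```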
A simple example of a useful function you might write yourself is:
# It's good to write a quick comment explaining what the
# function does and what inputs it needs:
# get_percent: return the proportion of TRUE values as a percentage
# Arguments:
#   x: a logical vector
get_percent = function(x) {
    percent = sum(x) / length(x) * 100
    return(percent)
}
random_scores = sample(1:50, size = 20)
get_percent(random_scores > 20)
## [1] 35
The advantages of writing functions are:
- If you’re doing the same thing multiple times on different pieces of data, you can create a function and stop writing the same code over and over.
- A function with a sensible name can make it clearer what is happening in the code: e.g. get_percent(score) vs sum(score) / length(score).
- You can share functions you’ve written with other people. They’re flexible by nature, so they’re not tied to your specific data.
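As a small sketch of the first point, the same get_percent() function can be reused on different pieces of data without rewriting the calculation. The two score vectors below are made up for illustration:

```r
# get_percent: return the proportion of TRUE values as a percentage
get_percent = function(x) {
    percent = sum(x) / length(x) * 100
    return(percent)
}

# Hypothetical data: scores from two made-up groups
scores_a = c(12, 25, 31, 8, 40)
scores_b = c(5, 18, 22, 47)

# One function, two datasets: 3 of 5 scores in group A are above 20
get_percent(scores_a > 20)
## [1] 60

# 2 of 4 scores in group B are above 20
get_percent(scores_b > 20)
## [1] 50
```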