16 Write your own functions

16.1 Objectives

Understand and describe when and why a function is of value
Understand the components of a function

function() function, what’s your function?

16.2 Introduction to functions

Functions are an important part of programming in R. You have already used many functions—now it’s time to write your own.

Functions have particular value in making your code DRY (don’t repeat yourself).⁸ If you need to do the same thing over and over again, whether in a single script or to create a report with the same calculations when next month’s data arrives, you can either copy-and-paste with some find-and-replace, or take the time to write a generalized function.

Using functions has the following benefits:

there are fewer opportunities for error (say, you miss a replacement)
your code will be better organized and easier to read

There are three parts to any function:

function name
inputs, which are called “arguments”
code statement(s) that do something with/to the argument(s)

In R, the function to create a function is function().

The code looks like this:

my.function.name <- function(my_argument_1, my_argument_2, ...) {
  code_statement
}
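
As a minimal sketch of those three parts, here is a tiny function. The name `add_two` and its arguments `x` and `y` are made up for illustration. Base R’s `formals()` and `body()` let us look at the arguments and the code statement separately.

```r
# function name: add_two (a hypothetical example)
# arguments: x and y
# code statement: x + y
add_two <- function(x, y) {
  x + y
}

# call the function with two values
add_two(2, 3)

# inspect the parts of the function
names(formals(add_two))  # the argument names
body(add_two)            # the code statement(s)
```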

16.3 A basic example

(from “Writing your own Functions” in Hands-on Programming with R, by Garrett Grolemund)

Let’s say you want to simulate rolling 2 six-sided dice. Here’s some code that does that:

# define the possible outcomes
die <- 1:6
# a random pair of numbers from the `die` object
dice <- sample(die, size = 2, replace = TRUE)
# then add them up
sum(dice)

## [1] 10

To do this more than once, you would need to write the last two lines of code again and again:

dice <- sample(die, size = 2, replace = TRUE)
sum(dice)

## [1] 7

dice <- sample(die, size = 2, replace = TRUE)
sum(dice)

## [1] 8

dice <- sample(die, size = 2, replace = TRUE)
sum(dice)

## [1] 6

Or you could write a function.

In this case, our code_statement is the three lines of code:

roll <- function() {
  die <- 1:6
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}

Note that when we run this chunk, the function gets stored in our environment, and appears under Functions in RStudio’s Environment pane.
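
One way to confirm from the console that the function is stored is sketched below, using base R only. The definition of `roll()` is repeated so that the chunk runs on its own.

```r
roll <- function() {
  die <- 1:6
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}

# is there an object called "roll"?
exists("roll")

# what kind of object is it?
class(roll)

# typing the name without parentheses prints the function's code
# instead of running it
roll
```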

To run the function we just need the following:

roll()

## [1] 11

# and if we want to assign the result of the function to an object:
my_roll <- roll()

And then if we want to get the distribution of values from rolling a pair of dice 100 times, we can use the replicate() function:

replicate(100, roll())

##   [1]  7 10  7  6  8  7  7  9  9  7  4  8  7  9  5 10  4  3  6  4 12  3  3  6  8  4  4  7  7  8  7
##  [32]  4  5  8  9  9  5  5  9  7  8  7  5 10  7  6  5  7  5 11  7  9  6 11  5  4  8  8  8  5  7  3
##  [63]  7  9  4  3 11 11  2  7 11  6  9  5  5  6  8  7  6 12  9 10  5  5 10  7  7  7  8  9  5  6  5
##  [94] 11  9 10 11  8  6  4
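
A long vector of rolls is hard to read as a distribution. One possible next step, sketched here with base R’s `table()`, is to count how often each total came up. The seed value is arbitrary and is set only so the example is reproducible; `rolls` is a name chosen for this sketch.

```r
roll <- function() {
  die <- 1:6
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}

# an arbitrary seed, so the same "random" rolls come up each time
set.seed(42)

rolls <- replicate(100, roll())

# count how many times each total came up;
# totals near 7 tend to be the most common, 2 and 12 the rarest
table(rolls)
```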

16.4 A more complex example

Let’s write an equation that calculates the area of a circle that has a radius of 2.5 units.

You’ll remember that the formula is “pi r squared” where r is the radius of the circle.

$A = \pi r^{2}$

When we translate that formula into code, it looks like this, where ca is “circle area”:

ca <- pi * 2.5 ^ 2

ca

## [1] 19.63495

But what if we want to calculate the area of many circles of different size? Time for a function.

We call the function with circle_area(), and pass it the radius value r. The function calculates ca and then returns its value.

circle_area <- function(r) {
  ca <- pi * r ^ 2
  return(ca)
}

Let’s test it with our circle of radius 2.5:

circle_area(r = 2.5)

## [1] 19.63495
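
Because `*` and `^` work element by element in R, `circle_area()` also accepts a whole vector of radii. The short sketch below uses three arbitrary radii; `radii` and `areas` are names chosen for this example.

```r
circle_area <- function(r) {
  ca <- pi * r ^ 2
  return(ca)
}

# three arbitrary radii passed as one vector
radii <- c(1, 2.5, 10)

# returns one area per radius, in the same order
areas <- circle_area(radii)
areas
```

This is what lets the function work on an entire column of radii at once.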

Now we can use that function to calculate any number of circle areas. Let’s make a tibble with 10 circles, where circle_r is the radius of each circle.

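# `tibble()`, `mutate()` and `%>%` are available once the tidyverse is loaded
library(tidyverse)

# the `set.seed()` function ensures that any pseudo-random number generation
# always returns the same values (`seq()` below is not random, so the seed
# only matters for random draws such as `runif()`)
set.seed(42)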

my_circles <- tibble(
  circle_name = letters[1:10],   # the letters from a to j
  circle_r = seq(1, 19, 2))      # a sequence of numbers from 1 to 19, by 2s

my_circles

## # A tibble: 10 × 2
##    circle_name circle_r
##    <chr>          <dbl>
##  1 a                  1
##  2 b                  3
##  3 c                  5
##  4 d                  7
##  5 e                  9
##  6 f                 11
##  7 g                 13
##  8 h                 15
##  9 i                 17
## 10 j                 19

What is the area of those circles? We can apply our function:

my_circles %>%
  mutate(circle_a = circle_area(r = circle_r))

## # A tibble: 10 × 3
##    circle_name circle_r circle_a
##    <chr>          <dbl>    <dbl>
##  1 a                  1     3.14
##  2 b                  3    28.3 
##  3 c                  5    78.5 
##  4 d                  7   154.  
##  5 e                  9   254.  
##  6 f                 11   380.  
##  7 g                 13   531.  
##  8 h                 15   707.  
##  9 i                 17   908.  
## 10 j                 19  1134.

That’s good. But what if we want our result to be rounded to 2 decimal places?

We could rewrite our function to add that. Note that in the second line of the function’s body, we are rounding ca before returning it.

circle_area <- function(r) {
  ca <- pi * r ^ 2
  ca <- round(ca, 2)
  return(ca)
}

Now re-run the mutate():

my_circles %>%
  mutate(circle_a = circle_area(r = circle_r))

## # A tibble: 10 × 3
##    circle_name circle_r circle_a
##    <chr>          <dbl>    <dbl>
##  1 a                  1     3.14
##  2 b                  3    28.3 
##  3 c                  5    78.5 
##  4 d                  7   154.  
##  5 e                  9   254.  
##  6 f                 11   380.  
##  7 g                 13   531.  
##  8 h                 15   707.  
##  9 i                 17   908.  
## 10 j                 19  1134.
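
The printed tibble looks the same as before because tibbles display about three significant digits by default; the stored values are in fact rounded. A quick check outside the tibble, sketched with base R and the first three radii, makes this visible:

```r
circle_area <- function(r) {
  ca <- pi * r ^ 2
  ca <- round(ca, 2)
  return(ca)
}

# the first three radii from my_circles;
# the areas come back as 3.14, 28.27 and 78.54
circle_area(c(1, 3, 5))
```

Inside a pipeline, `pull(circle_a)` from dplyr would return the column as a plain vector for the same kind of check.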

What if we want to have some flexibility in our rounding? We can add a second argument, rnd, to our function, which is then passed to round():

circle_area <- function(r, rnd) {
  ca <- pi * r ^ 2
  ca <- round(ca, rnd)
  return(ca)
}

And rerun with the area to 4 decimals:

my_circles %>%
  mutate(circle_a = circle_area(r = circle_r, rnd = 4))

## # A tibble: 10 × 3
##    circle_name circle_r circle_a
##    <chr>          <dbl>    <dbl>
##  1 a                  1     3.14
##  2 b                  3    28.3 
##  3 c                  5    78.5 
##  4 d                  7   154.  
##  5 e                  9   254.  
##  6 f                 11   380.  
##  7 g                 13   531.  
##  8 h                 15   707.  
##  9 i                 17   908.  
## 10 j                 19  1134.
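
To see the effect of `rnd` directly, outside the tibble print-out, here is a small sketch that calls the function on a single radius with different values of `rnd`. The radius of 2.5 is the one used earlier in the chapter.

```r
circle_area <- function(r, rnd) {
  ca <- pi * r ^ 2
  ca <- round(ca, rnd)
  return(ca)
}

# the unrounded area is roughly 19.634954
circle_area(r = 2.5, rnd = 0)  # nearest integer: 20
circle_area(r = 2.5, rnd = 2)  # two decimals: 19.63
circle_area(r = 2.5, rnd = 4)  # four decimals: 19.635 (the trailing zero is dropped)
```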

16.4.1 Setting default values

What if we usually want to round to 2 decimal places, but still want some flexibility? We can assign the default value of 2 to the rounding argument. If rnd isn’t specified, it will be 2, but we can override that by assigning a specific value.

circle_area <- function(r, rnd = 2) {
  ca <- pi * r ^ 2
  ca <- round(ca, rnd)
  return(ca)
}

With no specification, note that we don’t have to be explicit in telling the function how to use the variable circle_r. Because it is the first argument passed, R matches it to r. And because we don’t pass a second argument, the function uses the default value we’ve specified, in this case 2.

my_circles %>%
  mutate(circle_a = circle_area(circle_r))

## # A tibble: 10 × 3
##    circle_name circle_r circle_a
##    <chr>          <dbl>    <dbl>
##  1 a                  1     3.14
##  2 b                  3    28.3 
##  3 c                  5    78.5 
##  4 d                  7   154.  
##  5 e                  9   254.  
##  6 f                 11   380.  
##  7 g                 13   531.  
##  8 h                 15   707.  
##  9 i                 17   908.  
## 10 j                 19  1134.

Or override with rounding to 3 decimals:

my_circles %>%
  mutate(circle_a = circle_area(circle_r, 3))

## # A tibble: 10 × 3
##    circle_name circle_r circle_a
##    <chr>          <dbl>    <dbl>
##  1 a                  1     3.14
##  2 b                  3    28.3 
##  3 c                  5    78.5 
##  4 d                  7   154.  
##  5 e                  9   254.  
##  6 f                 11   380.  
##  7 g                 13   531.  
##  8 h                 15   707.  
##  9 i                 17   908.  
## 10 j                 19  1134.
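
The calls above rely on matching arguments by position. As a brief sketch of the alternative, arguments can also be matched by name, in which case their order no longer matters. The names `by_position` and `by_name` are chosen for this example.

```r
circle_area <- function(r, rnd = 2) {
  ca <- pi * r ^ 2
  ca <- round(ca, rnd)
  return(ca)
}

# by position: the first value is r, the second is rnd
by_position <- circle_area(2.5, 3)

# by name: the order no longer matters
by_name <- circle_area(rnd = 3, r = 2.5)

# both calls give the same result
identical(by_position, by_name)

# leaving rnd out falls back to the default of 2
circle_area(2.5)
```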

16.4.1.1 Your turn

Write a function that calculates the area of a triangle, using the formula

$A = hb/2$ where “h” is the height of the triangle, and “b” is the base.

Now, write and test a version of the function in which the area is automatically rounded to 1 decimal place.

Solution

The function for a circle is pasted below:

# example
circle_area <- function(r, rnd = 2) {
  ca <- pi * r ^ 2
  ca <- round(ca, rnd)
  return(ca)
}

Now let’s edit the code above so that it works with the formula for a triangle:

# solution
tri_area <- function(h, b, rnd = 1) {
  ta <- (h * b) / 2
  ta <- round(ta, rnd)
  return(ta)
}
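
Before applying the function to a whole table, it can help to test it on values where the answer is easy to work out by hand. The heights and bases below are arbitrary numbers picked for this sketch.

```r
tri_area <- function(h, b, rnd = 1) {
  ta <- (h * b) / 2
  ta <- round(ta, rnd)
  return(ta)
}

# height 3, base 4: (3 * 4) / 2 is 6
tri_area(h = 3, b = 4)

# height 2.5, base 3.3: (2.5 * 3.3) / 2 is 4.125, which rounds to 4.1
tri_area(h = 2.5, b = 3.3)
```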

Now, generate a tibble with the name my_triangles:

set.seed(42)

my_triangles <- tibble(
  tri_name = letters[1:10], 
  tri_h = round(runif(10, min = 1, max = 20), 2),
  tri_b = round(runif(10, min = 1, max = 20), 2))

my_triangles

## # A tibble: 10 × 3
##    tri_name tri_h tri_b
##    <chr>    <dbl> <dbl>
##  1 a        18.4   9.7 
##  2 b        18.8  14.7 
##  3 c         6.44 18.8 
##  4 d        16.8   5.85
##  5 e        13.2   9.78
##  6 f        10.9  18.9 
##  7 g        15    19.6 
##  8 h         3.56  3.23
##  9 i        13.5  10.0 
## 10 j        14.4  11.6

And use the function to calculate the area of all the triangles in the tibble:

my_triangles %>% 
  mutate(tri_a = tri_area(tri_h, tri_b))

## # A tibble: 10 × 4
##    tri_name tri_h tri_b tri_a
##    <chr>    <dbl> <dbl> <dbl>
##  1 a        18.4   9.7   89.1
##  2 b        18.8  14.7  138. 
##  3 c         6.44 18.8   60.4
##  4 d        16.8   5.85  49.1
##  5 e        13.2   9.78  64.5
##  6 f        10.9  18.9  102. 
##  7 g        15    19.6  147. 
##  8 h         3.56  3.23   5.7
##  9 i        13.5  10.0   67.5
## 10 j        14.4  11.6   83.9

Now round to 2 decimal places:

my_triangles %>% 
  mutate(tri_a = tri_area(tri_h, tri_b, rnd = 2))

## # A tibble: 10 × 4
##    tri_name tri_h tri_b  tri_a
##    <chr>    <dbl> <dbl>  <dbl>
##  1 a        18.4   9.7   89.1 
##  2 b        18.8  14.7  138.  
##  3 c         6.44 18.8   60.4 
##  4 d        16.8   5.85  49.1 
##  5 e        13.2   9.78  64.5 
##  6 f        10.9  18.9  102.  
##  7 g        15    19.6  147.  
##  8 h         3.56  3.23   5.75
##  9 i        13.5  10.0   67.5 
## 10 j        14.4  11.6   83.9
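
The `rnd` argument also accepts 0, which rounds to the nearest integer. A short sketch with one arbitrary triangle:

```r
tri_area <- function(h, b, rnd = 1) {
  ta <- (h * b) / 2
  ta <- round(ta, rnd)
  return(ta)
}

# height 3.3, base 4.1: (3.3 * 4.1) / 2 is 6.765
tri_area(3.3, 4.1)           # default of 1 decimal place: 6.8
tri_area(3.3, 4.1, rnd = 0)  # nearest integer: 7
```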

16.5 Readings and reference

“Functions” in R for Data Science by Hadley Wickham and Garrett Grolemund

“Writing your own Functions” in Hands-on Programming with R, by Garrett Grolemund

“Functions”, from Introduction to the R Language, Berkeley, Biostatistics 140.776

“Creating Functions”, from Programming with R, Software Carpentry

“Writing simple functions” at Environmental Computing

“Mathematics in R Markdown” by R Pruim (2016-10-19)
