15 Write your own functions

15.1 Objectives

Understand and describe when and why a function is of value
Understand the components of a function

function() function, what’s your function?

15.2 Setup

This chunk of R code loads the packages that we will be using.

library(tidyverse)

15.3 Introduction to functions

Functions are an important part of programming in R. You have already used many functions—now it’s time to write your own.

Functions have particular value in making your code DRY (don’t repeat yourself).¹⁰ If you need to do the same thing over and over again, whether in a single script or to create a report with the same calculations when next month’s data arrives, you can either copy-and-paste with some find-and-replace, or take the time to write a generalized function.

Using functions has the following benefits:

there are fewer opportunities for error (say you miss a replacement)
your code will be better organized and easier to read

There are three parts to any function:

function name
inputs, which are called “arguments”
code statement(s) that do something with/to the argument(s)

In R, the function to create a function is function().

The code looks like this:

my.function.name <- function(my_argument_1, my_argument_2, ...) {
  code_statement
}
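
As a concrete instance of that template, here is a minimal sketch with the three parts labelled. The name add_two_numbers and its arguments x and y are hypothetical, chosen only for illustration.

```r
# function name: add_two_numbers (hypothetical, for illustration only)
# arguments:     x and y
add_two_numbers <- function(x, y) {
  # code statement: the value of the last expression is what the function returns
  x + y
}

# call the function with two values; this returns 5
add_two_numbers(2, 3)
```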

15.4 A basic example

(from “Writing your own Functions” in Hands-on Programming with R, by Garrett Grolemund)

Let’s say you want to simulate rolling 2 six-sided dice. Here’s some code that does that:

# define the possible outcomes
die <- 1:6

# a random pair of numbers from the `die` object
dice <- sample(die, size = 2, replace = TRUE)

# then add them up
sum(dice)

## [1] 8

To do this more than once, you would need to write the last two lines of code in again and again:

dice <- sample(die, size = 2, replace = TRUE)
sum(dice)

## [1] 6

dice <- sample(die, size = 2, replace = TRUE)
sum(dice)

## [1] 7

dice <- sample(die, size = 2, replace = TRUE)
sum(dice)

## [1] 9

Or you could write a function.

In this case, our code_statement is the three lines of code:

roll <- function() {
  die <- 1:6
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}

Note that when we run this chunk, the function gets stored in our environment, and appears under Functions.

To run the function we just need the following:

roll()

## [1] 5

# and if we want to assign the result of the function to an object:
my_roll <- roll()

And then if we want to get the distribution of values from rolling a pair of dice 100 times, we can use the replicate() function:

replicate(100, roll())

##   [1]  5 10  6  4  9  6  9  7  7 11  6  7  7 11  6  8  7  8 11  6 10  9  8  9 10  8  7  4  4  7  8
##  [32]  9  8 11  5  9  8  7  7 10  2 11  5  5 12  6  5  6  8  3  9  8  6  2  5 11  6  7  6  6 10  6
##  [63]  4  3  9  9  5  8  8  8  2  5  5  7  6 10  2  3  7  4  9  8  8  6  8  6  8  5  3  9  6 10  7
##  [94]  5  9  7 11  3  7  7
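
A vector of 100 totals is hard to read at a glance. One way to summarise it, sketched here with the base R table() function, is to count how often each total occurs. The seed value 123 and the object name rolls are arbitrary choices for this illustration, and the counts will differ with another seed.

```r
# the roll() function from above, repeated so this chunk runs on its own
roll <- function() {
  die <- 1:6
  dice <- sample(die, size = 2, replace = TRUE)
  sum(dice)
}

# fix the seed so the simulation is reproducible (123 is an arbitrary choice)
set.seed(123)

# roll the pair of dice 100 times and keep the totals
rolls <- replicate(100, roll())

# count how many times each total occurred
table(rolls)
```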

15.5 Another example

Let’s write an equation that converts 50 pounds to the equivalent in kilograms.

1 kilogram (kg) is equal to 2.2046226218488 pounds (lb) (and conversely, 1 pound is equal to 0.45359237 kilograms).

$1\ \text{kg} = 2.2046226218488\ \text{lb}$

When we translate that formula into code that will convert the number of pounds into kilograms, it looks like this, where kg is “kilograms” and lb is “pounds”:

lb <- 50

kg <- lb / 2.2046226218488

kg

## [1] 22.67962

But what if we want to convert the weight of many items of different size? Time for a function.

We define a function named lb_to_kg_conversion() that takes one argument, lb, the weight in pounds. The function calculates the equivalent weight in kilograms, kg, and returns it.

lb_to_kg_conversion <- function(lb) {
  kg <- lb / 2.2046226218488
  return(kg)
}

Let’s test it with our weight of 50 pounds:

lb_to_kg_conversion(lb = 50)

## [1] 22.67962
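
Because division in R is vectorized, the same function also accepts a vector of weights and returns one converted value per element, which is what makes it usable on a whole column. A minimal sketch, where the weights 1, 50, and 100 and the object name several_weights are arbitrary choices for illustration:

```r
# the conversion function from above, repeated so this chunk runs on its own
lb_to_kg_conversion <- function(lb) {
  kg <- lb / 2.2046226218488
  return(kg)
}

# pass three weights at once; the result has one value per input
several_weights <- c(1, 50, 100)
lb_to_kg_conversion(lb = several_weights)
```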

Now we can use that function to convert any number of weights. Let’s make a tibble with 10 weights, where weight_lb is each weight in pounds.

# the `set.seed()` function ensures that our pseudo-random number generation
# always returns the same values
set.seed(1)

my_weights <- tibble(
  weight_name = letters[1:10],    # the letters from a to j
  # ten random numbers between 1 and 20, rounded to 3 decimals
  weight_lb = round(runif(n = 10, min = 1, max = 20), 3)
)

my_weights

## # A tibble: 10 × 2
##    weight_name weight_lb
##    <chr>           <dbl>
##  1 a                6.04
##  2 b                8.07
##  3 c               11.9 
##  4 d               18.3 
##  5 e                4.83
##  6 f               18.1 
##  7 g               18.9 
##  8 h               13.6 
##  9 i               13.0 
## 10 j                2.17

What is the weight of those items in kilograms? I can apply my function…

my_weights |>
  mutate(weight_kg = lb_to_kg_conversion(lb = weight_lb))

## # A tibble: 10 × 3
##    weight_name weight_lb weight_kg
##    <chr>           <dbl>     <dbl>
##  1 a                6.04     2.74 
##  2 b                8.07     3.66 
##  3 c               11.9      5.39 
##  4 d               18.3      8.28 
##  5 e                4.83     2.19 
##  6 f               18.1      8.20 
##  7 g               18.9      8.60 
##  8 h               13.6      6.15 
##  9 i               13.0      5.88 
## 10 j                2.17     0.986

That’s good. But what if we want our result to be rounded to 2 decimal places?

We could rewrite our function to add that additional line of code. Note that in the second line of the function’s code, we round kg before returning it.

lb_to_kg_conversion <- function(lb) {
  kg <- lb / 2.2046226218488
  kg <- round(kg, 2)
  return(kg)
}

Now re-run the mutate():

my_weights |>
  mutate(weight_kg = lb_to_kg_conversion(lb = weight_lb))

## # A tibble: 10 × 3
##    weight_name weight_lb weight_kg
##    <chr>           <dbl>     <dbl>
##  1 a                6.04      2.74
##  2 b                8.07      3.66
##  3 c               11.9       5.39
##  4 d               18.3       8.28
##  5 e                4.83      2.19
##  6 f               18.1       8.2 
##  7 g               18.9       8.6 
##  8 h               13.6       6.15
##  9 i               13.0       5.88
## 10 j                2.17      0.99

What if we want to have some flexibility in our rounding? We can add a second argument to our function, rnd, which is then passed to round().

lb_to_kg_conversion <- function(lb, rnd) {
  kg <- lb / 2.2046226218488
  kg <- round(kg, rnd)
  return(kg)
}

And rerun, rounding the weight to 3 decimals:

my_weights |>
  mutate(weight_kg = lb_to_kg_conversion(lb = weight_lb, rnd = 3))

## # A tibble: 10 × 3
##    weight_name weight_lb weight_kg
##    <chr>           <dbl>     <dbl>
##  1 a                6.04     2.74 
##  2 b                8.07     3.66 
##  3 c               11.9      5.39 
##  4 d               18.3      8.28 
##  5 e                4.83     2.19 
##  6 f               18.1      8.20 
##  7 g               18.9      8.60 
##  8 h               13.6      6.15 
##  9 i               13.0      5.88 
## 10 j                2.17     0.986
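
The tibble printout shows only a few significant digits, so the effect of rnd can be easier to see on a single value. A small sketch using the 50-pound weight from earlier, with the rounding levels 0, 1, and 3 chosen arbitrarily:

```r
# the two-argument function from above, repeated so this chunk runs on its own
lb_to_kg_conversion <- function(lb, rnd) {
  kg <- lb / 2.2046226218488
  kg <- round(kg, rnd)
  return(kg)
}

# the same weight, rounded to different numbers of decimal places
lb_to_kg_conversion(lb = 50, rnd = 0)   # nearest whole kilogram: 23
lb_to_kg_conversion(lb = 50, rnd = 1)   # one decimal place: 22.7
lb_to_kg_conversion(lb = 50, rnd = 3)   # three decimal places: 22.68
```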

15.5.1 Setting default values

What if we usually want to round to 2 decimal places, but want to keep some flexibility? We can assign the default value of 2 to the rounding argument. If rnd isn’t specified, it will be 2, but we can override that by passing a specific value.

lb_to_kg_conversion <- function(lb, rnd = 2) {
  kg <- lb / 2.2046226218488
  kg <- round(kg, rnd)
  return(kg)
}

With no specification:

Note that we don’t have to be explicit in telling the function how to use the variable weight_lb. Since it is the first argument passed, the function matches it to lb. And because we don’t pass a second argument, the function uses the default value we’ve specified, in this case 2.

my_weights |>
  mutate(weight_kg = lb_to_kg_conversion(weight_lb))

## # A tibble: 10 × 3
##    weight_name weight_lb weight_kg
##    <chr>           <dbl>     <dbl>
##  1 a                6.04      2.74
##  2 b                8.07      3.66
##  3 c               11.9       5.39
##  4 d               18.3       8.28
##  5 e                4.83      2.19
##  6 f               18.1       8.2 
##  7 g               18.9       8.6 
##  8 h               13.6       6.15
##  9 i               13.0       5.88
## 10 j                2.17      0.99

Or override with rounding to 3 decimals:

my_weights |>
  mutate(weight_kg = lb_to_kg_conversion(weight_lb, 3))

## # A tibble: 10 × 3
##    weight_name weight_lb weight_kg
##    <chr>           <dbl>     <dbl>
##  1 a                6.04     2.74 
##  2 b                8.07     3.66 
##  3 c               11.9      5.39 
##  4 d               18.3      8.28 
##  5 e                4.83     2.19 
##  6 f               18.1      8.20 
##  7 g               18.9      8.60 
##  8 h               13.6      6.15 
##  9 i               13.0      5.88 
## 10 j                2.17     0.986
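
Arguments can be matched either by position or by name. As a small sketch of the difference, the two calls below pass the same values in different ways. The object names by_position and by_name are hypothetical, and the function is written here with the pounds-to-kilograms division given earlier in the chapter.

```r
# conversion function with a default of 2 decimal places
lb_to_kg_conversion <- function(lb, rnd = 2) {
  kg <- lb / 2.2046226218488
  kg <- round(kg, rnd)
  return(kg)
}

# positional matching: the first value is `lb`, the second is `rnd`
by_position <- lb_to_kg_conversion(50, 3)

# named matching: with names supplied, the order no longer matters
by_name <- lb_to_kg_conversion(rnd = 3, lb = 50)

# both calls give the same result, so this returns TRUE
identical(by_position, by_name)
```

Naming the arguments takes a little more typing, but it can make a call easier to read when a function has several arguments.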

15.6 Exercises

15.6.1 Temperature

Write a function that converts temperature from degrees Celsius (C) to Fahrenheit (F), using the formula

$F = (C \times 9/5) + 32$

Now, write and test a function to calculate the temperature in Fahrenheit, where it is automatically rounded to 1 decimal place.

Solution

The function for conversion from pounds to kilograms is pasted below:

# example
lb_to_kg_conversion <- function(lb, rnd = 1) {
  kg <- lb / 2.2046226218488
  kg <- round(kg, rnd)
  return(kg)
}

Now let’s adapt the code above to work with the formula for temperature:

# solution
Cel_to_Fah_conversion <- function(Cel, rnd = 1) {
  Fah <- (Cel * 9/5) + 32
  Fah <- round(Fah, rnd)
  return(Fah)
}
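
Before applying a new function to data, it can help to test it against values where the answer is already known. A quick sketch using three reference points on the two scales, with the function written with a default of 1 decimal place as the exercise asks:

```r
# temperature conversion with a default of 1 decimal place
Cel_to_Fah_conversion <- function(Cel, rnd = 1) {
  Fah <- (Cel * 9/5) + 32
  Fah <- round(Fah, rnd)
  return(Fah)
}

# known reference points: water freezes at 0 C (32 F), boils at 100 C (212 F),
# and the two scales meet at -40
Cel_to_Fah_conversion(c(0, 100, -40))
```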

Now, generate a tibble with the name my_temperatures:

set.seed(42)

my_temperatures <- tibble(
  temp_name = letters[1:10], 
  temp_Cel = round(runif(10, min = 1, max = 20), 2))

my_temperatures

## # A tibble: 10 × 2
##    temp_name temp_Cel
##    <chr>        <dbl>
##  1 a            18.4 
##  2 b            18.8 
##  3 c             6.44
##  4 d            16.8 
##  5 e            13.2 
##  6 f            10.9 
##  7 g            15   
##  8 h             3.56
##  9 i            13.5 
## 10 j            14.4

And use the function to convert each temperature to Fahrenheit in the tibble:

my_temperatures |> 
  mutate(temp_Fah = Cel_to_Fah_conversion(temp_Cel))

## # A tibble: 10 × 3
##    temp_name temp_Cel temp_Fah
##    <chr>        <dbl>    <dbl>
##  1 a            18.4      65.1
##  2 b            18.8      65.8
##  3 c             6.44     43.6
##  4 d            16.8      62.2
##  5 e            13.2      55.7
##  6 f            10.9      51.6
##  7 g            15        59  
##  8 h             3.56     38.4
##  9 i            13.5      56.3
## 10 j            14.4      57.9

Now round to the nearest integer (zero decimals).

my_temperatures |> 
  mutate(temp_Fah = Cel_to_Fah_conversion(temp_Cel, rnd = 0))

## # A tibble: 10 × 3
##    temp_name temp_Cel temp_Fah
##    <chr>        <dbl>    <dbl>
##  1 a            18.4        65
##  2 b            18.8        66
##  3 c             6.44       44
##  4 d            16.8        62
##  5 e            13.2        56
##  6 f            10.9        52
##  7 g            15          59
##  8 h             3.56       38
##  9 i            13.5        56
## 10 j            14.4        58

15.6.2 Multiple dice rolls

The dice rolling function we wrote above assumes we are always rolling 2 dice. Modify the function so that the default is 2, but that the number of dice can be varied.

Solution

In this solution, an argument number_of_dice is added to function() and given a default value of 2. Then, within the sample() function, size is set to number_of_dice.

roll2 <- function(number_of_dice = 2) {
  die <- 1:6
  dice <- sample(die, size = number_of_dice, replace = TRUE)
  sum(dice)
}

Testing our function, first relying on the default of 2:

roll2()

## [1] 5

Then specifying 3 dice:

roll2(number_of_dice = 3)

## [1] 15
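
A single roll says little about whether the new argument is working. As a rough check, we can combine roll2() with replicate() as before and confirm that the totals for three dice stay between 3 and 18. The seed value 2024 and the object name three_dice_rolls are arbitrary choices for this sketch.

```r
# the roll2() function from the solution, repeated so this chunk runs on its own
roll2 <- function(number_of_dice = 2) {
  die <- 1:6
  dice <- sample(die, size = number_of_dice, replace = TRUE)
  sum(dice)
}

# fix the seed so the simulation is reproducible (2024 is an arbitrary choice)
set.seed(2024)

# roll three dice 1000 times and keep the totals
three_dice_rolls <- replicate(1000, roll2(number_of_dice = 3))

# the smallest and largest totals observed; these cannot fall outside 3 and 18
range(three_dice_rolls)
```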

15.7 Readings and reference

“Functions” in R for Data Science, 2nd ed. by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund

“Writing your own Functions” in Hands-on Programming with R, by Garrett Grolemund

“Functions”, from Introduction to the R Language, Berkeley, Biostatistics 140.776

“Creating Functions”, from Programming with R, Software Carpentry

“Writing simple functions” at Environmental Computing

“Mathematics in R Markdown” by R Pruim (2016-10-19)

-30-

15.8 Setup

This chunk of R code loads the packages that we will be using.

library(tidyverse)
library(gapminder)
library(modelr)
library(datarium)

Assignment 2 - week 3 - exploratory data analysis

16 Modeling