8 Functions

Control flow statements are most often used in connection with functions. We should consider writing a function whenever we’ve copied and pasted a block of code more than twice.

We can think about functions as tools to help us achieve certain goals. In addition to calling the built-in functions, we can define our own functions, assign them a name, and then call them just like using the built-in functions.

This chapter explains how functions work in R.

8.1 Creating and calling functions

A function is an object in R that takes some input objects (arguments) and returns an output object.

To create a function, use the function keyword, followed by a list of arguments and the function body. When there are a series of statements, curly braces should be used around the function body.

function(argument1, ...., argumentN) {
  expression1
  ...
  expressionrM
}

If the function has a single statement, there is no need for any braces around the body.

function(argument1, ...., argumentN) expression

To call a function, use the function name followed by parenthesis.

my_function <- function(x, y, z) {c(x + 1, y + 2, z + 3)}
my_function(1, 2, 3)
## [1] 2 4 6

Here we have defined a function called my_function. This function has three arguments, x, y, and z. When we call the function, 1 is added to x, 2 is added to y; and 3 is added to z. If we pass values to these three arguments, and let x be 1, y be 2, and z be 3, x, y, and z each grows by 1 to be 2, 4 and 6.


We can type the name of a function to see the code that runs when we call it.

my_function
## function(x, y, z) {c(x + 1, y + 2, z + 3)}
## <environment: 0x7fc1e0e4e018>

8.2 Arguments

When we call a function in R, there are different ways to specify the arguments.

When arguments are given in the named form name = object, they may be given in any order. For instance, in the example we discussed above, argument x doesn’t have to be in the first position, and y doesn’t have to be in the second position, et cetera.

my_function(y = 2, z = 3, x = 1)
## [1] 2 4 6

If we don’t name the arguments, then R will match them based on the default positions of the arguments. In this case, R will take it that 1 is for x, 2 is for y, and 3 is for z.

my_function(1, 2, 3)
## [1] 2 4 6

Besides, we may also specify named arguments after the positional arguments.

my_function(1, 2, z = 3)
## [1] 2 4 6

However, it’s a good practice to use exact argument names. This removes ambiguity and makes code more readable.


R functions can take default values.

Below we set x to be 1, y to be 2, and z to be 3. If we don’t pass anything to the function, it will use the default values.

my_function <- function(x = 1, y = 2, z = 3) {c(x + 1, y + 1, z + 1)}
my_function()
## [1] 2 3 4
my_function(x = 3, y = 4)
## [1] 4 5 4

Exercise 1

Write an R program to calculate a dog’s age in dog’s years. For the first two years, a dog year is equal to 10.5 human years. After that, each dog year equals 4 human years.

A dog’s age in human years is 15. What is the dog’s age in dog’s years?


First of all, think about what will be the input and what will be the output? The input is human years. The output is dog years. Therefore, if we write a function, the argument is human years, and the returned value is dog years.

Let’s translate this process into R code.

  1. First, we define a function called age. The argument is human.
age <- function(human) {

}
  1. Now we want to write some expressions to calculate the result, which can be interpreted by if ... else ... statements. When a human year is smaller or equal to 2, the dog year is the human year multiplied by 10.5. When a human year is larger than two, the dog year becomes the human year multiplied by 4, after we take the first 2 human years out.
age <- function(human) {
  if (human <= 2) {
    dog <- human * 10.5
  } else {
    dog <- 2 * 10.5 + (human - 2) * 4
  }
}
  1. Finally, we output the result, denoted by dog, using print().
age <- function(human) {
  if (human <= 2) {
    dog <- human * 10.5
  } else {
    dog <- 2 * 10.5 + (human - 2) * 4
  }
  print(dog)
}

Now we can call the function age to compute the dog’s age in dog’s years. The input human is 15.

age <- function(human) {
  if (human <= 2) {
    dog <- human * 10.5
  } else {
    dog <- 2 * 10.5 + (human - 2) * 4
  }
  print(dog)
}

age(15)
## [1] 73

Exercise 2

Write an R function that calculates the number of uppercase letters and lowercase letters in a string.

Hint: Use ?"regular expression".


First, think about the output that we want to get. That will be the counts of individual substrings (i.e. letters) of an input character vector.

Then, think about how we will access each element of the input character in order to perform calculations on them. For that we need a function to split a character vector into letters.

Next, we need to find a way to count the number of letters in the input text that match a pattern (i.e. uppercase or lowercase). This can be achieved using a for loop with a counter that takes an initial value and grows as it traverses the vector in each iteration of a loop.

Finally, to match the patterns, as given in the hint, we need the help of regular expressions to identify uppercase and lowercase letters.

Below let’s look at how we may approach this problem using R.


  1. To start with, we create a function called function_letters. The input is string.
function_letters <- function(string) {

}
  1. The function strsplit() will help us split a string into substrings (i.e. letters). Note that strsplit() returns a list; therefore, to access each element of the list, we need to use list indexing methods.
function_letters <- function(string) {
  string <- strsplit(string, split = "")[[1]]

}

Here’s a quick preview of what strsplit() returns:

strsplit("string", split = "")
## [[1]]
## [1] "s" "t" "r" "i" "n" "g"
  1. The next step is to set up two counters that go through a loop to collect counts of letters that meet a condition. One for the uppercase letters, and the other for the lowercase letters.
function_letters <- function(string) {
  string <- strsplit(string, split = "")[[1]]
  c1 <- 0
  c2 <- 0
  for (i in string){
    if ( )){
      c1 <- c1 + 1  
    } else if ( ){
      c2 <- c2 + 1
    }
  }
}
  1. Then, we use grepl() to match the patterns in the string. The function evaluates if a string matches a specified pattern. It will return TRUE if it does, and FALSE if not.
function_letters <- function(string) {
  string <- strsplit(string, split = "")[[1]]
  c1 <- 0
  c2 <- 0
  for (i in string){
    if (grepl(" ", i)){
      c1 <- c1 + 1  
    } else if (grepl(" ", i)){
      c2 <- c2 + 1
    }
  }
}
  1. Next, we use regular expressions to match the patterns of uppercase and lowercase letters. If a case is true, the counter will grow by one.
function_letters <- function(string) {
  string <- strsplit(string, split = "")[[1]]
  c1 <- 0
  c2 <- 0
  for (i in string){
    if (grepl("[A-Z]", i)){
      c1 <- c1 + 1  
    } else if (grepl("[a-z]", i)){
      c2 <- c2 + 1
    }
  }
}
  1. The last step is to print the results.
function_letters <- function(string) {
  string <- strsplit(string, split = "")[[1]]
  c1 <- 0
  c2 <- 0
  for (i in string){
    if (grepl("[A-Z]", i)){
      c1 <- c1 + 1  
    } else if (grepl("[a-z]", i)){
      c2 <- c2 + 1
    }
  }
  print(paste("upper case: ", c1))
  print(paste("lower case: ", c2)) 
}

Let’s call this function and test it with some strings. We’ll create two strings for this test.

string1 <- "Mr. and Mrs. Dursley of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much."

string2 <- "One hot spring evening, just as the sun was going down, two men appeared at Patriarch's Ponds."

string1 is from the book Harry Potter.

function_letters(string1)
## [1] "upper case:  5"
## [1] "lower case:  90"

string2 is from the book The Master and Margarita.

function_letters(string2)
## [1] "upper case:  3"
## [1] "lower case:  71"

8.3 Returning values

R returns the last evaluated expression in the function.

We can use the return() function to explicitly specify the value to be returned by the function. But it is common to omit the return statement.

All functions will return a value; if there is no return() statement, it will return nothing.

my_function <- function(x,y,z) {return(c(x + 1, y + 1, z + 1))}
my_function(1, 2, 3)
## [1] 2 3 4

8.4 stop()

stop() stops execution of the current expression and executes an error action.

Below we have created a deposit function, where users can input an amount into their bank accounts. The initial balance is 100, defined by total. If users input an amount smaller than 0, which doesn’t make sense, then users will receive a message “Deposits must be positive” from the function stop(). If the amount is larger than 0, then that amount will be added to the balance. A message showing the deposit amount and the balance is printed using print().

deposit <- function(amount) {
  total <- 100
  if(amount <= 0) stop("Deposits must be positive!\n")
  total <- total + amount
  print(paste(amount, "deposited. Your balance is", total))
}

Let’s test the function with an amount of 6.

deposit(6)
## [1] "6 deposited. Your balance is 106"

If we try to deposit -6 into our account by deposit(-6), the function will return the error message Error in deposit(-6) : Deposits must be positive!. Here, even before the amount can be added to the total balance, the function is terminated by stop() because the condition if(amount <= 0) is triggered.

8.5 Recursive functions

In R, a function can call itself, which allows us to loop through data to reach a result. The process of executing a recursive function is called recursion.

Example:

tri <- function(k) {
  if (k > 0) {
    result <- k + tri(k - 1)
    return(result)
  } else {
    result <- 0
    return(result)
  }
}
tri(6)
## [1] 21

In this example, the input is k. If k is larger than 0, the result is k + tri(k - 1). It will go to k - 1 to search the answer; then k - 2, et cetera, until it reaches the end, which will be 0.

Notes:

  1. Recursion

To start, we have an input of 6. Because it’s larger than 0, we will have 6+tri(5). Here the function will need to find out the value of tri(5), which leads to finding the value of tri(4) in 5+tri(4). And this pattern goes on. When we get to tri(0), the result is equal to 0, and the program terminates.

k result
6 6+tri(5)
5 5+tri(4)
4 4+tri(3)
3 3+tri(2)
2 2+tri(1)
1 1+tri(0)
0 0
  1. Returned values

If we add each part up, we get the final result for tri(6), as shown below.

result output
tri(0) 0
1+tri(0) 1
2+tri(1) 2+1=3
3+tri(2) 3+2+1=6
4+tri(3) 4+3+2+1=10
5+tri(4) 5+4+3+2+1=15
6+tri(5) 6+5+4+3+2+1=21

Exercise 3

A sequence {An : n \(\ge\) 1} satisfies that A1 = 1, A2 = 1, and An+2 = An+1 + An + 2n for n \(\ge\) 1. Write a function in R to find the value of A36.

Hint: We will hope that the recursion reaches a base case. If the function goes on to make recursive calls forever, the program never terminates. This results in an infinite recursion, which we should try to avoid.


Step 1

If we try to solve the problem by hand, we will find that it never reaches the base case.

An = An+2 - An+1 - 2n

If we let n be 36 in the equation above, then we will get:

A36 = A38 - A37 - 2*36

We will then have to find the values for A38 and A37. A37, for instance, will be:

A37 = A39 - A38 - 2*37

As you may have noticed, the sequence will grow forever. That’s not what we want. We want the process of finding the results to be able to stop at a point.


Step 2

We will then rewrite the equation.

Ak = Ak-2 + Ak-1 + 2*(k-2)

Note that n \(\ge\) 1. Therefore, we can write three conditions: k \(\gt\) 2, k \(\le\) 2, and k \(\lt\) 1, if you would like.


Step 3

To translate the formula above into R code, one solution is a recursive function implementation.

A <- function(k) {
  if (k < 1) { stop("n<1") }
  if (k <= 2) { return(1) }
  if (k > 2) {
    a <- A(k - 1) + A(k - 2) + 2*(k - 2)
    return(a)
  }
}

A(36)

As shown below, there are many repeated calculations on subtasks. Therefore, it is slow.

k result
36 A(35) + A(34) + 2*(36 - 2)
35 A(34) + A(33) + 2*(35 - 2)
34 A(33) + A(32) + 2*(34 - 2)
4 A(3) + A(2) + 2*(4 - 2)
3 A(2) + A(1) + 2*(3 - 2)
2 1
1 1

Step 4

A faster solution can be a for loop.

A <- function(k){
  a <- numeric(k)
  if (k < 1) { stop("n<1") }
  if (k <= 2) { return(1) }
  else {
    a[1] <- 1
    a[2] <- 1
    for (i in 3:k) {
     a[i] <- a[i-2] + a[i-1] + 2*(i-2)
    }
    return(a[k])
  }
}

A(36)

Exercise 4

We define \({n \choose m}\) as the coefficient of binomial expansion for \((1 + x)^n\) \((m = 0, 1, 2, ..., n)\). Hence, \({n \choose m}\) is the coefficient of \(x^m\) in \((1 + x)^n\), where \(m = 0, 1, 2, ..., n\). We know from the property of binomial coefficients that \({n \choose m} + {n \choose m + 1} = {n + 1 \choose m + 1}\) for all \(n\) and all \(0 \le m \le n−1\).

Write a function in R to calculate \({88 \choose 44}\).


A binomial coefficient is any \(a\) in the expanded version of this term \(ax^by^c\).

In fact, R has a built-in function choose(), which does exactly \({n \choose m}\). But if we want to write our own functions to solve this problem, what would we do? There can be a couple of ways to approach this problem.


Solution 1: a recursive function

Once again, the recursion should reach a base case so that the program terminates at a point. Therefore, we want to rewrite the equation \({n \choose m} + {n \choose m + 1} = {n + 1 \choose m + 1}\) \((n \ge 0, 0 \le m \le n−1)\) so that the program does not become an infinite recursion.

As a first step, we rewrite the equation to \({n - 1 \choose m - 1} + {n - 1 \choose m} = {n \choose m}\) \((n \ge 1, 1 \le m \le n)\). We also need to specify what will happen when \(m \gt n\) and when \(n = 0\).

n = 0, m = 0  -> 1
n = 0, m > 0  -> 0
m > n  -> 0
m = 1  -> 1
m = n  -> 1

This can be further summarized as below.

m > n  -> 0
m = 1  -> 1
m = n  -> 1

Now let’s put everything into a recursive function using R.

BC <- function(n, m){
    if (m > n) { return(0) } 
    if (m == 1 | m == n) { return(1) } 
  
    else {
      result <- BC(n - 1, m - 1) + BC(n - 1, m)
      return(result)
    }
}

If we try to write down the process, we will find that there are many repeated steps, just as what we saw in Exercise 3. If we run BC(88, 44), we will find that it is extremely slow.


Solution 2: a nested for loop

Instead of using a top-down approach to search the result until the program reaches the end, we can use a nested for loop to create a matrix to store the result from each step in \({n - 1 \choose m - 1} + {n - 1 \choose m} = {n \choose m}\) \((n \ge 1, 1 \le m \le n)\), starting from the initial values of \(n\) and \(m\).

  1. Let’s create a matrix with rows and columns represented by \(n\) and \(m\).

Initially, the rows were from 0 to \(n\). The columns were from 0 to \(m\).

   0 1 2 3 ... m
  --------------
0 | 
1 | 
2 |   
3 |     
. |    
m |           
. |
n |

In this matrix, when \(m = 0\), we get 1.

   0 1 2 3 ... m
  --------------
0 |1 
1 |1 
2 |1   
3 |1     
. |...    
m |1           
. |...
n |1

When \(m = n\), we also get 1.

   0 1 2 3 ... m
  --------------
0 |1 
1 |1 1
2 |1   1
3 |1     1
. |...     ... ...
m |1           1
. |...
n |1

When \(m \gt n\), we get 0.

   0 1 2 3 ... m
  --------------
0 |1 0 0 0 ... 0
1 |1 1 0 0 ... 0
2 |1   1 0 ... 0
3 |1     1 ... 0
. |...     ... ...
m |1           1
. |...
n |1

What we will be working on in the following steps is the area within what is marked by ~~~ and |.

    0 1 2 3 ... m
   --------------
0   1 0 0 0 ... 0
   ~~~~~~~~~~~~~~~
1  |1 1 0 0 ... 0
2  |1   1 0 ... 0
3  |1     1 ... 0
.  |...     ... ...
m  |1           1
.  |...
n  |1

We may create a function like the one below to represent the matrix. i and j are row and column indexes.

BC <- function(n, m){
  C <- matrix(0, nrow = n, ncol = m)

    for (i in 1 : n){
    for (j in 1 : m){
      C[i, j] <- C[i - 1, j - 1] + C[i - 1, j]
    }
  }
}

The initial values in this matrix has been set to 0, which is the area for \(m \gt n\). So we no longer need to deal with this condition explicitly in the function.

  1. It looks okay. However, note that in R the first index starts from 1, not 0.

But in this matrix, the first index of row and column starts from 0, not 1. Therefore, in the function, we need to fix the matrix indexes so that i and j starts from 1.

    j   1 2 3 4 ... m+1
i       0 1 2 3 ... m
       --------------
1   0   1 0 0 0 ... 0
       ~~~~~~~~~~~~~~
2   1  |1 1 0 0 ... 0
3   2  |1   1 0 ... 0
4   3  |1     1 ... 0
.   .  |...     ... ...
m+1 m  |1           1
.   .  |...
n+1 n  |1

The row i now starts from 1 all the way to \(n + 1\). The column j starts from 1 to \(m + 1\).

Now let’s fix the R code.

BC <- function(n, m){
  C <- matrix(0, nrow = n + 1, ncol = m + 1)
  for (i in 1 : (n + 1)){
    for (j in 1 : (m + 1)){
      C[i, j] <- C[i - 1, j - 1] + C[i - 1, j]
    }
  }
}
  1. We have already taken care of 0s. Now let’s deal with those 1s in the first column and on the diagonal. That’s when j = 1 or j = i (\(m = n\)).

In those cases, the values of C would be 1: C[i, j] <- 1. Otherwise, the values will be generated from the equation \({n \choose m} = {n - 1 \choose m - 1} + {n - 1 \choose m}\), which is C[i, j] <- C[i - 1, j - 1] + C[i - 1, j] in R.

BC <- function(n, m){
  C <- matrix(0, nrow = n + 1, ncol = m + 1)

  for (i in 1 : (n + 1)){
    for (j in 1 : (m + 1)){
      if (j == 1 | j == i){
        C[i, j] <- 1
      } else {
        C[i, j] <- C[i - 1, j - 1] + C[i - 1, j]
      }
    }
  }
}
  1. Remember that \(m \lt n\). When \(m \gt n\), \({n \choose m}\) is 0; j stops at \(m\) in the inner loop, and the program goes back to the outer loop for the next iteration.

Let’s fix that part of the code.

BC <- function(n, m){
  C <- matrix(0, nrow = n + 1, ncol = m + 1)
  
  for (i in 1 : (n + 1)){
    for (j in 1 : (min(i, m) + 1)){
      if (j == 1 | j == i){
        C[i, j] <- 1
      } else {
        C[i, j] <- C[i - 1, j - 1] + C[i - 1, j]
      }
    }
  }
}
  1. Remember that i does not start from 1, but 2. Let’s fix the loop variable so that it starts at the right place.
BC <- function(n, m){
  C <- matrix(0, nrow = n + 1, ncol = m + 1)
  
  for (i in 2 : (n + 1)){
    for (j in 1 : (min(i, m) + 1)){
      if (j == 1 | j == i) {
        C[i, j] <- 1
      } else {
        C[i, j] <- C[i - 1, j - 1] + C[i - 1, j]
      }
    }
  }
}
  1. If we want to make sure that the matrix gets things right, we can do some tests and use print() to print the output matrix.
BC <- function(n, m){
  C <- matrix(0, nrow = n + 1, ncol = m + 1)
  
  for (i in 2 : (n + 1)){
    for (j in 1 : (min(i, m) + 1)){
     if (j == 1 | j == i) {
        C[i, j] <- 1
      } else {
        C[i, j] <- C[i - 1, j - 1] + C[i - 1, j]
      }
    }
  }
  print(C)
}

BC(8, 4)
##       [,1] [,2] [,3] [,4] [,5]
##  [1,]    0    0    0    0    0
##  [2,]    1    1    0    0    0
##  [3,]    1    2    1    0    0
##  [4,]    1    3    3    1    0
##  [5,]    1    4    6    4    1
##  [6,]    1    5   10   10    5
##  [7,]    1    6   15   20   15
##  [8,]    1    7   21   35   35
##  [9,]    1    8   28   56   70

The result is 70.

  1. Wait, C[1,1] is not correct. It is 0 but should be 1.

Let’s fix it by simply assigning 1 to the value in the first column and the first row of the matrix.

BC <- function(n, m){
  C <- matrix(0, nrow = n + 1, ncol = m + 1)
  C[1, 1] <- 1
  
  for (i in 2 : (n + 1)){
    for (j in 1 : (min(i, m) + 1)){
     if (j == 1 | j == i) {
        C[i, j] <- 1
      } else {
        C[i, j] <- C[i - 1, j - 1] + C[i - 1, j]
      }
    }
  }
  print(C)
}
  1. Now we’re ready to ask the program to return the result using return() by specifying the row and column indexes.
BC <- function(n, m){
  C <- matrix(0, nrow = n + 1, ncol = m + 1)
  C[1, 1] <- 1
  
  for (i in 2 : (n + 1)){
    for (j in 1 : (min(i, m) + 1)){
     if (j == 1 | j == i) {
        C[i, j] <- 1
      } else {
        C[i, j] <- C[i - 1, j - 1] + C[i - 1, j]
      }
    }
  }
  return(C[n + 1, m + 1])
}

BC(8, 4)
## [1] 70
  1. All set? If we test it with numbers like BC(0, 0) or BC(0, 10), we’ll get an error message saying Error in[<-(tmp, i, j, value = 1) : subscript out of bounds.

In short, it means we are trying to access the matrix out of its boundary, and it has to do with the temporary variables i and j. In i in 2 : (n + 1), when n = 0, n + 1 = 1. This translates to “from 2 to 1”, which doesn’t make sense.

We may fix this problem by overwriting how this part of results are returned, and placing two conditions here explicitly.

BC <- function(n, m){
  C <- matrix(0, nrow = n + 1, ncol = m + 1)
  
  # n = 0, m = 0
  C[1, 1] <- 1
  
  if (n == 0 & m == 0) { return(1) }
  if (n == 0 & m > 0) { return(0) }
  
  # n >= 1
  for (i in 2 : (n + 1)){
    for (j in 1 : (min(i, m) + 1)){
     if (j == 1 | j == i) {
        C[i, j] <- 1
      } else {
        C[i, j] <- C[i - 1, j - 1] + C[i - 1, j]
      }
    }
  }
  return(C[n + 1, m + 1])
}

BC(0, 0)
## [1] 1
BC(0, 10)
## [1] 0
  1. To evaluate the function BC(), we can compare its output with choose() using all.equal().
all.equal(BC(88,44), choose(88,44))
## [1] TRUE

Alternatively, if we simply use the double equal signs to evaluate the result, it could be problematic.

BC(88,44) == choose(88,44)
## [1] FALSE

It can be hard to believe, such as in the example below, but that has to do with how R stores numbers.

sqrt(2)^2 == 2
## [1] FALSE

We can get around that problem by using the function all.equal().

all.equal(sqrt(2)^2, 2)
## [1] TRUE

Exercise 5

Write a function in R to find the greatest common divisor of (12306, 32148).

The greatest common divisor (GCD), also called the greatest common factor, of two numbers is the largest number that divides them both.

There are many approaches to solving this problem.


Solution 1: a naive approach

To start with, we may just compute all the factors of two numbers, and determine the largest common factor.

To translate that to R, first of all, we write a function get_divisor() to get all the divisors or factors of a number.

Next, for the two numbers, we write a function get_intersect() to get their common factors.

Finally, we decide which one is the largest factor using max().

This is a non-recursive solution to this problem.

get_divisor <- function(n) {
  divisor <- numeric(0)
  for(i in 1:n) {
    if((n %% i) == 0) {
    divisor <- c(divisor, i)
    }
  }
  return(divisor)
}

get_intersect <- intersect(get_divisor(12306), get_divisor(32148))
gcd <- max(get_intersect)
gcd
## [1] 6

Solution 2: recursive function

The recursive nature of this problem is not immediately visible to us, if we’re not familiar with Euclidean algorithm. It is a technique of quickly finding the GCD of two integers recursively.

Below is the math part: 11

We define A in quotient remainder form:

A = B⋅Q + R

This formula lets us reduce a larger problem to a smaller problem.

Next, if A or B is 0, then B or A is returned. These two properties are summarized below.

  • GCD(A,0) = A
  • GCD(0,B) = B

If B≠0, then GCD(A,B) = GCD(B,R). The program will repeat with the new A and B. B becomes new A. This is where the recursive nature is more apparent to us.

This is the Euclidean algorithm that we’re going to use to compute the GCD of A and B.

  • If A = 0 then GCD(A,B) = B, since the GCD(0,B) = B, and we can stop.
  • If B = 0 then GCD(A,B) = A, since the GCD(A,0) = A, and we can stop.
  • Find GCD(B,R) using the Euclidean Algorithm since GCD(A,B) = GCD(B,R)

To translate the math to R code:

We have two inputs a and b. If b is 0, then a is returned. Otherwise, GCD of a and b is GCD of b and the remainder a %% b.

gcd <- function (a, b) {
  print(sprintf("*a* is %s, *b* is %s", a, b))
  if (b == 0){
    print(sprintf("b is 0, stop recursion, a is the gcd: %s", a))
    return (a)
  }
  print(sprintf("Recursing with new a = b and new b = a %% b..."))
  gcd(b, a %% b)
}

If we print each step of this computation, you may see more clearly how this program works. This is where print() is going to be extremely helpful.

print(gcd(32148, 12306))
## [1] "*a* is 32148, *b* is 12306"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 12306, *b* is 7536"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 7536, *b* is 4770"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 4770, *b* is 2766"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 2766, *b* is 2004"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 2004, *b* is 762"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 762, *b* is 480"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 480, *b* is 282"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 282, *b* is 198"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 198, *b* is 84"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 84, *b* is 30"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 30, *b* is 24"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 24, *b* is 6"
## [1] "Recursing with new a = b and new b = a % b..."
## [1] "*a* is 6, *b* is 0"
## [1] "b is 0, stop recursion, a is the gcd: 6"
## [1] 6

The program keeps running until there is no remainder from the previous division. The recursion stops here and we get the result, which is 6.


Solution 3: vectorized version

What’s more interesting for R is that we can write a vectorized version 12 of the recursive version using ifelse().

gcd <- function(x, y) {
  r <- x %% y
  return(ifelse(r, gcd(y, r), y))
}

gcd(12306, 32148)
## [1] 6

In this function, r represents the remainder. If the remainder of x and y is not 0, then we get GCD of y and the remainder r until y reaches 0.

8.6 Nested functions

We can call a function within another function. This concept has already been demonstrated through the use of recursive functions, wherein a function calls itself.

We’ll see one more example below.


Exercise 6

Write a function in R to find the least common multiple of (12306, 32148).

The least common multiple (LCM) is also called the least common divisor. For two integers a and b, denoted LCM(a,b), the LCM is the smallest positive integer that is evenly divisible by both a and b.

There are many solutions to finding the LCM. Here we are going to leverage the function gcd() that we just created to compute the greatest common divisor.

LCM(p,q) = (p×q)/GCF(p,q)

lcm <- function(p, q) {
  p * q / gcd(p, q)
}

In this case, gcd() is nested in lcm().

lcm(12306, 32148)
## [1] 65935548

We’ll also see passing functions to other functions with the help of apply family functions in the chapter Apply Family Functions.

stdError <- function(x) {sqrt(var(x)/length(x))}
tapply(data, label, stdError)

8.7 Scope of variables

A variable’s scope is the set of places from which you can see the variable. R will try to find variables in the current environment, and if it doesn’t find them it will look in the parent environment, and then that environment’s parent, and so on until it reaches the global environment.

Below we show two quick examples.

Example 1:

y <- 10
f <- function(x) {
  y <- 2
  y * 2 + g(x)
}

g <- function(x) {
  x * y
}

f(3)
## [1] 34

First, what is the value of y? Is it 2, as defined in the scope of f, the function? Or 10 outside of the function?

Here, y is 2 because R tries to find y first in f. Unless it doesn’t find any y that has been defined in f, it goes one level up. Therefore, f(3) = 2 * 2 + g(3) = 4 + g(3).

Then we need to get the result of g(3). In g(3), what is the value of y? It is 10. Because y is not locally defined in g, R will look in the parent environment of g in the global environment, where it finds that y takes the value of 10. Therefore, g(3) = 3 * 10 = 30.

To wrap up, f(3) = 4 + g(3) = 4 + 30 = 34.


Example 2:

f <- function(x) {
  y <- 2
  g <- function(x) {
    x * y
  }
  y * 2 + g(x)
}

f(3)
## [1] 10

What has changed in the second example above is how y has been defined. R searches for the value of y in the environment of f, and it finds that y is 2. Hence we have g(3) = 2 * 3 = 6. For f(3), adding 6 to 4, we get 10.