2.6 The apply family of functions

One of the biggest limitations of R is that it can be slow at executing loops. For this reason, one should avoid using loops as much as possible.

Various functions are designed to help you avoid these loops; they belong to the family of so-called apply functions. There are many of these, but we will only see two here.

2.6.1 The function apply

Consider the following code.

x <- matrix(1:9, nrow = 3, ncol = 3)
y <- c()
for (i in 1:3){
  y[i] <- sum(x[i,])
}
y
## [1] 12 15 18

The code first defines a matrix x and an empty vector y (recall that this is bad practice, but for this example it does not matter). Then a for loop assigns to the i-th entry of y the sum of the entries of the i-th row of x. So the vector y contains the row totals.
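The bad practice mentioned above, growing y from an empty vector, can be avoided by preallocating the result. The following is a minimal sketch of the same loop, assuming the same matrix x; numeric() and seq_len() are base R functions that the original example does not use.

```r
# Same matrix as in the example (filled column by column)
x <- matrix(1:9, nrow = 3, ncol = 3)

# Preallocate y with one entry per row instead of growing it from c()
y <- numeric(nrow(x))

# seq_len(nrow(x)) gives 1, 2, 3 here and is also safe for zero-row matrices
for (i in seq_len(nrow(x))) {
  y[i] <- sum(x[i, ])
}
y
## [1] 12 15 18
```

The result is the same as before; the only difference is that R does not have to enlarge y at every iteration.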

For this simple example the for loop is extremely quick, but it serves to illustrate how we can replace it with the apply function.

apply(x, 1, sum)
## [1] 12 15 18

Let’s look at the above code. The first argument of apply is the object we want to operate upon, in this case the matrix x. The second argument specifies whether the operation acts over the rows of the matrix (value 1) or over the columns (value 2). The third argument is the function we want to apply, in this case sum.

Besides avoiding the explicit loop, the above code is also a lot more compact than the for loop version.
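To make the role of the three arguments more concrete, here is a small sketch on the same matrix x. The names row_max and col_spread are only illustrative, and the third argument in the second call is a function written on the spot rather than a built-in one.

```r
x <- matrix(1:9, nrow = 3, ncol = 3)

# Second argument 1: the function is applied to each row
row_max <- apply(x, 1, max)
row_max
## [1] 7 8 9

# Second argument 2: the function is applied to each column.
# Here the function computes the difference between the largest
# and the smallest entry of the column it receives.
col_spread <- apply(x, 2, function(v) max(v) - min(v))
col_spread
## [1] 2 2 2
```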

The following example computes the mean of each column of x.

apply(x, 2, mean)
## [1] 2 5 8
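For the specific case of row or column sums and means, base R also provides the dedicated functions rowSums, colSums, rowMeans and colMeans. The sketch below, on the same matrix x, suggests that they return the same values as the corresponding apply calls; the names same_sums and same_means are only illustrative.

```r
x <- matrix(1:9, nrow = 3, ncol = 3)

# Row totals, as computed earlier with apply(x, 1, sum)
rowSums(x)
## [1] 12 15 18

# Column means, as computed earlier with apply(x, 2, mean)
colMeans(x)
## [1] 2 5 8

# Check that the two approaches agree entry by entry
same_sums  <- all(rowSums(x) == apply(x, 1, sum))
same_means <- all(colMeans(x) == apply(x, 2, mean))
```

apply remains the more general tool, since it accepts any function and not only sums and means.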

2.6.2 The function sapply

Consider again our function new.function, which computes the sum of the square of a number x and another number y.

new.function <- function(x,y){ x^2 + y}

Suppose that we want to compute such a sum for all numbers x from 1 to 10, with y chosen as 2. We can achieve this with a for loop as follows.

x <- 1:10
z <- c()
for (i in 1:10){
  z[i] <- new.function(x[i],2)
}
z
##  [1]   3   6  11  18  27  38  51  66  83 102

The function sapply can be used for this specific purpose.

x <- 1:10
sapply(x,new.function, y=2)
##  [1]   3   6  11  18  27  38  51  66  83 102

The first argument of sapply is a vector of values we want to use as input to a function. The second argument is the function we want to apply to each of these values. If the function has more than one argument, we can then specify the values of the remaining ones, in this specific case y=2.
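As a small sketch of how the extra argument is passed along, the code below calls sapply with a different value of y. It also shows that, for this particular function, sapply is not strictly needed: the operators ^ and + in R already work element by element on vectors, so new.function can be called on the whole vector directly. The names z_other, z_sapply and z_vector are only illustrative.

```r
new.function <- function(x, y) { x^2 + y }
x <- 1:10

# The extra argument y is passed unchanged to every call of new.function
z_other <- sapply(1:3, new.function, y = 10)
z_other
## [1] 11 14 19

# Same computation as in the text, once with sapply ...
z_sapply <- sapply(x, new.function, y = 2)

# ... and once by calling the function directly on the vector
z_vector <- new.function(x, 2)
z_vector
##  [1]   3   6  11  18  27  38  51  66  83 102
```

sapply becomes necessary when the function only works on one value at a time, which is not the case here.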

Notice that a function can also be defined within sapply.

x <- 1:10
sapply(x, function(i) i^2 + 2)
##  [1]   3   6  11  18  27  38  51  66  83 102

So we defined the vector x and applied the function defined within sapply once for each entry of x.