2.6 The apply family of functions
One of the biggest limitations of R is that it is slow at executing loops. For this reason, one should avoid loops as much as possible.
There are various functions designed to help you avoid loops; they belong to the so-called apply family of functions. There are many of these, but we will only see two here.
2.6.1 The function apply
Consider the following code.
x <- matrix(c(1:9), ncol = 3, nrow = 3)
y <- c()
for (i in 1:3){
  y[i] <- sum(x[i,])
}
y
## [1] 12 15 18
The code first defines a matrix x and an empty vector y (recall that this is bad practice, but for this example it does not matter). Then there is a for loop which assigns to the i-th entry of y the sum of the entries of the i-th row of x. So the vector y contains the row totals.
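The remark about the empty vector can be made concrete. The following is a minimal sketch of the same loop in which the result vector is created with its final length before the loop starts, so that it does not have to grow at each iteration. The name y_pre is introduced here only for illustration.

```r
# Same matrix as in the example above
x <- matrix(1:9, nrow = 3, ncol = 3)

# Preallocate the result vector (y_pre is an illustrative name)
y_pre <- numeric(nrow(x))

for (i in seq_len(nrow(x))) {
  y_pre[i] <- sum(x[i, ])   # sum of the entries of the i-th row
}
y_pre
## [1] 12 15 18
```

The result is the same as before; only the way the vector is created changes.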
For this simple example the for loop is extremely quick, but it serves to illustrate how we can replace it using the apply function.
apply(x, 1, sum)
## [1] 12 15 18
Let’s look at the above code. The first input of apply is the object we want to operate upon, in this case the matrix x. The second input specifies whether the operation acts over the rows of the matrix (input equal to 1) or over the columns (input equal to 2). The third input is the operation we want to use, in this case sum.
Besides avoiding an explicit loop, the above code is also a lot more compact than the version using for.
The following example computes the mean of each column of x.
apply(x, 2, mean)
## [1] 2 5 8
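The third input of apply does not have to be a built-in function such as sum or mean. As a rough sketch, assuming the same matrix x as above, the code below passes a small user-written function to apply. The names spread, row_spread and col_spread are hypothetical and do not appear elsewhere in these notes.

```r
x <- matrix(1:9, nrow = 3, ncol = 3)

# Hypothetical helper: difference between the largest and smallest entry
spread <- function(v) max(v) - min(v)

row_spread <- apply(x, 1, spread)   # second input 1: one value per row
col_spread <- apply(x, 2, spread)   # second input 2: one value per column
row_spread
## [1] 6 6 6
col_spread
## [1] 2 2 2
```

Only the second input changes between the two calls, which is what switches the operation from rows to columns.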
2.6.2 The function sapply
Consider again our function new.function, which computes the sum of the square of a number x and another number y.
new.function <- function(x,y){ x^2 + y}
Suppose that we want to compute this sum for all numbers x from 1 to 10, with y chosen as 2. We can achieve this with a for loop as follows.
x <- 1:10
z <- c()
for (i in 1:10){
  z[i] <- new.function(x[i],2)
}
z
## [1] 3 6 11 18 27 38 51 66 83 102
The function sapply can be used for this specific purpose.
x <- 1:10
sapply(x,new.function, y=2)
## [1] 3 6 11 18 27 38 51 66 83 102
The first argument of sapply
is a vector of values we want to use as input of a function. The second argument is the function we want to apply multiple times. If the function has more than one input we can then specify what their value is, in this specific case y=2
.
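To illustrate how the extra argument is passed on, here is a small sketch that reuses new.function and changes only the value of y. The names z_two and z_ten are introduced here for illustration.

```r
new.function <- function(x, y) { x^2 + y }
x <- 1:10

# Arguments given after the function are passed on to every call
z_two <- sapply(x, new.function, y = 2)

# Changing y only requires changing the extra argument
z_ten <- sapply(x, new.function, y = 10)
z_ten
## [1]  11  14  19  26  35  46  59  74  91 110
```

Each entry of x is used in turn as the first input of new.function, while y keeps the same value in every call.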
Notice that a function can also be defined within sapply.
x <- 1:10
sapply(x, function(i) i^2 + 2)
## [1] 3 6 11 18 27 38 51 66 83 102
Here we defined the vector x and then applied the function defined within sapply multiple times: once for each entry of x.
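As a quick check, the sketch below compares the two ways of calling sapply seen in this section, one with the named function new.function and one with a function defined inline. The names named_version and inline_version are hypothetical.

```r
new.function <- function(x, y) { x^2 + y }
x <- 1:10

named_version  <- sapply(x, new.function, y = 2)
inline_version <- sapply(x, function(i) i^2 + 2)

# Both calls should return exactly the same vector
identical(named_version, inline_version)
## [1] TRUE
```

Defining the function inline is convenient when it is short and used only once; a named function is usually easier to read and reuse when the computation is longer.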