Chapter 13 Apply Family Functions

The apply family of functions includes apply(), lapply(), sapply() and tapply().

13.1 apply()

The apply() function takes three arguments, apply(X, MARGIN, FUN), where X is an array or matrix, MARGIN is 1 to apply the function over rows or 2 to apply it over columns, and FUN is the function to apply, such as sum, mean or median.

## Let's create a matrix M1 with 5 rows and 3 columns
M1 <- matrix(1:15, nrow = 5)
M1
##      [,1] [,2] [,3]
## [1,]    1    6   11
## [2,]    2    7   12
## [3,]    3    8   13
## [4,]    4    9   14
## [5,]    5   10   15

Let's use the apply() function to find the sum of each column of matrix M1.

M1_colsum<-apply(M1,2,sum)
M1_colsum
## [1] 15 40 65

We can also use the apply() function to find the mean of each column.

M1_colmean<-apply(M1,2,mean)
M1_colmean
## [1]  3  8 13

Similarly, we can use the apply() function to find the maximum value of each column of the matrix.

M1_colmax<-apply(M1,2,max)
M1_colmax
## [1]  5 10 15

The apply() function is used extensively in data analysis, for example to summarise every row or column of a numeric matrix in a single call.
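
The examples above all use a margin of 2. As a minimal sketch of the row-wise case, the block below rebuilds M1 and passes a margin of 1 instead; the name M1_rowsum is only an illustrative choice, and the comparison with rowSums() is added here as a sanity check.

```r
# Rebuild the 5 x 3 matrix from the example above so this block runs on its own
M1 <- matrix(1:15, nrow = 5)

# A margin of 1 applies the function to each row instead of each column
M1_rowsum <- apply(M1, 1, sum)
M1_rowsum
## [1] 18 21 24 27 30

# rowSums() is a built-in shortcut that gives the same totals
all(M1_rowsum == rowSums(M1))
## [1] TRUE
```

For plain sums and means, rowSums(), colSums(), rowMeans() and colMeans() are usually the shorter option; apply() is the general tool when the function is something else, such as max or median.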

13.2 lapply()

lapply() applies a function to each element of a list or vector and returns a list of the same length.

Let's see lapply() applied to a character vector:

Names<-c("Manish","Saurabh", "Rahul","Krishna","Venkat")

Names_lower<-lapply(Names,tolower)
Names_lower
## [[1]]
## [1] "manish"
## 
## [[2]]
## [1] "saurabh"
## 
## [[3]]
## [1] "rahul"
## 
## [[4]]
## [1] "krishna"
## 
## [[5]]
## [1] "venkat"

Names_upper<-lapply(Names,toupper)
Names_upper
## [[1]]
## [1] "MANISH"
## 
## [[2]]
## [1] "SAURABH"
## 
## [[3]]
## [1] "RAHUL"
## 
## [[4]]
## [1] "KRISHNA"
## 
## [[5]]
## [1] "VENKAT"
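
lapply() works the same way when the input is a true list whose elements differ in length. The following is a minimal sketch; the list marks and its values are hypothetical and only serve to illustrate the call.

```r
# Hypothetical list of numeric vectors with different lengths
marks <- list(
  maths     = c(70, 80, 90),
  physics   = c(60, 75),
  chemistry = c(88, 92, 81, 79)
)

# lapply() calls mean() on each element and returns a list of the same length
marks_mean <- lapply(marks, mean)

length(marks_mean)
## [1] 3
marks_mean$physics
## [1] 67.5
```

The element names of the input list are carried over to the result, so each mean can be looked up by name.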

13.3 sapply()

sapply() takes a list, vector or data frame as input and simplifies the output to a vector or matrix where possible. Let's use sapply() on the previous example and check the result:

Names_upper<-sapply(Names,toupper)
Names_upper
##    Manish   Saurabh     Rahul   Krishna    Venkat 
##  "MANISH" "SAURABH"   "RAHUL" "KRISHNA"  "VENKAT"

As you can see from the output, sapply() returns a named character vector here rather than a list.
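
sapply() returns a matrix only when the function returns more than one value for each element. As a minimal sketch, the hypothetical list scores below is passed to range(), which returns two values (minimum and maximum) per element.

```r
# Hypothetical list of two numeric vectors
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6))

# range() returns two values per element, so sapply() builds a 2-row matrix
score_range <- sapply(scores, range)
score_range
##      a b
## [1,] 1 4
## [2,] 3 6

is.matrix(score_range)
## [1] TRUE
```

Each column corresponds to one element of the input, with the minimum in the first row and the maximum in the second.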

13.4 tapply()

tapply() applies a function to subsets of a vector, where the subsets are defined by a factor variable. Let's load the iris dataset and use tapply() on it.

data(iris)
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300  
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
##        Species  
##  setosa    :50  
##  versicolor:50  
##  virginica :50  
##                 
##                 
## 

Let's apply tapply() to the iris data.

tapply(iris$Sepal.Length,iris$Species,mean)
##     setosa versicolor  virginica 
##      5.006      5.936      6.588

The iris dataset contains three species, and the call above calculates the average sepal length for each of them.

Similarly, we can calculate the median sepal length for each of the three species.

tapply(iris$Sepal.Length,iris$Species,median)
##     setosa versicolor  virginica 
##        5.0        5.9        6.5
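
One way to picture what tapply() does is to split the vector by the factor and then apply the function to each piece. The sketch below does this by hand with split() and sapply(); it is meant as an illustration of the idea, not a description of how tapply() is implemented, and the names by_species and split_mean are illustrative.

```r
data(iris)

# Break Sepal.Length into one numeric vector per species
by_species <- split(iris$Sepal.Length, iris$Species)

# Apply mean() to each piece and simplify the result to a named vector
split_mean <- sapply(by_species, mean)
split_mean
##     setosa versicolor  virginica 
##      5.006      5.936      6.588

# The values match the tapply() result for the same function
tapply_mean <- tapply(iris$Sepal.Length, iris$Species, mean)
all.equal(unname(split_mean), as.vector(tapply_mean))
## [1] TRUE
```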