Chapter 13 Apply Family Functions
The apply family of functions consists of apply(), lapply(), sapply() and tapply().
13.1 apply()
The apply() function takes three arguments, apply(X, MARGIN, FUN), where X is an array or matrix, MARGIN is 1 to apply the function row-wise or 2 to apply it column-wise, and FUN is the function to apply, such as sum, mean or median.
## Let's create a matrix M1 with 5 rows and 3 columns
M1 <- matrix(1:15, nrow = 5)
M1
##      [,1] [,2] [,3]
## [1,]    1    6   11
## [2,]    2    7   12
## [3,]    3    8   13
## [4,]    4    9   14
## [5,]    5   10   15
Let's use apply() to find the sum of each column of matrix M1:
M1_colsum<-apply(M1,2,sum)
M1_colsum
## [1] 15 40 65
We can also use apply() to find the mean of each column:
M1_colmean<-apply(M1,2,mean)
M1_colmean
## [1] 3 8 13
Similarly, we can use apply() to find the maximum value of each column of a matrix:
M1_colmax<-apply(M1,2,max)
M1_colmax
## [1] 5 10 15
apply() is used extensively in data analysis, because any function that accepts a vector can be passed as FUN.
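The examples above all use MARGIN = 2. As a minimal sketch of the row-wise case, the block below rebuilds the same matrix and calls apply() with MARGIN = 1, first with sum and then with an anonymous function. The names M1_rowsum and M1_rowspread are introduced here only for illustration.

```r
# Rebuild the 5 x 3 matrix used in the examples above
M1 <- matrix(1:15, nrow = 5)

# MARGIN = 1 applies the function to each row
M1_rowsum <- apply(M1, 1, sum)
M1_rowsum
## [1] 18 21 24 27 30

# FUN can also be an anonymous function: here, max minus min of each row
M1_rowspread <- apply(M1, 1, function(r) max(r) - min(r))
M1_rowspread
## [1] 10 10 10 10 10
```

Each row is passed to FUN as a plain vector, so the result has one value per row.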
13.2 lapply()
lapply() applies a function to each element of a list or vector and returns a list of the same length.
Let's see lapply() in action on a character vector of names:
Names<-c("Manish","Saurabh", "Rahul","Krishna","Venkat")
Names_lower<-lapply(Names,tolower)
Names_lower
## [[1]]
## [1] "manish"
##
## [[2]]
## [1] "saurabh"
##
## [[3]]
## [1] "rahul"
##
## [[4]]
## [1] "krishna"
##
## [[5]]
## [1] "venkat"
Names_upper<-lapply(Names,toupper)
Names_upper
## [[1]]
## [1] "MANISH"
##
## [[2]]
## [1] "SAURABH"
##
## [[3]]
## [1] "RAHUL"
##
## [[4]]
## [1] "KRISHNA"
##
## [[5]]
## [1] "VENKAT"
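lapply() is most useful when the input is a real list whose elements differ in length. The following sketch assumes a hypothetical list called scores (not part of the original example) and computes the mean of each element.

```r
# A hypothetical list whose elements have different lengths
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# lapply returns a list with one result per input element
score_means <- lapply(scores, mean)
score_means
## $a
## [1] 2
##
## $b
## [1] 15
##
## $c
## [1] 5
```

The result keeps the element names of the input and always has the same length as the input.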
13.3 sapply()
sapply() takes a list, vector or data frame as input and, where possible, simplifies the output to a vector or matrix.
Let's use sapply() on the previous example and check the result:
Names_upper<-sapply(Names,toupper)
Names_upper
## Manish Saurabh Rahul Krishna Venkat
## "MANISH" "SAURABH" "RAHUL" "KRISHNA" "VENKAT"
As you can see from the output, sapply() returns a named character vector here rather than a list.
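When FUN returns more than one value per element, sapply() can simplify the result to a matrix instead of a vector. The sketch below uses a hypothetical list called nums and the base function range(), which returns the minimum and maximum of its input.

```r
# A hypothetical list of two integer vectors
nums <- list(a = 1:3, b = 4:6)

# range() returns two values (min and max) for each element,
# so sapply simplifies the result to a 2-row matrix
num_ranges <- sapply(nums, range)
num_ranges
##      a b
## [1,] 1 4
## [2,] 3 6

is.matrix(num_ranges)
## [1] TRUE
```

Each column corresponds to one element of the input list, and each row to one value returned by FUN.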
13.4 tapply()
tapply() applies a function to subsets of a vector, where the subsets are defined by a factor variable. Let's load the iris dataset and use tapply() on it.
data(iris)
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
Let's use tapply() to calculate the mean sepal length for each species:
tapply(iris$Sepal.Length,iris$Species,mean)
## setosa versicolor virginica
## 5.006 5.936 6.588
The iris dataset contains three species, and the call above returns the average Sepal.Length for each of them.
Similarly, we can calculate the median Sepal.Length for each of the three species:
tapply(iris$Sepal.Length,iris$Species,median)
## setosa versicolor virginica
## 5.0 5.9 6.5
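tapply() also accepts an anonymous function, just like the other members of the family. The sketch below uses small made-up data (the vector heights and the factor group are hypothetical, not from iris) so that the result is easy to verify by hand.

```r
# Hypothetical measurements and the group each one belongs to
heights <- c(150, 160, 170, 180, 165, 175)
group <- factor(c("a", "b", "a", "b", "a", "b"))

# For each group, compute the spread (max minus min) of its values
height_spread <- tapply(heights, group, function(x) max(x) - min(x))
height_spread
##  a  b
## 20 20
```

Group a contains 150, 170 and 165, and group b contains 160, 180 and 175, so both spreads are 20.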