Chapter 7 R and RStudio - You Should Know
7.1 R Project
Use an R Project to manage the workspace of each specific project. It keeps your code, data, and history together.
7.2 Apply Family for Iteration
In R, the apply family of functions (including apply(), lapply(), sapply(), tapply(), and others) is commonly used for iteration because it provides a concise way to apply a function to each element of a data structure, such as a matrix, data frame, list, or vector.
Some advantages of using the apply family of functions:
- Concise syntax
- Flexibility: the functions accept different types of input and can return different types of output; the details are covered under each function below.
- Efficiency: the functions create and fill the output object for you, which avoids the common slowdown of growing a result inside a for loop. Their speed is usually similar to a well-written, preallocated loop, so the main gain is clearer code rather than raw performance.
Details of each function:
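To illustrate the concise-syntax point, here is a minimal sketch comparing an explicit for loop with sapply(). The names x, loop_result, and apply_result are illustrative only. Both approaches produce the same named vector of means, but the loop needs its result preallocated and named by hand.

```r
# Illustrative data: a named list of numeric vectors
x <- list(a = 1:16, b = 2:17, c = 3:18)

# Explicit loop: preallocate the result, name it, then fill it element by element
loop_result <- numeric(length(x))
names(loop_result) <- names(x)
for (i in seq_along(x)) {
  loop_result[i] <- mean(x[[i]])
}

# The same computation in one call; sapply() allocates and names the result itself
apply_result <- sapply(x, mean)

all.equal(loop_result, apply_result)  # TRUE
```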
7.2.1 apply(X, MARGIN, FUN)
- Input: matrix or data frame (a data frame is converted to a matrix first)
- Output: usually a vector; a matrix or list when FUN returns more than one value per row or column
- Parameters used:
  - X: matrix or data frame
  - MARGIN: the dimension over which FUN is applied
    - 1: computation is performed on rows
    - 2: computation is performed on columns
  - FUN: the function applied to each row or column of X
Here is an example using apply():
# Define the data to be used
(m <- matrix(1:16, nrow = 4))
## [,1] [,2] [,3] [,4]
## [1,] 1 5 9 13
## [2,] 2 6 10 14
## [3,] 3 7 11 15
## [4,] 4 8 12 16
# Compute mean across rows
apply(m, MARGIN = 1, FUN = mean)
## [1] 7 8 9 10
# Using apply() with a custom anonymous function (max minus min of each row)
apply(m, 1, function(x){max(x) - min(x)})
## [1] 12 12 12 12
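The example above only uses MARGIN = 1. A minimal sketch of two other common cases with the same 4 x 4 matrix: MARGIN = 2 works on columns, and when FUN returns more than one value per row, apply() returns a matrix whose columns correspond to the rows of X. The names col_means and row_ranges are illustrative.

```r
m <- matrix(1:16, nrow = 4)

# MARGIN = 2: one result per column; same values as the built-in colMeans()
col_means <- apply(m, MARGIN = 2, FUN = mean)
col_means  # 2.5 6.5 10.5 14.5

# FUN returning two values (min and max) per row gives a 2 x 4 matrix:
# each column of the result holds the range of one row of m
row_ranges <- apply(m, MARGIN = 1, FUN = range)
dim(row_ranges)  # 2 4
```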
7.2.2 lapply(X, FUN)
- Input: list, vector, or data frame
- Output: a list with one element per element of X (one per column when X is a data frame)
- Parameters used:
  - X: list, vector, or data frame
  - FUN: the function applied to each element of X
Here is an example using lapply():
# Define the data to be used
(m <- list(a = 1:16, b = 2:17, c = 3:18))
## $a
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
##
## $b
## [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
##
## $c
## [1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
# Compute mean
lapply(m, FUN = mean)
## $a
## [1] 8.5
##
## $b
## [1] 9.5
##
## $c
## [1] 10.5
# Using lapply() with a custom anonymous function (max minus min of each element)
lapply(m, function(x){max(x) - min(x)})
## $a
## [1] 15
##
## $b
## [1] 15
##
## $c
## [1] 15
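Because a data frame is a list of columns, lapply() on a data frame applies FUN to each column. A minimal sketch with a small illustrative data frame df (not part of the original example):

```r
# Illustrative data frame with two numeric columns
df <- data.frame(x = 1:4, y = c(10, 20, 30, 40))

# One list element per column, named after the columns
col_means <- lapply(df, mean)
col_means$x  # 2.5
col_means$y  # 25
```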
7.2.3 sapply(X, FUN)
- Input: list, vector, or data frame
- Output: a vector with one value per element of X (per column for a data frame) when FUN returns a single value; otherwise a matrix or a list
- Parameters used:
  - X: list, vector, or data frame
  - FUN: the function applied to each element of X
Here is an example using sapply():
# Define the data to be used
(m <- list(a = 1:16, b = 2:17, c = 3:18))
## $a
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
##
## $b
## [1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
##
## $c
## [1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
# Compute mean
sapply(m, FUN = mean)
## a b c
## 8.5 9.5 10.5
# Using sapply() with a custom anonymous function (max minus min of each element)
sapply(m, function(x){max(x) - min(x)})
## a b c
## 15 15 15
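sapply() only returns a plain vector when every call to FUN returns a single value. A minimal sketch of the other two outcomes, reusing the list m from above (the names ranges and uneven are illustrative): equal-length results longer than one are simplified to a matrix with one column per element, and results of different lengths are left as a list, just as lapply() would return them.

```r
m <- list(a = 1:16, b = 2:17, c = 3:18)

# range() returns two values (min, max), so the result is a 2 x 3 matrix
# with columns a, b, c
ranges <- sapply(m, range)
dim(ranges)  # 2 3

# Results of different lengths (2 and 4 values) cannot be simplified,
# so a list is returned
uneven <- sapply(list(a = 1:3, b = 1:5), function(x) x[x > 1])
is.list(uneven)  # TRUE
```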
7.2.4 tapply(X, INDEX, FUN)
- Input: a vector and a factor (category) that groups its elements
- Output: one value per group, i.e. per level of the factor
- Parameters used:
  - X: the vector to be summarized
  - INDEX: factor (category) of the same length as X, defining the groups
  - FUN: the function applied to each group of X
Here is an example using tapply():
# Define the data to be used
(m <- data.frame(category = c(rep("a", 6), rep("b", 10)), x = 1:16))
## category x
## 1 a 1
## 2 a 2
## 3 a 3
## 4 a 4
## 5 a 5
## 6 a 6
## 7 b 7
## 8 b 8
## 9 b 9
## 10 b 10
## 11 b 11
## 12 b 12
## 13 b 13
## 14 b 14
## 15 b 15
## 16 b 16
# Compute mean
tapply(m$x, m$category, mean)
## a b
## 3.5 11.5
# Using tapply() with a custom anonymous function (max minus min of each group)
tapply(m$x, m$category, function(x){max(x) - min(x)})
## a b
## 5 9
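tapply() can be read as split-then-apply: split X into groups by INDEX, then apply FUN to each group. A minimal sketch, with the illustrative names by_tapply, groups, and by_split, showing that split() followed by sapply() gives the same group means for the data frame m above. The only difference is that tapply() returns a one-dimensional array rather than a plain named vector, so the comparison drops those attributes.

```r
m <- data.frame(category = c(rep("a", 6), rep("b", 10)), x = 1:16)

by_tapply <- tapply(m$x, m$category, mean)

# split() builds a list of vectors, one per level of category
groups <- split(m$x, m$category)
by_split <- sapply(groups, mean)

# Same values and group names; compare without the array attributes
all.equal(as.vector(by_tapply), unname(by_split))  # TRUE
names(by_tapply)  # "a" "b"
```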
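7.2.5 mapply(FUN, ...)
- Note: multivariate version of sapply(); FUN is applied to the first elements of every argument, then to the second elements, and so on
- Parameters used:
  - FUN: the function to apply
  - ...: the vectors or lists whose elements are passed, in parallel, as arguments to FUN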
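A minimal sketch of the element-by-element pairing, using an illustrative two-argument function and the illustrative names sums and reps. The MoreArgs argument holds arguments that stay the same in every call instead of being iterated over.

```r
# Calls: 1 + 4, 2 + 5, 3 + 6
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
sums  # 5 7 9

# MoreArgs is not iterated over:
# calls rep(x = 42, times = 1), rep(x = 42, times = 2), rep(x = 42, times = 3)
reps <- mapply(rep, times = 1:3, MoreArgs = list(x = 42))
lengths(reps)  # 1 2 3
```

Because the three calls to rep() return vectors of different lengths, mapply() cannot simplify them and returns a list.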
Here is an example using mapply():
# Replicate with rep(): rep(1, 5), rep(2, 4), rep(3, 3), rep(4, 2), rep(5, 1)
mapply(rep, 1:5, 5:1)
## [[1]]
## [1] 1 1 1 1 1
##
## [[2]]
## [1] 2 2 2 2
##
## [[3]]
## [1] 3 3 3
##
## [[4]]
## [1] 4 4
##
## [[5]]
## [1] 5
# Generate normal data with n = 10, mean = 1:10, and sd = 1
# The result has 10 columns of 10 draws each, with means 1, 2, ..., 10 and standard deviation 1
mapply(rnorm, 10, 1:10, 1)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,] 0.1681139 0.6584667 3.2807328 5.067042 3.448307 5.211605 7.842531
## [2,] 1.1080385 0.3515221 5.4544776 2.989423 4.316984 5.565511 6.252687
## [3,] -0.9060228 1.2841643 2.9825100 5.669833 4.343945 7.932494 7.840108
## [4,] 3.8985318 2.2734002 3.3420715 4.211750 5.986707 6.206896 6.486647
## [5,] 0.3848278 1.3286518 2.0723383 2.889647 4.766958 6.599815 6.495262
## [6,] -0.1232239 2.0615973 0.9929737 3.477535 4.469560 6.852267 7.522713
## [7,] 0.3686483 1.5798993 1.9524407 3.529765 5.071735 7.713393 7.610122
## [8,] 0.7177891 0.4707883 2.8152675 4.502570 5.339510 6.572835 5.871889
## [9,] 1.4219710 1.4346221 3.4233990 1.960275 6.124506 6.837030 5.860640
## [10,] 0.9581958 2.1544112 3.1208591 3.255114 4.997201 7.192779 7.730445
## [,8] [,9] [,10]
## [1,] 7.095180 8.580953 10.947258
## [2,] 8.150664 8.271066 9.022041
## [3,] 8.447255 8.183949 10.850818
## [4,] 8.257643 8.712349 9.212028
## [5,] 8.637640 9.390878 9.127796
## [6,] 7.804666 9.752961 10.136676
## [7,] 9.280642 8.698218 11.067564
## [8,] 7.663498 9.587273 10.710194
## [9,] 8.184668 8.873128 9.471393
## [10,] 7.328451 9.200917 10.930344
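Because rnorm() draws random numbers, the matrix printed above will be different on every run. A minimal sketch showing that calling set.seed() before mapply() makes the result reproducible; the seed value 123 and the names first and second are arbitrary choices for illustration. Each of the 10 calls returns 10 values, so mapply() simplifies the result to a 10 x 10 matrix with one column per mean.

```r
set.seed(123)
first <- mapply(rnorm, 10, 1:10, 1)

# Resetting the same seed reproduces exactly the same draws
set.seed(123)
second <- mapply(rnorm, 10, 1:10, 1)

identical(first, second)  # TRUE
dim(first)                # 10 10
```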