Chapter 7 R and RStudio - You Should Know

7.1 R Project

Use an R Project to manage a dedicated workspace for each piece of work. It keeps your code, data, and history together in one place.

7.2 Apply Family for Iteration

In R, the apply family of functions (including apply, lapply, sapply, tapply, and others) is commonly used for iteration because it provides a concise and efficient way to apply a function to the elements of a data structure, such as a matrix, data frame, list, or vector.

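Some advantages of using the apply family of functions:

- Concise syntax: one call replaces the loop, the counter, and the code that collects the results.
- Flexibility: the functions accept different kinds of input and return different kinds of output. The details are discussed for each function below.
- Efficiency: the apply functions allocate and fill the result for you, which avoids the common mistake of growing an object inside a loop. Their speed is usually comparable to that of a well-written explicit loop, so the main gain is shorter and less error-prone code.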
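To make the conciseness point concrete, the sketch below computes the same per-element means twice: once with an explicit `for` loop and once with `sapply()`. The list `scores` is a made-up input used only for illustration.

```r
# Hypothetical input: a small list of numeric vectors
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = c(5, 5, 5, 5))

# Explicit loop: preallocate the result, then fill it element by element
loop_means <- numeric(length(scores))
names(loop_means) <- names(scores)
for (i in seq_along(scores)) {
  loop_means[i] <- mean(scores[[i]])
}

# Apply-family version: one call, names carried over automatically
apply_means <- sapply(scores, mean)

# Compare the two results
identical(loop_means, apply_means)
```

The loop needs explicit preallocation and naming, while `sapply()` handles both as part of the call.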

Details of each function:

7.2.1 `apply(X, MARGIN, FUN)`

Input: data frame or matrix
Output: vector when FUN returns a single value per row or column (a matrix or list otherwise)
Parameters used:
- X: data frame or matrix
- MARGIN: a value of 1 or 2 that defines how the function will be executed
  - 1: computation is performed on rows
  - 2: computation is performed on columns
- FUN: the function applied to each row or column of X

Here’s an example using apply():

# Define the data to be used
(m <- matrix(c(1:16), 4))

##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16

# Compute mean across rows
apply(m, MARGIN = 1, FUN = mean)

## [1]  7  8  9 10

# Using apply() with a custom (anonymous) function
apply(m, 1, function(x){max(x) - min(x)})

## [1] 12 12 12 12
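The examples above only use `MARGIN = 1`. As a small complementary sketch, the block below applies a function to the columns with `MARGIN = 2`, and also shows what happens when FUN returns more than one value. It reuses the same 4 x 4 matrix; the variable names `col_means` and `row_ranges` are chosen here for illustration.

```r
m <- matrix(1:16, nrow = 4)

# MARGIN = 2: apply the function to each column instead of each row
col_means <- apply(m, MARGIN = 2, FUN = mean)
col_means

# When FUN returns more than one value, apply() returns a matrix
# with one column per row (or column) that was processed
row_ranges <- apply(m, MARGIN = 1, FUN = range)
row_ranges
dim(row_ranges)
```

Here `row_ranges` has two rows (the minimum and the maximum) and one column for each row of `m`, so the result is laid out "sideways" compared with the input.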

7.2.2 `lapply(X, FUN)`

Input: list or data frame
Output: list with one element per element of the input list (or per column of the input data frame)
Parameters used:
- X: data frame or list
- FUN: the function applied to elements of X

Here’s an example using lapply():

# Define the data to be used
(m <- list(a=c(1:16),b=c(2:17), c=c(3:18)))

## $a
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
## 
## $b
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
## 
## $c
##  [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18

# Compute mean
lapply(m, FUN = mean)

## $a
## [1] 8.5
## 
## $b
## [1] 9.5
## 
## $c
## [1] 10.5

# Using lapply() with a custom (anonymous) function
lapply(m, function(x){max(x) - min(x)})

## $a
## [1] 15
## 
## $b
## [1] 15
## 
## $c
## [1] 15
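The description above also lists a data frame as valid input. The following sketch shows that case: `lapply()` treats each column as one element. The data frame `df` and its column names are invented for this example.

```r
# Hypothetical data frame with two numeric columns
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# Each column is treated as one list element
col_means <- lapply(df, mean)
str(col_means)

# The result is always a list, even when every element is a single number
is.list(col_means)
length(col_means) == ncol(df)
```

Because the result is a list, individual values are extracted with `$` or `[[ ]]`, for example `col_means$height`.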

7.2.3 `sapply(X, FUN)`

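Input: list or data frame
Output: vector with one element per element of the input list (or per column of the input data frame) when FUN returns a single value; a matrix when FUN returns several values of equal length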
Parameters used:
- X: data frame or list
- FUN: the function applied to elements of X

Here’s an example using sapply():

# Define the data to be used
(m <- list(a=c(1:16),b=c(2:17), c=c(3:18)))

## $a
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
## 
## $b
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
## 
## $c
##  [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18

# Compute mean
sapply(m, FUN = mean)

##    a    b    c 
##  8.5  9.5 10.5

# Using sapply() with a custom (anonymous) function
sapply(m, function(x){max(x) - min(x)})

##  a  b  c 
## 15 15 15
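`sapply()` is essentially `lapply()` followed by an attempt to simplify the result. The sketch below illustrates that behaviour on the same list: a function that returns two values yields a matrix, and `simplify = FALSE` turns simplification off. The names `ranges` and `ranges_list` are illustrative.

```r
m <- list(a = 1:16, b = 2:17, c = 3:18)

# FUN returns two values per element, so sapply() simplifies to a matrix
# with one column per list element
ranges <- sapply(m, range)
ranges

# simplify = FALSE switches the simplification off and returns a list,
# just like lapply()
ranges_list <- sapply(m, range, simplify = FALSE)
identical(ranges_list, lapply(m, range))
```

Since the shape of the output depends on what FUN returns, it is worth checking the result with `str()` or `dim()` before using it further.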

7.2.4 `tapply(X, INDEX, FUN)`

Input: a vector and a factor/category of the same length used to group it
Output: vector with one element per category
Parameters used:
- X: vector of values to be summarised
- INDEX: factor / category that assigns each element of X to a group
- FUN: the function applied to the elements of X within each group

Here’s an example using tapply():

# Define the data to be used
(m <- data.frame(category = c(rep("a",6),rep("b",10)), x=c(1:16)))

##    category  x
## 1         a  1
## 2         a  2
## 3         a  3
## 4         a  4
## 5         a  5
## 6         a  6
## 7         b  7
## 8         b  8
## 9         b  9
## 10        b 10
## 11        b 11
## 12        b 12
## 13        b 13
## 14        b 14
## 15        b 15
## 16        b 16

# Compute mean
tapply(m$x, m$category, mean)

##    a    b 
##  3.5 11.5

# Using tapply() with a custom (anonymous) function
tapply(m$x, m$category, function(x){max(x) - min(x)})

## a b 
## 5 9
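One way to see what `tapply()` does is to reproduce it in two explicit steps: `split()` the values by category, then summarise each group with `sapply()`. The sketch below does this for the same data; `by_tapply`, `groups`, and `by_split` are names chosen for the example.

```r
m <- data.frame(category = c(rep("a", 6), rep("b", 10)), x = 1:16)

# tapply() in one step
by_tapply <- tapply(m$x, m$category, mean)

# The same computation in two explicit steps:
# split() builds one vector per category, sapply() summarises each one
groups <- split(m$x, m$category)
by_split <- sapply(groups, mean)

by_tapply
by_split
```

The two results hold the same values and names. The only difference is the container: `tapply()` returns a one-dimensional array, while `sapply()` returns a plain named vector.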

7.2.5 `mapply(FUN, PARAMETER)`

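Note: multivariate form of sapply(); FUN is called once per position, taking the first elements of all parameter vectors, then the second elements, and so on
Parameters used:
- FUN: the function to be called
- PARAMETER: one or more vectors or lists whose elements are passed to FUN in parallel (shorter ones are recycled)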

Here’s an example using mapply():

# Replicate with rep(), using the parameter vectors 1:5 and 5:1
mapply(rep, 1:5, 5:1)

## [[1]]
## [1] 1 1 1 1 1
## 
## [[2]]
## [1] 2 2 2 2
## 
## [[3]]
## [1] 3 3 3
## 
## [[4]]
## [1] 4 4
## 
## [[5]]
## [1] 5

# Generate normal data with n = 10 and mean = 1:10
# The result has 10 columns of generated data, with mean = 1, 2, 3, ..., 10 and standard deviation = 1
mapply(rnorm, 10, 1:10, 1)

##             [,1]      [,2]      [,3]     [,4]     [,5]     [,6]     [,7]
##  [1,]  0.1681139 0.6584667 3.2807328 5.067042 3.448307 5.211605 7.842531
##  [2,]  1.1080385 0.3515221 5.4544776 2.989423 4.316984 5.565511 6.252687
##  [3,] -0.9060228 1.2841643 2.9825100 5.669833 4.343945 7.932494 7.840108
##  [4,]  3.8985318 2.2734002 3.3420715 4.211750 5.986707 6.206896 6.486647
##  [5,]  0.3848278 1.3286518 2.0723383 2.889647 4.766958 6.599815 6.495262
##  [6,] -0.1232239 2.0615973 0.9929737 3.477535 4.469560 6.852267 7.522713
##  [7,]  0.3686483 1.5798993 1.9524407 3.529765 5.071735 7.713393 7.610122
##  [8,]  0.7177891 0.4707883 2.8152675 4.502570 5.339510 6.572835 5.871889
##  [9,]  1.4219710 1.4346221 3.4233990 1.960275 6.124506 6.837030 5.860640
## [10,]  0.9581958 2.1544112 3.1208591 3.255114 4.997201 7.192779 7.730445
##           [,8]     [,9]     [,10]
##  [1,] 7.095180 8.580953 10.947258
##  [2,] 8.150664 8.271066  9.022041
##  [3,] 8.447255 8.183949 10.850818
##  [4,] 8.257643 8.712349  9.212028
##  [5,] 8.637640 9.390878  9.127796
##  [6,] 7.804666 9.752961 10.136676
##  [7,] 9.280642 8.698218 11.067564
##  [8,] 7.663498 9.587273 10.710194
##  [9,] 8.184668 8.873128  9.471393
## [10,] 7.328451 9.200917 10.930344
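The calls above pass every argument by position, which makes it hard to tell which value plays which role. The sketch below is one possible way to write similar calls more explicitly, using named arguments together with the `MoreArgs` and `SIMPLIFY` arguments of `mapply()`. The seed value and the smaller sizes (3 means, 5 draws each) are arbitrary choices for this example.

```r
# Naming the vectorised arguments makes the call easier to read
reps <- mapply(rep, x = 1:3, times = 3:1)
reps

# MoreArgs holds arguments that stay the same for every call;
# here n = 5 and sd = 1 are fixed while mean varies from 1 to 3
set.seed(42)  # arbitrary seed, only so the draws can be reproduced
draws <- mapply(rnorm, mean = 1:3, MoreArgs = list(n = 5, sd = 1))
dim(draws)

# SIMPLIFY = FALSE keeps the result as a list instead of a matrix
draws_list <- mapply(rnorm, mean = 1:3, MoreArgs = list(n = 5, sd = 1),
                     SIMPLIFY = FALSE)
length(draws_list)
```

In the first call the results have different lengths, so `mapply()` cannot simplify them and returns a list. In the second call every result has length 5, so the default simplification produces a matrix with one column per mean.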
