Chapter 7 R and RStudio - You Should Know

7.1 R Project

Use an R Project (an RStudio project) to manage a dedicated workspace for each piece of work. A project keeps your code, data, and history together under one working directory.

7.2 Apply Family for Iteration

In R, the apply family of functions (apply, lapply, sapply, tapply, mapply, and others) is commonly used for iteration because it provides a concise way to apply a function to the elements of a data structure such as a matrix, data frame, list, or vector.

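Some advantages of using the apply family of functions:

  1. Concise syntax: one call replaces the loop, the index variable, and the code that collects the results.
  2. Flexibility: the functions accept different input types and return different output types. The details are covered under each function below.
  3. Efficiency: the functions allocate their result for you, which avoids the common mistake of growing an object inside a loop. Their speed is usually comparable to a well-written for loop with a preallocated result rather than dramatically faster, because the function passed as FUN is still called once per element.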
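
As a small illustration of the concise-syntax point in the list above, the sketch below computes the same squared values twice, once with an explicit for loop and once with sapply(). The names `values`, `squares_loop`, and `squares_apply` are made up for this example.

```r
values <- 1:5

# Explicit loop: preallocate the result, then fill it in by index
squares_loop <- numeric(length(values))
for (i in seq_along(values)) {
  squares_loop[i] <- values[i]^2
}

# Same computation with sapply(): no index bookkeeping needed
squares_apply <- sapply(values, function(v) v^2)

# Both approaches give the same result
identical(squares_loop, squares_apply)
## [1] TRUE
```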

Details of each function:

7.2.1 apply(X, MARGIN, FUN)

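  • Input: matrix or data frame (a data frame is converted to a matrix first)
  • Output: a vector when FUN returns a single value per row or column; a matrix when FUN returns several values of equal length; a list otherwise
  • Parameters used:
    • X: data frame or matrix
    • MARGIN: a value of 1 or 2 that defines the direction in which the function is applied
      • 1: computation is performed on each row
      • 2: computation is performed on each column
    • FUN: the function applied to each row or column of X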

Here is an example using apply():

# Define the data to be used
(m <- matrix(c(1:16), 4))
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
# Compute mean across rows
apply(m, MARGIN = 1, FUN = mean)
## [1]  7  8  9 10
# Using apply() with a custom (anonymous) function
apply(m, 1, function(x){max(x) - min(x)})
## [1] 12 12 12 12
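
The example above only uses MARGIN = 1. As a complementary sketch, the code below applies a function over columns with MARGIN = 2 and shows the shape of the result when FUN returns more than one value. The variable names `col_sums` and `row_ranges` are illustrative only.

```r
m <- matrix(1:16, nrow = 4)

# MARGIN = 2: one sum per column
col_sums <- apply(m, MARGIN = 2, FUN = sum)
col_sums
## [1] 10 26 42 58

# range() returns two values (min and max), so apply() returns a matrix
# with one column for each row of the input
row_ranges <- apply(m, MARGIN = 1, FUN = range)
row_ranges
##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    4
## [2,]   13   14   15   16
```

Note that the per-row results end up in the columns of the output, so the matrix may need t() if rows are expected.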

7.2.2 lapply(X, FUN)

  • Input: list, data frame, or vector
  • Output: a list with the same length as the input (for a data frame, one element per column)
  • Parameters used:
    • X: data frame or list
    • FUN: the function applied to each element of X (each column, for a data frame)

Here is an example using lapply():

# Define the data to be used
(m <- list(a=c(1:16),b=c(2:17), c=c(3:18)))
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
## 
## $b
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
## 
## $c
##  [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
# Compute mean
lapply(m, FUN = mean)
## $a
## [1] 8.5
## 
## $b
## [1] 9.5
## 
## $c
## [1] 10.5
# Using lapply() with a custom (anonymous) function
lapply(m, function(x){max(x) - min(x)})
## $a
## [1] 15
## 
## $b
## [1] 15
## 
## $c
## [1] 15
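
Since the input can also be a data frame, here is a minimal sketch with one, where lapply() treats each column as one element. The data frame `df` and its columns are invented for illustration. The second call shows that extra arguments placed after FUN are passed on to FUN.

```r
# A small made-up data frame
df <- data.frame(height = c(150, 160, 170), weight = c(50, 60, 70))

# One mean per column, returned as a named list
col_means <- lapply(df, mean)
col_means
## $height
## [1] 160
## 
## $weight
## [1] 60

# Arguments after FUN (here na.rm = TRUE) are forwarded to mean()
na_safe <- lapply(list(a = c(1, NA, 3)), mean, na.rm = TRUE)
na_safe
## $a
## [1] 2
```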

7.2.3 sapply(X, FUN)

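  • Input: list, data frame, or vector
  • Output: the same results as lapply(), simplified where possible: a vector when FUN returns a single value per element, a matrix when FUN returns several values of equal length, and a list otherwise
  • Parameters used:
    • X: data frame or list
    • FUN: the function applied to each element of X (each column, for a data frame)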

Here is an example using sapply():

# Define the data to be used
(m <- list(a=c(1:16),b=c(2:17), c=c(3:18)))
## $a
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
## 
## $b
##  [1]  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
## 
## $c
##  [1]  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
# Compute mean
sapply(m, FUN = mean)
##    a    b    c 
##  8.5  9.5 10.5
# Using sapply() with a custom (anonymous) function
sapply(m, function(x){max(x) - min(x)})
##  a  b  c 
## 15 15 15
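
The shape of the sapply() result depends on what FUN returns. The sketch below, reusing the same list, shows the matrix case and then vapply(), a stricter relative from base R in which the expected type and length of each result are declared up front. The names `ranges` and `spreads` are illustrative.

```r
m <- list(a = 1:16, b = 2:17, c = 3:18)

# range() returns two values per element, so sapply() simplifies to a matrix
ranges <- sapply(m, range)
ranges
##       a  b  c
## [1,]  1  2  3
## [2,] 16 17 18

# vapply() checks every result against FUN.VALUE (here: one integer)
# and stops with an error if a result does not match
spreads <- vapply(m, function(x) max(x) - min(x), FUN.VALUE = integer(1))
spreads
##  a  b  c 
## 15 15 15
```

vapply() can be a safer choice inside functions, because the type of its result does not change with the input.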

7.2.4 tapply(X, INDEX, FUN)

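  • Input: a vector and a factor (category) that assigns each value to a group
  • Output: a vector with one element per category (when FUN returns a single value)
  • Parameters used:
    • X: a vector, for example one column of a data frame
    • INDEX: a factor / category with the same length as X (anything that can be converted to a factor also works)
    • FUN: the function applied to the values of X within each category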

Here is an example using tapply():

# Define the data to be used
(m <- data.frame(category = c(rep("a",6),rep("b",10)), x=c(1:16)))
##    category  x
## 1         a  1
## 2         a  2
## 3         a  3
## 4         a  4
## 5         a  5
## 6         a  6
## 7         b  7
## 8         b  8
## 9         b  9
## 10        b 10
## 11        b 11
## 12        b 12
## 13        b 13
## 14        b 14
## 15        b 15
## 16        b 16
# Compute mean
tapply(m$x, m$category, mean)
##    a    b 
##  3.5 11.5
# Using tapply() with a custom (anonymous) function
tapply(m$x, m$category, function(x){max(x) - min(x)})
## a b 
## 5 9
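
X and INDEX do not have to come from a data frame. The following sketch uses two plain vectors, with a character vector as INDEX that tapply() converts to a factor. The data in `scores` and `group` are invented for illustration.

```r
scores <- c(80, 90, 70, 60, 100)
group  <- c("x", "y", "x", "y", "x")

# One sum per group; the result is named by the group labels
group_totals <- tapply(scores, group, sum)
group_totals
##   x   y 
## 250 150

# length() as FUN counts the observations in each group
group_sizes <- tapply(scores, group, length)
group_sizes
## x y 
## 3 2
```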

7.2.5 mapply(FUN, PARAMETER)

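  • Note: multivariate form of sapply(); FUN is called with the first elements of each argument, then the second elements, and so on
  • Parameters used:
    • FUN: the function to apply
    • PARAMETER: one or more vectors or lists (the ... arguments in R's own signature) whose elements are passed to FUN in parallel; shorter ones are recycled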

Here is an example using mapply():

# Replicate with rep(): the values 1:5 are repeated 5:1 times, pair by pair
mapply(rep, 1:5, 5:1)
## [[1]]
## [1] 1 1 1 1 1
## 
## [[2]]
## [1] 2 2 2 2
## 
## [[3]]
## [1] 3 3 3
## 
## [[4]]
## [1] 4 4
## 
## [[5]]
## [1] 5
# Generate normal data with n = 10 and mean = 1:10 (n and sd are recycled)
# The result has 10 columns of generated data, with mean = 1, 2, 3, ..., 10 and standard deviation = 1
mapply(rnorm, 10, 1:10, 1)
##             [,1]      [,2]      [,3]     [,4]     [,5]     [,6]     [,7]
##  [1,]  0.1681139 0.6584667 3.2807328 5.067042 3.448307 5.211605 7.842531
##  [2,]  1.1080385 0.3515221 5.4544776 2.989423 4.316984 5.565511 6.252687
##  [3,] -0.9060228 1.2841643 2.9825100 5.669833 4.343945 7.932494 7.840108
##  [4,]  3.8985318 2.2734002 3.3420715 4.211750 5.986707 6.206896 6.486647
##  [5,]  0.3848278 1.3286518 2.0723383 2.889647 4.766958 6.599815 6.495262
##  [6,] -0.1232239 2.0615973 0.9929737 3.477535 4.469560 6.852267 7.522713
##  [7,]  0.3686483 1.5798993 1.9524407 3.529765 5.071735 7.713393 7.610122
##  [8,]  0.7177891 0.4707883 2.8152675 4.502570 5.339510 6.572835 5.871889
##  [9,]  1.4219710 1.4346221 3.4233990 1.960275 6.124506 6.837030 5.860640
## [10,]  0.9581958 2.1544112 3.1208591 3.255114 4.997201 7.192779 7.730445
##           [,8]     [,9]     [,10]
##  [1,] 7.095180 8.580953 10.947258
##  [2,] 8.150664 8.271066  9.022041
##  [3,] 8.447255 8.183949 10.850818
##  [4,] 8.257643 8.712349  9.212028
##  [5,] 8.637640 9.390878  9.127796
##  [6,] 7.804666 9.752961 10.136676
##  [7,] 9.280642 8.698218 11.067564
##  [8,] 7.663498 9.587273 10.710194
##  [9,] 8.184668 8.873128  9.471393
## [10,] 7.328451 9.200917 10.930344
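
The calls above pass everything by position. As a hedged sketch of two other base R options, the code below uses named arguments together with MoreArgs for values that stay fixed across calls, and SIMPLIFY = FALSE to keep the result as a list. The seed value and the names `draws`, `sums_vec`, and `sums_list` are arbitrary choices for this example.

```r
# Named vectors are matched to FUN's arguments by name;
# MoreArgs holds arguments that are the same for every call
set.seed(1)  # arbitrary seed, only to make the draws repeatable
draws <- mapply(rnorm, mean = 1:3, sd = c(0.1, 1, 10), MoreArgs = list(n = 5))
dim(draws)   # 5 draws (rows) for each of the 3 mean/sd pairs (columns)
## [1] 5 3

# By default the results are simplified to a vector
sums_vec <- mapply(function(x, y) x + y, 1:3, 4:6)
sums_vec
## [1] 5 7 9

# SIMPLIFY = FALSE returns a list instead, one element per call
sums_list <- mapply(function(x, y) x + y, 1:3, 4:6, SIMPLIFY = FALSE)
```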