Merging Discussion
2021-06-03
Chapter 1 dplyr Merge Functions
The dplyr package has a suite of four functions for merging two datasets together. The syntax is borrowed from SQL joins.
1.1 Sample datasets, for illustration:
##  PT_ID Age Sex
##      1  69   M
##      2  54   F
##      3  70   F
##      4  64   M
##  PT_ID     first_line_tx
##      1           Surgery
##      2           Surgery
##      4 Surgery+ Adjuvant
##      5         Radiation
##  PT_ID Time_since_dx toxicity_grade
##      1           151              2
##      1            46              1
##      1           262              3
##      2            89              1
##      2           277              4
##      3           192              2
##      4           193              1
##      4           195              1
##      5           124              3
##      5            84              1
##  PT_ID Time_since_dx   HUS
##      4            47 1.000
##      4           193 0.933
##      4           195 0.933
##      4           361 0.877
##      5            17 0.654
##      5            84 0.654
##      5           218 0.933
##      5           273 0.741
1.2 Full join: every observation from dataset A and B (the union of both sets)
##   PT_ID Age  Sex     first_line_tx
## 1     1  69    M           Surgery
## 2     2  54    F           Surgery
## 3     3  70    F              <NA>
## 4     4  64    M Surgery+ Adjuvant
## 5     5  NA <NA>         Radiation
1.3 Inner join: only observations in both A and B (the intersection of both sets)
##   PT_ID Age Sex     first_line_tx
## 1     1  69   M           Surgery
## 2     2  54   F           Surgery
## 3     4  64   M Surgery+ Adjuvant
1.4 Left join: all observations from dataset A
##   PT_ID Age Sex     first_line_tx
## 1     1  69   M           Surgery
## 2     2  54   F           Surgery
## 3     3  70   F              <NA>
## 4     4  64   M Surgery+ Adjuvant