Merging Discussion
2021-06-03
Chapter 1 dplyr Merge Functions
The dplyr package provides a suite of four functions for merging two datasets on a shared key: full_join(), inner_join(), left_join(), and right_join(). Their names and behavior are borrowed from SQL joins.
1.1 Sample datasets, for illustration:
## PT_ID Age Sex
##     1  69   M
##     2  54   F
##     3  70   F
##     4  64   M
## PT_ID     first_line_tx
##     1           Surgery
##     2           Surgery
##     4 Surgery+ Adjuvant
##     5         Radiation
## PT_ID Time_since_dx toxicity_grade
##     1           151              2
##     1            46              1
##     1           262              3
##     2            89              1
##     2           277              4
##     3           192              2
##     4           193              1
##     4           195              1
##     5           124              3
##     5            84              1
## PT_ID Time_since_dx   HUS
##     4            47 1.000
##     4           193 0.933
##     4           195 0.933
##     4           361 0.877
##     5            17 0.654
##     5            84 0.654
##     5           218 0.933
##     5           273 0.741
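The four tables above can be rebuilt as data frames so that the joins in the following sections are reproducible. This is a minimal sketch: the object names (demographics, treatment, toxicity, hus) are hypothetical, since the original code is not shown, and the values are copied from the printed output.

```r
# Hypothetical object names; values are copied from the printed tables above.

# One row per patient: demographics
demographics <- data.frame(
  PT_ID = c(1, 2, 3, 4),
  Age   = c(69, 54, 70, 64),
  Sex   = c("M", "F", "F", "M")
)

# One row per patient: first-line treatment (no PT_ID 3, extra PT_ID 5)
treatment <- data.frame(
  PT_ID         = c(1, 2, 4, 5),
  first_line_tx = c("Surgery", "Surgery", "Surgery+ Adjuvant", "Radiation")
)

# Several rows per patient: toxicity grade at a given time since diagnosis
toxicity <- data.frame(
  PT_ID          = c(1, 1, 1, 2, 2, 3, 4, 4, 5, 5),
  Time_since_dx  = c(151, 46, 262, 89, 277, 192, 193, 195, 124, 84),
  toxicity_grade = c(2, 1, 3, 1, 4, 2, 1, 1, 3, 1)
)

# Several rows per patient: HUS value at a given time since diagnosis
hus <- data.frame(
  PT_ID         = c(4, 4, 4, 4, 5, 5, 5, 5),
  Time_since_dx = c(47, 193, 195, 361, 17, 84, 218, 273),
  HUS           = c(1.000, 0.933, 0.933, 0.877, 0.654, 0.654, 0.933, 0.741)
)
```

Note that PT_ID 3 appears only in the first table and PT_ID 5 only in the second, which is what makes the join types below produce different results.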
1.2 Full join: every observation from datasets A and B (the union of both sets)
##   PT_ID Age  Sex     first_line_tx
## 1     1  69    M           Surgery
## 2     2  54    F           Surgery
## 3     3  70    F              <NA>
## 4     4  64    M Surgery+ Adjuvant
## 5     5  NA <NA>         Radiation
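The output above is consistent with a call along the following lines. This sketch assumes that dataset A is the demographics table and dataset B is the treatment table from the sample data, and that PT_ID is the join key. The object names are hypothetical.

```r
library(dplyr)

# A: demographics, B: first-line treatment (hypothetical names, sample values)
demographics <- data.frame(
  PT_ID = c(1, 2, 3, 4),
  Age   = c(69, 54, 70, 64),
  Sex   = c("M", "F", "F", "M")
)
treatment <- data.frame(
  PT_ID         = c(1, 2, 4, 5),
  first_line_tx = c("Surgery", "Surgery", "Surgery+ Adjuvant", "Radiation")
)

# Keep every PT_ID found in either table; columns without a match become NA
full <- full_join(demographics, treatment, by = "PT_ID")
print(full)
```

Patient 3 has no treatment record and patient 5 has no demographic record, so each keeps its row with NA in the columns that come from the other table.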
1.3 Inner join: only observations in both A and B (the intersection of both sets)
##   PT_ID Age Sex     first_line_tx
## 1     1  69   M           Surgery
## 2     2  54   F           Surgery
## 3     4  64   M Surgery+ Adjuvant
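A sketch of a call that would give the result above, under the same assumptions as before: A is the demographics table, B is the treatment table, PT_ID is the key, and the object names are hypothetical.

```r
library(dplyr)

# A: demographics, B: first-line treatment (hypothetical names, sample values)
demographics <- data.frame(
  PT_ID = c(1, 2, 3, 4),
  Age   = c(69, 54, 70, 64),
  Sex   = c("M", "F", "F", "M")
)
treatment <- data.frame(
  PT_ID         = c(1, 2, 4, 5),
  first_line_tx = c("Surgery", "Surgery", "Surgery+ Adjuvant", "Radiation")
)

# Keep only PT_IDs present in both tables; unmatched rows are dropped
inner <- inner_join(demographics, treatment, by = "PT_ID")
print(inner)
```

Patients 3 and 5 each appear in only one table, so neither survives the inner join and the result contains no NA values.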
1.4 Left join: all observations from dataset A
##   PT_ID Age Sex     first_line_tx
## 1     1  69   M           Surgery
## 2     2  54   F           Surgery
## 3     3  70   F              <NA>
## 4     4  64   M Surgery+ Adjuvant
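The left join output above can be reproduced with a sketch like this one, again assuming that A is the demographics table, B is the treatment table, and PT_ID is the key. The object names are hypothetical.

```r
library(dplyr)

# A: demographics, B: first-line treatment (hypothetical names, sample values)
demographics <- data.frame(
  PT_ID = c(1, 2, 3, 4),
  Age   = c(69, 54, 70, 64),
  Sex   = c("M", "F", "F", "M")
)
treatment <- data.frame(
  PT_ID         = c(1, 2, 4, 5),
  first_line_tx = c("Surgery", "Surgery", "Surgery+ Adjuvant", "Radiation")
)

# Keep every row of A; add B's columns where PT_ID matches, NA otherwise
left <- left_join(demographics, treatment, by = "PT_ID")
print(left)
```

Every row of A is retained, so patient 3 stays with a missing first_line_tx, while patient 5, who exists only in B, is dropped.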