Chapter 2 Advantages of dplyr suite
2.1 Advantages
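- Intuitive naming conventions
- Refuses to join on identifiers of different data classes (for example, numeric vs. character), which guards against silent mismatches in identifiers such as MRNs. In the output below, the first join succeeds; the second fails because `PT_ID` is numeric in one table and character in the other.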
##   PT_ID Age Sex     first_line_tx
## 1     1  69   M           Surgery
## 2     2  54   F           Surgery
## 3     4  64   M Surgery+ Adjuvant
## Error: Can't join on `x$PT_ID` x `y$PT_ID` because of incompatible types.
## i `x$PT_ID` is of type <double>>.
## i `y$PT_ID` is of type <character>>.
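The type check above can be reproduced and worked around with a minimal sketch like the following. The data values are reconstructed from the printed output in this chapter; how the original `treatment` table ended up with a character `PT_ID` is not shown, so the construction here is an assumption. The key is coerced to character rather than numeric because real MRNs can carry leading zeros, which a numeric conversion would drop.

```r
library(dplyr)

# Values reconstructed from the output shown in this chapter
patient <- data.frame(
  PT_ID = c(1, 2, 3, 4),                  # numeric (double) identifier
  Age   = c(69, 54, 70, 64),
  Sex   = c("M", "F", "F", "M")
)
treatment <- data.frame(
  PT_ID = c("1", "2", "4", "5"),          # character identifier (assumed)
  first_line_tx = c("Surgery", "Surgery", "Surgery+ Adjuvant", "Radiation")
)

# dplyr refuses to join a double key to a character key
result <- try(inner_join(patient, treatment, by = "PT_ID"), silent = TRUE)
join_failed <- inherits(result, "try-error")

# Coerce both keys to the same type, then join
patient_chr <- mutate(patient, PT_ID = as.character(PT_ID))
joined <- inner_join(patient_chr, treatment, by = "PT_ID")
print(joined)
```

Making the coercion explicit keeps the decision about the identifier's type in your hands instead of leaving it to an implicit conversion.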
2.2 Disadvantages
- Doesn’t allow merging if one of your source datasets has multiple columns with the same name
 
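# Append two doctor columns to the existing patient data frame
patient <- data.frame(patient,
                      c("Andrews", "Benson", "Cho", "Doherty"),
                      c("", "", "", ""))
# Deliberately give both new columns the same name
names(patient)[4:5] <- "Doctor"
print(patient)
##   PT_ID Age Sex  Doctor Doctor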
## 1     1  69   M Andrews       
## 2     2  54   F  Benson       
## 3     3  70   F     Cho       
## 4     4  64   M Doherty
Attempting to join this data frame now fails:
## Error: Input columns in `x` must be unique.
## x Problem with `Doctor`.
However, we can clean this up using the janitor package:
library(janitor)
patient <- clean_names(patient) # note: this also converts your variable names to lower case
full_join(patient, treatment, by = c("pt_id" = "PT_ID"))
##   pt_id age  sex  doctor doctor_2     first_line_tx
## 1     1  69    M Andrews                    Surgery
## 2     2  54    F  Benson                    Surgery
## 3     3  70    F     Cho                       <NA>
## 4     4  64    M Doherty          Surgery+ Adjuvant
## 5     5  NA <NA>    <NA>     <NA>         Radiation
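If janitor is not available, a similar repair can be sketched with base R's `make.unique()`, which appends a numeric suffix to repeated names. This is an alternative shown for illustration, not the approach used in this chapter; the data values are reconstructed from the output above, and `treatment` is assumed to have a numeric `PT_ID` here.

```r
library(dplyr)

# Patient table with two columns that share the name "Doctor"
patient <- data.frame(
  PT_ID = c(1, 2, 3, 4),
  Age   = c(69, 54, 70, 64),
  Sex   = c("M", "F", "F", "M"),
  c("Andrews", "Benson", "Cho", "Doherty"),
  c("", "", "", "")
)
names(patient)[4:5] <- "Doctor"

treatment <- data.frame(
  PT_ID = c(1, 2, 4, 5),                  # numeric key (assumed)
  first_line_tx = c("Surgery", "Surgery", "Surgery+ Adjuvant", "Radiation")
)

# Check for repeated column names before joining
has_dupes <- anyDuplicated(names(patient)) > 0

# Make the names unique: the second "Doctor" becomes "Doctor.1"
names(patient) <- make.unique(names(patient))

joined <- full_join(patient, treatment, by = "PT_ID")
print(joined)
```

Unlike `clean_names()`, this leaves the case of the column names unchanged, so the join key can stay as `PT_ID` on both sides.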