Chapter 2 Advantages of the dplyr suite
2.1 Advantages
- Intuitive naming conventions
- Won't join on identifier columns of different data classes, such as a numeric ID against a character ID (mismatched classes can cause issues with identifiers such as MRNs)
## PT_ID Age Sex first_line_tx
## 1 1 69 M Surgery
## 2 2 54 F Surgery
## 3 4 64 M Surgery+ Adjuvant
## Error: Can't join on `x$PT_ID` x `y$PT_ID` because of incompatible types.
## i `x$PT_ID` is of type <double>>.
## i `y$PT_ID` is of type <character>>.
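The type check can be reproduced with a minimal sketch. The two data frames below are hypothetical stand-ins reconstructed from the printed output in this chapter, and converting the key to character is an assumption about what suits identifiers such as MRNs, not a rule. The exact wording of the error also varies between dplyr versions.

```r
library(dplyr)

# Hypothetical data, reconstructed from the printed output in this chapter
patient <- data.frame(
  PT_ID = c(1, 2, 3, 4),
  Age   = c(69, 54, 70, 64),
  Sex   = c("M", "F", "F", "M")
)
treatment <- data.frame(
  PT_ID         = c("1", "2", "4", "5"),  # character, unlike patient$PT_ID
  first_line_tx = c("Surgery", "Surgery", "Surgery+ Adjuvant", "Radiation")
)

# dplyr stops instead of guessing how to match double and character keys;
# tryCatch() captures the error so the script can carry on
join_attempt <- tryCatch(
  inner_join(patient, treatment, by = "PT_ID"),
  error = function(e) e
)
inherits(join_attempt, "error")

# Make the classes agree explicitly (character is assumed here), then join
patient_chr <- mutate(patient, PT_ID = as.character(PT_ID))
joined <- inner_join(patient_chr, treatment, by = "PT_ID")
print(joined)
```

Doing the conversion yourself keeps the decision about the identifier's class visible in the script rather than hidden inside the join.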
2.2 Disadvantages
- Doesn’t allow merging if one of your source datasets has multiple columns with the same name
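# Add two doctor columns to patient, then give both the same name
patient <- data.frame(patient,
                      c("Andrews", "Benson", "Cho", "Doherty"),
                      c("", "", "", ""))
names(patient)[4:5] <- "Doctor"  # columns 4 and 5 are now both called "Doctor"
print(patient)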
## PT_ID Age Sex Doctor Doctor
## 1 1 69 M Andrews
## 2 2 54 F Benson
## 3 3 70 F Cho
## 4 4 64 M Doherty
## Error: Input columns in `x` must be unique.
## x Problem with `Doctor`.
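A minimal sketch of how this error can arise, and how duplicated names can be spotted before joining. The data frames are hypothetical stand-ins for the ones in this chapter, and `dup_names` is an illustrative variable name.

```r
library(dplyr)

# Hypothetical stand-ins; check.names = FALSE lets data.frame() keep
# the duplicated "Doctor" name instead of repairing it
patient <- data.frame(
  PT_ID  = c(1, 2, 3, 4),
  Doctor = c("Andrews", "Benson", "Cho", "Doherty"),
  Doctor = c("", "", "", ""),
  check.names = FALSE
)
treatment <- data.frame(
  PT_ID         = c(1, 2, 4, 5),
  first_line_tx = c("Surgery", "Surgery", "Surgery+ Adjuvant", "Radiation")
)

# List every column name that occurs more than once
dup_names <- unique(names(patient)[duplicated(names(patient))])
print(dup_names)

# The join is refused while the duplicates remain;
# tryCatch() captures the error so the script can carry on
join_attempt <- tryCatch(
  full_join(patient, treatment, by = "PT_ID"),
  error = function(e) e
)
inherits(join_attempt, "error")
```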
However, we can clean this up using the janitor package:
library(janitor)
patient <- clean_names(patient) # note: this converts your variable names to lower-case snake_case and numbers the duplicate (doctor, doctor_2)
full_join(patient, treatment, by = c("pt_id" = "PT_ID"))
## pt_id age sex doctor doctor_2 first_line_tx
## 1 1 69 M Andrews Surgery
## 2 2 54 F Benson Surgery
## 3 3 70 F Cho <NA>
## 4 4 64 M Doherty Surgery+ Adjuvant
## 5 5 NA <NA> <NA> <NA> Radiation
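Because clean_names() lower-cases the names, the join above has to map `pt_id` to `PT_ID`. If keeping the original case matters, one possible alternative is base R's `make.unique()`, sketched here with hypothetical stand-in data. It only renames the repeated columns and does no other tidying.

```r
library(dplyr)

# Hypothetical stand-ins; check.names = FALSE keeps the duplicated name
patient <- data.frame(
  PT_ID  = c(1, 2, 3, 4),
  Doctor = c("Andrews", "Benson", "Cho", "Doherty"),
  Doctor = c("", "", "", ""),
  check.names = FALSE
)
treatment <- data.frame(
  PT_ID         = c(1, 2, 4, 5),
  first_line_tx = c("Surgery", "Surgery", "Surgery+ Adjuvant", "Radiation")
)

# make.unique() appends a numbered suffix to repeated names and leaves case alone
names(patient) <- make.unique(names(patient), sep = "_")
print(names(patient))

# With unique names and a shared key name, the join needs no name mapping
joined <- full_join(patient, treatment, by = "PT_ID")
print(joined)
```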