13.14 anti_join(): Merging data frames

  • Classic base R approach: merge() (see Quick-R)
  • dplyr's join functions are typically faster than merge(), especially on large data frames
  • inner_join(x, y, by = NULL, copy = FALSE, ...) # all intersecting observations
  • left_join(x, y, by = NULL, copy = FALSE, ...) # all observations from x
  • semi_join(x, y, by = NULL, copy = FALSE, ...) # all observations from x that have a match in y (only the columns of x are kept)
  • anti_join(x, y, by = NULL, copy = FALSE, ...) # all observations from x that have no match in y
    • x = first data set, y = second data set
    • by = "matchingvariable" or by = c("var1", "var2")
    • ?join: Check out the corresponding help file
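
The differences between the four joins are easiest to see on a tiny example. The following is a minimal sketch, assuming dplyr is installed; the data frames x, y and y2 and their columns (id, key, a, b) are made up for illustration and do not come from the course data.

```r
library(dplyr)

# Hypothetical toy data: 'id' is the key shared by both data frames
x <- data.frame(id = c(1, 2, 3), a = c("x1", "x2", "x3"))
y <- data.frame(id = c(2, 3, 4), b = c("y2", "y3", "y4"))

inner <- inner_join(x, y, by = "id")  # ids 2 and 3; columns id, a, b
left  <- left_join(x, y, by = "id")   # ids 1, 2, 3; b is NA for id 1
semi  <- semi_join(x, y, by = "id")   # ids 2 and 3; columns id, a only
anti  <- anti_join(x, y, by = "id")   # id 1 only; columns id, a only

# If the key has different names in x and y, use a named vector:
# by = c("name_in_x" = "name_in_y")
y2 <- data.frame(key = c(2, 3, 4), b = c("y2", "y3", "y4"))
inner2 <- inner_join(x, y2, by = c("id" = "key"))
```

Note that semi_join() and anti_join() only filter the rows of x and never add columns from y, whereas inner_join() and left_join() do add them.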


13.14.1 Example: Merging data frames

getwd()

# An example for merging


nrow(swiss)
swiss2 <- cbind(row.names(swiss), swiss)
names(swiss2)[1] <- "region"
nrow(swiss2)
View(swiss2)

# Generate two data frames that each contain a subset of the observations and
# variables, plus the region variable
# Q: What new datasets do I generate below?
swiss.a <- swiss2[1:8,1:3]
View(swiss.a)
swiss.b <- swiss2[c(1, 6:7, 12),c(1,4:5)]
View(swiss.b)

intersect(swiss.a$region, swiss.b$region) # check intersection, i.e. 
# which regions appear in both data sets?

# MERGING OF THE TWO DATA FRAMES
library(dplyr) # is the package installed?
?inner_join
swiss.inner <- inner_join(swiss.a, swiss.b, by = "region")
View(swiss.a)
View(swiss.inner) # data set with observations that intersect across both data sets

swiss.left <- left_join(swiss.a, swiss.b, by = "region")
View(swiss.left) # all observations in x = swiss.a

# TRY semi_join(), anti_join() FOR YOURSELF!
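
To check your own attempt, here is one possible sketch of the two remaining joins. It rebuilds the objects from the example so that it runs on its own; the names swiss.semi and swiss.anti are chosen here for illustration only.

```r
library(dplyr)

# Rebuild the objects from the example above so this block is self-contained
swiss2 <- cbind(row.names(swiss), swiss)
names(swiss2)[1] <- "region"
swiss.a <- swiss2[1:8, 1:3]
swiss.b <- swiss2[c(1, 6:7, 12), c(1, 4:5)]

# semi_join(): rows of swiss.a that have a match in swiss.b;
# only the columns of swiss.a are returned
swiss.semi <- semi_join(swiss.a, swiss.b, by = "region")
nrow(swiss.semi)   # 3 (rows 1, 6 and 7 of swiss2 appear in both data sets)
names(swiss.semi)  # "region" "Fertility" "Agriculture"

# anti_join(): rows of swiss.a that have no match in swiss.b
swiss.anti <- anti_join(swiss.a, swiss.b, by = "region")
nrow(swiss.anti)   # 5
```

Together, the semi join and the anti join split swiss.a into two non-overlapping parts, so their row counts add up to nrow(swiss.a).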


13.14.2 Exercise: Merging data frames

  1. Use the code from the previous example to create swiss2 and then the objects swiss.a and swiss.b.
  2. Create a third object swiss.c that includes the first 10 regions in the swiss data set and their values on the variables region, Catholic and Infant.Mortality.
  3. Merge/join swiss.c with swiss.a so that the merged data set contains all the regions that appear in at least one of the two data sets.


13.14.3 Solution: Merging data frames
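
One possible solution is sketched below; it is not necessarily the intended one. It assumes dplyr's full_join(), which is not listed above but keeps all observations that appear in either data set. The object names swiss.full and swiss.merged are chosen here for illustration.

```r
library(dplyr)

# Step 1: recreate swiss2, swiss.a and swiss.b as in the example
swiss2 <- cbind(row.names(swiss), swiss)
names(swiss2)[1] <- "region"
swiss.a <- swiss2[1:8, 1:3]
swiss.b <- swiss2[c(1, 6:7, 12), c(1, 4:5)]

# Step 2: first 10 regions with the variables region, Catholic, Infant.Mortality
swiss.c <- swiss2[1:10, c("region", "Catholic", "Infant.Mortality")]

# Step 3: keep every region that appears in at least one of the two data sets
swiss.full <- full_join(swiss.c, swiss.a, by = "region")
dim(swiss.full)  # 10 rows, 5 columns

# Base R alternative: merge() with all = TRUE also keeps unmatched rows
swiss.merged <- merge(swiss.c, swiss.a, by = "region", all = TRUE)
```

Because the 8 regions in swiss.a are all among the 10 regions in swiss.c, the result has 10 rows, and Fertility and Agriculture are NA for the two regions that are missing from swiss.a.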