2.3 The pipe operator

2.3.1 One way to write code

So far, we’ve learned (among other things) how to read a .csv file and assign it to an object, how to transform variables with the mutate function, how to drop variables (columns) from our dataset with the select function, how to rename variables with the rename function, and how to drop observations (rows) from our dataset with the filter function:

airbnb <- read_csv("tomslee_airbnb_belgium_1454_2017-07-14.csv")
airbnb <- mutate(airbnb, room_id_F = factor(room_id), host_id_F = factor(host_id), overall_satisfaction_100 = overall_satisfaction * 20)
airbnb <- select(airbnb, -country, -survey_id)
airbnb <- rename(airbnb, country = city, city = borough)
airbnb <- filter(airbnb, city %in% c("Brussel","Antwerpen","Gent","Charleroi","Liege","Brugge","Namur","Leuven","Mons","Aalst")) 

When reading this code, we see that each line overwrites the airbnb object. There’s nothing fundamentally wrong with this way of writing, but it is repetitive: each of the last four lines consists of an assignment (airbnb <-) and a function (mutate, select, rename, or filter) whose first argument is always the same (the airbnb object created on the previous line).

2.3.2 A better way to write code

There’s a more elegant way to write code. It uses an operator called the pipe, which lets us rewrite our usual sequence of operations:

airbnb <- read_csv("tomslee_airbnb_belgium_1454_2017-07-14.csv")
airbnb <- mutate(airbnb, room_id_F = factor(room_id), host_id_F = factor(host_id), overall_satisfaction_100 = overall_satisfaction * 20)
airbnb <- select(airbnb, -country, -survey_id)
airbnb <- rename(airbnb, country = city, city = borough)
airbnb <- filter(airbnb, city %in% c("Brussel","Antwerpen","Gent","Charleroi","Liege","Brugge","Namur","Leuven","Mons","Aalst")) 

as:

airbnb <- read_csv("tomslee_airbnb_belgium_1454_2017-07-14.csv") %>% 
  mutate(room_id_F = factor(room_id), host_id_F = factor(host_id), overall_satisfaction_100 = overall_satisfaction * 20) %>% 
  select(-country, -survey_id) %>% 
  rename(country = city, city = borough) %>% 
  filter(city %in% c("Brussel","Antwerpen","Gent","Charleroi","Liege","Brugge","Namur","Leuven","Mons","Aalst")) 

This can be read in a natural way: “read the csv file, then mutate, then select, then rename, then filter.” We start off by reading a .csv file. Instead of storing the result in an intermediate object, we pass it as the first argument to the mutate function using the pipe operator: %>%. It’s a good idea to learn the keyboard shortcut for %>% by heart: Ctrl + Shift + M in RStudio (Cmd + Shift + M on a Mac).

The mutate function takes the same arguments as above (create room_id_F, which should be a factorization of room_id, etc.), but we no longer need to provide the first argument (the dataset we want mutate to operate on). That first argument is the data frame resulting from reading the .csv file on the previous line, and the pipe operator passes it on to mutate automatically. In general, the pipe operator takes the output of whatever is on its left side and provides it as the first argument to the function call on its right side (here, the next line of code).
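To make the equivalence concrete, here is a minimal sketch on a small made-up data frame (the listings object and its values are assumptions for illustration, not part of the Airbnb dataset). It assumes the dplyr package is installed, and compares a call with an explicit first argument to the same call written with the pipe:

```r
library(dplyr)

# Made-up toy data for illustration only (not the Airbnb dataset)
listings <- tibble(
  room_id = c(1, 2, 3),
  overall_satisfaction = c(4.5, 5, 3)
)

# Without the pipe: the data frame is the explicit first argument
explicit <- mutate(listings, overall_satisfaction_100 = overall_satisfaction * 20)

# With the pipe: the left-hand side is passed as the first argument of mutate
piped <- listings %>%
  mutate(overall_satisfaction_100 = overall_satisfaction * 20)

# Both ways of writing should give the same result
identical(explicit, piped)
```

If the two objects are the same, identical returns TRUE, which is a quick way to convince yourself that the pipe only changes how the code is written, not what it does.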

After creating new variables with mutate, we drop some variables with select. Again, the select function takes the same arguments as above (drop country and survey_id), but we don’t provide the first argument (which dataset should we drop variables from), because it is already provided by the pipe on the previous line. We continue in the same manner and rename some variables with rename and drop some observations with filter.

Writing code with the pipe operator exploits the similar structure of mutate, select, rename, and filter, which are the most important functions for data manipulation. The first argument of each of these functions is the data frame on which it should operate. This first argument can now be left out, because it is provided by the pipe operator. In the remainder of this tutorial, we will write code using the pipe operator because it considerably improves the readability of our code.
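The same reasoning suggests that any function taking a data frame as its first argument can be used in a pipeline. The sketch below illustrates this with a made-up data frame and a hypothetical helper, add_satisfaction_100, which is not part of any package and is defined here only for the example. It assumes dplyr is installed:

```r
library(dplyr)

# Made-up data for illustration only
listings <- tibble(
  city = c("Gent", "Gent", "Namur"),
  overall_satisfaction = c(4.5, 3, 5)
)

# Hypothetical helper: like mutate or filter, it takes the data frame first
add_satisfaction_100 <- function(data) {
  mutate(data, overall_satisfaction_100 = overall_satisfaction * 20)
}

# The pipe supplies the data frame to each step in turn
satisfied <- listings %>%
  add_satisfaction_100() %>%
  filter(overall_satisfaction_100 >= 80)
```

Because the helper follows the same "data frame first" convention, it can sit between the other steps without any special handling.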