2.3 The pipe operator
2.3.1 One way to write code
So far, we’ve learned (among other things) how to read a .csv file and assign it to an object, how to transform variables with the mutate function, how to drop variables (columns) from our dataset with the select function, how to rename variables with the rename function, and how to drop observations (rows) from our dataset with the filter function:
airbnb <- read_csv("tomslee_airbnb_belgium_1454_2017-07-14.csv")
airbnb <- mutate(airbnb, room_id_F = factor(room_id), host_id_F = factor(host_id), overall_satisfaction_100 = overall_satisfaction * 20)
airbnb <- select(airbnb, -country, -survey_id)
airbnb <- rename(airbnb, country = city, city = borough)
airbnb <- filter(airbnb, city %in% c("Brussel","Antwerpen","Gent","Charleroi","Liege","Brugge","Namur","Leuven","Mons","Aalst"))
When reading this code, we see that each line overwrites the airbnb object. There’s nothing fundamentally wrong with this way of writing, but we’re repeating elements of code: the last four lines each consist of an assignment (airbnb <-) and a function (mutate, select, rename, filter) whose first argument is the same (the airbnb object created on the previous line).
2.3.2 A better way to write code
There’s a more elegant way to write code. It involves an operator called the pipe. It allows us to re-write our usual sequence of operations:
airbnb <- read_csv("tomslee_airbnb_belgium_1454_2017-07-14.csv")
airbnb <- mutate(airbnb, room_id_F = factor(room_id), host_id_F = factor(host_id), overall_satisfaction_100 = overall_satisfaction * 20)
airbnb <- select(airbnb, -country, -survey_id)
airbnb <- rename(airbnb, country = city, city = borough)
airbnb <- filter(airbnb, city %in% c("Brussel","Antwerpen","Gent","Charleroi","Liege","Brugge","Namur","Leuven","Mons","Aalst"))
as:
airbnb <- read_csv("tomslee_airbnb_belgium_1454_2017-07-14.csv") %>%
  mutate(room_id_F = factor(room_id), host_id_F = factor(host_id), overall_satisfaction_100 = overall_satisfaction * 20) %>%
  select(-country, -survey_id) %>%
  rename(country = city, city = borough) %>%
  filter(city %in% c("Brussel","Antwerpen","Gent","Charleroi","Liege","Brugge","Namur","Leuven","Mons","Aalst"))
This can be read in a natural way: “read the csv file, then mutate, then select, then rename, then filter”. We start off by reading a .csv file. Instead of storing it in an intermediate object, we provide it as the first argument to the mutate function using the pipe operator: %>%. It’s a good idea to learn the RStudio keyboard shortcut for %>% by heart: Ctrl + Shift + M. The mutate function takes the same arguments as above (create room_id_F, which should be a factorization of room_id, etc.), but we no longer need to provide the first argument (the dataset we want mutate to operate on). The first argument would be the data frame resulting from reading the .csv file on the previous line, but the pipe operator automatically passes it on as the first argument to mutate. The pipe operator takes the output of what’s on its left side and provides it as the first argument to what is on its right side (i.e., the next line of code).
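To make the "first argument" rule concrete, here is a minimal sketch that runs the same mutate call with and without the pipe. The toy data frame and its columns (id, score) are made up for illustration and are not part of the airbnb dataset; the sketch assumes the dplyr package is installed.

```r
library(dplyr)  # loading dplyr also makes the %>% operator available

# A tiny made-up data frame, used only for illustration
toy <- data.frame(id = 1:3, score = c(2, 4, 5))

# Without the pipe: the data frame is written out as the first argument
without_pipe <- mutate(toy, score_100 = score * 20)

# With the pipe: toy is passed on as the first argument of mutate
with_pipe <- toy %>% mutate(score_100 = score * 20)

# Both calls should produce the same data frame
all.equal(without_pipe, with_pipe)
```

In other words, x %>% f(y) is simply another way of writing f(x, y).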
After creating new variables with mutate, we drop some variables with select. Again, the select function takes the same arguments as above (drop country and survey_id), but we don’t provide the first argument (the dataset we should drop variables from), because it is already provided by the pipe on the previous line. We continue in the same manner and rename some variables with rename and drop some observations with filter.
Writing code with the pipe operator exploits the similar structure of mutate, select, rename, and filter, which are the most important functions for data manipulation. The first argument of each of these functions is the data frame on which it should operate. This first argument can now be left out, because it is provided by the pipe operator. In the remainder of this tutorial, we will write code using the pipe operator because it considerably improves the readability of our code.
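For readers who want to try this without the .csv file at hand, the following sketch applies the same four functions to a small stand-in data frame, once step by step and once with the pipe. The listings object and its values are hypothetical and only loosely mimic the structure of the airbnb data; the sketch assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-in for the airbnb data (values are made up)
listings <- data.frame(
  room_id = c(101, 102, 103, 104),
  country = NA,
  city = "Belgium",
  borough = c("Gent", "Brussel", "Gent", "Oostende"),
  overall_satisfaction = c(4.5, 5, 3.5, 4)
)

# Step by step: every line repeats the assignment and the first argument
step <- mutate(listings, overall_satisfaction_100 = overall_satisfaction * 20)
step <- select(step, -country)
step <- rename(step, country = city, city = borough)
step <- filter(step, city %in% c("Brussel", "Gent"))

# With the pipe: each result is passed on to the next function
piped <- listings %>%
  mutate(overall_satisfaction_100 = overall_satisfaction * 20) %>%
  select(-country) %>%
  rename(country = city, city = borough) %>%
  filter(city %in% c("Brussel", "Gent"))

# Both versions should give the same result
all.equal(step, piped)
```

The piped version never mentions an intermediate object, which is what makes it shorter and easier to read from top to bottom.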