Chapter 12 Functions

Hello! In this tutorial, we’ll be talking about how to create functions.

Thus far, you have worked with many functions from packages and base R. But in some cases, the functions you want to use are ones that you create yourself. Writing functions is a basic building block of software development in R. After all, if you think about the packages that we use, they’re really just specialized collections of related functions (this is true of tidyverse packages, but also of the other packages we will be learning about this semester).

Let’s begin by loading tidyverse and janitor.

options(scipen=999) #prevents scientific notation in R
library(tidyverse)
library(janitor)

For this tutorial, we will again use the Barbenheimer dataset. Let’s bring that into our environment.

data_fb <- read_csv("data/crowdtangle_barbenheimer_2023.csv") |> clean_names()
## Warning: One or more parsing issues, call `problems()` on your data frame for details, e.g.:
##   dat <- vroom(...)
##   problems(dat)

If you recall from our class, functions take the following structure.

name_of_function <- function(argument){
  #stuff you want to code
  #sometimes a return() function
}
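As a minimal sketch of that structure, here is a small function with one argument and an explicit return(). The name shout() and its behavior are made up for illustration; they are not part of any package used in this tutorial.

```r
# Hypothetical example: turn text into upper case and add an exclamation mark
shout <- function(text) {
  loud_text <- toupper(text)      # the "stuff you want to code"
  return(paste0(loud_text, "!"))  # hand the result back to the caller
}

shout("barbenheimer")
## [1] "BARBENHEIMER!"
```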

Functions can be useful for automating processes or steps that you do repeatedly. For example, if you are working with Facebook data, there may be a set of things you like to do to clean your dataset. Perhaps you want to remove all the URLs in the text (since they are sometimes saved in a separate column), and you might also want to lowercase your text. We can build a function called text_cleaner() to do this:

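text_cleaner <- function(post) {
  post |>
    # drop a URL that follows a space, plus anything after it
    str_replace_all(" https?://.*", "") |>
    # convert the remaining text to lowercase
    tolower()
}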

Notice how I don’t have a return() line? An R function returns the value of the last expression it evaluates, so here it returns the result at the end of the pipe. This pipe has two steps: removing URLs (along with any text that follows them) and converting the text to lowercase.

data_fb$message[99]
## [1] "2 dias para Barbenheimer. Nos vemos en ig: https://instagram.com/soypablo_robles?igshid=MjEwN2IyYWYwYw=="
text_cleaner(data_fb$message[99])
## [1] "2 dias para barbenheimer. nos vemos en ig:"
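If you prefer to spell things out, the same cleaning steps can be written with intermediate variables and an explicit return(). The sketch below is one possible way to do that; text_cleaner_explicit() and the sample post are made up for illustration and are not from the dataset.

```r
library(stringr)

# Hypothetical rewrite of the cleaner with an explicit return()
text_cleaner_explicit <- function(post) {
  cleaned <- str_replace_all(post, " https?://.*", "")  # remove the URL
  cleaned <- tolower(cleaned)                           # lowercase the rest
  return(cleaned)
}

# Made-up post (not from the Barbenheimer data)
sample_post <- "See you at the Movies: https://example.com/tickets"
text_cleaner_explicit(sample_post)
## [1] "see you at the movies:"
```

Both styles give the same result, so which one to use is mostly a matter of readability.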

12.1 Functions + apply

Looks like it’s working! But how do we use this function across the whole data frame? We’ll learn more about that in the next chapter, but here’s a preview using lapply(), which applies a function to each element of a vector or list and returns the results as a list (the l in lapply stands for list).

data_fb$message_clean <- lapply(data_fb$message, text_cleaner)
head(data_fb$message_clean)
## [[1]]
## [1] "check out our emo spotify playlist!"
## 
## [[2]]
## [1] "<U+00A1>quiere ser ken! <U+0001F855> cillian murphy (oppenheimer) coment<U+00F3> que esta abierto a interpretar a ken en una futura secuela de 'barbie' <U+0001F855>"
## 
## [[3]]
## [1] "the true barbenheimer experience"
## 
## [[4]]
## [1] "<U+00A1>barbenheimer es real! en un cine de estados unidos durante una funci<U+00F3>n de 'oppenheimer' hubo un fallo en el proyector y en los <U+00FA>ltimos 20 minutos la mitad de la pantalla se volvi<U+00F3> rosa <U+0001F481> el barbie x oppenheimer se volvi<U+00F3> canon."
## 
## [[5]]
## [1] "barbenheimer <U+0E2A><U+0E21><U+0E31><U+0E22><U+0E15><U+0E2D><U+0E19><U+0E2D><U+0E22><U+0E39><U+0E48> gotham <U+0001F4A2>"
## 
## [[6]]
## [1] "barbenheimer 2023. <U+0001F8A2><U+0001F0A2> <U+0001F0A2>: @justralphy"
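Because lapply() always returns a list, the output above is printed with [[1]], [[2]], and so on, and the new column is stored as a list. Since str_replace_all() and tolower() both work on whole character vectors, one possible alternative is to call the function on the vector directly. The sketch below uses two made-up posts rather than the dataset, and redefines the cleaner so the example runs on its own.

```r
library(stringr)

# Same cleaner as above, repeated so this example is self-contained
text_cleaner <- function(post) {
  post |>
    str_replace_all(" https?://.*", "") |>
    tolower()
}

# Made-up posts for illustration
posts <- c("Barbie first: https://example.com/a", "OPPENHEIMER second")

as_list <- lapply(posts, text_cleaner)  # a list with one element per post
as_vector <- text_cleaner(posts)        # a plain character vector

as_vector
## [1] "barbie first:"      "oppenheimer second"
identical(unlist(as_list), as_vector)
## [1] TRUE
```

A character vector is usually easier to work with as a data frame column than a list, so this may be worth trying when the function you wrote is already vectorized.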

Note that to pass an optional argument through an apply-family function such as lapply(), you just need to include it at the end, after the name of the function you are applying. This works because lapply() passes any additional arguments on to that function (for example, a version of the cleaner that accepts an optional argument). If it’s an argument your custom function doesn’t recognize, it will spit out an error.
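Here is a minimal sketch of passing an extra argument through lapply(). The function text_cleaner_v2(), its lowercase argument, and the sample posts are all assumptions made up for this illustration.

```r
library(stringr)

# Hypothetical variant of the cleaner with an optional `lowercase` argument
text_cleaner_v2 <- function(post, lowercase = TRUE) {
  cleaned <- str_replace_all(post, " https?://.*", "")
  if (lowercase) {
    cleaned <- tolower(cleaned)
  }
  cleaned
}

# Made-up posts for illustration
posts <- list("Barbie First: https://example.com/a", "OPPENHEIMER second")

# The extra argument comes after the function name; lapply() forwards it
kept_case <- lapply(posts, text_cleaner_v2, lowercase = FALSE)
kept_case[[1]]
## [1] "Barbie First:"

# An argument the function does not define produces an "unused argument" error:
# lapply(posts, text_cleaner_v2, uppercase = TRUE)
```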

12.2 Other Resources