Chapter 7 Functions

Hello! In this tutorial, we’ll be talking about how to create functions.

Thus far, you have worked with many functions in packages and base R. But in some cases, the functions you want to use are ones that you create yourself. Learning how to write a function is the basic building block of software development in R. After all, if you think about the packages that we use, they’re really just specialized collections of related functions (this is true of tidyverse packages, but also the other packages we will be learning this semester).

Let’s begin by loading some packages. In addition to tidyverse, we’re going to use one function from stringi, a package for dealing with strings (use ?stringi for more).

options(scipen=999) #prevents scientific notation in R
library(tidyverse)
#install.packages("stringi")
library(stringi)

For this tutorial, we will again use the #academictwitter dataset. Let’s bring that into our environment.

tweet_df <- read_csv("data/rtweet_academictwitter_20210115.csv")
#str(tweet_df)

If you recall from our class, functions take the following structure.

name_of_function <- function(argument){
  #stuff you want to code
  #sometimes a return() function
}
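As a minimal sketch of that structure, here are two tiny functions written for illustration only (add_hashtag() and add_hashtag_explicit() are hypothetical names, not part of any package). The first relies on R returning the last expression it evaluates; the second uses return() explicitly.

```r
# Hypothetical example: prepend "#" to a word.
# The value of the last evaluated expression is returned automatically.
add_hashtag <- function(word) {
  paste0("#", word)
}

# The same function with an explicit return() call.
add_hashtag_explicit <- function(word) {
  hashtag <- paste0("#", word)
  return(hashtag)
}

add_hashtag("rstats")          # "#rstats"
add_hashtag_explicit("rstats") # "#rstats"
```

Both versions behave the same; which style you use is mostly a matter of readability.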

Functions are useful for automating steps that you repeat often. If you work with Twitter data, for instance, there may be a set of things you like to do to clean your dataset. Perhaps you want to remove all the URLs in the text (since they are sometimes saved in a separate column) and various encoding notes. We can build a function called tw_cleaner() to do this:

tw_cleaner <- function(tweet) {
  tweet %>% 
    str_replace_all("https://t.co/.{10}", "") %>%
    str_replace_all(".u0.{3}|\\&amp\\;", "'") %>%
    str_replace_all("\\\r|\\\n", "") %>%
    stri_enc_toutf8()
}

Notice how I don’t have a return() line? An R function returns the value of the last expression it evaluates (in this case, the end of the pipe). This pipe has four steps: removing the URLs, replacing leftover encoding sequences and the HTML entity &amp; with an apostrophe, removing carriage returns and newlines (\r and \n), and converting the string into a UTF-8 encoding (which is useful for some functions in natural language processing).

tweet_df$text[3]
## [1] "The Journal of Sports Sciences is seeking an academic researcher with a strong research background in #sportscience. Apply to become the Social Media Editor of the journal before January 31: https://t.co/RBNicyj6nj #AcademicTwitter @JSportsSci @grantabt https://t.co/xXPoG5v9y8"
tw_cleaner(tweet_df$text[3])
## [1] "The Journal of Sports Sciences is seeking an academic researcher with a strong research background in #sportscience. Apply to become the Social Media Editor of the journal before January 31:  #AcademicTwitter @JSportsSci @grantabt "

Looks like it’s working!

7.1 Optional Arguments

What if we wanted to add an optional argument so you could also make the string lowercase if you wanted? You can add this using if/else statements (not to be confused with the vectorized ifelse() function). They do exactly what you think they do: if some condition is TRUE, R will do one thing, and if the condition is FALSE, it will do something else.

if(logical condition){
  #do this
} else if(another logical condition){ #you can string these together
  #do this
} else {
  #do this
}
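Here is a runnable sketch of that skeleton. The function label_length() and its cutoffs of 100 and 200 characters are made up for illustration. The last line shows the separate, vectorized ifelse() function for comparison.

```r
# Hypothetical helper: label a tweet by its number of characters.
label_length <- function(n_chars) {
  if (n_chars > 200) {
    "long"
  } else if (n_chars > 100) { # checked only if the first condition is FALSE
    "medium"
  } else {                    # runs when no condition above is TRUE
    "short"
  }
}

label_length(250) # "long"
label_length(150) # "medium"
label_length(40)  # "short"

# if/else tests a single condition; ifelse() works element by element on a vector.
ifelse(c(250, 40) > 100, "over 100", "100 or under")
```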

As you can see, if/else statements can be strung together. We’ll do exactly this to create an argument for lowercasing. Below, our function has a new argument: lowercase. We’re going to treat lowercase as a logical value (TRUE/FALSE) and use it in our if/else chain.

When creating an optional argument this way, you want to make sure your if/else chain begins with the missing(<variable>) function. The first thing the computer will want to know is what to do when the user does not supply a lowercase argument at all. (This is not the same as passing lowercase = NULL: missing() is TRUE only when the argument is omitted.) In this case, we’re going to treat a missing lowercase as FALSE, so we return() the same result as we would if lowercase == FALSE.

tw_cleaner_v2 <- function(tweet, lowercase) {
  revised <- tweet %>% 
    str_replace_all("https://t.co/.{10}", "") %>%
    str_replace_all(".u0.{3}|\\&amp\\;", "'") %>%
    str_replace_all("\\\r|\\\n", "") %>%
    stri_enc_toutf8() #this is the same as the earlier function
  if(missing(lowercase)) { #if the lowercase argument was not supplied
    return(revised) #return the string as-is up to this point
  } else if(lowercase == FALSE) { #if lowercase is set to FALSE
    return(revised) #return the string as-is
  } else{ #else (if lowercase == TRUE)
    return(tolower(revised)) #return the lowercased string
  } 
}
tweet_df$text[3]
## [1] "The Journal of Sports Sciences is seeking an academic researcher with a strong research background in #sportscience. Apply to become the Social Media Editor of the journal before January 31: https://t.co/RBNicyj6nj #AcademicTwitter @JSportsSci @grantabt https://t.co/xXPoG5v9y8"
tw_cleaner_v2(tweet_df$text[3], TRUE)
## [1] "the journal of sports sciences is seeking an academic researcher with a strong research background in #sportscience. apply to become the social media editor of the journal before january 31:  #academictwitter @jsportssci @grantabt "
tw_cleaner_v2(tweet_df$text[3], FALSE)
## [1] "The Journal of Sports Sciences is seeking an academic researcher with a strong research background in #sportscience. Apply to become the Social Media Editor of the journal before January 31:  #AcademicTwitter @JSportsSci @grantabt "
tw_cleaner_v2(tweet_df$text[3])
## [1] "The Journal of Sports Sciences is seeking an academic researcher with a strong research background in #sportscience. Apply to become the Social Media Editor of the journal before January 31:  #AcademicTwitter @JSportsSci @grantabt "
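A common alternative to missing() is to give the argument a default value in the function signature. The sketch below is not part of the tutorial’s own code: tw_cleaner_v3(), check_missing(), and example_tweet are hypothetical names, and the cleaning patterns are copied from tw_cleaner_v2() above. It assumes the stringr and stringi packages are installed.

```r
library(stringr)
library(stringi)

# Hypothetical variant of tw_cleaner_v2(): lowercase defaults to FALSE,
# so no missing() check is needed.
tw_cleaner_v3 <- function(tweet, lowercase = FALSE) {
  revised <- str_replace_all(tweet, "https://t.co/.{10}", "")
  revised <- str_replace_all(revised, ".u0.{3}|\\&amp\\;", "'")
  revised <- str_replace_all(revised, "\\\r|\\\n", "")
  revised <- stri_enc_toutf8(revised)
  # isTRUE() is FALSE for anything other than a single TRUE (including NULL),
  # so unexpected values fall back to returning the string as-is.
  if (isTRUE(lowercase)) tolower(revised) else revised
}

# Made-up tweet text for illustration
example_tweet <- "New preprint out: https://t.co/abcdefghij #AcademicTwitter"
tw_cleaner_v3(example_tweet)                   # lowercase defaults to FALSE
tw_cleaner_v3(example_tweet, lowercase = TRUE) # lowercased result

# missing() is TRUE only when the argument is omitted entirely.
check_missing <- function(x) missing(x)
check_missing()     # TRUE
check_missing(NULL) # FALSE
```

With a default in the signature, anyone reading the function’s arguments can see what happens when lowercase is left out.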

Functions and apply()

Looks like the function is working! But how do we use this function across the whole data frame, then? Well, we can use the apply() family! Recall from the last tutorial that the apply() functions and map() apply one function to each element of a vector. We’ll use lapply() here, but now with the function we have created. Note that lapply() returns a list, so text_clean is stored as a list-column.

tweet_df$text_clean <- lapply(tweet_df$text, tw_cleaner_v2, lowercase = TRUE)
head(tweet_df$text_clean)
## [[1]]
## [1] "please, always use a #colorblind friendly palette when drawing figures, especially when colours are important to distinguish points/regions. if i could i'd recolour all my previous figures. @academicchatter#academicchatter #academictwitter "
## 
## [[2]]
## [1] "this is a sad development: #jstor will move to stop hosting current articles and go back to only archival issues. #academictwitter "
## 
## [[3]]
## [1] "the journal of sports sciences is seeking an academic researcher with a strong research background in #sportscience. apply to become the social media editor of the journal before january 31:  #academictwitter @jsportssci @grantabt "
## 
## [[4]]
## [1] "hear what a participant had to say about our workshop last week!<u+0001f9e1><u+0001f9e1><u+0001f9e1> theres still time to book your ticket! follow the link  @marycparker #antiracism #antiracist #academictwitter #virtuallearning #education "
## 
## [[5]]
## [1] "please, always use a #colorblind friendly palette when drawing figures, especially when colours are important to distinguish points/regions. if i could i'd recolour all my previous figures. @academicchatter#academicchatter #academictwitter "
## 
## [[6]]
## [1] "when you have a bad day in lab, cannot get the desired results or don't understand something...#academictwitter "

Note that to pass an optional argument through an apply() function, you just need to include it after the function name. This is because lapply() will pass on whatever arguments it doesn’t recognize to the function being applied (in this case, tw_cleaner_v2()). If it’s an argument your custom function doesn’t recognize either, R will throw an error.
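The argument passing described above can be sketched with a tiny stand-in function, so the example runs without the dataset. Here change_case() and texts are hypothetical. The sketch also shows vapply(), a base R relative of lapply() that returns a character vector rather than a list.

```r
# Hypothetical stand-in for tw_cleaner_v2(), kept small for illustration.
change_case <- function(text, lowercase = FALSE) {
  if (lowercase) tolower(text) else text
}

texts <- c("Hello #AcademicTwitter", "R Is Fun")

# lapply() does not define lowercase itself, so it forwards it to change_case().
as_list <- lapply(texts, change_case, lowercase = TRUE)

# vapply() forwards extra arguments the same way, but returns a character vector.
as_vector <- vapply(texts, change_case, character(1),
                    lowercase = TRUE, USE.NAMES = FALSE)

# An argument that change_case() does not define causes an error,
# captured here as a message instead of stopping the script.
err <- tryCatch(
  lapply(texts, change_case, uppercase = TRUE),
  error = function(e) conditionMessage(e)
)
```

Because the stringr functions inside tw_cleaner_v2() are vectorized, calling tw_cleaner_v2(tweet_df$text, lowercase = TRUE) directly should also work and would give a plain character column.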

7.2 Other Resources