Introduction [1]

Welcome to the R Handbook for ESM Students. This handbook is a hands-on guide to help you learn R. It will take you from installation and setup through data cleaning, analysis, visualization, and reporting.

This guide uses real data to help you practice with R. Specifically, it uses survey data from the RStudio Learning R Survey. It also includes data from built-in R data sets and simulated data.

What is R?

R is a free, open-source programming language for statistics and data visualization. It is used in a variety of fields (e.g., science, business, education) and is considered an in-demand skill to learn (Muenchen, 2020). As of June 2020, R was also among the ten most popular programming languages.

R is useful not only because it is popular, but also because it facilitates a reproducible workflow: cleaning and analysis operations are written as code, so they can be rerun with minimal effort. Furthermore, because R is open-source, users can contribute to and expand its base functions, giving it the power to do things other statistical software packages cannot.

Key Terms

Below are some key terms you should be familiar with before getting started with R. Links will lead you to more information and examples contained in this handbook.

| Term | Definition |
|---|---|
| argument | an option passed to a function that controls what the function does |
| assign | to place data or results into a data object in the environment |
| base | base R is the basic set of packages installed with R |
| class | the category a data object belongs to (e.g., integer, character, logical, date) |
| comment | text in an R script that R does not run; it either 1) gives notes that are not commands or 2) disables code while preserving it for future use (by uncommenting) |
| data frame | a common data structure similar to a table of columns and rows; the columns can be of different classes |
| data object | a data set, variable, or other form of information stored in RStudio's environment |
| environment | 1) a software program; 2) the workspace in which data objects are saved in RStudio |
| factor | a categorical variable |
| function | a command that usually takes arguments; functions always have () after their name, and arguments are entered inside these parentheses |
| library | where R's packages are stored; library() is also the function used to load those packages |
| load | to bring something (a package, a data set) into the R workspace |
| logical | a class of data whose values are either TRUE or FALSE |
| operator | a symbol, such as +, -, or %>% |
| package | a set of functions that help R do more things |
| script | lines of code that tell R what to do |
| tidyverse | a set of packages that use a cohesive set of functions to make R programming easier |
| vector | a sequence of values, such as a single variable or column, that must all be of one class |
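
Several of the terms above can be seen together in a few lines of code. The following is a minimal sketch using base R only; the object names (scores, avg_score, group, students) and the values are made up for illustration and do not come from the handbook's data sets.

```r
# comment: R ignores everything after the # symbol

# assign: store a vector in a data object called scores
scores <- c(90, 85, 77)

# function and argument: mean() is a function; na.rm is one of its arguments
avg_score <- mean(scores, na.rm = TRUE)

# class: the category a data object belongs to
class(scores)     # "numeric"
class("hello")    # "character"
class(TRUE)       # "logical"

# factor: a categorical variable
group <- factor(c("a", "b", "a"))
levels(group)     # "a" "b"

# data frame: a table whose columns can be of different classes
students <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  score = scores,
  group = group
)
str(students)

# library / load: stats is installed with base R, so no installation is needed
library(stats)
```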

Common Symbols

| Symbol | Name | Definition |
|---|---|---|
| `=` | equals | used for assignment and for inputting arguments |
| `==` | double equals | used for comparisons; tests whether two values are equal |
| `!=` | not equal to | used for comparisons; tests whether two values are not equal |
| `<-` | assignment operator | used to assign values (e.g., data frames and vectors) to data objects |
| `%>%` | pipe operator | a tidyverse tool used to chain functions rather than nesting them or writing them separately; it is similar to "and then" |
| `::` | namespace operator | used to call a specific function from a package without loading the package, e.g., `psych::describe()` |
| `&` | and | used in logical statements, e.g., `ifelse(a == 1 & b == 1, ...)` |
| `$` | variable operator | uses data from a specific variable in a data object (data frame or list), e.g., `data$variable1` |
| `\|` | or | used in logical statements, e.g., `ifelse(a == 1 \| b == 1, ...)` |
| `[]` | single brackets | refers to rows and columns, e.g., `data[1]` is the first column of data and `data[1, 1]` is the value in the first row and first column |
| `[[]]` | double brackets | refers to list elements, e.g., `list[[1]]` refers to the first element in a list |
| `+` | plus | a mathematical operator; in the ggplot2 package, `+` adds layers to a plot, playing a role similar to `%>%` |
| `-` | minus | a mathematical operator; do not use `-` in variable names |
| `F` | false | also written `FALSE`; note the capital letters |
| `NA` | not available | refers to missing data |
| `NaN` | not a number | an undefined numeric result, such as 0 divided by 0 |
| `NULL` | null | refers to nothing (an empty object), e.g., as an argument; it is not the same as 0 |
| `T` | true | also written `TRUE`; note the capital letters |
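
The symbols above are easier to remember once you have seen them run. The following is a minimal sketch in base R; the data object names (dat, items) and their values are hypothetical, chosen only for illustration. The pipe operator is left out because it requires an add-on package.

```r
# Hypothetical example data
dat <- data.frame(a = c(1, 1, 2), b = c(1, 2, 2))

# == and != compare values and return logical vectors
dat$a == 1                      # TRUE TRUE FALSE
dat$a != 1                      # FALSE FALSE TRUE

# & (and) and | (or) combine comparisons element by element
both   <- ifelse(dat$a == 1 & dat$b == 1, "yes", "no")   # "yes" "no" "no"
either <- ifelse(dat$a == 1 | dat$b == 1, "yes", "no")   # "yes" "yes" "no"

# $, [] and [[]] pull pieces out of a data object
dat$b                           # the column b as a vector
dat[1]                          # the first column, still a data frame
dat[1, 2]                       # the value in row 1, column 2
items <- list(first = "x", second = 1:3)
items[[2]]                      # the second element of the list

# :: calls a function from a package without loading the package
med_b <- stats::median(dat$b)   # 2

# NA, NaN and NULL are three different things
is.na(c(1, NA))                 # FALSE TRUE
is.nan(0 / 0)                   # TRUE
length(NULL)                    # 0
```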

  1. Originally created by Anthony Schmidt and Austin Boyd, ESM Doctoral Students