R Software Handbook
Last Updated 2020-11-29
Introduction
Welcome to the R Handbook for ESM Students. This handbook is a hands-on guide to help you learn R. It will take you from installation and setup to data cleaning, analysis, visualization, and reporting.
This guide uses real data to help you practice with R. Specifically, it uses survey data from the RStudio Learning R Survey. It also includes data from built-in R data sets and simulated data.
What is R?
R is a free, open-source programming language for statistics and data visualization. It is used in a variety of fields (e.g. science, business, education) and is considered an in-demand skill to learn (Muenchen, 2020). As of June 2020, R is also among the top 10 most popular programming languages.
R is useful not only because it is popular, but because it facilitates a reproducible workflow in which cleaning and analysis operations can be rerun with minimal effort. Furthermore, because it is open-source, it allows users to contribute to and expand its base functions, giving it the power to do things other statistical software packages cannot.
Key Terms
Below are some key terms you should be familiar with before getting started with R. Links will lead you to more information and examples contained in this handbook.
Term | Definition |
---|---|
argument | options that control what R functions do |
assign | to place data or results into a data object in the environment |
base | base R is the basic set of packages installed with R |
class | the category a data object belongs to (e.g. integer, character, logical, date, etc.) |
comment | information in an R script that either 1) gives additional notes that are not commands and/or 2) disables code from being run but preserves that code for future use through uncommenting |
data frame | a common data structure similar to a table of columns and rows; data can be of many classes |
data object | a data set, variable, or other form of information stored in RStudio’s environment |
environment | 1) software or program; 2) the workspace in which data objects are saved in RStudio |
factor | a categorical variable |
function | a command that usually takes arguments; functions will always have () after their name. Arguments are entered into these parentheses |
library | Where R’s packages are stored. library() is also the function used to load those packages |
load | To bring something (a package, a data set) into the R workspace. |
logical | This is a class of data which contains information that is either TRUE or FALSE |
operator | a symbol that performs an operation on one or more values, such as +, -, or %>% |
package | a set of functions that help R do more things |
script | Lines of code that tell R what to do |
tidyverse | A set of packages that use a cohesive set of functions to make R programming easier |
vector | A single variable or column that must be of only one class type |
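
Several of the terms above can be seen together in a few lines of base R. The sketch below is only an illustration: the object names (scores, avg, students) and the values in them are made up for this example and do not come from the handbook's data sets.

```r
# Comment: lines starting with # are notes for the reader and are not run.

# library() loads a package; stats ships with R, so this always works.
library(stats)

# Assign a vector (every element has the same class) to a data object.
scores <- c(90, 85, 77)

# Call a function; na.rm is an argument that controls how mean() treats NA.
avg <- mean(scores, na.rm = TRUE)

# Build a data frame: a table whose columns can be of different classes.
students <- data.frame(
  name   = c("Ana", "Ben", "Cy"),        # character
  score  = scores,                       # numeric
  passed = scores >= 80,                 # logical (TRUE or FALSE)
  group  = factor(c("a", "b", "a")),     # factor (categorical variable)
  stringsAsFactors = FALSE               # keep name as character in any R version
)

class(scores)            # "numeric"
class(students$name)     # "character"
class(students$passed)   # "logical"
class(students$group)    # "factor"
class(students)          # "data.frame"
avg                      # 84
```

After running this, scores, avg, and students all appear as data objects in the environment.
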
Common Symbols
Symbol | Name | Definition |
---|---|---|
= | equals | used for assignment and for inputting arguments |
== | double equals | used for comparisons |
!= | not equal to | used for comparisons |
<- | assignment operator | used to assign values (e.g. data frames and vectors) to data objects |
%>% | pipe operator | a tidyverse tool used to nest or link functions rather than writing them separately; it is similar to “and then” |
:: | namespace operator | used to call a specific function from a package without loading the package, e.g. psych::describe() |
& | and | used in logical statements, e.g. ifelse(a == 1 & b == 1...) |
$ | variable operator | uses data from a specific variable in a data object (data frame or list), e.g. data$variable1 |
\| | or | used in logical statements, e.g. ifelse(a == 1 \| b == 1…) |
[] | single brackets | refers to rows and columns, e.g. data[1] is the first column of data, data[1,1] is the value in the first row and first column |
[[]] | double brackets | refers to list elements, e.g. list[[1]] refers to the first element in a list |
+ | plus | a mathematical operator; in the ggplot2 package, + adds layers to a plot, a role similar to that of %>% |
- | minus | a mathematical operator. Do not use - in variable names |
F | false | also written FALSE - note the capital letters |
NA | not available | refers to missing data |
NaN | not a number | is an undefined number, such as 0 divided by 0 |
NULL | null | refers to an empty or absent value (nothing), often supplied as an argument; it is not the same as the number 0 |
T | true | also written as TRUE - note the capital letters |
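
To see how several of these symbols behave, here is a minimal base R sketch. The objects data and items and their values are hypothetical, invented for this illustration. The pipe (%>%) is left out because it requires loading a package first.

```r
# Made-up objects to index into (not from the handbook's data sets).
data  <- data.frame(a = c(1, 2, NA), b = c(1, 0, 1))
items <- list(first = "x", second = 1:3)

1 == 1        # TRUE: == compares, it does not assign
1 != 2        # TRUE
T == TRUE     # TRUE: T is shorthand for TRUE

data$a        # the variable a: 1 2 NA
data[1]       # the first column (still a data frame)
data[1, 1]    # the first row, first column: 1
items[[1]]    # the first list element: "x"

# & and | combine conditions element by element.
both   <- ifelse(data$a == 1 & data$b == 1, "yes", "no")
either <- ifelse(data$a == 1 | data$b == 1, "yes", "no")
both          # "yes" "no" NA
either        # "yes" "no" "yes"

is.na(data$a)            # FALSE FALSE TRUE: NA marks missing data
0 / 0                    # NaN
is.null(NULL)            # TRUE
length(NULL)             # 0: NULL holds nothing at all
stats::median(data$b)    # 1: median() called through its namespace
```

Note the third row: NA & TRUE is NA because the missing value could be either TRUE or FALSE, while NA | TRUE is TRUE because the result is TRUE either way.
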
Originally created by Anthony Schmidt and Austin Boyd, ESM Doctoral Students