Cheat Sheet

The goal of this appendix is to give an easy reference to the basic data manipulation functions that are used often and should always be readily accessible.

Importing Data and Loading Packages

| Function | Meaning |
|----------|---------|
| `data("DataName", package="PackageName")` | Load the data set DataName, which is found in the package PackageName. |
| `library(PackageName)` | Load the package PackageName so that its functions can be used. |
| `read.csv("filename.csv")` | Read a .csv file. The result must be assigned to an object; otherwise it is only printed. |
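As a minimal sketch of how these three functions fit together, the example below loads a package, loads a built-in data set, and reads a .csv file. It assumes the dplyr package is installed; the `mtcars` data set, the temporary file, and the object name `cars` are illustrative choices, not part of the original table.

```r
# Load a package so its functions are available (assumes dplyr is installed).
library(dplyr)

# Load a single data set from a package without attaching the whole package.
data("mtcars", package = "datasets")

# Write a small temporary .csv so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
write.csv(head(mtcars), tmp, row.names = FALSE)

# Assign the result to an object; otherwise it is only printed.
cars <- read.csv(tmp)
str(cars)
```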

Useful vectorized functions

| Function | Meaning |
|----------|---------|
| `ifelse( logicalTest, TrueResult, FalseResult )` | Creates an output vector in which each element is either TrueResult or FalseResult, depending on the corresponding element of the logicalTest vector. |
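A short illustration of the element-by-element behavior of `ifelse()`; the `scores` vector and the cutoff of 70 are made up for this sketch.

```r
# Hypothetical exam scores.
scores <- c(55, 82, 67, 91)

# The test is evaluated for every element, and each element of the
# result is taken from the "pass" or "fail" value accordingly.
result <- ifelse(scores >= 70, "pass", "fail")
result
# [1] "fail" "pass" "fail" "pass"
```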

Data frame (tibble) manipulation

In the examples below, df stands for an arbitrary data frame that we are applying the functions to.

| Function | Meaning |
|----------|---------|
| `data.frame(x= , y= )` | Creates a data frame "by hand" with one column per input. |
| `tibble(x= , y= )` | Creates a tibble "by hand" with one column per input. |
| `tribble( ~x, ~y, 1, 2)` | Creates a tibble "by hand", but with row-wise specification. |
| `df %>% add_row(x=3, y=5)` | Add a single row to the df data frame. Any column with unspecified data is filled with NA. |
| `df1 %>% bind_rows(df2)` | Stack the data frames df1 and df2, with the rows of df2 placed below those of df1. |
| `df %>% select(ColumnNames)` | Subset df and return a data frame with only the columns specified. |
| `df %>% filter(logicalTest)` | Subset df and return a data frame with only the rows that satisfy the logical expression. |
| `df %>% mutate( New= )` | Create (or update) a column New from some manipulation of an existing column. A common manipulation is to use an `ifelse()` command to update only particular rows. |
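The functions in this table are usually chained together with the pipe. The following is a small sketch, assuming the dplyr and tibble packages are installed; the column names `x` and `y` and all values are invented for illustration.

```r
library(dplyr)
library(tibble)

# Build a tibble row by row.
df <- tribble(
  ~x, ~y,
   1,  2,
   2,  4
)

# Add one row; y is not specified, so it is filled with NA.
df <- df %>% add_row(x = 3)

# Stack a second tibble underneath the first.
df2 <- tibble(x = 4, y = 8)
combined <- df %>% bind_rows(df2)

# Keep rows with x >= 2, replace the missing y with 0 using ifelse(),
# and return only the x and y columns.
result <- combined %>%
  filter(x >= 2) %>%
  mutate(y = ifelse(is.na(y), 0, y)) %>%
  select(x, y)
result
```

Each step returns a new tibble, so the intermediate results only persist when they are assigned, as with `df` and `combined` here.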

Data frame (tibble) reshaping

These functions take an input data frame df and return a new data frame; df itself is unchanged unless the result is assigned back to it.

| Function | Meaning |
|----------|---------|
| `group_by(df, Column1, Column2)` | Create a grouped tibble with groups defined by all unique combinations of Column1 and Column2. |
| `summarize(df, Function(Column1))` | Apply Function to Column1 and return a data frame with just a single row. This is quite powerful when applied to a grouped tibble, as it results in a single row per group. |
| `df %>% pivot_wider(names_from= , values_from= )` | Create a wide data set from a long format. |
| `df %>% pivot_longer(cols= , names_to= , values_to= )` | Create a long data set from a wide format. The `cols` argument selects the columns to be gathered. |
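The sketch below shows grouping with summarizing and a round trip between long and wide formats. It assumes the dplyr and tidyr packages are installed; the `long` data set and its column names `id`, `name`, and `value` are hypothetical.

```r
library(dplyr)
library(tidyr)

# A small long-format data set: two measurements (a and b) per id.
long <- tibble(
  id    = c(1, 1, 2, 2),
  name  = c("a", "b", "a", "b"),
  value = c(10, 20, 30, 40)
)

# One summary row per group: the mean of value within each id.
means <- long %>%
  group_by(id) %>%
  summarize(mean_value = mean(value))

# Long to wide: one column per distinct entry of name.
wide <- long %>%
  pivot_wider(names_from = name, values_from = value)

# Wide back to long: gather columns a and b into name/value pairs.
back <- wide %>%
  pivot_longer(cols = c(a, b), names_to = "name", values_to = "value")
```

Here `wide` has one row per `id` with columns `id`, `a`, and `b`, and `back` has the same four rows as `long`.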