2 Basics of R

This chapter serves as a primer to R by introducing the basics. It is advised to follow the lab via the .rmd file within RStudio rather than solely the compiled PDF. This way, students can experiment with code in the “code blocks” provided. Note: In-text R code is displayed in a fixed-width font.

2.1 R, as a Calculator

The first thing to know about R is that it is essentially a large calculator capable of performing arithmetic:

1 + 1
## [1] 2
8 * 8
## [1] 64
2 ^ 8 # exponent
## [1] 256
(5 + 2) ^ 4
## [1] 2401
5 + 2 ^ 4
## [1] 21

R also supports elementary functions such as the logarithm and the square root. By default, log() computes the natural logarithm.

log(100)
## [1] 4.60517
sqrt(49)
## [1] 7
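
The short sketch below is not part of the original lab; it shows a few related base R functions, including logarithms in other bases. The input values are arbitrary and chosen only for illustration.

```r
# Natural logarithm (base e), the default behavior of log()
log(100)

# Base-10 logarithm, written two equivalent ways
log10(100)
log(100, base = 10)

# exp() is the inverse of the natural logarithm, so this returns 100
exp(log(100))

# Absolute value and rounding are also built in
abs(-7)
round(3.14159, digits = 2)
```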

2.1.1 Order of Operations

R evaluates expressions according to the order of operations, “PEMDAS”:

  1. Parentheses
  2. Exponents
  3. Multiplication and Division (equal precedence, evaluated left to right)
  4. Addition and Subtraction (equal precedence, evaluated left to right)
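
A few expressions make these rules concrete. The sketch below is illustrative only, with arbitrary numbers. It shows that multiplication and division share one level of precedence and are evaluated left to right, as are addition and subtraction, and that an exponent is applied before a leading minus sign.

```r
8 / 2 * 4      # left to right: (8 / 2) * 4
## [1] 16
10 - 4 + 2     # left to right: (10 - 4) + 2
## [1] 8
-2 ^ 2         # the exponent is applied first: -(2 ^ 2)
## [1] -4
(-2) ^ 2       # parentheses make the minus sign apply first
## [1] 4
2 ^ 3 ^ 2      # exponents group right to left: 2 ^ (3 ^ 2)
## [1] 512
```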

Watch this video for a refresher on the order of operations: https://www.youtube.com/watch?v=94yAmf7GyHw

Try this! Using R, solve: (5 + 1) ^ 4 / (9 - 2) ^ 3

2.2 Objects

R is an “object oriented” programming language. Put simply, R uses objects to store attributes. Objects are created by assigning an attribute to them via the <- operator. You can always view the attribute of an object by typing the object name.

object1 <- 10 + 10 
object1
## [1] 20

Try this! Create a new object below; you can name it almost anything!

R includes various functions for managing created objects. The ls() function lists all existing objects.

ls()
## [1] "ds"       "object1"  "packages" "pkg"

The rm() function removes existing objects.

rm(object1)
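
As a small sketch of how these functions fit together, the example below creates an object, confirms it exists, removes it, and checks again. The name tmp_value is hypothetical and used only for this illustration; exists() is a base R function that reports whether an object with a given name can be found.

```r
# tmp_value is a hypothetical object name used only for this example
tmp_value <- 3 * 7

# exists() takes the object name as a character string
exists("tmp_value")
## [1] TRUE

# After rm(), the object can no longer be found
rm(tmp_value)
exists("tmp_value")
## [1] FALSE
```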

There is no strict convention for naming objects; however, there are best practices:

  1. Avoid spaces; use underscores, periods, or CamelCase (or camelCase) for long object names
    • e.g., This_is_a_long_name, This.Is.A.Long.Name, thisIsALongName
  2. Avoid names of existing functions or reserved R objects
    • e.g., mean or sum
  3. Be descriptive but keep it short (fewer than 10 characters)
  4. Avoid special characters
    • e.g., ? $ % ^ &
  5. Numbers are fine, but names cannot begin with numbers.
    • e.g., object1 is ok, but 1object is not

Important: Object names are case sensitive; e.g., object.One and Object.One refer to two separate objects.

object.One <- 10 + 10
Object.One <- 5 + 5

object.One
## [1] 20
Object.One
## [1] 10

2.3 Functions

In addition to elementary and algebraic functions, R includes functions that simplify statistical analysis. For example, the mean() function calculates the mean.

Note: Sometimes na.rm = TRUE is necessary within the parentheses to instruct R to ignore missing data.

mean(cars$dist)
## [1] 42.98

R comes with a variety of “built in” data sets, such as the cars data set, which contains some information about cars. This data set is used below to demonstrate and/or experiment with R functions.

A note about syntax: the dollar sign, $, is used to indicate the variable of interest relative to a data set. This is important in the case of multiple data sets that contain variables of the same name. In the previous code, R calculated the mean using the dist variable within the cars data set by specifying cars$dist.
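
To illustrate why the data set name matters, the sketch below builds two small data frames that both contain a variable named score. The data frames class_a and class_b and their values are hypothetical, invented only for this example.

```r
# Two hypothetical data frames that share a variable name
class_a <- data.frame(score = c(70, 80, 90))
class_b <- data.frame(score = c(50, 60, 70))

# The $ operator tells R which data set's score variable to use
mean(class_a$score)
## [1] 80
mean(class_b$score)
## [1] 60
```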

To ignore missing data when calculating the mean of dist, include the na.rm = TRUE argument within the parentheses as follows.

mean(cars$dist, na.rm = TRUE)
## [1] 42.98

Note: The mean is exactly the same because there is no missing data in the dist variable.
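
The effect of na.rm = TRUE is easier to see with data that do contain a missing value. The vector dist_with_na below is hypothetical and exists only for this illustration; NA is how R represents a missing value.

```r
# A hypothetical vector with one missing value (NA)
dist_with_na <- c(2, 10, NA, 24)

# Without na.rm = TRUE, the missing value makes the result NA
mean(dist_with_na)
## [1] NA

# With na.rm = TRUE, the mean is computed from the three observed values
mean(dist_with_na, na.rm = TRUE)
## [1] 12
```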

2.3.1 Object Types

Object types are important in R, and the type of an object depends on the attribute it stores. For example, an object storing characters (e.g., “blue”) has a different type than an object storing a number (e.g., 1). Whether an R function can be used depends on the type of the object. For example, functions like mean() expect objects containing numbers. The R str() function describes the structure of objects and functions.

str(mean)
## function (x, ...)
str(object.One)
##  num 20
str(cars)
## 'data.frame':    50 obs. of  2 variables:
##  $ speed: num  4 4 7 7 8 9 10 10 10 11 ...
##  $ dist : num  2 10 4 22 16 10 18 26 34 17 ...
str(cars$dist)
##  num [1:50] 2 10 4 22 16 10 18 26 34 17 ...

The str() function described mean as a function, object.One as a numeric object, the cars data as a data frame, etc.
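
As a complementary sketch, base R's class() function returns just the type label of an object, which can be convenient for a quick check. The example also shows what happens when mean() is given character data; the values used are arbitrary.

```r
# class() returns a short label for the type of an object
class(10)
## [1] "numeric"
class("blue")
## [1] "character"
class(TRUE)
## [1] "logical"
class(cars)
## [1] "data.frame"

# mean() cannot average characters and returns NA.
# R also issues a warning here, which suppressWarnings() hides.
suppressWarnings(mean(c("blue", "red")))
## [1] NA
```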

Previously, objects were introduced as a method of storing single attributes, either a specified value or the result of arithmetic. In addition, objects can contain a collection of data via a vector or list. In mathematics and physics, a vector is defined as a quantity with both direction and magnitude. In R, a vector is a collection of data of the same type. The c() function creates a vector.

vectorObject <- c(1, 2, 3)
vectorObject
## [1] 1 2 3
str(vectorObject)
##  num [1:3] 1 2 3
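
Because a vector holds only one type, R converts mixed inputs to a single common type. The sketch below uses a hypothetical object, mixedVector, to show this; it is an illustration rather than part of the original lab.

```r
# mixedVector is a hypothetical object combining a number, a character, and a logical
mixedVector <- c(1, "blue", TRUE)

# All three elements are converted to character so the vector has one type
class(mixedVector)
## [1] "character"

# A vector of only numbers stays numeric
class(c(1, 2, 3))
## [1] "numeric"
```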

Further, a list is a collection that can hold elements of different types.

listObject <- list("your name", 1, FALSE)
listObject
## [[1]]
## [1] "your name"
## 
## [[2]]
## [1] 1
## 
## [[3]]
## [1] FALSE
str(listObject)
## List of 3
##  $ : chr "your name"
##  $ : num 1
##  $ : logi FALSE

Note: The structure of the list object consists of a character, a number, and a logical (TRUE/FALSE).
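
Individual elements of a list can be retrieved by position or, if the elements are named, by name. The sketch below uses a hypothetical named list, person, whose element names (name, id, enrolled) are invented for this example.

```r
# person is a hypothetical list with named elements
person <- list(name = "your name", id = 1, enrolled = FALSE)

# Access an element by name with $ or with [[ ]]
person$name
## [1] "your name"
person[["id"]]
## [1] 1

# Access an element by position with [[ ]]
person[[3]]
## [1] FALSE
```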

2.4 Packages

Packages expand R to include additional functions important to statistical analysis.

2.4.1 Installing Packages

Installing packages in R can be performed via install.packages("packagename"), whereby the name of the desired package must be within quotation marks.

Note: Occasionally a package may require dependencies for installation. The dependencies can be automatically installed along with the package by including the dependencies = TRUE argument within the install.packages() function.

Try this! Use this code to install the following packages:

  1. car
  2. psych
  3. memisc
  4. Rcpp
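
One possible approach, sketched below, is to pass all four package names to install.packages() at once and skip any that are already installed. The object names pkgs, is_installed, and to_install are hypothetical. The sketch assumes an internet connection and a configured CRAN mirror, and the interactive() check is an added precaution so that nothing is installed when the code runs as a non-interactive script.

```r
# Packages named in this lab
pkgs <- c("car", "psych", "memisc", "Rcpp")

# requireNamespace() returns TRUE when a package is installed and can be loaded
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!is_installed]

# Install only the missing packages, and only in an interactive session
if (interactive() && length(to_install) > 0) {
  install.packages(to_install, dependencies = TRUE)
}
```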

2.4.2 Loading Packages

After installation, packages must be loaded to use their functions within R. Packages are loaded via library(packagename). Note: Unlike the install.packages() function, the library() function does not require quotation marks around the package name.

library(car)
library(psych)
library(memisc) 
library(Rcpp)
library(rmarkdown)
library(knitr)

Note: The memisc and car packages both define an object named recode, which creates a conflict. A conflict occurs when two packages contain objects (e.g., functions) of the same name. A conflict will not prevent loading packages; however, use of a specific package’s object requires an explicit call to the desired parent package. For example, to use the recode() function from car, the car::recode(variable) statement will explicitly call recode() from the car package. Conversely, memisc::recode() will explicitly call recode() from the memisc package.
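
The same idea can be tried without installing anything. In the sketch below, a user-defined function named sd (hypothetical, created only for this illustration) masks the sd() function from the stats package that ships with R, and the :: operator is used to reach the original.

```r
# A hypothetical user-defined function that masks stats::sd
sd <- function(x) "this is not the standard deviation"

# The unqualified name now finds the user-defined version first
sd(c(1, 2, 3))
## [1] "this is not the standard deviation"

# The :: operator explicitly calls the version from the stats package
stats::sd(c(1, 2, 3))
## [1] 1

# Removing the user-defined object ends the conflict
rm(sd)
sd(c(1, 2, 3))
## [1] 1
```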

2.4.3 Updating Packages

Most packages are regularly updated. The old.packages() function compares installed packages to their latest versions online.

The update.packages() function updates out of date packages.

Note: Updating packages requires consent. The ask = FALSE argument will skip the additional consent step to save time.

update.packages()
## AER :
##  Version 1.2-5 installed in C:/Users/josie/OneDrive/Documents/R/win-library/3.5 
##  Version 1.2-6 available at https://cran.rstudio.com
## cancelled by user

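Called without arguments, the library() function lists all installed packages that are available to load. To list only the packages currently attached to the session, use search().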

library()
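
To see which packages are attached to the current session, base R provides search() and .packages(). The sketch below is a minimal illustration; the exact output depends on which packages have been loaded, so no output is shown for the first two calls.

```r
# search() lists everything on the search path, including attached packages
search()

# .packages() returns the names of attached packages; the outer
# parentheses force the result to print
(.packages())

# Check whether a particular package is attached
"package:base" %in% search()
## [1] TRUE
```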

As previously demonstrated, occasionally conflicts exist between packages. The conflicts() function lists conflicts between loaded packages.

conflicts()
##   [1] "G"                 "show"              "show"             
##   [4] "t"                 "factorize"         "show"             
##   [7] "traceplot"         "sim"               "summary"          
##  [10] "z"                 "Arith"             "as.array"         
##  [13] "coerce"            "Compare"           "format"           
##  [16] "initialize"        "Math"              "Math2"            
##  [19] "print"             "rename"            "show"             
##  [22] "style"             "summary"           "Summary"          
##  [25] "negative.binomial" "select"            "%>%"              
##  [28] "kable"             "logit"             "recode"           
##  [31] "%>%"               "%>%"               "%>%"              
##  [34] "reduce"            "some"              "%>%"              
##  [37] "expand"            "smiths"            "alpha"            
##  [40] "logit"             "rescale"           "sim"              
##  [43] "%>%"               "add_row"           "arrange"          
##  [46] "arrange_"          "as_data_frame"     "as_tibble"        
##  [49] "collect"           "contains"          "data_frame"       
##  [52] "data_frame_"       "distinct"          "distinct_"        
##  [55] "do"                "do_"               "ends_with"        
##  [58] "everything"        "filter"            "filter_"          
##  [61] "frame_data"        "glimpse"           "group_by"         
##  [64] "group_by_"         "groups"            "last"             
##  [67] "lst"               "lst_"              "matches"          
##  [70] "mutate"            "mutate_"           "n"                
##  [73] "num_range"         "one_of"            "recode"           
##  [76] "rename"            "rename_"           "select"           
##  [79] "select_"           "slice"             "slice_"           
##  [82] "starts_with"       "summarise"         "summarise_"       
##  [85] "syms"              "tbl_sum"           "tibble"           
##  [88] "transmute"         "transmute_"        "tribble"          
##  [91] "trunc_mat"         "type_sum"          "ungroup"          
##  [94] "%+%"               "alpha"             "arrow"            
##  [97] "enexpr"            "enexprs"           "enquo"            
## [100] "enquos"            "ensym"             "ensyms"           
## [103] "expr"              "last_plot"         "quo"              
## [106] "quo_name"          "quos"              "stat"             
## [109] "sym"               "syms"              "unit"             
## [112] "vars"              "coef"              "coefficients"     
## [115] "contr.sum"         "contr.treatment"   "contrasts"        
## [118] "contrasts<-"       "cov2cor"           "df"               
## [121] "df.residual"       "filter"            "fitted"           
## [124] "lag"               "predict"           "resid"            
## [127] "residuals"         "toeplitz"          "update"           
## [130] "vcov"              "image"             "layout"           
## [133] "plot"              "data"              "head"             
## [136] "prompt"            "tail"              "npk"              
## [139] "Arith"             "cbind2"            "coerce"           
## [142] "Compare"           "initialize"        "kronecker"        
## [145] "Logic"             "Math"              "Math2"            
## [148] "Ops"               "rbind2"            "show"             
## [151] "Summary"           "%in%"              "all.equal"        
## [154] "as.array"          "as.Date"           "as.Date.numeric"  
## [157] "as.factor"         "as.matrix"         "as.ordered"       
## [160] "body<-"            "chol"              "chol2inv"         
## [163] "colMeans"          "colSums"           "crossprod"        
## [166] "det"               "determinant"       "diag"             
## [169] "diag<-"            "diff"              "drop"             
## [172] "F"                 "formals<-"         "format"           
## [175] "I"                 "intersect"         "isSymmetric"      
## [178] "kronecker"         "labels"            "length"           
## [181] "mean"              "merge"             "norm"             
## [184] "Position"          "print"             "qr"               
## [187] "qr.coef"           "qr.fitted"         "qr.Q"             
## [190] "qr.qty"            "qr.qy"             "qr.R"             
## [193] "qr.resid"          "rcond"             "row.names"        
## [196] "rowMeans"          "rowSums"           "sample"           
## [199] "setdiff"           "setequal"          "solve"            
## [202] "sub"               "subset"            "summary"          
## [205] "t"                 "tcrossprod"        "union"            
## [208] "unique"            "unname"            "which"            
## [211] "within"            "zapsmall"

The detach() function detaches packages and is an alternative method to resolve conflicts. Supplying the unload = TRUE argument within the detach() function will unload the package. For example, to resolve the recode() function conflict between car and memisc, the memisc package can be detached and unloaded as follows: detach(package:memisc, unload = TRUE).
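
The effect of detach() can be checked with a package that is included in a standard R installation. The sketch below uses the splines package purely as a stand-in, on the assumption that it is installed; it is not one of the packages used in this lab.

```r
# Attach the splines package (assumed to be installed with R)
library(splines)
"package:splines" %in% search()
## [1] TRUE

# Detach it again; the package is removed from the search path
detach("package:splines")
"package:splines" %in% search()
## [1] FALSE
```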

2.5 R Help

R includes a help function to assist with functions, accessible by including a ? prior to the function name.

? mean

Note: The help documentation will display in the bottom right quadrant of RStudio. Alternatively, typing the function name into the help search bar will yield a similar result.

To search all of R documentation for help about a function, use ??.

?? mean

Note: Google is a valuable tool for finding help. Large communities like StackExchange provide answers and explanations to common issues in R. At times, a particular problem may seem unique, but someone else has almost certainly had the same problem and the solution likely can be found online.

2.6 Setting a Working Directory

The working directory is the location where files are accessed and saved within an R session. Normally, the working directory is set at the beginning of every R file. The working directory should be set and the class data loaded at the beginning of each lab.

There are two methods of setting the working directory. First, the setwd() function can be used with the directory path. For example, setwd("C:/Directory_to_folder/").

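Note: Forward slashes are used in place of backward slashes for directory paths. In R, a single backward slash inside quotation marks is treated as an escape character, so a Windows path copied with backward slashes must either be converted to forward slashes or have each backward slash doubled.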

Second, within RStudio, the “Session” tab will allow you to set the working directory. The following steps provide guidance to the “Session” tab functionality:

  1. Click the “Session” tab.
  2. Select “Set Working Directory.”
  3. Select “Choose Directory.”
  4. Select the working directory.

The getwd() function returns the set working directory.

getwd()
## [1] "C:/Users/josie/OneDrive - University of Oklahoma/GitHub/qrmlabs"
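
The sketch below shows one cautious pattern for changing the working directory: store the current location, switch to another folder, and switch back. It uses tempdir(), a temporary folder that R creates for each session, as a stand-in for a real project folder; the name old_wd is hypothetical.

```r
# Remember the current working directory
old_wd <- getwd()

# Switch to R's temporary folder (a stand-in for a project folder)
setwd(tempdir())
getwd()

# Build a path from parts with forward slashes, without typing them by hand
file.path("C:", "Directory_to_folder", "data.csv")
## [1] "C:/Directory_to_folder/data.csv"

# Return to the original working directory
setwd(old_wd)
```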

2.7 Importing Your Data

R can read many different file types, including text files, Excel files, Google Sheets files, SPSS files, and Stata files. It can even read data sets directly from websites. The file type determines the function that is necessary to import a data set. For example, CSV files use the function read.csv to import a data set. Here is an example that uses this function:

ds <- read.csv("Class Data Set Factored.csv", header = TRUE)

This line of code saves the data set in Class Data Set Factored.csv to an object called ds (short for data set). The header = TRUE argument tells R that the first row in the data set provides column (or variable) names.
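
For readers who do not yet have the class data file, the sketch below demonstrates the same import step on a small file it creates itself. The data frame example_data, its columns, and the temporary file are all hypothetical and stand in for the real class data.

```r
# Create a small hypothetical data set and write it to a temporary CSV file
example_data <- data.frame(id = 1:3, score = c(90, 85, 77))
tmp_file <- tempfile(fileext = ".csv")
write.csv(example_data, tmp_file, row.names = FALSE)

# Confirm the file exists before trying to read it
file.exists(tmp_file)
## [1] TRUE

# Import the file; header = TRUE uses the first row as variable names
ds_example <- read.csv(tmp_file, header = TRUE)
names(ds_example)
## [1] "id"    "score"
```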

Note: This code assumes that Class Data Set Factored.csv is in the working directory. To check, use list.files(). If the file containing the data set is not in the working directory, provide the complete file path in the read.csv function, like this:

ds <- read.csv("https://github.com/ripberjt/qrmlabs/raw/master/Class%20Data%20Set%20Factored.csv", header = TRUE)