2 Basics of R
This chapter serves as a primer to R by introducing the basics. It is advised to follow the lab via the .Rmd file within RStudio rather than solely the compiled PDF. This way, students can experiment with code in the “code blocks” provided. Note: In-text R code is displayed in a fixed-width font.
2.1 R, as a Calculator
The first thing to know about R is that it is essentially a large calculator capable of performing arithmetic:
## [1] 2
## [1] 64
## [1] 256
## [1] 2401
## [1] 21
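The inputs behind the output above are not shown here. As a sketch, expressions such as the following (assumed for illustration, not necessarily the original ones) return the same five values:

```r
1 + 1        # addition
8 * 8        # multiplication
2^8          # exponentiation
(5 + 2)^4    # parentheses are evaluated first
5 + 2^4      # the exponent is evaluated before the addition
```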
R also supports elementary and algebraic functions such as log and square root.
## [1] 4.60517
## [1] 7
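The two values above are consistent with calls like the ones below; the specific inputs are an assumption. Note that log() computes the natural logarithm unless told otherwise.

```r
log(100)     # natural logarithm (base e) by default
sqrt(49)     # square root
log10(100)   # base-10 logarithm, shown for comparison
```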
2.1.1 Order of Operations
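R evaluates expressions according to the order of operations, “PEMDAS” (multiplication and division share the same precedence and are evaluated left to right, as do addition and subtraction):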
- Parentheses
- Exponents
- Multiplication
- Division
- Addition
- Subtraction
Watch this video for a refresher on the order of operations: https://www.youtube.com/watch?v=94yAmf7GyHw
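A few small expressions, chosen only for illustration, show how the order of operations changes a result:

```r
2 + 3 * 4      # multiplication before addition
(2 + 3) * 4    # parentheses override the default order
2 * 3^2        # the exponent is applied before the multiplication
-2^2           # the exponent is applied before the minus sign
(-2)^2         # parentheses make the base negative
```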
Try this! Using R, solve: (5 + 1) ^ 4 / (9 - 2) ^ 3
2.2 Objects
R is an “object oriented” programming language. Put simply, R uses objects to store attributes. Objects are created by assigning an attribute to them via the <- operator. You can always view the attribute of an object by typing the object name.
## [1] 20
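A minimal sketch of assignment that would print the value above; the object name object1 is taken from the ls() output later in this section, but the exact original code is an assumption.

```r
object1 <- 20   # assign the value 20 to an object named object1
object1         # typing the name prints the stored value
```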
Try this! Create a new object below; you can name it almost anything!
R includes various functions for managing created objects. The ls() function lists all existing objects.
## [1] "ds" "object1" "packages" "pagebreak" "pkg"
The rm() function removes existing objects.
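The two functions can be tried together as follows. The object temp_value is hypothetical and exists only for this example.

```r
temp_value <- 5            # hypothetical object, created only for this example
"temp_value" %in% ls()     # check whether ls() now lists the new object
rm(temp_value)             # remove it
"temp_value" %in% ls()     # check again after removal
```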
There is no strict convention for naming objects; however, there are best practices:
- Avoid spaces; use underscores, periods, or CamelCase (or camelCase) for long object names
  - e.g., This_is_a_long_name, This.Is.A.Long.Name, thisIsALongName
- Avoid names of existing functions or reserved R objects
  - e.g., mean or sum
- Be descriptive but keep it short (fewer than 10 characters)
- Avoid special characters
  - e.g., ? $ % ^ &
- Numbers are fine, but names cannot begin with numbers
  - e.g., object1 is ok, but 1object is not
Important: Object names are case sensitive.
- e.g., object.One and Object.One refer to two separate objects
## [1] 20
## [1] 10
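The two values above can be reproduced by storing different numbers under names that differ only in capitalization. The assigned values are inferred from the output and are otherwise an assumption.

```r
object.One <- 20
Object.One <- 10
object.One                          # prints the value stored under the lowercase name
Object.One                          # prints the value stored under the capitalized name
identical(object.One, Object.One)   # the two names refer to different objects
```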
2.3 Functions
In addition to elementary and algebraic functions, R includes functions that simplify statistical analysis. For example, the mean() function calculates the mean.
Note: Sometimes na.rm = TRUE is necessary within the parentheses to instruct R to ignore missing data.
## [1] 42.98
R comes with a variety of “built in” data sets, such as the cars data set, which contains the speed and stopping distance of 50 cars. This data set is used below to demonstrate and/or experiment with R functions.
A note about syntax: the dollar sign, $, is used to indicate the variable of interest relative to a data set. This is important in the case of multiple data sets that contain variables of the same name. In the previous code, R calculated the mean using the dist variable within the cars data set by specifying cars$dist.
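As a short sketch of the $ syntax, the following lines look at the built-in cars data set and then compute the mean of its dist variable:

```r
head(cars)        # first six rows, with the columns speed and dist
cars$dist[1:5]    # the first five values of the dist variable
mean(cars$dist)   # mean stopping distance: 42.98
```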
To ignore missing data when calculating the mean of dist, include the na.rm = TRUE argument within the parentheses as follows.
## [1] 42.98
Note: The mean is exactly the same because there are no missing data in the dist variable.
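To see what na.rm = TRUE actually changes, it helps to have a missing value to ignore. The sketch below appends one NA to a copy of dist; the object dist_with_gap is hypothetical and not part of the cars data.

```r
dist_with_gap <- c(cars$dist, NA)    # hypothetical copy of dist with one missing value appended
mean(dist_with_gap)                  # returns NA because of the missing value
mean(dist_with_gap, na.rm = TRUE)    # the NA is dropped before averaging
mean(cars$dist, na.rm = TRUE)        # no missing values, so the result is unchanged
```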
2.3.1 Object Types
Object types are important in R, and the type of an object is contingent on the attribute stored by the object. For example, an object storing characters (e.g., “blue”) has a different type than an object storing a number (e.g., 1). Use of R functions is contingent on the type of objects. For example, functions like mean() work only for objects containing numbers. The R str() function describes the structure of objects and functions.
## Formal class 'standardGeneric' [package "methods"] with 8 slots
## ..@ .Data :function (x, ...)
## ..@ generic : chr "mean"
## .. ..- attr(*, "package")= chr "base"
## ..@ package : chr "base"
## ..@ group : list()
## ..@ valueClass: chr(0)
## ..@ signature : chr "x"
## ..@ default :Formal class 'derivedDefaultMethod' [package "methods"] with 4 slots
## .. .. ..@ .Data :function (x, ...)
## .. .. ..@ target :Formal class 'signature' [package "methods"] with 3 slots
## .. .. .. .. ..@ .Data : chr "ANY"
## .. .. .. .. ..@ names : chr "x"
## .. .. .. .. ..@ package: chr "methods"
## .. .. ..@ defined:Formal class 'signature' [package "methods"] with 3 slots
## .. .. .. .. ..@ .Data : chr "ANY"
## .. .. .. .. ..@ names : chr "x"
## .. .. .. .. ..@ package: chr "methods"
## .. .. ..@ generic: chr "mean"
## .. .. .. ..- attr(*, "package")= chr "base"
## ..@ skeleton : language (new("derivedDefaultMethod", .Data = function (x, ...) UseMethod("mean"), target = new("signature", .Data = "ANY| __truncated__ ...
## num 20
## 'data.frame': 50 obs. of 2 variables:
## $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
## $ dist : num 2 10 4 22 16 10 18 26 34 17 ...
## num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
The str() function describes mean as a function, object.One as a numeric object, the cars data as a data frame, and cars$dist as a numeric vector of length 50.
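The calls below are a sketch of how such output can be produced; object.One is recreated here so the block runs on its own. In a fresh R session str(mean) prints a much shorter description than the one shown above, which presumably came from a session in which a loaded package had redefined mean as a generic.

```r
object.One <- 20
str(object.One)   # a single number
str(cars)         # a data frame with 50 observations of 2 variables
str(cars$dist)    # a numeric vector of length 50
str("blue")       # a character value
str(mean)         # a function
```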
Previously, objects were introduced as a method of storing single attributes, either a specified value or the result of arithmetic. In addition, objects can contain a collection of data via a vector or list. In mathematics and physics, a vector is defined as a quantity with both direction and magnitude. In R, vectors are defined as a collection of data of the same type. The c() function creates a vector.
## [1] 1 2 3
## num [1:3] 1 2 3
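A vector like the one shown above can be built as follows. The object name numbers is hypothetical; the last line illustrates what "the same type" means in practice.

```r
numbers <- c(1, 2, 3)   # hypothetical object name
numbers                 # print the vector
str(numbers)            # a numeric vector with three elements
mean(numbers)           # functions such as mean() operate on the whole vector
c(1, "blue", TRUE)      # mixed inputs are converted to a single type (character here)
```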
Further, a list is defined as a collection that can hold multiple data types.
## [[1]]
## [1] "your name"
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] FALSE
## List of 3
## $ : chr "your name"
## $ : num 1
## $ : logi FALSE
Note: The structure of the list object consists of a character, a number, and a logical (TRUE/FALSE).
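The list output above is consistent with a call like the one below. The object name my_list is hypothetical.

```r
my_list <- list("your name", 1, FALSE)   # hypothetical object name
my_list                                  # print every element
str(my_list)                             # one line per element, each with its own type
my_list[[1]]                             # double brackets extract a single element
```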
2.4 Packages
Packages expand R to include additional functions important to statistical analysis.
2.4.1 Installing Packages
Installing packages in R can be performed via install.packages("packagename"), whereby the name of the desired package must be within quotation marks.
Note: Occasionally a package may require dependencies for installation. The dependencies can be automatically installed along with the package by including the dependencies = TRUE argument within the install.packages() function.
Try this! Use this code to install the following packages:
car
psych
memisc
Rcpp
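One possible way to approach this exercise is sketched below. It first checks locally which of the four packages are missing, and leaves the actual installation commented out because it needs an internet connection. The object names pkgs and to_install are hypothetical, and installed.packages() is a base R function not mentioned in the text above.

```r
pkgs <- c("car", "psych", "memisc", "Rcpp")

# Which of these are not yet installed? (local check; nothing is downloaded)
to_install <- setdiff(pkgs, rownames(installed.packages()))
to_install

# Uncomment to install the missing packages along with their dependencies:
# install.packages(to_install, dependencies = TRUE)
```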
2.4.2 Loading Packages
After installation, packages must be loaded to use their functions within R. Packages are loaded via library(packagename).
Note: Unlike the install.packages() function, the library() function does not require quotation marks around the package name.
##
## Attaching package: 'knitr'
## The following object is masked from 'package:skimr':
##
## kable
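As a small sketch of loading a package, the lines below use splines, a package that ships with R, so that nothing needs to be installed first. The choice of package is an assumption made only to keep the example runnable.

```r
library(splines)                   # no quotation marks needed
library("splines")                 # quotation marks also work
"package:splines" %in% search()    # check that the package is now attached
```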
Note: The memisc package conflicts with the car package over the recode object. A conflict occurs when two packages contain objects (e.g., functions) of the same name. A conflict will not prevent loading packages; however, use of a specific package’s object requires an explicit call to the desired parent package. For example, to use the recode() function from car, the car::recode(variable) statement will explicitly call recode() from the car package. Vice versa, memisc::recode() will explicitly call recode() from the memisc package.
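The package::function() form can be sketched as follows. The vector x and the recode rule are hypothetical, and the car call runs only if car happens to be installed.

```r
x <- c(1, 2, 3, 1)   # hypothetical data

# The package::function() form works for any installed package,
# whether or not it has been loaded with library()
stats::median(x)

# Call the car version of recode() explicitly, but only if car is installed
if (requireNamespace("car", quietly = TRUE)) {
  car::recode(x, "1 = 10; else = 0")
}
```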
2.4.3 Updating Packages
Most packages are regularly updated. The old.packages() function compares installed packages to their latest versions online.
The update.packages() function updates out-of-date packages.
Note: Updating packages requires consent. The ask = FALSE argument will skip the additional consent step to save time.
## AER :
## Version 1.2-6 installed in /Library/Frameworks/R.framework/Versions/3.6/Resources/library
## Version 1.2-7 available at https://cran.rstudio.com
## cancelled by user
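A cautious sketch of the update workflow is shown below. The online steps are commented out because they need an internet connection and change the installed library; packageVersion() is a base R function, not mentioned above, that reports an installed version locally.

```r
# Version of an installed package (local lookup; stats ships with R)
packageVersion("stats")

# Compare installed packages against the repository (needs internet):
# old.packages()

# Update everything without being asked about each package:
# update.packages(ask = FALSE)
```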
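Called with no arguments, the library() function lists the installed packages that are available to load; the search() function lists the packages that are currently attached.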
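To check which packages are attached in the current session, the base functions search() and .packages() can be used. This is a minimal sketch; the output depends on what has been loaded.

```r
search()                        # attached packages and environments, in lookup order
(.packages())                   # names of the attached packages only
"package:base" %in% search()    # base is always attached
```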
As previously demonstrated, occasionally conflicts exist between packages. The conflicts() function lists conflicts between loaded packages.
## [1] "show" "traceplot"
## [3] "show" "show"
## [5] "summary" "factorize"
## [7] "cpp_object_initializer" "show"
## [9] "plot" "sim"
## [11] "summary" "Arith"
## [13] "as.array" "coerce"
## [15] "Compare" "format"
## [17] "initialize" "Math"
## [19] "Math2" "print"
## [21] "rename" "show"
## [23] "style" "summary"
## [25] "Summary" "negative.binomial"
## [27] "select" "densityplot"
## [29] "%>%" "filter"
## [31] "kable" "%>%"
## [33] "%>%" "%>%"
## [35] "arrange" "arrange_"
## [37] "collect" "contains"
## [39] "distinct" "distinct_"
## [41] "do" "do_"
## [43] "ends_with" "everything"
## [45] "filter" "filter_"
## [47] "group_by" "group_by_"
## [49] "groups" "last"
## [51] "matches" "mutate"
## [53] "mutate_" "num_range"
## [55] "one_of" "recode"
## [57] "rename" "rename_"
## [59] "select" "select_"
## [61] "slice" "slice_"
## [63] "starts_with" "summarise"
## [65] "summarise_" "syms"
## [67] "transmute" "transmute_"
## [69] "ungroup" "%@%"
## [71] "%>%" "reduce"
## [73] "%>%" "expand"
## [75] "extract" "add_row"
## [77] "as_data_frame" "as_tibble"
## [79] "data_frame" "data_frame_"
## [81] "frame_data" "glimpse"
## [83] "lst" "lst_"
## [85] "tbl_sum" "tibble"
## [87] "tribble" "trunc_mat"
## [89] "type_sum" "arrow"
## [91] "enexpr" "enexprs"
## [93] "enquo" "enquos"
## [95] "ensym" "ensyms"
## [97] "expr" "last_plot"
## [99] "quo" "quo_name"
## [101] "quos" "stat"
## [103] "sym" "syms"
## [105] "unit" "vars"
## [107] "%+%" "alpha"
## [109] "logit" "lookup"
## [111] "rescale" "sim"
## [113] "smiths" "logit"
## [115] "recode" "some"
## [117] "coef" "coefficients"
## [119] "contr.sum" "contr.treatment"
## [121] "contrasts" "contrasts<-"
## [123] "cov2cor" "df.residual"
## [125] "filter" "fitted"
## [127] "lag" "predict"
## [129] "residuals" "toeplitz"
## [131] "update" "vcov"
## [133] "image" "layout"
## [135] "plot" "head"
## [137] "prompt" "tail"
## [139] "npk" "Arith"
## [141] "cbind2" "coerce"
## [143] "Compare" "initialize"
## [145] "kronecker" "Logic"
## [147] "Math" "Math2"
## [149] "Ops" "rbind2"
## [151] "show" "Summary"
## [153] "%in%" "all.equal"
## [155] "as.array" "as.factor"
## [157] "as.matrix" "as.ordered"
## [159] "body<-" "chol"
## [161] "chol2inv" "colMeans"
## [163] "colSums" "crossprod"
## [165] "det" "determinant"
## [167] "diag" "diag<-"
## [169] "diff" "drop"
## [171] "formals<-" "format"
## [173] "intersect" "isSymmetric"
## [175] "kronecker" "labels"
## [177] "mean" "merge"
## [179] "norm" "Position"
## [181] "print" "qr"
## [183] "qr.coef" "qr.fitted"
## [185] "qr.Q" "qr.qty"
## [187] "qr.qy" "qr.R"
## [189] "qr.resid" "rcond"
## [191] "row.names" "rowMeans"
## [193] "rowSums" "sample"
## [195] "setdiff" "setequal"
## [197] "solve" "subset"
## [199] "summary" "t"
## [201] "tcrossprod" "union"
## [203] "unique" "unname"
## [205] "which" "within"
## [207] "zapsmall"
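A long flat list like the one above is hard to read. As a sketch, the detail = TRUE argument of conflicts() groups the conflicting names by where they come from; the object name conflict_list is hypothetical, and the result depends on which packages are attached.

```r
conflict_list <- conflicts(detail = TRUE)   # list with one entry per search-path location
lengths(conflict_list)                      # number of conflicting names in each location
```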
The detach() function detaches packages and is an alternative method to resolve conflicts. Supplying the unload = TRUE argument within the detach() function will unload the package. For example, to resolve the recode() function conflict between car and memisc, the memisc package can be detached and unloaded as follows: detach(package:memisc, unload = TRUE).
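The same pattern can be tried without installing anything by using splines, a package that ships with R. The choice of package is an assumption made only for this sketch.

```r
library(splines)                          # attach a package that ships with R
"package:splines" %in% search()           # attached
detach(package:splines, unload = TRUE)    # detach and unload it
"package:splines" %in% search()           # no longer attached
```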
2.5 R Help
R includes a help function to assist with functions, accessible by including a ? prior to the function name.
Note: The help documentation will display in the bottom right quadrant of RStudio. Alternatively, typing the function name into the help search bar will yield a similar result.
To search all of R documentation for help about a function, use ??.
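The help operators can be tried as follows. The search phrase is only an example, and help() is the function form of ?.

```r
?mean                     # open the help page for mean()
help("mean")              # the same request written as a function call
??"standard deviation"    # search all installed documentation for a phrase
```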
Note: Google is a valuable tool for finding help. Large communities like StackExchange provide answers and explanations to common issues in R. At times, a particular problem may seem unique, but someone else has almost certainly had the same problem and the solution likely can be found online.
2.6 Setting a Working Directory
The working directory is the location where files are accessed and saved within an R session. Normally, the working directory is set at the beginning of every R file. The working directory should be set and the class data loaded at the beginning of each lab.
There are two methods of setting the working directory. First, the setwd() function can be used with the directory path. For example, setwd("C:/Directory_to_folder/").
Note: Forward slashes are used in place of backward slashes for directory paths.
Second, within RStudio, the “Session” tab will allow you to set the working directory. The following steps provide guidance to the “Session” tab functionality:
- Click the “Session” tab.
- Select “Set Working Directory.”
- Select “Choose Directory.”
- Select the working directory.
The getwd() function returns the set working directory.
## [1] "/Users/josephripberger/Documents/GitHub/qrmlabs"
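The sketch below changes the working directory and then restores it. It uses R's temporary folder as a stand-in for a real project path, so it runs on any machine; old_dir is a hypothetical object name.

```r
old_dir <- getwd()    # remember the current working directory
setwd(tempdir())      # switch to a temporary folder (stand-in for a real project path)
getwd()               # confirm the change
setwd(old_dir)        # switch back to where the session started
```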
2.7 Importing Your Data
R can read many different file types, including text files, Excel files, Google Sheet files, SPSS files, and Stata files. It can even read data sets directly from websites. The file type determines the function that is necessary to import a data set. For example, CSV files use the function read.csv to import a data set. Here is an example that uses this function:
ds <- read.csv("Class Data Set Factored.csv", header = TRUE)
This line of code saves the data set in Class Data Set Factored.csv to an object called ds (short for data set). The header = TRUE argument tells R that the first row in the data set provides column (or variable) names.
Note: This code assumes that Class Data Set Factored.csv is in the working directory. To check, use list.files(). If the file containing the data set is not in the working directory, provide the complete file path (or URL) in the read.csv function, like this:
ds <- read.csv("https://github.com/ripberjt/qrmlabs/raw/master/Class%20Data%20Set%20Factored.csv", header = TRUE)
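To practice the import step without the class data or an internet connection, the sketch below writes a tiny CSV file to a temporary folder and reads it back. The data, the file name, and the object names example_data, csv_path, and ds_example are all hypothetical.

```r
# Hypothetical three-row data set, written to a temporary folder
example_data <- data.frame(id = 1:3, color = c("blue", "red", "green"))
csv_path <- file.path(tempdir(), "example_data.csv")
write.csv(example_data, csv_path, row.names = FALSE)

list.files(tempdir(), pattern = "\\.csv$")        # confirm that the file exists

ds_example <- read.csv(csv_path, header = TRUE)   # the first row supplies the column names
str(ds_example)                                   # check the imported structure
```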