# 2 Basics of R

This chapter serves as a primer to `R` by introducing the basics. It is best to follow the lab via the `.rmd` file within RStudio rather than relying solely on the compiled PDF. This way, students can experiment with code in the “code blocks” provided. **Note:** In-text R code is displayed in a `fixed-width` font.

## 2.1 R, as a Calculator

The first thing to know about R is that it is essentially a large calculator capable of performing arithmetic:

`## [1] 2`

`## [1] 64`

`## [1] 256`

`## [1] 2401`

`## [1] 21`
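The input expressions are not shown next to the results above. As a minimal sketch, the lines below use the basic arithmetic operators; the operands are assumptions chosen for illustration so that the results match the values printed above, and they are not necessarily the original inputs.

```r
# Basic arithmetic; each line prints its result to the console.
1 + 1     # addition: 2
8 * 8     # multiplication: 64
2 ^ 8     # exponentiation: 256
7 ^ 4     # exponentiation: 2401
42 / 2    # division: 21
30 - 9    # subtraction: 21
```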

R also supports elementary and algebraic functions such as log and square root.

`## [1] 4.60517`

`## [1] 7`
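A short sketch of such functions follows. The inputs `100` and `49` are assumptions that happen to reproduce the two results above. Note that `log()` computes the natural logarithm by default.

```r
log(100)     # natural logarithm: 4.60517
sqrt(49)     # square root: 7
log10(100)   # base-10 logarithm: 2
exp(0)       # exponential function: 1
```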

### 2.1.1 Order of Operations

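R evaluates expressions according to the order of operations, “PEMDAS” (multiplication and division share one level of precedence, as do addition and subtraction, and each pair is evaluated from left to right):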

- Parentheses
- Exponents
- Multiplication
- Division
- Addition
- Subtraction

Watch this video for a refresher on the order of operations: https://www.youtube.com/watch?v=94yAmf7GyHw
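The following sketch illustrates how R applies these rules. The expressions are illustrative assumptions, not part of the original lab.

```r
2 + 3 * 4      # multiplication before addition: 14
(2 + 3) * 4    # parentheses are evaluated first: 20
12 / 3 * 2     # equal precedence, evaluated left to right: 8
2 ^ 3 ^ 2      # exponents group right to left, i.e. 2 ^ 9: 512
-2 ^ 2         # the exponent is applied before the minus sign: -4
```

When in doubt, add parentheses; they cost nothing and make the intended order explicit.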

**Try this!** Using R, solve:
(5 + 1) ^ 4 / (9 - 2) ^ 3

## 2.2 Objects

R is an “object oriented” programming language. Put simply, R uses objects to store attributes. Objects are created by assigning an attribute to them via the `<-` operator. You can always view the attribute of an object by typing the object name.

`## [1] 20`
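As a minimal sketch of assignment, the code below stores a value and prints it. The name `object1` appears in the object listing later in this section, but the expression `10 + 10` is an assumption chosen to match the result above.

```r
object1 <- 10 + 10   # assign the result of the arithmetic to object1
object1              # typing the name prints the stored value: 20
```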

**Try this!** Create a new object below; you can name it almost anything!

R includes various functions for managing created objects. The `ls()` function lists all existing objects.

`## [1] "ds" "object1" "packages" "pagebreak" "pkg"`

The `rm()` function removes existing objects.
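A brief sketch of both functions, using a hypothetical object named `temp_value` that is created only for this example:

```r
temp_value <- 5        # hypothetical object created for this example
ls()                   # the listing now includes "temp_value"
rm(temp_value)         # remove the object
exists("temp_value")   # FALSE: the object is gone
```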

There is no strict convention for naming objects; however, there are best practices:

- Avoid spaces; use underscores, periods, or CamelCase (or camelCase) for long object names
  - e.g., This_is_a_long_name, This.Is.A.Long.Name, thisIsALongName
- Avoid names of existing functions or reserved R objects
  - e.g., `mean` or `sum`
- Be descriptive but keep it short (less than 10 characters)
- Avoid special characters
  - e.g., ? $ % ^ &
- Numbers are fine, but names cannot begin with numbers.
  - e.g., `object1` is ok, but `1object` is not

**Important:** Object names are case sensitive.
- e.g., `object.One` and `Object.One` refer to two separate objects

`## [1] 20`

`## [1] 10`
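Case sensitivity can be sketched as follows. The object names come from the text above; the assigned expressions are assumptions chosen to match the two printed values.

```r
object.One <- 10 + 10   # lower-case "o"
Object.One <- 5 + 5     # upper-case "O": a different object
object.One              # 20
Object.One              # 10
identical(object.One, Object.One)   # FALSE: the two names are unrelated
```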

## 2.3 Functions

In addition to elementary and algebraic functions, R includes functions that simplify statistical analysis. For example, the `mean()` function calculates the mean.

**Note:** Sometimes `na.rm = TRUE` is necessary within the parentheses to instruct R to ignore missing data.

`## [1] 42.98`

R comes with a variety of “built in” data sets, such as the `cars` data set, which contains the speed and stopping distance (`dist`) of 50 cars. This data set is used below to demonstrate and experiment with R functions.

A note about syntax: the dollar sign, `$`, is used to indicate the variable of interest relative to a data set. This is important in the case of multiple data sets that contain variables of the same name. In the previous code, R calculated the mean using the `dist` variable within the `cars` data set by specifying `cars$dist`.
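The `$` syntax can be sketched with the built-in `cars` data set. `mean(cars$dist)` is the call described above; the other lines are added for illustration.

```r
names(cars)        # the variables in the data set: "speed" "dist"
head(cars$dist)    # first six values of dist: 2 10 4 22 16 10
mean(cars$dist)    # mean of dist: 42.98
mean(cars$speed)   # the same function applied to another variable: 15.4
```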

To ignore missing data when calculating the mean of `dist`, include the `na.rm = TRUE` argument within the parentheses as follows.

`## [1] 42.98`

**Note:** The mean is exactly the same because there is no missing data in the `dist` variable.
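Because `cars$dist` has no missing values, the effect of `na.rm = TRUE` is easier to see on a vector that does. The sketch below uses a hypothetical vector named `scores`, which is not part of the lab data.

```r
scores <- c(10, 20, NA, 40)    # hypothetical data with one missing value
mean(scores)                   # NA: a missing value makes the mean undefined
mean(scores, na.rm = TRUE)     # 23.33333: the NA is dropped first
mean(cars$dist, na.rm = TRUE)  # 42.98: unchanged, since dist has no NAs
```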

### 2.3.1 Object Types

Object types are important in R, and the type of an object depends on the attribute it stores. For example, an object storing characters (e.g., “blue”) has a different type than an object storing a number (e.g., 1). Which R functions can be used depends on the type of the object. For example, functions like `mean()` work only for objects containing numbers. The R `str()` function describes the structure of objects and functions.

```
## Formal class 'standardGeneric' [package "methods"] with 8 slots
## ..@ .Data :function (x, ...)
## ..@ generic : chr "mean"
## .. ..- attr(*, "package")= chr "base"
## ..@ package : chr "base"
## ..@ group : list()
## ..@ valueClass: chr(0)
## ..@ signature : chr "x"
## ..@ default :Formal class 'derivedDefaultMethod' [package "methods"] with 4 slots
## .. .. ..@ .Data :function (x, ...)
## .. .. ..@ target :Formal class 'signature' [package "methods"] with 3 slots
## .. .. .. .. ..@ .Data : chr "ANY"
## .. .. .. .. ..@ names : chr "x"
## .. .. .. .. ..@ package: chr "methods"
## .. .. ..@ defined:Formal class 'signature' [package "methods"] with 3 slots
## .. .. .. .. ..@ .Data : chr "ANY"
## .. .. .. .. ..@ names : chr "x"
## .. .. .. .. ..@ package: chr "methods"
## .. .. ..@ generic: chr "mean"
## .. .. .. ..- attr(*, "package")= chr "base"
## ..@ skeleton : language (new("derivedDefaultMethod", .Data = function (x, ...) UseMethod("mean"), target = new("signature", .Data = "ANY| __truncated__ ...
```

`## num 20`

```
## 'data.frame': 50 obs. of 2 variables:
## $ speed: num 4 4 7 7 8 9 10 10 10 11 ...
## $ dist : num 2 10 4 22 16 10 18 26 34 17 ...
```

`## num [1:50] 2 10 4 22 16 10 18 26 34 17 ...`

The `str()` function described `mean` as a function, `object.One` as a numeric object, the `cars` data as a data frame, etc.
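A compact sketch of how object types differ is shown below. The object names `color`, `count`, and `flag` are hypothetical and used only for illustration.

```r
color <- "blue"   # character
count <- 1        # numeric
flag  <- TRUE     # logical

str(color)        # chr "blue"
str(count)        # num 1
str(flag)         # logi TRUE
class(cars)       # "data.frame"
class(cars$dist)  # "numeric"
```

Calling `mean(color)` would return `NA` with a warning, because the object does not contain numbers.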

Previously, objects were introduced as a method of storing single attributes, either a specified value or the result of arithmetic. In addition, objects can contain a collection of data via a vector or list. In mathematics and physics, a vector is defined as a quantity with both magnitude and direction. In R, a vector is a collection of data of the same type. The `c()` function creates a vector.

`## [1] 1 2 3`

`## num [1:3] 1 2 3`
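A minimal sketch of creating and inspecting a vector follows. The names `numbers` and `mixed` are hypothetical. The last two lines show what “same type” means in practice: mixing types inside `c()` converts every element to a common type.

```r
numbers <- c(1, 2, 3)     # a numeric vector
numbers                   # 1 2 3
str(numbers)              # num [1:3] 1 2 3

mixed <- c(1, "two", 3)   # a character element forces all elements to character
str(mixed)                # chr [1:3] "1" "two" "3"
```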

Further, a list is defined as a collection of multiple data types.

```
## [[1]]
## [1] "your name"
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] FALSE
```

```
## List of 3
## $ : chr "your name"
## $ : num 1
## $ : logi FALSE
```

**Note:** The structure of the list object consists of a character, a number, and a logical (TRUE/FALSE).
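A list like the one printed above can be sketched with the `list()` function. The object name `my_list` is an assumption; the three elements are taken from the output shown.

```r
my_list <- list("your name", 1, FALSE)   # three elements of different types
str(my_list)                             # List of 3: chr, num, logi
my_list[[1]]                             # double brackets extract one element
class(my_list[[3]])                      # "logical"
```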

## 2.4 Packages

Packages expand R to include additional functions important to statistical analysis.

### 2.4.1 Installing Packages

Installing packages in R can be performed via `install.packages("packagename")`, whereby the name of the desired package must be within quotation marks.

**Note:** Occasionally a package may require dependencies for installation. The dependencies can be automatically installed along with the package by including the `dependencies = TRUE` argument within the `install.packages()` function.

**Try this!** Use this code to install the following packages:

`car`

`psych`

`memisc`

`Rcpp`
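One way to install all four packages in a single call is sketched below. The check against `installed.packages()` and the `interactive()` guard are additions for illustration, not part of the original lab: they skip packages that are already installed and avoid downloads when the file is knitted rather than run at the console.

```r
pkgs <- c("car", "psych", "memisc", "Rcpp")

# Keep only the packages that are not installed yet.
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install from the console only; requires an internet connection.
if (interactive() && length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, dependencies = TRUE)
}
```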

### 2.4.2 Loading Packages

After installation, packages must be loaded to use their functions within R. Packages are loaded via `library(packagename)`.

**Note:** Unlike the `install.packages()` function, the `library()` function does not require quotation marks around the package name.

```
##
## Attaching package: 'knitr'
```

```
## The following object is masked from 'package:skimr':
##
## kable
```
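A minimal sketch of loading a package follows. It uses `tools`, a package that ships with R, so that the example runs without installing anything; this substitution is an assumption, and `library(car)` works the same way once `car` is installed.

```r
library(tools)       # unquoted package name
library("tools")     # the quoted form is also accepted

# Attached packages appear on the search path.
"package:tools" %in% search()   # TRUE
```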

**Note:** The `memisc` package contains object/package conflicts with the `car` package for the `recode` object. A conflict occurs when two packages contain objects (e.g., functions) of the same name. A conflict will not prevent loading packages; however, use of a specific package’s object requires an explicit call to the desired parent package. For example, to use the `recode()` function from `car`, the `car::recode(variable)` statement will explicitly call `recode()` from the `car` package. Vice versa, `memisc::recode()` will explicitly call `recode()` from the `memisc` package.
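The `package::function()` syntax can be sketched without installing either package. The example below deliberately creates a conflict by defining a hypothetical function named `sd`, which masks `sd()` from the `stats` package that ships with R. This stands in for the `car` and `memisc` case and is an assumption made so that the code runs anywhere.

```r
sd <- function(x) "masked version"   # hypothetical function that masks stats::sd
values <- c(1, 2, 3)

sd(values)          # "masked version": the most recently defined object wins
stats::sd(values)   # 1: the explicit call reaches the stats package

rm(sd)              # removing the mask resolves the conflict
sd(values)          # 1
```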

### 2.4.3 Updating Packages

Most packages are regularly updated. The `old.packages()` function compares installed packages to their latest versions online.

The `update.packages()` function updates out of date packages.

**Note:** Updating packages requires consent. The `ask = FALSE` argument will skip the additional consent step to save time.

```
## AER :
## Version 1.2-6 installed in /Library/Frameworks/R.framework/Versions/3.6/Resources/library
## Version 1.2-7 available at https://cran.rstudio.com
## cancelled by user
```

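The `library()` function, called with no arguments, lists the installed packages that are available to load; the `search()` function lists the packages that are currently attached.

As previously demonstrated, occasionally conflicts exist between packages. The `conflicts()` function lists conflicts between loaded packages.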

```
## [1] "show" "traceplot"
## [3] "show" "show"
## [5] "summary" "factorize"
## [7] "cpp_object_initializer" "show"
## [9] "plot" "sim"
## [11] "summary" "Arith"
## [13] "as.array" "coerce"
## [15] "Compare" "format"
## [17] "initialize" "Math"
## [19] "Math2" "print"
## [21] "rename" "show"
## [23] "style" "summary"
## [25] "Summary" "negative.binomial"
## [27] "select" "densityplot"
## [29] "%>%" "filter"
## [31] "kable" "%>%"
## [33] "%>%" "%>%"
## [35] "arrange" "arrange_"
## [37] "collect" "contains"
## [39] "distinct" "distinct_"
## [41] "do" "do_"
## [43] "ends_with" "everything"
## [45] "filter" "filter_"
## [47] "group_by" "group_by_"
## [49] "groups" "last"
## [51] "matches" "mutate"
## [53] "mutate_" "num_range"
## [55] "one_of" "recode"
## [57] "rename" "rename_"
## [59] "select" "select_"
## [61] "slice" "slice_"
## [63] "starts_with" "summarise"
## [65] "summarise_" "syms"
## [67] "transmute" "transmute_"
## [69] "ungroup" "%@%"
## [71] "%>%" "reduce"
## [73] "%>%" "expand"
## [75] "extract" "add_row"
## [77] "as_data_frame" "as_tibble"
## [79] "data_frame" "data_frame_"
## [81] "frame_data" "glimpse"
## [83] "lst" "lst_"
## [85] "tbl_sum" "tibble"
## [87] "tribble" "trunc_mat"
## [89] "type_sum" "arrow"
## [91] "enexpr" "enexprs"
## [93] "enquo" "enquos"
## [95] "ensym" "ensyms"
## [97] "expr" "last_plot"
## [99] "quo" "quo_name"
## [101] "quos" "stat"
## [103] "sym" "syms"
## [105] "unit" "vars"
## [107] "%+%" "alpha"
## [109] "logit" "lookup"
## [111] "rescale" "sim"
## [113] "smiths" "logit"
## [115] "recode" "some"
## [117] "coef" "coefficients"
## [119] "contr.sum" "contr.treatment"
## [121] "contrasts" "contrasts<-"
## [123] "cov2cor" "df.residual"
## [125] "filter" "fitted"
## [127] "lag" "predict"
## [129] "residuals" "toeplitz"
## [131] "update" "vcov"
## [133] "image" "layout"
## [135] "plot" "head"
## [137] "prompt" "tail"
## [139] "npk" "Arith"
## [141] "cbind2" "coerce"
## [143] "Compare" "initialize"
## [145] "kronecker" "Logic"
## [147] "Math" "Math2"
## [149] "Ops" "rbind2"
## [151] "show" "Summary"
## [153] "%in%" "all.equal"
## [155] "as.array" "as.factor"
## [157] "as.matrix" "as.ordered"
## [159] "body<-" "chol"
## [161] "chol2inv" "colMeans"
## [163] "colSums" "crossprod"
## [165] "det" "determinant"
## [167] "diag" "diag<-"
## [169] "diff" "drop"
## [171] "formals<-" "format"
## [173] "intersect" "isSymmetric"
## [175] "kronecker" "labels"
## [177] "mean" "merge"
## [179] "norm" "Position"
## [181] "print" "qr"
## [183] "qr.coef" "qr.fitted"
## [185] "qr.Q" "qr.qty"
## [187] "qr.qy" "qr.R"
## [189] "qr.resid" "rcond"
## [191] "row.names" "rowMeans"
## [193] "rowSums" "sample"
## [195] "setdiff" "setequal"
## [197] "solve" "subset"
## [199] "summary" "t"
## [201] "tcrossprod" "union"
## [203] "unique" "unname"
## [205] "which" "within"
## [207] "zapsmall"
```

The `detach()` function detaches packages and is an alternative method to resolve conflicts. Supplying the `unload = TRUE` argument within the `detach()` function will unload the package. For example, to resolve the `recode()` function conflict between `car` and `memisc`, the `memisc` package can be detached and unloaded as follows: `detach(package:memisc, unload = TRUE)`.
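The effect of `detach()` can be checked with `search()`. The sketch below uses `splines`, a package that ships with R, in place of `memisc`; that substitution is an assumption made so that the example runs without extra installation.

```r
library(splines)
"package:splines" %in% search()          # TRUE: the package is attached

detach(package:splines, unload = TRUE)   # same form as the memisc call above
"package:splines" %in% search()          # FALSE: no longer on the search path
```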

## 2.5 R Help

R includes a help function to assist with functions, accessible by including a `?` prior to the function name.

**Note:** The help documentation will display in the bottom right quadrant of RStudio. Alternatively, typing the function name into the help search bar will yield a similar result.

To search all of R documentation for help about a function, use `??`.
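At the console, `?mean` and `??mean` are the usual shorthand. The sketch below uses the equivalent long forms, `help()` and `help.search()`, and stores the results instead of opening the help viewer; restricting the search to the `base` package is an assumption made to keep the example fast.

```r
topic <- help("mean")                            # long form of ?mean
hits  <- help.search("mean", package = "base")   # narrower form of ??mean

length(topic)   # number of help pages found for the topic
class(hits)     # "hsearch": an object holding the search results
```

Printing `topic` or `hits` displays the help page or the list of matching pages.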

**Note:** Google is a valuable tool for finding help. Large communities like StackExchange provide answers and explanations to common issues in R. At times, a particular problem may seem unique, but someone else has almost certainly had the same problem and the solution likely can be found online.

## 2.6 Setting a Working Directory

The working directory is the location where files are accessed and saved within an R session. Normally, the working directory is set at the beginning of every R file. The working directory should be set and the class data loaded at the beginning of each lab.

There are two methods of setting the working directory. First, the `setwd()` function can be used with the directory path. For example, `setwd("C:/Directory_to_folder/")`.

**Note:** Forward slashes are used in place of backward slashes for directory paths.

Second, within RStudio, the “Session” tab will allow you to set the working directory. The following steps provide guidance to the “Session” tab functionality:

- Click the “Session” tab.
- Select “Set Working Directory.”
- Select “Choose Directory.”
- Select the working directory.

The `getwd()` function returns the set working directory.

`## [1] "/Users/josephripberger/Documents/GitHub/qrmlabs"`
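The two functions can be combined as in the sketch below. It switches to R's temporary directory, which exists on every system, and then restores the original location; the use of `tempdir()` is an assumption for illustration, and in a lab the path would point to the folder holding the class data.

```r
old_wd <- getwd()    # remember the current working directory
setwd(tempdir())     # switch to a directory that is known to exist
getwd()              # now reports the temporary directory
setwd(old_wd)        # switch back
getwd()              # the original working directory again
```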

## 2.7 Importing Your Data

R can read many different file types, including text files, Excel files, Google Sheet files, SPSS files, and Stata files. It can even read data sets directly from websites. The file type determines the function that is necessary to import a data set. For example, CSV files use the function `read.csv` to import a data set. Here is an example that uses this function:

`ds <- read.csv("Class Data Set Factored.csv", header = TRUE)`

This line of code saves the data set in `Class Data Set Factored.csv` to an object called `ds` (short for data set). The `header = TRUE` argument tells R that the first row in the data set provides column (or variable) names.

**Note:** This code assumes that `Class Data Set Factored.csv` is in the working directory. To check, use `list.files()`. If the file containing the data set is not in the working directory, provide the complete file path or URL in the `read.csv` function, like this:

`ds <- read.csv("https://github.com/ripberjt/qrmlabs/raw/master/Class%20Data%20Set%20Factored.csv", header = TRUE)`
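To experiment with `read.csv` without the class data, the sketch below writes a tiny CSV file to a temporary location and reads it back. The file contents, the column names, and the object name `ds_demo` are all hypothetical.

```r
# Write a small, made-up CSV file to a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,party",
             "1,34,Dem",
             "2,51,Rep",
             "3,47,Ind"), csv_path)

# Confirm the file is where we expect it, then import it.
basename(csv_path) %in% list.files(dirname(csv_path))   # TRUE
ds_demo <- read.csv(csv_path, header = TRUE)

str(ds_demo)        # 3 observations of 3 variables
mean(ds_demo$age)   # 44
```

Because `header = TRUE`, the first line of the file supplies the variable names `id`, `age`, and `party`.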