6 R programming

This chapter covers the bare minimum of R programming needed for Exam PA. The book “R for Data Science” provides more detail.

https://r4ds.had.co.nz/

6.1 Notebook chunks

On the Exam, you will start from an .Rmd (R Markdown) template, which organizes code into an R Notebook. Within each notebook, code is grouped into chunks.

Your time is valuable. Throughout this book, I will include useful keyboard shortcuts.

Shortcut: To run everything in a chunk quickly, press CTRL + SHIFT + ENTER. To create a new chunk, use CTRL + ALT + I.

6.2 Basic operations

The usual math operations apply.

## [1] 3
## [1] 1
## [1] 4
## [1] 2
## [1] 8
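Here is a minimal sketch of the five basic operators. The values are illustrative and are not necessarily the ones behind the output above.

```r
# Basic arithmetic operators in R
1 + 2   # addition
3 - 2   # subtraction
2 * 2   # multiplication
4 / 2   # division
2^3     # exponentiation
```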

There are two assignment operators: = and <-. The latter is preferred because it is used only for assigning a value to a variable. The = operator is also used for specifying arguments in functions (see the functions section).

Shortcut: ALT + - inserts <-.

## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] TRUE
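This sketch shows both assignment operators. It also shows why = is best kept for function arguments. The variable names are hypothetical. The comparison operator == is included because it is easy to confuse with =.

```r
# Preferred: <- assigns the value on the right to the name on the left
x <- 5

# = also assigns at the top level, but reads like argument matching
y = 10

# Inside a function call, = names an argument; it does not create a variable
z <- round(3.14159, digits = 2)   # 3.14

# == compares values and returns TRUE or FALSE; it never assigns
x == 5   # TRUE
y > 20   # FALSE
```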

Vectors are created with c(), where the c stands for “concatenate”. Vectors can be added just like numbers, and the operation is applied element by element.

## [1] 4 6
## [1] 3 8
## [1] 16 36
## [1] 2 3
## [1] 7 9
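A short sketch of building vectors with c() and doing arithmetic on them. The values are illustrative.

```r
# c() concatenates values into a vector
a <- c(1, 2)
b <- c(3, 4)

a + b    # element-wise sum: 4 6
a * b    # element-wise product: 3 8
b^2      # squares each element: 9 16
c(a, b)  # concatenating two vectors: 1 2 3 4
```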

I already mentioned numeric types. There are also character (string) types, factor types, and boolean types (called "logical" in R).

Character vectors can be combined with the paste() function.

## [1] "The Quick Brown Fox"
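A minimal sketch of paste() and its variants. The vector name words is hypothetical. By default paste() separates its arguments with a space. The collapse argument joins the elements of one vector, and paste0() uses no separator.

```r
words <- c("The", "Quick", "Brown", "Fox")

# paste() joins its arguments with a single space by default
paste("The", "Quick", "Brown", "Fox")   # "The Quick Brown Fox"

# collapse = joins the elements of one vector into a single string
paste(words, collapse = " ")            # "The Quick Brown Fox"

# paste0() is paste() with sep = ""
paste0("Exam", "PA")                    # "ExamPA"
```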

Factors look like character vectors but can only contain a finite number of predefined values.

The factor below has only one “level”. A factor's levels are the set of values it is allowed to take.

## [1] "The"

By default, R sorts the levels of a factor in alphabetical order (Q comes alphabetically before T).

## [1] "Quick" "The"
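This sketch shows the default alphabetical ordering and how to set the level order explicitly. It also shows what happens when you assign a value that is not one of the predefined levels. The object names are hypothetical.

```r
# A factor stores values together with the set of allowed values (levels)
f <- factor(c("The", "Quick"))
levels(f)   # "Quick" "The": sorted alphabetically by default

# Levels can be given explicitly, in any order
g <- factor(c("The", "Quick"), levels = c("The", "Quick"))
levels(g)   # "The" "Quick"

# A value outside the levels is not allowed:
# R warns "invalid factor level, NA generated" and stores NA
f[2] <- "Fox"
f
```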

When building linear models, the order of the factor levels matters, because R treats the first level as the “reference level” or “base level”. In GLMs, the reference level should always be the level with the most observations. This will be covered in the section on linear models.
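Here is one way to make the most common level the reference level, sketched with base R's table(), which.max(), and relevel(). The data and the name region are hypothetical.

```r
# Hypothetical factor with three levels
region <- factor(c("East", "West", "West", "North", "West", "East"))
levels(region)   # "East" "North" "West" (alphabetical)

# Count observations per level and find the most common one
counts <- table(region)
most_common <- names(which.max(counts))   # "West"

# relevel() moves the chosen level to the front, making it the reference level
region <- relevel(region, ref = most_common)
levels(region)   # "West" "East" "North"
```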

Booleans are just TRUE and FALSE values. R accepts T as shorthand for TRUE, but TRUE is preferred: T is an ordinary variable that can be overwritten, while TRUE is a reserved word. When converted to numbers, TRUE becomes 1 and FALSE becomes 0.

## [1] 0

This conversion happens automatically whenever booleans appear in a math operation.

## [1] 2

Logical vectors work in the same way. For example, summing a logical vector counts its TRUE values.

## [1] 2
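A short sketch of booleans in arithmetic. The vector passed is a hypothetical example. Because TRUE counts as 1, sum() gives the number of TRUE values and mean() gives their proportion.

```r
TRUE + TRUE          # 2
as.numeric(FALSE)    # 0

passed <- c(TRUE, FALSE, TRUE, TRUE)
sum(passed)          # 3: the number of TRUE values
mean(passed)         # 0.75: the proportion of TRUE values
```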

Vectors are indexed using [, with positions starting at 1. If you are extracting only a single element, use [[ to make that clear.

## [1] "a"
## [1] "b"
## [1] "a" "c"
## [1] "a" "b"
## [1] "a" "c"
## [1] "a"
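A sketch of the common ways to index a vector: by position, by negative position, by a logical vector, and by a condition. The vector v is illustrative.

```r
v <- c("a", "b", "c")

v[1]                      # "a": positions start at 1
v[[2]]                    # "b": [[ extracts exactly one element
v[c(1, 3)]                # "a" "c": a vector of positions
v[-3]                     # "a" "b": negative positions drop elements
v[c(TRUE, FALSE, TRUE)]   # "a" "c": keeps the TRUE positions
v[v == "a"]               # "a": index with a condition
```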

6.3 Lists

Lists are vectors that can hold mixed object types.

## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] "Character"
## 
## [[3]]
## [1] 3.14

List elements can be named.

## $bool
## [1] TRUE
## 
## $character
## [1] "character"
## 
## $numeric
## [1] 3.14
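Lists are built with list(). Here is a sketch of an unnamed list and a named one with the same kind of contents. The names unnamed and my_list are hypothetical.

```r
# An unnamed list holding three different types
unnamed <- list(TRUE, "Character", 3.14)

# The same idea with a name for each element
my_list <- list(bool = TRUE, character = "character", numeric = 3.14)
names(my_list)   # "bool" "character" "numeric"
```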

The $ operator extracts a named element from a list, and the result can be used like any other value.

## [1] 3.14
## [1] 8.14

Lists can also be indexed using [[, either by position or by name.

## [1] TRUE
## [1] "character"
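A sketch of extracting list elements. It also shows the difference between [[ and single brackets: [[ returns the element itself, while [ returns a shorter list. The list my_list is hypothetical.

```r
my_list <- list(bool = TRUE, character = "character", numeric = 3.14)

my_list$numeric          # 3.14
my_list$numeric + 5      # 8.14
my_list[[1]]             # TRUE, by position
my_list[["character"]]   # "character", by name

class(my_list[1])        # "list": single brackets keep the list wrapper
class(my_list[[1]])      # "logical": double brackets return the element
```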

Lists can contain vectors, other lists, and any other object.

## $vector
## [1] 1 2 3
## 
## $character
## [1] "a" "b" "c"
## 
## $list
## $list$bool
## [1] TRUE
## 
## $list$character
## [1] "character"
## 
## $list$numeric
## [1] 3.14
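A sketch of a nested list like the one printed above. The names inner and nested are hypothetical. To reach an element inside a nested list, chain $ or [[.

```r
inner <- list(bool = TRUE, character = "character", numeric = 3.14)

# A list can hold vectors and even other lists
nested <- list(
  vector = c(1, 2, 3),
  character = c("a", "b", "c"),
  list = inner
)

nested$list$numeric          # 3.14
nested[["list"]][["bool"]]   # TRUE
```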

To find out the type of an object, use class(), str(), or summary().

## [1] "numeric"
## [1] "list"
## List of 3
##  $ vector   : num [1:3] 1 2 3
##  $ character: chr [1:3] "a" "b" "c"
##  $ list     :List of 3
##   ..$ bool     : logi TRUE
##   ..$ character: chr "character"
##   ..$ numeric  : num 3.14
##           Length Class  Mode     
## vector    3      -none- numeric  
## character 3      -none- character
## list      3      -none- list
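A minimal sketch of the three inspection functions on simple objects. The names are illustrative.

```r
x <- c(1, 2, 3)
class(x)             # "numeric"
class(c("a", "b"))   # "character"
class(TRUE)          # "logical"
class(list(x))       # "list"

# str() prints a compact, one-line-per-element view of any object
str(list(vector = x, flag = TRUE))

# summary() adapts to the object; for a numeric vector it gives quartiles and the mean
summary(x)
```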

6.4 Functions

You only need to understand the very basics of functions. The big picture, though, is that understanding functions helps you to understand everything in R, because R is at heart a functional programming language. This sets it apart from object-oriented languages such as Java and Python, procedural languages such as C and VBA, and SQL, a declarative query language built around set operations.

Functions do things, so the convention is to name a function with a verb. The function make_rainbows() would create rainbows. The function summarise_vectors() would summarise vectors. A function may or may not take inputs, and may or may not return an output.

If you need to do something in R, there is a high probability that someone has already written a function to do it. That being said, creating simple functions is quite useful.

Here is an example of a function whose only job is a side effect, printing a greeting built from its input:

## [1] "Hello, Future Actuary"
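A minimal sketch of such a function. The name greet is hypothetical. print() is called only for its side effect of writing to the console.

```r
# Hypothetical function whose purpose is a side effect: printing
greet <- function(name) {
  print(paste("Hello,", name))
}

greet("Future Actuary")   # prints "Hello, Future Actuary"
```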

Here is a function that returns something.

R automatically returns the value of the last evaluated expression, so an explicit return statement is optional. In fact, it is discouraged by convention; the tidyverse style guide reserves return() for early returns.

## [1] 7
## [1] 7
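A sketch of the two styles side by side. The function names are hypothetical. Both return the same value, but the second relies on R returning the last evaluated expression.

```r
# Explicit return()
add_explicit <- function(x, y) {
  return(x + y)
}

# Idiomatic: the last evaluated expression is returned automatically
add <- function(x, y) {
  x + y
}

add_explicit(3, 4)   # 7
add(3, 4)            # 7
```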

Arithmetic operators in R are vectorized. In other words, they are applied element-wise, so a function built from them works on whole vectors as well as single numbers.

## [1] 5 7 9
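A short sketch of vectorization using the hypothetical add() function from before. The last line shows recycling: when one operand is shorter, R repeats it to match the length of the longer one.

```r
add <- function(x, y) {
  x + y
}

# + is vectorized, so add() works element-wise on vectors
add(c(1, 2, 3), c(4, 5, 6))   # 5 7 9

# A length-one vector is recycled: 10 is added to every element
c(1, 2, 3) + 10               # 11 12 13
```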

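Many functions in R actually return lists! This is why their results can be indexed with the dollar sign. For example, a fitted linear model is a list, and its coefficients can be extracted with $.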

## (Intercept)         age 
##   3165.8850    257.7226
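The data behind the output above is not shown, so this sketch uses R's built-in mtcars data set instead. It checks that an lm object is a list and pulls its coefficients out with $.

```r
# Fit a simple linear model on the built-in mtcars data
model <- lm(mpg ~ wt, data = mtcars)

is.list(model)       # TRUE: an "lm" object is a list underneath
names(model)         # the components that can be extracted with $
model$coefficients   # named vector with "(Intercept)" and "wt"
```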

Here’s a function that returns a list.

## [1] 5
## [1] 6
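A minimal sketch of returning several results at once in a named list. The function describe() is hypothetical.

```r
# Hypothetical function that bundles two results into a named list
describe <- function(x) {
  list(n = length(x), total = sum(x))
}

result <- describe(c(1, 2, 3))
result$n       # 3
result$total   # 6
```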

6.5 Data frames

You can think of a data frame as a table that is implemented as a list of vectors.

##   age has_fsa
## 1  25   FALSE
## 2  35    TRUE
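A sketch that builds the small data frame printed above with base R's data.frame(). It then confirms that a data frame is a list with one vector per column.

```r
df <- data.frame(
  age = c(25, 35),
  has_fsa = c(FALSE, TRUE)
)

is.list(df)   # TRUE: a data frame is a list of columns
length(df)    # 2: one list element per column
df[["age"]]   # 25 35: each column is an ordinary vector
```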

You can also work with tibbles, which are data frames but have nicer printing:

## -- Attaching packages ------------------------------------------------------------ tidyverse 1.3.0 --
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## -- Conflicts --------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## # A tibble: 2 x 2
##     age has_fsa
##   <dbl> <lgl>  
## 1    25 FALSE  
## 2    35 TRUE

To index columns in a tibble, use the same $ operator that indexes a list.

## [1] 25 35

To find the number of rows and columns, use dim().

## [1] 2 2

To get a summary of each column, use summary().

##       age        has_fsa       
##  Min.   :25.0   Mode :logical  
##  1st Qu.:27.5   FALSE:1        
##  Median :30.0   TRUE :1        
##  Mean   :30.0                  
##  3rd Qu.:32.5                  
##  Max.   :35.0
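A sketch of these operations on a base data frame, which behaves the same as a tibble for $, dim(), and summary(). The helpers nrow() and ncol() return the two parts of dim() separately.

```r
df <- data.frame(age = c(25, 35), has_fsa = c(FALSE, TRUE))

df$age        # 25 35: $ extracts a column, as with a list
dim(df)       # 2 2: rows, then columns
nrow(df)      # 2
ncol(df)      # 2
summary(df)   # a summary for each column
```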