6 R programming
This chapter covers the bare minimum of R programming needed for Exam PA. The book “R for Data Science” provides more detail.
6.1 Notebook chunks
On the Exam, you will start with an .Rmd (R Markdown) template, which organizes code into R Notebooks. Within each notebook, code is organized into chunks.
Your time is valuable. Throughout this book, I will include useful keyboard shortcuts.
Shortcut: To run everything in a chunk quickly, press CTRL + SHIFT + ENTER. To create a new chunk, use CTRL + ALT + I.
6.2 Basic operations
The usual math operations apply.
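As a minimal sketch, the expressions below (the operands are illustrative choices) evaluate to 3, 1, 4, 2, and 8.

```r
# Basic arithmetic; the operands are illustrative choices
1 + 2   # addition
3 - 2   # subtraction
2 * 2   # multiplication
4 / 2   # division
2^3     # exponentiation
```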
## [1] 3
## [1] 1
## [1] 4
## [1] 2
## [1] 8
There are two assignment operators: = and <-. The latter is preferred because it is specific to assigning a value to a variable. The = operator is also used for specifying arguments in functions (see the functions section).
Shortcut: ALT + - creates a <-.
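Here is a short sketch of both assignment operators, followed by a few comparisons on the assigned values. The variable names and values are hypothetical, chosen so that the comparisons return FALSE, TRUE, TRUE, TRUE.

```r
x <- 2    # preferred assignment operator
y = 3     # also assigns, but = is the same symbol used for function arguments

x == y    # FALSE
x != y    # TRUE
x < y     # TRUE
x <= y    # TRUE
```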
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] TRUE
Vectors can be added just like numbers. The c() function stands for “concatenate” and creates vectors.
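A minimal sketch of vector arithmetic, assuming two small numeric vectors (the names x, y, and z are illustrative). Each operation is applied element by element.

```r
x <- c(1, 2)
y <- c(3, 4)

x + y        # 4 6
x * y        # 3 8

z <- x + y
z^2          # 16 36
z / 2        # 2 3
z + 3        # 7 9  (the single number 3 is recycled across the vector)
```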
## [1] 4 6
## [1] 3 8
## [1] 16 36
## [1] 2 3
## [1] 7 9
The values so far have all been numeric types. There are also character (string) types, factor types, and boolean types (which R calls logical).
Character vectors can be combined with the paste() function.
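As a sketch, paste() joins its arguments with a space by default; the collapse argument joins the elements of a single vector. The vector name words is an illustrative choice.

```r
# Separate strings are joined with sep = " " by default
paste("The", "Quick", "Brown", "Fox")   # "The Quick Brown Fox"

# collapse joins the elements of one character vector
words <- c("The", "Quick", "Brown", "Fox")
paste(words, collapse = " ")            # "The Quick Brown Fox"

# paste0() joins with no separator
paste0("Quick", "Fox")                  # "QuickFox"
```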
## [1] "The Quick Brown Fox"
Factors look like character vectors but can only contain a finite number of predefined values.
The factor below has only one “level”; the levels are the set of values that the factor is allowed to take.
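A minimal sketch, assuming a factor built from the single string "The" (the variable name is illustrative):

```r
one_level <- factor("The")
levels(one_level)    # "The"
nlevels(one_level)   # 1
```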
## [1] "The"
By default, R puts the levels of a factor in alphabetical order (Q comes before T).
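As a sketch of the default ordering, assume a factor built from two strings. The levels are sorted, regardless of the order in which the values appear.

```r
two_levels <- factor(c("The", "Quick"))
levels(two_levels)       # "Quick" "The"

# Internally, each value is stored as the integer position of its level
as.integer(two_levels)   # 2 1
```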
## [1] "Quick" "The"
In building linear models, the order of the factor levels matters. In GLMs, the “reference level” or “base level” should always be the level that has the most observations. This will be covered in the section on linear models.
Booleans are just TRUE and FALSE values. R understands T and TRUE in the same way, but the latter is preferred because T is an ordinary variable that can be overwritten. When doing math, bools are converted to 0/1 values, where 1 is equivalent to TRUE and 0 to FALSE.
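A short sketch of the conversion, using two illustrative variable names:

```r
bool_true <- TRUE
bool_false <- FALSE

as.numeric(bool_true)     # 1
as.numeric(bool_false)    # 0
bool_true * bool_false    # 0, because 1 * 0 = 0
bool_true + 1             # 2, because 1 + 1 = 2
```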
## [1] 0
Booleans are automatically converted into 0/1 values when there is a math operation.
## [1] 2
Vectors work in the same way.
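For instance, summing a boolean vector counts its TRUE values, and taking the mean gives the proportion of TRUE values. The vector below is an illustrative choice.

```r
bool_vec <- c(TRUE, FALSE, TRUE)
sum(bool_vec)    # 2, the number of TRUE values
mean(bool_vec)   # the proportion of TRUE values (two out of three)
```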
## [1] 2
Vectors are indexed using [. If you are only extracting a single element, you should use [[ for clarity.
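A sketch of the common indexing patterns, assuming a three-letter character vector named abc (an illustrative name). Positive positions select elements; negative positions drop them.

```r
abc <- c("a", "b", "c")

abc[[1]]         # "a"      single element
abc[[2]]         # "b"
abc[c(1, 3)]     # "a" "c"  several elements by position
abc[c(1, 2)]     # "a" "b"
abc[-2]          # "a" "c"  everything except the second element
abc[-c(2, 3)]    # "a"      everything except the second and third
```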
## [1] "a"
## [1] "b"
## [1] "a" "c"
## [1] "a" "b"
## [1] "a" "c"
## [1] "a"
6.3 Lists
Lists are vectors that can hold mixed object types.
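As a minimal sketch, list() builds a list from values of different types. The name my_list is an illustrative choice.

```r
my_list <- list(TRUE, "Character", 3.14)
my_list                   # prints each element under [[1]], [[2]], [[3]]

length(my_list)           # 3
sapply(my_list, class)    # "logical" "character" "numeric"
```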
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] "Character"
##
## [[3]]
## [1] 3.14
Lists can be named.
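A sketch of a named list; names are given as name = value pairs inside list(). The element names here mirror the types they hold.

```r
my_list <- list(bool = TRUE, character = "character", numeric = 3.14)
my_list           # prints each element under $bool, $character, $numeric
names(my_list)    # "bool" "character" "numeric"
```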
## $bool
## [1] TRUE
##
## $character
## [1] "character"
##
## $numeric
## [1] 3.14
The $ operator indexes lists.
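For example, assuming a named list like the one below, $ pulls out an element by name, and the result can be used like any other value.

```r
my_list <- list(bool = TRUE, character = "character", numeric = 3.14)

my_list$numeric        # 3.14
my_list$numeric + 5    # 8.14
```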
## [1] 3.14
## [1] 8.14
Lists can also be indexed using [[.
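A sketch of [[ on a list, by position and by name. Note that a single [ on a list returns a smaller list rather than the element itself.

```r
my_list <- list(bool = TRUE, character = "character", numeric = 3.14)

my_list[[1]]            # TRUE
my_list[[2]]            # "character"
my_list[["numeric"]]    # 3.14, indexing by name

class(my_list[1])       # "list": single brackets keep the list wrapper
class(my_list[[1]])     # "logical": double brackets return the element
```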
## [1] TRUE
## [1] "character"
Lists can contain vectors, other lists, and any other object.
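As a sketch, the list below holds a numeric vector, a character vector, and another list. The name everything is an illustrative choice.

```r
everything <- list(
  vector = c(1, 2, 3),
  character = c("a", "b", "c"),
  list = list(bool = TRUE, character = "character", numeric = 3.14)
)
everything

# Nested elements are reached by chaining $
everything$list$numeric    # 3.14
```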
## $vector
## [1] 1 2 3
##
## $character
## [1] "a" "b" "c"
##
## $list
## $list$bool
## [1] TRUE
##
## $list$character
## [1] "character"
##
## $list$numeric
## [1] 3.14
To find out the type of an object, use class, str, or summary.
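A sketch of all three functions, assuming a single number x and a nested list called everything (both names are illustrative):

```r
x <- 1
everything <- list(
  vector = c(1, 2, 3),
  character = c("a", "b", "c"),
  list = list(bool = TRUE, character = "character", numeric = 3.14)
)

class(x)              # "numeric"
class(everything)     # "list"
str(everything)       # compact view of the structure, one line per element
summary(everything)   # length, class, and mode of each element
```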
## [1] "numeric"
## [1] "list"
## List of 3
## $ vector : num [1:3] 1 2 3
## $ character: chr [1:3] "a" "b" "c"
## $ list :List of 3
## ..$ bool : logi TRUE
## ..$ character: chr "character"
## ..$ numeric : num 3.14
## Length Class Mode
## vector 3 -none- numeric
## character 3 -none- character
## list 3 -none- list
6.4 Functions
You only need to understand the very basics of functions. The big picture, though, is that understanding functions helps you to understand everything in R, because R has a strongly functional style: nearly every operation is a function call. This contrasts with Python, VBA, and Java, which are usually written in an object-oriented style, with C, which is procedural, and with SQL, which is a declarative query language built on set operations.
Functions do things. The convention is to name a function as a verb. The function make_rainbows() would create a rainbow. The function summarise_vectors() would summarise vectors. Functions may or may not have an input and output.
If you need to do something in R, there is a high probability that someone has already written a function to do it. That being said, creating simple functions is quite useful.
Here is an example that has a side effect of printing the input:
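A minimal sketch of such a function; the name greet_me and its argument are hypothetical.

```r
# Prints a greeting built from the input
greet_me <- function(my_name) {
  print(paste0("Hello, ", my_name))
}

greet_me("Future Actuary")   # prints [1] "Hello, Future Actuary"
```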
## [1] "Hello, Future Actuary"
A function can also return a value. When returning the last evaluated expression, the return statement is optional. In fact, it is discouraged by convention: the tidyverse style guide reserves return() for early returns.
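As a sketch, the two hypothetical functions below behave identically; the first relies on the last evaluated expression, and the second uses an explicit return().

```r
# Implicit return: the value of the last expression is returned
add_numbers <- function(x, y) {
  x + y
}

# Explicit return: same result, more typing
add_numbers_explicit <- function(x, y) {
  return(x + y)
}

add_numbers(3, 4)            # 7
add_numbers_explicit(3, 4)   # 7
```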
## [1] 7
## [1] 7
Binary operations in R are vectorized. In other words, they are applied element-wise.
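For example, a function written for single numbers works on whole vectors without any changes. The function name add_numbers is hypothetical.

```r
add_numbers <- function(x, y) {
  x + y
}

# 1 + 4, 2 + 5, 3 + 6
add_numbers(c(1, 2, 3), c(4, 5, 6))   # 5 7 9
```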
## [1] 5 7 9
Many functions in R actually return lists! This is why many R objects, such as fitted models, can be indexed with the dollar sign.
## (Intercept) age
## 3165.8850 257.7226
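The coefficients above come from a fitted model. As a sketch of the same idea, the example below fits a linear model to the built-in mtcars data; this data set is an assumption made for illustration and is not the data behind the output above.

```r
# lm() returns an object that is a list under the hood
model <- lm(mpg ~ wt, data = mtcars)

is.list(model)         # TRUE
names(model)[1:3]      # the first few components stored in the list
model$coefficients     # intercept and slope, extracted with $
```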
Here’s a function that returns a list.
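sum_multiply <- function(x, y) {
  # Local names are chosen so that base::sum() is not masked
  total <- x + y
  product <- x * y
  # The last evaluated expression is returned: a named list
  list("Sum" = total, "Product" = product)
}
result <- sum_multiply(2, 3)
result$Sum
result$Product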
## [1] 5
## [1] 6
6.5 Data frames
You can think of a data frame as a table that is implemented as a list of vectors.
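A minimal sketch of building a data frame from two equal-length vectors. The name df is illustrative; the column names match the output shown in this section.

```r
df <- data.frame(age = c(25, 35), has_fsa = c(FALSE, TRUE))
df

# A data frame really is a list whose elements are the columns
is.list(df)    # TRUE
df[["age"]]    # 25 35
```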
## age has_fsa
## 1 25 FALSE
## 2 35 TRUE
You can also work with tibbles, which are data frames but have nicer printing:
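As a sketch, a tibble is created the same way as a data frame. This example assumes the tibble package is installed; loading it on its own is lighter than loading the whole tidyverse.

```r
library(tibble)

tbl <- tibble(age = c(25, 35), has_fsa = c(FALSE, TRUE))
tbl                   # prints the dimensions and the column types

is.data.frame(tbl)    # TRUE: a tibble is still a data frame
```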
## -- Attaching packages ------------------------------------------------------------ tidyverse 1.3.0 --
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## -- Conflicts --------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## # A tibble: 2 x 2
## age has_fsa
## <dbl> <lgl>
## 1 25 FALSE
## 2 35 TRUE
To index columns in a tibble, use the same $ operator that indexes a list.
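A short sketch, shown here with a base data frame so that it runs without extra packages; $ extracts a column from a tibble in the same way.

```r
df <- data.frame(age = c(25, 35), has_fsa = c(FALSE, TRUE))

df$age        # 25 35
df$has_fsa    # FALSE TRUE
```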
## [1] 25 35
To find the number of rows and columns, use dim.
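For example, assuming the same two-row, two-column data frame:

```r
df <- data.frame(age = c(25, 35), has_fsa = c(FALSE, TRUE))

dim(df)     # 2 2  (rows, then columns)
nrow(df)    # 2
ncol(df)    # 2
```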
## [1] 2 2
To find a summary, use summary.
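A sketch on the same small data frame: summary gives quantiles and the mean for numeric columns, and counts of TRUE and FALSE for logical columns.

```r
df <- data.frame(age = c(25, 35), has_fsa = c(FALSE, TRUE))

summary(df)                  # one block per column
age_summary <- summary(df$age)
age_summary[["Mean"]]        # 30
```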
## age has_fsa
## Min. :25.0 Mode :logical
## 1st Qu.:27.5 FALSE:1
## Median :30.0 TRUE :1
## Mean :30.0
## 3rd Qu.:32.5
## Max. :35.0