# 6 R programming

This chapter covers the bare minimum of R programming needed for Exam PA. The book “R for Data Science” provides more detail.

## 6.1 Notebook chunks

On the Exam, you will start with an .Rmd (R Markdown) template, which organizes code into an R Notebook. Within each notebook, code is organized into chunks.

``# This is a chunk``

Your time is valuable. Throughout this book, I will include useful keyboard shortcuts.

Shortcut: To run everything in a chunk quickly, press `CTRL + SHIFT + ENTER`. To create a new chunk, use `CTRL + ALT + I`.

## 6.2 Basic operations

The usual math operations apply.

``````# Addition
1 + 2``````
``## [1] 3``
``````# Subtraction
3 - 2``````
``## [1] 1``
``````# Multiplication
2 * 2``````
``## [1] 4``
``````# Division
4 / 2``````
``## [1] 2``
``````# Exponentiation
2^3``````
``## [1] 8``

There are two assignment operators: `=` and `<-`. The latter is preferred because it is used only for assigning a value to a variable. The `=` operator is also used for specifying arguments in functions (see the functions section).

Shortcut: `ALT + -` creates a `<-`.

``````# Variable assignment
y <- 2

# Equality
4 == 2``````
``## [1] FALSE``
``5 == 5``
``## [1] TRUE``
``3.14 > 3``
``## [1] TRUE``
``3.14 >= 3``
``## [1] TRUE``
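The two roles of `=` and `<-` are easiest to see side by side. The following is a minimal sketch; the variable names `claims`, `avg`, and `rounded` are made up for illustration.

```r
# `<-` assigns a value to a variable
claims <- c(100, 250, 400)

# `=` inside a function call names an argument (here, x and digits)
avg <- mean(x = claims)
avg
## [1] 250

rounded <- round(x = 3.14159, digits = 2)
rounded
## [1] 3.14
```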

Vectors can be added just like numbers. The `c()` function, short for “concatenate”, creates vectors.

``````x <- c(1, 2)
y <- c(3, 4)
x + y``````
``## [1] 4 6``
``x * y``
``## [1] 3 8``
``````z <- x + y
z^2``````
``## [1] 16 36``
``z / 2``
``## [1] 2 3``
``z + 3``
``## [1] 7 9``

The values used so far have been `numeric`. There are also `character` (string) types, `factor` types, and `logical` (boolean) types.

``````character <- "The"
character_vector <- c("The", "Quick")``````

Character vectors can be combined with the `paste()` function.

``````a <- "The"
b <- "Quick"
c <- "Brown"
d <- "Fox"
paste(a, b, c, d)``````
``## [1] "The Quick Brown Fox"``
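By default, `paste()` separates its inputs with a space. As a short sketch of the common variations, the `sep` and `collapse` arguments and the `paste0()` function control how the pieces are joined:

```r
a <- "The"
b <- "Quick"

# Use a custom separator
paste(a, b, sep = "-")
## [1] "The-Quick"

# paste0() uses no separator at all
paste0(a, b)
## [1] "TheQuick"

# collapse joins the elements of a single vector into one string
paste(c(a, b), collapse = " ")
## [1] "The Quick"
```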

Factors look like character vectors but can only contain a finite number of predefined values.

The factor below has only one “level”. The levels are the set of distinct values that the factor can take.

``````factor <- as.factor(character)
levels(factor)``````
``## [1] "The"``

By default, R puts the levels of a factor in alphabetical order (Q comes alphabetically before T).

``````factor_vector <- as.factor(character_vector)
levels(factor_vector)``````
``## [1] "Quick" "The"``

In building linear models, the order of the factor levels matters, because R uses the first level as the “reference level” or “base level”. In GLMs, the reference level should always be the level which has the most observations. This will be covered in the section on linear models.
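As a minimal sketch of how a reference level can be changed, base R's `relevel()` moves a chosen level to the front. The `region` data here is hypothetical and exists only to illustrate the idea.

```r
# Hypothetical data: "South" has the most observations
region <- factor(c("North", "South", "South", "West", "South", "North"))
levels(region)
## [1] "North" "South" "West"

# Find the level with the most observations
most_common <- names(which.max(table(region)))
most_common
## [1] "South"

# Make it the first (reference) level
region <- relevel(region, ref = most_common)
levels(region)
## [1] "South" "North" "West"
```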

Booleans are just `TRUE` and `FALSE` values. R treats `T` and `TRUE` the same way, but the latter is preferred because `T` is an ordinary variable that can be overwritten. When doing math, booleans are converted to 0/1 values, where 1 is equivalent to `TRUE` and 0 to `FALSE`.

``````bool_true <- TRUE
bool_false <- FALSE
bool_true * bool_false``````
``## [1] 0``

The same conversion happens when a boolean is combined with a number.

``bool_true + 1``
``## [1] 2``

Vectors work in the same way.

``````bool_vect <- c(TRUE, TRUE, FALSE)
sum(bool_vect)``````
``## [1] 2``
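Because `TRUE` counts as 1, summing or averaging a comparison is a quick way to count or find a proportion. A small sketch with a made-up `ages` vector:

```r
ages <- c(22, 35, 41, 58)

# A comparison returns a logical vector
over_40 <- ages > 40
over_40
## [1] FALSE FALSE  TRUE  TRUE

# How many are over 40?
sum(over_40)
## [1] 2

# What proportion are over 40?
mean(over_40)
## [1] 0.5
```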

Vectors are indexed using `[`. If you are only extracting a single element, you should use `[[` for clarity.

``````abc <- c("a", "b", "c")
abc[[1]]``````
``## [1] "a"``
``abc[[2]]``
``## [1] "b"``
``abc[c(1, 3)]``
``## [1] "a" "c"``
``abc[c(1, 2)]``
``## [1] "a" "b"``
``abc[-2]``
``## [1] "a" "c"``
``abc[-c(2, 3)]``
``## [1] "a"``
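Besides positions, `[` also accepts logical vectors, and named vectors can be indexed by name. The following sketch shows both; the `named` vector is made up for illustration.

```r
abc <- c("a", "b", "c")

# Keep the elements where the logical vector is TRUE
abc[c(TRUE, FALSE, TRUE)]
## [1] "a" "c"

# A comparison produces the logical vector for you
abc[abc != "b"]
## [1] "a" "c"

# Elements of a named vector can be extracted by name
named <- c(first = 10, second = 20)
named[["second"]]
## [1] 20
```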

## 6.3 Lists

Lists are vectors that can hold mixed object types.

``````my_list <- list(TRUE, "Character", 3.14)
my_list``````
``````## [[1]]
## [1] TRUE
##
## [[2]]
## [1] "Character"
##
## [[3]]
## [1] 3.14``````

Lists can be named.

``````my_list <- list(bool = TRUE, character = "character", numeric = 3.14)
my_list``````
``````## $bool
## [1] TRUE
##
## $character
## [1] "character"
##
## $numeric
## [1] 3.14``````

The `$` operator indexes lists.

``my_list$numeric``
``## [1] 3.14``
``my_list$numeric + 5``
``## [1] 8.14``

Lists can also be indexed using `[[`.

``my_list[[1]]``
``## [1] TRUE``
``my_list[[2]]``
``## [1] "character"``
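The difference between `[` and `[[` matters for lists: `[` returns a smaller list, while `[[` returns the element itself. A short sketch using the same named list:

```r
my_list <- list(bool = TRUE, character = "character", numeric = 3.14)

# Single brackets keep the list wrapper
class(my_list["numeric"])
## [1] "list"

# Double brackets extract the element itself
class(my_list[["numeric"]])
## [1] "numeric"

# Only the extracted element can be used in arithmetic
my_list[["numeric"]] + 5
## [1] 8.14
```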

Lists can contain vectors, other lists, and any other object.

``````everything <- list(vector = c(1, 2, 3),
character = c("a", "b", "c"),
list = my_list)
everything``````
``````## $vector
## [1] 1 2 3
##
## $character
## [1] "a" "b" "c"
##
## $list
## $list$bool
## [1] TRUE
##
## $list$character
## [1] "character"
##
## $list$numeric
## [1] 3.14``````

To find out the type or structure of an object, use `class()`, `str()`, or `summary()`.

``class(x)``
``## [1] "numeric"``
``class(everything)``
``## [1] "list"``
``str(everything)``
``````## List of 3
##  $ vector   : num [1:3] 1 2 3
##  $ character: chr [1:3] "a" "b" "c"
##  $ list     :List of 3
##   ..$ bool     : logi TRUE
##   ..$ character: chr "character"
##   ..$ numeric  : num 3.14``````
``summary(everything)``
``````##           Length Class  Mode
## vector    3      -none- numeric
## character 3      -none- character
## list      3      -none- list``````

## 6.4 Functions

You only need to understand the very basics of functions. The big picture, though, is that understanding functions helps you to understand everything in R, because R is at its core a functional programming language: almost everything that happens in R is a function call. This contrasts with languages such as Python, Java, or VBA, which are usually written in an object-oriented or procedural style, and with SQL, which is a declarative query language built around set operations.

Functions do things. The convention is to name a function as a verb. The function `make_rainbows()` would create a rainbow. The function `summarise_vectors()` would summarise vectors. Functions may or may not have an input and output.

If you need to do something in R, there is a high probability that someone has already written a function to do it. That being said, creating simple functions is quite useful.

Here is an example that has a side effect of printing the input:

``````greet_me <- function(my_name) {
  print(paste0("Hello, ", my_name))
}

greet_me("Future Actuary")``````
``## [1] "Hello, Future Actuary"``

Here is a function that returns a value.

When returning the last evaluated expression, the `return` statement is optional. In fact, it is discouraged by convention.

``````add_together <- function(x, y) {
  x + y
}

add_together(2, 5)``````
``## [1] 7``
``````add_together <- function(x, y) {
  return(x + y)
}

add_together(2, 5)``````
``## [1] 7``

Binary operations in R are vectorized. In other words, they are applied element-wise.

``````x_vector <- c(1, 2, 3)
y_vector <- c(4, 5, 6)
add_together(x_vector, y_vector)``````
``## [1] 5 7 9``
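Vectorization also applies when one input is a single number: R recycles the single value across every element of the longer vector. A minimal sketch, redefining `add_together()` so that the block runs on its own:

```r
add_together <- function(x, y) {
  x + y
}

x_vector <- c(1, 2, 3)

# The single value 10 is added to each element of x_vector
add_together(x_vector, 10)
## [1] 11 12 13
```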

Many functions in R actually return lists! This is why many R objects can be indexed with the dollar sign.

``````library(ExamPAData)
model <- lm(charges ~ age, data = health_insurance)
model$coefficients``````
``````## (Intercept)         age
##   3165.8850    257.7226``````
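To check that a fitted model really is a list, you can inspect it with `is.list()` and `names()`. This sketch assumes R's built-in `mtcars` data as a stand-in for the `health_insurance` data, so that it runs without any extra packages.

```r
# mtcars ships with R; it is used here only as a substitute dataset
model <- lm(mpg ~ wt, data = mtcars)

# A fitted lm object is a list underneath
is.list(model)
## [1] TRUE

# "coefficients" is one of the named elements of that list
"coefficients" %in% names(model)
## [1] TRUE

# So the coefficients can be extracted with $ and inspected further
names(model$coefficients)
## [1] "(Intercept)" "wt"
```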

Here’s a function that returns a list.

``````sum_multiply <- function(x, y) {
  sum <- x + y
  product <- x * y
  list("Sum" = sum, "Product" = product)
}

result <- sum_multiply(2, 3)
result$Sum``````
``## [1] 5``
``result$Product``
``## [1] 6``

## 6.5 Data frames

You can think of a data frame as a table that is implemented as a list of vectors.

``````df <- data.frame(
age = c(25, 35),
has_fsa = c(FALSE, TRUE)
)
df``````
``````##   age has_fsa
## 1  25   FALSE
## 2  35    TRUE``````
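Because a data frame is a list of column vectors, the list tools from earlier in the chapter work on it. A short sketch using the same two-row data frame:

```r
df <- data.frame(
  age = c(25, 35),
  has_fsa = c(FALSE, TRUE)
)

# A data frame is a list underneath
is.list(df)
## [1] TRUE

# Its length is the number of columns
length(df)
## [1] 2

# Each column can be extracted with [[ like any list element
df[["age"]]
## [1] 25 35
```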

You can also work with tibbles, which are data frames but have nicer printing:

``````# The tidyverse library has functions for making tibbles
library(tidyverse) ``````
``## -- Attaching packages ------------------------------------------------------------ tidyverse 1.3.0 --``
``````## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0``````
``## -- Conflicts --------------------------------------------------------------- tidyverse_conflicts() --``
``````df <- tibble(
age = c(25, 35), has_fsa = c(FALSE, TRUE)
)
df``````
``````## # A tibble: 2 x 2
##     age has_fsa
##   <dbl> <lgl>
## 1    25 FALSE
## 2    35 TRUE``````

To index a column in a tibble, use the same `$` operator that indexes a list.

``df$age``
``## [1] 25 35``

To find the number of rows and columns, use `dim`.

``dim(df)``
``## [1] 2 2``
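The first number returned by `dim()` is the row count and the second is the column count. When only one of them is needed, `nrow()` and `ncol()` return it directly. This sketch uses a hypothetical three-row data frame named `members` so that the two numbers differ.

```r
# Hypothetical data with 3 rows and 2 columns
members <- data.frame(
  age = c(25, 35, 45),
  has_fsa = c(FALSE, TRUE, TRUE)
)

dim(members)
## [1] 3 2

nrow(members)
## [1] 3

ncol(members)
## [1] 2
```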

To see summary statistics for each column, use `summary`.

``summary(df)``
``````##       age        has_fsa
##  Min.   :25.0   Mode :logical
##  1st Qu.:27.5   FALSE:1
##  Median :30.0   TRUE :1
##  Mean   :30.0
##  3rd Qu.:32.5
##  Max.   :35.0``````