# 6 R programming

This chapter covers the bare minimum of R programming needed for Exam PA. The book “R for Data Science” provides more detail.

## 6.1 Notebook chunks

On the Exam, you will start with an .Rmd (R Markdown) template, which organizes code into R Notebooks. Within each notebook, code is organized into chunks.

Your time is valuable. Throughout this book, I will include useful keyboard shortcuts.

Shortcut: To run everything in a chunk quickly, press `CTRL + SHIFT + ENTER`. To create a new chunk, use `CTRL + ALT + I`.

## 6.2 Basic operations

The usual math operations apply.

`## [1] 3`

`## [1] 1`

`## [1] 4`

`## [1] 2`

`## [1] 8`
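As a minimal sketch of these operators (the operands and variable names are illustrative assumptions, chosen only for this example):

```r
# Basic arithmetic operators; operands are illustrative.
total      <- 1 + 2    # addition
difference <- 3 - 2    # subtraction
product    <- 2 * 2    # multiplication
quotient   <- 4 / 2    # division
power      <- 2^3      # exponentiation
remainder  <- 7 %% 2   # modulo (remainder after division)

print(c(total, difference, product, quotient, power))  # [1] 3 1 4 2 8
print(remainder)                                       # [1] 1
```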

There are two assignment operators: `=` and `<-`. The latter is preferred because it is specific to assigning a value to a variable. The `=` operator is also used for specifying arguments in functions (see the functions section).
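The difference between the two roles can be sketched as follows; the variable names are hypothetical, and `seq()` is used only as a convenient base R function with named arguments:

```r
# `<-` assigns a value to a variable.
x <- 5

# `=` also assigns at the top level, but `<-` is the preferred style.
y = 10

# Inside a function call, `=` matches a value to a named argument.
values <- seq(from = 1, to = x, by = 2)
print(values)  # [1] 1 3 5
```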

Shortcut: `ALT + -` creates a `<-`.

`## [1] FALSE`

`## [1] TRUE`

`## [1] TRUE`

`## [1] TRUE`

Vectors can be added just like numbers. The `c()` function, short for “concatenate”, creates vectors.

`## [1] 4 6`

`## [1] 3 8`

`## [1] 16 36`

`## [1] 2 3`

`## [1] 7 9`
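A short sketch of vector arithmetic, assuming two illustrative vectors `a` and `b` (not necessarily the ones behind the output above):

```r
# Create two numeric vectors with c().
a <- c(1, 2)
b <- c(3, 4)

vec_sum     <- a + b        # element-wise addition
vec_product <- a * b        # element-wise multiplication
vec_squared <- (a + b)^2    # element-wise power
vec_shifted <- a + 1        # the scalar 1 is recycled across the vector

print(vec_sum)      # [1] 4 6
print(vec_product)  # [1] 3 8
print(vec_squared)  # [1] 16 36
print(vec_shifted)  # [1] 2 3
```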

The numbers above are `numeric` types. There are also `character` (string) types, `factor` types, and `logical` (boolean) types.

Character vectors can be combined with the `paste()` function.

`## [1] "The Quick Brown Fox"`
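For illustration, here is one way `paste()` and its sibling `paste0()` might be used; the variable names are assumptions made for this sketch:

```r
# Collapse a character vector into a single string.
words <- c("The", "Quick", "Brown", "Fox")
sentence <- paste(words, collapse = " ")
print(sentence)  # [1] "The Quick Brown Fox"

# Join separate strings; the default separator is a single space.
pair <- paste("Brown", "Fox")
print(pair)      # [1] "Brown Fox"

# paste0() joins with no separator.
label <- paste0("Exam", "PA")
print(label)     # [1] "ExamPA"
```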

Factors look like character vectors but can only contain a finite number of predefined values.

The below factor has only one “level”, which is the list of assigned values.

`## [1] "The"`

By default, R orders the levels of a factor alphabetically (Q comes before T).

`## [1] "Quick" "The"`
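A small sketch of how factor levels behave, using made-up values; the `levels` argument of `factor()` overrides the alphabetical default:

```r
# Levels are the distinct values, sorted alphabetically by default.
words <- factor(c("The", "Quick", "The"))
print(levels(words))      # [1] "Quick" "The"

# Internally, each value is stored as an integer code into the levels.
print(as.integer(words))  # [1] 2 1 2

# The level order can be set explicitly.
custom <- factor(c("The", "Quick", "The"), levels = c("The", "Quick"))
print(levels(custom))     # [1] "The"   "Quick"
```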

**In building linear models, the order of the factor levels matters.** In GLMs, the
“reference level” or “base level” should always be the level that has the most
observations. This will be covered in the section on linear models.
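As a rough sketch of that idea, base R's `relevel()` moves a chosen level to the first position, which is the one R treats as the reference. The `region` data here is hypothetical:

```r
# Hypothetical factor: "west" appears most often.
region <- factor(c("west", "east", "west", "north", "west", "east"))
print(levels(region))  # [1] "east"  "north" "west"

# Find the level with the most observations.
counts <- table(region)
most_common <- names(which.max(counts))
print(most_common)     # [1] "west"

# Make it the reference (first) level.
region <- relevel(region, ref = most_common)
print(levels(region))  # [1] "west"  "east"  "north"
```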

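Booleans are just `TRUE` and `FALSE` values. R understands `T` and `TRUE` in the same way, but the latter is preferred because `T` is an ordinary variable that can be reassigned, whereas `TRUE` is a reserved word. When doing math, booleans are converted to 0/1 values, where 1 is equivalent to `TRUE` and 0 to `FALSE`.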

`## [1] 0`

Booleans are automatically converted into 0/1 values when there is a math operation.

`## [1] 2`

Vectors work in the same way.

`## [1] 2`
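This conversion is handy for counting; a minimal sketch with an illustrative vector `flags`:

```r
# TRUE counts as 1 and FALSE as 0 in arithmetic.
two <- TRUE + TRUE
print(two)      # [1] 2

# Summing a logical vector counts the TRUE values.
flags <- c(TRUE, FALSE, TRUE)
n_true <- sum(flags)
print(n_true)   # [1] 2

# The mean gives the proportion of TRUE values.
share_true <- mean(flags)
print(share_true)
```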

Vectors are indexed using `[`. If you are only extracting a single element, you should use `[[` for clarity.

`## [1] "a"`

`## [1] "b"`

`## [1] "a" "c"`

`## [1] "a" "b"`

`## [1] "a" "c"`

`## [1] "a"`
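The common indexing patterns can be sketched like this, assuming an illustrative vector `abc`. Note that R indexes from 1, not 0:

```r
abc <- c("a", "b", "c")

first      <- abc[[1]]                    # single element
by_pos     <- abc[c(1, 3)]                # several positions
drop_last  <- abc[-3]                     # negative index drops an element
by_logical <- abc[c(TRUE, FALSE, TRUE)]   # keep where TRUE
by_test    <- abc[abc == "a"]             # keep elements passing a condition

print(first)       # [1] "a"
print(by_pos)      # [1] "a" "c"
print(drop_last)   # [1] "a" "b"
print(by_logical)  # [1] "a" "c"
print(by_test)     # [1] "a"
```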

## 6.3 Lists

Lists are vectors that can hold mixed object types.

```
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] "Character"
##
## [[3]]
## [1] 3.14
```

Lists can be named.

```
## $bool
## [1] TRUE
##
## $character
## [1] "character"
##
## $numeric
## [1] 3.14
```

The `$` operator indexes lists.

`## [1] 3.14`

`## [1] 8.14`

Lists can also be indexed using `[[`.

`## [1] TRUE`

`## [1] "character"`
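Putting the pieces together, a named list and its indexing might look like the following sketch (the name `my_list` is an assumption made here):

```r
# A named list holding three different types.
my_list <- list(bool = TRUE, character = "character", numeric = 3.14)

print(my_list$numeric)          # [1] 3.14
print(my_list$numeric + 5)      # [1] 8.14

# `[[` extracts one element, by position or by name.
print(my_list[[1]])             # [1] TRUE
print(my_list[["character"]])   # [1] "character"

# Single `[` returns a shorter list rather than the element itself.
print(class(my_list["bool"]))   # [1] "list"
```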

Lists can contain vectors, other lists, and any other object.

```
## $vector
## [1] 1 2 3
##
## $character
## [1] "a" "b" "c"
##
## $list
## $list$bool
## [1] TRUE
##
## $list$character
## [1] "character"
##
## $list$numeric
## [1] 3.14
```

To find out the type of an object, use `class`, `str`, or `summary`.

`## [1] "numeric"`

`## [1] "list"`

```
## List of 3
##  $ vector   : num [1:3] 1 2 3
##  $ character: chr [1:3] "a" "b" "c"
##  $ list     :List of 3
##   ..$ bool     : logi TRUE
##   ..$ character: chr "character"
##   ..$ numeric  : num 3.14
```

```
##           Length Class  Mode
## vector    3      -none- numeric
## character 3      -none- character
## list      3      -none- list
```
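A sketch of building and inspecting a nested list of this kind; the object name `nested` is hypothetical:

```r
# A list containing two vectors and another list.
nested <- list(
  vector    = c(1, 2, 3),
  character = c("a", "b", "c"),
  list      = list(bool = TRUE, character = "character", numeric = 3.14)
)

print(class(nested$vector))   # [1] "numeric"
print(class(nested))          # [1] "list"

# Chain `$` to reach into the inner list.
print(nested$list$numeric)    # [1] 3.14

# str() prints a compact view of the whole structure.
str(nested)
```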

## 6.4 Functions

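You only need to understand the very basics of functions. The big picture, though, is that
understanding functions helps you to understand *everything* in R, since R has strong
functional-programming roots: functions are ordinary objects, and nearly every operation,
including `+` and `[`, is a function call. This differs from languages such as Python, Java, or VBA,
where code is more often organized around objects and their methods, and from SQL, which is a
declarative query language built on set operations.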

Functions do things. The convention is to name a function as a verb. The function `make_rainbows()` would create a rainbow. The function `summarise_vectors()` would summarise vectors. Functions may or may not have an input and output.

If you need to do something in R, there is a high probability that someone has already written a function to do it. That being said, creating simple functions is quite useful.

Here is an example that has a side effect of printing the input:

`## [1] "Hello, Future Actuary"`
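One way such a function could be written is sketched below; the function name `greet` and its argument are assumptions made for illustration:

```r
# Prints a greeting; the printing is the side effect.
greet <- function(name) {
  print(paste("Hello,", name))
}

greet("Future Actuary")  # [1] "Hello, Future Actuary"
```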

**A function that returns something**

When a function returns its last evaluated expression, an explicit `return()` call is optional.
Some conventions, such as the tidyverse style guide, reserve `return()` for early returns.

`## [1] 7`

`## [1] 7`
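The two styles can be compared side by side; the function names below are hypothetical:

```r
# Explicit return().
add_explicit <- function(x, y) {
  return(x + y)
}

# Implicit return: the last evaluated expression is the result.
add_implicit <- function(x, y) {
  x + y
}

print(add_explicit(3, 4))  # [1] 7
print(add_implicit(3, 4))  # [1] 7
```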

Binary operations in R are vectorized. In other words, they are applied element-wise.

`## [1] 5 7 9`
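Because of this, a function written for single numbers usually works on whole vectors without changes. A minimal sketch, with the helper `add` assumed for illustration:

```r
# Written with scalars in mind, but `+` is element-wise.
add <- function(x, y) {
  x + y
}

vector_total <- add(c(1, 2, 3), c(4, 5, 6))
print(vector_total)  # [1] 5 7 9

# Many built-in functions are vectorized as well.
roots <- sqrt(c(1, 4, 9))
print(roots)         # [1] 1 2 3
```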

Many functions in R actually return lists! This is why many R objects, such as fitted models, can be indexed with the dollar sign.

```
## (Intercept) age
## 3165.8850 257.7226
```
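To illustrate, a fitted linear model is a list under the hood. The sketch below uses R's built-in `cars` dataset as a stand-in, since the data behind the output above is not shown here:

```r
# Fit a simple linear model on the built-in `cars` data (an assumption
# for this sketch; any data frame with two numeric columns would do).
model <- lm(dist ~ speed, data = cars)

print(is.list(model))   # [1] TRUE
print(names(model))     # component names, including "coefficients"

# Extract one component with `$`.
print(model$coefficients)
```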

Here’s a function that returns a list.

```
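# Return both the sum and the product of x and y as a named list.
sum_multiply <- function(x, y) {
  total <- x + y      # named `total` to avoid shadowing base::sum
  product <- x * y
  list("Sum" = total, "Product" = product)
}

result <- sum_multiply(2, 3)
result$Sum
result$Product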

`## [1] 5`

`## [1] 6`

## 6.5 Data frames

You can think of a data frame as a table that is implemented as a list of vectors.

```
##   age has_fsa
## 1  25   FALSE
## 2  35    TRUE
```
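A data frame like the one above could be created as in this sketch; the object name `df` is an assumption. The checks show that it really is a list of equal-length column vectors:

```r
# Each argument becomes a column; all columns must have the same length.
df <- data.frame(
  age     = c(25, 35),
  has_fsa = c(FALSE, TRUE)
)
print(df)

print(is.list(df))   # [1] TRUE
print(length(df))    # [1] 2  (number of columns)
print(names(df))     # [1] "age"     "has_fsa"
```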

You can also work with tibbles, which are data frames but have nicer printing:

`## -- Attaching packages ------------------------------------------------------------ tidyverse 1.3.0 --`

```
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
```

```
## -- Conflicts --------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
```

```
## # A tibble: 2 x 2
## age has_fsa
## <dbl> <lgl>
## 1 25 FALSE
## 2 35 TRUE
```

To index a column of a tibble, use `$`, just as when indexing a list.

`## [1] 25 35`

To find the number of rows and columns, use `dim`.

`## [1] 2 2`

To summarize each column, use `summary`.

```
##       age        has_fsa
##  Min.   :25.0   Mode :logical
##  1st Qu.:27.5   FALSE:1
##  Median :30.0   TRUE :1
##  Mean   :30.0
##  3rd Qu.:32.5
##  Max.   :35.0
```
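The inspection steps above can be sketched with a base R data frame, which behaves the same way as a tibble for these operations; `df` and `older` are hypothetical names:

```r
df <- data.frame(age = c(25, 35), has_fsa = c(FALSE, TRUE))

# Column access with `$`, as with any list.
print(df$age)      # [1] 25 35

# Rows and columns.
print(dim(df))     # [1] 2 2
print(nrow(df))    # [1] 2
print(ncol(df))    # [1] 2

# Logical indexing selects rows; the empty slot after the comma keeps all columns.
older <- df[df$age > 30, ]
print(older$age)   # [1] 35

# Column-by-column summary.
summary(df)
```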