# Chapter 3 Intro to R

I can recall vividly how I started learning R as an undergrad and I told a friend of mine – a then grad student in education science and SPSS user – about it. He replied: “R? Isn’t that this incredibly fancy scientific calculator?” Well, he was not exactly right – but not really wrong either.

Today, you are going to take your first steps with R. In the following, you will learn how to use R as a fancy calculator: how to extend its functionality by installing packages, how to do all kinds of calculations, and how to store data in objects of several kinds and access them.

## 3.1 Installing packages

Being a fancy calculator implies that you can extend it as you want. One of the big upsides of using R is that due to its active community, whose members are permanently striving to make it a bit better, we useRs are basically standing on the shoulders of giants. You can install packages from CRAN by using the `install.packages()` command.

```
#install.packages("tidyverse") # installs the tidyverse package
# insert '#' if you want R not to execute the things that stand to its right; pretty useful for annotating code
```

CRAN packages have to fulfill certain requirements and packages are updated at a certain pace. If you want to use other packages or get development versions, you can also install packages from GitHub using the `devtools` package.

Before you can use a certain package in a session, you have to load it using the `library()` command.

`library(tidyverse)`

Now you are good to go!
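If you are not sure whether a package is already installed, you can check before installing. The following is only a minimal sketch: `is_installed()` is a hypothetical helper I made up for illustration, and the GitHub repository in the last line is just an example. The installation lines are commented out so that nothing gets downloaded by accident.

```r
# Hypothetical helper (not a base R function): check whether a package
# is installed without attaching it to the session.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats") # "stats" ships with every R installation

# Install from CRAN only if the package is missing
# if (!is_installed("tidyverse")) install.packages("tidyverse")

# Development version from GitHub (requires devtools); example repository
# devtools::install_github("tidyverse/dplyr")
```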

## 3.2 Basic arithmetic operations

Using R as a calculator looks like this:

`5 + 5`

`## [1] 10`

`5 + 5 * 3`

`## [1] 20`

`5 + 5^2`

`## [1] 30`

`sqrt(9)`

`## [1] 3`

The latter, `sqrt()`, is no classic arithmetic operation but a function. It takes a non-negative number as input and returns its square root.
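As the results above suggest, R follows the usual order of operations. A few more lines, meant as a small illustration rather than a complete overview of operators:

```r
(5 + 5) * 3 # parentheses are evaluated first: 30
5 + 5 * 3   # multiplication binds tighter than addition: 20
-2^2        # exponentiation binds tighter than the unary minus: -4
7 %/% 2     # integer division: 3
7 %% 2      # remainder (modulo): 1
```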

## 3.3 Vectors

R is vector-based. That implies that we can store multiple values in vectors and perform operations on them *by element*. This is pretty handy and distinguishes it from other languages like, for instance, C or Python (without NumPy).
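A short sketch of what "by element" means in practice. The vector `prices` is made-up example data, and `c()`, which builds the vector, is introduced further below:

```r
prices <- c(10, 20, 30) # made-up example data

prices * 2              # every element is doubled: 20 40 60
prices + c(1, 2, 3)     # element-wise addition: 11 22 33
sqrt(c(4, 9, 16))       # functions work element-wise, too: 2 3 4
```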

In R, there are two kinds of vectors: atomic vectors and lists. Atomic vectors can only contain values of one type, whereas lists can contain atomic vectors of different types, and other lists as well. It might be hard for you at first to wrap your head around this. However, it will become clear as soon as we fill it with some examples. Vectors can be characterized by two key properties: their type, which can be determined with `typeof()`, and their length, which can be assessed using `length()`. `NULL` is the absence of a vector. `NA`, a missing value, is the absence of a value in a vector.
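To make the difference between `NULL` and `NA` a bit more tangible, here is a small sketch using only `typeof()`, `length()`, and `is.na()`:

```r
typeof(c(1.5, 2))   # "double"
length(c(1.5, 2))   # 2

typeof(NULL)        # "NULL"
length(NULL)        # 0: there is no vector at all

length(NA)          # 1: a vector with one (missing) element
length(c(1, NA, 3)) # 3: the missing value still occupies a position
is.na(c(1, NA, 3))  # FALSE TRUE FALSE
```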

In the following, I first introduce atomic vectors. Afterwards, I describe lists. Finally, I introduce augmented vectors: factors, dates/date-times, and data frames/tibbles. I will refer to atomic vectors as vectors, and to lists as lists. I will leave out matrices and arrays. We will not work with them in the course, and, honestly, I rarely use them myself.

This tutorial borrows heavily from Hadley Wickham’s “R for Data Science” (Wickham and Grolemund 2016), and Richard Cotton’s “Learning R” (Cotton 2013).

### 3.3.1 Atomic vectors

There exist six different types of atomic vectors: **logical**, **integer**, **double**, **character**, **complex**, and **raw**. The latter two are hardly used, hence I will not include them here. Integer and double are usually summarized under the umbrella term **numeric vectors**.

We can create a vector using the `c()` function. “c” stands for “concatenate.”

#### 3.3.1.1 Logical vectors

Logical vectors can take three values: `TRUE`, `FALSE`, and `NA`. While you can create them by hand (`logical_vec <- c(TRUE, FALSE, NA)`), they are usually the result of comparisons. In R, you have six comparison operators:

- `<`
- `>`
- `<=`
- `>=`
- `==` (always use two equal signs)
- `!=` (not equal)

`5 > 6`

`## [1] FALSE`
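Since R is vector-based, comparisons also work element by element. A minimal sketch with made-up data (`ages` is just an example name):

```r
ages <- c(17, 18, 35) # made-up example data

ages >= 18 # FALSE TRUE TRUE
ages == 18 # FALSE TRUE FALSE
ages != 18 # TRUE FALSE TRUE
NA > 5     # NA: comparing with a missing value is itself missing
```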

Sometimes, we want to store the results of what we are doing. Then, we assign our operation’s result to a meaningful name:

`example_logical_vec <- 5 > 6`

You may wonder how you should name your objects. In this case, just consult the tidyverse style guide. Here, it says that you should use lowercase letters, numbers, and underscores (called “snake case”). In general, you should stick to the tidyverse style guide. The conventions you can find in there will make your life and the lives of the people who have the honor to read your code a lot easier. And if you find examples in this tutorial where I violate any of the conventions stated there and point it out, I owe you a hot beverage.

Logical vectors can also be used in a numerical context. If so, `TRUE` becomes `1` and `FALSE` becomes `0`. You will see an example when we deal with the conversion of vectors to different types.

You can look at vectors by either typing in the name and then executing it, or by calling `head()`. The latter is especially useful if the vectors are very long, since it only gives back the first 6 values by default. However, you can specify the length of the output by providing a different `n` argument.

```
example <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
example # too long
```

```
## [1] TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE
## [13] FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE TRUE FALSE
```

`head(example, n = 5)`

`## [1] TRUE FALSE FALSE FALSE TRUE`

#### 3.3.1.2 Numeric vectors

Numbers in R are `double` by default. To make a numeric vector an integer, add `L` to a number, or use `as.integer()`.

```
double_vec <- c(1, 2, 3, 4)
typeof(double_vec)
```

`## [1] "double"`

```
integer_vec <- c(1L, 2L, 3L)
typeof(integer_vec)
```

`## [1] "integer"`

`typeof(as.integer(double_vec))`

`## [1] "integer"`

Furthermore, you can create sequences of numbers by using the `:` operator. This will also give you an integer.

```
new_sequence <- 1:9
new_sequence
```

`## [1] 1 2 3 4 5 6 7 8 9`

`typeof(new_sequence)`

`## [1] "integer"`

Note that doubles are only approximate, since they represent floating point numbers. In your every-day coding, you should not worry too much about it. However, keep it in mind later on. You can read more about it here (page 9).
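To see what "approximate" means, consider the following sketch. It uses `all.equal()` from base R, which compares numbers up to a small tolerance; this is one common way of dealing with the issue, not the only one.

```r
0.1 + 0.2 == 0.3                  # FALSE: the two sides differ by a tiny amount
sqrt(2)^2 == 2                    # FALSE, for the same reason

isTRUE(all.equal(0.1 + 0.2, 0.3)) # TRUE: equal up to a small tolerance
isTRUE(all.equal(sqrt(2)^2, 2))   # TRUE
```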

Beyond that, integers only have one special value: `NA`, implying a missing value. Doubles have four: `NA` (missing value), `NaN` (not a number), and `Inf` and `-Inf` (infinite values). The latter three can be illustrated with the following example:

`c(-1, 0, 1) / 0`

`## [1] -Inf NaN Inf`
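Because these special values do not behave like ordinary numbers, it is safer to detect them with the dedicated helper functions than with `==`. A small sketch using base R only:

```r
x <- c(-1, 0, 1) / 0 # -Inf NaN Inf

is.finite(x)   # FALSE FALSE FALSE
is.infinite(x) # TRUE FALSE TRUE
is.nan(x)      # FALSE TRUE FALSE
is.na(x)       # FALSE TRUE FALSE: NaN also counts as missing
is.nan(NA)     # FALSE: NA, however, is not NaN
```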

And, very important: use decimal points instead of decimal commas (especially applicable to Germans).

#### 3.3.1.3 Character vectors

Vectors of type `character` can consist of more or less anything. The only thing that matters is that their inputs are wrapped in either double quotes (`" "`) or single quotes (`' '`). Having both comes in handy if you want to store text that itself contains quotation marks:

```
another_character <- c("hi", "1234", "!!1!", "#+*23$%&/(")
typeof(another_character)
```

`## [1] "character"`

```
text_character <- "I am my mother's child."
direct_speech <- '"It has never been easy to learn how to code," said my professor'
```

You cannot really “do” anything with character vectors, except for comparison.

```
#text_character + direct_speech # remove '#' if you want to try
text_character == text_character
```

`## [1] TRUE`

`"b" > "a"`

`## [1] TRUE`

### 3.3.2 Working with atomic vectors

#### 3.3.2.1 Convert between types

You can either explicitly or implicitly convert a vector to a certain type.

For explicit conversion, or coercion, you can just call the respective `as.xxx()` function: `as.logical()`, `as.integer()`, `as.double()`, or `as.character()`. However, having to call these functions often implies that your vector had the wrong type in the first place. Hence, try to avoid it if possible; in practice, explicit conversion is needed relatively rarely.
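For the cases where you do need explicit conversion, here is a brief sketch of how the `as.xxx()` functions behave. The last line wraps the call in `suppressWarnings()` only to keep the output clean; without it, R additionally warns that an `NA` was introduced.

```r
as.integer(c("1", "2", "3"))       # 1 2 3
as.character(1:3)                  # "1" "2" "3"
as.logical(c(0, 1, 2))             # FALSE TRUE TRUE: every non-zero number is TRUE
as.integer(3.9)                    # 3: the decimal part is cut off, not rounded
suppressWarnings(as.double("abc")) # NA: text that is not a number cannot be converted
```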

Implicit conversion happens by using a vector in a context in which a vector of a different type is expected. One example is dealing with logical vectors. As mentioned earlier, `TRUE` is translated to `1`, while `FALSE` becomes `0`. This can come in pretty handy:

```
x <- sample(1000, 100, replace = TRUE) # draw 100 numbers between 1 and 1000
y <- x > 500 # whether numbers are greater than 500
typeof(y)
```

`## [1] "logical"`

`sum(y) # how many are greater than 500`

`## [1] 47`

`mean(y) # proportion of numbers which are greater than 500`

`## [1] 0.47`

Also, if you build a vector out of multiple types – the most complex type always wins. Here, complex means that a vector can take many different values. Character vectors, for instance, can take basically every value:

`typeof(c(TRUE, 1L))`

`## [1] "integer"`

`typeof(c(1L, 1.5))`

`## [1] "double"`

`typeof(c(1.5, "abc"))`

`## [1] "character"`

#### 3.3.2.2 Naming elements

Elements of vectors can be named. This can either happen during creation:

`named_vector <- c(one = 1, two = 2, three = 3, four = 4, five = 5)`

Or afterwards using `set_names()` from the `purrr` package (which is part of the core tidyverse and is therefore attached by `library(tidyverse)`, so it does not need to be loaded separately):

`named_vector <- set_names(1:5, c("one", "two", "three", "four", "five"))`
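If you prefer not to rely on a package here, base R offers `names()` for the same purpose. A minimal sketch of that alternative, reusing the object name from above:

```r
named_vector <- 1:5
names(named_vector) <- c("one", "two", "three", "four", "five") # assign names

names(named_vector)  # "one" "two" "three" "four" "five"
unname(named_vector) # 1 2 3 4 5: the same values without names
```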

#### 3.3.2.3 Accessing elements

If we want to access a certain element of the vector, we can tell R to do so by using square brackets `[ ]`. This can also be used for some filtering:

`named_vector[1] # first element`

```
## one
## 1
```

`named_vector[length(named_vector)] # last element, using a function, again`

```
## five
## 5
```

`named_vector[-3] # all elements but the third`

```
## one two four five
## 1 2 4 5
```

`named_vector[c(1, 3)] # first and third`

```
## one three
## 1 3
```

`named_vector[1:3] # first to third`

```
## one two three
## 1 2 3
```

`named_vector[named_vector == 3] # elements that equal three`

```
## three
## 3
```

`named_vector[named_vector %in% c(1, 2, 3)] # elements that are also in another vector`

```
## one two three
## 1 2 3
```

`named_vector[named_vector > 2] # values that are bigger than 2`

```
## three four five
## 3 4 5
```

`rev(named_vector) # reverse vector -- using a function`

```
## five four three two one
## 5 4 3 2 1
```

`named_vector[c(1, 1, 1, 2, 3, 3, 3)] # first first first second third third third element`

```
## one one one two three three three
## 1 1 1 2 3 3 3
```

`named_vector[c(TRUE, TRUE, TRUE, FALSE, TRUE)] # subsetting with a logical vector -- TRUE = value at the corresponding position is retained, FALSE = value at the corresponding position is dropped`

```
## one two three five
## 1 2 3 5
```

`named_vector[c("one", "three")] # if the vector is named, you can also select the correspondingly named elements with a character vector`

```
## one three
## 1 3
```
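Two things that follow from the examples above and that I find worth sketching: conditions can be combined with `&` (and) and `|` (or), and square brackets can also be used on the left-hand side of an assignment to replace values. The object `updated` is just an illustrative name.

```r
named_vector <- c(one = 1, two = 2, three = 3, four = 4, five = 5)

named_vector[named_vector > 1 & named_vector < 5] # two, three, four
named_vector[named_vector < 2 | named_vector > 4] # one, five
named_vector[10]                                  # NA: this position does not exist

updated <- named_vector # work on a copy
updated["two"] <- 20    # replace the element named "two"
updated[["two"]]        # 20
```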

As stated in the beginning, atomic vectors can only contain data of one type. If we want to store data of several types in one object, we need to use lists.

### 3.3.3 Lists

Lists can contain all types of vectors, including other lists. Due to the latter feature, they are also called “recursive vectors.”

Lists can be created using `list()`. Naming elements works like naming elements of atomic vectors.

`new_list <- list(numbers = 1:5, characters = c("Hello", "world", "!"), logical_vec = c(TRUE, FALSE), another_list = list(1:5, 6:10))`

In theory, you can, for instance, look at a list by calling `head()`:

`head(new_list)`

```
## $numbers
## [1] 1 2 3 4 5
##
## $characters
## [1] "Hello" "world" "!"
##
## $logical_vec
## [1] TRUE FALSE
##
## $another_list
## $another_list[[1]]
## [1] 1 2 3 4 5
##
## $another_list[[2]]
## [1] 6 7 8 9 10
```

Another possibility, which is especially suitable for lists, is `str()`, because it focuses on the structure:

`str(new_list)`

```
## List of 4
## $ numbers : int [1:5] 1 2 3 4 5
## $ characters : chr [1:3] "Hello" "world" "!"
## $ logical_vec : logi [1:2] TRUE FALSE
## $ another_list:List of 2
## ..$ : int [1:5] 1 2 3 4 5
## ..$ : int [1:5] 6 7 8 9 10
```

#### 3.3.3.1 Accessing list elements

Accessing elements of a list is similar to vectors. There are basically three ways:

Using single square brackets gives you a sub-list:

```
sublist <- new_list[2]
sublist
```

```
## $characters
## [1] "Hello" "world" "!"
```

`typeof(sublist)`

`## [1] "list"`

Double square brackets give you the component:

```
component_1 <- new_list[[1]]
component_1
```

`## [1] 1 2 3 4 5`

`typeof(component_1)`

`## [1] "integer"`

A bit hard to grasp? I certainly agree! You can find a nice real-world metaphor here.

If the elements are named, you can also extract them using the `$` operator:

```
vector_of_numbers <- new_list$numbers
vector_of_numbers
```

`## [1] 1 2 3 4 5`

`typeof(vector_of_numbers)`

`## [1] "integer"`
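The three ways of accessing can be chained, which is how you reach into nested lists. The following sketch recreates `new_list` from above so that it runs on its own:

```r
new_list <- list(
  numbers = 1:5,
  characters = c("Hello", "world", "!"),
  logical_vec = c(TRUE, FALSE),
  another_list = list(1:5, 6:10)
)

new_list$another_list[[2]]         # 6 7 8 9 10: second component of the inner list
new_list[["another_list"]][[2]][3] # 8: third value of that vector
new_list[["characters"]][2]        # "world"

length(new_list["numbers"])        # 1: a list that holds one element
length(new_list[["numbers"]])      # 5: the integer vector itself
```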

#### 3.3.3.2 Functions for working with vectors

`all()` and `any()` return whether all or any of the elements fulfill a certain condition.

`all(vector_of_numbers == 5)`

`## [1] FALSE`

`any(vector_of_numbers == 5)`

`## [1] TRUE`

You can also determine `which()` element of the vector meets a certain condition.

`which(vector_of_numbers %in% c(1, 5))`

`## [1] 1 5`

`subset()` enables you to filter a vector, keeping only the values that meet a certain condition.

`subset(vector_of_numbers, vector_of_numbers > 4)`

`## [1] 5`
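`subset()` and square brackets usually give the same result, but they treat missing values differently. A small sketch of that difference (`with_na` is made-up example data):

```r
vector_of_numbers <- 1:5
vector_of_numbers[vector_of_numbers > 4] # 5, same as subset() above
which(vector_of_numbers > 3)             # 4 5: positions, not values
sum(vector_of_numbers > 3)               # 2: how many elements meet the condition

with_na <- c(1, NA, 5)        # made-up example data
subset(with_na, with_na > 4)  # 5: subset() drops the missing value
with_na[with_na > 4]          # NA 5: square brackets keep it
```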

### 3.3.4 Augmented vectors

In R, there are also other vector types. They are built upon the basic vectors – atomic vectors and lists. The most important ones are *factors* (built upon integers), *date*/*date-time* (built upon doubles), and *data frames*/*tibbles* (built upon lists).

#### 3.3.4.1 Factors

Factors are used in R to represent categorical variables. They can only take a limited number of values. Think for example of something like party affiliation of members of the German parliament. This should be stored as a factor, because you have a limited set of values (i.e., AfD, Buendnis 90/Die Gruenen, CDU, CSU, Die Linke, FDP, SPD, fraktionslos) which apply to multiple politicians. Names, on the other hand, should be stored as characters, since there is (in theory) an infinite number of possible values.

Factors are built on top of integers. They have an attribute called “levels.”

```
mdbs <- factor(levels = c("AfD", "Buendnis90/Die Gruenen", "CDU", "CSU", "Die Linke", "SPD"))
levels(mdbs)
```

```
## [1] "AfD" "Buendnis90/Die Gruenen" "CDU"
## [4] "CSU" "Die Linke" "SPD"
```

`typeof(mdbs)`

`## [1] "integer"`

`mdbs`

```
## factor(0)
## Levels: AfD Buendnis90/Die Gruenen CDU CSU Die Linke SPD
```

In our daily workflow, we normally convert character vectors to factors using `as.factor()`. We will learn more about factors, and the `forcats` package which has been dedicated to them, later on.
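To see what "built on top of integers" means, here is a sketch that converts a small, made-up character vector with `as.factor()`. By default, the levels are the sorted unique values, and each element is stored as the position of its level.

```r
parties <- c("SPD", "CDU", "SPD", "FDP") # made-up example data
party_factor <- as.factor(parties)

levels(party_factor)     # "CDU" "FDP" "SPD": sorted unique values
as.integer(party_factor) # 3 1 3 2: position of each element's level
typeof(party_factor)     # "integer"
table(party_factor)      # counts per level
```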

#### 3.3.4.2 Date and date-time

Dates are simply numeric vectors that indicate the number of days that have passed since 1970-01-01. We will work with dates using the `lubridate` package.

`library(lubridate)`

```
##
## Attaching package: 'lubridate'
```

```
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
```

```
date <- as.Date("1970-01-02")
unclass(date)
```

`## [1] 1`

`typeof(date)`

`## [1] "double"`
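Because dates are just numbers of days under the hood, you can calculate with them. A minimal sketch in base R (no `lubridate` needed here; the object names are only for illustration):

```r
first_day <- as.Date("1970-01-01")
later_day <- as.Date("1970-01-10")

as.numeric(later_day - first_day) # 9: days between the two dates
format(first_day + 30)            # "1970-01-31": adding a number adds days
later_day > first_day             # TRUE: dates can be compared
```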

Date-times work analogously: a numeric vector that represents the number of seconds that have passed since 1970-01-01 00:00:00.

```
datetime <- ymd_hms("1970-01-01 01:00:00")
unclass(datetime)
```

```
## [1] 3600
## attr(,"tzone")
## [1] "UTC"
```

If you want to learn more on dates and times, have a look at the `lubridate` package which has been dedicated to them.

### 3.3.5 Data Frames/Tibbles

The data structure in R which is probably the most central for this course, and for working with the tidyverse in general, is the data frame (or Tibble, which is used in the context of the tidy packages). In the following, I will only focus on Tibbles. The differences between a Tibble and a data frame can be found here. Strictly speaking, they are augmented vectors, but since they are the most important data type when working with tidyverse packages, they get a section of their own.

Tibbles are built upon lists, but there are some crucial differences: lists can contain everything (including other lists), whereas Tibbles can only contain vectors (including lists) which are of the same length or of length 1 (then the value is repeated to make the vector the same length as the others, so-called recycling). These variables need to have a name. For creating tibbles, we need the `tibble` package, which comes with the tidyverse. You can give elements names which are invalid variable names in R (e.g., because they contain spaces) by wrapping them in back ticks. If you want to work with such a variable afterwards, you will also have to wrap its name in back ticks. When you’re working in RStudio, you can open a separate tab containing the tibble by either clicking on the object in the “environment” pane or by using the `View()` command (I had to comment it out in the script because otherwise the RMarkdown document would not have knit).

```
new_tibble <- tibble(
  a = 1:5,
  b = c("Hi", ",", "it's", "me", "!"),
  `an invalid name` = TRUE
)
new_tibble
```

```
## # A tibble: 5 × 3
## a b `an invalid name`
## <int> <chr> <lgl>
## 1 1 Hi TRUE
## 2 2 , TRUE
## 3 3 it's TRUE
## 4 4 me TRUE
## 5 5 ! TRUE
```

`# View(new_tibble)`

You can access a Tibble’s columns by their name by either using the `$` operator or `[[`, like when you access named elements in a list. This will return the vector:

`new_tibble$a`

`## [1] 1 2 3 4 5`

`typeof(new_tibble$a)`

`## [1] "integer"`

`new_tibble[["a"]]`

`## [1] 1 2 3 4 5`

You can also extract by position using `[[`:

`new_tibble[[3]]`

`## [1] TRUE TRUE TRUE TRUE TRUE`

As it returns a vector, you can extract the vector’s value by just adding the expression in square brackets:

`new_tibble[[1]][[2]] # second value of first column`

`## [1] 2`

Another way of accessing specific elements is by `[row, column]`.

`new_tibble[[1]][[2]] == new_tibble[2, 1] # second value of first column`

```
## a
## [1,] TRUE
```

Also, you can access the entire row by leaving out the column and vice versa:

`new_tibble[2, ] #second row`

```
## # A tibble: 1 × 3
## a b `an invalid name`
## <int> <chr> <lgl>
## 1 2 , TRUE
```

`new_tibble[, 1] #first column`

```
## # A tibble: 5 × 1
## a
## <int>
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
```
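As the output above shows, `[` on a Tibble returns a Tibble again. A base R data frame behaves differently here, which is one of the differences you may stumble upon. The following sketch uses only base R; `new_df` is a made-up counterpart to `new_tibble`, not an object from this tutorial.

```r
# Base R counterpart of the first two columns of new_tibble (illustrative only)
new_df <- data.frame(
  a = 1:5,
  b = c("Hi", ",", "it's", "me", "!"),
  stringsAsFactors = FALSE # keep column b as character in every R version
)

new_df[, 1]               # 1 2 3 4 5: a plain integer vector, not a data frame
new_df[, 1, drop = FALSE] # drop = FALSE keeps the data frame structure
new_df[2, "b"]            # ",": single value by row and column name
new_df[["a"]]             # [[ works just as it does for Tibbles and lists
```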

## 3.4 Further links

- More on factors can be found here (McNamara and Horton 2017).
- “The R Inferno” by Patrick Burns (pun probably not intended) is always nice to come back to (Burns 2011).
- Read the tidyverse style guide – and then stick to it.
- Probably one of the hardest things in this tutorial to get one’s head around is factors. Here you can find more about them.
- Some basic tutorials.
- If you want to learn more about data types, click here.
- Find a description of functions for vector manipulation here.

### References

Burns, Patrick. 2011. *The R Inferno*.

Cotton, Richard. 2013. *Learning R*. First edition. Beijing; Sebastopol, CA: O’Reilly.

Wickham, Hadley, and Garrett Grolemund. 2016. *R for Data Science: Import, Tidy, Transform, Visualize, and Model Data*. First edition. Sebastopol, CA: O’Reilly.