## 1.4 Vectors

*Vectors* are the most common and most important data type in R. For our purposes, a *vector* is an ordered sequence of objects of the *same type*.

### 1.4.1 Basics

Think of a vector as a freight train that consists of the same type of waggons: Each waggon can contain different cargo (objects), but all of these objects must be of the same type (e.g., numbers or characters). As the waggons are indistinguishable from the outside, the contents of any particular waggon can be determined by addressing the waggon by its *name* (if it has one) or its *number* in the sequence. For instance, suppose there is a vector named `train`. The following functions provide some basic information about it:

```
# Print the vector:
train # (see its names and elements)
#> w001 w002 w003 w004 w005 w006 w007 w008 w009 w010 w011
#> "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "corn"
#> w012 w013 w014 w015 w016 w017 w018 w019 w020 w021 w022
#> "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "gold" "coal"
#> w023 w024 w025 w026 w027 w028 w029 w030 w031
#> "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal"
# Basic information about a vector:
typeof(train) # Of which type are these waggons/elements?
#> [1] "character"
length(train) # How many waggons/elements does train consist of?
#> [1] 31
names(train) # What are the names of the waggons/elements?
#> [1] "w001" "w002" "w003" "w004" "w005" "w006" "w007" "w008" "w009" "w010"
#> [11] "w011" "w012" "w013" "w014" "w015" "w016" "w017" "w018" "w019" "w020"
#> [21] "w021" "w022" "w023" "w024" "w025" "w026" "w027" "w028" "w029" "w030"
#> [31] "w031"
# Accessing the contents of particular waggons/elements:
train[1] # by number: What's in 1st waggon/element?
#> w001
#> "coal"
train["w001"] # by name: What's in the waggon named "w001"?
#> w001
#> "coal"
```

To obtain information about the vector `train`, we applied some basic *functions* to it. Let’s explain each of these functions in turn:

- Evaluating `train` prints the entire vector in our Console. Actually, it is merely a shorthand for `print(train)`. If the vector is short enough to be printed in its entirety, we can obtain comprehensive information about it (i.e., its type, length, names, and all its elements). But even if the vector is too long to be printed, we can still learn its type, whether its elements are named, and what its first elements are.

- We learned that all elements of a vector are of the same *type*. The function `typeof(v)` explicitly determines the type of a vector `v`.

- As vectors are 1-dimensional sequences, they have a *length*. The function `length(v)` determines the length of a vector `v`.

- Vectors can have *names*, but do not need to have them. The function `names(v)` determines the names of a vector `v` and returns `NULL` if its elements are not named.

- The contents of particular vector elements can be accessed by their number or their name (if `v` contains names). To access vector elements, R uses *square* brackets `[]`. The command `v[n]` returns the `n`-th element of a vector `v`, and the command `v["abc"]` returns the element named “abc” of a vector `v` (if `v` contains an element with this name).

We will learn more about accessing particular vector elements when we learn more about *indexing* below (in Section 1.4.5).
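The definition of `train` is not shown in this section. As a minimal sketch, a vector with the same elements and names could be built as follows. The object name `cargo` and the use of `rep()` and `sprintf()` to generate elements and names are assumptions made for illustration, not necessarily how `train` was created:

```r
# Hypothetical reconstruction of a vector like `train`
# (an assumed definition, not the original one):
cargo <- c(rep("coal", 10), rep("corn", 10), "gold", rep("coal", 10))

# Assign the names "w001", "w002", ..., "w031" to its elements:
names(cargo) <- sprintf("w%03d", seq_along(cargo))

typeof(cargo)      # "character"
length(cargo)      # 31
cargo[21]          # access by number
cargo["w021"]      # access by name (selects the same element)

# A vector without names returns NULL:
names(c(1, 2, 3))  # NULL
```

Both `cargo[21]` and `cargo["w021"]` return the same named element, which illustrates that numbers and names are two routes to the same waggon.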

#### Practice

The following objects `v1` and `v2` are both vectors (i.e., ordered sequences of elements) that have a length of 4 elements, but are of different types:

```
v1
#> [1] 1 2 3 4
v2
#> [1] "A" "B" "C" "D"
```

- Use basic vector functions to answer the following questions:

  - Of what type are the elements of `v1` and `v2`?
  - What is their i-th (e.g., 2nd) element?
  - What is their length?

Applying basic vector functions:

```
# Of which type are the vectors?
typeof(v1)
#> [1] "integer"
typeof(v2)
#> [1] "character"
# What is the 2nd element?
v1[2]
#> [1] 2
v2[2]
#> [1] "B"
# What is their length?
length(v1)
#> [1] 4
length(v2)
#> [1] 4
```

- Additional vector functions allow answering questions like the following:

  - What is the reverse sequence of `v2`?
  - Do `v1` or `v2` contain some element `"A"`?
  - Which element of `v2` has a particular value (e.g., `"C"` or `"X"`)?

To answer these questions, we can apply slightly more advanced functions to vectors:

```
rev(v2) # reverse of v2?
#> [1] "D" "C" "B" "A"
"A" %in% v1 # does v1 contain "A"?
#> [1] FALSE
"A" %in% v2 # does v2 contain "A"?
#> [1] TRUE
which(v2 == "C") # which element of v2 is "C"?
#> [1] 3
which(v2 == "X") # which element of v2 is "X"?
#> integer(0)
```

We will learn more about these functions later. For now, just try to understand them and remember that there are many ways to probe and manipulate vectors.

### 1.4.2 Creating vectors

To create a new vector, we can *combine* several objects of the *same type* with the `c()` function, separating vector elements by commas:

```
# Creating vectors:
c(1, 2, 3)
#> [1] 1 2 3
# Combining scalar objects (defined above):
c(a, b)
#> [1] 100 200
v <- c(a, b, c)
v
#> [1] 100 200 3
v <- c(c, c, c) # vectors can have repeated elements
v
#> [1] 3 3 3
# Combining scalar objects and vectors (defined above):
v <- c(a, b, v) # Note that vectors can contain vectors, ...
v
#> [1] 100 200 3 3 3
v <- c(v, v) # but the result is only 1 vector, not 2.
v
#> [1] 100 200 3 3 3 100 200 3 3 3
# Combining characters to vectors:
w <- c("one", "two", "three")
w
#> [1] "one" "two" "three"
w <- c(w, "four", "5", "many")
w
#> [1] "one" "two" "three" "four" "5" "many"
```

Most functions in R can be applied to vectors. They typically return either some information *about* the vector or a transformed *version of* it:

```
length(v)
#> [1] 10
typeof(v)
#> [1] "double"
sum(v)
#> [1] 618
length(w)
#> [1] 6
typeof(w)
#> [1] "character"
# sum(w) # would yield an error
values <- c(2>1, 1>2, 2==2)
values # vector of truth values
#> [1] TRUE FALSE TRUE
!values # reverses truth values
#> [1] FALSE TRUE FALSE
```

When combining data of different types (e.g., numbers and characters) in a vector, R coerces them into a common type (characters in the examples below):

```
x <- c(1, "two", 3)
x
#> [1] "1" "two" "3"
typeof(x) # converted 1 to "1"
#> [1] "character"
y <- c(TRUE, "FALSE")
y
#> [1] "TRUE" "FALSE"
typeof(y) # converted TRUE to "TRUE"
#> [1] "character"
```
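The direction of this coercion can be checked with `typeof()`. The following sketch combines pairs of different types and inspects the result; the particular values are arbitrary examples, and `suppressWarnings()` is only used here to silence the warning that R issues when a conversion fails:

```r
# Each c() call mixes two types; typeof() shows the resulting common type:
typeof(c(TRUE, 2L))     # "integer"   (logical -> integer)
typeof(c(2L, 3.5))      # "double"    (integer -> double)
typeof(c(3.5, "four"))  # "character" (double  -> character)

# Converting characters back to numbers only works for elements
# that look like numbers; all others become NA (a missing value):
x <- c(1, "two", 3)
x_num <- suppressWarnings(as.numeric(x))
x_num                   # 1 NA 3
```

In each pair, the result takes the type that can represent both elements without losing information.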

#### Practice

- What happens when we combine truth values and numeric objects into a vector?

```
z <- c(TRUE, FALSE, 100)
z
typeof(z)
sum(z)
```

#### Calculating with logical values

When combining truth values and numeric objects in a vector, a *numeric* vector is returned. Any value of `TRUE` is interpreted as the number 1 and any value of `FALSE` is interpreted as the number 0. The same interpretation of truth values is made when applying arithmetic functions to truth values:

```
sum(c(TRUE, FALSE, TRUE))
#> [1] 2
mean(c(TRUE, FALSE, FALSE))
#> [1] 0.3333333
```
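Interpreting `TRUE` as 1 and `FALSE` as 0 is handy for counting: applying `sum()` to the result of a test yields the number of elements that pass it, and `mean()` yields their proportion. A small sketch, assuming a made-up vector `ages`:

```r
# Hypothetical ages of 5 participants (made-up values):
ages <- c(21, 34, 19, 45, 28)

older <- ages > 25  # logical vector: FALSE TRUE FALSE TRUE TRUE
sum(older)          # number of TRUE values: 3
mean(older)         # proportion of TRUE values: 0.6
```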

### 1.4.3 Scalar objects are vectors

Actually, R has no dedicated type of scalar objects. Instead, individual numbers (of type integer or double), text strings (of type character), and truth values (of type logical) are *vectors* of length 1. We can verify this by applying the `is.vector` and `length` functions to the scalars defined above:

```
a
#> [1] 100
is.vector(a)
#> [1] TRUE
length(a)
#> [1] 1
typeof(a)
#> [1] "double"
d
#> [1] "weird"
is.vector(d)
#> [1] TRUE
length(d)
#> [1] 1
typeof(d)
#> [1] "character"
e
#> [1] TRUE
is.vector(e)
#> [1] TRUE
length(e)
#> [1] 1
typeof(e)
#> [1] "logical"
```
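A practical consequence is that everything that works for vectors also works for single values. The following sketch uses an arbitrary example value `x` (not one of the objects defined above):

```r
x <- 42              # a "scalar" (arbitrary example value)

identical(x, c(42))  # TRUE: wrapping a single value in c() changes nothing
x[1]                 # 42: a scalar can be indexed like any other vector
length(c(x, x, x))   # 3: combining 3 vectors of length 1
```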

### 1.4.4 Vector creation functions

For creating vectors with more than just a few elements (i.e., with larger `length` values), using the `c` function and then typing all vector elements becomes impractical. Useful functions and shortcuts to generate continuous or regular sequences are the functions `seq` (and its variant `:`) and `rep`:

```
# Sequences (with seq):
s1 <- seq(0, 100, 1) # is short for:
s2 <- seq(from = 0, to = 100, by = 1)
s2
#> [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
#> [26] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
#> [51] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
#> [ reached getOption("max.print") -- omitted 26 entries ]
all.equal(s1, s2)
#> [1] TRUE
# Shorter version (with by = 1):
s3 <- 0:100
all.equal(s1, s3)
#> [1] TRUE
# But seq allows different step sizes:
s4 <- seq(0, 100, by = 25)
s4
#> [1] 0 25 50 75 100
# Replicating vectors (with rep):
s5 <- rep(c(0, 1), 3) # is short for:
s5 <- rep(x = c(0, 1), times = 3)
s5
#> [1] 0 1 0 1 0 1
```
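Both functions accept further arguments. As a brief sketch (the object names `s6`, `r1`, and `r2` are chosen here for illustration), `seq` can be given the desired number of elements via `length.out` instead of a step size, and `rep` can repeat each element via `each` rather than the vector as a whole:

```r
# Sequence with a fixed number of elements (rather than a fixed step size):
s6 <- seq(0, 1, length.out = 5)
s6  # 0.00 0.25 0.50 0.75 1.00

# Repeating the entire vector vs. repeating each of its elements:
r1 <- rep(c("a", "b"), times = 2)
r1  # "a" "b" "a" "b"
r2 <- rep(c("a", "b"), each = 2)
r2  # "a" "a" "b" "b"
```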

A common task in psychology and statistics is drawing a *sample* from a given set of objects. In R, the `sample` function allows drawing a sample of size `size` from a population `x`, and a logical argument `replace` specifies whether the sample is to be drawn with or without replacement. The result of `sample` is a vector of length `size`:

```
# Sampling vector elements (with sample):
sample(x = 1:3, size = 10, replace = TRUE)
#> [1] 1 2 1 1 3 3 2 2 3 2
# Note:
# sample(1:3, 10)
# would yield an error (as replace = FALSE by default).
# Binary sample (coin flip):
coin <- c("H", "T") # 2 events: Heads or Tails
sample(coin, 5, TRUE) # is short for:
#> [1] "T" "H" "H" "T" "H"
sample(x = coin, size = 5, replace = TRUE) # flip coin 5 times
#> [1] "H" "T" "H" "T" "H"
sample(x = coin, size = 1000, replace = TRUE) # flip coin 1000 times
#> [1] "H" "T" "H" "T" "T" "T" "H" "H" "H" "T" "T" "H" "H" "H" "H" "T" "T" "T" "H"
#> [20] "H" "T" "H" "T" "H" "T" "T" "T" "H" "H" "T" "H" "H" "T" "T" "T" "H" "T" "H"
#> [39] "T" "T" "H" "T" "T" "H" "H" "H" "H" "H" "T" "H" "T" "H" "H" "H" "H" "H" "T"
#> [58] "T" "T" "H" "T" "H" "T" "T" "H" "T" "T" "H" "H" "H" "H" "T" "H" "T" "H"
#> [ reached getOption("max.print") -- omitted 925 entries ]
```
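As `sample` draws randomly, repeated calls usually return different results (as the two samples of 5 coin flips above illustrate). When a random result needs to be reproducible, the random number generator can be initialized with `set.seed()` before sampling. A minimal sketch, in which the seed value 123 is an arbitrary choice:

```r
coin <- c("H", "T")  # 2 events: Heads or Tails

set.seed(123)        # arbitrary seed value
flips_1 <- sample(x = coin, size = 10, replace = TRUE)

set.seed(123)        # same seed, hence the same random draws
flips_2 <- sample(x = coin, size = 10, replace = TRUE)

identical(flips_1, flips_2)  # TRUE
table(flips_1)               # counts how often "H" and "T" occurred
```

Without the second call to `set.seed()`, `flips_2` would generally differ from `flips_1`.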

#### Practice

- Evaluate the following expressions and explain their results (e.g., by looking up the documentation of `?seq` and `?sample`):

```
seq(0, 10, by = 3)
seq(0, 10, length.out = 10)
sample(c("A", "B", "C"), size = 3)
sample(c("A", "B", "C"), size = 4)
sample(c("A", "B", "C"), size = 5, replace = TRUE)
```

### 1.4.5 Indexing vectors

We have learned above how to inspect vectors and how to determine their type, length, names, and particular elements (see Section 1.4.1 above). An important concept in the context of vectors (and other R data structures) is the notion of a *numeric index*, which is a number that denotes the position of an element in a sequence. This sounds more difficult than it is: We already used numeric indices above to access particular elements of vectors, for instance:

```
# Assuming vector definitions (from above):
train
#> w001 w002 w003 w004 w005 w006 w007 w008 w009 w010 w011
#> "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "corn"
#> w012 w013 w014 w015 w016 w017 w018 w019 w020 w021 w022
#> "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "gold" "coal"
#> w023 w024 w025 w026 w027 w028 w029 w030 w031
#> "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal"
v1
#> [1] 1 2 3 4
v2
#> [1] "A" "B" "C" "D"
# Numeric indexing for 1 element:
train[1] # get the 1st element of train
#> w001
#> "coal"
v1[2] # get the 2nd element of v1
#> [1] 2
v2[3] # get the 3rd element of v2
#> [1] "C"
```

Again, selecting vector elements by an index requires enclosing it in square brackets `[]`, whereas functions always enclose their arguments in round parentheses `()`.

In R, we often store a large number of values in vectors (e.g., the age values of 5,000 participants), but only need some of them for answering specific questions (e.g., what is the average age of all male participants?). To *select* only a subset of elements from a vector `v`, we can specify the condition or criterion for our selection in (square) brackets `v[...]`. Supplying a numeric or logical index (in square brackets) is a very flexible and powerful way for selecting elements of a vector `v` (or of other data structures). We can distinguish between 2 types of indexing (or subsetting):

#### Numeric indexing

In *numeric indexing* (or *numeric subsetting*), we provide a numeric vector as an index to a vector. Instead of only asking for a single element, we can use a numeric vector of indices to obtain a new vector that contains the elements at the specified positions:

```
train[c(1, 20, 30)]
#> w001 w020 w030
#> "coal" "corn" "coal"
train[1:4]
#> w001 w002 w003 w004
#> "coal" "coal" "coal" "coal"
train[20:22]
#> w020 w021 w022
#> "corn" "gold" "coal"
```

Actually, numeric indexing also works when elements are selected repeatedly:

```
train[c(1, 1)]
#> w001 w001
#> "coal" "coal"
train[rep(21, 3)]
#> w021 w021 w021
#> "gold" "gold" "gold"
```

Importantly, the index used to select elements can be the result of a function call. For instance:

```
length(v2) # number of elements in v2
#> [1] 4
v2[length(v2)] # get the last element of v2
#> [1] "D"
v2[c(1, length(v2))] # get the 1st and last elements of v2
#> [1] "A" "D"
```
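The same idea extends to functions that return several indices at once. As a sketch (repeating the definition of `v2` to keep the example self-contained), the sequence functions from Section 1.4.4 can generate the positions to select:

```r
v2 <- c("A", "B", "C", "D")     # same elements as v2 above

v2[seq(1, length(v2), by = 2)]  # every 2nd element: "A" "C"
v2[length(v2):1]                # all elements in reverse order: "D" "C" "B" "A"

identical(v2[length(v2):1], rev(v2))  # TRUE: same result as rev(v2)
```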

Sometimes, we either know that or wonder whether a particular vector contains particular elements. The `which()` function returns the corresponding index or indices (as a numeric vector):

```
which(v2 == "B")
#> [1] 2
which(v2 != "C")
#> [1] 1 2 4
which(train == "gold")
#> w021
#> 21
which(train != "coal")
#> w011 w012 w013 w014 w015 w016 w017 w018 w019 w020 w021
#> 11 12 13 14 15 16 17 18 19 20 21
```
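It may help to split such a command into two steps: the comparison inside `which()` first yields a vector of truth values, and `which()` then reports the positions at which this vector is `TRUE`. A small sketch, using an intermediate object `test` that is introduced here only for illustration:

```r
v2 <- c("A", "B", "C", "D")  # same elements as v2 above

test <- v2 != "C"  # Step 1: a truth value for each element
test               # TRUE TRUE FALSE TRUE
which(test)        # Step 2: positions of the TRUE values: 1 2 4

# If no element passes the test, the result is an empty vector:
length(which(v2 == "X"))  # 0
```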

If we wanted to obtain the corresponding elements, we could use these numeric vectors to index or subset the same vectors:

```
v2[which(v2 == "B")]
#> [1] "B"
v2[which(v2 != "C")]
#> [1] "A" "B" "D"
train[which(train == "gold")]
#> w021
#> "gold"
train[which(train != "coal")]
#> w011 w012 w013 w014 w015 w016 w017 w018 w019 w020 w021
#> "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "gold"
```

These last examples first used `which()` to obtain numeric indices of `train` and then applied these indices to `train`. This works, but seems a bit complicated. We can get the same elements by using logical indexing.

#### Logical indexing

In *logical indexing* (or *logical subsetting*), we select elements of a vector `v` by providing a vector of truth values (i.e., logical values that are either `TRUE` or `FALSE`) in square brackets. Typically, the length of the logical index corresponds to the length of the vector `v`. For example, the following statement selects the 1st and 2nd elements of `v1`:

```
v1[c(TRUE, TRUE, FALSE, FALSE)]
#> [1] 1 2
```
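The word "typically" matters here: if the logical index is shorter than the vector, R repeats (or *recycles*) the index until it matches the length of the vector. The following sketch (repeating a definition of `v1` that matches the output shown above) illustrates this behavior, which can be convenient, but can also hide mistakes:

```r
v1 <- 1:4  # same elements as v1 above

v1[c(TRUE, FALSE)]               # index is recycled: 1 3
v1[c(TRUE, FALSE, TRUE, FALSE)]  # the same selection, written out in full: 1 3
```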

As with numeric indices, we can use function calls to determine the logical indices. This allows applying a test criterion (i.e., typically a test of `v` that returns a logical vector of the same length) to a vector `v` and only return the elements for which the criterion is `TRUE`. For instance, getting all elements of `v1` that are smaller than 3:

```
v1 < 3 # criterion [yielding a logical vector of length(v1)]
#> [1] TRUE TRUE FALSE FALSE
v1[v1 < 3] # applying logical index to v1
#> [1] 1 2
```

This allows us to repeat the selections from above by logical indexing:

```
v2[v2 == "B"]
#> [1] "B"
v2[v2 != "C"]
#> [1] "A" "B" "D"
train[train == "gold"]
#> w021
#> "gold"
train[train != "coal"]
#> w011 w012 w013 w014 w015 w016 w017 w018 w019 w020 w021
#> "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "gold"
```

Note that each of these selections uses a vector twice: Once to conduct a test that yields logical indices (i.e., a vector of truth values that has the same length as the original vector) and a second time to select the elements for which the test criterion is `TRUE`.
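The test and the selection do not need to involve the same vector, as long as both vectors have the same length and describe the same cases. This allows answering questions like the one raised at the beginning of this section (the average age of all male participants). A minimal sketch, assuming two made-up vectors `age` and `sex` that describe 6 hypothetical participants:

```r
# Hypothetical data for 6 participants (made-up values):
age <- c(25, 31, 47, 19, 36, 52)
sex <- c("male", "female", "male", "female", "male", "female")

is_male <- sex == "male"  # logical index: one truth value per participant
age[is_male]              # ages of the male participants: 25 47 36
mean(age[is_male])        # their average age: 36
```

Here, the test is conducted on `sex`, but its result is used to select elements of `age`.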

#### Practice

As indexing (or subsetting) vectors is quite important in R, we practice it more extensively:

- Indexing numeric vectors:

- Evaluate and explain the following code (in terms of numeric or logical indexing):

```
v <- 1:10 # create a numeric vector
v
crit <- v > 5 # Condition: Which values of v are larger than 5?
crit
v[crit] # using crit to select values of v (for which crit is TRUE)
v[v > 5] # all in 1 step
```

- Can you use numeric indexing to get the same elements as `v[v > 5]`?

```
v[6:10]
#> [1] 6 7 8 9 10
v[6:length(v)]
#> [1] 6 7 8 9 10
```

- Evaluate and explain the results of the following commands:

```
x <- 0:8
x[c(2, 4, 6)]
x[x < 3]
x[x %% 2 == 0]
```

- Indexing character vectors:

- Evaluate and explain the results of the following commands:

```
rhyme <- c("ene", "mene", "miste", "es", "rappelt", "in", "der", "kiste")
length(rhyme)
rhyme[c(1:3, length(rhyme))]
rhyme[nchar(rhyme) <= 3]
rhyme[substr(rhyme, start = 1, stop = 1) == "m"]
```

- Assume the following definition of `spices`:

`spices <- c("salt", "pepper", "cinnamon", "lemongrass", "mint", "mustard", "wasabi")`

Can you use functions and logical indexing to obtain all spices in `spices` with exactly 4 letters, or all spices with the letter `"i"` at their 2nd position?

```
spices[nchar(spices) == 4] # spices with exactly 4 letters
spices[substr(spices, 2, 2) == "i"] # spices with an "i" at 2nd position
```

- The R object `LETTERS` stores the 26 upper-case letters of the Roman alphabet (and `letters` the same letters in lower-case). Thus, the vector `c("A", "B", "C")` can also be obtained by numerical subsetting to get the first 3 letters of `LETTERS`: `LETTERS[1:3]`. Can you combine sampling and subsetting to get a random vector of 10 elements, which are sampled (with replacement) from the letters “U” to “Z”?

```
# ?LETTERS
U_to_Z <- LETTERS[21:26] # subsetting yields "U" "V" "W" "X" "Y" "Z"
sample(x = U_to_Z, size = 10, replace = TRUE)
```