## 1.4 Vectors

*Vectors* are by far the most common and the most important data structure in R.
For our purposes, a *vector* stores data in the shape of a linear sequence of elements that are all of the same type.^{23} Any vector `x` has three common properties:

- its *length* (returned by `length(x)`);
- its *type* of elements (returned by `typeof(x)` or `mode(x)`);
- optional *attributes* or meta-data (returned by `attributes(x)`).

The most common attributes of a vector `x` are the *names* of its elements (which can be set or returned by `names(x)`), or, for vectors that are shaped as matrices or arrays, its shape (returned by `dim(x)`).
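As a minimal sketch of these properties, the following lines define a small named vector and query its length, type, and attributes. The name `ages` and its values are hypothetical and not part of this chapter's data:

```r
# Hypothetical example vector (not used elsewhere in this chapter):
ages <- c(anna = 31, ben = 27, cara = 45)

length(ages)      # number of elements
#> [1] 3
typeof(ages)      # type of elements
#> [1] "double"
names(ages)       # names attribute
#> [1] "anna" "ben" "cara"
attributes(ages)  # all attributes (here: only the names)
dim(ages)         # a plain vector has no dim attribute
#> NULL
```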

### 1.4.1 Basics

In contrast to most other sections in this book, this section examines a vector without defining it.
Hence, please read and try to understand the following paragraphs without worrying about the definition of the `train` vector.

We can think of a vector as a freight train that contains a sequence of waggons:
Each waggon can contain some cargo (data elements), but all elements must be of the same data type (an entire train of all people or all cargo waggons, or data elements that are all of `logical`, `numeric`, or `character` type).
As the waggons are indistinguishable from the outside, the contents of any particular waggon can be determined by addressing the waggon by its *name* (an optional attribute) or its *number* (or position) in the sequence.

#### Exploring a vector

For instance, suppose there exists an R vector named `train`.
Although its definition is not shown here, the following functions provide some basic information about it:

```
# Print the vector:
# (see its names and elements)
train
#> w01 w02 w03 w04 w05 w06 w07 w08 w09 w10 w11
#> "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" "corn" "corn" "coal"
#> w12 w13 w14 w15
#> "coal" "gold" "coal" "coal"
# Basic information about shape and type:
length(train) # How many waggons/elements does train consist of?
#> [1] 15
typeof(train) # Of which type are these waggons/elements?
#> [1] "character"
is.character(train)
#> [1] TRUE
is.logical (train)
#> [1] FALSE
is.numeric(train)
#> [1] FALSE
# Data structure:
is.vector(train)
#> [1] TRUE
is.list(train)
#> [1] FALSE
# Details:
names(train) # What are the names of the waggons/elements?
#> [1] "w01" "w02" "w03" "w04" "w05" "w06" "w07" "w08" "w09" "w10" "w11" "w12"
#> [13] "w13" "w14" "w15"
str(train) # What is the structure of train vector?
#> Named chr [1:15] "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" ...
#> - attr(*, "names")= chr [1:15] "w01" "w02" "w03" "w04" ...
# Accessing the contents of particular waggons/elements:
train[1] # by number: What's in 1st waggon/element?
#> w01
#> "coal"
train["w01"] # by name: What's in the waggon named "w01"?
#> w01
#> "coal"
```

To obtain information about the `train` vector, we applied some generic functions to it:

- Evaluating `train` prints the contents of the vector in our Console. Actually, this was merely a shorthand for `print(train)`. If the vector is short enough to be printed in its entirety, we can obtain comprehensive information about it (i.e., its shape and type, and the values and names of its elements). But even if the vector is too long to be printed, we can still learn its shape and type, whether its elements are named, and what its first elements are.

- As vectors are one-dimensional sequences, the `length()` function determines their shape.

- As vectors can only have one type, the `typeof()` function explicates their type.

- Vectors can have optional *names*. The `names()` function determines the names of a vector `v` and returns `NULL` if its elements are not named.

#### Accessing elements

The contents of particular vector elements can be accessed by their number or their name (if `v` contains names).
To access vector elements, R uses *square* brackets `[]`:

```
# Accessing the contents of particular waggons/elements:
train[1] # by number: What's in 1st waggon/element?
#> w01
#> "coal"
train["w01"] # by name: What's in the waggon named "w01"?
#> w01
#> "coal"
```

The command `v[n]` returns the `n`-th element of a vector `v`, and the command `v["abc"]` returns the element named “abc” of a vector `v` (if `v` contains an element with this name).

We will learn more about accessing particular vector elements when we learn more about *indexing* below (in Section 1.4.6).
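A brief sketch of both access modes, using a hypothetical named vector `cargo` (an assumption, standing in for the undefined `train`). It also illustrates that asking for a position or a name that does not exist yields a missing value (`NA`) rather than an error:

```r
# Hypothetical stand-in for the train vector:
cargo <- c(w01 = "coal", w02 = "corn", w03 = "gold")

cargo[2]      # by position
#> w02
#> "corn"
cargo["w03"]  # by name
#> w03
#> "gold"
cargo[5]      # position beyond the vector's length: returns NA
cargo["w99"]  # name that does not exist: returns NA
```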

#### Practice

The following objects `v1` and `v2` are both vectors (i.e., ordered sequences of elements) that each have a length of 4 elements, but are of different types:

```
#> [1] 1 2 3 4
#> [1] "A" "B" "C" "D"
```
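The definitions of `v1` and `v2` are not shown in the text. One possible pair of definitions that is consistent with the printed output and with the types reported in the solution below (an assumption, not necessarily the original code) is:

```r
# Assumed definitions (not shown in the original text):
v1 <- 1:4                    # an integer vector
v2 <- c("A", "B", "C", "D")  # a character vector

v1
#> [1] 1 2 3 4
v2
#> [1] "A" "B" "C" "D"
```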

- Use basic vector functions to answer the following questions:

  - Of what type are the elements of `v1` and `v2`?
  - What is their i-th (e.g., 2nd) element?
  - What is their length?

Applying basic vector functions:

```
# Of which type are the vectors?
typeof(v1)
#> [1] "integer"
typeof(v2)
#> [1] "character"
# What is the 2nd element?
v1[2]
#> [1] 2
v2[2]
#> [1] "B"
# What is the length of a vector?
length(v1)
#> [1] 4
length(v2)
#> [1] 4
```

- Additional functions designed for vectors allow addressing more advanced tasks like the following:

  - What is the reverse sequence of `v2`?
  - Do `v1` or `v2` contain some element `"A"`?
  - Where does an element “A” (first) occur in some vector `v2`?
  - Which element of `v2` has a particular value (e.g., `"B"` or `"Z"`)?

Note that some of these questions ask for a vector of elements, whereas others are answered by a simple binary (e.g., `TRUE` or `FALSE`) value, or a vector of numbers (positions). To solve these tasks, we can apply slightly more advanced functions to vectors:

- The function `rev()` reverses a vector:

```
rev(v2) # reverse the elements of v2
#> [1] "D" "C" "B" "A"
```

- The function `%in%` (used as a binary operator between its two arguments) checks whether the left (first) argument is an element of the right (second) one:

```
"A" %in% v1 # does v1 contain an element "A"?
#> [1] FALSE
"A" %in% v2 # does v2 contain an element "A"?
#> [1] TRUE
# Compare:
v2 == "C" # is (element of) v2 equal to "C"?
#> [1] FALSE FALSE TRUE FALSE
v2 %in% "C" # is (element of) v2 "C"?
#> [1] FALSE FALSE TRUE FALSE
v2 != "D"
#> [1] TRUE TRUE TRUE FALSE
!(v2 %in% "D")
#> [1] TRUE TRUE TRUE FALSE
# Note some details:
2 %in% v1 # does v1 contain an element 2?
#> [1] TRUE
"2" %in% v1 # does v1 contain an element "2"? (!)
#> [1] TRUE
" 2" %in% v1 # does v1 contain an element " 2"?
#> [1] FALSE
# ?`%in%` # provides additional documentation.
```

- The `%in%` operator is based on the `match()` function that returns the positions of (first) matches of its first argument in its second argument:

```
# Get position of (first) match:
match(x = c("A", "C", "Z"),
      table = c("A", "B", "A", "Z") # may be a vector
)
#> [1] 1 NA 4
# Using v1 and v2:
match("A", v1, nomatch = 0) # returns 0 if there is no match
#> [1] 0
match("B", v2)
#> [1] 2
```

- The `which()` function returns the position(s) (i.e., numbers) for which a test on a vector evaluates to `TRUE`:

```
# Position of TRUE test values:
which(v1 > 2) # position of elements for which v1 > 2?
#> [1] 3 4
which(v2 == "C") # which element of v2 is "C"?
#> [1] 3
which(v2 != "A") # which element of v2 is NOT "A"?
#> [1] 2 3 4
which(v2 == "X") # which element of v2 is "X"?
#> integer(0)
```

Thus, `which()` allows us to probe a vector with some test and returns a (numeric) vector of the positions for which the test is `TRUE`.

We will learn more about these and many other vector-based functions later. For now, just try to understand these basic functions and remember that there are many ways to probe and manipulate vectors.

### 1.4.2 Creating vectors

To create a new vector, we can *combine* several objects of the *same type* with the `c()` function (think *chain*, *combine*, or *concatenate*), separating vector elements by commas:

```
# Creating vectors:
c(2, 4, 6)
#> [1] 2 4 6
# Combining scalar objects (defined above):
c(a, b)
#> [1] 100 500
v <- c(a, b, c)
v
#> [1] 100 500 3
v <- c(c, c, c) # vectors can have repeated elements
v
#> [1] 3 3 3
```

The same function can be used to combine scalars and vectors:

```
# Combining scalar objects and vectors (defined above):
v <- c(a, b, v) # Note that vectors can contain vectors, ...
v
#> [1] 100 500 3 3 3
v <- c(v, v) # but the result is only 1 vector, not 2:
v
#> [1] 100 500 3 3 3 100 500 3 3 3
```

Note that the result is still a single flat vector, rather than a vector containing other vectors (i.e., `c()` *flattens* vectors).
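A small check of this flattening behaviour (the object names `part` and `flat` are hypothetical): nesting calls to `c()` yields the same vector as listing all elements directly.

```r
part <- c(2, 3)
flat <- c(1, part, c(4, c(5, 6)))  # nested calls to c()
flat
#> [1] 1 2 3 4 5 6
length(flat)  # 6 elements, not 3
#> [1] 6
identical(flat, c(1, 2, 3, 4, 5, 6))
#> [1] TRUE
```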

The `c()` function can also be used to combine character objects or logical values into vectors:

```
# Combining other data types:
u <- c(TRUE, FALSE)
u
#> [1] TRUE FALSE
w <- c("one", "two", "three")
w <- c(w, "four", "5", "many")
w
#> [1] "one" "two" "three" "four" "5" "many"
```

When combining different data types, they are *coerced* into a single data type.
The result is either a numeric vector (when mixing truth values and numeric objects) or a character vector (when mixing anything with characters):

```
# Combining different data types:
x <- c(TRUE, 2L, 3.0) # logical, integer, double
x
#> [1] 1 2 3
typeof(x)
#> [1] "double"
y <- c(TRUE, "two") # logical, character
y
#> [1] "TRUE" "two"
typeof(y)
#> [1] "character"
z <- c(TRUE, 2, "three") # logical, numeric, character
z
#> [1] "TRUE" "2" "three"
typeof(z)
#> [1] "character"
```

In R, we are almost always dealing with vectors. These vectors are either created by us (by using functions that create or return vectors) or read in as data. As we will see below (in Section 1.5), data files in the form of tables are really columns of vectors.

### 1.4.3 Applying functions to vectors

Most functions in R can be applied to vectors. When applying a function to a vector, we typically either obtain some information *about* the vector (as a scalar object) or a transformed *version of* the vector:

```
# (a) Information about a vector (as a scalar):
length(v)
#> [1] 10
typeof(v)
#> [1] "double"
sum(v)
#> [1] 1218
length(w)
#> [1] 6
typeof(w)
#> [1] "character"
# sum(w) # would yield an error
# (b) Changing a vector by applying a function:
n <- c(2, 4, 6)
n + 1 # a numeric vector:
#> [1] 3 5 7
n > 3 # a logical vector:
#> [1] FALSE TRUE TRUE
values <- c(2>1, 1>2, 2==2)
values # vector of truth values
#> [1] TRUE FALSE TRUE
!values # reverses truth values
#> [1] FALSE TRUE FALSE
```

When combining data of different types (e.g., numbers and characters) in a vector, R coerces them into a common type (e.g., truth values into numbers, or all into objects of type “character,” as this is always possible):

```
x <- c(1, "two", 3)
x
#> [1] "1" "two" "3"
typeof(x) # converted 1 to "1"
#> [1] "character"
y <- c(TRUE, "FALSE")
y
#> [1] "TRUE" "FALSE"
typeof(y) # converted TRUE to "TRUE"
#> [1] "character"
z <- c(TRUE, 2)
z
#> [1] 1 2
typeof(z) # converted TRUE to 1.0
#> [1] "double"
```

A common operation on an existing vector is *sorting* its elements, which is achieved by the `sort()` function. Its argument `decreasing` is set to `FALSE` by default, but can be set to `TRUE` if sorting in decreasing order is desired:

```
x <- c(4, 6, 2)
sort(x)
#> [1] 2 4 6
sort(x, decreasing = TRUE)
#> [1] 6 4 2
```
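Note that `sort()` returns a sorted copy and leaves its input untouched. A minimal sketch (using the same values of `x` as above) of what this implies when the sorted order is to be kept:

```r
x <- c(4, 6, 2)
sort(x)       # returns a sorted copy
#> [1] 2 4 6
x             # x itself is unchanged
#> [1] 4 6 2
x <- sort(x)  # re-assign to keep the sorted order
x
#> [1] 2 4 6
```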

#### Calculating with logical values

When combining truth values and numeric objects in a vector, a *numeric* vector is returned.
Any value of `TRUE` is interpreted as the number 1 and any value of `FALSE` is interpreted as the number 0:

```
TRUE + FALSE
#> [1] 1
TRUE - FALSE + TRUE
#> [1] 2
3 * TRUE - 11 * FALSE/7
#> [1] 3
```

The same interpretation of truth values is made when applying arithmetic functions to truth values:

```
sum(c(TRUE, FALSE, TRUE))
#> [1] 2
mean(c(TRUE, FALSE, FALSE))
#> [1] 0.3333333
```

Calculating with logical values may seem a bit strange at first, but provides a useful bridge between logical and numeric data types.
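For instance, because `TRUE` counts as 1, applying `sum()` and `mean()` to a logical test returns the number and the proportion of elements that pass the test. A minimal sketch with a hypothetical vector `scores`:

```r
scores <- c(12, 7, 19, 3, 15)  # hypothetical data

scores > 10        # a logical test on each element
#> [1] TRUE FALSE TRUE FALSE TRUE
sum(scores > 10)   # how many elements are larger than 10?
#> [1] 3
mean(scores > 10)  # which proportion of elements is larger than 10?
#> [1] 0.6
```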

#### Practice

Here are some practice exercises for applying functions to various types of vectors:

- What happens when we combine truth values (aka. logicals) and numeric objects into a vector?

```
y <- c(TRUE, FALSE, 100)
y
typeof(y)
sum(y)
```

**Answer:** The truth values are coerced into numeric values: `FALSE` becomes 0 and `TRUE` becomes 1.

- Probing ABBA:

Assume a vector `band <- c("A", "B", "B", "A")`.

- Predict and explain the results of the following expressions:

```
band <- c("A", "B", "B", "A") # define vector
"B" == band # a logical vector of length 4
"B" %in% band # a logical scalar
match("B", band) # a numeric scalar
which("B" == band) # a numeric vector
```

- What happens when we use `sort()` on (a) a character vector, or (b) a logical vector?

```
# (a) Character vector:
z <- c("Banana", "Cucumber", "Apple")
z
sort(z)
sort(z, decreasing = TRUE)
# (b) Logical vector:
z <- (c(1, 2, 3, 4, 5) %% 2 == 0)
z
sort(z)
sort(z, decreasing = TRUE)
```

**Answer:** Both vectors are sorted, just as we have seen for numeric vectors above.
Vectors with character objects are sorted alphabetically.
In logical vectors, `FALSE` values (corresponding to values of 0) are sorted before `TRUE` values (corresponding to values of 1).

- Look up the documentation of the `order()` function and contrast it with the `sort()` function.

- Computing with numbers vs. logical values:

Assume the following vector definitions:

```
even <- c(0, 2, 4, 6, 8)
odd <- c(1, 3, 5, 7, 9)
nums <- sort(c(even, odd))
```

- Predict and explain the results of the following expressions:

```
sum(even)
sum(nums %in% even)
```

### 1.4.4 Scalar objects are vectors

R is often described as a vector-based language. This means that R uses vectors when other languages would use other data structures. Actually, we were cheating a bit when distinguishing between “scalars” and “vectors” above. Although this makes sense when introducing the notion of objects, R actually has no dedicated data structure for scalar objects. Instead, even individual numbers (i.e., objects of type “integer” or “double”) and names or text labels (i.e., objects of type “character”) are actually *vectors* of length 1. We can verify this by applying the `is.vector()` and `length()` functions to the scalar objects defined above:

```
a
#> [1] 100
is.vector(a)
#> [1] TRUE
length(a)
#> [1] 1
typeof(a)
#> [1] "double"
d
#> [1] "wow"
is.vector(d)
#> [1] TRUE
length(d)
#> [1] 1
typeof(d)
#> [1] "character"
e
#> [1] TRUE
is.vector(e)
#> [1] TRUE
length(e)
#> [1] 1
typeof(e)
#> [1] "logical"
```
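One consequence is that indexing and vector functions also work on single values. A minimal sketch (all values are arbitrary):

```r
identical(5, c(5))  # a single number is the same object as a 1-element vector
#> [1] TRUE
5[1]                # indexing works on a single number
#> [1] 5
length("wow")       # one character string is a vector of length 1
#> [1] 1
nchar("wow")        # its number of characters is a different property
#> [1] 3
```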

### 1.4.5 Vector creation functions

The `c()` function is used for combining existing vectors.
However, for creating vectors that contain more than just a few elements (i.e., vectors with larger `length()` values), using the `c()` function and then typing all vector elements becomes impractical.
Useful functions and shortcuts to generate continuous or regular sequences are the colon operator `:`, and the functions `seq()` and `rep()`:

- `m:n` generates a numeric sequence (in steps of \(1\) or \(-1\)) from `m` to `n`:

```
# Colon operator (with by = 1):
s1 <- 0:10
s1
#> [1] 0 1 2 3 4 5 6 7 8 9 10
s2 <- 10:0
all.equal(s1, rev(s2))
#> [1] TRUE
```

- `seq()` generates numeric sequences from an initial number `from` to a final number `to` and allows setting either the step-width `by` or the length of the sequence `length.out`:

```
# Sequences with seq():
s3 <- seq(0, 10, 1) # is short for:
s3
#> [1] 0 1 2 3 4 5 6 7 8 9 10
s4 <- seq(from = 0, to = 10, by = 1)
all.equal(s3, s4)
#> [1] TRUE
all.equal(s1, s3)
#> [1] TRUE
# Note: seq() is more flexible:
s5 <- seq(0, 10, by = 2.5) # set step size
s5
#> [1] 0.0 2.5 5.0 7.5 10.0
s6 <- seq(0, 10, length.out = 5) # set output length
all.equal(s5, s6)
#> [1] TRUE
```

- `rep()` replicates the values provided in its first argument `x` either `times` times or each element `each` times:

```
# Replicating vectors (with rep):
s7 <- rep(c(0, 1), 3) # is short for:
s7
#> [1] 0 1 0 1 0 1
s8 <- rep(x = c(0, 1), times = 3)
all.equal(s7, s8)
#> [1] TRUE
# but differs from:
s9 <- rep(x = c(0, 1), each = 3)
s9
#> [1] 0 0 0 1 1 1
```

Whereas `:` and `seq()` create numeric vectors, `rep()` can be used with other data types:

```
rep(c(TRUE, FALSE), times = 2)
#> [1] TRUE FALSE TRUE FALSE
rep(c("A", "B"), each = 2)
#> [1] "A" "A" "B" "B"
```
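Two further variations, given as a sketch rather than a complete overview: `seq()` also accepts a negative `by` for decreasing sequences, and the `times` argument of `rep()` can itself be a vector that states how often each element is to be repeated.

```r
seq(10, 0, by = -2.5)              # a decreasing sequence
#> [1] 10.0 7.5 5.0 2.5 0.0
rep(c("A", "B"), times = c(2, 3))  # "A" twice, "B" three times
#> [1] "A" "A" "B" "B" "B"
rep(c(0, 1), each = 2, times = 2)  # each is applied first, then times
#> [1] 0 0 1 1 0 0 1 1
```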

### 1.4.6 Indexing vectors

We have now learned how to inspect vectors and how to determine their type, length, names, and particular elements (see Section 1.4.1 above). But when data is primarily saved in vectors, an important question is:

- How can we access, test for, or replace individual vector elements?

In R, the tasks of accessing, testing, or replacing the elements of a vector (and other data structures) are known as *indexing*, *subsetting*, or *filtering* (Matloff, 2011, p. 45ff.).
Depending on how we specify the element(s) that we wish to access, test or modify, this process comes in two main varieties (numeric vs. logical):^{24}

- A *numeric index* simply is a number that denotes the position of an element in a sequence. Actually, we already used numeric indices above to access particular elements of vectors, for instance:

```
# Assuming vector definitions (from above):
train
#> w01 w02 w03 w04 w05 w06 w07 w08 w09 w10 w11
#> "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" "corn" "corn" "coal"
#> w12 w13 w14 w15
#> "coal" "gold" "coal" "coal"
v1
#> [1] 1 2 3 4
v2
#> [1] "A" "B" "C" "D"
# Numeric indexing for 1 element:
train[1] # get the 1st element of train
#> w01
#> "coal"
v1[2] # get the 2nd element of v1
#> [1] 2
v2[3] # get the 3rd element of v2
#> [1] "C"
```

Importantly, selecting particular vector elements by an index requires enclosing their numeric position in square brackets `[]`.
By contrast, functions always enclose their arguments in round parentheses `()`.

- A *logical index* is the result of a test that evaluates to one or more “logical” values (i.e., each value being either `TRUE` or `FALSE`).

Why do we need operations for indexing/subsetting/filtering data structures?
In R, we often store a large number of values in data structures (e.g., a vector may contain the age values of thousands of participants), but only need some of them for answering specific questions (e.g., what is the average age of all male participants?). To *select* only a subset of elements from a vector `v`, we can specify the condition or criterion for our selection in (square) brackets `v[...]`.

Supplying a numeric or logical index (in square brackets) is a very flexible and powerful way of selecting elements of a vector `v` (or of other data structures).
The following sections provide examples for the two main types of indexing/subsetting/filtering.
Again:

(A) *Numeric indexing* uses (vectors of) numbers to select (access or change) vector elements, and
(B) *logical indexing* uses (vectors of) logical values to select (access or change) vector elements.

#### A. Numeric indexing

In *numeric indexing* (aka. *numerical indexing* or *numeric subsetting*), we provide a numeric vector as an index to a vector.
Instead of only asking for a single element, we can use a numeric vector of indices to obtain a new vector that contains the elements at the specified positions:

```
train[c(1, 10, 13)]
#> w01 w10 w13
#> "coal" "corn" "gold"
train[1:4]
#> w01 w02 w03 w04
#> "coal" "coal" "coal" "coal"
train[10:13]
#> w10 w11 w12 w13
#> "corn" "coal" "coal" "gold"
```

Actually, numeric indexing also works when elements are selected repeatedly:

```
train[c(1, 1)]
#> w01 w01
#> "coal" "coal"
train[rep(10, 3)]
#> w10 w10 w10
#> "corn" "corn" "corn"
```

Using negative indices selects all elements except the negated ones:

```
train[-13]
#> w01 w02 w03 w04 w05 w06 w07 w08 w09 w10 w11
#> "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" "corn" "corn" "coal"
#> w12 w14 w15
#> "coal" "coal" "coal"
train[-1:-12]
#> w13 w14 w15
#> "gold" "coal" "coal"
```

Importantly, the index used to select elements can be the result of a function call. For instance:

```
length(v2) # number of elements in v2
#> [1] 4
v2[length(v2)] # get the last element of v2
#> [1] "D"
v2[c(1, length(v2))] # get the 1st and last elements of v2
#> [1] "A" "D"
```

Sometimes, we either know that or wonder whether a particular vector contains particular elements.
The `which()` function takes a logical test and returns the positions for which this test is `TRUE` as a numeric vector:

```
which(v2 == "B")
#> [1] 2
which(v2 != "C")
#> [1] 1 2 4
which(train == "gold")
#> w13
#> 13
which(train != "coal")
#> w06 w07 w08 w09 w10 w13
#> 6 7 8 9 10 13
```

If we wanted to *obtain* the corresponding elements, we could use these numeric vectors to index or subset the same vectors:

```
v2[which(v2 == "B")]
#> [1] "B"
v2[which(v2 != "C")]
#> [1] "A" "B" "D"
train[which(train == "gold")]
#> w13
#> "gold"
train[which(train != "coal")]
#> w06 w07 w08 w09 w10 w13
#> "corn" "corn" "corn" "corn" "corn" "gold"
```

These last examples first used `which()` to obtain numeric indices of a vector (`v2` or `train`) and then applied these indices to the same vector.
This works, but seems a bit complicated. A more direct way of obtaining the same elements is provided by logical indexing.

#### B. Logical indexing

In *logical indexing* (or *logical subsetting*), we select elements of a vector `v` by providing a test (i.e., a vector of “logical” objects that are either `TRUE` or `FALSE`) in square brackets.
Typically, the length of the logical index corresponds to the length of the vector `v`.

For example, the following statement selects the 1st and 2nd elements of `v1`:

```
v1[c(TRUE, TRUE, FALSE, FALSE)]
#> [1] 1 2
```

Numeric indexing and logical indexing can easily be combined.
As we have seen for numeric indices, we can use functions to create the logical index values.
We often specify some condition for some object `x` and use the resulting vector of logical values as an index to select elements of `x`.
Thus, to select specific elements of a vector `v`, we define the selection criterion as a logical test (i.e., an expression of `v` that returns a logical vector of the same length as `v`) and use the result (i.e., a vector of logical values) to index vector `v`. This is called “subsetting” or “filtering” (Matloff, 2011) and only returns the elements of `v` for which the criterion is `TRUE`.

For instance, detecting and getting all elements of `v1` that are numbers below 3 could use the following criterion and indexing step:

```
v1 < 3 # criterion => a logical vector of length(v1)
#> [1] TRUE TRUE FALSE FALSE
v1[v1 < 3] # logical indexing: apply criterion to v1
#> [1] 1 2
```

Logical indexing allows us to repeat the selections from above (without the detour of numerical indexing by `which()`):

```
v2[v2 == "B"]
#> [1] "B"
v2[v2 != "C"]
#> [1] "A" "B" "D"
train[train == "gold"]
#> w13
#> "gold"
train[train != "coal"]
#> w06 w07 w08 w09 w10 w13
#> "corn" "corn" "corn" "corn" "corn" "gold"
```

Note that each of these selections uses a vector twice: Once to conduct a test that yields logical indices (i.e., a vector of truth values that has the same length as the original vector) and a second time to select the elements for which the test is `TRUE`.
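Several logical tests can also be combined with `&` (and) or `|` (or) before being used as an index. A minimal sketch with a hypothetical vector `age`:

```r
age <- c(23, 35, 47, 18, 62, 29)  # hypothetical data

age[age >= 25 & age < 50]    # both conditions must hold
#> [1] 35 47 29
age[age < 20 | age > 60]     # at least one condition must hold
#> [1] 18 62
which(age >= 25 & age < 50)  # the corresponding numeric indices
#> [1] 2 3 6
```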

#### Indexing by the `subset()` function

An alternative to using numeric or logical indexing is provided by the `subset()` function, which takes some data object `x` (a vector, data frame, or list) and a logical expression as its arguments, and returns the elements for which the logical expression is `TRUE`:

```
subset(x = train, train == "corn")
#> w06 w07 w08 w09 w10
#> "corn" "corn" "corn" "corn" "corn"
subset(x = v1, v1 > 2)
#> [1] 3 4
```

Note that indexing by brackets `[]` and by the `subset()` function differ in the way in which missing (or `NA`) values are handled:

```
(v <- c(1:2, NA, 4:6, NA, 8:10))
#> [1] 1 2 NA 4 5 6 NA 8 9 10
v[v > 5] # missing/NA values are preserved
#> [1] NA 6 NA 8 9 10
subset(v, v > 5) # missing/NA values are lost
#> [1] 6 8 9 10
```

As we typically do not want to lose missing values, indexing by brackets is safer than using `subset()`.
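If the `NA` values are deliberately not wanted in the result, bracket indexing can exclude them explicitly. The following sketch (re-defining `v` as above) shows two common options, using `which()` or `is.na()`:

```r
v <- c(1:2, NA, 4:6, NA, 8:10)

v > 5                 # the test itself yields NA for missing values
#> [1] FALSE FALSE NA FALSE FALSE TRUE NA TRUE TRUE TRUE
v[which(v > 5)]       # which() only returns the positions of TRUE values
#> [1] 6 8 9 10
v[!is.na(v) & v > 5]  # exclude missing values explicitly
#> [1] 6 8 9 10
```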

Indexing a vector `v` selected specific elements of `v` by imposing conditions on them.
The results were new vectors that were either of the same type or logical vectors, and either contained the same number or fewer elements than the original vector.
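Indexing can also appear on the left side of an assignment (`<-`), in which case the selected elements are replaced rather than returned. A minimal sketch with a hypothetical vector `temp`, in which the value -99 is assumed to code a missing measurement:

```r
temp <- c(21, 19, -99, 23, -99)  # hypothetical data

temp[2] <- 20            # numeric index: replace the 2nd element
temp[temp == -99] <- NA  # logical index: replace all elements that pass a test
temp
#> [1] 21 20 NA 23 NA
```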

But when creating new vectors, we often want to do more than just selecting elements. A nifty way of creating new vectors that contain elements from existing vectors is to randomly draw objects from a population (see Section 1.6.4 on Random sampling).

#### Practice

As creating and indexing/subsetting vectors is an important topic in R, we should practice these tasks more extensively:

- Indexing numeric vectors:

- Evaluate and explain the following code (in terms of numeric or logical indexing):

```
v <- 1:10 # create a numeric vector
v

crit <- v > 5 # Condition: Which values of v are larger than 5?
crit

v[crit]  # using crit to select values of v (for which crit is TRUE)
v[v > 5] # all in 1 step
```

- Can you use numeric indexing to get the same elements as `v[v > 5]`?

```
v[6:10]
#> [1] 6 7 8 9 10
v[6:length(v)]
#> [1] 6 7 8 9 10
```

- Evaluate and explain the results of the following commands:

```
x <- 0:8
x[c(2, 4, 6)]
x[x < 3]
x[x %% 2 == 0]
```

- Indexing character vectors:

- Predict, evaluate, and explain the results of the following commands:

```
rhyme <- c("ene", "mene", "miste", "es", "rappelt", "in", "der", "kiste")
length(rhyme)
rhyme[c(1:3, length(rhyme))]
rhyme[nchar(rhyme) <= 3]
rhyme[substr(rhyme, start = 1, stop = 1) == "m"]
```

Assume the following definition of `spices`:

`spices <- c("salt", "pepper", "cinnamon", "lemongrass", "mint", "mustard", "wasabi")`

and then use a combination of functions and numeric or logical indexing for obtaining:

- the first three spices
- all spices, except the first two
- the last spice
- all spices in `spices` with exactly four letters
- all spices with eight or more letters
- all spices with the letter “i” at their 2nd position

#### Solution

```
# numeric indexing:
spices[1:3] # first three elements of spices
spices[c(-1, -2)] # all spices except first two
spices[c(length(spices))] # last element of spices

# logical indexing:
spices[nchar(spices) == 4] # spices with exactly 4 letters
spices[nchar(spices) >= 8] # spices with at least 8 letters
spices[substr(spices, 2, 2) == "i"] # spices with an "i" at 2nd position
```

**Note:** More sophisticated ways of searching text objects are covered in Appendix E on using regular expressions.

- Selecting `LETTERS`:

The R object `LETTERS` stores the 26 upper-case letters of the Roman alphabet (and `letters` the same letters in lowercase). Thus, the vector `c("A", "B", "C")` can be obtained by numerical subsetting to get the first three letters of `LETTERS`: `LETTERS[1:3]`.

Solve the following tasks by using functions and numeric or logical indexing:

- How many letters are in `LETTERS`?
- Create a vector of all `LETTERS` up to N
- Create a vector of the final six `LETTERS`
- Create a vector of all `LETTERS` not contained in the `LETTERS` up to N
- What are the positions of the vowels in `LETTERS`?
- Create a vector of all non-vowels in `LETTERS`

**Hint:** The letters “A,” “E,” “I,” “O,” and “U,” are vowels.

#### Solution

```
# ?LETTERS
length(LETTERS)
up_to_N <- LETTERS[1:which(LETTERS == "N")]
final_6 <- LETTERS[(length(LETTERS) - 5):length(LETTERS)]
LETTERS[!(LETTERS %in% up_to_N)]

vowels <- c("A", "E", "I", "O", "U") # vector of vowels
LETTERS[!(LETTERS %in% vowels)] # vector of non-vowels

vowix1 <- which(LETTERS %in% vowels) # position of vowels (as numbers)
vowix2 <- match(vowels, LETTERS) # ... (alternative solution)
LETTERS[-vowix1] # vector of non-vowels
```

- Creating and modifying vectors:

Use vector creation functions to define

- a character vector that repeats the letters “A,” “B,” and “C” five times
- a character vector that contains each of the letters “A,” “B,” and “C” five times
- a numeric vector that divides the range from 0 to 100 into 6 intervals of equal width
- a numeric vector that contains all integers from 1 to 1000 that are divisible by both 7 and 13
- a character vector that contains all `letters` that are *not* “a,” “b,” or “c”

#### Solution

All these objects can be created in multiple ways. Possible solutions include:

```
rep(LETTERS[1:3], times = 5)
rep(LETTERS[1:3], each = 5)
seq(0, 100, length.out = 7)
v <- rep(1:1000)
v[(v %% 7 == 0) & (v %% 13 == 0)]

letters[-(1:3)]
# or:
abc <- c("a", "b", "c")
letters[!(letters %in% abc)]
```

- Re-creating `train` for training purposes:

We examined a `train` vector above without showing how it was created:

- Use your knowledge of creating vectors to create a vector `train_2` that is as similar to `train` (from above) as possible.

**Hint:** Note that `train` is a *named* character vector.
Re-creating its elements is straightforward, but re-creating its names may be trickier.

#### Solution

- Creating the elements of the `train_2` vector is easy, and there are many ways of doing so. If `train` was available in our environment, we could simply copy it. If not, we can still re-create it by defining its elements in several ways:

```
# Solution 0 (if train was available):
train_2 <- train

# Solution 1: Using individual elements:
train_2 <- c("coal", "coal", "coal", "coal", "coal",
             "corn", "corn", "corn", "corn", "corn",
             "coal", "coal", "gold", "coal", "coal")

# Solution 2: Using functions and replacing 1 element:
train_2 <- c(rep("coal", 5), rep("corn", 5), rep("coal", 5))
train_2[13] <- "gold"

# Solution 3: Using function and replacing 5+1 elements:
train_2 <- rep("coal", 15)
train_2[6:10] <- "corn"
train_2[13] <- "gold"
# Print result:
train_2
```

- Naming the `train_2` vector is more challenging, as it requires another character vector (of the same length as `train_2`) that is assigned to `names(train_2)`. Here are some potential solutions:

```
# Solution 0 (if train was available):
names(train_2) <- names(train)
# Solution 1: Using individual elements:
names(train_2) <- c("w01", "w02", "w03", "w04", "w05",
"w06", "w07", "w08", "w09", "w10",
"w11", "w12", "w13", "w14", "w15")
# Solution 2: Using functions:
names(train_2) <- paste0("w", ds4psy::num_as_char(x = 1:length(train_2), n_pre_dec = 2, n_dec = 0))
# Print result:
train_2
# Verify equality (if train was available):
all.equal(train, train_2)
```
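If the **ds4psy** package is not installed, the same names can be built with base R. The following sketch uses `sprintf()` with the format `"w%02d"` (i.e., the letter "w" followed by a number that is zero-padded to two digits). It first re-creates the elements of `train_2` so that it runs on its own:

```r
# Re-create the elements (as in Solution 3 above):
train_2 <- rep("coal", 15)
train_2[6:10] <- "corn"
train_2[13] <- "gold"

# Base R alternative for the names ("w01", "w02", ..., "w15"):
names(train_2) <- sprintf("w%02d", seq_along(train_2))  # seq_along(train_2) is 1:15

names(train_2)[c(1, 10, 15)]
#> [1] "w01" "w10" "w15"
train_2["w13"]
#> w13
#> "gold"
```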

- Given the `train_2` vector (from 5.), predict, evaluate, and explain the result of the following expressions:

```
train_2[8:14]
train_2[c(11, 13, 7)]
train_2[99]
train_2[train_2 == "corn"]
train_2[train_2 != "corn"]

"gold" %in% train_2
match("corn", train_2)
which(train_2 == "coal")
which(train_2 != "coal")
which(train_2 == "gold")
typeof(train_2)
typeof(names(train_2))
sum(nchar(train_2))
length(train_2) > nchar(train_2)
sum(length(train_2) > nchar(train_2))
```

- Smart Alex asks: You started this chapter by quoting John Chambers’s statement “Everything that happens is a function call.” (in Section 1.2). But when assigning objects (as in `x <- 1:4`) or subsetting an object with brackets (`x[3]`) it seems that things can also happen in R without calling functions. Discuss.

**Hint:** Check the documentations of `?"<-"` and `?"["`.

#### Solution

The statement is true, but ubiquitous R functions often appear in abbreviated forms.
Just like evaluating `x` was an abbreviation for `print(x)`, and `x + y` (for two numbers) returned the same result as `sum(x, y)`, both `<-` and `[` actually *are* R functions.
To render their functional nature more obvious, we can re-write them in the standard `name(argument)` form, but need to enclose their non-standard names in quotes:

```
# Assignment:
(x <- 1:4)
("<-"(x, 1:4))

# Subsetting:
x[3]
"["(x, 3) # reading
# Note:
"[<-"(x, 3, 99) # writing
```
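The same holds for arithmetic operators. As a small additional sketch, `+` can also be called in the `name(argument)` form when its name is quoted, and the result of the replacement function `[<-` needs to be assigned to take effect:

```r
x <- 1:4

"+"(x, 10)             # same as x + 10
#> [1] 11 12 13 14
"["(x, 2:3)            # same as x[2:3]
#> [1] 2 3
x <- "[<-"(x, 3, 99L)  # same as x[3] <- 99L
x
#> [1] 1 2 99 4
```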

#### From linear vectors to rectangular tables

Having achieved some basic familiarity with *vectors*, we can now extend our knowledge about data structures by adding a second dimension, which will then yield *rectangular tables* of data. A good question to recall at this point is:

- If we combine the data from multiple vectors into a rectangular table, do such tables in R consist of *rows* of vectors, *columns* of vectors, or *both*?

(The answer was mentioned in discussing Table 1.1 and the practice task of Section 1.2.1).

### References

Matloff, N. (2011). *The art of R programming: A tour of statistical software design*. San Francisco, CA: No Starch Press.

Wickham, H. (2014a). *Advanced R* (1st ed.). Retrieved from http://adv-r.had.co.nz/

^{23} Strictly speaking, the precise term for such sequences is *atomic vectors*. The prominence of vectors is partly due to the fact that they are the building blocks of other data structures. For instance, *lists* are also vectors, but can store elements of different types. (See Wickham, 2014a for details.)

^{24} Elements of a named vector `v` can also be accessed by named indexing (e.g., `v["name"]`), but as many vectors do not have names, this is more relevant for accessing parts of tables or lists (e.g., by `l$name`).