1.4 Vectors

Vectors are by far the most common and the most important data structure in R. For our purposes, a vector stores data in the shape of a linear sequence of elements that are all of the same type.²³ Any vector x has three common properties:

its length (returned by length(x));
its type of elements (returned by typeof(x) or mode(x));
optional attributes or meta-data (returned by attributes(x)).

The most common attributes of a vector x are the names of its elements (which can be set or returned by names(x)), or — for vectors that are shaped as matrices or arrays — its shape (returned by dim(x)).
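
As a minimal sketch of these properties, the following code applies the functions mentioned above to small example vectors. The objects x, y, and m and their values are hypothetical and only serve as illustrations:

```r
# A small named vector (hypothetical example values):
x <- c(a = 1.5, b = 2.5, c = 3.5)

length(x)      # number of elements: 3
typeof(x)      # type of elements: "double"
mode(x)        # "numeric"
names(x)       # "a" "b" "c"
attributes(x)  # a list with one component: $names

# A vector without names or other attributes:
y <- c(TRUE, FALSE)
names(y)       # NULL
attributes(y)  # NULL

# Adding a dim attribute shapes a vector into a matrix:
m <- 1:6
dim(m) <- c(2, 3)  # 2 rows, 3 columns
dim(m)             # 2 3
```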

1.4.1 Basics

In contrast to most other sections in this book, this section examines a vector without defining it. Hence, please read and try to understand the following paragraphs without worrying about the definition of the train vector.

We can think of a vector as a freight train that contains a sequence of waggons: Each waggon can contain some cargo (data elements), but all elements must be of the same data type (an entire train of all people or all cargo waggons, or data elements that are all of logical, numeric, or character type). As the waggons are indistinguishable from the outside, the contents of any particular waggon can be determined by addressing the waggon by its name (an optional attribute) or its number (or position) in the sequence.

Exploring a vector

For instance, suppose there exists an R vector named train. Although its definition is not shown here, the following functions provide some basic information about it:

# Print the vector:
train  # (see its names and elements)
#>    w01    w02    w03    w04    w05    w06    w07    w08    w09    w10    w11 
#> "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" "corn" "corn" "coal" 
#>    w12    w13    w14    w15 
#> "coal" "gold" "coal" "coal"

# Basic information about shape and type:
length(train)  # How many waggons/elements does train consist of?
#> [1] 15

typeof(train)  # Of which type are these waggons/elements?
#> [1] "character"

is.character(train)
#> [1] TRUE
is.logical(train)
#> [1] FALSE
is.numeric(train)
#> [1] FALSE

# Data structure:
is.vector(train)
#> [1] TRUE
is.list(train)
#> [1] FALSE

# Details: 
names(train)   # What are the names of the waggons/elements?
#>  [1] "w01" "w02" "w03" "w04" "w05" "w06" "w07" "w08" "w09" "w10" "w11" "w12"
#> [13] "w13" "w14" "w15"
str(train)     # What is the structure of train vector?
#>  Named chr [1:15] "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" ...
#>  - attr(*, "names")= chr [1:15] "w01" "w02" "w03" "w04" ...

# Accessing the contents of particular waggons/elements:
train[1]       # by number: What's in 1st waggon/element?
#>    w01 
#> "coal"
train["w01"]   # by name:   What's in the waggon named "w01"?
#>    w01 
#> "coal"

To obtain information about the train vector, we applied some generic functions to it:

Evaluating train prints the contents of the vector in our Console. Actually, this was merely a shorthand for print(train). If the vector is short enough to be printed in its entirety, we can obtain comprehensive information about it (i.e., its shape and type, and the values and names of its elements). But even if the vector is too long to be printed, we can still learn its shape and type, whether its elements are named, and what its first elements are.
As vectors are one-dimensional sequences, the length() function determines their shape.
As vectors can only have one type, the typeof() function explicates their type.
Vectors can have optional names. The names() function determines the names of a vector v and returns NULL if its elements are not named.

Accessing elements

The contents of particular vector elements can be accessed by their number or their name (if v contains names). To access vector elements, R uses square brackets []:

# Accessing the contents of particular waggons/elements:
train[1]       # by number: What's in 1st waggon/element?
#>    w01 
#> "coal"
train["w01"]   # by name:   What's in the waggon named "w01"?
#>    w01 
#> "coal"

The expression v[n] returns the n-th element of a vector v, and the expression v["abc"] returns the element named “abc” of a vector v, provided that v contains an element with this name.
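
To see what happens when the proviso is not met, here is a short sketch. It uses a hypothetical vector cargo as a stand-in for train and asks for a name and a position that do not exist:

```r
# A short named vector (hypothetical stand-in for train):
cargo <- c(w01 = "coal", w02 = "corn", w03 = "gold")

cargo[2]        # by position: the element "corn" (named w02)
cargo["w03"]    # by name:     the element "gold" (named w03)

# Asking for a name or position that does not exist returns NA:
cargo["w99"]
cargo[5]

# Checking whether a name exists before using it:
"w99" %in% names(cargo)  # FALSE
```

In both cases R returns a missing value (NA) rather than an error, so it can be worth checking names(v) or length(v) first.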

We will learn more about accessing particular vector elements when we learn more about indexing below (in Section 1.4.6).

Practice

The following objects v1 and v2 are both vectors (i.e., an ordered sequence of elements) that have a length of 4 elements, but are of different types:

#> [1] 1 2 3 4
#> [1] "A" "B" "C" "D"
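
The definitions of v1 and v2 are not shown here. To follow along, one possible pair of definitions that reproduces the printed output is sketched below. This is an assumption: the original objects may have been created differently (e.g., v1 could also be of type double).

```r
v1 <- 1:4                    # assumed definition (yields type integer)
v2 <- c("A", "B", "C", "D")  # assumed definition (type character)

v1
v2
```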

Use basic vector functions to answer the following questions:

Of what type are the elements of v1 and v2?
What is their i-th (e.g., 2nd) element?
What is their length?
What are their names?

Solution

Applying basic vector functions:

# Of which type are the vectors?
typeof(v1)
typeof(v2)
# What is the 2nd element?
v1[2]        
v2[2]
# What is the length of a vector?
length(v1)
length(v2)
# Vector names?
names(v1)
names(v2)

Additional functions designed for vectors allow addressing more advanced tasks like the following:

What is the reverse sequence of v2?
Do v1 or v2 contain an element "A"?
Where does an element “A” (first) occur in some vector v2?
Which element of v2 has a particular value (e.g., "B" or "Z")?

Hint: Some of these questions ask for a vector of elements, others are answered by a logical value (i.e., either TRUE or FALSE), or a numeric vector (of element positions). To solve some tasks, we can apply slightly more advanced vector functions:

The function rev() reverses a vector:

rev(v2)  #  reverse the elements of v2 
#> [1] "D" "C" "B" "A"

The function %in% (used as a binary operator between its two arguments) checks whether the left (first) argument is an element of the right (second) one:

"A" %in% v1  # does v1 contain an element "A"? 
#> [1] FALSE
"A" %in% v2  # does v2 contain an element "A"? 
#> [1] TRUE

# Compare: 
v2 == "C"    # is (element of) v2 equal to "C"?
#> [1] FALSE FALSE  TRUE FALSE
v2 %in% "C"  # is (element of) v2 "C"?
#> [1] FALSE FALSE  TRUE FALSE

v2 != "D"
#> [1]  TRUE  TRUE  TRUE FALSE
!(v2 %in% "D")
#> [1]  TRUE  TRUE  TRUE FALSE

# Note some details:
 2  %in% v1   # does v1 contain an element 2?
#> [1] TRUE
"2" %in% v1   # does v1 contain an element "2"?  (!)
#> [1] TRUE
" 2" %in% v1  # does v1 contain an element " 2"?
#> [1] FALSE

# ?`%in%`  # provides additional documentation.

The %in% operator is based on the match() function that returns the positions of (first) matches of its first argument in its second argument:

# Get position of (first) match:
match(x = c("A", "C", "Z"), 
      table = c("A", "B", "A", "Z")  # may be a vector
      )
#> [1]  1 NA  4

# Using v1 and v2:
match("A", v1, nomatch = 0)  # returns 0 if there is no match
#> [1] 0
match("B", v2)
#> [1] 2
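
The relation between %in% and match() can be made explicit. According to the documentation (see ?match), x %in% table corresponds to checking whether match() found a position. The following sketch uses two hypothetical vectors x and tab to compare both expressions:

```r
x   <- c("A", "C", "Z")
tab <- c("A", "B", "A", "Z")

match(x, tab)                   # positions of first matches: 1 NA 4
match(x, tab, nomatch = 0) > 0  # TRUE FALSE TRUE
x %in% tab                      # TRUE FALSE TRUE

# Both expressions yield the same logical vector:
identical(x %in% tab, match(x, tab, nomatch = 0) > 0)
```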

The which() function returns the position(s) (i.e., numbers) for which a test on a vector evaluates to TRUE:

# Position of TRUE test values:
which(v1 > 2)     # position of elements for which v1 > 2?
#> [1] 3 4
which(v2 == "C")  # which element of v2 is "C"? 
#> [1] 3
which(v2 != "A")  # which element of v2 is NOT "A"?
#> [1] 2 3 4
which(v2 == "X")  # which element of v2 is "X"? 
#> integer(0)

Thus, which() allows us to probe a vector with some test and returns a (numeric) vector of the positions for which a test evaluates to TRUE.

We will learn more about these and many other vector-based functions later. For now, just try to understand these basic functions and remember that there are many ways to probe and manipulate vectors.

1.4.2 Creating vectors

To create a new vector, we can combine several objects of the same data type with the c() function (think chain, combine, or concatenate), separating vector elements by commas:

# Creating vectors: 
c(2, 4, 6)
#> [1] 2 4 6

# Combining scalar objects (defined above): 
c(a, b)  
#> [1] 100 500

v <- c(a, b, c)
v
#> [1] 100 500   3

v <- c(c, c, c)  # vectors can have repeated elements
v
#> [1] 3 3 3

The c() function can also be used to combine scalars and vectors:

# Combining scalar objects and vectors (defined above): 
v <- c(a, b, v)  # Note that vectors can contain vectors, ...
v
#> [1] 100 500   3   3   3

v <- c(v, v)     # but the result is only 1 vector, not 2: 
v
#>  [1] 100 500   3   3   3 100 500   3   3   3

Note, however, that the result is still a single flat vector, rather than a nested vector containing other vectors (i.e., c() flattens vectors).

The c() function can also be used to combine objects of type character or logical values into vectors:

# Combining other data types:
u <- c(TRUE, FALSE)
u
#> [1]  TRUE FALSE

w <- c("one", "two", "three")
w <- c(w, "four", "5", "many")
w
#> [1] "one"   "two"   "three" "four"  "5"     "many"

When combining vectors of different data types, they are coerced into a single data type. The resulting vector is either of type numeric (when mixing logical values and numeric objects) or of type character (when mixing anything with characters):

# Combining different data types:
x <- c(TRUE, 2L, 3.0)  # logical, integer, double
x
#> [1] 1 2 3
typeof(x)
#> [1] "double"

y <- c(TRUE, "two")  # logical, character
y
#> [1] "TRUE" "two"
typeof(y)
#> [1] "character"

z <- c(TRUE, 2, "three")  # logical, numeric, character
z
#> [1] "TRUE"  "2"     "three"
typeof(z)
#> [1] "character"
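
The examples above suggest an order in which types are coerced: logical values become integers, integers become doubles, and anything becomes character. The following sketch checks each step with typeof() and adds the as.*() functions for explicit coercion (the example values are arbitrary):

```r
typeof(c(TRUE, 2L))       # logical + integer -> "integer"
typeof(c(TRUE, 2L, 3.0))  # ... + double      -> "double"
typeof(c(TRUE, 2L, "3"))  # ... + character   -> "character"

# Explicit coercion uses the as.*() functions:
as.numeric(c(TRUE, FALSE))  # 1 0
as.character(c(1, 2))       # "1" "2"

# Text that is not a number becomes NA (normally with a warning):
suppressWarnings(as.numeric(c("1", "two")))  # 1 NA
```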

In R, we are routinely dealing with vectors. These vectors are either created by us (by using functions that create or return vectors) or read in as data (e.g., from a file). As we will see below (in Section 1.5), tables of data are really collections of column vectors. Hence, anything we learn about vectors will be useful when working with data tables later.

1.4.3 Applying functions to vectors

Most functions in R can be applied to vectors of different lengths. When applying a function to a vector, we typically either obtain some information about the vector (in the form of a scalar object, or a summary table) or a transformed version of the vector (i.e., a different vector of the same length, and often of the same data type).

For the following examples, try to describe both the input and the output of each expression in terms of their data type and shape:

# (a) Information about a vector (as a scalar):
length(v)
#> [1] 10
typeof(v)
#> [1] "double"
sum(v)
#> [1] 1218

length(w)
#> [1] 6
typeof(w)
#> [1] "character"
# sum(w)  # would yield an error

# (b) Changing a vector by applying a function: 
n <- c(2, 4, 6)

n + 1  # a numeric vector: 
#> [1] 3 5 7

n > 3  # a logical vector:
#> [1] FALSE  TRUE  TRUE

values <- c(2>1, 1>2, 2==2)
values   # vector of truth values
#> [1]  TRUE FALSE  TRUE
!values  # reverses truth values
#> [1] FALSE  TRUE FALSE

When combining data of different types (e.g., numbers and characters) in a vector, R coerces them into a common type (e.g., truth values into numbers, or all into objects of type “character”, as this is always possible):

x <- c(1, "two", 3)
x
#> [1] "1"   "two" "3"
typeof(x)  # converted 1 to "1"
#> [1] "character"


y <- c(TRUE, "FALSE")
y
#> [1] "TRUE"  "FALSE"
typeof(y)  # converted TRUE to "TRUE"
#> [1] "character"

z <- c(TRUE, 2)
z
#> [1] 1 2
typeof(z)  # converted TRUE to 1.0
#> [1] "double"

A common operation on vectors is sorting, which is achieved by the sort() function. Note that sort() returns a sorted copy and leaves the original vector unchanged unless the result is assigned. Its argument decreasing is set to FALSE by default, but can be set to TRUE if sorting in decreasing order is desired:

x <- c(4, 6, 2)

sort(x)
#> [1] 2 4 6
sort(x, decreasing = TRUE)
#> [1] 6 4 2
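
A detail that is easy to overlook is that sort() does not alter x itself. The following sketch illustrates this (x_sorted is a hypothetical object name):

```r
x <- c(4, 6, 2)

sort(x)  # returns a sorted copy: 2 4 6
x        # x itself is unchanged: 4 6 2

x_sorted <- sort(x)  # assign the result to keep it
x_sorted             # 2 4 6

# Reversing an increasing sort equals a decreasing sort:
identical(rev(sort(x)), sort(x, decreasing = TRUE))  # TRUE
```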

Calculating with logical values

When combining truth values and numeric objects in a vector, a numeric vector is returned. Any value of TRUE is interpreted as the number 1 and any value of FALSE is interpreted as the number 0:

TRUE + FALSE
#> [1] 1
TRUE - FALSE + TRUE
#> [1] 2
3 * TRUE - 11 * FALSE/7
#> [1] 3

The same interpretation of truth values is made when applying arithmetic functions to truth values:

sum(c(TRUE, FALSE, TRUE))
#> [1] 2
mean(c(TRUE, FALSE, FALSE))
#> [1] 0.3333333

Calculating with logical values may seem a bit strange at first, but provides a useful bridge between logical and numeric data types.
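
A typical use of this bridge is counting how often a condition holds. In the following sketch, the vector ages and its values are hypothetical:

```r
ages <- c(21, 35, 17, 42, 16, 58)  # hypothetical values

ages >= 18        # logical vector: TRUE TRUE FALSE TRUE FALSE TRUE
sum(ages >= 18)   # number of TRUE values: 4
mean(ages >= 18)  # proportion of TRUE values: 4/6
any(ages < 18)    # is at least one value TRUE?  TRUE
all(ages < 18)    # are all values TRUE?         FALSE
```

Here, sum() counts the elements that meet the condition and mean() returns their proportion.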

Practice

Here are some practice exercises for applying functions to various types of vectors:

What happens when we combine truth values (aka. logicals) and numeric objects into a vector?

y <- c(TRUE, FALSE, 100)
y

typeof(y)
sum(y)

Answer: The truth values are coerced into numeric values: FALSE becomes 0 and TRUE becomes 1.

Probing ABBA:

Assuming a vector band <- c("A", "B", "B", "A").

Predict and explain the results of the following expressions:

band <- c("A", "B", "B", "A")  # define vector

"B" == band         # a logical vector of length 4
"B" %in% band       # a logical scalar
match("B", band)    # a numeric scalar
which("B" == band)  # a numeric vector

What happens when we use sort() on (a) a character vector, or (b) on a logical vector?

# (a) Character vector:
z <- c("Banana", "Cucumber", "Apple")
z
sort(z)
sort(z, decreasing = TRUE)

# (b) Logical vector:
z <- (c(1, 2, 3, 4, 5) %% 2 == 0)
z
sort(z)
sort(z, decreasing = TRUE)

Answer: Both vectors are sorted, just as we have seen for numeric vectors above. Vectors with character objects are sorted alphabetically. In logical vectors, FALSE values (corresponding to values of 0) are sorted before TRUE values (corresponding to values of 1).

Look up the documentation of the order() function and contrast it with the sort() function.
Computing with numbers vs. logical values:

Assume the following vector definitions:

even <- c(0, 2, 4, 6, 8)
odd  <- c(1, 3, 5, 7, 9)
nums <- sort(c(even, odd))

Predict and explain the results of the following expressions:

sum(even)
sum(nums %in% even)

1.4.4 Scalar objects are vectors

R is often described as a vector-based language. This means that R uses vectors when other languages would use other data structures. Actually, we were cheating a bit when distinguishing between “scalars” and “vectors” above. Although this makes sense when introducing the notion of objects, R actually has no dedicated data structure for scalar objects. Instead, even individual numbers (i.e., objects of type “integer” or “double”) and names or text labels (i.e., objects of type “character”) are actually vectors of length 1. We can verify this by applying the is.vector() and length() functions to the scalar objects defined above:

a
#> [1] 100
is.vector(a)
#> [1] TRUE
length(a)
#> [1] 1
typeof(a)
#> [1] "double"

d
#> [1] "wow"
is.vector(d)
#> [1] TRUE
length(d)
#> [1] 1
typeof(d)
#> [1] "character"

e
#> [1] TRUE
is.vector(e)
#> [1] TRUE
length(e)
#> [1] 1
typeof(e)
#> [1] "logical"
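
If scalars are vectors of length 1, they should also behave like vectors in other respects. A brief sketch, assuming that a has the value 100 (as printed above):

```r
a <- 100  # same value as the scalar a printed above

a[1]                # a "scalar" can be indexed like any vector: 100
identical(a, c(a))  # TRUE: combining a single element changes nothing
length(c(a, a, a))  # 3: combining scalars yields a longer vector
a[2]                # NA: there is no 2nd element
```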

1.4.5 Vector creation functions

The c() function is used for combining existing vectors. However, for creating vectors that contain more than just a few elements (i.e., vectors with larger length() values), using the c() function and then typing all vector elements becomes impractical. Useful functions and shortcuts to generate continuous or regular sequences are the colon operator :, and the functions seq() and rep():

m:n generates a numeric sequence (in steps of $1$ or $-1$) from m to n:

# Colon operator (with by = 1):
s1 <- 0:10
s1
#>  [1]  0  1  2  3  4  5  6  7  8  9 10
s2 <- 10:0
all.equal(s1, rev(s2))
#> [1] TRUE

seq() generates numeric sequences from an initial number (argument from) to a final number (argument to) and allows setting either the step width (argument by) or the length of the sequence (argument length.out):

# Sequences with seq():
s3 <- seq(0, 10, 1)  # is short for: 
s3
#>  [1]  0  1  2  3  4  5  6  7  8  9 10
s4 <- seq(from = 0, to = 10, by = 1)
all.equal(s3, s4)
#> [1] TRUE
all.equal(s1, s3)
#> [1] TRUE

# Note: seq() is more flexible:
s5 <- seq(0, 10, by = 2.5)        # set step size
s5
#> [1]  0.0  2.5  5.0  7.5 10.0
s6 <- seq(0, 10, length.out = 5)  # set output length
all.equal(s5, s6)
#> [1] TRUE

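rep() replicates the values provided in its first argument x, either by repeating the entire vector a given number of times (argument times) or by repeating each of its elements a given number of times (argument each):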

# Replicating vectors (with rep):
s7 <- rep(c(0, 1), 3)  # is short for:
s7
#> [1] 0 1 0 1 0 1
s8 <- rep(x = c(0, 1), times = 3)
all.equal(s7, s8)
#> [1] TRUE

# but differs from:
s9 <- rep(x = c(0, 1), each = 3)
s9
#> [1] 0 0 0 1 1 1

Whereas : and seq() create numeric vectors, rep() can be used with other data types:

rep(c(TRUE, FALSE), times = 2)
#> [1]  TRUE FALSE  TRUE FALSE
rep(c("A", "B"), each = 2)
#> [1] "A" "A" "B" "B"
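
Some further variants of these functions are sketched below. The functions seq_len() and seq_along() are base R relatives of the colon operator that were not introduced above. The values of n and x are arbitrary examples:

```r
# rep() with both arguments: each is applied first, then times:
rep(c("A", "B"), each = 2, times = 2)
# "A" "A" "B" "B" "A" "A" "B" "B"

# times can also be a vector with one count per element:
rep(c("A", "B"), times = c(3, 1))
# "A" "A" "A" "B"

# seq_len() and seq_along() are safer than 1:n when n may be 0:
n <- 0
1:n         # 1 0 (counts down, which is rarely intended)
seq_len(n)  # integer(0), i.e., an empty sequence

x <- c("a", "b", "c")
seq_along(x)  # 1 2 3 (the positions of x)
```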

1.4.6 Indexing vectors

We have now learned how to inspect vectors and how to determine their type, length, names, and particular elements (see Section 1.4.1 above). But when data is primarily saved in vectors, an important question is:

How can we access, test for, or replace individual vector elements?

In R, the tasks of accessing, testing, or replacing the elements of a vector (and other data structures) are known as indexing, subsetting, or filtering (Matloff, 2011, p. 45ff.). Depending on how we specify the element(s) that we wish to access, test or modify, this process comes in two main varieties (numeric vs. logical):²⁴

A numeric index simply is a number that denotes the position of an element in a sequence. Actually, we already used numeric indices above to access particular elements of vectors, for instance:

# Assuming vector definitions (from above):
train
#>    w01    w02    w03    w04    w05    w06    w07    w08    w09    w10    w11 
#> "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" "corn" "corn" "coal" 
#>    w12    w13    w14    w15 
#> "coal" "gold" "coal" "coal"
v1
#> [1] 1 2 3 4
v2
#> [1] "A" "B" "C" "D"

# Numeric indexing for 1 element:
train[1]  # get the 1st element of train
#>    w01 
#> "coal"
v1[2]     # get the 2nd element of v1
#> [1] 2
v2[3]     # get the 3rd element of v2
#> [1] "C"

Importantly, selecting particular vector elements by an index requires enclosing their numeric position in square brackets []. By contrast, functions always enclose their arguments in round parentheses ().

A logical index is the result of a test that evaluates to one or more “logical” values (i.e., each value being either TRUE or FALSE).

Why do we need operations for indexing/subsetting/filtering data structures? In R, we often store a large number of values in data structures (e.g., a vector may contain the age values of thousands of participants), but only need some of them for answering specific questions (e.g., what is the average age of all male participants?). To select only a subset of elements from a vector v we can specify the condition or criterion for our selection in (square) brackets v[...].

Supplying a numeric or logical index (in square brackets) is a very flexible and powerful way for selecting elements of a vector v (or of other data structures). The following sections provide examples for the two main types of indexing/subsetting/filtering. Again,
(A) Numeric indexing uses (vectors of) numbers to select (access or change) vector elements, and (B) logical indexing uses (vectors of) logical values to select (access or change) vector elements.

A. Numeric indexing

In numeric indexing (aka. numerical indexing or numeric subsetting), we provide a numeric vector as an index to a vector. Instead of only asking for a single element, we can use a numeric vector of indices to obtain a new vector that contains the elements at the specified positions:

train[c(1, 10, 13)]
#>    w01    w10    w13 
#> "coal" "corn" "gold"
train[1:4]
#>    w01    w02    w03    w04 
#> "coal" "coal" "coal" "coal"
train[10:13]
#>    w10    w11    w12    w13 
#> "corn" "coal" "coal" "gold"

Actually, numeric indexing also works when elements are selected repeatedly:

train[c(1, 1)]
#>    w01    w01 
#> "coal" "coal"
train[rep(10, 3)]
#>    w10    w10    w10 
#> "corn" "corn" "corn"

Using negative indices selects all elements except the negated ones:

train[-13]
#>    w01    w02    w03    w04    w05    w06    w07    w08    w09    w10    w11 
#> "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" "corn" "corn" "coal" 
#>    w12    w14    w15 
#> "coal" "coal" "coal"
train[-1:-12]
#>    w13    w14    w15 
#> "gold" "coal" "coal"
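
A few edge cases of numeric indexing are worth knowing. The following sketch uses a hypothetical vector x. The call to tryCatch() is only used to show that an error occurs without stopping the script:

```r
x <- c(10, 20, 30, 40)

x[-c(1, 2)]  # drop the first two elements: 30 40
x[0]         # an index of 0 selects nothing: numeric(0)
x[6]         # a position beyond length(x) yields NA

# Positive and negative indices cannot be mixed in one index:
result <- tryCatch(x[c(-1, 2)], error = function(e) "error")
result       # "error"
```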

Importantly, the index used to select elements can be the result of a function call. For instance,

length(v2)      # number of elements in v2
#> [1] 4
v2[length(v2)]  # get the last element of v2
#> [1] "D"
v2[c(1, length(v2))]  # get the 1st and last elements of v2
#> [1] "A" "D"

Sometimes, we either know that or wonder whether a particular vector contains particular elements. The which() function uses a logical test to determine the positions for which some test is TRUE and returns the corresponding positions as a numeric vector:

which(v2 == "B")
#> [1] 2
which(v2 != "C")
#> [1] 1 2 4
which(train == "gold")
#> w13 
#>  13
which(train != "coal")
#> w06 w07 w08 w09 w10 w13 
#>   6   7   8   9  10  13

If we wanted to obtain the corresponding elements, we could use these numeric vectors to index or subset the same vectors:

v2[which(v2 == "B")]
#> [1] "B"
v2[which(v2 != "C")]
#> [1] "A" "B" "D"
train[which(train == "gold")]
#>    w13 
#> "gold"
train[which(train != "coal")]
#>    w06    w07    w08    w09    w10    w13 
#> "corn" "corn" "corn" "corn" "corn" "gold"

These last examples first used which() to obtain numeric indices of train and then applied these indices to train. This works, but seems a bit complicated. A more direct way of obtaining the same elements is provided by logical indexing.

B. Logical indexing

In logical indexing (or logical subsetting), we select elements of a vector v by providing a test (i.e., a vector of “logical” objects that are either TRUE or FALSE) in square brackets. Typically, the length of the logical index corresponds to the length of the vector v.
For example, the following statement selects the 1st and 2nd elements of v1:

v1[c(TRUE, TRUE, FALSE, FALSE)]
#> [1] 1 2
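
The word "typically" hints at an exception: when the logical index is shorter than the vector, R recycles it until it matches the length of the vector. A short sketch with a hypothetical vector v6:

```r
v6 <- 1:6

v6[c(TRUE, FALSE)]         # index is recycled: 1 3 5
v6[c(FALSE, FALSE, TRUE)]  # every 3rd element: 3 6
v6[TRUE]                   # a single TRUE selects all elements
```

Recycling can be convenient for regular patterns, but indices that match the length of the vector are easier to read and verify.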

Numeric indexing and logical indexing can easily be combined. As we have seen for numeric indices, we can use functions to create the logical index values. We often specify some condition for some object x and use the resulting vector of logical values as an index to select elements of x. Thus, to select specific elements of a vector v, we define the selection criterion as a logical test (i.e., an expression of v that returns a logical vector of the same length as v) and use the result (i.e., a vector of logical values) to index vector v. This is called “subsetting” or “filtering” (Matloff, 2011) and only returns the elements of v for which the criterion is TRUE.

For instance, detecting and getting all elements of v1 that are numbers below 3 could use the following criterion and indexing step:

v1 < 3      # criterion => a logical vector of length(v1)
#> [1]  TRUE  TRUE FALSE FALSE
v1[v1 < 3]  # logical indexing: apply criterion to v1
#> [1] 1 2

Logical indexing allows us to repeat the selections from above (without the detour of numerical indexing by which()):

v2[v2 == "B"]
#> [1] "B"
v2[v2 != "C"]
#> [1] "A" "B" "D"
train[train == "gold"]
#>    w13 
#> "gold"
train[train != "coal"]
#>    w06    w07    w08    w09    w10    w13 
#> "corn" "corn" "corn" "corn" "corn" "gold"

Note that each of these selections uses a vector twice: Once to conduct a test that yields logical indices (i.e., a vector of truth values that has the same length as the original vector) and a second time to select the elements for which the test is TRUE.
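
Tests can also be combined into more complex criteria by using the logical operators & (AND), | (OR), and ! (NOT). The following sketch uses a hypothetical vector num:

```r
num <- 1:10

num[num > 3 & num < 8]       # AND: 4 5 6 7
num[num < 3 | num > 8]       # OR:  1 2 9 10
num[!(num %% 2 == 0)]        # NOT: the odd values 1 3 5 7 9
num[num %in% c(2, 4, 99)]    # membership: 2 4
```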

Indexing by the subset() function

An alternative to using numeric or logical indexing is provided by the subset() function, which takes some data object x (a vector, data frame, or list) and a logical expression as its arguments, and returns the elements for which the logical expression is TRUE:

subset(x = train, train == "corn")
#>    w06    w07    w08    w09    w10 
#> "corn" "corn" "corn" "corn" "corn"
subset(x = v1, v1 > 2)
#> [1] 3 4

Note that indexing by brackets [] and by the subset() function differ in the way in which missing (or NA) values are handled:

(v <- c(1:2, NA, 4:6, NA, 8:10))
#>  [1]  1  2 NA  4  5  6 NA  8  9 10

v[v > 5]          # missing/NA values are preserved
#> [1] NA  6 NA  8  9 10
subset(v, v > 5)  # missing/NA values are lost 
#> [1]  6  8  9 10

As we typically do not want to lose missing values, indexing by brackets is safer than using subset().
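
If we do want to exclude missing values when indexing by brackets, we can state this explicitly. The following sketch shows two possible ways (using is.na() and which()), based on the same vector v:

```r
v <- c(1:2, NA, 4:6, NA, 8:10)

v[v > 5]              # NA values are kept: NA 6 NA 8 9 10
v[!is.na(v) & v > 5]  # explicitly exclude NA values: 6 8 9 10
v[which(v > 5)]       # which() only returns positions of TRUE: 6 8 9 10
sum(is.na(v))         # number of missing values: 2
```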

Indexing a vector v selected specific elements of v by imposing conditions on them. The results were new vectors that were either of the same type or logical vectors, and either contained the same number or fewer elements than the original vector.
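
The question at the start of this section also mentioned replacing elements. Replacement combines indexing with assignment: the index on the left side of <- specifies which elements receive new values. A minimal sketch, using a hypothetical vector scores in which -1 is assumed to code a missing value:

```r
scores <- c(12, -1, 7, -1, 30)  # hypothetical values; -1 codes "missing"

scores[1] <- 10             # replace by position
scores[scores == -1] <- NA  # replace by logical index
scores                      # 10 NA 7 NA 30

names(scores) <- c("a", "b", "c", "d", "e")
scores["c"] <- 8            # replace by name
scores[6] <- 99             # assigning beyond the end extends the vector
length(scores)              # 6
```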

But when creating new vectors, we often want to do more than just selecting elements. A nifty way of creating new vectors that contain elements from existing vectors is to randomly draw objects from a population (see Section 1.6.4 on Random sampling).

Practice

As creating and indexing/subsetting vectors is an important topic in R, we should practice these tasks more extensively:

Indexing numeric vectors:

Evaluate and explain the following code (in terms of numeric or logical indexing):

v <- 1:10  # create a numeric vector
v

crit <- v > 5  # Condition: Which values of v are larger than 5?
crit

v[crit]   # using crit to select values of v (for which crit is TRUE)
v[v > 5]  # all in 1 step

Can you use numeric indexing to get the same elements as v[v > 5]?

v[6:10]
#> [1]  6  7  8  9 10
v[6:length(v)]
#> [1]  6  7  8  9 10

Evaluate and explain the results of the following commands:

x <- 0:8

x[c(2, 4, 6)]
x[x < 3]
x[x %% 2 == 0]

Indexing character vectors:

Predict, evaluate, and explain the results of the following commands:

rhyme <- c("ene", "mene", "miste", "es", "rappelt", "in", "der", "kiste")

length(rhyme)
rhyme[c(1:3, length(rhyme))]
rhyme[nchar(rhyme) <= 3]
rhyme[substr(rhyme, start = 1, stop = 1) == "m"]

Assume the following definition of spices:

spices <- c("salt", "pepper", "cinnamon", "lemongrass", "mint", "mustard", "wasabi")

and then use a combination of functions and numeric or logical indexing for obtaining:

the first three spices
all spices, except the first two
the last spice
all spices in spices with exactly four letters
all spices with eight or more letters
all spices with the letter “i” at their 2nd position

Solution

# numeric indexing: 
spices[1:3]                # first three elements of spices
spices[c(-1, -2)]          # all spices except first two             
spices[c(length(spices))]  # last element of spices

# logical indexing:
spices[nchar(spices) == 4]           # spices with exactly 4 letters
spices[nchar(spices) >= 8]           # spices with at least 8 letters
spices[substr(spices, 2, 2) == "i"]  # spices with an "i" at 2nd position

Note: More sophisticated ways of searching text objects are covered in Appendix E on using regular expressions.

Selecting LETTERS:

The R object LETTERS stores the 26 upper-case letters of the Roman alphabet (and the object letters stores the same letters in lowercase). Thus, the vector c("A", "B", "C") can be obtained by numerical subsetting to get the first three letters of LETTERS: LETTERS[1:3].

Solve the following tasks by using functions and numeric or logical indexing:

How many letters are in LETTERS?
Create a vector of all LETTERS up to N
Create a vector of the final six LETTERS
Create a vector of all LETTERS not contained in the LETTERS up to N
What are the positions of the vowels in LETTERS?
Create a vector of all non-vowels in LETTERS

Hint: The letters “A”, “E”, “I”, “O”, and “U”, are vowels.

Solution

# ?LETTERS
length(LETTERS)
up_to_N <- LETTERS[1:which(LETTERS == "N")]
final_6 <- LETTERS[(length(LETTERS) - 5):length(LETTERS)]
LETTERS[!(LETTERS %in% up_to_N)]

vowels <- c("A", "E", "I", "O", "U")  # vector of vowels
LETTERS[!(LETTERS %in% vowels)]       # vector of non-vowels

vowix1 <- which(LETTERS %in% vowels)  # position of vowels (as numbers)
vowix2 <- match(vowels, LETTERS)      # ... (alternative solution)
LETTERS[-vowix1]                      # vector of non-vowels

Creating and modifying vectors:

Use vector creation functions to define

a character vector that repeats the letters “A”, “B”, and “C” five times
a character vector that contains each of the letters “A”, “B”, and “C” five times
a numeric vector that divides the range from 0 to 100 into 6 intervals of equal width
a numeric vector that contains all integers from 1 to 1000 that are divisible by both 7 and 13
a character vector that contains all letters that are not “a”, “b”, or “c”

Solution

All these objects can be created in multiple ways. Possible solutions include:

rep(LETTERS[1:3], times = 5)

rep(LETTERS[1:3],  each = 5)

seq(0, 100, length.out = 7)

v <- 1:1000
v[(v %% 7 == 0) & (v %% 13 == 0)]

letters[-(1:3)]
# or: 
abc <- c("a", "b", "c") 
letters[!(letters %in% abc)]

Re-creating train for training purposes:

We examined a train vector above without showing how it was created:

Use your knowledge of creating vectors to create a vector train_2 that is as similar to train (from above) as possible.

Hint: Note that train is a named character vector. Re-creating its elements is straightforward, but re-creating its names may be trickier.

Solution

Creating the elements of the train_2 vector is easy, and there are many ways of doing so. If train was available in our environment, we could simply copy it. If not, we can still re-create it by defining its elements in several ways:

# Solution 0 (if train was available): 
train_2 <- train

# Solution 1: Using individual elements:
train_2 <- c("coal", "coal", "coal", "coal", "coal", 
             "corn", "corn", "corn", "corn", "corn",
             "coal", "coal", "gold", "coal", "coal") 

# Solution 2: Using functions and replacing 1 element:
train_2     <- c(rep("coal", 5), rep("corn", 5), rep("coal", 5))
train_2[13] <- "gold"

# Solution 3: Using a function and replacing 5+1 elements:
train_2 <- rep("coal", 15)
train_2[6:10] <- "corn"
train_2[13] <- "gold"

# Print result:
train_2

Naming the train_2 vector is more challenging, as it requires another character vector (of equal length as train_2) that is assigned to names(train_2). Here are some potential solutions:

# Solution 0 (if train was available): 
names(train_2) <- names(train)

# Solution 1: Using individual elements:
names(train_2) <- c("w01", "w02", "w03", "w04", "w05", 
                    "w06", "w07", "w08", "w09", "w10", 
                    "w11", "w12", "w13", "w14", "w15")

# Solution 2: Using functions:
names(train_2) <- paste0("w", ds4psy::num_as_char(x = 1:length(train_2), n_pre_dec = 2, n_dec = 0))

# Print result:
train_2

# Verify equality (if train was available): 
all.equal(train, train_2)
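
The names can also be created without an additional package. As a sketch of one base R alternative, sprintf() can pad numbers with leading zeros (the format "%02d" requests at least 2 digits). The first lines merely re-create the elements of train_2, so that the example is self-contained:

```r
# Re-create the 15 elements (as in Solution 3 above):
train_2 <- rep("coal", 15)
train_2[6:10] <- "corn"
train_2[13]   <- "gold"

# Zero-padded names with base R:
names(train_2) <- sprintf("w%02d", seq_along(train_2))
names(train_2)[c(1, 10, 15)]  # "w01" "w10" "w15"
```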

Given the train_2 vector (from the previous exercise), predict, evaluate, and explain the result of the following expressions:

train_2[8:14]
train_2[c(11, 13, 7)]
train_2[99]
train_2[train_2 == "corn"]
train_2[train_2 != "corn"]

"gold" %in% train_2
match("corn", train_2)

which(train_2 == "coal")
which(train_2 != "coal")
which(train_2 == "gold")

typeof(train_2)
typeof(names(train_2))

sum(nchar(train_2))
length(train_2) > nchar(train_2)
sum(length(train_2) > nchar(train_2))

Smart Alex asks: You started this chapter by quoting John Chambers’s statement “Everything that happens is a function call.” (in Section 1.2). But when assigning objects (as in x <- 1:4) or subsetting an object with brackets (x[3]) it seems that things can also happen in R without calling functions. Discuss.

Hint: Check the documentations of ?"<-" and ?"[".

Solution

The statement is true, but ubiquitous R functions often appear in abbreviated forms. Just as evaluating x is an abbreviation for print(x), and x + y is an abbreviation for "+"(x, y), both <- and [ actually are R functions. To render their functional nature more obvious, we can re-write them in the standard name(argument) form, but need to enclose their non-standard names in quotes:

# Assignment:
(x <- 1:4)
("<-"(x, 1:4))

# Subsetting: 
x[3]
"["(x, 3)  # reading

# Note:
"[<-"(x, 3, 99)  # writing

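The same functional notation works for arithmetic operators. The following sketch also shows that calling "[<-" directly returns a modified copy, whereas the original object only changes when the result is assigned (y is a hypothetical object name):

```r
x <- 1:4

"+"(x, 1)  # same as x + 1: 2 3 4 5
"["(x, 3)  # same as x[3]:  3

y <- "[<-"(x, 3, 99)  # returns a modified copy ...
y                     # 1 2 99 4
x                     # ... while x itself is unchanged: 1 2 3 4
```
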
From linear vectors to rectangular tables

Having achieved some basic familiarity with vectors, we can now extend our knowledge about data structures by adding a second dimension, which will then yield rectangular tables of data. A good question to recall at this point is:

If we combine the data from multiple vectors into a rectangular table, do such tables in R consist of rows of vectors, columns of vectors, or both?

(The answer was mentioned in discussing Table 1.1 and the practice task of Section 1.2.1).

References

Matloff, N. (2011). The art of R programming: A tour of statistical software design. San Francisco, CA: No Starch Press.

Wickham, H. (2014a). Advanced R (1st ed.). Retrieved from http://adv-r.had.co.nz/

²³ Strictly speaking, the precise term for such sequences is atomic vectors. The prominence of vectors is partly due to the fact that they are the building blocks of other data structures. For instance, lists are also vectors, but can store elements of different types. (See Wickham, 2014a for details.)
²⁴ Elements of a named vector v can also be accessed by named indexing (e.g., v["name"]), but as many vectors do not have names, this is more relevant for accessing parts of tables or lists. For tables and lists, the $ notation (e.g., df$name) is also available, but $ does not work on atomic vectors.
