1.4 Vectors
Vectors are by far the most common and the most important data structure in R.
For our purposes, a vector stores data in the shape of a linear sequence of elements that are all of the same type.[23] Any vector x has three common properties:
- its length (returned by length(x));
- its type of elements (returned by typeof(x) or mode(x));
- optional attributes or meta-data (returned by attributes(x)).
The most common attributes of a vector x are the names of its elements (which can be set or returned by names(x)), or, for vectors that are shaped as matrices or arrays, its shape (returned by dim(x)).
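As a minimal sketch of these properties, the following lines inspect a small named vector. The objects temps and m are made up for illustration and are not defined anywhere else in this chapter:

```r
# Hypothetical example vector (for illustration only):
temps <- c(mon = 18.5, tue = 21.0, wed = 19.5)

length(temps)      # number of elements
typeof(temps)      # storage type of the elements
mode(temps)        # a coarser classification of the type
attributes(temps)  # a list of all attributes (here: only the names)
names(temps)       # the names attribute on its own

# A plain vector has no shape attribute:
dim(temps)         # NULL

# Assigning a dim attribute shapes a vector as a matrix:
m <- 1:6
dim(m) <- c(2, 3)
dim(m)
is.matrix(m)
```

Note that typeof() and mode() can give different answers for the same vector: typeof() reports how the elements are stored, whereas mode() groups integer and double values under the common label "numeric".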
1.4.1 Basics
In contrast to most other sections in this book, this section examines a vector without defining it.
Hence, please read and try to understand the following paragraphs without worrying about the definition of the train vector.
We can think of a vector as a freight train that contains a sequence of waggons:
Each waggon can contain some cargo (data elements), but all elements must be of the same data type (an entire train of all people or all cargo waggons, or data elements that are all of logical, numeric, or character type).
As the waggons are indistinguishable from the outside, the contents of any particular waggon can be determined by addressing the waggon by its name (an optional attribute) or its number (or position) in the sequence.
Exploring a vector
For instance, suppose there exists an R vector named train.
Although its definition is not shown here, the following functions provide some basic information about it:
# Print the vector:
# (see its names and elements)
train
#> w01 w02 w03 w04 w05 w06 w07 w08 w09 w10 w11
#> "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" "corn" "corn" "coal"
#> w12 w13 w14 w15
#> "coal" "gold" "coal" "coal"
# Basic information about shape and type:
length(train) # How many waggons/elements does train consist of?
#> [1] 15
typeof(train) # Of which type are these waggons/elements?
#> [1] "character"
is.character(train)
#> [1] TRUE
is.logical (train)
#> [1] FALSE
is.numeric(train)
#> [1] FALSE
# Data structure:
is.vector(train)
#> [1] TRUE
is.list(train)
#> [1] FALSE
# Details:
names(train) # What are the names of the waggons/elements?
#> [1] "w01" "w02" "w03" "w04" "w05" "w06" "w07" "w08" "w09" "w10" "w11" "w12"
#> [13] "w13" "w14" "w15"
str(train) # What is the structure of train vector?
#> Named chr [1:15] "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" ...
#> - attr(*, "names")= chr [1:15] "w01" "w02" "w03" "w04" ...
# Accessing the contents of particular waggons/elements:
train[1] # by number: What's in the 1st waggon/element?
#> w01
#> "coal"
train["w01"] # by name: What's in the waggon named "w01"?
#> w01
#> "coal"
To obtain information about the train vector, we applied some generic functions to it:
- Evaluating train prints the contents of the vector in our Console. Actually, this was merely a shorthand for print(train). If the vector is short enough to be printed in its entirety, we can obtain comprehensive information about it (i.e., its shape and type, and the values and names of its elements). But even if the vector is too long to be printed, we can still learn its shape and type, whether its elements are named, and what its first elements are.
- As vectors are one-dimensional sequences, the length() function determines their shape.
- As vectors can only have one type, the typeof() function explicates their type.
- Vectors can have optional names. The names() function determines the names of a vector v and returns NULL if its elements are not named.
Accessing elements
The contents of particular vector elements can be accessed by their number or their name (if v contains names).
To access vector elements, R uses square brackets []:
# Accessing the contents of particular waggons/elements:
train[1] # by number: What's in the 1st waggon/element?
#> w01
#> "coal"
train["w01"] # by name: What's in the waggon named "w01"?
#> w01
#> "coal"
The command v[n] returns the n-th element of a vector v, and the command v["abc"] returns the element named “abc” of a vector v (if v contains an element with this name).
We will learn more about accessing particular vector elements when we learn more about indexing below (in Section 1.4.6).
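To see what happens when a requested position or name does not exist, here is a short sketch. The vector cargo is a hypothetical stand-in for train (whose definition is not shown):

```r
# Hypothetical named vector (for illustration only):
cargo <- c(w01 = "coal", w02 = "corn", w03 = "gold")

cargo[2]        # by position
cargo["w03"]    # by name

# Positions or names that do not exist yield NA rather than an error:
cargo[5]
cargo["w99"]

# unname() drops the name and keeps only the value:
unname(cargo["w03"])
```

Because a missing element is returned as NA without any warning, it can be worth checking length() or names() before relying on an index.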
Practice
The following objects v1 and v2 are both vectors (i.e., ordered sequences of elements) that have a length of 4 elements, but are of different types:
v1
#> [1] 1 2 3 4
v2
#> [1] "A" "B" "C" "D"
- Use basic vector functions to answer the following questions:
  - Of what type are the elements of v1 and v2?
  - What is their i-th (e.g., 2nd) element?
  - What is their length?
Applying basic vector functions:
# Of which type are the vectors?
typeof(v1)
#> [1] "integer"
typeof(v2)
#> [1] "character"
# What is the 2nd element?
v1[2]
#> [1] 2
v2[2]
#> [1] "B"
# What is the length of a vector?
length(v1)
#> [1] 4
length(v2)
#> [1] 4
- Additional functions designed for vectors allow addressing more advanced tasks like the following:
  - What is the reverse sequence of v2?
  - Do v1 or v2 contain some element "A"?
  - Where does an element “A” (first) occur in some vector v2?
  - Which element of v2 has a particular value (e.g., "B" or "Z")?
Note that some of these questions ask for a vector of elements, whereas others are answered by a simple binary (e.g., TRUE or FALSE) value, or a vector of numbers (positions). To solve these tasks, we can apply slightly more advanced functions to vectors:
- The function rev() reverses a vector:
rev(v2) # reverse the elements of v2
#> [1] "D" "C" "B" "A"
- The function %in% (used as a binary operator between its two arguments) checks whether the left (first) argument is an element of the right (second) one:
"A" %in% v1 # does v1 contain an element "A"?
#> [1] FALSE
"A" %in% v2 # does v2 contain an element "A"?
#> [1] TRUE
# Compare:
v2 == "C" # is (element of) v2 equal to "C"?
#> [1] FALSE FALSE TRUE FALSE
v2 %in% "C" # is (element of) v2 "C"?
#> [1] FALSE FALSE TRUE FALSE
v2 != "D"
#> [1] TRUE TRUE TRUE FALSE
!(v2 %in% "D")
#> [1] TRUE TRUE TRUE FALSE
# Note some details:
2 %in% v1 # does v1 contain an element 2?
#> [1] TRUE
"2" %in% v1 # does v1 contain an element "2"? (!)
#> [1] TRUE
" 2" %in% v1 # does v1 contain an element " 2"?
#> [1] FALSE
# ?`%in%` # provides additional documentation.
- The %in% operator is based on the match() function that returns the positions of (first) matches of its first argument in its second argument:
# Get position of (first) match:
match(x = c("A", "C", "Z"),
      table = c("A", "B", "A", "Z") # may be a vector
)
#> [1] 1 NA 4
# Using v1 and v2:
match("A", v1, nomatch = 0) # returns 0 if there is no match
#> [1] 0
match("B", v2)
#> [1] 2
- The which() function returns the position(s) (i.e., numbers) for which a test on a vector evaluates to TRUE:
# Position of TRUE test values:
which(v1 > 2) # position of elements for which v1 > 2?
#> [1] 3 4
which(v2 == "C") # which element of v2 is "C"?
#> [1] 3
which(v2 != "A") # which element of v2 is NOT "A"?
#> [1] 2 3 4
which(v2 == "X") # which element of v2 is "X"?
#> integer(0)
Thus, which() allows us to probe a vector with some test and returns a (numeric) vector of the positions for which the test is TRUE.
We will learn more about these and many other vector-based functions later. For now, just try to understand these basic functions and remember that there are many ways to probe and manipulate vectors.
1.4.2 Creating vectors
To create a new vector, we can combine several objects of the same type with the c() function (think chain, combine, or concatenate), separating vector elements by commas:
# Creating vectors:
c(2, 4, 6)
#> [1] 2 4 6
# Combining scalar objects (defined above):
c(a, b)
#> [1] 100 500
v <- c(a, b, c)
v
#> [1] 100 500 3
v <- c(c, c, c) # vectors can have repeated elements
v
#> [1] 3 3 3
The same function can be used to combine scalars and vectors:
# Combining scalar objects and vectors (defined above):
v <- c(a, b, v) # Note that vectors can contain vectors, ...
v
#> [1] 100 500 3 3 3
v <- c(v, v) # but the result is only 1 vector, not 2:
v
#> [1] 100 500 3 3 3 100 500 3 3 3
Note that the result is still a single flat vector, rather than a vector containing other vectors (i.e., c() flattens vectors).
The c() function can also be used to combine character objects or logical values into vectors:
# Combining other data types:
u <- c(TRUE, FALSE)
u
#> [1] TRUE FALSE
w <- c("one", "two", "three")
w <- c(w, "four", "5", "many")
w
#> [1] "one" "two" "three" "four" "5" "many"
When combining different data types, they are coerced into a single data type. The result is either a numeric vector (when mixing truth values and numeric objects) or a character vector (when mixing anything with characters):
# Combining different data types:
x <- c(TRUE, 2L, 3.0) # logical, integer, double
x
#> [1] 1 2 3
typeof(x)
#> [1] "double"
y <- c(TRUE, "two") # logical, character
y
#> [1] "TRUE" "two"
typeof(y)
#> [1] "character"
z <- c(TRUE, 2, "three") # logical, numeric, character
z
#> [1] "TRUE" "2" "three"
typeof(z)
#> [1] "character"
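The direction of coercion can be probed pairwise. The following sketch mixes two types at a time and also shows that explicit conversions in the opposite direction may fail for some elements. The object names t1 to t3, num, and lgl are chosen only for this illustration:

```r
# Each call mixes two types; typeof() shows the resulting common type:
t1 <- typeof(c(TRUE, 2L))    # logical + integer  => "integer"
t2 <- typeof(c(2L, 3.5))     # integer + double   => "double"
t3 <- typeof(c(3.5, "a"))    # double + character => "character"
c(t1, t2, t3)

# Explicit conversions go the other way, but yield NA where they fail
# (as.numeric() would also issue a warning here, which we suppress):
num <- suppressWarnings(as.numeric(c("1", "2.5", "three")))
num
lgl <- as.logical(c("TRUE", "no", "T"))
lgl
```

In other words, c() always moves towards the more general type (logical, then integer, then double, then character), so that no element has to be discarded.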
In R, we are almost always dealing with vectors. These vectors are either created by us (by using functions that create or return vectors) or read in as data. As we will see below (in Section 1.5), data files in the form of tables are really columns of vectors.
1.4.3 Applying functions to vectors
Most functions in R can be applied to vectors. When applying a function to a vector, we typically either obtain some information about the vector (as a scalar object) or a transformed version of the vector:
# (a) Information about a vector (as a scalar):
length(v)
#> [1] 10
typeof(v)
#> [1] "double"
sum(v)
#> [1] 1218
length(w)
#> [1] 6
typeof(w)
#> [1] "character"
# sum(w) # would yield an error
# (b) Changing a vector by applying a function:
n <- c(2, 4, 6)
n + 1 # a numeric vector:
#> [1] 3 5 7
n > 3 # a logical vector:
#> [1] FALSE TRUE TRUE
values <- c(2>1, 1>2, 2==2)
values # vector of truth values
#> [1] TRUE FALSE TRUE
!values # reverses truth values
#> [1] FALSE TRUE FALSE
When combining data of different types (e.g., numbers and characters) in a vector, R coerces them into a common type (e.g., truth values into numbers, or all into objects of type “character,” as this is always possible):
x <- c(1, "two", 3)
x
#> [1] "1" "two" "3"
typeof(x) # converted 1 to "1"
#> [1] "character"
y <- c(TRUE, "FALSE")
y
#> [1] "TRUE" "FALSE"
typeof(y) # converted TRUE to "TRUE"
#> [1] "character"
z <- c(TRUE, 2)
z
#> [1] 1 2
typeof(z) # converted TRUE to 1.0
#> [1] "double"
A common operation that changes an existing vector is sorting, which is achieved by the sort() function. Its argument decreasing is set to FALSE by default, but can be set to TRUE if sorting in decreasing order is desired:
x <- c(4, 6, 2)
sort(x)
#> [1] 2 4 6
sort(x, decreasing = TRUE)
#> [1] 6 4 2
Calculating with logical values
When combining truth values and numeric objects in a vector, a numeric vector is returned.
Any value of TRUE is interpreted as the number 1 and any value of FALSE is interpreted as the number 0:
TRUE + FALSE
#> [1] 1
TRUE - FALSE + TRUE
#> [1] 2
3 * TRUE - 11 * FALSE/7
#> [1] 3
The same interpretation of truth values is made when applying arithmetic functions to truth values:
sum(c(TRUE, FALSE, TRUE))
#> [1] 2
mean(c(TRUE, FALSE, FALSE))
#> [1] 0.3333333
Calculating with logical values may seem a bit strange at first, but provides a useful bridge between logical and numeric data types.
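A typical use of this bridge is counting how many elements pass a test, or computing the proportion that does. The following sketch uses a made-up vector ages (an assumption for illustration only):

```r
# Hypothetical data (for illustration only):
ages <- c(23, 35, 19, 42, 28, 31)

is_over_30 <- ages > 30   # a logical vector (one value per element)
is_over_30

sum(is_over_30)           # how many values are above 30?
mean(is_over_30)          # which proportion of values is above 30?
```

Here, sum() counts the TRUE values (as each counts as 1) and mean() divides this count by the number of elements.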
Practice
Here are some practice exercises for applying functions to various types of vectors:
- What happens when we combine truth values (aka. logicals) and numeric objects into a vector?
y <- c(TRUE, FALSE, 100)
y
typeof(y)
sum(y)
Answer: The truth values are coerced into numeric values: FALSE becomes 0 and TRUE becomes 1.
- Probing ABBA:
Assume a vector band <- c("A", "B", "B", "A").
  - Predict and explain the results of the following expressions:
band <- c("A", "B", "B", "A") # define vector
"B" == band # a logical vector of length 4
"B" %in% band # a logical scalar
match("B", band) # a numeric scalar
which("B" == band) # a numeric vector
- What happens when we use sort() on (a) a character vector, or (b) a logical vector?
# (a) Character vector:
z <- c("Banana", "Cucumber", "Apple")
z
sort(z)
sort(z, decreasing = TRUE)
# (b) Logical vector:
z <- (c(1, 2, 3, 4, 5) %% 2 == 0)
z
sort(z)
sort(z, decreasing = TRUE)
Answer: Both vectors are sorted, just as we have seen for numeric vectors above.
Vectors with character objects are sorted alphabetically.
In logical vectors, FALSE values (corresponding to values of 0) are sorted before TRUE values (corresponding to values of 1).
- Look up the documentation of the order() function and contrast it with the sort() function.
- Computing with numbers vs. logical values:
Assume the following vector definitions:
even <- c(0, 2, 4, 6, 8)
odd <- c(1, 3, 5, 7, 9)
nums <- sort(c(even, odd))
  - Predict and explain the results of the following expressions:
sum(even)
sum(nums %in% even)
1.4.4 Scalar objects are vectors
R is often described as a vector-based language. This means that R uses vectors when other languages would use other data structures. Actually, we were cheating a bit when distinguishing between “scalars” and “vectors” above. Although this makes sense when introducing the notion of objects, R actually has no dedicated data structure for scalar objects. Instead, even individual numbers (i.e., objects of type “integer” or “double”) and names or text labels (i.e., objects of type “character”) are actually vectors of length 1. We can verify this by applying the is.vector() and length() functions to the scalar objects defined above:
a
#> [1] 100
is.vector(a)
#> [1] TRUE
length(a)
#> [1] 1
typeof(a)
#> [1] "double"
d
#> [1] "wow"
is.vector(d)
#> [1] TRUE
length(d)
#> [1] 1
typeof(d)
#> [1] "character"
e
#> [1] TRUE
is.vector(e)
#> [1] TRUE
length(e)
#> [1] 1
typeof(e)
#> [1] "logical"
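One practical consequence is that everything that works for vectors also works for "scalars". A brief sketch, using a freshly defined object word (chosen here for illustration) rather than the objects from above:

```r
identical(5, c(5))   # a single number is the same as a vector of length 1

word <- "wow"
length(word)         # 1 (the vector has one element)
nchar(word)          # 3 (the number of characters in that element)

word[1]              # indexing works as for any other vector
word[2]              # NA, as there is no second element
```

Note the difference between length() and nchar(): the former counts the elements of a vector, the latter counts the characters within each element.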
1.4.5 Vector creation functions
The c() function is used for combining existing vectors.
However, for creating vectors that contain more than just a few elements (i.e., vectors with larger length() values), using the c() function and then typing all vector elements becomes impractical.
Useful functions and shortcuts to generate continuous or regular sequences are the colon operator :, and the functions seq() and rep():
- m:n generates a numeric sequence (in steps of 1 or -1) from m to n:
# Colon operator (with by = 1):
s1 <- 0:10
s1
#> [1] 0 1 2 3 4 5 6 7 8 9 10
s2 <- 10:0
all.equal(s1, rev(s2))
#> [1] TRUE
- seq() generates numeric sequences from an initial number from to a final number to and allows either setting the step-width by or the length of the sequence length.out:
# Sequences with seq():
s3 <- seq(0, 10, 1) # is short for:
s3
#> [1] 0 1 2 3 4 5 6 7 8 9 10
s4 <- seq(from = 0, to = 10, by = 1)
all.equal(s3, s4)
#> [1] TRUE
all.equal(s1, s3)
#> [1] TRUE
# Note: seq() is more flexible:
s5 <- seq(0, 10, by = 2.5) # set step size
s5
#> [1] 0.0 2.5 5.0 7.5 10.0
s6 <- seq(0, 10, length.out = 5) # set output length
all.equal(s5, s6)
#> [1] TRUE
- rep() replicates the values provided in its first argument x either times times or each element each times:
# Replicating vectors (with rep):
s7 <- rep(c(0, 1), 3) # is short for:
s7
#> [1] 0 1 0 1 0 1
s8 <- rep(x = c(0, 1), times = 3)
all.equal(s7, s8)
#> [1] TRUE
# but differs from:
s9 <- rep(x = c(0, 1), each = 3)
s9
#> [1] 0 0 0 1 1 1
Whereas : and seq() create numeric vectors, rep() can be used with other data types:
rep(c(TRUE, FALSE), times = 2)
#> [1] TRUE FALSE TRUE FALSE
rep(c("A", "B"), each = 2)
#> [1] "A" "A" "B" "B"
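Base R provides a few related helpers that are not covered above. The following sketch shows seq_along(), seq_len(), and two further arguments of rep(); the vector abc is made up for illustration:

```r
abc <- c("A", "B", "C")       # hypothetical example vector

seq_along(abc)                # the positions of abc: 1 2 3
seq_len(5)                    # 1 2 3 4 5
seq_len(0)                    # integer(0), whereas 1:0 yields 1 0

r1 <- rep(abc, times = c(3, 2, 1))  # a different count for each element
r1
r2 <- rep(abc, length.out = 7)      # recycle abc until 7 elements are reached
r2
```

The difference between seq_len(0) and 1:0 matters when the length of a sequence is computed from data and may turn out to be zero.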
1.4.6 Indexing vectors
We now have learned how to inspect vectors and how to determine their type, length, names, and particular elements (see Section 1.4.1 above). But when data is primarily saved in vectors, an important question is:
- How can we access, test for, or replace individual vector elements?
In R, the tasks of accessing, testing, or replacing the elements of a vector (and other data structures) are known as indexing, subsetting, or filtering (Matloff, 2011, p. 45ff.). Depending on how we specify the element(s) that we wish to access, test or modify, this process comes in two main varieties (numeric vs. logical):[24]
- A numeric index simply is a number that denotes the position of an element in a sequence. Actually, we already used numeric indices above to access particular elements of vectors, for instance:
# Assuming vector definitions (from above):
train
#> w01 w02 w03 w04 w05 w06 w07 w08 w09 w10 w11
#> "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" "corn" "corn" "coal"
#> w12 w13 w14 w15
#> "coal" "gold" "coal" "coal"
v1
#> [1] 1 2 3 4
v2
#> [1] "A" "B" "C" "D"
# Numeric indexing for 1 element:
train[1] # get the 1st element of train
#> w01
#> "coal"
v1[2] # get the 2nd element of v1
#> [1] 2
v2[3] # get the 3rd element of v2
#> [1] "C"
Importantly, selecting particular vector elements by an index requires enclosing their numeric position in square brackets [].
By contrast, functions always enclose their arguments in round parentheses ().
- A logical index is the result of a test that evaluates to one or more “logical” values (i.e., each value being either TRUE or FALSE).
Why do we need operations for indexing/subsetting/filtering data structures?
In R, we often store a large number of values in data structures (e.g., a vector may contain the age values of thousands of participants), but only need some of them for answering specific questions (e.g., what is the average age of all male participants?). To select only a subset of elements from a vector v, we can specify the condition or criterion for our selection in (square) brackets v[...].
Supplying a numeric or logical index (in square brackets) is a very flexible and powerful way of selecting elements of a vector v (or of other data structures).
The following sections provide examples for the two main types of indexing/subsetting/filtering.
Again,
(A) numeric indexing uses (vectors of) numbers to select (access or change) vector elements, and
(B) logical indexing uses (vectors of) logical values to select (access or change) vector elements.
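Most examples in this section access elements, but the same two kinds of indices can also be used to change them by placing the indexed vector on the left side of an assignment. A minimal sketch, using a made-up vector scores (an assumption for illustration only):

```r
# Hypothetical example vector (for illustration only):
scores <- c(12, 7, NA, 15, 3)

scores[2] <- 8                 # change by position (numeric index)
scores[is.na(scores)] <- 0     # change by condition (logical index)
scores[scores > 10] <- 10      # cap all values above 10
scores
```

Each assignment only affects the selected elements and leaves all other elements of scores unchanged.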
A. Numeric indexing
In numeric indexing (aka. numerical indexing or numeric subsetting), we provide a numeric vector as an index to a vector. Instead of only asking for a single element, we can use a numeric vector of indices to obtain a new vector that contains the elements at the specified positions:
train[c(1, 10, 13)]
#> w01 w10 w13
#> "coal" "corn" "gold"
train[1:4]
#> w01 w02 w03 w04
#> "coal" "coal" "coal" "coal"
train[10:13]
#> w10 w11 w12 w13
#> "corn" "coal" "coal" "gold"
Actually, numeric indexing also works when elements are selected repeatedly:
train[c(1, 1)]
#> w01 w01
#> "coal" "coal"
train[rep(10, 3)]
#> w10 w10 w10
#> "corn" "corn" "corn"
Using negative indices selects all elements except the negated ones:
train[-13]
#> w01 w02 w03 w04 w05 w06 w07 w08 w09 w10 w11
#> "coal" "coal" "coal" "coal" "coal" "corn" "corn" "corn" "corn" "corn" "coal"
#> w12 w14 w15
#> "coal" "coal" "coal"
train[-1:-12]
#> w13 w14 w15
#> "gold" "coal" "coal"
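Negative indices have two details that are easy to get wrong: the placement of the minus sign relative to the colon operator, and the fact that positive and negative indices cannot be mixed. The following sketch illustrates both with a made-up vector tens (an assumption for illustration only):

```r
tens <- c(10, 20, 30, 40, 50)   # hypothetical example vector

tens[-length(tens)]             # all but the last element
tens[-c(1, length(tens))]       # all but the first and last elements
tens[-(1:3)]                    # -(1:3) is -1 -2 -3, whereas -1:3 runs from -1 to 3

# Positive and negative indices cannot be mixed in one index.
# tryCatch() turns the resulting error into a simple label:
mixed <- tryCatch(tens[c(-1, 2)], error = function(e) "error")
mixed
```

Wrapping the numeric range in parentheses before negating it is thus a safe habit.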
Importantly, the index used to select elements can be the result of a function call. For instance,
length(v2) # number of elements in v2
#> [1] 4
v2[length(v2)] # get the last element of v2
#> [1] "D"
v2[c(1, length(v2))] # get the 1st and last elements of v2
#> [1] "A" "D"
Sometimes, we either know that or wonder whether a particular vector contains particular elements.
The which() function uses a logical test to determine the positions for which some test is TRUE and returns the corresponding positions as a numeric vector:
which(v2 == "B")
#> [1] 2
which(v2 != "C")
#> [1] 1 2 4
which(train == "gold")
#> w13
#> 13
which(train != "coal")
#> w06 w07 w08 w09 w10 w13
#> 6 7 8 9 10 13
If we wanted to obtain the corresponding elements, we could use these numeric vectors to index or subset the same vectors:
v2[which(v2 == "B")]
#> [1] "B"
v2[which(v2 != "C")]
#> [1] "A" "B" "D"
train[which(train == "gold")]
#> w13
#> "gold"
train[which(train != "coal")]
#> w06 w07 w08 w09 w10 w13
#> "corn" "corn" "corn" "corn" "corn" "gold"
These last examples first used which() to obtain numeric indices of train and then applied these indices to train.
This works, but seems a bit complicated. A more direct way of obtaining the same elements is provided by logical indexing.
B. Logical indexing
In logical indexing (or logical subsetting), we select elements of a vector v by providing a test (i.e., a vector of “logical” objects that are either TRUE or FALSE) in square brackets.
Typically, the length of the logical index corresponds to the length of the vector v.
For example, the following statement selects the 1st and 2nd elements of v1:
v1[c(TRUE, TRUE, FALSE, FALSE)]
#> [1] 1 2
Numeric indexing and logical indexing can easily be combined.
As we have seen for numeric indices, we can use functions to create the logical index values.
We often specify some condition for some object x and use the resulting vector of logical values as an index to select elements of x.
Thus, to select specific elements of a vector v, we define the selection criterion as a logical test (i.e., an expression of v that returns a logical vector of the same length as v) and use the result (i.e., a vector of logical values) to index vector v. This is called “subsetting” or “filtering” (Matloff, 2011) and only returns the elements of v for which the criterion is TRUE.
For instance, detecting and getting all elements of v1 that are numbers below 3 could use the following criterion and indexing step:
v1 < 3 # criterion => a logical vector of length(v1)
#> [1] TRUE TRUE FALSE FALSE
v1[v1 < 3] # logical indexing: apply criterion to v1
#> [1] 1 2
Logical indexing allows us to repeat the selections from above (without the detour of numerical indexing by which()):
v2[v2 == "B"]
#> [1] "B"
v2[v2 != "C"]
#> [1] "A" "B" "D"
train[train == "gold"]
#> w13
#> "gold"
train[train != "coal"]
#> w06 w07 w08 w09 w10 w13
#> "corn" "corn" "corn" "corn" "corn" "gold"
Note that each of these selections uses a vector twice: Once to conduct a test that yields logical indices (i.e., a vector of truth values that has the same length as the original vector) and a second time to select the elements for which the test is TRUE.
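As a logical index is just a vector of truth values, several tests can be combined by the logical operators & (AND), | (OR), and ! (NOT) before the result is used as an index. A brief sketch, using a made-up vector vals (an assumption for illustration only):

```r
vals <- 1:10                           # hypothetical example vector

both   <- vals[vals > 3 & vals %% 2 == 0]   # both conditions must hold (AND)
either <- vals[vals < 3 | vals > 8]         # at least one condition must hold (OR)
others <- vals[!(vals %in% c(2, 4, 6))]     # negation: all elements not in the set
both
either
others

# A logical index and which() select the same elements here:
identical(vals[vals > 7], vals[which(vals > 7)])
```

The two approaches in the last line agree as long as the vector contains no missing values.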
Indexing by the subset() function
An alternative to using numeric or logical indexing is provided by the subset() function, which takes some data object x (a vector, data frame, or list) and a logical expression as its arguments, and returns the elements for which the logical expression is TRUE:
subset(x = train, train == "corn")
#> w06 w07 w08 w09 w10
#> "corn" "corn" "corn" "corn" "corn"
subset(x = v1, v1 > 2)
#> [1] 3 4
Note that indexing by brackets [] and by the subset() function differ in the way in which missing (or NA) values are handled:
(v <- c(1:2, NA, 4:6, NA, 8:10))
#> [1] 1 2 NA 4 5 6 NA 8 9 10
v[v > 5] # missing/NA values are preserved
#> [1] NA 6 NA 8 9 10
subset(v, v > 5) # missing/NA values are lost
#> [1] 6 8 9 10
As we typically do not want to lose missing values, indexing by brackets is safer than using subset().
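If we deliberately want to drop missing values while still indexing by brackets, we can either take the detour via which() or add an explicit test for non-missing values. A short sketch, re-using the definition of v from above:

```r
v <- c(1:2, NA, 4:6, NA, 8:10)        # the vector from above

by_which <- v[which(v > 5)]           # which() skips NA, so only matches remain
by_test  <- v[!is.na(v) & v > 5]      # explicit test for non-missing values
by_which
by_test

n_missing <- sum(is.na(v[v > 5]))     # number of NA values preserved by v[v > 5]
n_missing
```

Both variants return the same elements as subset(v, v > 5), but make the removal of missing values visible in the code.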
Indexing a vector v selected specific elements of v by imposing conditions on them.
The results were new vectors that were either of the same type or logical vectors, and either contained the same number or fewer elements than the original vector.
But when creating new vectors, we often want to do more than just selecting elements. A nifty way of creating new vectors that contain elements from existing vectors is to randomly draw objects from a population (see Section 1.6.4 on Random sampling).
Practice
As creating and indexing/subsetting vectors is an important topic in R, we should practice these tasks more extensively:
- Indexing numeric vectors:
  - Evaluate and explain the following code (in terms of numeric or logical indexing):
v <- 1:10 # create a numeric vector
v

crit <- v > 5 # Condition: Which values of v are larger than 5?
crit

v[crit] # using crit to select values of v (for which crit is TRUE)
v[v > 5] # all in 1 step
  - Can you use numeric indexing to get the same elements as v[v > 5]?
v[6:10]
#> [1] 6 7 8 9 10
v[6:length(v)]
#> [1] 6 7 8 9 10
  - Evaluate and explain the results of the following commands:
x <- 0:8
x[c(2, 4, 6)]
x[x < 3]
x[x %% 2 == 0]
- Indexing character vectors:
  - Predict, evaluate, and explain the results of the following commands:
rhyme <- c("ene", "mene", "miste", "es", "rappelt", "in", "der", "kiste")

length(rhyme)
rhyme[c(1:3, length(rhyme))]
rhyme[nchar(rhyme) <= 3]
rhyme[substr(rhyme, start = 1, stop = 1) == "m"]
Assume the following definition of spices:
spices <- c("salt", "pepper", "cinnamon", "lemongrass", "mint", "mustard", "wasabi")
and then use a combination of functions and numeric or logical indexing for obtaining:
  - the first three spices
  - all spices, except the first two
  - the last spice
  - all spices in spices with exactly four letters
  - all spices with eight or more letters
  - all spices with the letter “i” at their 2nd position
Solution
# numeric indexing:
spices[1:3] # first three elements of spices
spices[c(-1, -2)] # all spices except first two
spices[length(spices)] # last element of spices

# logical indexing:
spices[nchar(spices) == 4] # spices with exactly 4 letters
spices[nchar(spices) >= 8] # spices with at least 8 letters
spices[substr(spices, 2, 2) == "i"] # spices with an "i" at 2nd position
Note: More sophisticated ways of searching text objects are covered in Appendix E on using regular expressions.
- Selecting LETTERS:
The R object LETTERS stores the 26 upper-case letters of the Roman alphabet (and letters the same letters in lowercase). Thus, the vector c("A", "B", "C") can be obtained by numerical subsetting to get the first three letters of LETTERS: LETTERS[1:3].
Solve the following tasks by using functions and numeric or logical indexing:
  - How many letters are in LETTERS?
  - Create a vector of all LETTERS up to N
  - Create a vector of the final six LETTERS
  - Create a vector of all LETTERS not contained in the LETTERS up to N
  - What are the positions of the vowels in LETTERS?
  - Create a vector of all non-vowels in LETTERS
Hint: The letters “A,” “E,” “I,” “O,” and “U” are vowels.
Solution
# ?LETTERS
length(LETTERS)
up_to_N <- LETTERS[1:which(LETTERS == "N")]
final_6 <- LETTERS[(length(LETTERS) - 5):length(LETTERS)]
LETTERS[!(LETTERS %in% up_to_N)]

vowels <- c("A", "E", "I", "O", "U") # vector of vowels
LETTERS[!(LETTERS %in% vowels)] # vector of non-vowels

vowix1 <- which(LETTERS %in% vowels) # position of vowels (as numbers)
vowix2 <- match(vowels, LETTERS) # ... (alternative solution)
LETTERS[-vowix1] # vector of non-vowels
- Creating and modifying vectors:
Use vector creation functions to define
  - a character vector that repeats the letters “A,” “B,” and “C” five times
  - a character vector that contains each of the letters “A,” “B,” and “C” five times
  - a numeric vector that divides the range from 0 to 100 into 6 intervals of equal width
  - a numeric vector that contains all integers from 1 to 1000 that are divisible by both 7 and 13
  - a character vector that contains all letters that are not “a,” “b,” or “c”
Solution
All these objects can be created in multiple ways. Possible solutions include:
rep(LETTERS[1:3], times = 5)
rep(LETTERS[1:3], each = 5)
seq(0, 100, length.out = 7)
v <- 1:1000
v[(v %% 7 == 0) & (v %% 13 == 0)]

letters[-(1:3)]
# or:
abc <- c("a", "b", "c")
letters[!(letters %in% abc)]
- Re-creating train for training purposes:
We examined a train vector above without showing how it was created:
  - Use your knowledge of creating vectors to create a vector train_2 that is as similar to train (from above) as possible.
Hint: Note that train is a named character vector.
Re-creating its elements is straightforward, but re-creating its names may be trickier.
Solution
- Creating the elements of the train_2 vector is easy, and there are many ways of doing so. If train was available in our environment, we could simply copy it. If not, we can still re-create it by defining its elements in several ways:
# Solution 0 (if train was available):
train_2 <- train

# Solution 1: Using individual elements:
train_2 <- c("coal", "coal", "coal", "coal", "coal",
             "corn", "corn", "corn", "corn", "corn",
             "coal", "coal", "gold", "coal", "coal")

# Solution 2: Using functions and replacing 1 element:
train_2 <- c(rep("coal", 5), rep("corn", 5), rep("coal", 5))
train_2[13] <- "gold"

# Solution 3: Using a function and replacing 5+1 elements:
train_2 <- rep("coal", 15)
train_2[6:10] <- "corn"
train_2[13] <- "gold"

# Print result:
train_2
- Naming the train_2 vector is more challenging, as it requires another character vector (of equal length as train_2) that is assigned to names(train_2). Here are some potential solutions:
# Solution 0 (if train was available):
names(train_2) <- names(train)
# Solution 1: Using individual elements:
names(train_2) <- c("w01", "w02", "w03", "w04", "w05",
                    "w06", "w07", "w08", "w09", "w10",
                    "w11", "w12", "w13", "w14", "w15")
# Solution 2: Using functions (the length of train_2 is used, so that train is not required):
names(train_2) <- paste0("w", ds4psy::num_as_char(x = 1:length(train_2), n_pre_dec = 2, n_dec = 0))
# Print result:
train_2
# Verify equality (if train was available):
all.equal(train, train_2)
- Given the train_2 vector (from the previous exercise), predict, evaluate, and explain the result of the following expressions:
train_2[8:14]
train_2[c(11, 13, 7)]
train_2[99]
train_2[train_2 == "corn"]
train_2[train_2 != "corn"]

"gold" %in% train_2
match("corn", train_2)
which(train_2 == "coal")
which(train_2 != "coal")
which(train_2 == "gold")
typeof(train_2)
typeof(names(train_2))
sum(nchar(train_2))
length(train_2) > nchar(train_2)
sum(length(train_2) > nchar(train_2))
- Smart Alex asks: You started this chapter by quoting John Chambers’s statement “Everything that happens is a function call.” (in Section 1.2). But when assigning objects (as in x <- 1:4) or subsetting an object with brackets (x[3]) it seems that things can also happen in R without calling functions. Discuss.
Hint: Check the documentations of ?"<-" and ?"[".
Solution
The statement is true, but ubiquitous R functions often appear in abbreviated forms.
Just like evaluating x was an abbreviation for print(x), and x + y was an abbreviation for "+"(x, y), both <- and [ actually are R functions.
To render their functional nature more obvious, we can re-write them in the standard name(argument) form, but need to enclose their non-standard names in quotes:
# Assignment:
(x <- 1:4)
("<-"(x, 1:4))

# Subsetting:
x[3]
"["(x, 3) # reading
# Note:
"[<-"(x, 3, 99) # writing
From linear vectors to rectangular tables
Having achieved some basic familiarity with vectors, we can now extend our knowledge about data structures by adding a second dimension, which will then yield rectangular tables of data. A good question to recall at this point is:
- If we combine the data from multiple vectors into a rectangular table, do such tables in R consist of rows of vectors, columns of vectors, or both?
(The answer was mentioned in discussing Table 1.1 and the practice task of Section 1.2.1).
References
[23] Strictly speaking, the precise term for such sequences is atomic vectors. The prominence of vectors is partly due to the fact that they are the building blocks of other data structures. For instance, lists are also vectors, but can store elements of different types. (See Wickham, 2014a for details.)
[24] Elements of a named vector v can also be accessed by named indexing (e.g., v["name"]; the shorthand v$name is only available for lists and data frames, not for atomic vectors), but as many vectors do not have names, this is more relevant for accessing parts of tables or lists.