1.4 Vectors

Vectors are the most common and most important data type in R. For our purposes, a vector is an ordered sequence of objects of the same type.

1.4.1 Basics

Think of a vector as a freight train that consists of the same type of waggons: Each waggon can contain different cargo (objects), but all of these objects must be of the same type (e.g., numbers or characters). As the waggons are indistinguishable from the outside, the contents of any particular waggon can be determined by addressing the waggon by its name (if it has one) or its number in the sequence. For instance, suppose there is a vector named train. The following functions provide some basic information about it:

# Print the vector:
train  # (see its names and elements)
#>   w001   w002   w003   w004   w005   w006   w007   w008   w009   w010   w011 
#> "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "corn" 
#>   w012   w013   w014   w015   w016   w017   w018   w019   w020   w021   w022 
#> "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "gold" "coal" 
#>   w023   w024   w025   w026   w027   w028   w029   w030   w031 
#> "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal"

# Basic information about a vector:
typeof(train)  # Of which type are these waggons/elements?
#> [1] "character"
length(train)  # How many waggons/elements does train consist of?
#> [1] 31
names(train)   # What are the names of the waggons/elements?
#>  [1] "w001" "w002" "w003" "w004" "w005" "w006" "w007" "w008" "w009" "w010"
#> [11] "w011" "w012" "w013" "w014" "w015" "w016" "w017" "w018" "w019" "w020"
#> [21] "w021" "w022" "w023" "w024" "w025" "w026" "w027" "w028" "w029" "w030"
#> [31] "w031"

# Accessing the contents of particular waggons/elements:
train[1]       # by number: What's in 1st waggon/element?
#>   w001 
#> "coal"
train["w001"]  # by name:   What's in the waggon named "w001"?
#>   w001 
#> "coal"

To obtain information about the vector train, we applied some basic functions to it. Let’s explain each of these functions in turn:

  • Evaluating train prints the entire vector in our Console. Actually, it is merely a shorthand for print(train). If the vector is short enough to be printed in its entirety, we can obtain comprehensive information about it (i.e., its type, length, names, and all its elements). But even if the vector is too long to be printed, we can still learn its type, whether its elements are named, and what its first elements are.

  • We learned that all elements of a vector are of the same type. The function typeof(v) explicitly determines the type of a vector v.

  • As vectors are 1-dimensional sequences, they have a length. The function length(v) determines the length of a vector v.

  • Vectors can have names, but do not need to have them. The function names(v) determines the names of a vector v and returns NULL if its elements are not named.

  • The contents of particular vector elements can be accessed by their number or their name (if v contains names). To access vector elements, R uses square brackets []. The command v[n] returns the n-th element of a vector v, and the command v["abc"] returns the element named “abc” of a vector v (if v contains an element with this name).

We will learn more about accessing particular vector elements when we learn more about indexing below (in Section 1.4.5).
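
The functions described above can be tried out on a much smaller example. The following is a minimal sketch; the objects cargo and plain are hypothetical and only serve as an illustration (they are not part of the train example):

```r
# Hypothetical vectors (for illustration only):
cargo <- c(w1 = "coal", w2 = "corn", w3 = "gold")  # named elements
plain <- c("coal", "corn", "gold")                 # unnamed elements

typeof(cargo)  # type of the elements
length(cargo)  # number of elements
names(cargo)   # names of the elements
names(plain)   # NULL, as plain has no names

cargo[2]       # access by number
cargo["w2"]    # access by name (same element)
```

Both ways of accessing an element return the same result here, as the 2nd element of cargo is the one named "w2".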

Practice

The following objects v1 and v2 are both vectors (i.e., ordered sequences of elements) with a length of 4 elements, but they are of different types:

v1
#> [1] 1 2 3 4
v2
#> [1] "A" "B" "C" "D"
  1. Use basic vector functions to answer the following questions:
  • Of what type are the elements of v1 and v2?
  • What is their i-th (e.g., 2nd) element?
  • What is their length?

Applying basic vector functions:

# Of which type are the vectors?
typeof(v1)
#> [1] "integer"
typeof(v2)
#> [1] "character"
# What is the 2nd element?
v1[2]        
#> [1] 2
v2[2]
#> [1] "B"
# What is their length?
length(v1)
#> [1] 4
length(v2)
#> [1] 4
  2. Additional vector functions allow answering questions like the following:
  • What is the reverse sequence of v2?
  • Do v1 or v2 contain an element "A"?
  • Which element of v2 has a particular value (e.g., "C" or "X")?

To answer these questions, we can apply slightly more advanced functions to vectors:

rev(v2)      # reverse of v2? 
#> [1] "D" "C" "B" "A"
"A" %in% v1  # does v1 contain "A"? 
#> [1] FALSE
"A" %in% v2  # does v2 contain "A"? 
#> [1] TRUE
which(v2 == "C")  # which element of v2 is "C"? 
#> [1] 3
which(v2 == "X")  # which element of v2 is "X"? 
#> integer(0)

We will learn more about these functions later. For now, just try to understand them and remember that there are many ways to probe and manipulate vectors.
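
As a hedged illustration of how these functions behave with more than one value, the following sketch uses a hypothetical vector letters4 (with the same values as v2). It suggests that %in% and == both return one truth value per element tested:

```r
# Hypothetical vector (same values as v2 above):
letters4 <- c("A", "B", "C", "D")

c("A", "Z") %in% letters4         # one truth value per element on the left
letters4 == "C"                   # one truth value per element of letters4
which(letters4 %in% c("B", "D"))  # positions of all matching elements
rev(rev(letters4))                # reversing twice restores the original order
```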

1.4.2 Creating vectors

To create a new vector, we can combine several objects of the same type with the c() function, separating vector elements by commas:

# Creating vectors: 
c(1, 2, 3)
#> [1] 1 2 3

# Combining scalar objects (defined above): 
c(a, b)  
#> [1] 100 200

v <- c(a, b, c)
v
#> [1] 100 200   3

v <- c(c, c, c)  # vectors can have repeated elements
v
#> [1] 3 3 3

# Combining scalar objects and vectors (defined above): 
v <- c(a, b, v)  # Note that vectors can contain vectors, ...
v
#> [1] 100 200   3   3   3

v <- c(v, v)     # but the result is only 1 vector, not 2.
v
#>  [1] 100 200   3   3   3 100 200   3   3   3

# Combining characters to vectors:
w <- c("one", "two", "three")
w
#> [1] "one"   "two"   "three"

w <- c(w, "four", "5", "many")
w
#> [1] "one"   "two"   "three" "four"  "5"     "many"

Most functions in R can be applied to vectors. They typically return either some information about the vector or a transformed version of it:

length(v)
#> [1] 10
typeof(v)
#> [1] "double"
sum(v)
#> [1] 618

length(w)
#> [1] 6
typeof(w)
#> [1] "character"
# sum(w)  # would yield an error

values <- c(2>1, 1>2, 2==2)
values   # vector of truth values
#> [1]  TRUE FALSE  TRUE
!values  # reverses truth values
#> [1] FALSE  TRUE FALSE

When combining data of different types (e.g., numbers and characters) in a vector, R coerces them into a common type. This is the most flexible type involved, which is character whenever at least one element is a character string:

x <- c(1, "two", 3)
x
#> [1] "1"   "two" "3"
typeof(x)  # converted 1 to "1"
#> [1] "character"


y <- c(TRUE, "FALSE")
y
#> [1] "TRUE"  "FALSE"
typeof(y)  # converted TRUE to "TRUE"
#> [1] "character"
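
The same rule seems to apply to other combinations of types. The following sketch (with hypothetical objects n and s) combines integers with doubles and doubles with characters, and then uses as.numeric() for an explicit conversion in the opposite direction:

```r
# Integers and doubles are combined into a double vector:
n <- c(1L, 2.5)
typeof(n)

# Doubles and characters are combined into a character vector:
s <- c(2.5, "a")
typeof(s)

# Explicit conversion in the opposite direction (character to number):
as.numeric(c("1", "3"))
```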

Practice

  • What happens when we combine truth values and numeric objects into a vector?
z <- c(TRUE, FALSE, 100)
z
typeof(z)
sum(z)

Calculating with logical values

When combining truth values and numeric objects in a vector, a numeric vector is returned. Any value of TRUE is interpreted as the number 1 and any value of FALSE is interpreted as the number 0. The same interpretation of truth values is made when applying arithmetic functions to truth values:

sum(c(TRUE, FALSE, TRUE))
#> [1] 2
mean(c(TRUE, FALSE, FALSE))
#> [1] 0.3333333
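
This interpretation is handy for counting. As a minimal sketch (with a hypothetical vector age), applying sum and mean to the result of a test yields the number and the proportion of elements that pass the test:

```r
# Hypothetical data (for illustration only):
age <- c(23, 35, 41, 19, 52)

age > 30        # logical vector: TRUE for each value above 30
sum(age > 30)   # number of values above 30
mean(age > 30)  # proportion of values above 30
```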

1.4.3 Scalar objects are vectors

R has no dedicated type for scalar objects. Instead, individual numbers (of type integer or double), text strings (of type character), and truth values (of type logical) are vectors of length 1. We can verify this by applying the is.vector and length functions to the scalars defined above:

a
#> [1] 100
is.vector(a)
#> [1] TRUE
length(a)
#> [1] 1
typeof(a)
#> [1] "double"

d
#> [1] "weird"
is.vector(d)
#> [1] TRUE
length(d)
#> [1] 1
typeof(d)
#> [1] "character"

e
#> [1] TRUE
is.vector(e)
#> [1] TRUE
length(e)
#> [1] 1
typeof(e)
#> [1] "logical"
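
One consequence is that everything we can do with vectors should also work for scalars. The following sketch uses a hypothetical object x (not defined in the text above):

```r
# Hypothetical scalar (for illustration only):
x <- 42

x[1]          # indexing works, as x is a vector of length 1
x <- c(x, 7)  # adding an element yields a vector of length 2
length(x)
```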

1.4.4 Vector creation functions

For creating vectors with more than just a few elements (i.e., with larger length values), typing all elements into the c function becomes impractical. Useful functions for generating regular sequences are seq (with its shortcut :) and rep:

# Sequences (with seq):
s1 <- seq(0, 100, 1)  # is short for: 
s2 <- seq(from = 0, to = 100, by = 1)
s2
#>  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
#> [26] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
#> [51] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74
#>  [ reached getOption("max.print") -- omitted 26 entries ]
all.equal(s1, s2)
#> [1] TRUE

# Shorter version (with by = 1):
s3 <- 0:100
all.equal(s1, s3)
#> [1] TRUE

# But seq allows different step sizes:
s4 <- seq(0, 100, by = 25)
s4
#> [1]   0  25  50  75 100

# Replicating vectors (with rep):
s5 <- rep(c(0, 1), 3)  # is short for:
s5 <- rep(x = c(0, 1), times = 3)
s5
#> [1] 0 1 0 1 0 1
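
Both functions have further arguments (see ?seq and ?rep). As a hedged sketch, the following uses length.out to request a number of elements rather than a step size, and each to repeat every element rather than the entire vector. The object names q, r1, and r2 are hypothetical:

```r
# seq() with a desired number of elements (rather than a step size):
q <- seq(from = 0, to = 1, length.out = 5)
q

# rep() repeating the entire vector vs. each of its elements:
r1 <- rep(c(0, 1), times = 3)  # 0 1 0 1 0 1
r2 <- rep(c(0, 1), each = 3)   # 0 0 0 1 1 1
r1
r2
```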

A common task in psychology and statistics is drawing a sample from a given set of objects. In R, the sample function allows drawing a sample of size size from a population x, and a logical argument replace specifies whether the sample is to be drawn with or without replacement. The result of sample is a vector of length size:

# Sampling vector elements (with sample):
sample(x = 1:3, size = 10, replace = TRUE)
#>  [1] 1 2 1 1 3 3 2 2 3 2
# Note:
# sample(1:3, 10)  
# would yield an error (as replace = FALSE by default). 

# Binary sample (coin flip): 
coin <- c("H", "T")    # 2 events: Heads or Tails
sample(coin, 5, TRUE)  # is short for: 
#> [1] "T" "H" "H" "T" "H"
sample(x = coin, size = 5, replace = TRUE)     # flip coin 5 times
#> [1] "H" "T" "H" "T" "H"
sample(x = coin, size = 1000, replace = TRUE)  # flip coin 1000 times
#>  [1] "H" "T" "H" "T" "T" "T" "H" "H" "H" "T" "T" "H" "H" "H" "H" "T" "T" "T" "H"
#> [20] "H" "T" "H" "T" "H" "T" "T" "T" "H" "H" "T" "H" "H" "T" "T" "T" "H" "T" "H"
#> [39] "T" "T" "H" "T" "T" "H" "H" "H" "H" "H" "T" "H" "T" "H" "H" "H" "H" "H" "T"
#> [58] "T" "T" "H" "T" "H" "T" "T" "H" "T" "T" "H" "H" "H" "H" "T" "H" "T" "H"
#>  [ reached getOption("max.print") -- omitted 925 entries ]
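
As sample draws randomly, its results differ on every call. If a sample needs to be reproducible, one option is to call the base R function set.seed before sampling. This function is not discussed above, and the seed value 2024 in the following sketch is an arbitrary assumption:

```r
coin <- c("H", "T")

set.seed(2024)  # arbitrary seed value (any integer works)
flips_1 <- sample(x = coin, size = 10, replace = TRUE)

set.seed(2024)  # same seed, so the same "random" sequence is drawn again
flips_2 <- sample(x = coin, size = 10, replace = TRUE)

identical(flips_1, flips_2)  # TRUE: both samples are the same
table(flips_1)               # frequency of each outcome
```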

Practice

  • Evaluate the following expressions and explain their results (e.g., by consulting the documentation via ?seq and ?sample):
seq(0, 10, by = 3)
seq(0, 10, length.out = 10) 

sample(c("A", "B", "C"), size = 3)
sample(c("A", "B", "C"), size = 4)
sample(c("A", "B", "C"), size = 5, replace = TRUE)

1.4.5 Indexing vectors

We have learned above how to inspect vectors and how to determine their type, length, names, and particular elements (see Section 1.4.1). An important concept in the context of vectors (and other R data structures) is the notion of a numeric index, which is a number that denotes the position of an element in a sequence. This sounds more difficult than it is: We already used numeric indices above to access particular elements of vectors, for instance:

# Assuming vector definitions (from above):
train
#>   w001   w002   w003   w004   w005   w006   w007   w008   w009   w010   w011 
#> "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "corn" 
#>   w012   w013   w014   w015   w016   w017   w018   w019   w020   w021   w022 
#> "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "gold" "coal" 
#>   w023   w024   w025   w026   w027   w028   w029   w030   w031 
#> "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal" "coal"
v1
#> [1] 1 2 3 4
v2
#> [1] "A" "B" "C" "D"

# Numeric indexing for 1 element:
train[1]  # get the 1st element of train
#>   w001 
#> "coal"
v1[2]     # get the 2nd element of v1
#> [1] 2
v2[3]     # get the 3rd element of v2
#> [1] "C"

Again, selecting vector elements by an index requires enclosing it in square brackets [], whereas functions always enclose their arguments in round parentheses ().

In R, we often store a large number of values in vectors (e.g., the age values of 5,000 participants), but only need some of them for answering specific questions (e.g., what is the average age of all male participants?). To select only a subset of elements from a vector v we can specify the condition or criterion for our selection in (square) brackets v[...]. Supplying a numeric or logical index (in square brackets) is a very flexible and powerful way for selecting elements of a vector v (or of other data structures). We can distinguish between 2 types of indexing (or subsetting):

Numeric indexing

In numeric indexing (or numeric subsetting), we provide a numeric vector as an index to a vector. Instead of only asking for a single element, we can use a numeric vector of indices to obtain a new vector that contains the elements at the specified positions:

train[c(1, 20, 30)]
#>   w001   w020   w030 
#> "coal" "corn" "coal"
train[1:4]
#>   w001   w002   w003   w004 
#> "coal" "coal" "coal" "coal"
train[20:22]
#>   w020   w021   w022 
#> "corn" "gold" "coal"

Actually, numeric indexing also works when elements are selected repeatedly:

train[c(1, 1)]
#>   w001   w001 
#> "coal" "coal"
train[rep(21, 3)]
#>   w021   w021   w021 
#> "gold" "gold" "gold"

Importantly, the index used to select elements can be the result of a function call. For instance:

length(v2)      # number of elements in v2
#> [1] 4
v2[length(v2)]  # get the last element of v2
#> [1] "D"
v2[c(1, length(v2))]  # get the 1st and last elements of v2
#> [1] "A" "D"
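
The same idea extends to indices that are computed by seq or rev. The following is a minimal sketch with a hypothetical vector v (defined here only for illustration):

```r
# Hypothetical vector (for illustration only):
v <- c("a", "b", "c", "d", "e", "f", "g")
n <- length(v)

v[seq(1, n, by = 2)]  # every 2nd element, starting with the 1st
v[(n - 1):n]          # the last 2 elements
v[rev(1:n)]           # all elements in reverse order (same as rev(v))
```

Computing indices from length(v) has the advantage that the code keeps working when the length of v changes.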

Sometimes, we want to know whether and where a vector contains particular elements. The which() function takes a logical test and returns the indices of all elements for which the test is TRUE (as a numeric vector):

which(v2 == "B")
#> [1] 2
which(v2 != "C")
#> [1] 1 2 4
which(train == "gold")
#> w021 
#>   21
which(train != "coal")
#> w011 w012 w013 w014 w015 w016 w017 w018 w019 w020 w021 
#>   11   12   13   14   15   16   17   18   19   20   21

If we wanted to obtain the corresponding elements, we could use these numeric vectors to index or subset the same vectors:

v2[which(v2 == "B")]
#> [1] "B"
v2[which(v2 != "C")]
#> [1] "A" "B" "D"
train[which(train == "gold")]
#>   w021 
#> "gold"
train[which(train != "coal")]
#>   w011   w012   w013   w014   w015   w016   w017   w018   w019   w020   w021 
#> "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "gold"

These last examples first used which() to obtain numeric indices of train and then applied these indices to train. This works, but seems a bit complicated. We can get the same elements by using logical indexing.

Logical indexing

In logical indexing (or logical subsetting), we select elements of a vector v by providing a vector of truth values (i.e., logical values that are either TRUE or FALSE) in square brackets. Typically, the length of the logical index corresponds to the length of the vector v. For example, the following statement selects the 1st and 2nd elements of v1:

v1[c(TRUE, TRUE, FALSE, FALSE)]
#> [1] 1 2

As with numeric indices, we can use function calls to determine the logical indices. This allows applying a test criterion (i.e., typically a test of v that returns a logical vector of the same length) to a vector v and only return the elements for which the criterion is TRUE. For instance, we can get all elements of v1 that are smaller than 3:

v1 < 3      # criterion [yielding a logical vector of length(v1)]
#> [1]  TRUE  TRUE FALSE FALSE
v1[v1 < 3]  # applying logical index to v1
#> [1] 1 2

This allows us to repeat the selections from above by logical indexing:

v2[v2 == "B"]
#> [1] "B"
v2[v2 != "C"]
#> [1] "A" "B" "D"
train[train == "gold"]
#>   w021 
#> "gold"
train[train != "coal"]
#>   w011   w012   w013   w014   w015   w016   w017   w018   w019   w020   w021 
#> "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "corn" "gold"

Note that each of these selections uses a vector twice: Once to conduct a test that yields logical indices (i.e., a vector of truth values that has the same length as the original vector) and a second time to select the elements for which the test criterion is TRUE.
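
The test does not have to be conducted on the same vector that is being indexed, as long as both vectors have the same length. The following sketch picks up the earlier question about the average age of male participants. The vectors age and gender are hypothetical data, and the operator & (logical AND) is not introduced in the text above:

```r
# Hypothetical data (for illustration only):
age    <- c(23, 35, 41, 19, 52, 30)
gender <- c("male", "female", "male", "female", "male", "female")

gender == "male"             # logical index (same length as age)
age[gender == "male"]        # ages of the male participants
mean(age[gender == "male"])  # their average age

# Combining 2 criteria with & (logical AND):
age[gender == "female" & age >= 30]
```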

Practice

As indexing (or subsetting) vectors is quite important in R, we practice it more extensively:

  1. Indexing numeric vectors:
  • Evaluate and explain the following code (in terms of numeric or logical indexing):
v <- 1:10  # create a numeric vector
v

crit <- v > 5  # Condition: Which values of v are larger than 5?
crit

v[crit]   # using crit to select values of v (for which crit is TRUE)
v[v > 5]  # all in 1 step 
  • Can you use numeric indexing to get the same elements as v[v > 5]?
v[6:10]
#> [1]  6  7  8  9 10
v[6:length(v)]
#> [1]  6  7  8  9 10
  • Evaluate and explain the results of the following commands:
x <- 0:8

x[c(2, 4, 6)]
x[x < 3]
x[x %% 2 == 0] 
  2. Indexing character vectors:
  • Evaluate and explain the results of the following commands:
rhyme <- c("ene", "mene", "miste", "es", "rappelt", "in", "der", "kiste")

length(rhyme)
rhyme[c(1:3, length(rhyme))]
rhyme[nchar(rhyme) <= 3]
rhyme[substr(rhyme, start = 1, stop = 1) == "m"]
  • Assume the following definition of spices:
spices <- c("salt", "pepper", "cinnamon", "lemongrass", "mint", "mustard", "wasabi")

Can you use functions and logical indexing to obtain all spices in spices with exactly 4 letters, or all spices with the letter "i" at their 2nd position?

spices[nchar(spices) == 4]           # spices with exactly 4 letters
spices[substr(spices, 2, 2) == "i"]  # spices with an "i" at 2nd position
  3. The R object LETTERS stores the 26 upper-case letters of the Roman alphabet (and letters stores the same letters in lower case). Thus, the vector c("A", "B", "C") can also be obtained by numeric subsetting to get the first 3 letters of LETTERS: LETTERS[1:3]. Can you combine sampling and subsetting to get a random vector of 10 elements, which are sampled (with replacement) from the letters “U” to “Z”?
# ?LETTERS
U_to_Z <- LETTERS[21:26]  # subsetting yields "U" "V" "W" "X" "Y" "Z"
sample(x = U_to_Z, size = 10, replace = TRUE)