4 Vectors
R operates on named data structures. These include vectors, factors, matrices, arrays, lists and data frames. For now, let’s think about data structures as containers that store different types of data.
This chapter focuses on vectors. Vectors are not just another data structure. They are a central component of R.
4.1 Generating sequences
The simplest structure in R is the numeric vector. It consists of an ordered collection of numbers. A vector can also contain strings, or logical values, but not a mixture.
Let’s start with numeric vectors and how to generate them in R.
: operator
The colon operator : generates a sequence from one number to another in steps of 1 or -1.
: is an operator. Operators are used to perform operations on variables and values.
## [1] 1 2 3 4
## [1] 1 0 -1 -2 -3 -4
## [1] 8.5 7.5 6.5 5.5 4.5
## [1] 4 5 6 7 8
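The text shows only printed results, so the calls below are an assumed reconstruction rather than the original inputs. As a minimal sketch, they show the colon operator counting up, counting down, and starting from a fractional number:

```r
# Assumed inputs: the chapter shows only the printed output.
up   <- 1:4      # counts up in steps of 1
down <- 1:-4     # counts down in steps of -1
frac <- 8.5:4.5  # the starting value need not be a whole number

print(up)
print(down)
print(frac)
```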
c()
c() combines values into a vector.
## [1] 1 2 3 4 5
## [1] 18 9 8 7 6 5
If the arguments to c() are themselves vectors, c() flattens the vectors and combines them into one single vector.
## [1] 1 2 3 4 5 6
seq(from, to)
seq(from, to) is a generic function to generate regular sequences. It has five arguments, but not all of them will be specified in one call.
The two arguments from and to specify the beginning and end of the sequence.
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
If these are the only two arguments given, then the result is the same as with the colon operator (:).
## [1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
The colon operator : only produces sequences in steps of 1 or -1. The seq() function, in contrast, accepts an optional argument, by, that specifies the increment of the sequence.
## [1] 0 2 4 6 8 10 12 14 16 18 20
length.out specifies a length for the output sequence, and R then calculates the necessary increment. The increment need not be an integer; R can create sequences with fractional increments.
## [1] 0.0 2.5 5.0 7.5 10.0 12.5 15.0 17.5 20.0
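The following sketch contrasts by and length.out. The argument values are assumptions chosen to be consistent with the printed results above:

```r
# Assumed arguments, chosen to match the outputs shown in the text.
evens  <- seq(from = 0, to = 20, by = 2)          # fixed increment of 2
ninths <- seq(from = 0, to = 20, length.out = 9)  # R derives the increment (2.5)

print(evens)
print(ninths)
```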
There are three other specialist sequence functions that are faster and easier to use, which cover specific use cases.
seq.int()
seq.int() lets us create a sequence from one number to another.
## [1] 1 2 3 4 5 6 7 8 9 10
## [1] 1 3 5 7 9
## [1] 1 3 5 7 9 11 13 15 17 19
Note: Arguments to seq(), and to other R functions, can be given in named form, in which case the order in which they appear is irrelevant. For instance, seq.int(from = 1, to = 10) is the same as seq.int(to = 10, from = 1).
Unnamed arguments, however, are matched by position, and the positional order differs from function to function. Therefore, it is recommended to always name the arguments.
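A small sketch of the point about named arguments. The particular values are illustrative assumptions:

```r
# Named arguments can appear in any order.
a <- seq.int(from = 1, to = 10, by = 2)
b <- seq.int(by = 2, to = 10, from = 1)
print(identical(a, b))

# Unnamed arguments are matched by position: from, to, by.
by_position <- seq.int(1, 10, 2)
print(by_position)
```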
seq_len()
seq_len() creates a sequence from 1 up to its input.
## [1] 1 2 3 4 5
## integer(0)
seq_len(5) returns the same result as 1:5.
seq_along()
seq_along() creates a sequence from 1 up to the length of its input.
## [1] 10.4
## [1] 5.6
## [1] 3.1
## [1] 6.4
## [1] 21.7
This returns the same result as below.
## [1] 10.4
## [1] 5.6
## [1] 3.1
## [1] 6.4
## [1] 21.7
Note: Iterating over 1:length(x) fails when x is empty. In that case length(x) is 0, and 1:0 counts down from 1 to 0 instead of producing an empty sequence.
## [1] 1 0
Therefore, it is recommended that we use seq_along(x) whenever we can. It always returns a vector of the same length as x.
## integer(0)
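To make the difference concrete, the sketch below counts how many times a loop body runs for an empty vector under each approach. The variable names are illustrative:

```r
x <- c()  # an empty vector

bad  <- 1:length(x)   # 1:0, i.e. the two values 1 and 0
good <- seq_along(x)  # integer(0), no values at all

runs_bad <- 0
for (i in bad) runs_bad <- runs_bad + 1     # body runs twice

runs_good <- 0
for (i in good) runs_good <- runs_good + 1  # body never runs

print(c(runs_bad, runs_good))
```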
4.2 Arithmetic operations
Vectors can be used in arithmetic expressions. In these cases, the operations are performed element by element on entire vectors.
The arithmetic operators include +, -, *, / and ^ (raising to a power).
## [1] 7 9 11 13 15
## [1] 1 4 9 16 25
The colon operator : has high priority within an expression: it is evaluated before +, -, * and /, but after ^. Compare the three examples below.
## [1] 0 1 2 3 4
## [1] 1 2 3 4
## [1] 4 1 0 1 4
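The original calls are not shown, so the following is an assumed illustration of the first two results: the colon is applied before the subtraction unless parentheses say otherwise.

```r
n <- 5  # assumed value

without_parens <- 1:n - 1    # evaluated as (1:n) - 1
with_parens    <- 1:(n - 1)  # parentheses force the subtraction first

print(without_parens)
print(with_parens)
```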
common arithmetic functions
Common arithmetic functions are available, e.g., log(), exp(), sin(), cos(), tan(), sqrt().
max() and min() select the largest and smallest elements of a vector respectively. range() is a function whose value is a vector of length two, namely c(min(x), max(x)). length(x) is the number of elements in x. sum(x) gives the total of the elements in x.
## [1] 8.8
## [1] 4
## [1] 11.03328
## [1] 121.7333
vector recycling
So far the vectors we’ve seen occurring in the same expression are of the same length. What happens if we try to do arithmetic on vectors of different lengths? R will recycle elements in the shorter vector to match the longer one.
## [1] 2 4 6 8 10 7 9 11 13 15 12 14 16 18 20
In the case above, the vector 1:5 is repeated three times to match 1:15.
The operations below are performed between every vector element and the scalar. The scalar is repeated. (Although in R we do not really have scalars; a “scalar” is simply a numeric vector with one element.)
## [1] 0 1 3 5 9 11
## [1] 0.3333333 0.6666667 1.0000000 1.3333333 1.6666667 2.0000000 2.3333333 2.6666667 3.0000000 3.3333333
## [1] 1 4 9 16 25
If the length of the longer vector isn’t a multiple of the length of the shorter one, a warning will be given.
## Warning in 1:5 + 1:7: longer object length is not a multiple of shorter object length
## [1] 2 4 6 8 10 7 9
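The recycling rule can be sketched as follows. The first call matches the 1:5 and 1:15 example discussed above, and the second is the call named in the warning. suppressWarnings() is used here only so that the sketch runs quietly; without it R prints the warning shown above.

```r
# Lengths 5 and 15: the shorter vector is recycled exactly three times.
full <- 1:5 + 1:15
print(full)

# Lengths 5 and 7: recycling stops part-way through, hence the warning.
partial <- suppressWarnings(1:5 + 1:7)
print(partial)
```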
R is vectorized
All the arithmetic operators in R are vectorized. This means that an operator or a function will act on each element of a vector without the need for us to explicitly write a loop. Vector operations are one of R’s great strengths.
Let’s see an example. We want to write an R program that multiplies two numeric vectors of length 6, element by element. We can of course write a for loop …
vec1 <- 1:6
vec2 <- c(4, 5, 6, 7, 8, 9)
vec <- c()                          # start with an empty result
for (i in seq_along(vec1)) {
  vec <- c(vec, vec1[i] * vec2[i])  # append the product of the i-th pair
}
print(vec)
## [1] 4 10 18 28 40 54
Except that we don’t actually need a for loop! The built-in implicit looping over elements is much faster than explicitly writing our own loop. As we see below, the operator is applied to corresponding elements from both vectors. Specifically, 1 is multiplied by 4, 2 by 5, 3 by 6, etc.
## [1] 4 10 18 28 40 54
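Using the same vec1 and vec2 as in the loop above, the vectorized version is presumably a single expression along these lines:

```r
vec1 <- 1:6
vec2 <- c(4, 5, 6, 7, 8, 9)

# One expression replaces the whole loop: elements are multiplied pairwise.
products <- vec1 * vec2
print(products)
```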
As another example, we can recenter an entire vector in one expression simply by subtracting the mean of its contents. Here the mean of u is subtracted from each element of u.
## [1] -8.8 -7.8 -7.8 -6.8 -5.8 -3.8 -0.8 4.2 12.2 25.2
## [1] 8.8
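The values of u are not shown in the text. The vector below is an assumption that happens to reproduce the printed results, and the sketch shows the recentering in one expression:

```r
# Assumed data: these values reproduce the output above, but the
# original definition of u is not shown.
u <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

centered <- u - mean(u)  # mean(u) is recycled against every element of u
print(centered)
print(mean(u))
```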
Similarly, a function can take entire vectors as input and calculate a summary statistic. The function cor() takes the vectors x and y as a whole and returns a single value, their correlation coefficient.
## [1] 0.9068053
useful built-in functions operating on vectors
head(), tail()
head() and tail() return the first or last parts of a vector.
## [1] "a" "b" "c" "d" "e" "f"
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
## [1] "u" "v" "w" "x" "y" "z"
sort()
sort() sorts a vector into ascending or descending order.
## [1] 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10 10 11 12
## [1] 12 11 10 10 9 9 8 8 7 7 6 6 5 5 4 4 3 3 2
set operations
union(), intersect(), setdiff(), setequal(), and is.element() perform set operations.
union(), intersect(), setdiff() and setequal() will discard any duplicated values in the arguments.
Set union
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Intersection
## [1] 10 11 12 13 14 15
Difference
## [1] 1 2 3 4 5 6 7 8 9
## [1] 16 17 18 19 20 21 22 23 24 25
Equality
## [1] FALSE
Membership on two vectors
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
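The inputs to the set operations are not shown. The two vectors below are assumptions that are consistent with the printed results, and the sketch runs each operation on them:

```r
# Assumed inputs, chosen to be consistent with the outputs above.
x <- 1:15
y <- 10:25

u       <- union(x, y)      # every distinct value in x or y
common  <- intersect(x, y)  # values present in both
only_x  <- setdiff(x, y)    # in x but not in y
only_y  <- setdiff(y, x)    # in y but not in x
same    <- setequal(x, y)   # do x and y hold the same values?
x_in_y  <- is.element(x, y) # one logical value per element of x

print(common)
print(same)
print(x_in_y)
```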
4.3 Logical vectors
R allows manipulation of logical values. R has two logical values, TRUE
and FALSE
.[5] These are often called Boolean values in other programming languages.
Logical vectors are generated by conditions. The logical operators are !, & and |.
The comparison operators compare two values and return TRUE or FALSE. The comparison operators are <, <=, >, >=, == (exact equality) and != (inequality).
## [1] TRUE
We can also compare a vector against a single scalar, as in arithmetic operations. R will expand the scalar to the vector’s length and then perform the element-wise comparison.
## [1] TRUE TRUE TRUE
## [1] TRUE FALSE TRUE
## [1] FALSE FALSE FALSE TRUE TRUE
Again, as in arithmetic operations, we can compare entire vectors at once. R will perform an element-by-element comparison and return a vector of logical values, one for each comparison.
## [1] TRUE TRUE FALSE
c1 and c2 are logical expressions. c1 & c2 is their intersection (“and”); c1 | c2 is their union (“or”); and !c1 is the negation of c1.
## [1] FALSE TRUE FALSE
## [1] TRUE TRUE TRUE
## [1] FALSE FALSE TRUE
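The definitions of c1 and c2 are not shown. The values below are assumptions that reproduce the three printed results:

```r
# Assumed values, consistent with the outputs above.
c1 <- c(TRUE, TRUE, FALSE)
c2 <- c(FALSE, TRUE, TRUE)

both    <- c1 & c2  # TRUE only where both are TRUE
either  <- c1 | c2  # TRUE where at least one is TRUE
negated <- !c1      # flips every element of c1

print(both)
print(either)
print(negated)
```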
Note: TRUE and FALSE are often abbreviated as T and F. However, T and F are just variables which are set to TRUE and FALSE by default, but are not reserved words and hence can be overwritten by the user. Therefore, we should always use TRUE and FALSE.
logical vectors coerced into numeric vectors
Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors. FALSE becomes 0 and TRUE becomes 1.
## [1] 2
## [1] 0.6666667
The above expressions can be useful if we want to find out whether any case meets a condition, or how many do. For instance, if at least one case evaluates to TRUE, then sum(c(TRUE, FALSE, TRUE)) will be greater than or equal to 1.
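A short sketch of counting with logical vectors, using the vector named in the text. The variable names are illustrative:

```r
hits <- c(TRUE, FALSE, TRUE)

how_many   <- sum(hits)   # TRUE counts as 1, FALSE as 0
proportion <- mean(hits)  # share of TRUE values
at_least_1 <- sum(hits) >= 1

print(how_many)
print(proportion)
print(at_least_1)
```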
4.4 Operators
So far we have discussed assignment, arithmetic, logical and comparison operations. We use operators to perform these operations on variables and values. More formally, an operator is a function that takes one or two arguments and can be written without parentheses.
To sum up, in R we have:
- assignment operator: <-
- arithmetic operators: +, -, *, /, ^ (raising to a power) and %% (modulus; remainder from division)
- comparison operators: <, <=, >, >=, == (exact equality), and != (inequality)
- logical operators: ! (not), & (and) and | (or)
R also has special operators like %in%, which returns a logical vector indicating, for each element of its left operand, whether it has a match in the right operand.
## [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
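The operands of the example are not shown. The ones below are assumptions that give a result of the same shape, with one logical value per element of the left operand:

```r
# Assumed operands; the result has the same length as the left operand.
left  <- 1:10
right <- 1:5

matched <- left %in% right
print(matched)
```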
operator precedence
When we enter an expression in R, R evaluates some operations before others. We call this order of operations operator precedence. An operator with higher precedence is evaluated first, and the operator with the lowest precedence is evaluated last. Operators of equal precedence are evaluated from left to right, except for ^ and <-, which are evaluated from right to left.
What are the results of the following operations? Use ?Syntax to check the precedence of operators.
## [1] 1 4 9 16 25
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [1] FALSE FALSE
The operator precedence, from high to low, is:
- [, [[ indexing
- ^ exponentiation
- : sequence operator
- *, / multiply, divide
- +, - add, subtract
- <, >, <=, >=, ==, != comparison
- ! negation
- & and
- | or
- <- assignment
- ? help
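The ordering above can be checked with a few expressions. These are illustrative assumptions rather than the exercise's original inputs:

```r
# ^ binds tighter than :, so the exponent is applied to 5 alone.
power_first <- 1:5^2    # same as 1:25
seq_first   <- (1:5)^2  # parentheses square every element instead

# : binds tighter than *, so the sequence is built before multiplying.
doubled <- 1:3 * 2      # same as (1:3) * 2

print(length(power_first))
print(seq_first)
print(doubled)
```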
4.5 Character vectors
In R, we can also enter expressions with characters, using a pair of double or single quotes. Characters are printed using double quotes.
## [1] "Hello World!"
## [1] "Hello again!"
In R, the character vector is the basic unit of text.
You may also have come across the term “string”. How is “string” related to “character”? In R, “string” is an informal term meaning “element of a character vector”. Most string manipulation functions operate on vectors of strings, in the same way that arithmetic operations are vectorized.
c(), paste()
Strings may be concatenated into a vector by c().
## [1] "this" "is" "a" "character" "vector" "."
## [1] "string" "string" "string"
We can also use the paste() function to concatenate strings. paste() concatenates one or more objects; by default they are separated in the result by a single space. We can change the separator with the argument sep, which takes a character string.
## [1] "1st, 2nd, 3rd"
## [1] "X1" "Y2" "X3" "Y4" "X5" "Y6" "X7" "Y8" "X9" "Y10"
## [1] "NYU Shanghai" "NYU New York" "NYU Abu Dhabi"
Note: Recycling of short vectors takes place here; c("X", "Y") is repeated 5 times to match the sequence 1:10.
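The calls behind the three results are not shown. The sketch below is an assumed reconstruction. Note that the first result is most naturally produced with paste()'s collapse argument, which joins the elements of a vector into a single string and is not discussed in the text.

```r
# Assumed reconstruction of the three outputs above.
ordinals <- paste(c("1st", "2nd", "3rd"), collapse = ", ")        # one string
labels   <- paste(c("X", "Y"), 1:10, sep = "")                    # recycling, no separator
campuses <- paste("NYU", c("Shanghai", "New York", "Abu Dhabi"))  # default sep = " "

print(ordinals)
print(labels)
print(campuses)
```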
paste0(...) is equivalent to paste(..., sep = "") and is slightly more efficient.
nth <- paste0(1:12, c("st", "nd", "rd", rep("th", 9)))
paste(month.abb, "is the", nth, "month of the year.")
## [1] "Jan is the 1st month of the year." "Feb is the 2nd month of the year."
## [3] "Mar is the 3rd month of the year." "Apr is the 4th month of the year."
## [5] "May is the 5th month of the year." "Jun is the 6th month of the year."
## [7] "Jul is the 7th month of the year." "Aug is the 8th month of the year."
## [9] "Sep is the 9th month of the year." "Oct is the 10th month of the year."
## [11] "Nov is the 11th month of the year." "Dec is the 12th month of the year."
nchar(), length()
Unlike in some languages, R does not distinguish between whole strings and individual characters; a string containing one character is treated the same as any other string. Therefore, “character” in R does not refer to an individual character.
To count the number of single characters in a string, use nchar(). To get the length of a character vector, use length().
## [1] 5
## [1] 1
4.6 Missing values
NA
In some cases, the elements of a vector may not be completely known. R assigns the special value NA to these elements to indicate that they are “not available” or “missing”. NA is a logical constant of length 1.
In general, any operation on an NA becomes an NA. The motivation for this rule is that if the specification of an operation is incomplete, the result cannot be known and hence is not available.
## [1] NA
## [1] NA
The function is.na() evaluates whether each element is NA.
## [1] FALSE FALSE FALSE TRUE
Note: The logical expression x == NA is different from is.na(x). As said earlier, any operation on an NA becomes an NA; hence x == NA will return a vector of the same length as x, whose values are all NA.
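The difference can be seen side by side. The vector x below is an assumed example with one missing value:

```r
x <- c(1, 2, 3, NA)  # assumed example data

compared <- x == NA  # every comparison with NA is itself NA
missing  <- is.na(x) # TRUE only where the element is NA

print(compared)
print(missing)
```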
Functions are very careful about values that are not available. An NA value in a vector passed as an argument may cause a function to return NA or raise an error.
## [1] NA
## [1] NA
We can choose to ignore the NAs by setting na.rm to TRUE.
## [1] 1.4
## [1] 1.140175
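The data behind these results are not shown. The vector below is an assumption that reproduces the printed mean and standard deviation once the NA is removed:

```r
# Assumed data: reproduces the outputs above, but may differ from the original.
x <- c(0, 1, 1, 2, 3, NA)

mean_with_na <- mean(x)                # NA, because one element is unknown
mean_clean   <- mean(x, na.rm = TRUE)  # NA dropped before averaging
sd_clean     <- sd(x, na.rm = TRUE)

print(mean_with_na)
print(mean_clean)
print(sd_clean)
```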
4.7 Indexing
Sometimes we want to access part of a vector. This is called indexing (or subsetting, subscripting, slicing). We can access vector elements by referring to their index numbers inside brackets [], which is the indexing operator.
v[index vector]
Below we discuss the rules of indexing.
A vector of positive numbers selects elements by their position.
The corresponding elements of the vector are selected, concatenated, and returned in the order that they are referenced.
## [1] 1
## [1] 1 4 9
## [1] 49 25 9 1
Very important: The first element has an index of 1, not 0 as in some other programming languages.
A vector of negative numbers excludes elements at specified locations.
All other values will be returned.
## [1] 1 9 16 25 36 49 64 81 100
## [1] 1 9 25 36 49 64 81 100
## [1] 36 49 64 81 100
Note: Mixing positive and negative values is not allowed.
x[c(1, -1)]
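The sketch below gathers the positive and negative indexing rules in one place. The definition of x (the first ten squares) is inferred from the printed results and is an assumption. tryCatch() is used only so that the failing call does not stop the script.

```r
x <- (1:10)^2  # assumed definition: 1 4 9 ... 100

first    <- x[1]              # positions start at 1
reversed <- x[c(7, 5, 3, 1)]  # returned in the order requested
dropped  <- x[-2]             # everything except the second element
last5    <- x[-(1:5)]         # everything except the first five

# Mixing positive and negative positions is an error.
mixed <- tryCatch(x[c(1, -1)], error = function(e) "error")

print(reversed)
print(last5)
print(mixed)
```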
A logical vector selects elements based on a condition.
This returns the slice of the vector containing the elements where the index is TRUE.
## [1] 1 4 9
This is essentially doing:
## [1] 1 4 9
Remember that logical operations are element-wise. The index vector is recycled to the same length as x. Values corresponding to TRUE in the index vector are selected and those corresponding to FALSE are omitted.
More examples:
## [1] 49 64 81 100
## [1] 1 100
## [1] 1 4 9 16 25 36 49 64 81 100
## [1] 4 16 36 64 100
## [1] 1 9 25 49 81
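The conditions used above are not shown. The following are assumed examples, with x taken to be the first ten squares, that illustrate selecting by condition and by a recycled logical index:

```r
x <- (1:10)^2  # assumed definition: 1 4 9 ... 100

small    <- x[x < 10]              # condition evaluated element by element
large    <- x[x > 40]
ends     <- x[x == 1 | x == 100]   # conditions combined with |
even_pos <- x[c(FALSE, TRUE)]      # index recycled: positions 2, 4, 6, ...
odd_pos  <- x[c(TRUE, FALSE)]      # index recycled: positions 1, 3, 5, ...

print(small)
print(even_pos)
print(odd_pos)
```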
Using names to access named elements
This only applies to named vectors. It works in the same way as using positive numbers to select elements.
We can use a character vector of names to access the part of the vector containing the elements with those names.
#Nobel laureates in Literature
years <- c(2016, 2012, 1954, 1953, 1950)
names(years) <- c("Bob Dylan", "Mo Yan", "Ernest Hemingway", "Winston Churchill", "Bertrand Russell")
years
## Bob Dylan Mo Yan Ernest Hemingway Winston Churchill Bertrand Russell
## 2016 2012 1954 1953 1950
## Bob Dylan
## 2016
## Bob Dylan Winston Churchill
## 2016 1953
This option is particularly useful in connection with data frames, as we shall see in later chapters.
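With the years vector defined above, the two named selections were presumably made along these lines:

```r
# Nobel laureates in Literature (same data as above)
years <- c(2016, 2012, 1954, 1953, 1950)
names(years) <- c("Bob Dylan", "Mo Yan", "Ernest Hemingway",
                  "Winston Churchill", "Bertrand Russell")

one <- years["Bob Dylan"]                          # a single name
two <- years[c("Bob Dylan", "Winston Churchill")]  # a character vector of names

print(one)
print(two)
```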
To change the value of a specific item, refer to the index number.
## [1] 1 4 9 16 25 -36 49 64 81 100
## [1] 1 4 9 16 25 36 49 64 81 100 0
Appending value(s) to a vector
Vector constructor.
## [1] 1 4 9 16 25 36 49 64 81 100 121 144
Element assignment.
## [1] 1 4 9 16 25 36 49 64 81 100 121 144
If we assign value(s) to a position past the end of the vector, R extends the vector and fills the gap with NAs.
## [1] 1 4 9 16 25 36 49 64 81 100 NA NA NA NA 225
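A sketch of the two ways of appending and of the NA filling, assuming x holds the first ten squares as in the earlier results:

```r
x <- (1:10)^2  # assumed definition: 1 4 9 ... 100

# Vector constructor: build a new, longer vector.
via_c <- c(x, 121, 144)

# Element assignment: write to positions just past the end.
via_index <- x
via_index[11:12] <- c(121, 144)

# Assigning further past the end leaves NA in the gap.
gapped <- x
gapped[15] <- 225

print(via_c)
print(gapped)
```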
4.8 Data types
A vector can contain numbers, strings, or logical values, but not a mixture. Vectors must have their values all of the same mode.
In R, every object has a mode. It indicates how an object is stored in memory. Is it a number, a character string, a function, or something else?
More often, you may hear class. In R, every object also has a class, which determines what information an object contains, and how an object will be interpreted and used.
To complicate matters, an object may have the mode “numeric” but the class “Date”. In the example below, d consists of a single number (the number of days since January 1, 1970), but is interpreted as a date.
## [1] "numeric"
## [1] "Date"
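The definition of d is not shown. The date below is an arbitrary assumption, picked so that the underlying day count is easy to verify:

```r
d <- as.Date("1970-01-10")  # assumed example date

print(mode(d))     # how the value is stored
print(class(d))    # how the value is interpreted
print(unclass(d))  # the stored number: days since January 1, 1970
```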
Does this sound too complicated? Don’t worry! Modes mostly exist for legacy purposes, so in practice you should only ever need to use an object’s class.
To get a vector’s class, use class(). To test whether a vector belongs to a particular class, use the is.* functions.
## [1] TRUE
## [1] TRUE
## [1] TRUE
coercion
If we create a vector from mixed elements, R will convert them to a single type. The rule is to convert from more specific types to more general types.
## [1] 1 2 3 1 0
## [1] "1" "2" "3" "A" "B" "C"
## [1] "A" "B" "C" "TRUE" "FALSE"
## [1] "1" "2" "3" "A" "B" "C" "TRUE" "FALSE"
Coercion rules, as exemplified in these examples above:
- Logical values are converted to numbers: TRUE is converted to 1 and FALSE to 0.
- The ordering is roughly logical < numeric < character.
Note: Object attributes will be dropped when an object is coerced from one type to another. We will explain attributes towards the end of this module.
We can change the type of an object using as.* functions.
## [1] "1"
## [1] 1
## [1] 1
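Both kinds of coercion can be sketched together. The inputs are assumptions, since the original calls are not shown:

```r
# Implicit coercion: c() converts everything to the most general type present.
num_and_lgl <- c(1, 2, 3, TRUE, FALSE)  # logical -> numeric
num_and_chr <- c(1, 2, 3, "A")          # numeric -> character
chr_and_lgl <- c("A", TRUE)             # logical -> character

# Explicit coercion with as.* functions.
as_chr      <- as.character(1)
chr_to_num  <- as.numeric("1")
lgl_to_num  <- as.numeric(TRUE)

print(num_and_lgl)
print(num_and_chr)
print(chr_and_lgl)
```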
empty vectors
A vector can be empty and still have a mode.
## character(0)
## numeric(0)
We can create vectors of a specified data type and a specified length using vector(). Each element is filled with the default value for that type.
## [1] 0 0 0 0 0 0
## [1] FALSE FALSE FALSE FALSE FALSE FALSE
## [1] "" "" "" "" "" ""
## [1] 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i
## [1] 00 00 00 00 00 00
4.9 Attributes
Attributes are properties of an object. Mode is a special case of an object’s attribute.
All vectors also have a length attribute, which tells us how many elements they contain.
## [1] 5
## [1] 3
## [1] 1
## [1] 5
Named vectors have a names attribute.
## banana apple kiwi fruit
## 1 2 3 4
The function names() can be used to add elements’ names as well as to retrieve the names of a vector.
## [1] "banana" "apple" "kiwi fruit" ""
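The definition of the named vector is not shown. The sketch below assumes one that prints like the example above and uses names() both to read and to set names:

```r
# Assumed definition: three named elements and one unnamed element.
fruit <- c(banana = 1, apple = 2, "kiwi fruit" = 3, 4)

before <- names(fruit)     # retrieve: the unnamed element shows as ""
names(fruit)[4] <- "pear"  # assign a name to the fourth element
after <- names(fruit)

print(before)
print(after)
```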
[5] Although some may say there are three. As we will see shortly, NA is also a logical value, meaning “not available”.