A vector is a collection of values. “Vector” means different things in different fields (mathematics, geometry, biology), but in R it is simply the name for an ordered collection of values. We call the individual values the elements of the vector. Vectors are among the most common data structures you will work with in R.
We can make vectors with the function c(), where the c stands for “combine.” R is obsessed with vectors: in R, even single numbers are vectors of length one. Many things that can be done with a single number can also be done with a vector. For example, arithmetic can be done on vectors just as it can on single numbers.
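To see both ideas at the console, here is a small sketch. The names x and y are throwaway objects chosen for illustration; they are not used elsewhere in this chapter.

```r
# A single number is already a vector of length one
x <- 5
is.vector(x)   # TRUE
length(x)      # 1

# Arithmetic on a vector is applied to every element at once
y <- c(1, 2, 3)
y * 2          # 2 4 6
y + 10         # 11 12 13
```

No loop is needed: R repeats the operation for each element of y.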
Let’s say that we have a group of patients in our clinic. We can store their names in a vector.
patients <- c("Maria", "John", "Ali", "Luis", "Mei")
patients
## [1] "Maria" "John" "Ali" "Luis" "Mei"
If we later want to add a name, we can combine the existing vector with the new value:
patients <- c(patients, "Emma")
patients
## [1] "Maria" "John" "Ali" "Luis" "Mei" "Emma"
Maybe we also want to store the weights of these patients. Since these are their weights in pounds, we will call our object weight_lb.
weight_lb <- c(122, 320, 217, 142, 174, 252)
weight_lb
## [1] 122 320 217 142 174 252
So far, we have created vectors of two different data types: character and numeric.
You can do arithmetic with numeric vectors. For example, let’s convert the weight of our patients from pounds to kilograms by dividing each weight in pounds by 2.2.
We could do this one by one:
122 / 2.2
## [1] 55.45455
320 / 2.2
## [1] 145.4545
217 / 2.2
## [1] 98.63636
But that would be a long and tedious process, especially if you had more than 6 patients.
Instead, let’s divide the whole vector by 2.2 and save the result to a new object, which we will call weight_kg.
weight_kg <- weight_lb / 2.2
# you could also round the weight
weight_kg <- round(weight_lb / 2.2, digits = 2)
weight_kg
## [1] 55.45 145.45 98.64 64.55 79.09 114.55
We could use the mean() function to find the mean weight (in pounds) of patients at our clinic:
mean(weight_lb)
## [1] 204.5
You cannot do arithmetic with character vectors. Remember that we used c() to add a value to our character vector; trying to add a name with + instead produces an error:
patients + "Sue"
## Error in patients + "Sue": non-numeric argument to binary operator
You can also combine two vectors into one with c().
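A minimal illustration, assuming two hypothetical vectors of names from two clinics. The objects clinic_a, clinic_b and all_patients are made up for this example and are not part of the patient data above.

```r
# Two hypothetical vectors of patient names
clinic_a <- c("Maria", "John")
clinic_b <- c("Ali", "Luis")

# c() joins the vectors end to end, keeping their order
all_patients <- c(clinic_a, clinic_b)
all_patients          # "Maria" "John" "Ali" "Luis"
length(all_patients)  # 4
```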
There are numerous data types. The most common ones you will encounter are numeric, character, and logical data. Vectors that hold only one data type are called atomic vectors. Read more about vectors and data types in the book R for Data Science.
Another common data type is logical data. Logical data takes the values TRUE and FALSE, and NA is used when a value is missing.
We want to record if our patients have been fully vaccinated. We will record this as TRUE if they have been, FALSE if they have not been, and NA if we do not have this information.
vax_status <- c(TRUE, TRUE, FALSE, NA, TRUE, FALSE)
vax_status
## [1] TRUE TRUE FALSE NA TRUE FALSE
All vectors have a length, which you can find with the length() function:
length(vax_status)
## [1] 6
The length of a vector is simply the number of elements it contains.
You can always find out the data type of your vector with the class() function:
class(patients)
## [1] "character"
class(weight_lb)
## [1] "numeric"
class(vax_status)
## [1] "logical"
4.9.1 Missing Data
Missing data is a very common occurrence, and R has many tools to help deal with it.
Suppose you try to calculate the mean of a vector with some missing values (represented in R by the logical value NA). For example, what if we had failed to record the weight of some of the patients at our clinic, so that our weight vector looks like this:
missing_wgt <- c(122, NA, 217, NA, 174, 252)
mean(missing_wgt)
## [1] NA
The missing values do not cause an error, but they propagate: R cannot know what the missing weights are, so it cannot compute the mean and returns NA instead.
To get around this, you can use the argument na.rm = TRUE, which tells R to remove the NA values before performing the calculation.
mean(missing_wgt, na.rm = TRUE)
## [1] 191.25
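Besides na.rm, the base function is.na() is handy for finding and counting missing values. The sketch below recreates missing_wgt so that it runs on its own. The last line removes the NA values by hand, which should give the same result as na.rm = TRUE.

```r
missing_wgt <- c(122, NA, 217, NA, 174, 252)

# is.na() returns TRUE for each missing element
is.na(missing_wgt)        # FALSE TRUE FALSE TRUE FALSE FALSE

# Summing a logical vector counts the TRUEs, i.e. the missing values
sum(is.na(missing_wgt))   # 2

# Keep only the non-missing elements, then take the mean
mean(missing_wgt[!is.na(missing_wgt)])   # 191.25
```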
For more on how to work with missing data, check out this Data Carpentry lesson.
Another important type of vector is a factor. Factors are the way R stores categorical variables. In a factor, the levels of a categorical variable are mapped onto a vector of integers, much like coding the responses to a survey. Be careful: factors look like character data, but they are stored as integers, so they do not always behave the way character data would. For more on factors, check out the Software Carpentry lesson on R.
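A short sketch of what a factor looks like, using a hypothetical smoking-status variable. The objects smoker and visits are illustrative names and are not part of the clinic data above. By default, factor() sorts the levels alphabetically, and each value is stored as an integer code that points to one of those levels. The second half shows the classic pitfall this warning refers to.

```r
# Hypothetical smoking status for four patients
smoker <- factor(c("never", "current", "former", "never"))

levels(smoker)      # "current" "former" "never"
as.integer(smoker)  # 3 1 2 3  (the integer code behind each value)
class(smoker)       # "factor"

# Pitfall: numbers stored as a factor convert to their codes, not their values
visits <- factor(c("10", "20", "10"))
as.numeric(visits)                  # 1 2 1
as.numeric(as.character(visits))    # 10 20 10
```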
4.9.2 Mixing types
We said above that vectors are supposed to have only one data type, but what happens if we mix multiple data types in one vector?
Sometimes the best way to understand R is to try some examples and see what it does.
Create a vector with some patient names and weights together. What happens? What if you combine names and vax status? All three?
Because vectors can only contain one type of thing, R chooses a lowest common denominator type of vector: a type that can contain everything we are trying to put in it. A different language might stop with an error, but R tries to soldier on as best it can. A number can be represented as a character string, but a character string cannot be represented as a number, so when we try to put both in the same vector, R converts everything to a character string.
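Here is one way to check this rule for the experiments suggested above. The objects mixed, num_lgl and chr_lgl are hypothetical names used only for this sketch. Logical values sit below numbers in the hierarchy, so TRUE and FALSE become 1 and 0 when they are mixed with numbers.

```r
# Character + number: everything becomes character
mixed <- c("Maria", 122)
mixed           # "Maria" "122"
class(mixed)    # "character"

# Number + logical: TRUE becomes 1, FALSE becomes 0
num_lgl <- c(122, TRUE, FALSE)
num_lgl         # 122 1 0
class(num_lgl)  # "numeric"

# Character + logical: everything becomes character
chr_lgl <- c("Maria", TRUE)
chr_lgl         # "Maria" "TRUE"
```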
4.9.3 Indexing and Subsetting vectors
Access elements of a vector with [ ], for example:
patients[1]
## [1] "Maria"
patients[4]
## [1] "Luis"
You can also assign to a specific element of a vector.
patients[2] <- "Jon"
patients
## [1] "Maria" "Jon" "Ali" "Luis" "Mei" "Emma"
Can we use a vector to index another vector? Yes!
vaxInd <- c(1, 2, 5)
patients[vaxInd]
## [1] "Maria" "Jon" "Mei"
vax_patients <- patients[vaxInd]
vax_patients
## [1] "Maria" "Jon" "Mei"
We can also pass an index vector directly, without saving it to an object first:
patients[c(1, 3, 5)]
## [1] "Maria" "Ali" "Mei"
4.9.4 Data frames and tibbles
The other main data structure you are likely to work with is the data frame. Vectors are one-dimensional data structures, whereas a data frame is two-dimensional (often called tabular or rectangular data), similar to the way you might be used to seeing data in a spreadsheet. You can think of a data frame as a set of columns, each of which is a vector. Like any atomic vector, each column holds a single data type, but the data frame as a whole can hold multiple data types.
A tibble is a particular class of data frame that is common in the tidyverse family of packages. Tibbles are useful for their printing properties and because they are less likely to change the data type of columns on import (for example, from character to factor).
Since vectors form the columns of data frames, we can take the vectors we created for our patient data and combine them together in a data frame.
patient_data <- data.frame(patients, weight_lb, weight_kg, vax_status)
patient_data
##   patients weight_lb weight_kg vax_status
## 1    Maria       122     55.45       TRUE
## 2      Jon       320    145.45       TRUE
## 3      Ali       217     98.64      FALSE
## 4     Luis       142     64.55         NA
## 5      Mei       174     79.09       TRUE
## 6     Emma       252    114.55      FALSE
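Because each column is a vector, everything shown earlier in this chapter still applies once you pull a column out with $. The sketch below rebuilds patient_data so that it runs on its own. It passes stringsAsFactors = FALSE explicitly, which has been the default since R 4.0.0 and keeps patients a character column on older versions too. The subsetting in the last line uses the [rows, columns] form, which is an addition for illustration.

```r
patients   <- c("Maria", "Jon", "Ali", "Luis", "Mei", "Emma")
weight_lb  <- c(122, 320, 217, 142, 174, 252)
weight_kg  <- round(weight_lb / 2.2, digits = 2)
vax_status <- c(TRUE, TRUE, FALSE, NA, TRUE, FALSE)
patient_data <- data.frame(patients, weight_lb, weight_kg, vax_status,
                           stringsAsFactors = FALSE)

# A data frame is two-dimensional: rows by columns
dim(patient_data)              # 6 4

# Each column is a vector, extracted by name with $
mean(patient_data$weight_lb)   # 204.5

# Each column keeps its own data type
sapply(patient_data, class)

# Rows where weight is over 200 lb, "patients" column only
patient_data[patient_data$weight_lb > 200, "patients"]   # "Jon" "Ali" "Emma"
```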