7.2 Logical Indexing
The second way to index vectors is with logical vectors. A logical vector is a vector that only contains TRUE and FALSE values. In R, true values are designated with TRUE, and false values with FALSE. When you index a vector with a logical vector, R will return the values of the vector for which the indexing vector is TRUE. If that was confusing, think about it this way: a logical vector, combined with the brackets [ ], acts as a filter for the vector it is indexing. It only lets through the values of the vector for which the logical vector is TRUE.
You can create logical vectors directly using c(). For example, I could access every other value of the following vector as follows:
a <- c(1, 2, 3, 4, 5)
a[c(TRUE, FALSE, TRUE, FALSE, TRUE)]
## [1] 1 3 5
As you can see, R returns all values of the vector a for which the logical vector is TRUE.
However, creating logical vectors using c() is tedious. Instead, it’s better to create logical vectors from existing vectors using comparison operators like < (less than), == (equal to), and != (not equal to). A complete list of the most common comparison operators is in Figure 7.3. For example, let’s create some logical vectors from our boat.ages vector:
# Which ages are > 100?
boat.ages > 100
## [1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
# Which ages are equal to 23?
boat.ages == 23
## [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
# Which boat names are equal to c?
boat.names == "c"
## [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
You can also create logical vectors by comparing a vector to another vector of the same length. When you do this, R will compare values in the same position (e.g., the first values will be compared, then the second values, etc.). For example, we can compare the boat.costs and boat.prices vectors to see which boats sold for a higher price than their cost:
# Which boats had a higher price than cost?
boat.prices > boat.costs
## [1] TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE TRUE TRUE
# Which boats had a lower price than cost?
boat.prices < boat.costs
## [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
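To make the position-by-position comparison concrete, here is a minimal sketch using two short, made-up vectors. The names price and cost and their values are hypothetical and are not part of the boat data:

```r
# Hypothetical data: three items, each with a sale price and a cost
price <- c(10, 20, 30)
cost  <- c(12, 15, 30)

# R compares position 1 with position 1, position 2 with position 2, and so on
price > cost
## [1] FALSE  TRUE FALSE

# The resulting logical vector can be used right away as an index
price[price > cost]
## [1] 20
```

Note that the third item returns FALSE: a price equal to its cost is not greater than it.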
Once you’ve created a logical vector using a comparison operator, you can use it to index any vector with the same length. Here, I’ll use logical vectors to get the prices of boats whose ages were greater than 100:
# What were the prices of boats older than 100?
boat.prices[boat.ages > 100]
## [1] 53 54 264 532
Here’s how logical indexing works step-by-step:
# Which boats are older than 100 years?
boat.ages > 100
## [1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
# Writing the logical index by hand (you'd never do this!)
# Show me all of the boat prices where the logical vector is TRUE:
boat.prices[c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)]
## [1] 53 54 264 532
# Doing it all in one step! You get the same answer:
boat.prices[boat.ages > 100]
## [1] 53 54 264 532
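Because a logical vector is an ordinary R object, you can also save it and reuse it to filter several vectors. The sketch below assumes values for the boat vectors that are consistent with the outputs printed in this section (the actual definitions appear earlier in the chapter), and is.old is a hypothetical name:

```r
# Assumed values, chosen to match the outputs printed in this section
boat.names  <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
boat.ages   <- c(143, 53, 356, 23, 647, 24, 532, 43, 66, 86)
boat.prices <- c(53, 87, 54, 66, 264, 32, 532, 58, 99, 132)

# Save the logical vector under a (hypothetical) name
is.old <- boat.ages > 100

# Reuse it to filter any vector of the same length
boat.prices[is.old]
## [1]  53  54 264 532
boat.names[is.old]
## [1] "a" "c" "e" "g"
```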
7.2.1 & (and), | (or), %in%
In addition to using single comparison operators, you can combine multiple logical vectors using the OR (which looks like |) and AND (which looks like &) operators. The OR | operation will return TRUE if any of the logical vectors is TRUE, while the AND & operation will only return TRUE if all of the values in the logical vectors are TRUE. This is especially powerful when you want to create a logical vector based on criteria from multiple vectors.
For example, let’s create a logical vector indicating which boats had a price greater than 200 OR less than 100, and then use that vector to see what the names of these boats were:
# Which boats had prices greater than 200 OR less than 100?
boat.prices > 200 | boat.prices < 100
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
# What were the NAMES of these boats?
boat.names[boat.prices > 200 | boat.prices < 100]
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i"
You can combine as many logical vectors as you want (as long as they all have the same length!):
# Boat names of boats with a color of black OR with a price > 100
boat.names[boat.colors == "black" | boat.prices > 100]
## [1] "a" "e" "g" "i" "j"
# Names of blue boats with a price greater than 200
boat.names[boat.colors == "blue" & boat.prices > 200]
## [1] "e"
You can also mix OR and AND to create increasingly complex selection criteria. For example, the following logical vector returns TRUE for cases where the boat colors are black OR brown, AND where the price was less than 100:
# Which boats were either black or brown, AND had a price less than 100?
(boat.colors == "black" | boat.colors == "brown") & boat.prices < 100
## [1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
# What were the names of these boats?
boat.names[(boat.colors == "black" | boat.colors == "brown") & boat.prices < 100]
## [1] "a" "i"
When using multiple criteria, make sure to use parentheses when appropriate. R evaluates & before |, so if I didn’t use parentheses above, the price condition would apply only to the brown boats, and I would get a different answer.
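The effect of the parentheses can be checked directly. This sketch assumes values for the boat vectors that are consistent with the outputs printed in this section (the actual definitions appear earlier in the chapter); with.parens and no.parens are hypothetical names:

```r
# Assumed values, chosen to match the outputs printed in this section
boat.names  <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
boat.colors <- c("black", "green", "pink", "blue", "blue",
                 "green", "green", "yellow", "black", "black")
boat.prices <- c(53, 87, 54, 66, 264, 32, 532, 58, 99, 132)

# With parentheses: (black OR brown) AND price < 100
with.parens <- (boat.colors == "black" | boat.colors == "brown") & boat.prices < 100
boat.names[with.parens]
## [1] "a" "i"

# Without parentheses, R evaluates & before |, so this means:
# black OR (brown AND price < 100)
no.parens <- boat.colors == "black" | boat.colors == "brown" & boat.prices < 100
boat.names[no.parens]
## [1] "a" "i" "j"
```

Under these assumed values, boat j is black but costs more than 100, so it slips through only when the parentheses are missing.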
The %in% operator helps you to easily combine multiple OR comparisons. Imagine you have a vector of categorical data that can take on many different values. For example, you could have a vector x indicating people’s favorite letters.
x <- c("a", "t", "a", "b", "z")
Now, let’s say you want to create a logical vector indicating which values are either a or b or c or d. You could create this logical vector with multiple | (OR) commands:
x == "a" | x == "b" | x == "c" | x == "d"
## [1] TRUE FALSE TRUE TRUE FALSE
However, this takes a long time to write. Thankfully, the %in% operator allows you to combine multiple OR comparisons much faster. To use %in%, just put it in between the original vector and a new vector of possible values. The %in% operator goes through every value in the vector x, and returns TRUE if it finds it in the vector of possible values; otherwise it returns FALSE.
x %in% c("a", "b", "c", "d")
## [1] TRUE FALSE TRUE TRUE FALSE
As you can see, the result is identical to our previous result.
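If you want R to confirm the match rather than comparing the printed output by eye, a small sketch like the following works. The names long.way and short.way are hypothetical:

```r
x <- c("a", "t", "a", "b", "z")

# The chained OR comparisons and the %in% shortcut
long.way  <- x == "a" | x == "b" | x == "c" | x == "d"
short.way <- x %in% c("a", "b", "c", "d")

# Both approaches give exactly the same logical vector
identical(long.way, short.way)
## [1] TRUE

# Like any logical vector, the result can be used as an index
x[short.way]
## [1] "a" "a" "b"
```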
7.2.2 Counts and percentages from logical vectors
Many R functions that expect numbers will interpret TRUE values as 1 and FALSE values as 0. This allows us to easily answer questions like “How many values in a data vector are greater than 0?” or “What percentage of values are equal to 5?” by applying the sum() or mean() function to a logical vector.
We’ll start with a vector x of length 8, containing 3 positive numbers and 5 negative numbers.
x <- c(1, 2, 3, -5, -5, -5, -5, -5)
We can create a logical vector to see which values are greater than 0:
x > 0
## [1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
Now, we’ll use sum() and mean() on that logical vector to see how many of the values in x are positive, and what proportion are positive. We should find that there are 3 TRUE values, and that 37.5% of the values (3 / 8) are TRUE (the output below is rounded to two digits).
sum(x > 0)
## [1] 3
mean(x > 0)
## [1] 0.38
This is a really powerful tool. Pretty much any time you want to answer a question like “How many of X are Y” or “What percent of X are Y”, you can use the sum() or mean() function with a logical vector as an argument.
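As a further sketch with the same vector x, here is the count and the proportion of values equal to -5. Since mean() returns a proportion between 0 and 1, multiplying by 100 turns it into a percentage:

```r
x <- c(1, 2, 3, -5, -5, -5, -5, -5)

# How many values are equal to -5?
sum(x == -5)
## [1] 5

# What proportion of values are equal to -5?
mean(x == -5)
## [1] 0.625

# Multiply by 100 to express the proportion as a percentage
mean(x == -5) * 100
## [1] 62.5
```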
7.2.3 Additional Logical functions
R has lots of special functions that take vectors as arguments, and return logical vectors based on multiple criteria. For example, you can use the is.na() function to test which values of a vector are missing. Table 7.1 contains some that I frequently use:
Function | Description | Example | Result |
---|---|---|---|
is.na(x) | Which values in x are NA? | is.na(c(2, NA, 5)) | FALSE, TRUE, FALSE |
is.finite(x) | Which values in x are finite numbers (not NA, NaN, or infinite)? | is.finite(c(NA, 89, 0)) | FALSE, TRUE, TRUE |
duplicated(x) | Which values in x repeat an earlier value? | duplicated(c(1, 4, 1, 2)) | FALSE, FALSE, TRUE, FALSE |
which(x) | Which positions in the logical vector x are TRUE? | which(c(TRUE, FALSE, TRUE)) | 1, 3 |
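The first three functions in the table can be tried together on one small vector. The vector v below is made up for illustration and contains a missing value, an infinite value, and a repeated value:

```r
# A small made-up vector with a missing value, an infinite value, and a repeat
v <- c(3, NA, 3, Inf)

# Which values are missing?
is.na(v)
## [1] FALSE  TRUE FALSE FALSE

# Which values are finite numbers? (NA and Inf are not)
is.finite(v)
## [1]  TRUE FALSE  TRUE FALSE

# Which values repeat an earlier value?
duplicated(v)
## [1] FALSE FALSE  TRUE FALSE

# Because the results are logical vectors, sum() counts the TRUE values
sum(is.na(v))
## [1] 1
```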
Logical vectors aren’t just good for indexing; you can also use them to figure out which values in a vector satisfy some criteria. To do this, use the function which(). If you apply the function which() to a logical vector, R will return the positions (indices) of the values that are TRUE. For example:
# A vector of sex information
sex <- c("m", "m", "f", "m", "f", "f")
# Which values of sex are m?
which(sex == "m")
## [1] 1 2 4
# Which values of sex are f?
which(sex == "f")
## [1] 3 5 6
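The positions returned by which() can themselves be used as an index, and they select the same values as the logical vector does. In this sketch, age is a hypothetical vector describing the same six people, and f.positions is a hypothetical name:

```r
sex <- c("m", "m", "f", "m", "f", "f")
# Hypothetical ages for the same six people
age <- c(30, 25, 41, 19, 52, 36)

# Positions of the "f" values
f.positions <- which(sex == "f")
f.positions
## [1] 3 5 6

# Indexing by position and indexing by logical vector agree
age[f.positions]
## [1] 41 52 36
age[sex == "f"]
## [1] 41 52 36
```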