1.8 Extracting / subsetting
Sometimes you want just part of an object. In some cases you will use square [ ] brackets or double square [[ ]] brackets, and in other cases you will use a dollar sign $.
1.8.1 Extracting elements from a vector
Given the following 3-element vector x,
## [1] 8 4 10
we can, for example, extract the 2nd element,
## [1] 4
the first 2 elements,
## [1] 8 4
the first and third elements,
## [1] 8 10
all but the first element of x,
## [1] 4 10
or the elements that meet a certain condition.
## [1] 8 10
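The commands behind the output above are not shown, so here is a minimal sketch of code that reproduces it. The condition `x > 5` in the last line is an assumption; any condition that is true for 8 and 10 but not for 4 would give the same result.

```r
x <- c(8, 4, 10)

x[2]         # 2nd element: 4
x[1:2]       # first 2 elements: 8 4
x[c(1, 3)]   # first and third elements: 8 10
x[-1]        # all but the first element: 4 10
x[x > 5]     # elements meeting a condition (assumed to be x > 5): 8 10
```

A negative index drops elements rather than selecting them, and a logical vector inside [ ] keeps the positions that are TRUE.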
1.8.2 Extracting elements from a matrix or array
Similarly, we can extract elements from a matrix or array, but now we need multiple indices separated by commas. For example, given the following 2-dimensional matrix x,
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
we can, for example, extract the element in the 2nd row and 3rd column (the first index refers to the row, the second to the column),
## [1] 6
the first row (make sure to include the comma!),
## [1] 1 3 5
or the third column (make sure to include the comma!).
## [1] 5 6
If you leave out the comma, you will get an answer, not an error. For example:
## [1] 3
But it is ambiguous to say “the 3rd element of a matrix” since you could go down columns or across rows. R has a default (it counts down the columns, which is why the answer above is 3), but rather than try to remember what that is, just do not forget the comma and then there is no ambiguity.
NOTE: Best practice is to avoid coding shortcuts unless there is a clear need (like optimizing speed). Use clear, unambiguous code.
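As a sketch of the matrix examples above, the code below builds a matrix that matches the printed one and repeats each extraction. Creating it with `matrix(1:6, nrow = 2)` is an assumption; the original construction is not shown.

```r
x <- matrix(1:6, nrow = 2)   # filled down the columns by default

x[2, 3]   # 2nd row, 3rd column: 6
x[1, ]    # first row: 1 3 5
x[, 3]    # third column: 5 6
x[3]      # no comma: 3rd element counting down the columns: 3
```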
Subsetting an array is similar to subsetting a matrix. However, unlike a matrix, an array can have more than two dimensions so you must have as many indices as you have dimensions. For example, given the 3-dimensional array z,
## , , 1
##
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## , , 2
##
## [,1] [,2]
## [1,] 5 7
## [2,] 6 8
we can extract a single element,
## [1] 6
a sub-vector,
## [1] 5 7
or a sub-matrix,
## [,1] [,2]
## [1,] 3 7
## [2,] 4 8
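A minimal sketch of the array examples follows. The call `array(1:8, dim = c(2, 2, 2))` and the particular indices are assumptions chosen to reproduce the output shown above; other index choices could return the same values.

```r
z <- array(1:8, dim = c(2, 2, 2))

z[2, 1, 2]   # a single element: 6
z[1, , 2]    # a sub-vector (row 1 of the 2nd layer): 5 7
z[, 2, ]     # a sub-matrix (column 2 of every layer)
```

Leaving an index position blank keeps everything along that dimension, which is why the number of commas must always be one less than the number of dimensions.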
1.8.3 Extracting elements from a list
For a list, you can use single square [ ] brackets or double square [[ ]] brackets, depending on what you want to extract. For example, given the list x, containing a character string, a numeric vector, and the factor object y,
## [[1]]
## [1] "5"
##
## [[2]]
## [1] 1 2 3
##
## [[3]]
## [1] Underweight Underweight Normal Overweight Normal
## Levels: Underweight Normal Overweight
we can use [ ] to extract a sub-list containing only, for example, the first element,
## [[1]]
## [1] "5"
## [1] "list"
or multiple elements,
## [[1]]
## [1] "5"
##
## [[2]]
## [1] Underweight Underweight Normal Overweight Normal
## Levels: Underweight Normal Overweight
## [1] "list"
or we can use [[ ]] to extract a single element, which will have the class of that element.
## [1] "5"
## [1] "character"
## [1] 1 2 3
## [1] "numeric"
## [1] Underweight Underweight Normal Overweight Normal
## Levels: Underweight Normal Overweight
## [1] "factor"
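The following sketch reconstructs the list examples. The way `y` and `x` are created here is an assumption that matches the printed output; the original definitions are not shown.

```r
y <- factor(c("Underweight", "Underweight", "Normal", "Overweight", "Normal"),
            levels = c("Underweight", "Normal", "Overweight"))
x <- list("5", c(1, 2, 3), y)

# Single brackets always return a list
x[1]
class(x[1])          # "list"
x[c(1, 3)]
class(x[c(1, 3)])    # "list"

# Double brackets return the element itself
x[[1]]
class(x[[1]])        # "character"
x[[2]]
class(x[[2]])        # "numeric"
x[[3]]
class(x[[3]])        # "factor"
```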
1.8.4 Extracting elements from a data frame
Recall that a data.frame is a special type of list where each element is one of the columns. You can access the elements of a data.frame in a number of ways, including the $ method. Also, when subsetting we can use the column names. For example, given the data frame x,
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))
x
## outcome exposure age
## 1 1 Yes 24
## 2 0 Yes 55
## 3 1 No 39
## 4 1 No 18
we can extract a data.frame made up of a subset of columns using [ ],
## outcome exposure
## 1 1 Yes
## 2 0 Yes
## 3 1 No
## 4 1 No
## [1] "data.frame"
## outcome exposure
## 1 1 Yes
## 2 0 Yes
## 3 1 No
## 4 1 No
## [1] "data.frame"
or a single column of the data.frame, returned as the class of that column, using [[ ]] or $.
## [1] 24 55 39 18
## [1] "numeric"
## [1] 24 55 39 18
## [1] "numeric"
## [1] 24 55 39 18
## [1] "numeric"
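Here is a sketch of commands that would produce the column extractions above. Which form the original used for each output (position or name) is an assumption; both give the same result.

```r
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))

# Single brackets: a data.frame with a subset of columns
x[1:2]                        # by position
x[c("outcome", "exposure")]   # by name
class(x[1:2])                 # "data.frame"

# Double brackets or $: the column itself
x[[3]]
x[["age"]]
x$age
class(x$age)                  # "numeric"
```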
When using the $ method, if the variable name has spaces, then enclose it in ` ` (not regular quotes) when extracting it. To illustrate, let’s change the names of this data.frame by assigning a new value to its names().
## Outcome Level Exposure Age
## 1 1 Yes 24
## 2 0 Yes 55
## 3 1 No 39
## 4 1 No 18
To extract the first column, whose name has a space, we must enclose the name in ` `.
## [1] 1 0 1 1
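As a sketch, the renaming and extraction could look like the following. The new names are taken from the printed output; the data.frame is rebuilt here only so that the example runs on its own.

```r
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))

# Assign new column names, one of which contains a space
names(x) <- c("Outcome Level", "Exposure", "Age")
x

# Backticks let $ handle a name that contains a space
x$`Outcome Level`   # 1 0 1 1
```

With [[ ]], the name is passed as an ordinary character string, so `x[["Outcome Level"]]` returns the same column.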
You can also extract elements of a data.frame using matrix indexing. For example, you can extract the first column of x, returning a vector,
## [1] 1 0 1 1
## [1] "numeric"
the first row of x, returning a data.frame with 1 row,
## Outcome Level Exposure Age
## 1 1 Yes 24
## [1] "data.frame"
or the first column of x, returning a vector of the class of that column,
## [1] 1 0 1 1
## [1] "numeric"
or a data.frame if you include drop = FALSE.
## Outcome Level
## 1 1
## 2 0
## 3 1
## 4 1
## [1] "data.frame"
If extracting more than 1 column, drop = FALSE is not necessary to return a data.frame.
## Outcome Level Exposure
## 1 1 Yes
## 2 0 Yes
## 3 1 No
## 4 1 No
## [1] "data.frame"
In any of these examples that extract columns using matrix indexing, you can use the column names in place of the column numbers.
## Outcome Level
## 1 1
## 2 0
## 3 1
## 4 1
## [1] "data.frame"
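The matrix-indexing examples above can be sketched as follows. The data.frame is rebuilt so the block runs on its own, and the exact indices are assumptions consistent with the printed output.

```r
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))
names(x) <- c("Outcome Level", "Exposure", "Age")

x[, 1]                  # first column, as a vector
class(x[, 1])           # "numeric"

x[1, ]                  # first row, as a 1-row data.frame
class(x[1, ])           # "data.frame"

x[, 1, drop = FALSE]    # first column, kept as a data.frame
x[, 1:2]                # two columns: already a data.frame

# Column names work in place of column numbers
x[, "Outcome Level", drop = FALSE]
```

Writing `FALSE` in full is a little safer than `F`, because `F` is an ordinary variable that can be reassigned.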
Some data.frame objects have rownames. By default, R just assigns numbers.
## [1] "1" "2" "3" "4"
But suppose we have a data.frame with, say, participant IDs as the row names.
## Outcome Level Exposure Age
## B239 1 Yes 24
## B211 0 Yes 55
## B101 1 No 39
## B439 1 No 18
Then you can subset rows using row names.
## Outcome Level Exposure Age
## B211 0 Yes 55
## B439 1 No 18
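A sketch of the row name examples follows. The IDs are taken from the printed output; assigning them with `rownames(x) <-` is an assumption about how the original was built.

```r
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))
names(x) <- c("Outcome Level", "Exposure", "Age")

rownames(x)   # default row names: "1" "2" "3" "4"

# Use participant IDs as the row names
rownames(x) <- c("B239", "B211", "B101", "B439")

# Subset rows by name (note the comma after the row index)
x[c("B211", "B439"), ]
```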
You can also subset rows of a data.frame using logical statements about the values in the data.frame. For example, suppose we want to extract the rows with Exposure equal to “Yes”. To test for equality we use the == operator (a double equal sign), not a single = (see Section 1.13 for more on logical operators).
## Outcome Level Exposure Age
## B239 1 Yes 24
## B211 0 Yes 55
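The logical subsetting above can be sketched like this. The data.frame is rebuilt so the example is self-contained, and the exact command is an assumption that reproduces the output.

```r
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))
names(x) <- c("Outcome Level", "Exposure", "Age")
rownames(x) <- c("B239", "B211", "B101", "B439")

# The comparison returns one TRUE/FALSE per row
x$Exposure == "Yes"        # TRUE TRUE FALSE FALSE

# Placing it in the row position keeps only the TRUE rows
x[x$Exposure == "Yes", ]
```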