1.8 Extracting / subsetting
Sometimes you want just part of an object. In some cases you will use square [ ] brackets or double square [[ ]] brackets, and in other cases you will use a dollar sign $.
1.8.1 Extracting elements from a vector
Given the following 3-element vector x,
## [1] 8 4 10
we can, for example, extract the 2nd element,
## [1] 4
the first 2 elements,
## [1] 8 4
the first and third elements,
## [1] 8 10
all but the first element of x,
## [1] 4 10
or the elements that meet a certain condition.
## [1] 8 10
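The calls that produce the outputs above are not shown here, so the following is a minimal sketch of what they might look like. The condition x > 5 is an assumption; any condition that keeps 8 and 10 gives the same result.

```r
# Recreate the 3-element vector shown above
x <- c(8, 4, 10)

x[2]        # the 2nd element
x[1:2]      # the first 2 elements
x[c(1, 3)]  # the first and third elements
x[-1]       # all but the first element (a negative index drops elements)
x[x > 5]    # elements meeting a condition (x > 5 is an assumed example)
```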
1.8.2 Extracting elements from a matrix or array
Similarly, we can extract elements from a matrix or array, but now we need multiple indices separated by commas. For example, given the following 2-dimensional matrix x,
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
we can, for example, extract the element in the 2nd row and 3rd column (the first index refers to the row, the second to the column),
## [1] 6
the first row (make sure to include the comma!),
## [1] 1 3 5
or the third column (make sure to include the comma!).
## [1] 5 6
If you leave out the comma, you will get an answer, not an error. For example:
## [1] 3
But it is ambiguous to say “the 3rd element of a matrix” since you could go down columns or across rows. R's default is to count down the columns, but rather than rely on remembering that, always include the comma so there is no ambiguity.
NOTE: Best practice is to avoid coding shortcuts unless there is a clear need (like optimizing speed). Use clear, unambiguous code.
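As a sketch of the matrix examples above, assuming x was created with matrix(1:6, nrow = 2) (the original call is not shown), the extractions might be written as follows. The last line omits the comma on purpose to show that R then counts down the columns.

```r
# Assumed construction: matrix() fills column by column by default
x <- matrix(1:6, nrow = 2)

x[2, 3]  # element in the 2nd row, 3rd column
x[1, ]   # the first row (note the comma)
x[, 3]   # the third column (note the comma)
x[3]     # no comma: the 3rd element counting down the columns
```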
Subsetting an array is similar to subsetting a matrix. However, unlike a matrix, an array can have more than two dimensions, so you must have as many indices as you have dimensions. For example, given the 3-dimensional array z,
## , , 1
##
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
##
## , , 2
##
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
we can extract a single element,
## [1] 6
a sub-vector,
## [1] 5 7
or a sub-matrix,
##      [,1] [,2]
## [1,]    3    7
## [2,]    4    8
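The indices behind these three results are not shown. One possible set, assuming z was created with array(1:8, dim = c(2, 2, 2)), is sketched below; other index choices could return the same values.

```r
# Assumed construction of the 2 x 2 x 2 array shown above
z <- array(1:8, dim = c(2, 2, 2))

z[2, 1, 2]  # a single element: row 2, column 1, layer 2
z[1, , 2]   # a sub-vector: row 1 of layer 2
z[, 2, ]    # a sub-matrix: column 2 across both layers
```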
1.8.3 Extracting elements from a list
For a list, you can use single square [ ] brackets or double square [[ ]] brackets, depending on what you want to extract. For example, given the list x, containing a character string, a numeric vector, and the factor object y,
## [[1]]
## [1] "5"
##
## [[2]]
## [1] 1 2 3
##
## [[3]]
## [1] Underweight Underweight Normal Overweight Normal
## Levels: Underweight Normal Overweight
we can use [ ] to extract a sub-list containing only, for example, the first element,
## [[1]]
## [1] "5"
## [1] "list"
or multiple elements,
## [[1]]
## [1] "5"
##
## [[2]]
## [1] Underweight Underweight Normal Overweight Normal
## Levels: Underweight Normal Overweight
## [1] "list"
or we can use [[ ]] to extract a single element, which will have the class of that element.
## [1] "5"
## [1] "character"
## [1] 1 2 3
## [1] "numeric"
## [1] Underweight Underweight Normal Overweight Normal
## Levels: Underweight Normal Overweight
## [1] "factor"
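The code for the list examples is not shown, so here is a minimal sketch. The definitions of y and x are assumptions reconstructed from the printed output, and the choice of elements 1 and 3 for the multi-element sub-list is inferred from what was printed.

```r
# Assumed definitions, reconstructed from the printed output
y <- factor(c("Underweight", "Underweight", "Normal", "Overweight", "Normal"),
            levels = c("Underweight", "Normal", "Overweight"))
x <- list("5", c(1, 2, 3), y)

# Single brackets return a sub-list
x[1]
class(x[1])
x[c(1, 3)]
class(x[c(1, 3)])

# Double brackets return the element itself
x[[1]]
class(x[[1]])
x[[2]]
class(x[[2]])
x[[3]]
class(x[[3]])
```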
1.8.4 Extracting elements from a data frame
Recall that a data.frame is a special type of list where each element is one of the columns. You can access the elements of a data.frame in a number of ways, including the $ method. Also, when subsetting we can use the column names. For example, given the data frame x,
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))
x
##   outcome exposure age
## 1       1      Yes  24
## 2       0      Yes  55
## 3       1       No  39
## 4       1       No  18
we can extract a data.frame made up of a subset of columns using [ ],
##   outcome exposure
## 1       1      Yes
## 2       0      Yes
## 3       1       No
## 4       1       No
## [1] "data.frame"
##   outcome exposure
## 1       1      Yes
## 2       0      Yes
## 3       1       No
## 4       1       No
## [1] "data.frame"
or a single column of the data.frame, returned as the class of that column, using [[ ]] or $.
## [1] 24 55 39 18
## [1] "numeric"
## [1] 24 55 39 18
## [1] "numeric"
## [1] 24 55 39 18
## [1] "numeric"
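The extraction calls themselves are not shown above. The following sketch is one plausible reading: two equivalent ways to select the first two columns (by position and by name) and three equivalent ways to pull out the age column. Which exact calls produced the printed output is an assumption.

```r
# Same data frame as defined above
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))

# A subset of columns, returned as a data.frame
x[1:2]                       # by position
class(x[1:2])
x[c("outcome", "exposure")]  # by name
class(x[c("outcome", "exposure")])

# A single column, returned as the class of that column
x[[3]]      # by position
x[["age"]]  # by name
x$age       # with the $ method
class(x$age)
```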
When using the $ method, if the variable name has spaces, then enclose it in ` ` (not regular quotes) when extracting it. To illustrate, let’s change the names of this data.frame by assigning a new value to its names().
##   Outcome Level Exposure Age
## 1             1      Yes  24
## 2             0      Yes  55
## 3             1       No  39
## 4             1       No  18
To extract the first column, whose name has a space, we must enclose the name in ` `.
## [1] 1 0 1 1
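A minimal sketch of the renaming and extraction steps, assuming the new names are exactly those shown in the printed header:

```r
# Same data frame as defined earlier in this section
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))

# Assign new column names, one of which contains a space
names(x) <- c("Outcome Level", "Exposure", "Age")
x

# Backticks let $ refer to a name that contains a space
x$`Outcome Level`
```

With [[ ]] the name is an ordinary character string, so x[["Outcome Level"]] returns the same column.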
You can also extract elements of a data.frame using matrix indexing. For example, you can extract the first column of x, returning a vector,
## [1] 1 0 1 1
## [1] "numeric"
the first row of x, returning a data.frame with 1 row,
##   Outcome Level Exposure Age
## 1             1      Yes  24
## [1] "data.frame"
or the first column of x, returning a vector of the class of that column,
## [1] 1 0 1 1
## [1] "numeric"
or a data.frame if you include drop = FALSE.
##   Outcome Level
## 1             1
## 2             0
## 3             1
## 4             1
## [1] "data.frame"
If you extract more than 1 column, drop = FALSE is not necessary; the result is already a data.frame.
##   Outcome Level Exposure
## 1             1      Yes
## 2             0      Yes
## 3             1       No
## 4             1       No
## [1] "data.frame"
In any of these matrix-indexing examples, you can refer to columns by name instead of by position.
##   Outcome Level
## 1             1
## 2             0
## 3             1
## 4             1
## [1] "data.frame"
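The matrix-indexing calls are not shown above, so the following is a sketch of calls that would give results of the kinds described. The last call, which combines a column name with drop = FALSE, is an assumed example of using names in place of positions.

```r
# Data frame with the renamed columns used above
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))
names(x) <- c("Outcome Level", "Exposure", "Age")

x[, 1]                # first column, returned as a vector
class(x[, 1])
x[1, ]                # first row, returned as a 1-row data.frame
class(x[1, ])
x[, 1, drop = FALSE]  # first column, kept as a data.frame
class(x[, 1, drop = FALSE])
x[, 1:2]              # two columns: already a data.frame
class(x[, 1:2])
x[, "Outcome Level", drop = FALSE]  # same idea, using the column name
```

Writing FALSE in full rather than F is a defensive habit, because F is an ordinary variable that can be reassigned while FALSE cannot.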
Some data.frame objects have row names. By default, R just assigns numbers.
## [1] "1" "2" "3" "4"
But suppose we have a data.frame with, say, participant IDs as the row names.
##      Outcome Level Exposure Age
## B239             1      Yes  24
## B211             0      Yes  55
## B101             1       No  39
## B439             1       No  18
Then you can subset rows using row names.
##      Outcome Level Exposure Age
## B211             0      Yes  55
## B439             1       No  18
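A sketch of the row-name steps, assuming the IDs were assigned with rownames() (the original assignment is not shown):

```r
# Data frame with the renamed columns used above
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))
names(x) <- c("Outcome Level", "Exposure", "Age")

rownames(x)  # default row names are "1", "2", ...

# Assumed step: use participant IDs as the row names
rownames(x) <- c("B239", "B211", "B101", "B439")
x

# Subset rows by name (the trailing comma keeps all columns)
x[c("B211", "B439"), ]
```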
You can also subset rows of a data.frame using logical statements about the values in the data.frame. For example, suppose we want to extract the rows with Exposure = “Yes”. We use the == (double equal sign) operator for the logical equals operator (see Section 1.13 for more on logical operators).
##      Outcome Level Exposure Age
## B239             1      Yes  24
## B211             0      Yes  55
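The filtering call is not shown above. A minimal sketch places the logical condition in the row position and leaves the column position empty so that all columns are kept:

```r
# Data frame with participant IDs as row names, as above
x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))
names(x) <- c("Outcome Level", "Exposure", "Age")
rownames(x) <- c("B239", "B211", "B101", "B439")

# x$Exposure == "Yes" is a logical vector with one value per row;
# rows where it is TRUE are kept
x[x$Exposure == "Yes", ]
```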