1.8 Extracting / subsetting

Sometimes you want just part of an object. In some cases you will use square [ ] brackets or double square [[ ]] brackets, and in other cases you will use a dollar sign $.

1.8.1 Extracting elements from a vector

Given the following 3-element vector x,

x <- c(8, 4, 10)
x

## [1]  8  4 10

we can, for example, extract the 2nd element,

x[2]

## [1] 4

the first 2 elements,

x[1:2]

## [1] 8 4

the first and third elements,

x[c(1,3)]

## [1]  8 10

all but the first element of x,

x[-1]

## [1]  4 10

or the elements that meet a certain condition.

x[x > 5]

## [1]  8 10
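
The condition inside the brackets is itself a logical vector, and R keeps the elements where it is TRUE. The following is a minimal sketch of that mechanism. The names v and keep are chosen here for illustration (they are not part of the example above) so that x is left unchanged.

```r
# v is a stand-in copy of the example vector; the name is arbitrary
v <- c(8, 4, 10)

# The comparison returns one TRUE/FALSE value per element
keep <- v > 5
keep
## [1]  TRUE FALSE  TRUE

# Indexing with the logical vector keeps the TRUE positions
v[keep]
## [1]  8 10

# which() converts the logical vector to the matching positions
which(keep)
## [1] 1 3
```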

1.8.2 Extracting elements from a matrix or array

Similarly, we can extract elements from a matrix or array, but now we need multiple indices separated by commas. For example, given the following 2-dimensional matrix x,

x <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
x

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

we can, for example, extract the element in the 2nd row and 3rd column (the first index refers to the row, the second to the column),

x[2,3]

## [1] 6

the first row (make sure to include the comma!),

x[1,]

## [1] 1 3 5

or the third column (make sure to include the comma!).

x[,3]

## [1] 5 6
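
Note that a single row or column extracted this way comes back as a plain vector rather than a matrix. The sketch below illustrates this with a stand-in matrix m (a name introduced here, not part of the example above) and shows how the drop argument of [ ] keeps the matrix structure if that is what you need.

```r
# m is a stand-in for the 2 x 3 example matrix
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# By default a single row is returned as a vector, which has no dimensions
dim(m[1, ])
## NULL

# drop = FALSE keeps the result as a 1 x 3 matrix
m[1, , drop = FALSE]
##      [,1] [,2] [,3]
## [1,]    1    3    5

dim(m[1, , drop = FALSE])
## [1] 1 3
```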

If you leave out the comma, you will get an answer, not an error. For example:

x[3]

## [1] 3

But it is ambiguous to say “the 3rd element of a matrix” since you could go down columns or across rows. R's default is to count down the columns, but rather than rely on remembering that, just do not forget the comma and then there is no ambiguity.

NOTE: Best practice is to avoid coding shortcuts unless there is a clear need (like optimizing speed). Use clear, unambiguous code.
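
For the curious, the following sketch shows how a single index maps onto rows and columns. It uses a stand-in matrix m (a name introduced here for illustration) and the base R function arrayInd() to translate a single position into row and column indices.

```r
# m is a stand-in for the 2 x 3 example matrix
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# A single index counts down the columns (column-major order),
# so the 3rd element is the one in row 1, column 2
m[3]
## [1] 3

m[1, 2]
## [1] 3

# arrayInd() translates a single position into (row, column) indices
arrayInd(3, dim(m))
##      [,1] [,2]
## [1,]    1    2
```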

Subsetting an array is similar to subsetting a matrix. However, unlike a matrix, an array can have more than two dimensions so you must have as many indices as you have dimensions. For example, given the 3-dimensional array z,

z <- array(c(1,2,3,4,5,6,7,8), dim = c(2,2,2))
z

## , , 1
## 
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## , , 2
## 
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8

we can extract a single element,

z[2,1,2]

## [1] 6

a sub-vector,

z[1,,2]

## [1] 5 7

or a sub-matrix,

z[,2,]

##      [,1] [,2]
## [1,]    3    7
## [2,]    4    8
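
Each index you fix removes one dimension from the result. As a rough check, the sketch below applies dim() to the same kinds of extractions, using a stand-in array a (a name introduced here, not part of the example above).

```r
# a is a stand-in for the 2 x 2 x 2 example array
a <- array(c(1, 2, 3, 4, 5, 6, 7, 8), dim = c(2, 2, 2))

# Fixing one index leaves a 2 x 2 matrix
dim(a[, 2, ])
## [1] 2 2

# Fixing two indices leaves a plain vector, which has no dimensions
dim(a[1, , 2])
## NULL

# drop = FALSE keeps all three dimensions, with length 1 where an index was fixed
dim(a[, 2, , drop = FALSE])
## [1] 2 1 2
```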

1.8.3 Extracting elements from a list

For a list, you can use single square [ ] brackets or double square [[ ]] brackets, depending on what you want to extract. For example, given the list x, containing a character string, a numeric vector, and the factor object y,

x <- list("5", c(1,2,3), y)
x

## [[1]]
## [1] "5"
## 
## [[2]]
## [1] 1 2 3
## 
## [[3]]
## [1] Underweight Underweight Normal      Overweight  Normal     
## Levels: Underweight Normal Overweight

we can use [ ] to extract a sub-list containing only, for example, the first element,

x[1]

## [[1]]
## [1] "5"

class(x[1])

## [1] "list"

or multiple elements,

x[c(1,3)]

## [[1]]
## [1] "5"
## 
## [[2]]
## [1] Underweight Underweight Normal      Overweight  Normal     
## Levels: Underweight Normal Overweight

class(x[c(1,3)])

## [1] "list"

or we can use [[ ]] to extract a single element, which will have the class of that element.

x[[1]]

## [1] "5"

class(x[[1]])

## [1] "character"

x[[2]]

## [1] 1 2 3

class(x[[2]])

## [1] "numeric"

x[[3]]

## [1] Underweight Underweight Normal      Overweight  Normal     
## Levels: Underweight Normal Overweight

class(x[[3]])

## [1] "factor"
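
If the elements of a list have names, you can also extract them by name, including with $. The sketch below uses a hypothetical named list lst (the list and its element names id and scores are invented here for illustration) and also shows how to chain [[ ]] and [ ] to reach inside an element.

```r
# lst is a hypothetical list with named elements
lst <- list(id = "5", scores = c(1, 2, 3))

# [[ ]] and $ both return the element itself
lst[["scores"]]
## [1] 1 2 3

lst$scores
## [1] 1 2 3

# [ ] with a name still returns a (sub-)list
class(lst["scores"])
## [1] "list"

# Chain [[ ]] and [ ] to get the 2nd value inside the scores element
lst[["scores"]][2]
## [1] 2
```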

1.8.4 Extracting elements from a data frame

Recall that a data.frame is a special type of list where each element is one of the columns. You can access the elements of a data.frame in a number of ways, including the $ method. Also, when subsetting we can use the column names. For example, given the data frame x,

x <- data.frame(outcome  = c(1, 0, 1, 1),
                exposure = factor(c("yes", "yes", "no", "no"),
                                  levels = c("no", "yes"),
                                  labels = c("No", "Yes")),
                age      = c(24, 55, 39, 18))
x

##   outcome exposure age
## 1       1      Yes  24
## 2       0      Yes  55
## 3       1       No  39
## 4       1       No  18

we can extract a data.frame made up of a subset of columns using [ ],

x[1:2]

##   outcome exposure
## 1       1      Yes
## 2       0      Yes
## 3       1       No
## 4       1       No

class(x[1:2])

## [1] "data.frame"

x[c("outcome", "exposure")]

##   outcome exposure
## 1       1      Yes
## 2       0      Yes
## 3       1       No
## 4       1       No

class(x[c("outcome", "exposure")])

## [1] "data.frame"

or a single column of the data.frame, returned as the class of that column, using [[ ]] or $.

x[[3]]

## [1] 24 55 39 18

class(x[[3]])

## [1] "numeric"

x[["age"]]

## [1] 24 55 39 18

class(x[["age"]])

## [1] "numeric"

x$age

## [1] 24 55 39 18

class(x$age)

## [1] "numeric"
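
These three forms return the same object. One practical difference is that [[ ]] accepts a column name stored in a variable, while $ takes the name literally. The sketch below illustrates this with a small stand-in data frame df and a variable col (both names are introduced here for illustration).

```r
# df is a small stand-in data frame; col holds a column name as a string
df <- data.frame(outcome = c(1, 0, 1, 1), age = c(24, 55, 39, 18))
col <- "age"

# Position, name, and $ all return the same vector
identical(df[[2]], df[["age"]])
## [1] TRUE

identical(df[["age"]], df$age)
## [1] TRUE

# [[ ]] looks up the value of col, that is, the column "age"
df[[col]]
## [1] 24 55 39 18

# $ looks for a column literally named "col", which does not exist
df$col
## NULL
```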

When using the $ method, if the variable name has spaces, then enclose it in backticks ` ` when extracting it. To illustrate, let’s change the names of this data.frame by assigning a new value to its names().

names(x) <- c("Outcome Level", "Exposure", "Age")

x

##   Outcome Level Exposure Age
## 1             1      Yes  24
## 2             0      Yes  55
## 3             1       No  39
## 4             1       No  18

To extract the first column, whose name has a space, we must enclose the name in ` `.

x$`Outcome Level`

## [1] 1 0 1 1
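
With [[ ]] the column name is an ordinary character string, so regular quotes work even when the name has a space. A minimal sketch, using a stand-in data frame df (a name introduced here so that x is left unchanged):

```r
# df is a stand-in data frame whose first column name contains a space
df <- data.frame(outcome = c(1, 0, 1, 1), age = c(24, 55, 39, 18))
names(df) <- c("Outcome Level", "Age")

# With [[ ]] the name is a regular quoted string
df[["Outcome Level"]]
## [1] 1 0 1 1

# This is the same column that $ with backticks returns
identical(df[["Outcome Level"]], df$`Outcome Level`)
## [1] TRUE
```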

You can also extract elements of a data.frame using matrix indexing. For example, you can extract the first row of x, returning a data.frame with 1 row,

x[1,]

##   Outcome Level Exposure Age
## 1             1      Yes  24

class(x[1,])

## [1] "data.frame"

or the first column of x, returning a vector of the class of that column,

x[,1]

## [1] 1 0 1 1

class(x[,1])

## [1] "numeric"

or a data.frame if you include drop=FALSE.

x[,1,drop=FALSE]

##   Outcome Level
## 1             1
## 2             0
## 3             1
## 4             1

class(x[,1,drop=FALSE])

## [1] "data.frame"

If extracting more than 1 column, drop=FALSE is not necessary to return a data.frame.

x[,1:2]

##   Outcome Level Exposure
## 1             1      Yes
## 2             0      Yes
## 3             1       No
## 4             1       No

class(x[,1:2])

## [1] "data.frame"

In any of these examples of extracting columns via matrix indexing, you can use the column names instead of the column numbers.

x[, "Outcome Level", drop=FALSE]

##   Outcome Level
## 1             1
## 2             0
## 3             1
## 4             1

class(x[, "Outcome Level", drop=FALSE])

## [1] "data.frame"

Every data.frame has row names. By default, R just uses the row numbers.

rownames(x)

## [1] "1" "2" "3" "4"

But suppose we have a data.frame with, say, participant IDs as the row names.

rownames(x) <- c("B239", "B211", "B101", "B439")
x

##      Outcome Level Exposure Age
## B239             1      Yes  24
## B211             0      Yes  55
## B101             1       No  39
## B439             1       No  18

Then you can subset rows using row names.

x[c("B211", "B439"),]

##      Outcome Level Exposure Age
## B211             0      Yes  55
## B439             1       No  18

You can also subset rows of a data.frame using logical statements about the values in the data.frame. For example, suppose we want to extract the rows where Exposure is “Yes”. To test for equality, we use the == (double equal sign) operator (see Section 1.13 for more on logical operators).

x[x$Exposure == "Yes",]

##      Outcome Level Exposure Age
## B239             1      Yes  24
## B211             0      Yes  55
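
As with vectors, the condition is a logical vector with one value per row, and conditions can be combined. The sketch below uses a small stand-in data frame df (a name and column names introduced here for illustration) with & for "and". The base R function subset() is shown as an alternative that some find easier to read.

```r
# df is a small stand-in data frame; exposure is stored as plain character here
df <- data.frame(exposure = c("Yes", "Yes", "No", "No"),
                 age      = c(24, 55, 39, 18))

# The condition is a logical vector with one value per row
df$exposure == "Yes"
## [1]  TRUE  TRUE FALSE FALSE

# Combine conditions with & (and); use | for or
older_exposed <- df[df$exposure == "Yes" & df$age > 30, ]
older_exposed
##   exposure age
## 2      Yes  55

# subset() evaluates the condition using the column names directly
subset(df, exposure == "Yes" & age > 30)
```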