4.2 Select a subset of observations
To limit your dataset to a subset of observations in base R, use brackets [ ] or subset(). With brackets you can subset based on row numbers, row names, or a logical expression. With subset(), you must use a logical expression. Selecting a subset of observations is called filtering.
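As a minimal sketch of the three bracket styles, the example below uses a small made-up data frame called toy (hypothetical, not the mydat used in this section) with row names "a" through "d". Each of the three calls should select the same two rows.

```r
# Hypothetical toy data with explicit row names (not the chapter's mydat)
toy <- data.frame(Age = c(34, 65, 70, 18),
                  Sex = c("F", "M", "F", "M"),
                  row.names = c("a", "b", "c", "d"))

# Subset by row number
by_number <- toy[c(2, 3), ]

# Subset by row name
by_name <- toy[c("b", "c"), ]

# Subset by logical expression
by_logic <- toy[toy$Age >= 65, ]

# All three keep the rows named "b" and "c"
rownames(by_number)
rownames(by_name)
rownames(by_logic)
```

Note the comma inside the brackets: the expression before the comma selects rows, and leaving the position after the comma empty keeps all columns.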
NOTE: Filtering allows you to implement inclusion and exclusion criteria. To exclude, either use the opposite logical statement or use the “not” operator !. For example, if you want to exclude those age 65 years or older, you could use either Age < 65 or !(Age >= 65).
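The equivalence of the two exclusion forms can be checked on a small made-up data frame. The object toy and its values are assumptions for illustration only. With no missing values in Age, both forms keep exactly the same rows.

```r
# Hypothetical toy data (not the chapter's mydat); Age has no missing values
toy <- data.frame(ID = 1:6, Age = c(34, 65, 70, 18, 64, 81))

# Exclude those age 65 years or older: opposite logical statement
excl1 <- toy[toy$Age < 65, ]

# Exclude those age 65 years or older: "not" operator
excl2 <- toy[!(toy$Age >= 65), ]

# Both approaches keep the same rows
identical(excl1, excl2)
## [1] TRUE
nrow(excl1)
## [1] 3
```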
# Number of rows in the full dataset
nrow(mydat)
## [1] 530
# Keep only a subset of rows using row numbers
subdat <- mydat[1:10, ]
# Number of rows in filtered data
nrow(subdat)
## [1] 10
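# Drop a subset of rows using negative row numbers
subdat <- mydat[-(1:10), ]
nrow(subdat)
## [1] 520
# Keep only a subset of rows using a logical expression
# Inclusion
subdat <- mydat[mydat$Age >= 65, ]
nrow(subdat)
## [1] 155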
# The number of rows is the same as the sum of observations
# where the condition was TRUE
sum(mydat$Age >= 65, na.rm = TRUE)
## [1] 155
# To make it an exclusion, either reverse the logical
# expression or use !
subdat <- mydat[mydat$Age < 65, ]
nrow(subdat)
## [1] 375
subdat <- mydat[!(mydat$Age >= 65), ]
nrow(subdat)
## [1] 375
# Same thing but using subset
# Have to use a logical expression - will not work with row numbers
# Inclusion
subdat <- subset(mydat, subset = Age >= 65)
nrow(subdat)
## [1] 155
# Exclusion
subdat <- subset(mydat, subset = Age < 65)
nrow(subdat)
## [1] 375
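Brackets and subset() treat missing values differently, which matters when the filtering variable contains NA. The sketch below uses a made-up data frame (toy is hypothetical, not the chapter's mydat) in which one Age is missing. With brackets, a row where the condition is NA comes back as a row of NAs. subset() drops such rows.

```r
# Hypothetical toy data with one missing Age (not the chapter's mydat)
toy <- data.frame(ID = 1:4, Age = c(34, NA, 70, 50))

# Brackets: the NA condition yields an extra row filled with NA
sub_brackets <- toy[toy$Age >= 65, ]
nrow(sub_brackets)
## [1] 2

# subset(): rows where the condition is NA are dropped
sub_subset <- subset(toy, subset = Age >= 65)
nrow(sub_subset)
## [1] 1

# Number of observations where the condition was TRUE
sum(toy$Age >= 65, na.rm = TRUE)
## [1] 1

# which() drops NA, so brackets then match subset()
sub_which <- toy[which(toy$Age >= 65), ]
nrow(sub_which)
## [1] 1
```

If the row count from a bracket filter is larger than the sum of TRUE values, missing values in the filtering variable are a likely cause.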
In tidyverse, use filter().
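A minimal sketch of the same inclusion and exclusion filters using filter() from the dplyr package follows. It assumes dplyr is installed and uses a made-up data frame, toy, rather than the chapter's mydat.

```r
library(dplyr)

# Hypothetical toy data (not the chapter's mydat)
toy <- data.frame(ID = 1:6, Age = c(34, 65, 70, 18, 64, 81))

# Inclusion: keep those age 65 years or older
incl <- toy %>%
  filter(Age >= 65)
nrow(incl)
## [1] 3

# Exclusion: reverse the logical expression or use !
excl <- toy %>%
  filter(!(Age >= 65))
nrow(excl)
## [1] 3
```

As with subset(), filter() takes a logical expression and refers to variables by name, without the mydat$ prefix.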