4.2 Select a subset of observations
To limit your dataset to a subset of observations in base R, use brackets [ ] or subset(). With brackets you can subset based on row numbers, row names, or a logical expression. With subset(), you must use a logical expression. Selecting a subset of observations is called filtering.
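As a minimal sketch of the three bracket approaches, the example below subsets a small data frame by row number, by row name, and by a logical expression. The data frame `toy`, its values, and its row names are invented for illustration and are not part of `mydat`.

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(Age   = c(34, 70, 52, 81),
                  Group = c("A", "B", "A", "B"),
                  row.names = c("p1", "p2", "p3", "p4"))

# By row number: keep the first two rows
by_number <- toy[1:2, ]

# By row name: keep the rows named "p2" and "p4"
by_name <- toy[c("p2", "p4"), ]

# By logical expression: keep rows where Age is 65 or older
by_logical <- toy[toy$Age >= 65, ]

rownames(by_number)
rownames(by_name)
rownames(by_logical)
```

In each case the comma after the row selector matters: leaving the column position empty keeps all columns.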
NOTE: Filtering allows you to implement inclusion and exclusion criteria. To exclude, either use the opposite logical statement or use the “not” operator !. For example, if you want to exclude those age 65 years or older, you could use either Age < 65 or !(Age >= 65).
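The two ways of writing an exclusion can be checked against each other on a small example. This is only a sketch: the data frame `toy` and its values are made up for illustration, and include one person who is exactly 65 to show that both forms treat the boundary the same way.

```r
# Hypothetical ages, invented for illustration only
toy <- data.frame(ID = 1:5, Age = c(34, 70, 52, 65, 81))

# Exclusion written as the opposite logical statement
under65_a <- toy[toy$Age < 65, ]

# Exclusion written with the "not" operator
under65_b <- toy[!(toy$Age >= 65), ]

# Both versions keep the same people (the 65-year-old is excluded by both)
under65_a$ID
under65_b$ID
```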
# Number of rows in the full data
nrow(mydat)
## [1] 530
# Keep only a subset of rows using row numbers
subdat <- mydat[1:10, ]
# Number of rows in filtered data
nrow(subdat)
## [1] 10
# Exclude rows using negative row numbers
subdat <- mydat[-(1:10), ]
nrow(subdat)
## [1] 520
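# Keep only a subset of rows using a logical expression
# Inclusion
subdat <- mydat[mydat$Age >= 65, ]
nrow(subdat)
## [1] 155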
# The number of rows is the same as the sum of observations
# where the condition was TRUE
sum(mydat$Age >= 65, na.rm = TRUE)
## [1] 155
# To make it an exclusion, either reverse the logical
# expression or use !
subdat <- mydat[mydat$Age < 65, ]
nrow(subdat)
## [1] 375
subdat <- mydat[!(mydat$Age >= 65), ]
nrow(subdat)
## [1] 375
# Same thing but using subset
# Have to use a logical expression - will not work with row numbers
# Inclusion
subdat <- subset(mydat, subset = Age >= 65)
nrow(subdat)
## [1] 155
# Exclusion
subdat <- subset(mydat, subset = Age < 65)
nrow(subdat)
## [1] 375
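Brackets and subset() can give different row counts when the filtering variable has missing values: with brackets, an NA in the logical expression returns a row of NAs, whereas subset() drops that row. The sketch below illustrates this on a made-up data frame (`toy` and its values are hypothetical, not taken from `mydat`), and shows that wrapping the condition in which() is one way to make brackets drop those rows as well.

```r
# Hypothetical data with one missing age, invented for illustration only
toy <- data.frame(ID = 1:4, Age = c(34, NA, 70, 81))

# Brackets: the NA in the logical vector produces a row of all NAs
with_brackets <- toy[toy$Age >= 65, ]
nrow(with_brackets)   # 3 rows: two real matches plus one all-NA row

# subset(): rows where the condition is NA are dropped
with_subset <- subset(toy, subset = Age >= 65)
nrow(with_subset)     # 2 rows

# which() returns only the positions that are TRUE, so NAs are dropped
with_which <- toy[which(toy$Age >= 65), ]
nrow(with_which)      # 2 rows
```

This only matters when the variable used in the condition contains missing values; without any, the two approaches return the same rows.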
In the tidyverse, use filter() from the dplyr package.
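A minimal sketch of the same inclusion and exclusion with filter() might look like the following. It assumes the dplyr package is installed, and it uses a made-up data frame `toy` rather than `mydat`.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only
toy <- data.frame(ID = 1:5, Age = c(34, 70, 52, 65, 81))

# Inclusion: keep those age 65 years or older
older <- filter(toy, Age >= 65)

# Exclusion: drop those age 65 years or older
younger <- filter(toy, !(Age >= 65))

nrow(older)
nrow(younger)
```

As with subset(), filter() takes a logical expression and refers to columns by name without the `toy$` prefix.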