4.4 Sort
Sorting a dataset means arranging the observations in order by the values of one or more variables. For example, you could sort from youngest to oldest age, or by sex, or by sex first and then by age within sex. Do not use the base R function sort() for sorting a data.frame; it only sorts a vector. Instead, use order() to sort a data frame on one or more variables.
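As a small illustration of the difference, the sketch below uses a made-up data frame (the name toy and its values are assumptions for illustration, not the mydat dataset). sort() returns only the sorted values of a single vector, whereas indexing the rows with order() keeps each observation's variables together.

```r
# Hypothetical toy data, not the mydat dataset used in this chapter
toy <- data.frame(
  ID  = 1:4,
  Age = c(85, 79, 90, 83),
  Sex = c("F", "M", "F", "M")
)

# sort() works on a single vector and returns only the sorted values
sorted_ages <- sort(toy$Age)
print(sorted_ages)

# order() in the row position of [ , ] sorts the whole data frame by Age
toy_sorted <- toy[order(toy$Age), ]
print(toy_sorted)
```

Assigning sort(toy$Age) back into the Age column would reorder the ages without moving ID and Sex, which is why sort() is not suitable for a data frame.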
# View the first 6 rows
head(mydat[, c("ID", "Age", "Sex")])
##   ID Age Sex
## 1  1  85   F
## 2  2  86   F
## 3  3  83   F
## 4  4  83   F
## 5  5  85   F
## 6  6  79   M
# Sort by one variable
mydat <- mydat[order(mydat$Age), ]
# Now it is sorted from youngest to oldest
head(mydat[, c("ID", "Age", "Sex")])
##      ID Age Sex
## 179 208  42   F
## 193 222  42   F
## 213 242  42   F
## 429 458  42   F
## 433 462  42   F
## 478 507  42   M
# Sort in reverse order
mydat <- mydat[order(mydat$Age, decreasing = TRUE), ]
# Now it is sorted from oldest to youngest
head(mydat[, c("ID", "Age", "Sex")])
##    ID Age Sex
## 7   7  90   F
## 8   8  90   F
## 20 20  90   F
## 21 21  90   F
## 54 54  90   F
## 66 66  90   F
# Sort by more than one variable
mydat <- mydat[order(mydat$Sex, mydat$Age), ]
# Now it is sorted with Females first and
# from youngest to oldest within sex
head(mydat[, c("ID", "Age", "Sex")])
##      ID Age Sex
## 179 208  42   F
## 193 222  42   F
## 213 242  42   F
## 429 458  42   F
## 433 462  42   F
## 80  109  43   F
# If you use decreasing = TRUE, it reverses the sort on all the variables.
# To reverse the sort for only a subset of variables, use -x for a
# numeric variable or relabel a factor in the order you want.
mydat <- mydat[order(mydat$Sex, mydat$Age, decreasing = TRUE), ]
head(mydat[, c("ID", "Age", "Sex")])
##    ID Age Sex
## 51 51  84   M
## 42 42  83   M
## 28 28  82   M
## 44 44  81   M
## 61 61  81   M
## 56 56  80   M
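The options for reversing only some of the sort variables can be sketched as follows. The data frame toy and its values are hypothetical. The third option, passing one decreasing flag per variable together with method = "radix", is an additional base R possibility not used elsewhere in this section.

```r
# Hypothetical toy data, not the mydat dataset used in this chapter
toy <- data.frame(
  ID  = 1:6,
  Age = c(85, 79, 90, 83, 79, 90),
  Sex = factor(c("F", "M", "F", "M", "F", "M"))
)

# Option 1: negate the numeric variable
# (Females first, oldest to youngest within sex)
by_neg <- toy[order(toy$Sex, -toy$Age), ]

# Option 2: relabel the factor so that "M" is the first level
# (Males first, youngest to oldest within sex)
sex_rev <- factor(toy$Sex, levels = c("M", "F"))
males_first <- toy[order(sex_rev, toy$Age), ]

# Option 3: one decreasing flag per variable; this form of
# decreasing requires method = "radix"
by_radix <- toy[order(toy$Sex, toy$Age,
                      decreasing = c(FALSE, TRUE), method = "radix"), ]

print(by_neg)
print(males_first)
```

Options 1 and 3 give the same row order here. Negation only works for numeric variables, so a character or factor variable needs the relabeling or radix approach.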
NOTE: Why does order() work for sorting? The function order() returns the row numbers in sorted order. By inserting order() in the row position within the brackets [], you are selecting all the rows, but in a permuted order.
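To make the permutation concrete, here is a minimal sketch using a made-up vector (ages and toy are illustrative names, not part of the dataset). It also shows why the row names printed in the sorted output are the original row positions.

```r
# Hypothetical vector of ages, for illustration only
ages <- c(85, 79, 90, 79)

# order() returns positions, not values: the smallest value is in
# position 2, the tied value in position 4 comes next, and so on
idx <- order(ages)
print(idx)

# Indexing with those positions gives the same result as sort()
same_as_sort <- identical(ages[idx], sort(ages))
print(same_as_sort)

# In a data frame, the row names of the result are the original row positions
toy <- data.frame(Age = ages)
sorted_row_names <- rownames(toy[idx, , drop = FALSE])
print(sorted_row_names)
```

Tied values keep their original relative order, so position 2 comes before position 4.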
In the tidyverse, use arrange(), with desc() (“descending”) to reverse the order.
# View the first 6 rows
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
# Sort by one variable
mydat_tibble <- mydat_tibble %>%
  arrange(Age)
# Now it is sorted from youngest to oldest
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
# Sort in reverse order
mydat_tibble <- mydat_tibble %>%
  arrange(desc(Age))
# Now it is sorted from oldest to youngest
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
# Sort by more than one variable
mydat_tibble <- mydat_tibble %>%
  arrange(Sex, Age)
# Now it is sorted with Females first and
# from youngest to oldest within sex
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
# Unlike decreasing = TRUE in order(), which reverses all the
# variables, desc() reverses only the variable it wraps
mydat_tibble <- mydat_tibble %>%
  arrange(desc(Sex), Age)
# Now it is sorted with Males first and
# from youngest to oldest within sex
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
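Since no output is shown for the tidyverse calls, the following sketch applies arrange() and desc() to a small made-up tibble (toy_tibble and its values are assumptions for illustration). It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical toy data, not the mydat_tibble used in this chapter
toy_tibble <- tibble(
  ID  = 1:6,
  Age = c(85, 79, 90, 83, 79, 90),
  Sex = c("F", "M", "F", "M", "F", "M")
)

# Males first (Sex descending), youngest to oldest within sex
males_first <- toy_tibble %>%
  arrange(desc(Sex), Age)

# Females first, oldest to youngest within sex
oldest_first <- toy_tibble %>%
  arrange(Sex, desc(Age))

print(males_first)
print(oldest_first)
```

Each desc() call affects only the variable it wraps, so the direction can be chosen separately for every sort variable.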
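NOTE: Whether using the base R function order() or the tidyverse function arrange(), NA values are sorted at the end by default, even if you use decreasing = TRUE (for order()) or desc() (for arrange()). For order(), the na.last argument changes where NA values are placed; arrange() always puts them at the end.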
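The NA behavior of order() can be checked with a short sketch on a made-up vector (ages is an illustrative name). It also shows the na.last argument of order(), which moves or drops the missing values.

```r
# Hypothetical vector with one missing age, for illustration only
ages <- c(85, NA, 79, 90)

# Default: the NA (position 2) is placed last
default_order <- order(ages)

# Still last when sorting in decreasing order
decreasing_order <- order(ages, decreasing = TRUE)

# na.last = FALSE puts the NA first
na_first <- order(ages, na.last = FALSE)

# na.last = NA drops the NA from the result
na_dropped <- order(ages, na.last = NA)

print(default_order)
print(decreasing_order)
print(na_first)
print(na_dropped)
```

Using na.last = NA inside the brackets removes the rows with a missing sort value from the data frame, so use it only when dropping those rows is intended.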