4.4 Sort
Sorting a dataset means arranging the observations so they are in order by the values of one or more variables. For example, you could sort from youngest to oldest age, or by sex, or by sex first and then by age within sex. Do not use the base R function sort() to sort a data.frame; it only sorts vectors. Instead, use order() to sort a data frame on one or more variables.
# View the first 6 rows
head(mydat[, c("ID", "Age", "Sex")])
##   ID Age Sex
## 1  1  85   F
## 2  2  86   F
## 3  3  83   F
## 4  4  83   F
## 5  5  85   F
## 6  6  79   M
# Sort by one variable
mydat <- mydat[order(mydat$Age), ]
# Now it is sorted from youngest to oldest
head(mydat[, c("ID", "Age", "Sex")])
##      ID Age Sex
## 179 208  42   F
## 193 222  42   F
## 213 242  42   F
## 429 458  42   F
## 433 462  42   F
## 478 507  42   M
# Sort in reverse order
mydat <- mydat[order(mydat$Age, decreasing = TRUE), ]
# Now it is sorted from oldest to youngest
head(mydat[, c("ID", "Age", "Sex")])
##    ID Age Sex
## 7   7  90   F
## 8   8  90   F
## 20 20  90   F
## 21 21  90   F
## 54 54  90   F
## 66 66  90   F
# Sort by more than one variable
mydat <- mydat[order(mydat$Sex, mydat$Age), ]
# Now it is sorted with Females first and
# from youngest to oldest within sex
head(mydat[, c("ID", "Age", "Sex")])
##      ID Age Sex
## 179 208  42   F
## 193 222  42   F
## 213 242  42   F
## 429 458  42   F
## 433 462  42   F
## 80  109  43   F
# decreasing = TRUE reverses the sort on all the variables.
# If you want to reverse the sort only for a subset of variables,
# use -1*x for a numeric variable or relabel a factor in the
# order you want
mydat <- mydat[order(mydat$Sex, mydat$Age, decreasing = TRUE), ]
head(mydat[, c("ID", "Age", "Sex")])
##    ID Age Sex
## 51 51  84   M
## 42 42  83   M
## 28 28  82   M
## 44 44  81   M
## 61 61  81   M
## 56 56  80   M
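The comments above mention negating a numeric variable or relabeling a factor to reverse only part of a sort. A minimal sketch of both, using a small made-up data frame (toy is hypothetical and is not the mydat dataset), might look like the following. The last call uses method = "radix", which accepts one decreasing value per sort variable.

```r
# Hypothetical toy data, for illustration only (not mydat)
toy <- data.frame(
  ID  = 1:6,
  Age = c(50, 42, 61, 42, 61, 50),
  Sex = factor(c("F", "M", "F", "F", "M", "M"))
)

# Females first, oldest to youngest within sex:
# negate the numeric variable to reverse only Age
toy_a <- toy[order(toy$Sex, -toy$Age), ]

# Males first, youngest to oldest within sex:
# reverse the factor levels so that "M" sorts before "F"
sex_rev <- factor(toy$Sex, levels = rev(levels(toy$Sex)))
toy_b <- toy[order(sex_rev, toy$Age), ]

# Same ordering as toy_b, using one decreasing value per variable
# (a vector is allowed only with method = "radix")
toy_c <- toy[order(toy$Sex, toy$Age,
                   decreasing = c(TRUE, FALSE), method = "radix"), ]

toy_a$ID
toy_b$ID
toy_c$ID
```

Negation only works for numeric variables, which is why the factor is relabeled instead.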
NOTE: Why does order() work for sorting? The function order() returns row numbers in the sorted order. By inserting order() within brackets [] you are selecting all the rows, but in a permuted order.
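To make this concrete, here is a small sketch with a made-up vector and data frame (age and toy are hypothetical names, not part of mydat). It shows that order() returns positions rather than sorted values, and that indexing with those positions does the sorting.

```r
# Hypothetical values, for illustration only
age <- c(85, 79, 90, 83)

# order() returns the positions of the values from smallest to largest
idx <- order(age)
idx        # 2 4 1 3

# Indexing the vector by those positions gives the sorted values,
# which is what sort() returns directly
age[idx]   # 79 83 85 90

# The same positions used as row indices reorder a data frame
toy <- data.frame(ID = 1:4, Age = age)
toy_sorted <- toy[idx, ]
toy_sorted
```

The row names of toy_sorted are 2, 4, 1, 3, the original row positions. This is why the row labels in the sorted output earlier in this section are no longer in sequence.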
In tidyverse, use arrange(), with desc() (“descending”) to reverse the order.
# View the first 6 rows
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
# Sort by one variable
mydat_tibble <- mydat_tibble %>%
  arrange(Age)
# Now it is sorted from youngest to oldest
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
# Sort in reverse order
mydat_tibble <- mydat_tibble %>%
  arrange(desc(Age))
# Now it is sorted from oldest to youngest
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
# Sort by more than one variable
mydat_tibble <- mydat_tibble %>%
  arrange(Sex, Age)
# Now it is sorted with Females first and
# from youngest to oldest within sex
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
# Unlike decreasing = TRUE in order(), which reverses every
# variable, desc() reverses only the variable it wraps
mydat_tibble <- mydat_tibble %>%
  arrange(desc(Sex), Age)
# Now it is sorted with Males first and
# from youngest to oldest within sex
mydat_tibble %>%
  select(ID, Age, Sex) %>%
  head()
NOTE: By default, both the base R function order() and the tidyverse function arrange() sort NA values at the end, even if you use decreasing = T (for order()) or desc() (for arrange()). For order(), the na.last argument changes where NA values are placed.
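As a quick check of the base R behavior, the sketch below uses a made-up vector (x is hypothetical) containing one missing value. It covers only order(); arrange() is not shown, to keep the example free of package dependencies.

```r
# Hypothetical vector with one missing value
x <- c(3, NA, 1, 2)

# NA (position 2) comes last in increasing order...
o_inc <- order(x)                     # 3 4 1 2

# ...and still comes last in decreasing order
o_dec <- order(x, decreasing = TRUE)  # 1 4 3 2

# na.last = FALSE puts NA first; na.last = NA drops it
o_first <- order(x, na.last = FALSE)  # 2 3 4 1
o_drop  <- order(x, na.last = NA)     # 3 4 1

x[o_dec]
```

When sorting a data frame with na.last = NA, rows with a missing sort value are dropped from the result, so use it with care.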