4.4 Sort

Sorting a dataset means arranging the observations in order by the values of one or more variables. For example, you could sort by age from youngest to oldest, by sex, or by sex first and then by age within sex. Do not use the base R function sort() to sort a data.frame; it is meant for sorting a vector. Instead, use order() to sort a data.frame on one or more variables.

# View the first 6 rows
head(mydat[, c("ID", "Age", "Sex")])
##   ID Age Sex
## 1  1  85   F
## 2  2  86   F
## 3  3  83   F
## 4  4  83   F
## 5  5  85   F
## 6  6  79   M

# Sort by one variable
mydat <- mydat[order(mydat$Age), ]

# Now it is sorted from youngest to oldest
head(mydat[, c("ID", "Age", "Sex")])
##      ID Age Sex
## 179 208  42   F
## 193 222  42   F
## 213 242  42   F
## 429 458  42   F
## 433 462  42   F
## 478 507  42   M

# Sort in reverse order
mydat <- mydat[order(mydat$Age, decreasing = TRUE), ]

# Now it is sorted from oldest to youngest
head(mydat[, c("ID", "Age", "Sex")])
##    ID Age Sex
## 7   7  90   F
## 8   8  90   F
## 20 20  90   F
## 21 21  90   F
## 54 54  90   F
## 66 66  90   F

# Sort by more than one variable
mydat <- mydat[order(mydat$Sex, mydat$Age), ]

# Now it is sorted with Females first and
# from youngest to oldest within sex
head(mydat[, c("ID", "Age", "Sex")])
##      ID Age Sex
## 179 208  42   F
## 193 222  42   F
## 213 242  42   F
## 429 458  42   F
## 433 462  42   F
## 80  109  43   F

# decreasing = TRUE reverses the sort on all the variables.
# To reverse the sort for only some of the variables, use
# -1*x for a numeric variable or relabel a factor so its
# levels are in the order you want
mydat <- mydat[order(mydat$Sex, mydat$Age, decreasing = TRUE), ]

# Now it is sorted with Males first and
# from oldest to youngest within sex
head(mydat[, c("ID", "Age", "Sex")])
##    ID Age Sex
## 51 51  84   M
## 42 42  83   M
## 28 28  82   M
## 44 44  81   M
## 61 61  81   M
## 56 56  80   M
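
The comments above mention reversing the sort for only some of the variables. A minimal sketch of both suggestions follows, using a small made-up data frame (toy, toy_a, toy_b, and sex_rev are hypothetical names for illustration, not part of the mydat dataset).

```r
# Hypothetical toy data (not the mydat dataset used above)
toy <- data.frame(ID  = 1:6,
                  Age = c(50, 42, 61, 42, 70, 55),
                  Sex = factor(c("F", "M", "F", "F", "M", "M")))

# Females first, but oldest to youngest within sex:
# multiply the numeric variable by -1 to reverse only that variable
toy_a <- toy[order(toy$Sex, -1 * toy$Age), ]
toy_a$ID   # 3 1 4 5 6 2

# Males first, youngest to oldest within sex:
# relabel the factor so that "M" is the first level
sex_rev <- factor(toy$Sex, levels = c("M", "F"))
toy_b <- toy[order(sex_rev, toy$Age), ]
toy_b$ID   # 2 6 5 4 1 3
```

Relabeling the factor in a temporary object, as in sex_rev here, leaves the Sex variable in the data frame unchanged.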

NOTE: Why does order() work for sorting? The function order() returns the row numbers in sorted order. By placing order() in the row position within the brackets [ , ] you are selecting all the rows, but in that sorted order.
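
To see the difference between sort() and order(), it may help to try both on a short vector. The sketch below uses a made-up vector x and data frame df (hypothetical names, for illustration only).

```r
# Hypothetical example vector
x <- c(30, 10, 20)

sort(x)       # the sorted values: 10 20 30
order(x)      # the positions of those values in x: 2 3 1
x[order(x)]   # indexing by order(x) gives the same result as sort(x)

# The same idea applied to the rows of a data frame
df <- data.frame(ID = c("a", "b", "c"), x = x)
df_sorted <- df[order(df$x), ]
df_sorted$ID  # "b" "c" "a"
```

Notice that the sorted data frame keeps its original row names, which is why the row labels in the output shown earlier are no longer in sequence after sorting.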

In the tidyverse, use arrange(), wrapping a variable in desc() (“descending”) to reverse its order.

# View the first 6 rows
mydat_tibble %>% 
  select(ID, Age, Sex) %>% 
  head()

# Sort by one variable
mydat_tibble <- mydat_tibble %>% 
  arrange(Age)

# Now it is sorted from youngest to oldest
mydat_tibble %>% 
  select(ID, Age, Sex) %>% 
  head()

# Sort in reverse order
mydat_tibble <- mydat_tibble %>% 
  arrange(desc(Age))

# Now it is sorted from oldest to youngest
mydat_tibble %>% 
  select(ID, Age, Sex) %>% 
  head()

# Sort by more than one variable
mydat_tibble <- mydat_tibble %>% 
  arrange(Sex, Age)

# Now it is sorted with Females first and
# from youngest to oldest within sex
mydat_tibble %>% 
  select(ID, Age, Sex) %>% 
  head()

# Unlike decreasing = TRUE in order(), desc() applies to a
# single variable, so you can reverse some variables and not others
mydat_tibble <- mydat_tibble %>% 
  arrange(desc(Sex), Age)

# Now it is sorted with Males first and
# from youngest to oldest within sex
mydat_tibble %>% 
  select(ID, Age, Sex) %>% 
  head()

NOTE: Whether using the base R function order() or the tidyverse function arrange(), NA values are sorted at the end by default, even if you use decreasing = TRUE (for order()) or desc() (for arrange()). With order(), you can change this using the na.last argument.
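
As a quick check of how order() treats missing values, the following sketch uses a made-up vector (age is a hypothetical name, not a variable from mydat). It also shows the na.last argument of order(), which can place the NA values first or drop them.

```r
# Hypothetical vector with one missing value
age <- c(50, NA, 42, 61)

order(age)                     # 3 1 4 2  (NA is last)
order(age, decreasing = TRUE)  # 4 1 3 2  (NA is still last)
order(age, na.last = FALSE)    # 2 3 1 4  (NA is first)
order(age, na.last = NA)       # 3 1 4    (NA is dropped)
```

Be careful with na.last = NA when sorting a data frame, since the rows with a missing value in the sort variable will be removed from the result.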