7.3 row_number()
Using row_number() with mutate() creates a column of consecutive integers (1, 2, 3, ...). The row_number() function is useful for creating an identification number (an ID variable). Combined with group_by(), it is also useful for numbering observations within each level of a grouping variable, because the count restarts at 1 in every group.
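The sketch below shows these uses on a small hypothetical tibble (`scores`, `team` and `score` are made-up names used only for illustration). It also shows a third use: row_number(x), given a column, returns the rank of each value. Wrapping the column in desc() makes 1 the highest value, and tied values are numbered in the order they appear.

```r
library(dplyr)

# Hypothetical toy data, used only to illustrate row_number()
scores <- tibble(
  team  = c("A", "A", "B", "B", "B"),
  score = c(10, 30, 20, 20, 5)
)

numbered <- scores %>%
  mutate(id = row_number()) %>%                  # 1..n across the whole table: a row ID
  group_by(team) %>%
  mutate(within_team = row_number()) %>%         # restarts at 1 inside each team
  ungroup() %>%
  mutate(score_rank = row_number(desc(score)))   # 1 = highest score; ties keep row order

numbered
```

Here `id` runs 1 to 5 and `within_team` restarts at 1 when `team` changes. `score_rank` gives the two tied scores of 20 ranks 2 and 3, in the order the rows appear.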
### Practice Dataset
library(dplyr)  # provides tibble(), mutate(), group_by(), arrange() and %>%

practice <-
  tibble(Subject = rep(c(1, 2, 3), 8),
         Date = c("2019-01-02", "2019-01-02", "2019-01-02",
                  "2019-01-03", "2019-01-03", "2019-01-03",
                  "2019-01-04", "2019-01-04", "2019-01-04",
                  "2019-01-05", "2019-01-05", "2019-01-05",
                  "2019-01-06", "2019-01-06", "2019-01-06",
                  "2019-01-07", "2019-01-07", "2019-01-07",
                  "2019-01-08", "2019-01-08", "2019-01-08",
                  "2019-01-01", "2019-01-01", "2019-01-01"),
         # DV is random: without set.seed() your values will differ from the output shown below
         DV = sample(1:10, 24, replace = TRUE),
         Inject = rep(c("Pos", "Neg", "Neg", "Neg", "Pos", "Pos"), 4))
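A session pairs one Pos day with one Neg day, so it helps to confirm first that each subject has as many Pos days as Neg days. Below is a minimal check. It assumes the practice data can be rebuilt without the random DV column, which plays no role in defining sessions. The compact `rep(..., each = 3)` form produces the same Date column as the listing above.

```r
library(dplyr)

# Practice data without DV (DV is random and irrelevant to Session)
practice <- tibble(
  Subject = rep(c(1, 2, 3), 8),
  Date    = rep(c(paste0("2019-01-0", 2:8), "2019-01-01"), each = 3),
  Inject  = rep(c("Pos", "Neg", "Neg", "Neg", "Pos", "Pos"), 4)
)

# One row per Subject x Inject combination, with the number of days in each
counts <- practice %>% count(Subject, Inject)
counts
```

Each of the six Subject by Inject combinations has 4 days, so every subject can form exactly 4 sessions.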
Using the practice dataset, let's add a variable called Session. Each session consists of one positive day and the negative day closest to it in date. For example, for a given subject, the first observation (by date) where Inject = Pos and the first observation where Inject = Neg will both have a Session value of 1. The second Pos observation and the second Neg observation will have a Session value of 2, and so on. The code below shows three methods for creating Session. Which method produces the result we need?
## Method 1
# number every row 1..n in its current order, ignoring Subject and Inject
practice %>%
  mutate(Session = row_number())
## # A tibble: 24 x 5
## Subject Date DV Inject Session
## <dbl> <chr> <int> <chr> <int>
## 1 1 2019-01-02 9 Pos 1
## 2 2 2019-01-02 4 Neg 2
## 3 3 2019-01-02 7 Neg 3
## 4 1 2019-01-03 8 Neg 4
## 5 2 2019-01-03 8 Pos 5
## 6 3 2019-01-03 3 Pos 6
## 7 1 2019-01-04 3 Pos 7
## 8 2 2019-01-04 3 Neg 8
## 9 3 2019-01-04 7 Neg 9
## 10 1 2019-01-05 6 Neg 10
## # ... with 14 more rows
## Method 2
# restart the numbering within each Subject x Inject group, in the current row order
practice %>%
  group_by(Subject, Inject) %>%
  mutate(Session = row_number())
## # A tibble: 24 x 5
## # Groups: Subject, Inject [6]
## Subject Date DV Inject Session
## <dbl> <chr> <int> <chr> <int>
## 1 1 2019-01-02 9 Pos 1
## 2 2 2019-01-02 4 Neg 1
## 3 3 2019-01-02 7 Neg 1
## 4 1 2019-01-03 8 Neg 1
## 5 2 2019-01-03 8 Pos 1
## 6 3 2019-01-03 3 Pos 1
## 7 1 2019-01-04 3 Pos 2
## 8 2 2019-01-04 3 Neg 2
## 9 3 2019-01-04 7 Neg 2
## 10 1 2019-01-05 6 Neg 2
## # ... with 14 more rows
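Method 2 numbers the rows of each group in whatever order they already appear in the table. In the practice data, the 2019-01-01 rows are the last three rows, so they are numbered last in their groups even though they are the earliest dates. The minimal sketch below assumes the same DV-free rebuild of the practice data and isolates those rows.

```r
library(dplyr)

# Practice data without DV (DV is random and irrelevant to Session)
practice <- tibble(
  Subject = rep(c(1, 2, 3), 8),
  Date    = rep(c(paste0("2019-01-0", 2:8), "2019-01-01"), each = 3),
  Inject  = rep(c("Pos", "Neg", "Neg", "Neg", "Pos", "Pos"), 4)
)

method2 <- practice %>%
  group_by(Subject, Inject) %>%
  mutate(Session = row_number()) %>%
  ungroup()

# The earliest date appears last in the table, so it gets the last Session number
earliest <- method2 %>% filter(Date == "2019-01-01")
earliest
```

All three 2019-01-01 rows receive Session 4 instead of Session 1.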
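## Method 3
# sort by Date first (arrange() ignores the grouping by default), then number within each Subject x Inject group
practice %>%
  group_by(Subject, Inject) %>%
  arrange(Date) %>%
  mutate(Session = row_number())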
## # A tibble: 24 x 5
## # Groups: Subject, Inject [6]
## Subject Date DV Inject Session
## <dbl> <chr> <int> <chr> <int>
## 1 1 2019-01-01 7 Neg 1
## 2 2 2019-01-01 7 Pos 1
## 3 3 2019-01-01 1 Pos 1
## 4 1 2019-01-02 9 Pos 1
## 5 2 2019-01-02 4 Neg 1
## 6 3 2019-01-02 7 Neg 1
## 7 1 2019-01-03 8 Neg 2
## 8 2 2019-01-03 8 Pos 2
## 9 3 2019-01-03 3 Pos 2
## 10 1 2019-01-04 3 Pos 2
## # ... with 14 more rows
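One way to test a candidate Session column against the definition is to summarise it per subject and session. Each session should contain exactly one Pos day and one Neg day, and the two days should be close together. The sketch below applies this check to the Method 3 pipeline, again assuming the DV-free rebuild of the practice data. Swapping in the Method 2 pipeline gives some sessions whose two days are 7 days apart.

```r
library(dplyr)

# Practice data without DV (DV is random and irrelevant to Session)
practice <- tibble(
  Subject = rep(c(1, 2, 3), 8),
  Date    = rep(c(paste0("2019-01-0", 2:8), "2019-01-01"), each = 3),
  Inject  = rep(c("Pos", "Neg", "Neg", "Neg", "Pos", "Pos"), 4)
)

method3 <- practice %>%
  group_by(Subject, Inject) %>%
  arrange(Date) %>%
  mutate(Session = row_number()) %>%
  ungroup()

# One row per Subject x Session: count of Pos/Neg days and the gap between them
session_check <- method3 %>%
  group_by(Subject, Session) %>%
  summarise(
    n_pos    = sum(Inject == "Pos"),
    n_neg    = sum(Inject == "Neg"),
    gap_days = as.numeric(max(as.Date(Date)) - min(as.Date(Date))),
    .groups  = "drop"
  )

session_check
```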
7.3.1 Exercises
Create a row ID for diamonds (the dataset from ggplot2) in which every row gets a unique number and row order doesn't matter.
Create an ID based on the clarity of the diamonds (numbering within each clarity category), where row order doesn't matter.
Create an ID that represents the price rank of each diamond.
Which diamond is #1 (the highest-priced diamond in the dataset)?
Which diamond is ranked #2 by price?
Create an ID that represents price rank within each clarity category.
Of the diamonds with clarity IF, which is the highest ranked (most expensive)?
Of the diamonds with clarity SI2, which is the second most expensive (rank = 2)?