## 10.6 Exercises

Here are some exercises on creating and computing with dates and times:

#### Please note

The **lubridate** package was not part of the core **tidyverse** packages before tidyverse 2.0.0.
Hence, when using an older version (or not loading the **tidyverse** at all), do not forget to load this package if you want to use its commands.
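
A minimal sketch of loading the package, assuming that **lubridate** is already installed on your system:

```r
# Attach lubridate explicitly. This is needed with tidyverse versions
# before 2.0.0 and harmless with later ones.
library(lubridate)
```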

Note that some key tasks (e.g., computing someone’s age, determining the weekday of some date) occur repeatedly throughout these exercises. If this gets boring, use different solution paths for solving them.

### 10.6.1 Exercise 1

#### Reading dates and times

- Use the appropriate **lubridate** function to parse each of the following dates:

```
d1 <- "January 20, 2020"
d2 <- "2020-Apr-01"
d3 <- "11-Nov-2020"
d4 <- c("July 13 (1969)", "August 23 (1972)", "July 1 (1975)")
# Date:
d5 <- "08/12/10" # Oct 12, 2008
d6 <- d5 # Aug 12, 2010
d7 <- d5 # Oct 08, 2012
```

- Use the appropriate **lubridate** function to parse each of the following date-times:

```
t1 <- "2020-11-11 11:11:01"
t2 <- "2020/12/24 07:30"
t3 <- "31:12:20 12:45:59"
t4 <- c("8:05 01/01/2020", "9:20 29/02/2020", "12:30 24/12/2020", "23:58 30/12/2020")
```

**Hint:** Note that `t4` contains the time component before the date component. To handle this vector, consider creating a tibble and then using **dplyr** commands for separating its time and date components, and pasting them in reversed order (date before time).
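
The reordering strategy described in this hint could be sketched as follows. The vector `x` and the column names are hypothetical (they are not the data of the exercise), and using `separate()` from **tidyr** for the splitting step is an assumption, as other tools would work as well:

```r
library(tibble)
library(dplyr)
library(tidyr)
library(lubridate)

# Hypothetical example vector (time before date), not t4 from the exercise:
x <- c("7:15 03/02/2021", "18:40 15/08/2021")

tb <- tibble(raw = x) %>%
  separate(raw, into = c("time", "date"), sep = " ") %>%  # split at the space
  mutate(date_time = paste(date, time),                   # date before time
         dt = dmy_hm(date_time))                          # day-month-year hour:minute

tb$dt
```

Once date and time are in the usual order, a standard parsing function (here `dmy_hm()`) can handle the strings.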

- Determine the *weekdays* of the 7 dates in `d4` and `t4`.

**Hint:** First combine the 7 dates into a vector. Then choose from an abundance of options (the **base** R function `weekdays()`, the **lubridate** function `wday()`, or the **ds4psy** function `what_wday()`) to solve the task.
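
As a rough illustration of two of these options, the following sketch applies `weekdays()` and `wday()` to two hypothetical dates (not those of the exercise). The weekday names returned depend on the current locale, whereas the numeric variant does not:

```r
library(lubridate)

# Hypothetical dates (not those of the exercise):
dts <- as.Date(c("2020-02-29", "2021-01-01"))

weekdays(dts)                           # base R: names in the current locale
wday(dts, label = TRUE, abbr = FALSE)   # lubridate: ordered factor of names
wd_num <- wday(dts, week_start = 1)     # lubridate: numbers, with 1 = Monday
wd_num                                  # 6 5 (a Saturday and a Friday)
```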

### 10.6.2 Exercise 2

#### Birth dates and times

The table `dt_10` (available from **ds4psy** or rpository.com) contains the birth dates and times of 10 non-existent people. Read the data into a tibble `dt_10`:

```
# dt_10 <- readr::read_csv("./data/dt_10.csv") # from local file
# dt_10 <- readr::read_csv("http://rpository.com/ds4psy/data/dt_10.csv") # online
dt_10 <- ds4psy::dt_10 # from ds4psy
# Show data:
knitr::kable(dt_10, caption = "Data of table `dt_10`.")
```

| name | day | month | year | hour | min | sec |
|---|---|---|---|---|---|---|
| Anna | 8 | 8 | 1994 | 11 | 47 | 57 |
| Beowulf | 1 | 6 | 1994 | 5 | 35 | 43 |
| Cassandra | 14 | 11 | 2000 | 5 | 58 | 6 |
| David | 17 | 1 | 1991 | 13 | 3 | 12 |
| Eva | 21 | 1 | 2001 | 21 | 33 | 55 |
| Frederic | 19 | 7 | 2000 | 13 | 47 | 12 |
| Gwendoline | 20 | 9 | 1996 | 8 | 28 | 37 |
| Hamlet | 5 | 5 | 1996 | 17 | 7 | 8 |
| Ian | 18 | 8 | 1996 | 8 | 27 | 17 |
| Joy | 18 | 12 | 1990 | 14 | 44 | 35 |

- Use **base** R commands (with “POSIX” specifications) or the corresponding **lubridate** functions to parse the date of birth `dob` and time of birth `tob` as two new columns of `dt_10`.

**Hint:** When using **base** R commands, consider using `paste()` for creating a character string with appropriate separators from the date- and time-related variables contained in `dt_10`.
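
A minimal **base** R sketch of this `paste()` approach, using hypothetical component vectors (not the values of `dt_10`) and assuming the time zone "UTC":

```r
# Hypothetical date and time components (not taken from dt_10):
day    <- c(3, 24)
month  <- c(2, 12)
year   <- c(1999, 2005)
hour   <- c(7, 18)
minute <- c(5, 30)
second <- c(9, 0)

# Paste the components into strings with suitable separators:
date_str <- paste(year, month, day, sep = "-")
time_str <- paste(hour, minute, second, sep = ":")

# Parse the strings with POSIX format specifications:
dob <- as.Date(date_str, format = "%Y-%m-%d")
tob <- as.POSIXct(paste(date_str, time_str),
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
```

Here `dob` is of class “Date” and `tob` is of class “POSIXct”. The choice of `tz` matters whenever such times are later compared across time zones.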

- As it turns out, all the people of `dt_10` were born in *Denmark*. Create a second tibble `dt_10_2` that considers this fact for the `tob` variable (e.g., when using the `make_datetime()` function) and quantify and explain any discrepancies between `dt_10$tob` and the corresponding variable in `dt_10_2`.

- Use the appropriate **lubridate** functions to add two columns that specify, given each person’s DOB, the *weekday* `dob_wd` (from Monday to Sunday) of their birthday and their current age `age_fy` in full years (i.e., the numeric value of their age, as an integer).

**Hint:** Their current age can be computed by subtracting their DOB from today’s date `today()`. One way of computing their age in full years is by dividing the `interval()` between their DOB and today by a `duration()` in the unit of “years”. (Alternatively, rounding down can also work.)
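
The following sketch illustrates the interval-based idea for a single hypothetical person. It uses a fixed reference date `ref` instead of `today()` so that the result is reproducible. The integer division by the period `years(1)` is one possible alternative to the duration-based route of the hint, not the only solution:

```r
library(lubridate)

dob <- ymd("1990-06-15")   # hypothetical date of birth
ref <- ymd("2021-06-14")   # fixed reference date (one day before a birthday)

life <- interval(dob, ref)

age_dec <- life / dyears(1)    # decimal age, based on a fixed-length year
age_fy  <- life %/% years(1)   # completed years (integer division by a period)
age_fy                         # 30
```

Because a duration year has a fixed length, the decimal value is only an approximation. Near a birthday it can fall on the wrong side of an integer, which is why this sketch uses the period-based division for the age in full years.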

### 10.6.3 Exercise 3

This exercise uses the `fame` dataset included in the **ds4psy** package.
Actually, the entries of the dataset were populated by the submissions of previous students.
So think carefully about your entries — they might end up in the dataset studied by future generations of students.

#### Add to `fame`

- Pick at least 4 famous people (some of whom are still alive, some of whom have already died) and enter their `name`, `area` of occupation, date of birth (`DOB`), and date of death (`DOD`, if deceased) in a tibble `fame`, in analogy to the following:

```
library(tibble)  # provides tibble()

# Basic info on some famous people (dates as character strings):
fame <- tibble(name = c("Napoleon Bonaparte", "Jimi Hendrix", "Michael Jackson", "Frida Kahlo",
                        "Angela Merkel", "Kobe Bryant", "Lionel Messi", "Zinedine Zidane"),
               area = c("politics", "guitarist/music", "singer/music", "arts/painter",
                        "politics", "basketball/sports", "football/sports", "football/sports"),
               DOB = c("August 15, 1769", "November 27, 1942", "August 29, 1958", "July 06, 1907",
                       "July 17, 1954", "August 23, 1978", "June 24, 1987", "June 23, 1972"),
               DOD = c("May 05, 1821", "September 18, 1970", "June 25, 2009", "July 13, 1954",
                       NA, "January 26, 2020", NA, NA))
knitr::kable(fame, caption = "Basic info on some famous people.")
```

| name | area | DOB | DOD |
|---|---|---|---|
| Napoleon Bonaparte | politics | August 15, 1769 | May 05, 1821 |
| Jimi Hendrix | guitarist/music | November 27, 1942 | September 18, 1970 |
| Michael Jackson | singer/music | August 29, 1958 | June 25, 2009 |
| Frida Kahlo | arts/painter | July 06, 1907 | July 13, 1954 |
| Angela Merkel | politics | July 17, 1954 | NA |
| Kobe Bryant | basketball/sports | August 23, 1978 | January 26, 2020 |
| Lionel Messi | football/sports | June 24, 1987 | NA |
| Zinedine Zidane | football/sports | June 23, 1972 | NA |

**Note:** Please remember to enter any rare and unusual symbols as Unicode characters (see Section 9.2.2).

- Use the appropriate **lubridate** functions to replace the `DOB` and `DOD` variables in `fame` by corresponding `dob` and `dod` variables of type “Date”.

- Add two variables to `fame` that specify the *weekday* (from “Monday” to “Sunday”) of their birth (`dob_wd`) and, if applicable, of their death (`dod_wd`).

- Add a variable `age_days` that computes their age in days (relative to today’s date). Then compute two more variables `age_yr1` and `age_yr2` that determine their age in years (as a decimal number) in two different ways. Finally, add a variable `age_fy` that specifies their current age (in full years) as an integer (i.e., what they would say if they truthfully responded to the question “How old are you today?”).

- Correct your previous `age_fy` variable so that, for those people who have already died, it remains at the age at which they died (i.e., dead people do not age further).
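
The idea of freezing the age of deceased people can be sketched with two hypothetical entries (not part of `fame`). The tibble `ppl`, the helper column `end`, and the fixed reference date `ref` are assumptions made for illustration. In practice, `today()` would take the place of `ref`. Parsing English month names with `mdy()` assumes an English locale:

```r
library(tibble)
library(dplyr)
library(lubridate)

# Hypothetical entries (not part of the fame data):
ppl <- tibble(name = c("Person A", "Person B"),
              DOB  = c("March 10, 1950", "October 02, 1980"),
              DOD  = c("April 09, 2010", NA))

ref <- ymd("2021-01-01")  # fixed reference date; use today() for current ages

ppl <- ppl %>%
  mutate(dob = mdy(DOB),                            # character -> Date
         dod = mdy(DOD),                            # NA stays NA
         end = coalesce(dod, ref),                  # date of death, else reference date
         age_fy = interval(dob, end) %/% years(1)) %>%
  select(-DOB, -DOD)

ppl$age_fy   # 60 40
```

Using `coalesce()` picks the date of death where it exists and the reference date otherwise, so that a single age computation covers both cases.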

### 10.6.4 Exercise 4

#### Time conversions

- Define a time point of the New Year fireworks in Sydney, Australia, as “2021-01-01 00:00:01” (including time zone information) and save it as `t_fw`.

- Predict and explain the results of the following commands in your own words.

```
with_tz(t_fw, tz = "Europe/Berlin")
#> [1] "2020-12-31 14:00:01 CET"
force_tz(t_fw, tz = "Europe/Berlin")
#> [1] "2021-01-01 00:00:01 CET"
```

- Predict and explain the outcome of the following commands.

**Hint:** This is possible without actually running them (after having completed the second task).
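
To illustrate the difference between the two conversion functions without giving away the solution, here is a small sketch with a hypothetical time point in New York (the object names are assumptions made for this example):

```r
library(lubridate)

# Hypothetical time point with explicit time zone information:
t_ny <- ymd_hms("2020-07-04 12:00:00", tz = "America/New_York")

# with_tz(): the same instant, shown on a clock in another time zone
t_same <- with_tz(t_ny, tzone = "Europe/London")

# force_tz(): the same clock time, declared to be in another time zone
t_forced <- force_tz(t_ny, tzone = "Europe/London")

t_same == t_ny                            # TRUE (same instant)
difftime(t_ny, t_forced, units = "hours") # Time difference of 5 hours
```

In short, `with_tz()` changes only how a time is displayed, whereas `force_tz()` changes the instant that is being described.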

### 10.6.5 Exercise 5

#### Hoop times

This exercise uses the `lakers` dataset included in **lubridate** (originally from http://www.basketballgeek.com/data/), which contains play-by-play statistics of each Los Angeles Lakers (LAL) basketball game in the 2008/2009 season of the NBA. (See `?lakers` for details.)

- Select only those games against the *Dallas Mavericks* (abbreviated as “DAL”) and save the corresponding data as a tibble `LAL_DAL`.

- Use your `tidyverse` knowledge acquired so far to answer some basic questions about those games:
  - How many such (home vs. away) games exist?
  - On which dates were they played?
  - What were their scores? Who won the game?

**Hint:** All these questions can be answered with a single **dplyr** pipe.

- Create and add the following date and time variables to `LAL_DAL`:
  - `date` should be a variable of type “Date” (rather than a character string)
  - `t_clock` should represent the time shown on the clock (as a *period*)
  - `t_psec` should represent the time elapsed in the current period (a *duration* in seconds)
  - `t_game` should represent the time elapsed in the game overall (as a *duration*).

**Hint:** An NBA game consists of 4 periods, each of which lasts 12 minutes (i.e., each game’s time should add up to a total of 48 minutes).
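
The arithmetic behind these time variables can be sketched on a few hypothetical play-by-play rows. The tibble `pbp` and the helper column `sec_left` are assumptions made for illustration, and the column layout (an integer `date`, a `period` number, and a character `time` showing the time left on the clock) is only assumed to resemble that of `lakers`. The sketch covers regular 12-minute periods only:

```r
library(tibble)
library(dplyr)
library(lubridate)

# Hypothetical play-by-play rows (layout assumed to resemble lakers):
pbp <- tibble(date   = c(20081111L, 20081111L, 20081111L),
              period = c(1L, 1L, 3L),
              time   = c("12:00", "07:30", "00:45"))  # time left on the clock

p_sec <- 12 * 60  # length of a regular period in seconds

pbp <- pbp %>%
  mutate(date     = ymd(date),                    # integer -> Date
         t_clock  = ms(time),                     # clock time as a period
         sec_left = period_to_seconds(t_clock),   # seconds left in period
         t_psec   = dseconds(p_sec - sec_left),   # elapsed in current period
         t_game   = dseconds((period - 1) * p_sec + (p_sec - sec_left)))

as.numeric(pbp$t_game)   # 0 270 2115 (seconds elapsed in the game)
```

As the clock counts down, the elapsed time in a period is the period length minus the time shown, and the elapsed game time adds the lengths of all completed periods.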

- Prominent players:
  - For which individual player on each team do the data record the highest number of events?
  - How many points did each of these two players score (over all games)?
  - What would it take to compute the time difference between all recorded events for these two players as **lubridate** intervals?

  **Bonus task:** Compute these intervals for each of these two players.

- Cumulative points per game:
  - Compute and add a variable for the cumulative `point_total` of each game and team.
  - Compute the final score `f_score` of each game and team (and compare your result to the scores obtained in the second task of this exercise).
  - Plot the (cumulative) `point_total` for each game per team as a function of `t_game`.
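
A cumulative total per group can be obtained by combining `group_by()` with `cumsum()`. The following sketch uses a few hypothetical scoring events (the tibble `ev` and its `game` column are assumptions, not the actual `lakers` data):

```r
library(tibble)
library(dplyr)

# Hypothetical scoring events of one game (columns assumed for illustration):
ev <- tibble(game   = c("g1", "g1", "g1", "g1", "g1"),
             team   = c("LAL", "DAL", "LAL", "LAL", "DAL"),
             points = c(2, 3, 0, 1, 2))

# Running total of points per game and team:
ev <- ev %>%
  group_by(game, team) %>%
  mutate(point_total = cumsum(points)) %>%
  ungroup()

# Final score per game and team (the largest running total):
f_score <- ev %>%
  group_by(game, team) %>%
  summarise(f_score = max(point_total), .groups = "drop")

f_score   # DAL: 5, LAL: 3
```

Note that `cumsum()` relies on the rows being in chronological order within each game.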

**Please note:**
This dataset and questions like the ones asked here are a good illustration of a possible *data science project*. At this point, you should be starting to think about datasets and questions for your own project. (See Appendix C for some guidelines on successful data science projects and their scope.)

### 10.6.6 Exercise 6

#### DOB and study times

The dataset `exp_num_dt` (available in the **ds4psy** package or as a CSV-file from rpository.com) contains the birth dates and study participation times of 1000 fictitious, but surprisingly friendly people.

We read the data file into a tibble `dt` and select only its date-related variables:

```
library(dplyr)  # provides %>% and select()

# dt <- readr::read_csv("http://rpository.com/ds4psy/data/dt.csv") # online
dt <- ds4psy::exp_num_dt # ds4psy package
# dt
# Select only its date-time related variables:
dt_t <- dt %>% select(name:byear, t_1, t_2)
# Check:
# dt_t # 1000 x 7
knitr::kable(head(dt_t), caption = "Time-related variables of table `dt`.")
```

| name | gender | bday | bmonth | byear | t_1 | t_2 |
|---|---|---|---|---|---|---|
| I.G. | male | 14 | 12 | 1968 | 2020-01-16 11:00:58 | 2020-01-16 11:32:21 |
| O.B. | male | 10 | 4 | 1974 | 2020-01-17 14:11:07 | 2020-01-17 15:05:14 |
| M.M. | male | 28 | 9 | 1987 | 2020-01-16 10:06:06 | 2020-01-16 10:51:47 |
| V.J. | female | 15 | 2 | 1978 | 2020-01-10 10:06:04 | 2020-01-10 10:39:48 |
| O.E. | male | 18 | 5 | 1985 | 2020-01-20 09:23:51 | 2020-01-20 10:11:36 |
| Q.W. | male | 1 | 3 | 1968 | 2020-01-13 11:10:09 | 2020-01-13 11:54:07 |

- The variables `bday`, `bmonth`, and `byear` contain each participant’s date of birth.
  - Compute a variable `DOB` that summarizes `bday`, `bmonth`, and `byear` (in a “Date” variable) and a variable `bweekday` that shows the weekday of each participant’s DOB (as a character variable).

**Hint:** A **base** R solution is about as long as the **dplyr**/**lubridate** solution.

- What would each participant respond to the question “How old are you?” (i.e., what was each person’s `age` in *completed* years when starting the study in January 2020)? Verify your result for those participants who took part in the study *on their birthday*.

**Hint:** This task requires considering both `DOB` and `t_1` (to check whether the person already celebrated his or her birthday in the current year when starting the study at the time `t_1`).

- The time variables `t_1` and `t_2` indicate the start and end times of each person’s participation in this study.
  - Compute the *duration* of each person’s participation (in minutes and seconds) and plot the distribution of the resulting durations (e.g., as a histogram).

- The study officially only ran for 5 days (from “2020-01-13” to “2020-01-18”) and should only include participants who responded within 1 hour (60 minutes).
  - Add a filter variable `valid` that enforces these criteria (i.e., allows filtering out participants with other dates and durations longer than 60 minutes).

- Finally, we can compute some basic descriptives of the participants considered to be `valid`:
  - How many participants remain in the sample of valid data?
  - What is their average `height` and `g_iq` score?
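
The duration and filter steps of this exercise can be sketched on three hypothetical participants (the tibble `st` and the helper columns `dur_min` and `in_time` are assumptions, not variables of `exp_num_dt`). Treating both boundary dates as part of the study period is also an assumption:

```r
library(tibble)
library(dplyr)
library(lubridate)

# Hypothetical participation times (not taken from exp_num_dt):
st <- tibble(t_1 = ymd_hms(c("2020-01-14 10:00:00", "2020-01-12 09:00:00",
                             "2020-01-15 13:00:00")),
             t_2 = ymd_hms(c("2020-01-14 10:45:30", "2020-01-12 09:30:00",
                             "2020-01-15 14:20:00")))

st <- st %>%
  mutate(dur_min = as.numeric(difftime(t_2, t_1, units = "mins")),  # minutes
         in_time = as_date(t_1) >= ymd("2020-01-13") &
                   as_date(t_1) <= ymd("2020-01-18"),   # boundaries included
         valid   = in_time & dur_min <= 60)

st$valid         # TRUE FALSE FALSE
sum(st$valid)    # number of valid participants: 1
```

With a logical `valid` variable in place, `filter(valid)` yields the subset from which any descriptives can then be computed.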

### 10.6.7 Exercise 7

#### Bonus task: Evaluating time differences

This exercise creates random time differences and compares the results of computing them in two different ways.

- Use the `sample_time()` function of **ds4psy** to generate vectors of `N` random starting times and `N` random end times.

- Compute and compare the time difference between both vectors for various units of time. Specifically, compare the solutions of the `diff_times()` function of **ds4psy** with the corresponding **lubridate** solution (using time intervals and periods).

- Continue comparing the results of both solution methods until you find some examples with different solutions for the same time difference. Can you explain the discrepancies?

**Hint:** Here is a possible setup for an investigation of this type:

```
# Parameters:
N <- 10
t1 <- "2020-01-01 00:00:00"
t2 <- Sys.time()
# Random time vectors:
t_start <- ds4psy::sample_time(from = t1, to = t2, size = N)
t_end <- ds4psy::sample_time(from = t1, to = t2, size = N)
# in months:
ds4psy::diff_times(t_start, t_end, unit = "months", as_character = FALSE)
lubridate::as.period(lubridate::interval(t_start, t_end), unit = "months")
# in days:
ds4psy::diff_times(t_start, t_end, unit = "days", as_character = FALSE)
lubridate::as.period(lubridate::interval(t_start, t_end), unit = "days")
```

This concludes our exercises on creating and computing with dates and times.