10.4 Applications

Please note: This section is still fragmentary at this time (i.e., as of 2020-07-07 13:13:31) and can be skipped.

For additional information, see also Chapter 16: Dates and times of the r4ds textbook (Wickham & Grolemund, 2017) and corresponding examples and exercises.

10.4.1 Converting times

Converting times between different time zones is a common task. (See the lubridate functions with_tz() and force_tz() above and Exercise 4, Section 10.6.4.)
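
To make the difference between the two functions concrete, here is a minimal sketch. The time stamp and the time zone "America/New_York" are arbitrary choices for illustration. with_tz() keeps the instant and changes the clock time that is displayed, whereas force_tz() keeps the clock time and thereby changes the instant:

```r
library(lubridate)

# An arbitrary point in time, stored in UTC:
t_utc <- ymd_hms("2020-07-07 13:13:31", tz = "UTC")

# with_tz(): same instant, shown as local clock time in New York
t_ny <- with_tz(t_utc, tzone = "America/New_York")

# force_tz(): same clock time, re-labelled as New York time (a different instant)
t_forced <- force_tz(t_utc, tzone = "America/New_York")

t_utc == t_ny                               # TRUE: both denote the same instant
difftime(t_forced, t_utc, units = "hours")  # 4 hours (New York is UTC-4 in July)
```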

10.4.2 Computing age

A surprisingly difficult problem is determining someone’s age when their date of birth (DOB) is known. The reason is not that some people are secretive about their age, but that we cannot simply subtract their year of birth from the current year. Instead, we need to take into account whether they have already had their birthday in the current year.
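
A small base R sketch shows the size of the error. Both dates are fixed, arbitrary values chosen for illustration, so that the result does not depend on when the code is run:

```r
dob <- as.Date("1968-12-14")  # date of birth
ref <- as.Date("2020-07-07")  # reference date (standing in for "today")

# Naive approach: difference between the two calendar years
naive_age <- as.numeric(format(ref, "%Y")) - as.numeric(format(dob, "%Y"))
naive_age  # 52

# Has the birthday (as month and day) already occurred in the year of ref?
bday_passed <- format(ref, "%m%d") >= format(dob, "%m%d")

# Subtract 1 if the birthday is still to come:
age <- naive_age - as.integer(!bday_passed)
age  # 51
```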

Let’s load some data to illustrate the problem and how we can solve it. The dataset exp_num_dt (available in the ds4psy package or as a CSV file from rpository.com) contains the birth dates and study participation times of 1000 fictitious people.

We read the data file into a tibble dt and select only its date-related variables:

Table 10.2: Date-related data in dt.
name gender bday bmonth byear
I.G. male 14 12 1968
O.B. male 10 4 1974
M.M. male 28 9 1987
V.J. female 15 2 1978
O.E. male 18 5 1985
Q.W. male 1 3 1968
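
The code that creates dt is not shown here. As a minimal stand-in, the six rows of Table 10.2 can be entered by hand. The commented line is an assumption: it presumes that the ds4psy package is installed and that the date-related variables are the columns from name to byear.

```r
library(dplyr)  # also provides tibble() and the pipe %>%

# Full data (assumption: ds4psy is installed, columns ordered as in Table 10.2):
# dt <- ds4psy::exp_num_dt %>% select(name:byear)

# Stand-in: the six rows shown in Table 10.2
dt <- tibble(
  name   = c("I.G.", "O.B.", "M.M.", "V.J.", "O.E.", "Q.W."),
  gender = c("male", "male", "male", "female", "male", "male"),
  bday   = c(14, 10, 28, 15, 18, 1),
  bmonth = c(12, 4, 9, 2, 5, 3),
  byear  = c(1968, 1974, 1987, 1978, 1985, 1968)
)
dt
```

With this stand-in, the code below runs on six rows rather than on all 1000, so the row counts in the printed output will differ.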

Here are two possible approaches to this task:

  1. Check whether a person already had her/his birthday in the current year.
## A. Compute age in (completed) years:

# (1) Describe today's date:

# (a) with base R:
today <- Sys.Date()
cur_year <- as.numeric(format(today, "%Y"))
cur_month <- as.numeric(format(today, "%m"))
cur_day <- as.numeric(format(today, "%d"))

# (b) with lubridate:
today <- lubridate::today()  # namespace prefix: lubridate is only attached further below
cur_year  <- lubridate::year(today)
cur_month <- lubridate::month(today)
cur_day   <- lubridate::day(today)

# (2) Check whether someone had his birthday this year and 
#     subtract "TRUE" (=1) from difference in years if not: 
dt2 <- dt %>%
  mutate(had_bday_this_year = ((bmonth < cur_month) | ((bmonth == cur_month) & (bday <= cur_day))), 
         age = (cur_year - byear) - !had_bday_this_year) %>%
  select(name:byear, age, had_bday_this_year, everything()) # re-arrange variables  

# Check: 
dt2 %>% filter(had_bday_this_year == TRUE)
#> # A tibble: 516 x 7
#>    name  gender  bday bmonth byear   age had_bday_this_year
#>    <chr> <chr>  <dbl>  <dbl> <dbl> <dbl> <lgl>             
#>  1 O.B.  male      10      4  1974    46 TRUE              
#>  2 V.J.  female    15      2  1978    42 TRUE              
#>  3 O.E.  male      18      5  1985    35 TRUE              
#>  4 Q.W.  male       1      3  1968    52 TRUE              
#>  5 H.K.  male      27      4  1994    26 TRUE              
#>  6 T.R.  female     5      6  1961    59 TRUE              
#>  7 N.T.  female    19      5  1975    45 TRUE              
#>  8 U.W.  female    12      1  1996    24 TRUE              
#>  9 J.Y.  female    20      5  1987    33 TRUE              
#> 10 S.X.  female     5      3  1986    34 TRUE              
#> # … with 506 more rows
dt2 %>% filter(had_bday_this_year == FALSE)
#> # A tibble: 484 x 7
#>    name  gender  bday bmonth byear   age had_bday_this_year
#>    <chr> <chr>  <dbl>  <dbl> <dbl> <dbl> <lgl>             
#>  1 I.G.  male      14     12  1968    51 FALSE             
#>  2 M.M.  male      28      9  1987    32 FALSE             
#>  3 F.J.  male       1     10  1983    36 FALSE             
#>  4 J.R.  female    29     12  1941    78 FALSE             
#>  5 N.S.  male      25      9  1953    66 FALSE             
#>  6 K.E.  male      10     12  1951    68 FALSE             
#>  7 O.H.  male      28      9  1949    70 FALSE             
#>  8 M.Y.  male      19     10  1938    81 FALSE             
#>  9 E.Q.  female    12     11  1932    87 FALSE             
#> 10 P.M.  female    21      7  1991    28 FALSE             
#> # … with 474 more rows
  2. Calculate a time difference (as an interval) and convert the result into a meaningful common-sense unit.
# B. A simpler solution (using lubridate)
library(lubridate)

td <- today() 

bd <- td - years(18)       # today, 18 years ago (period) 
(bd %--% td)               # interval (of life)
#> [1] 2002-07-07 UTC--2020-07-07 UTC
(bd %--% td) / years(1)    # in terms of years (as period)
#> [1] 18
(bd %--% td) %/% years(1)  # integer division: completed years (as period)
#> [1] 18

# Define a function that computes current age: 
cur_age <- function(bd) {
  
  life <- (bd %--% today()) # interval from bday to today() 
  (life %/% years(1))       # integer division (into a period of full years)
  
}

# 1. Check function with an example:
bd_1 <- today() - years(18) - days(1)  # 18 years ago, yesterday
bd_2 <- today() - years(18) + days(0)  # 18 years ago, today
bd_3 <- today() - years(18) + days(1)  # 18 years ago, tomorrow

cur_age(bd_1)  # => 18
#> [1] 18
cur_age(bd_2)  # => 18
#> [1] 18
cur_age(bd_3)  # => 17 (qed)
#> [1] 17

# 2. Apply function to dt2 data: 
dt2 <- dt2 %>%
  mutate(bdate = make_date(year = byear, month = bmonth, day = bday), 
         age_2 = cur_age(bdate)
         ) %>%
  select(name:byear, bdate, age, age_2, everything())

# Check: 
all.equal(dt2$age, dt2$age_2)
#> [1] TRUE
dt2 %>% filter(had_bday_this_year == TRUE)
#> # A tibble: 516 x 9
#>    name  gender  bday bmonth byear bdate        age age_2 had_bday_this_year
#>    <chr> <chr>  <dbl>  <dbl> <dbl> <date>     <dbl> <dbl> <lgl>             
#>  1 O.B.  male      10      4  1974 1974-04-10    46    46 TRUE              
#>  2 V.J.  female    15      2  1978 1978-02-15    42    42 TRUE              
#>  3 O.E.  male      18      5  1985 1985-05-18    35    35 TRUE              
#>  4 Q.W.  male       1      3  1968 1968-03-01    52    52 TRUE              
#>  5 H.K.  male      27      4  1994 1994-04-27    26    26 TRUE              
#>  6 T.R.  female     5      6  1961 1961-06-05    59    59 TRUE              
#>  7 N.T.  female    19      5  1975 1975-05-19    45    45 TRUE              
#>  8 U.W.  female    12      1  1996 1996-01-12    24    24 TRUE              
#>  9 J.Y.  female    20      5  1987 1987-05-20    33    33 TRUE              
#> 10 S.X.  female     5      3  1986 1986-03-05    34    34 TRUE              
#> # … with 506 more rows
dt2 %>% filter(had_bday_this_year == FALSE)
#> # A tibble: 484 x 9
#>    name  gender  bday bmonth byear bdate        age age_2 had_bday_this_year
#>    <chr> <chr>  <dbl>  <dbl> <dbl> <date>     <dbl> <dbl> <lgl>             
#>  1 I.G.  male      14     12  1968 1968-12-14    51    51 FALSE             
#>  2 M.M.  male      28      9  1987 1987-09-28    32    32 FALSE             
#>  3 F.J.  male       1     10  1983 1983-10-01    36    36 FALSE             
#>  4 J.R.  female    29     12  1941 1941-12-29    78    78 FALSE             
#>  5 N.S.  male      25      9  1953 1953-09-25    66    66 FALSE             
#>  6 K.E.  male      10     12  1951 1951-12-10    68    68 FALSE             
#>  7 O.H.  male      28      9  1949 1949-09-28    70    70 FALSE             
#>  8 M.Y.  male      19     10  1938 1938-10-19    81    81 FALSE             
#>  9 E.Q.  female    12     11  1932 1932-11-12    87    87 FALSE             
#> 10 P.M.  female    21      7  1991 1991-07-21    28    28 FALSE             
#> # … with 474 more rows

Additional solutions to the surprisingly difficult task of computing someone’s age (in completed years) are possible by other ways of computing and rounding time spans.
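
One such alternative is sketched here: measure the life span in years with lubridate's time_length() and round down. When applied to an interval, time_length() accounts for the fact that years differ in length. The helper name age_on and the fixed reference date are hypothetical choices for illustration.

```r
library(lubridate)

# Age in completed years on a reference date (hypothetical helper):
age_on <- function(bd, ref = today()) {
  life <- interval(bd, ref)          # interval from birth date to reference date
  floor(time_length(life, "years"))  # length in years, rounded down
}

ref <- ymd("2020-07-07")             # fixed reference date for reproducibility
age_on(ymd("1968-12-14"), ref)       # 51 (birthday still to come in 2020)
age_on(ymd("1974-04-10"), ref)       # 46 (birthday already passed in 2020)
```

By contrast, dividing a duration by a fixed year length (e.g., 365.25 days) is only an approximation and can be off by one on or near a birthday.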

10.4.3 Simple date and time functions

Before we wrap up this chapter, here is the good news: If you’re into satisficing and only want to query and display dates and times in R, rather than work with date- and time-related data and compute other variables from them, here’s all you need:

The ds4psy package provides a small and opinionated collection of functions that probably cover 95% of all use cases for most people. To avoid conflicts with base R and other packages (which already provide functions named date(), time(), today(), and now()), the names of these functions start with cur_ (for “current”) or what_. They follow a simple heuristic: What is it that we usually want to hear as x when asking “What x is it today?” or “What x is it right now?”

  1. In my world, about 90% of all use cases are covered by 2 functions that ask for the current date or time:
  • cur_date(): Get the current date, in 2 different orders (and optional separator)
  • cur_time(): Get the current time (with optional seconds and separator)

By default, these functions return a character string denoting the current date or time. However, by setting as_string = FALSE they can be switched to returning “Date” or “POSIXct” objects.
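
A minimal usage sketch, assuming that the ds4psy package is installed. Only the argument as_string mentioned above is used. The exact strings returned depend on when the code is run, so no output is shown:

```r
library(ds4psy)

cur_date()  # current date as a character string
cur_time()  # current time as a character string

# Switching from strings to date and time objects:
d  <- cur_date(as_string = FALSE)
tm <- cur_time(as_string = FALSE)

class(d)    # expected to include "Date"
class(tm)   # expected to include "POSIXct"
```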

  2. About 5% more of all use cases are covered by 4 additional functions that ask what_ questions about the position of some temporal unit in some larger continuum of time:
  • what_day(): as name (weekday, abbr or full), or as number (in units of week, month, or year; as char or as integer)
  • what_week(): only as number (in units of month, or year; as char or as integer)
  • what_month(): as name (abbr or full) or as number (as char or as integer)
  • what_year(): only as number (abbr or full, as char or as integer)

All of these functions take one or more time points as input. The input defaults to today (as a “Date”) or now (i.e., Sys.time() as “POSIXct”), but vectors of dates or times are accepted as well.

All these functions essentially provide shortcuts for time-related commands and for the string-formatting functions and abbreviations of the “POSIX” standard, which most people (even of the R-friendly variety) cannot or do not care to remember (see the date-time conversion functions in Section 10.2.4 and ?strftime for details).
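
For comparison, here is a sketch of the kind of base R formatting calls that such shortcuts spare us from remembering. It uses only format() with the conversion specifications documented in ?strftime. The example date is an arbitrary fixed value, and the names of weekdays and months depend on the locale:

```r
d <- as.Date("2020-07-07")  # fixed example date (a Tuesday)

format(d, "%A")  # full weekday name (locale-dependent)
format(d, "%a")  # abbreviated weekday name (locale-dependent)
format(d, "%u")  # weekday as number (1 = Monday): "2"
format(d, "%j")  # day of the year: "189"
format(d, "%V")  # week of the year (ISO 8601)
format(d, "%B")  # full month name (locale-dependent)
format(d, "%m")  # month as number: "07"
format(d, "%Y")  # year with century: "2020"
format(d, "%y")  # year without century: "20"

as.integer(format(d, "%m"))  # as integer rather than character: 7
```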

  3. For the remaining 5% of use cases, we need to know (some of) the details above (see Section 10.2.2) or the versatile toolbox provided by the lubridate package (see Section 10.3).

Note: At this point, calculations with dates and times are not supported by the ds4psy package. However, you can always calculate them with R’s elementary time classes (see Section 10.2.2) and then convert the result into more convenient formats or units.
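
As a minimal illustration of this workflow in base R, two "Date" objects can be subtracted and the resulting "difftime" object expressed in different units. The two dates are arbitrary example values:

```r
start <- as.Date("2020-01-01")
end   <- as.Date("2020-07-07")

td <- end - start                # subtracting dates yields a "difftime" object
td                               # Time difference of 188 days

as.numeric(td, units = "weeks")  # the same span in weeks (188 / 7)

units(td) <- "hours"             # change the unit of the object itself
as.numeric(td)                   # 4512 (= 188 * 24)
```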

References

Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. Retrieved from http://r4ds.had.co.nz