10.2 Essentials

This section covers the essentials of this chapter: base R functions for dates and times, and the lubridate package (Spinu et al., 2018).

10.2.1 Base R

Basics: base R contains 3 classes for dates and times:

  1. Date: Dates without times
  2. POSIXct: Date-time class used to represent calendar time
  3. POSIXlt: Date-time class used to represent local time (as lists, enabling easy extraction of specific time components)
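
A minimal sketch of how one moment in time looks in each of the three classes. The variable names (d, ct, lt) and the example time are illustrative assumptions, not taken from the chapter's data:

```r
# One point in time, represented in the three base R classes
d  <- as.Date("2020-02-15")                          # Date
ct <- as.POSIXct("2020-02-15 16:07:34", tz = "UTC")  # POSIXct
lt <- as.POSIXlt(ct)                                 # POSIXlt

class(d)   # "Date"
class(ct)  # "POSIXct" "POSIXt"
class(lt)  # "POSIXlt" "POSIXt"

# Internally, a Date counts days and a POSIXct counts seconds
# since 1970-01-01 (UTC):
unclass(d)      # 18307
as.numeric(ct)  # 1581782854

# A POSIXlt object stores its components as list elements:
lt$hour         # 16
lt$min          # 7
lt$mon + 1      # 2    (months are stored as 0-11)
lt$year + 1900  # 2020 (years are stored relative to 1900)
```

The list structure of POSIXlt makes extracting components easy, whereas the compact numeric POSIXct is usually the better choice for storing times in data frames.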

See the tutorial on https://www.r-bloggers.com/using-dates-and-times-in-r/.

Obtaining the current date and time

See ?DateTimeClasses for details on these classes.

  1. Sys.Date() returns an object of class Date
  2. Sys.time() returns an object of class POSIXct
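
A quick way to check this in the console (the object names are illustrative; the printed values depend on when the code is run, so none are shown):

```r
cur_date <- Sys.Date()  # current date
cur_time <- Sys.time()  # current date and time

class(cur_date)  # "Date"
class(cur_time)  # "POSIXct" "POSIXt"

# Both objects can be turned into text with format():
format(cur_date, "%Y-%m-%d")
format(cur_time, "%H:%M:%S")
```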

Dates

Entering and interpreting dates can be tricky, which shows the importance of adhering to conventions (specifically, the order of units). The potential for confusion is considerable.

Example

Assume an R object “01-02-03”. Which date is this?

Evaluate as.Date("01-02-03") to find out:

#> [1] "0001-02-03"

This illustrates 2 problems:

  1. How is the year entered?
  • Interpretation by R: 0001-02-03. Note that the year is expected as a 4-digit unit (i.e., yyyy or %Y).

  • Explication: R reads the date with formatting option "%Y-%m-%d", which yields February 03, 0001 (Saturday) (note the uppercase letter %Y).

  2. Which number is which unit: What is the order of temporal units?

What if we assumed a 2-digit year (denoted in R as lowercase letter %y)? Still confusing, as the order of 3 temporal units is unclear.

By varying the identity of the 3 elements, we obtain 6 possible interpretations of an R object “01-02-03” (as a table):

| Nr. | Format     | Date (in R) | Details                       | Evaluation                           |
|-----|------------|-------------|-------------------------------|--------------------------------------|
| 1   | "%y-%m-%d" | 2001-02-03  | February 03, 2001 (Saturday)  | best (decreasing units)              |
| 2   | "%y-%d-%m" | 2001-03-02  | March 02, 2001 (Friday)       | bad                                  |
| 3   | "%m-%y-%d" | 2002-01-03  | January 03, 2002 (Thursday)   | abysmal                              |
| 4   | "%m-%d-%y" | 2003-01-02  | January 02, 2003 (Thursday)   | bad (despite US convention)          |
| 5   | "%d-%y-%m" | 2002-03-01  | March 01, 2002 (Friday)       | abysmal                              |
| 6   | "%d-%m-%y" | 2003-02-01  | February 01, 2003 (Saturday)  | ok (increasing units, EU convention) |

Varying the order of 3 different format options yields 6 different orders that denote 6 different dates.
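
The six interpretations can be reproduced in base R. This is a minimal sketch; the names x, formats, and dates are illustrative:

```r
x <- "01-02-03"

# All 6 orders of 2-digit year (%y), month (%m), and day (%d):
formats <- c("%y-%m-%d", "%y-%d-%m", "%m-%y-%d",
             "%m-%d-%y", "%d-%y-%m", "%d-%m-%y")

# Parse the same string with each format and print the result in ISO order:
dates <- sapply(formats, function(f) format(as.Date(x, format = f), "%Y-%m-%d"))
unname(dates)
#> [1] "2001-02-03" "2001-03-02" "2002-01-03" "2003-01-02" "2002-03-01" "2003-02-01"

length(unique(dates))  # 6: one string, six different dates
```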
This demonstrates that order really matters when entering dates — and raises the question: Which is the best order to use? To find out, let’s evaluate the 6 candidate interpretations:

  • Formats 3. and 5. place the year (denoted as %y) in the middle between day and month. This makes absolutely no sense and is abysmal.

  • Formats 2. and 4. are similarly bad, as they put the day (denoted as %d) in the middle between month and year. This is confusing, even though 4. happens to be the US convention.

  • Format 6. puts the temporal units in (increasing) order. This seems ok, and happens to correspond to the EU convention. Nevertheless, the EU-convention is still suboptimal.

  • The best solution is provided by Format 1: By putting the 3 temporal units in decreasing order, a set of dates would automatically be sorted (from older to newer dates). In analogy to the alphabetic order of words that helps finding them in a lexicon, such an order is called lexicographic. In the case of dates, the decreasing order of units (i.e., year-month-day) also happens to be the ISO standard.
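
The sorting advantage of the decreasing order is easy to demonstrate. In this sketch, the three example dates are made up for illustration:

```r
dates <- as.Date(c("2019-12-31", "2020-02-01", "2020-01-15"))

iso <- format(dates, "%Y-%m-%d")  # decreasing units (ISO)
eu  <- format(dates, "%d-%m-%Y")  # increasing units (EU)

# Sorting the text labels:
sort(iso)  # "2019-12-31" "2020-01-15" "2020-02-01"  (chronological)
sort(eu)   # "01-02-2020" "15-01-2020" "31-12-2019"  (not chronological)
```

Sorting ISO strings as plain text yields the same order as sorting the underlying dates, which is handy for file names and spreadsheet columns.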

Conclusion:

This example teaches 2 important lessons for entering dates:

  • Lesson 1: Always enter years as 4 digits, never as 2 digits!

This reduces the potential of confusion: As day and month have maximally 2 digits, the digits denoting the year are always distinguishable.
However, even when it is clear which number denotes the year, the digits for month and day can still be confused (unless the day is greater than 12, in which case it cannot be misinterpreted as a month).

  • Lesson 2: Use a reasonable order for dates.

For dates, only 2 orders make sense:

  1. decreasing units: yyyy-mm-dd or %Y-%m-%d
  2. increasing units: dd-mm-yyyy or %d-%m-%Y (EU convention)

Note that the decreasing order is superior to the increasing (EU) order, as it is lexicographic. Also note that the US convention (“Month dd, yyyy”) is confusing and therefore excluded!
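
A short sketch of both lessons in base R (the object names d1, d2, and d3 are illustrative). Dates in the decreasing order are read without further help, whereas any other order needs an explicit format argument:

```r
d1 <- as.Date("2020-02-15")                       # yyyy-mm-dd: no format needed
d2 <- as.Date("15-02-2020", format = "%d-%m-%Y")  # dd-mm-yyyy: format required
d1 == d2  # TRUE

# Printing a date in the increasing (EU) order:
format(d1, "%d-%m-%Y")  # "15-02-2020"

# Danger: without a format, R assumes the order year-month-day
# and silently reads "15" as the year and "20" as the day:
d3 <- as.Date("15-02-2020")
as.POSIXlt(d3)$year + 1900  # 15 (i.e., the year 15, not 2020)
```

The last command returns no warning, which is why stating the format explicitly is the safer habit whenever the input is not in the ISO order.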

Times

Entering and interpreting times.

The POSIX standard

The details of the formats are platform-specific, but the following are likely to be widely available: most are defined by the POSIX standard.

A conversion specification is introduced by %, usually followed by a single letter (or O or E and then a single letter). Any character in the format string not part of a conversion specification is interpreted literally (and %% yields %).

Widely implemented conversion specifications include the following:

  • %a: Abbreviated weekday name (in the current locale on this platform): Sat

  • %A: Full weekday name (…): Saturday

  • %b: Abbreviated month name (…): Feb

  • %B: Full month name (…): February

  • %C: Century (00–99): the integer part of the year divided by 100.

  • %d: Day of the month as decimal number (01–31).
  • %e: Day of the month as decimal number (1–31), with a prefix space for a single-digit number.

  • %h: Equivalent to %b.

  • %H: Hours as decimal number (00–23).
  • %I: Hours as decimal number (01–12).

  • %j: Day of year as decimal number (001–366).

  • %m: Month as decimal number (01–12).

  • %M: Minute as decimal number (00–59).

  • %p: AM/PM indicator in the locale. Used in conjunction with %I and not with %H.

  • %S: Second as integer (00–61), allowing for up to two leap-seconds.

  • %u: Weekday as a decimal number (1–7, Monday is 1).

  • %U: Week of the year as decimal number (00–53), using Sunday as the first day of the week (US convention).

  • %V: Week of the year as decimal number (01–53) as defined in ISO 8601.

  • %w: Weekday as decimal number (0–6, Sunday is 0).

  • %W: Week of the year as decimal number (00–53), using Monday as the first day of week (UK convention).

  • %y: Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the POSIX standards, but they do also state that “in a future version the default century inferred from a 2-digit year will change”.

  • %Y: Year with century. Note that whereas there was no zero in the original Gregorian calendar, ISO 8601:2004 defines it to be valid (interpreted as 1BC, see https://en.wikipedia.org/wiki/0_(year)). As input, only years 0:9999 are accepted.
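
A brief sketch of how some of these specifications behave, using only locale-independent ones. The object tm and the example time are assumptions chosen to match the examples used in this section:

```r
tm <- as.POSIXct("2020-02-15 16:07:34", tz = "UTC")

# Output: format() turns a time into text
format(tm, "%Y-%m-%d %H:%M:%S")  # "2020-02-15 16:07:34"
format(tm, "%d.%m.%y, %I:%M")    # "15.02.20, 04:07"
format(tm, "%j")                 # "046" (day of the year)
format(tm, "%u")                 # "6"   (Saturday, as Monday is 1)
format(tm, "%C")                 # "20"  (century)

# Input: 2-digit years (%y) are assigned a century
format(as.Date("68-01-01", format = "%y-%m-%d"), "%Y")  # "2068"
format(as.Date("69-01-01", format = "%y-%m-%d"), "%Y")  # "1969"
```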

Occasionally useful are the following shortcuts:

As a table:

| Abb. | What?                                                                         | Example                  |
|------|-------------------------------------------------------------------------------|--------------------------|
| %c   | Date and time (locale-specific on output, %a %b %e %H:%M:%S %Y on input)      | Sat Feb 15 16:07:34 2020 |
| %D   | Date formatted as %m/%d/%y                                                    | 02/15/20                 |
| %F   | Date equivalent to %Y-%m-%d (ISO 8601 date format)                            | 2020-02-15               |
| %T   | Time equivalent to %H:%M:%S                                                   | 16:07:34                 |
| %x   | Date (locale-specific output, %y/%m/%d on input)                              | 02/15/2020               |
| %X   | Time (locale-specific output, %H:%M:%S on input)                              | 16:07:34                 |
| %Z   | Time zone abbreviation (empty if not available)                               | CET                      |

As an itemized list:

  • %c: Date and time (locale-specific on output, %a %b %e %H:%M:%S %Y on input): Sat Feb 15 16:07:34 2020

  • %D: Date format such as %m/%d/%y: 02/15/20

  • %F: Equivalent to %Y-%m-%d (the ISO 8601 date format): 2020-02-15

  • %T: Equivalent to %H:%M:%S: 16:07:34

  • %x: Date (locale-specific on output, %y/%m/%d on input): 02/15/2020

  • %X: Time. Locale-specific on output, %H:%M:%S on input: 16:07:34

  • %Z: Time zone abbreviation as a character string (empty if not available): CET
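
The locale-independent shortcuts can be checked as follows (a sketch; the object tm is illustrative, and the locale-specific shortcuts %c, %x, and %X are omitted because their output varies between systems):

```r
tm <- as.POSIXct("2020-02-15 16:07:34", tz = "UTC")

format(tm, "%F")  # "2020-02-15"
format(tm, "%T")  # "16:07:34"
format(tm, "%D")  # "02/15/20"

# Shortcuts are just abbreviations of longer format strings:
format(tm, "%F %T") == format(tm, "%Y-%m-%d %H:%M:%S")  # TRUE
```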

10.2.1.1 Practice

  • Evaluate Sys.time() in your console. Then use your knowledge about POSIXct objects to predict the results of evaluating the following sequence:
  • Evaluate the sequence to check your predictions.

Reading date and time strings

  • Format strings
  • Time zones
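
A minimal base R sketch of both points: a format string tells R how to read a date-time string, and the tz argument states where that clock time was observed. The input string and the time zone "Europe/Berlin" are assumptions chosen for illustration:

```r
# Read a string in German notation as a time in Berlin:
x <- as.POSIXct("15.02.2020 16:07", format = "%d.%m.%Y %H:%M",
                tz = "Europe/Berlin")

# The same instant, shown in its own time zone and in UTC:
format(x, "%Y-%m-%d %H:%M")              # "2020-02-15 16:07"
format(x, "%Y-%m-%d %H:%M", tz = "UTC")  # "2020-02-15 15:07"
```

Changing the tz argument of format() only changes how the instant is displayed, not the instant itself.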

10.2.2 lubridate

Key commands of the lubridate package (Spinu et al., 2018).

10.2.2.1 Creating dates and times

  1. Dates from strings:

Including times:

  2. From date components (days, months, years) and time components (seconds, minutes, hours):

(Needs data containing time variables. See Exercise 2 below.)

  3. From dates and times:
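
The chapter's own commands for these three cases are not reproduced in this section. The following is a sketch of lubridate functions that are commonly used for each case, with made-up example values:

```r
library(lubridate)

# 1. Dates from strings (the function name states the order of units):
ymd("2020-02-15")
dmy("15.02.2020")
mdy("02/15/2020")

# Including times:
ymd_hms("2020-02-15 16:07:34")                    # UTC by default
ymd_hm("2020-02-15 16:07", tz = "Europe/Berlin")  # with a time zone

# 2. From date and time components:
make_date(year = 2020, month = 2, day = 15)
make_datetime(year = 2020, month = 2, day = 15, hour = 16, min = 7, sec = 34)

# 3. From dates and times (switching between the two):
as_datetime(ymd("2020-02-15"))           # date to date-time (midnight, UTC)
as_date(ymd_hms("2020-02-15 16:07:34"))  # date-time to date
```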

The lubridate package distinguishes between 4 types of time objects:

  1. instants: Moments or points in time
  2. durations: Time spans in exact number of seconds
  3. periods: Time spans in human units (like weeks, months, years)
  4. intervals: Time spans with a starting and ending point in time

An instant is a specific moment in time. By contrast, intervals, durations, and periods are all ways of recording time spans.
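
A small sketch of the four types; the objects start, end, and span and the chosen dates are illustrative:

```r
library(lubridate)

# Instants:
start <- ymd_hms("2020-01-01 00:00:00")
end   <- ymd_hms("2020-03-01 00:00:00")

# Duration vs. period:
ddays(1)              # duration: an exact number of seconds
as.numeric(ddays(1))  # 86400
days(1)               # period: 1 day in human units

# Interval (anchored between two instants):
span <- interval(start, end)  # same as: start %--% end

span / ddays(1)     # 60 (31 + 29 days, as 2020 is a leap year)
span %/% months(1)  # 2  (full months)
```

Because an interval knows its start and end, it can be converted into either exact durations or human periods.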

Durations vs. periods:

Practice

  1. Explain the different results of the following two commands:

Explanation: The command d <- ymd("2020-01-20") assigns a particular date (i.e., an instant in time) to d. To this, we add a time span (of 1 year) in two different ways: + years(1) adds the period of 1 year (in human units), yielding the same date a year later. By contrast, + dyears(1) adds the duration of 1 year (in exact number of seconds). As 2020 is a leap year (with a date “2020-02-29” and a total number of 366 days), the two additions yield different results. Thus, adding periods typically yields more predictable results.
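
The two commands are not shown in this section. Based on the explanation, they were presumably along the following lines (a reconstruction, not the original code):

```r
library(lubridate)

d <- ymd("2020-01-20")  # an instant (here: a date)

d + years(1)   # period: same calendar date, one year later
#> [1] "2021-01-20"

d + dyears(1)  # duration: a fixed number of seconds later
               # (the exact result depends on how the installed lubridate
               #  version defines the length of dyears(1))
```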

  2. Explain the different results of the following two commands:

Explanation: noon_saturday is assigned a particular instant in time: 2020-03-28 12:00:00. The tz specification ensures that the time zone is set to CET. Germany switches to daylight saving time on “2020-03-29”: At 2am, the clocks are set forward by 1 hour. Thus, adding the duration of 1 day (as in + ddays(1)) yields a later clock time than adding the period of 1 day (as in + days(1)). As before, adding periods yields more predictable results.
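
Again, the commands themselves are missing here. A reconstruction that matches the explanation (the use of ymd_hms() to create noon_saturday is an assumption):

```r
library(lubridate)

noon_saturday <- ymd_hms("2020-03-28 12:00:00", tz = "CET")

p <- noon_saturday + days(1)   # period:   same clock time on the next day
q <- noon_saturday + ddays(1)  # duration: exactly 86400 seconds later

hour(p)  # 12
hour(q)  # 13 (one clock hour was skipped during the night)
```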

  3. Predict and explain the result of the following command:

Intervals: ToDo.

10.2.3 Computing with dates and times

Show examples of computing with dates and times with base R and lubridate commands.

Computing durations or time intervals: Difference (in seconds). However, rounding is often an issue.
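
A minimal base R sketch of a time difference and the rounding issue. The two time points t1 and t2 are made up for illustration:

```r
t1 <- as.POSIXct("2020-02-15 16:07:34", tz = "UTC")
t2 <- as.POSIXct("2020-02-17 09:30:00", tz = "UTC")

t2 - t1  # a difftime object; R picks the unit automatically

# Requesting a specific unit makes the result predictable:
d_secs <- as.numeric(difftime(t2, t1, units = "secs"))
d_days <- as.numeric(difftime(t2, t1, units = "days"))

d_secs         # 148946
round(d_days)  # 2 (nearest number of days)
floor(d_days)  # 1 (completed days)
```

Whether to round or to truncate depends on the question: "about how long?" calls for round(), "how many full days?" calls for floor().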

Computing someone’s age

A surprisingly difficult problem consists in determining someone’s age when their date of birth (DOB) is known. The reason is not that some people are secretive about revealing their age, but that we cannot simply subtract their year of birth from the current year. Instead, we need to take into account whether they already had their birthday in the current year.

Let’s load some data to illustrate the problem and how we can solve it. The data file dt.csv (available at rpository.com) contains the birth dates and study participation times of 1000 fictitious people. Read the data into a tibble dt:

Two possible solutions:

  1. Check whether a person already had her/his birthday in the current year.
## A. Compute age in (completed) years:

# (1) Describe today's date:

# (a) with base R:
today <- Sys.Date()
cur_year <- as.numeric(format(today, "%Y"))
cur_month <- as.numeric(format(today, "%m"))
cur_day <- as.numeric(format(today, "%d"))

# (b) with lubridate:
today <- lubridate::today()
cur_year  <- lubridate::year(today)
cur_month <- lubridate::month(today)
cur_day   <- lubridate::day(today)

# (2) Check whether someone had his birthday this year and 
#     subtract "TRUE" (= 1) from difference in years if not: 
dt2 <- dt %>%
  mutate(had_bday_this_year = ((bmonth < cur_month) | ((bmonth == cur_month) & (bday <= cur_day))), 
         age = (cur_year - byear) - !had_bday_this_year) %>%
  select(name:byear, age, had_bday_this_year, everything()) # re-arrange variables  

# Check: 
dt2 %>% filter(had_bday_this_year == TRUE)
#> # A tibble: 120 x 11
#>    name  gender  bday bmonth byear   age had_bday_this_y… height score
#>    <chr> <chr>  <dbl>  <dbl> <dbl> <dbl> <lgl>             <dbl> <dbl>
#>  1 V.J.  female    15      2  1978    42 TRUE                161    93
#>  2 U.W.  female    12      1  1996    24 TRUE                161   104
#>  3 U.V.  male      13      1  1990    30 TRUE                185   100
#>  4 G.H.  female    17      1  1948    72 TRUE                165   103
#>  5 V.U.  female    22      1  1952    68 TRUE                154    99
#>  6 F.V.  female    13      2  1976    44 TRUE                167    98
#>  7 T.M.  female    14      1  1994    26 TRUE                166    NA
#>  8 Y.B.  female    10      1  1956    64 TRUE                158   103
#>  9 P.V.  male       1      2  1996    24 TRUE                197    86
#> 10 H.V.  female     7      1  1973    47 TRUE                167    99
#> # … with 110 more rows, and 2 more variables: t_1 <dttm>, t_2 <dttm>
dt2 %>% filter(had_bday_this_year == FALSE)
#> # A tibble: 880 x 11
#>    name  gender  bday bmonth byear   age had_bday_this_y… height score
#>    <chr> <chr>  <dbl>  <dbl> <dbl> <dbl> <lgl>             <dbl> <dbl>
#>  1 I.G.  male      14     12  1968    51 FALSE               169   113
#>  2 O.B.  male      10      4  1974    45 FALSE               181   114
#>  3 M.M.  male      28      9  1987    32 FALSE               183   108
#>  4 O.E.  male      18      5  1985    34 FALSE               164   114
#>  5 Q.W.  male       1      3  1968    51 FALSE               172   103
#>  6 H.K.  male      27      4  1994    25 FALSE               157   110
#>  7 T.R.  female     5      6  1961    58 FALSE               167   103
#>  8 F.J.  male       1     10  1983    36 FALSE               158   107
#>  9 J.R.  female    29     12  1941    78 FALSE               157   107
#> 10 N.S.  male      25      9  1953    66 FALSE               170   100
#> # … with 870 more rows, and 2 more variables: t_1 <dttm>, t_2 <dttm>
  2. Calculate a time difference (as an interval) and convert the result into a meaningful unit.
# B. A simpler solution (using lubridate)
library(lubridate)

today <- today() 

bday <- today - years(18)     # today, 18 years ago (period) 
(bday %--% today)             # time interval (of life)
#> [1] 2002-02-15 UTC--2020-02-15 UTC
(bday %--% today) / years(1)  # in terms of years (as period)
#> [1] 18

# Define a function that computes current age: 
cur_age <- function(bday) {
  
  life <- (bday %--% today()) # interval from bday to today() 
  (life %/% years(1))         # integer division (into a period of full years)
  
}

# 1. Check function with an example:
bday_1 <- today() - years(18) - days(1)  # 18 years ago, yesterday
bday_2 <- today() - years(18) + days(0)  # 18 years ago, today
bday_3 <- today() - years(18) + days(1)  # 18 years ago, tomorrow

cur_age(bday_1)  # => 18
#> [1] 18
cur_age(bday_2)  # => 18
#> [1] 18
cur_age(bday_3)  # => 17 (qed)
#> [1] 17

# 2. Apply function to dt2 data: 
dt2 <- dt2 %>%
  mutate(bdate = make_date(year = byear, month = bmonth, day = bday), 
         age_2 = cur_age(bdate)
         ) %>%
  select(name:byear, bdate, age, age_2, everything())

# Check results: 
all.equal(dt2$age, dt2$age_2)
#> [1] TRUE
dt2 %>% filter(had_bday_this_year == TRUE)
#> # A tibble: 120 x 13
#>    name  gender  bday bmonth byear bdate        age age_2 had_bday_this_y…
#>    <chr> <chr>  <dbl>  <dbl> <dbl> <date>     <dbl> <dbl> <lgl>           
#>  1 V.J.  female    15      2  1978 1978-02-15    42    42 TRUE            
#>  2 U.W.  female    12      1  1996 1996-01-12    24    24 TRUE            
#>  3 U.V.  male      13      1  1990 1990-01-13    30    30 TRUE            
#>  4 G.H.  female    17      1  1948 1948-01-17    72    72 TRUE            
#>  5 V.U.  female    22      1  1952 1952-01-22    68    68 TRUE            
#>  6 F.V.  female    13      2  1976 1976-02-13    44    44 TRUE            
#>  7 T.M.  female    14      1  1994 1994-01-14    26    26 TRUE            
#>  8 Y.B.  female    10      1  1956 1956-01-10    64    64 TRUE            
#>  9 P.V.  male       1      2  1996 1996-02-01    24    24 TRUE            
#> 10 H.V.  female     7      1  1973 1973-01-07    47    47 TRUE            
#> # … with 110 more rows, and 4 more variables: height <dbl>, score <dbl>,
#> #   t_1 <dttm>, t_2 <dttm>
dt2 %>% filter(had_bday_this_year == FALSE)
#> # A tibble: 880 x 13
#>    name  gender  bday bmonth byear bdate        age age_2 had_bday_this_y…
#>    <chr> <chr>  <dbl>  <dbl> <dbl> <date>     <dbl> <dbl> <lgl>           
#>  1 I.G.  male      14     12  1968 1968-12-14    51    51 FALSE           
#>  2 O.B.  male      10      4  1974 1974-04-10    45    45 FALSE           
#>  3 M.M.  male      28      9  1987 1987-09-28    32    32 FALSE           
#>  4 O.E.  male      18      5  1985 1985-05-18    34    34 FALSE           
#>  5 Q.W.  male       1      3  1968 1968-03-01    51    51 FALSE           
#>  6 H.K.  male      27      4  1994 1994-04-27    25    25 FALSE           
#>  7 T.R.  female     5      6  1961 1961-06-05    58    58 FALSE           
#>  8 F.J.  male       1     10  1983 1983-10-01    36    36 FALSE           
#>  9 J.R.  female    29     12  1941 1941-12-29    78    78 FALSE           
#> 10 N.S.  male      25      9  1953 1953-09-25    66    66 FALSE           
#> # … with 870 more rows, and 4 more variables: height <dbl>, score <dbl>,
#> #   t_1 <dttm>, t_2 <dttm>

10.2.4 Simple date and time functions

Now the good news: If you’re into satisficing and only want to query dates and times in R (rather than computing with them), here’s all you need:

The ds4psy package provides an opinionated collection of functions that probably cover 95% of all use cases. To avoid conflicts with base R and other packages (which already provide functions named date, time, today, and now), these functions start with cur_ (for “current”) or what_. They follow a simple heuristic: What is it that we usually want to hear as x when asking “What x is it today?” or “What x is it right now?”

  1. In my world, about 90% of all use cases are covered by 2 functions that ask for the current date or time:
  • cur_date(): in 2 different orders (optional sep)
  • cur_time(): with or without seconds (optional sep)
  2. About 5% more of all use cases are covered by 4 additional functions that ask what_ questions about the position of some temporal unit in some larger continuum of time:
  • what_day(): as name (weekday, abbr or full), or as number (in units of week, month, or year; as char or as integer)
  • what_week(): only as number (in units of month, or year; as char or as integer)
  • what_month(): as name (abbr or full) or as number (as char or as integer)
  • what_year(): only as number (abbr or full, as char or as integer)

All of these take some “point in time” time as input, which defaults to now (i.e., Sys.time()) but can also be a vector of other time points.

  3. For the remaining 5% of use cases, we need to know (some of) the details above (in Section 10.2.1) or the lubridate package.

Note: At this point, calculations with time differences are not supported by the ds4psy package. However, you can calculate with R’s elementary time classes and then convert the result into more convenient formats.

References

Spinu, V., Grolemund, G., & Wickham, H. (2018). lubridate: Make dealing with dates a little easier. Retrieved from https://CRAN.R-project.org/package=lubridate


Note on the conversion specifications in Section 10.2.1: The list given there is partial; see ?strftime for the full version.