10.3 Dates and times with lubridate
The previous section has shown that base R provides basic support for handling dates and times, but as the corresponding classes and functions can be confusing, this remains challenging. To facilitate working with dates and times, the lubridate package (Spinu et al., 2020) provides a more coherent and user-friendly framework. This section illustrates key lubridate commands and concepts.
As we only need lubridate in those sessions in which we are dealing with dates and times, the package was long kept outside the core tidyverse (it has been part of the core since tidyverse 2.0.0). With older tidyverse versions, we therefore need to load it in addition to the core tidyverse packages when we want to use it:
Just as base R provides two separate functions for obtaining the current date and time (specifically, Sys.Date() and Sys.time()), the lubridate package provides two corresponding functions:
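As a minimal sketch (the values returned depend on when and where the code is run), the two lubridate functions can be called without any arguments:

```r
library(lubridate)

today()  # current date
now()    # current date and time (in the local system time zone)
```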
To learn about the internal representation of both objects, we can inspect their class in R:
We see that the lubridate package uses the two key classes discussed in Section 10.2.2:
- today() returns the current date (as a “Date” object).
- now() returns the current calendar time (as a date-time, i.e., “POSIXct” object).
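The two classes can be verified with a short check, sketched here:

```r
library(lubridate)

class(today())
#> [1] "Date"
class(now())
#> [1] "POSIXct" "POSIXt"
```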
Both the today() and now() functions also accept a tzone argument for specifying a time zone.
To see what time zones are used by default, we can apply the tz() function to the results of both functions:
Thus, today() and now() are convenient shortcuts, but should be handled with care when dealing with different time zones.
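The following sketch illustrates the tzone argument and the tz() function. The zone "Asia/Tokyo" is an arbitrary choice for illustration, and the result of tz(now()) depends on the local system settings:

```r
library(lubridate)

tz(today())                # dates carry no time zone of their own
#> [1] "UTC"
tz(now())                  # depends on the local system settings
tz(now(tzone = "UTC"))     # explicit time zone
#> [1] "UTC"

# The same moment, expressed in another (arbitrarily chosen) time zone:
t_tokyo <- now(tzone = "Asia/Tokyo")
tz(t_tokyo)
#> [1] "Asia/Tokyo"
```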
10.3.1 Parsing dates and times
When learning how to read and retrieve (elements of) dates and times in base R, we spent a lot of time and effort on conversion specifications (e.g., strings like
"%H:%M:%S") that specified parsing and formatting instructions in the so-called POSIX standard (see Section 10.2.2). Although it is good to know POSIX, as it is widely used and powerful, it can also seem a bit cumbersome and clumsy. Thus, lubridate adopts a more intuitive approach to parsing dates and times.
To create new dates or times, lubridate provides functions that parse (i.e., read or scan) them from various other objects. Specifically, we can read dates or times
- from character strings (representing dates or times);
- from variables (denoting date or time components);
- from other types (i.e., date or time objects).
The next sections introduce the lubridate functions for each of these object types.
1. Read from character strings
In Chapter 6 on Importing data, we encountered some readr functions that parse character vectors into dates or times (see the parse_date(), parse_datetime(), and parse_time() functions in Section 6.2.1).
The lubridate package provides even simpler tools for reading in dates and times. The function names are combinations of the initial letters of basic date and time components:
- date components are: y (year), m (month), and d (day);
- time components are: h (hour), m (minute), and s (second).
The order of these components in the function name determines how the arguments (provided as strings) are interpreted. Here are some examples:
- Dates from strings: Without any further specification, a date-denoting string like "02 04 06" would be highly ambiguous (see Table 10.1 of Section 10.2.2). To read this string into a date, lubridate allows us to use a combination of d, m, and y to indicate which date-related element each numeric component describes:
    ymd("06 02 04")
    #> [1] "2006-02-04"
    ymd("20060204")  # works as well
    #> [1] "2006-02-04"
    ymd("2006 2 4")
    #> [1] "2006-02-04"
    ymd("2006 : 2///04")
    #> [1] "2006-02-04"
    ymd(060204)  # works with numbers
    #> [1] "2006-02-04"
    dmy("4 2 06")
    #> [1] "2006-02-04"
    dmy("04 02 2006")
    #> [1] "2006-02-04"
    mdy("02 04 06")
    #> [1] "2006-02-04"
    mdy(020406)  # works with numbers
    #> [1] "2006-02-04"

    # Even "abysmal" orders are possible:
    myd("06 02 04")
    #> [1] "2002-06-04"
    dym("06 02 04")
    #> [1] "2002-04-06"

    # Resulting representation:
    class(ymd(Sys.Date()))  # a "Date" object
    #> [1] "Date"
- Dates with times (i.e., date-times) from strings:
- Times from strings:
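The two cases listed above can be sketched as follows. The input strings are made up for illustration. Note that times without dates are stored as periods rather than as date-times:

```r
library(lubridate)

# Date-times: date components, followed by time components
dt_1 <- ymd_hms("2020-07-13 10:30:45")  # time zone defaults to UTC
dt_1
#> [1] "2020-07-13 10:30:45 UTC"
dt_2 <- dmy_hm("13.07.2020 10:30", tz = "Europe/Berlin")
dt_2
#> [1] "2020-07-13 10:30:00 CEST"

# Times (without dates) are read as "Period" objects:
ti_1 <- hms("10:30:45")
ti_1
#> [1] "10H 30M 45S"
ti_2 <- hm("10:30")
ti_2
#> [1] "10H 30M 0S"
```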
Note that the particular representation of date and time objects (e.g., as the columns of a tibble) varies with the functions that created the corresponding variable (column):
dt <- c("10:05 29/02/2020", "20:10 24/12/2020") tb <- tibble::tibble(datetime = dt) tb #> # A tibble: 2 x 1 #> datetime #> <chr> #> 1 10:05 29/02/2020 #> 2 20:10 24/12/2020 tb2 <- tb %>% separate(datetime, into = c("time", "day"), sep = " ", remove = FALSE) %>% separate(time, into = c("hour", "min"), sep = ":", remove = FALSE) %>% mutate(dt_1 = paste(day, time), # a character string dt_2 = dmy_hm(dt_1), # a <dttm> object ti_2 = hm(time) # a <period> object ) tb2 #> # A tibble: 2 x 8 #> datetime time hour min day dt_1 dt_2 ti_2 #> <chr> <chr> <chr> <chr> <chr> <chr> <dttm> <Period> #> 1 10:05 29/0… 10:05 10 05 29/02/… 29/02/20… 2020-02-29 10:05:00 10H 5M 0S #> 2 20:10 24/1… 20:10 20 10 24/12/… 24/12/20… 2020-12-24 20:10:00 20H 10M 0S
2. Read from date and time variables
Many datasets already contain variables that denote date components (i.e., values for years, months, weeks, or days) or time components (i.e., values for hours, minutes, or seconds). Given our skills in dealing with strings of text (from Chapter 9), we could first paste these variables into a character string and then parse this string into a date or time variable. However, the lubridate package also provides more direct functions for converting such variables into dates or calendar times:
- make_date() expects inputs to its year, month, and day arguments to create an object of the “Date” class:
    make_date(year = 2020, month = 7, day = 13)  # as numbers
    #> [1] "2020-07-13"
    make_date(year = "2020", month = "7", day = "13")  # as characters
    #> [1] "2020-07-13"
    make_date(year = 2020, month = "007", day = "013")  # mixed
    #> [1] "2020-07-13"

    # Note defaults for missing elements:
    make_date(year = 2020, month = 7)  # day = 1
    #> [1] "2020-07-01"
    make_date(month = 2, day = 13)  # year = 1970
    #> [1] "1970-02-13"
    make_date(year = 2020, day = 13)  # month = 1
    #> [1] "2020-01-13"

    # Representation as "Date":
    is.Date(make_date(year = 2020))
    #> [1] TRUE
    is.POSIXct(make_date(year = 2020))
    #> [1] FALSE
make_date() accepts a variety of input types and fills in default values for missing elements.
- make_datetime() expects the same inputs as make_date() (i.e., year, month, and day arguments), plus additional inputs to its time-related arguments (hour, min, sec, and a time zone tz, which defaults to UTC) to create a calendar time (i.e., “POSIXct”) object:
    make_datetime(year = 2020, month = 7, day = 13,
                  hour = 10, min = 30, sec = 45, tz = "Europe/Zurich")
    #> [1] "2020-07-13 10:30:45 CEST"

    ## Note: Inputs need to be numeric
    # make_datetime(year = "2020", month = "7", day = "13",
    #               hour = "10", min = "30", sec = "45")

    # Note defaults for missing elements:
    make_datetime(year = 2020)
    #> [1] "2020-01-01 UTC"
    make_datetime(sec = 33)
    #> [1] "1970-01-01 00:00:33 UTC"

    # Representation as "POSIXct":
    is.Date(make_datetime(year = 2020))
    #> [1] FALSE
    is.POSIXct(make_datetime(year = 2020))
    #> [1] TRUE
make_datetime() is less flexible than
make_date() in expecting that all its arguments are numeric.
The lubridate package additionally includes a make_difftime() function for creating difftime objects in various units of time (see the section on durations below).
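As a brief sketch of make_difftime(), a number of seconds can be turned into a “difftime” object. When no units are specified, a suitable unit is chosen automatically:

```r
library(lubridate)

d_1 <- make_difftime(3600)                  # 3600 seconds, units chosen automatically
units(d_1)
#> [1] "hours"
d_2 <- make_difftime(3600, units = "mins")  # same time span, expressed in minutes
as.numeric(d_2)
#> [1] 60
```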
To demonstrate these functions for creating dates or times from variables, we need some data that contains date and time variables.
Lacking such a dataset, we can create one. The following code snippet does this by working backwards:
We first use the
sample_time() function from ds4psy (to draw random samples of calendar times within a specific range of time) and then use a dplyr pipe to extract its date- and time-related components.52
    # Sample N calendar dates within the current year:
    N <- 25
    t_end <- ceiling_date(now(), "year")  # ymd_hms("2020-12-31 23:59:59")
    t_start <- t_end - years(1)
    set.seed(246)  # for reproducible randomness
    dt_org <- sort(ds4psy::sample_time(from = t_start, to = t_end, size = N))

    # Tibble with date and time components (as columns):
    dt_tb <- tibble::tibble(dt_org) %>%
      mutate(yr = year(dt_org),
             mt = month(dt_org),
             dy = day(dt_org),
             hr = hour(dt_org),
             mi = minute(dt_org),
             sc = second(dt_org)) %>%
      select(-dt_org)
Just copy and run this code chunk and note that the resulting tibble
dt_tb contains numeric columns that contain date and time components:
Given this data, we can use
make_date() for creating dates, or
make_datetime() for creating calendar times:
dt_tb <- dt_tb %>% mutate(date = make_date(year = yr, month = mt, day = dy), datetime = make_datetime(year = yr, month = mt, day = dy, hour = hr, min = mi, sec = sc) ) head(dt_tb) #> # A tibble: 6 x 8 #> yr mt dy hr mi sc date datetime #> <dbl> <dbl> <int> <int> <int> <dbl> <date> <dttm> #> 1 2020 1 1 5 22 30 2020-01-01 2020-01-01 05:22:30 #> 2 2020 2 10 3 15 39 2020-02-10 2020-02-10 03:15:39 #> 3 2020 3 7 8 17 17 2020-03-07 2020-03-07 08:17:17 #> 4 2020 3 31 19 5 35 2020-03-31 2020-03-31 19:05:35 #> 5 2020 6 4 6 35 19 2020-06-04 2020-06-04 06:35:19 #> 6 2020 6 4 22 25 1 2020-06-04 2020-06-04 22:25:01
To evaluate our success, we can verify whether the
datetime column in
dt_tb managed to re-construct the original vector
dt_org created above:
Ooops — this is awkward: Why do we not get out the original times
dt_org that we fed into the table?
A first hypothesis could be that calendar times (of the “POSIXct” class) are numeric objects and hence some differences may be due to rounding. We can check this by using the
round_date() function to round both our original vector (i.e.,
dt_org) and the newly created one (
dt_tb$datetime) to the same units (e.g., “sec”):
So rounding did not diminish the difference, and the discrepancies observed here are far too large to be due to rounding differences.
A clue to solving this puzzle is provided by computing the time difference between our newly created times dt_tb$datetime and the original times dt_org:
This shows that our new calendar times
dt_tb$datetime are either 1 or 2 hours (i.e., 3600 seconds or 7200 seconds) later than our original times
dt_org. To detect the source of this difference, let’s look more closely at both vectors again:
    # Inspect details:
    head(dt_org)
    #> [1] "2020-01-01 05:22:30 CET"  "2020-02-10 03:15:39 CET"
    #> [3] "2020-03-07 08:17:17 CET"  "2020-03-31 19:05:35 CEST"
    #> [5] "2020-06-04 06:35:19 CEST" "2020-06-04 22:25:01 CEST"
    head(dt_tb$datetime)
    #> [1] "2020-01-01 05:22:30 UTC" "2020-02-10 03:15:39 UTC"
    #> [3] "2020-03-07 08:17:17 UTC" "2020-03-31 19:05:35 UTC"
    #> [5] "2020-06-04 06:35:19 UTC" "2020-06-04 22:25:01 UTC"
We can see that both vectors show the same dates and times, but for different time zones.
make_datetime() used “UTC” by default, whereas our original vector
dt_org automatically used the current setting of our local system (here: “Europe/Berlin”):
Thus, to prevent such problems, we need to be explicit about the appropriate time zone when calling the make_datetime() function:
dt_tb <- dt_tb %>% mutate(date = make_date(year = yr, month = mt, day = dy), datetime = make_datetime(year = yr, month = mt, day = dy, hour = hr, min = mi, sec = sc, tz = "") # tz = "" uses Sys.timezone() ) head(dt_tb) #> # A tibble: 6 x 8 #> yr mt dy hr mi sc date datetime #> <dbl> <dbl> <int> <int> <int> <dbl> <date> <dttm> #> 1 2020 1 1 5 22 30 2020-01-01 2020-01-01 05:22:30 #> 2 2020 2 10 3 15 39 2020-02-10 2020-02-10 03:15:39 #> 3 2020 3 7 8 17 17 2020-03-07 2020-03-07 08:17:17 #> 4 2020 3 31 19 5 35 2020-03-31 2020-03-31 19:05:35 #> 5 2020 6 4 6 35 19 2020-06-04 2020-06-04 06:35:19 #> 6 2020 6 4 22 25 1 2020-06-04 2020-06-04 22:25:01
Note that setting
tz = "" is a shortcut for using our local system time zone in
make_datetime(), rather than its “UTC” default. We could have been even more explicit by stating
tz = Sys.timezone() or
tz = "Europe/Berlin".
As both vectors now use the same time zone (i.e., Europe/Berlin), they should contain the same date-time points:
This is reassuring — and good that we compared our result to the original date-time vector. The important lesson to learn here is:
- Always watch out for time zones when working with times.
We will reconsider this issue in Exercise 2 (see Section 10.6.2).
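The time zone pitfall described above can also be reproduced without ds4psy. The following sketch uses two made-up date-times in the “Europe/Berlin” time zone (one in winter, one in summer), takes them apart, and re-assembles them with and without an explicit tz argument:

```r
library(lubridate)

# Hypothetical original times, recorded in a non-UTC time zone:
t_org <- ymd_hms(c("2020-01-01 05:22:30", "2020-06-04 22:25:01"),
                 tz = "Europe/Berlin")

# Re-assemble the components with the default tz = "UTC":
t_utc <- make_datetime(year = year(t_org), month = month(t_org), day = day(t_org),
                       hour = hour(t_org), min = minute(t_org), sec = second(t_org))
difftime(t_utc, t_org, units = "hours")  # off by 1 hour (winter) and 2 hours (summer)

# Re-assemble the components with the original time zone:
t_loc <- make_datetime(year = year(t_org), month = month(t_org), day = day(t_org),
                       hour = hour(t_org), min = minute(t_org), sec = second(t_org),
                       tz = "Europe/Berlin")
all(t_loc == t_org)
#> [1] TRUE
```

The same clock times denote different moments in different time zones, which is why only the second version recovers the original time points.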
3. Read from dates or times
Given that R distinguishes between dates (e.g., of class “Date”, see Section 10.2.3) and dates with times (e.g., calendar times of class “POSIXct”, see Section 10.2.4), it is often necessary to switch between these formats. Let’s first re-create a date and a date-time object to work with:
The lubridate functions as_date() and as_datetime() facilitate such conversions:
- as_date() converts date-times (i.e., calendar times) into dates.
- as_datetime() converts dates (of class “Date”) into date-times (if possible).
Converting date-times into dates is straightforward, as it merely drops the time-related information:
Converting dates into date-times is trickier, as dates lack information about times:
We see that the results of converting dates (such as feb_29) with as_datetime() are calendar times (i.e., dates with times of class “POSIXct”), but still seem to lack time information.
Nevertheless, the time information is there, but the default time of the date was set to “00:00:00 UTC”.
This becomes apparent when explicating the time object by supplying a more detailed format:
Again, we see: Always watch out for time zones when working with times.
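Both conversions can be sketched with a made-up date-time (the objects dt and d are hypothetical and only serve as an illustration):

```r
library(lubridate)

# A hypothetical date-time in a non-UTC time zone:
dt <- ymd_hms("2020-02-29 10:30:00", tz = "Europe/Berlin")

# Date-time to date: the time information is dropped
d <- as_date(dt)
d
#> [1] "2020-02-29"

# Date to date-time: the time defaults to midnight in UTC
d_dt <- as_datetime(d)
format(d_dt, "%Y-%m-%d %H:%M:%S %Z")
#> [1] "2020-02-29 00:00:00 UTC"
```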
When the as_date() and as_datetime() functions receive numeric inputs, they add a corresponding number of increments to the Unix epoch at “1970-01-01 00:00:00 UTC” (see Wikipedia: Unix_time for details).
Note that date increments are interpreted as days, whereas time increments are interpreted as seconds:
    # Dates from numeric inputs:
    as_date(0)  # Unix epoch
    #> [1] "1970-01-01"
    as_date(1)  # increment: +1 day
    #> [1] "1970-01-02"
    as_date(365)  # +1 year
    #> [1] "1971-01-01"

    # Times from numeric inputs:
    as_datetime(0)  # Unix epoch
    #> [1] "1970-01-01 UTC"
    as_datetime(1)  # increment: +1 second
    #> [1] "1970-01-01 00:00:01 UTC"
    as_datetime(5 * 60 * 60)  # +5 hours
    #> [1] "1970-01-01 05:00:00 UTC"
    as_datetime(24 * 60 * 60)  # +1 day
    #> [1] "1970-01-02 UTC"
    as_datetime(365 * 24 * 60 * 60)  # +1 year
    #> [1] "1971-01-01 UTC"
10.3.2 Get and set date and time components
Having succeeded in creating date-time objects (from strings, other variables, or a date), we can ask additional questions:
- How can we get or set individual date and time components (of date-time objects)?
To illustrate this, we can use our
tnow scalar, which is an object of the “POSIXct” class:
Actually, we have already encountered a pretty nifty way of retrieving individual date and time components (see the conversion functions of the POSIX standard in Section 10.2.4 above, or evaluate ?strptime):
    # Using format (see POSIX above):
    format(tnow, "%A")
    #> [1] "Friday"
    format(tnow, "%F")
    #> [1] "2020-11-20"
    format(tnow, "%T %Z")
    #> [1] "16:51:58 CET"

    # explicate_time <- "%a, %Y-%m-%d %H:%M:%S %Z (%z from UTC)"  # (defined above)
    format(tnow, format = explicate_time)
    #> [1] "Fri, 2020-11-20 16:51:58 CET (+0100 from UTC)"
However, using the
format() function with a
format argument according to the POSIX standard is pretty geeky.
If lubridate lives up to its name, its functions should flow a bit more fluently.
Getting date components
In fact, the following lubridate functions are pretty self-explanatory in retrieving individual date and time components from a date-time object (of the “POSIXct” or “POSIXlt” class):
    # Get date components (from POSIX times):
    year(tnow)  # full year (numeric)
    #> [1] 2020
    month(tnow)  # month in year (numeric)
    #> [1] 11
    mday(tnow)  # day in month (numeric)
    #> [1] 20

    # Noteworthy date components:
    yday(tnow)  # day of year (numeric). See format(tnow, "%j")
    #> [1] 325
    wday(tnow)  # day of week (numeric, week_start = 7 by default), see format(tnow, "%w")
    #> [1] 6
For date components with names, using the label and abbr arguments yields names (as ordered factors) rather than numbers as outputs:
    # Get names instead of numbers:
    month(tnow, label = TRUE, abbr = TRUE)  # month in year (name)
    #> [1] Nov
    #> 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
    wday(tnow, label = TRUE, abbr = FALSE)  # day of week (name)
    #> [1] Friday
    #> 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
It is noteworthy and re-assuring that these date-related functions also work for dates (of class “Date”):
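A brief sketch with a fixed date (chosen to match the date of tnow shown above) illustrates this:

```r
library(lubridate)

d <- ymd("2020-11-20")  # a "Date" object

year(d)
#> [1] 2020
month(d)
#> [1] 11
mday(d)
#> [1] 20
yday(d)
#> [1] 325
wday(d)  # with week_start = 7 (Sunday) by default
#> [1] 6
```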
Getting time components
Getting time-related components from date-time objects (i.e., “POSIXct” times) is just as easy:
Note the following defaults when asking for time components of dates:
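Both points can be sketched with a fixed date-time. The object t_fix is a hypothetical stand-in for tnow, and the time zone is an assumption. Asking a date for its time components returns zeros:

```r
library(lubridate)

# A fixed date-time (standing in for tnow):
t_fix <- ymd_hms("2020-11-20 16:51:58", tz = "Europe/Berlin")

# Get time components:
hour(t_fix)
#> [1] 16
minute(t_fix)
#> [1] 51
second(t_fix)
#> [1] 58

# Defaults for dates (which contain no time information):
d_fix <- as_date(t_fix)
hour(d_fix)
#> [1] 0
minute(d_fix)
#> [1] 0
second(d_fix)
#> [1] 0
```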
Setting date and time components
Interestingly, the same lubridate functions that get date and time components can also be used to set those components:
    # Copy:
    new_tnow <- tnow
    new_tnow  # check
    #> [1] "2020-11-20 16:51:58 CET"

    # SET date components:
    year(new_tnow) <- 2022
    month(new_tnow) <- 12
    day(new_tnow) <- 31
    new_tnow  # check
    #> [1] "2022-12-31 16:51:58 CET"

    # SET time components:
    hour(new_tnow) <- 23
    minute(new_tnow) <- 45
    second(new_tnow) <- 59
    tz(new_tnow) <- "Australia/Sydney"
    new_tnow  # check
    #> [1] "2022-12-31 23:45:59 AEDT"
What if we re-set a date component that depends on the date?
    new_tnow
    #> [1] "2022-12-31 23:45:59 AEDT"
    format(new_tnow, explicate_time)
    #> [1] "Sat, 2022-12-31 23:45:59 AEDT (+1100 from UTC)"

    # day of week:
    wday(new_tnow)
    #> [1] 7
    wday(new_tnow) <- 4  # CHANGE the weekday (4 is "Wed")
    new_tnow  # Note that date has shifted to accommodate wday
    #> [1] "2022-12-28 23:45:59 AEDT"
    wday(new_tnow, label = TRUE)
    #> [1] Wed
    #> Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat

    # day in year:
    yday(new_tnow)
    #> [1] 362
    yday(new_tnow) <- 366  # CHANGE the yday (note value)
    new_tnow  # Note that date has shifted (to the next year!) to accommodate yday
    #> [1] "2023-01-01 23:45:59 AEDT"
    yday(new_tnow)
    #> [1] 1
Thus, we can use the same functions that get information from dates and date-times to set their elements. However, beware that setting date-time components can have unintended consequences. As date-time components are not independent of each other, setting some components typically affects other components.
10.3.3 Working with time spans
In Section 10.2.1, we distinguished between time points (often called instants or moments) and time spans (aka. durations, intervals, or time periods). However, different usages of these time span terms actually imply different concepts. To enable accurate computations, we need to distinguish more carefully between the different types of time spans.
In fact, lubridate implements its own ontology of time spans.
Beyond time points (i.e., a particular instant or moment in time), the package distinguishes between 3 types of time spans:
- durations are time spans in exact numbers of seconds
- periods are time spans in human units (e.g., days, months, years)
- intervals are time spans with a given start and end point in time
Essentially, both durations and intervals express physical time spans (a specific number of seconds), whereas periods express time spans in human units that may vary based on context (e.g., not every day has the same number of hours and not every month or year has the same number of days). Intervals are durations that are anchored in calendar time (i.e., intervals have start and end points that are real date-times).
We will consider each type of time span to see how they are created and find out what we can do with them. To motivate our explorations, consider the following example:
On Tuesday, September 11, 2001, the terrorist group al-Qaeda attacked several targets in the United States in a coordinated fashion. At 08:46 a.m., five hijackers crashed an American Airlines plane into the northern facade of the World Trade Center in New York City. Many remember the vivid images of this particular event (a so-called dread risk event), even though it happened many years ago. This raises the question:
- How long ago did the 9/11-attacks take place?
Please take a moment (or rather: some time span) to think about potential answers to this question: What would you accept as an informative answer? How does this answer depend on when or where the question is being asked? What kind of accuracy would you expect? And which temporal unit(s) would an answer be expressed in?
As a first approach for answering the question “How long ago did the 9/11-attacks take place?”, we can enter the particular time point of this event and subtract it from
now() to compute a time difference object in R:
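A minimal sketch of this computation follows. Two assumptions are made here: the time zone "America/New_York" for the event, and a fixed reference time tm_ref (matching the date-times shown elsewhere in this section) that stands in for now(), so that the result is reproducible:

```r
library(lubridate)

# Time point of the event (time zone is an assumption):
tm_911 <- ymd_hm("2001-09-11 08:46", tz = "America/New_York")

# Fixed reference time (replace by now() for the current time):
tm_ref <- ymd_hms("2020-11-20 16:51:58", tz = "Europe/Berlin")

# Time difference:
td <- tm_ref - tm_911
td
#> Time difference of 7010.129 days
```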
The time difference
td represents a duration as an R object of the “difftime” class, which is automatically displayed as a count of “days”. The corresponding
difftime() function (see Section 10.2.4) offers a range of
units varying from “secs” to “weeks”, which are all rather limited in this case:
    # Representation:
    class(td)
    #> [1] "difftime"
    is.difftime(td)
    #> [1] TRUE
    is.numeric(td)
    #> [1] FALSE

    # difftime units:
    difftime(tm_now, tm_911, units = "weeks")
    #> Time difference of 1001.447 weeks
    difftime(tm_now, tm_911, units = "auto")
    #> Time difference of 7010.129 days
    difftime(tm_now, tm_911, units = "secs")
    #> Time difference of 605675159 secs
For time spans exceeding a few months, the duration class provided by lubridate is a better alternative: The lubridate notion of duration measures time spans as the number of elapsed seconds.
Durations are internally defined as a special class of object and record time spans in numeric form (as numbers of seconds):
There are several constructor functions (all starting with
d) that facilitate defining durations:
    # Creating durations:
    dseconds(10)
    #> [1] "10s"
    dminutes(30)
    #> [1] "1800s (~30 minutes)"
    dhours(1)
    #> [1] "3600s (~1 hours)"
    ddays(2)
    #> [1] "172800s (~2 days)"
    dweeks(3)
    #> [1] "1814400s (~3 weeks)"
    dmonths(1)  # ?: Does this make sense?
    #> [1] "2629800s (~4.35 weeks)"
    dyears(2)
    #> [1] "63115200s (~2 years)"
Note that all these definitions internally create “Duration” objects that denote numbers of elapsed seconds, but are printed in a more human-readable fashion.
The dmonths() function is flagged (with ?) as its underlying notion is a bit tricky. We just learned that durations are defined as time spans measuring an exact number of seconds. But how many seconds are there in a month? The answer clearly depends on the month in question (e.g., July is longer than June, and both are longer than February) and can only be determined when the particular month is known.53 Thus, dmonths(1) (evaluating to 2629800) can only be an estimate and should be handled with care in practical applications.
As they are numbers, durations can be used in arithmetic expressions:
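A few arithmetic expressions with durations, sketched as a simple illustration:

```r
library(lubridate)

d_sum <- dhours(1) + dminutes(30)  # adding durations
as.numeric(d_sum)                  # number of seconds
#> [1] 5400

d_dbl <- 2 * ddays(1)              # multiplying a duration by a number
as.numeric(d_dbl)
#> [1] 172800

ddays(1) / dhours(1)               # dividing durations yields a number
#> [1] 24
```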
However, we need to keep in mind that they represent abstract time spans (in numbers of seconds). Thus, adding durations to date-time objects (i.e., calendar times of the “POSIXct” class) can yield unexpected results:
    # Apparent oddities with durations:
    t1 <- ymd_hms("2020-03-29 00:00:00", tz = "Europe/Berlin")
    t1 + dhours(10)  # add 10 hours (as duration)
    #> [1] "2020-03-29 11:00:00 CEST"
    t1 + ddays(1)  # add 1 day (as duration)
    #> [1] "2020-03-30 01:00:00 CEST"

    t2 <- ymd("2020-02-28")
    t2 + dyears(1)  # add a year (as duration)
    #> [1] "2021-02-27 06:00:00 UTC"
We see that adding a duration of 10 hours or 1 day to t1 seemingly created a difference of 11 or 25 hours in calendar time (but note the switch from CET to CEST, due to daylight saving time, DST). Similarly, adding a year’s worth of seconds to t2 yielded a date one day earlier than the expected 2021-02-28: dyears(1) amounts to 365.25 days, whereas the span from 2020-02-28 to 2021-02-28 contains a leap day and lasts 366 days. Both results are correct, of course, if we really meant to add time spans as a specific number of seconds (i.e., durations).
But as we often mean something else when thinking “ten hours later”, “tomorrow”, or “next year”, durations are rather limited when calculating time spans in human units. But that’s ok — for that’s exactly what periods are for.
When asking our original question:
- How long ago did the 911-attacks take place?
receiving a number like 7010.129 (days) as its answer would be precise, but would probably not satisfy us. This is because we typically do not think about longer periods of time in terms of an exact number of seconds. Instead, we tend to provide counts of various units of time so that their sum fills out the period of time we are dealing with.
In lubridate, periods are time spans that are expressed in human common-sense units of time (e.g., hours, days, months, years). Importantly, a period varies in its length (when expressed as durations, i.e., number of seconds, except periods defined in seconds) based on its context. For instance, the leap year 2020 is 366 days long (as it contains a February 29, 2020), whereas the year 2021 is only 365 days long. Flexible periods turn into fixed time spans (of various lengths) when added to a specific time point (date-time or calendar time).
As a consequence of their nature, periods are suited to set and track the change in the “clock time” between two events (date-times).
Periods are expressed and measured in common time units (ranging from seconds to years) and provide each unit as integer values (though seconds can be non-integers). Periods are created by simple constructor functions (that are all plural versions of the desired time unit):
    # Creating periods:
    # from time differences:
    as.period(tm_now - tm_911)
    #> [1] "7010d 3H 5M 58.6861400604248S"

    # in units of integers:
    seconds(7)
    #> [1] "7S"
    minutes(5)
    #> [1] "5M 0S"
    hours(2)
    #> [1] "2H 0M 0S"
    days(5)
    #> [1] "5d 0H 0M 0S"
    weeks(2)
    #> [1] "14d 0H 0M 0S"
    months(6)
    #> [1] "6m 0d 0H 0M 0S"
    years(10)
    #> [1] "10y 0m 0d 0H 0M 0S"
As objects of class “Period” are numeric in nature, they can be used in arithmetic expressions:
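A few arithmetic expressions with periods, sketched as a simple illustration (the date used in the last step is an arbitrary example):

```r
library(lubridate)

p_sum <- days(1) + hours(12)  # adding periods
p_sum
#> [1] "1d 12H 0M 0S"

p_dbl <- 2 * months(3)        # multiplying a period by a number
p_dbl
#> [1] "6m 0d 0H 0M 0S"

# Adding periods to a date:
d_new <- ymd("2020-02-28") + years(1) + days(1)
d_new
#> [1] "2021-03-01"
```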
When computing with periods, each unit is applied separately. The distribution of periods among units is non-trivial (e.g., some days, months, or years are longer than others), but this complexity is hidden from us. In fact, as we tend to represent dates and times in terms of periods (at least as long as we think of calendar time), using periods in calculations typically yields more intuitive results than adding durations (see above):
    # Intuitive results with periods:
    t1 <- ymd_hms("2020-03-29 00:00:00", tz = "Europe/Berlin")
    t1 + hours(10)  # add 10 hours (as period)
    #> [1] "2020-03-29 10:00:00 CEST"
    t1 + years(1)  # add 1 year (as period)
    #> [1] "2021-03-29 CEST"

    t2 <- ymd("2020-02-28")
    t2 + years(1)  # add a year (as period)
    #> [1] "2021-02-28"
Thus, when reckoning with times and dates in various human-based units (like days, weeks, or months), periods are most likely the type of time span that we want to use.
In lubridate, intervals are time spans that are bound by two time points that are real date-times (or calendar times). Thus, intervals are durations anchored in date-times (or calendar times) and provide a bridge between durations (i.e., number of seconds) and periods (i.e., common-sense time units) when at least one point in calendar time is known.
A first way of defining an interval requires a time span
x (which can be a time difference, duration, or period) and a
start date (typically a date-time or “POSIXct” object):
    # Define an interval i1:
    tdiff <- (tm_now - tm_911)  # from difftime:
    i1 <- as.interval(x = tdiff, start = tm_911)
    i1
    #> [1] 2001-09-11 08:46:00 EDT--2020-11-20 10:51:58 EST

    tdur <- as.duration(tm_now - tm_911)  # from duration
    i2 <- as.interval(x = tdur, start = tm_911)
    i2 == i1
    #> [1] TRUE

    tper <- as.period(tm_now - tm_911)  # from period
    i3 <- as.interval(x = tper, start = tm_911)
    i3 == i1
    #> [1] FALSE
An alternative way of defining an interval uses its start and end points (as date-time objects) and places a special operator %--% between them (using “infix” notation):
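The infix notation can be sketched as follows. As assumptions, the time zone of the event is set to "America/New_York" and a fixed reference time tm_ref stands in for now(). The interval() function provides an equivalent way of defining the same interval:

```r
library(lubridate)

tm_911 <- ymd_hm("2001-09-11 08:46", tz = "America/New_York")   # assumed time zone
tm_ref <- ymd_hms("2020-11-20 16:51:58", tz = "Europe/Berlin")  # stands in for now()

# Define an interval by its start and end points:
i_a <- tm_911 %--% tm_ref
i_b <- interval(start = tm_911, end = tm_ref)
i_a == i_b
#> [1] TRUE

int_length(i_a)  # length (in seconds)
#> [1] 605675158
```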
Internally, lubridate represents intervals as objects of class “Interval”, which is numeric in nature:
Since an interval is anchored firmly in calendar time, both the exact number of seconds that passed (i.e., a duration) and the number of variable length time units that occurred during the interval (i.e., a period) can be calculated from a given interval.
For accurately converting intervals into durations or periods, we can use the as.duration() and as.period() functions:
Multiple transformations between durations, periods, and intervals yield the expected results (except for rounding differences). However, returning from durations or periods to an interval requires specifying a start date (as an anchor):
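These conversions can be sketched as follows (again assuming the time zone "America/New_York" for the event and a fixed reference time tm_ref instead of now()):

```r
library(lubridate)

tm_911 <- ymd_hm("2001-09-11 08:46", tz = "America/New_York")   # assumed time zone
tm_ref <- ymd_hms("2020-11-20 16:51:58", tz = "Europe/Berlin")  # stands in for now()
i_911  <- tm_911 %--% tm_ref

# From interval to duration or period:
i_dur <- as.duration(i_911)  # exact number of seconds
as.numeric(i_dur)
#> [1] 605675158
i_per <- as.period(i_911)    # years, months, days, etc.
i_per

# From duration back to interval (requires a start as anchor):
i_back <- as.interval(i_dur, start = tm_911)
i_back == i_911
#> [1] TRUE
```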
Intervals can be thought of as lines with given start and end points on a linear axis of time. Thus, we can ask and answer a range of interesting questions when dealing with one or more intervals:
    # Define intervals:
    # (a) based on durations:
    next_6m_dur <- tm_now %--% (tm_now + dmonths(6))
    last_6m_dur <- (tm_now - dmonths(6)) %--% tm_now
    # (b) based on periods:
    next_6m_per <- tm_now %--% (tm_now + months(6))
    day_before_yesterday <- (today() - days(2)) %--% (today() - days(1))

    # Note:
    next_6m_per > next_6m_dur
    #> [1] FALSE

    # Querying intervals: ----
    # Start and end points:
    int_start(next_6m_per)
    #> [1] "2020-11-20 16:51:58 CET"
    int_end(day_before_yesterday)
    #> [1] "2020-11-19 UTC"

    # Length (in seconds):
    int_length(next_6m_dur)
    #> [1] 15778800
    int_length(next_6m_per)
    #> [1] 15634800

    # Flip or shift interval:
    int_flip(day_before_yesterday)
    #> [1] 2020-11-19 UTC--2020-11-18 UTC
    int_shift(day_before_yesterday, by = days(3))
    #> [1] 2020-11-21 UTC--2020-11-22 UTC

    # Overlaps:
    int_overlaps(next_6m_dur, next_6m_per)
    #> [1] TRUE
    int_overlaps(next_6m_dur, day_before_yesterday)
    #> [1] FALSE

    # Same start or end point:
    int_aligns(next_6m_dur, next_6m_per)
    #> [1] TRUE
    int_aligns(next_6m_dur, last_6m_dur)
    #> [1] FALSE
The infix operator x %within% y allows checking whether an interval or date-time x lies within an interval or list of intervals y:
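A short sketch with made-up dates (the interval dec_2020 is a hypothetical example) shows the operator at work:

```r
library(lubridate)

# A hypothetical interval (December 2020):
dec_2020 <- ymd("2020-12-01") %--% ymd("2020-12-31")

# Is a date within the interval?
ymd("2020-12-24") %within% dec_2020
#> [1] TRUE
ymd("2021-01-01") %within% dec_2020
#> [1] FALSE

# Is an interval within another interval?
mid_dec <- ymd("2020-12-10") %--% ymd("2020-12-20")
mid_dec %within% dec_2020
#> [1] TRUE
```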
The int_diff() function is similar to the base R function diff(), but returns the intervals that occur between the elements of a vector of date-times:
    # A vector of date-time points:
    tv <- now() + months(1:4)
    tv
    #> [1] "2020-12-20 16:51:59 CET" "2021-01-20 16:51:59 CET"
    #> [3] "2021-02-20 16:51:59 CET" "2021-03-20 16:51:59 CET"

    # Intervals between date-times:
    int_diff(tv)
    #> [1] 2020-12-20 16:51:59 CET--2021-01-20 16:51:59 CET
    #> [2] 2021-01-20 16:51:59 CET--2021-02-20 16:51:59 CET
    #> [3] 2021-02-20 16:51:59 CET--2021-03-20 16:51:59 CET
    as.duration(int_diff(tv))
    #> [1] "2678400s (~4.43 weeks)" "2678400s (~4.43 weeks)" "2419200s (~4 weeks)"
    as.period(int_diff(tv))
    #> [1] "1m 0d 0H 0M 0S" "1m 0d 0H 0M 0S" "1m 0d 0H 0M 0S"
Dividing time spans
When asking “How long…” or “How old…” questions, we often are looking for answers that express a time span in terms of another one.
For instance, we can determine how many durations or periods fall into a given interval of time by dividing intervals by other time spans. This is straightforward for durations:
    next_6m_dur  # an interval
    #> [1] 2020-11-20 16:51:58 CET--2021-05-22 08:51:58 CEST
    is.interval(next_6m_dur)
    #> [1] TRUE

    # Dividing intervals by durations:
    next_6m_dur / dyears(1)
    #> [1] 0.5
    next_6m_dur / dmonths(1)
    #> [1] 6
    next_6m_dur / dweeks(1)
    #> [1] 26.08929
    next_6m_dur / ddays(1)
    #> [1] 182.625
Although this is straightforward, dividing by durations may yield unexpected results (e.g., fractional numbers of days). Dividing intervals also works for periods:
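The difference between dividing by periods and dividing by durations can be sketched with a made-up interval of exactly six calendar months (the interval half_year is a hypothetical example, defined between two dates in UTC):

```r
library(lubridate)

# A hypothetical interval spanning six calendar months:
half_year <- ymd("2020-11-20") %--% ymd("2021-05-20")

# Dividing by periods (calendar units):
n_months <- half_year / months(1)
n_months
n_days <- half_year / days(1)
n_days

# Dividing by durations (exact numbers of seconds):
n_dmonths <- half_year / dmonths(1)  # slightly less than 6, as these months are short
n_dmonths
```

Here, the period-based result counts calendar months, whereas the duration-based result compares the interval to an average month of 30.4375 days.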
However, we cannot divide time differences or durations by periods, or periods by durations:
In practical contexts, we often do not care about exact durations, but are primarily interested in the number of completed time periods. These can be computed by dividing time intervals by periods (by using integer division):
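Integer division of an interval by a period can be sketched as follows (assuming the time zone "America/New_York" for the event and a fixed reference time tm_ref instead of now()):

```r
library(lubridate)

tm_911 <- ymd_hm("2001-09-11 08:46", tz = "America/New_York")   # assumed time zone
tm_ref <- ymd_hms("2020-11-20 16:51:58", tz = "Europe/Berlin")  # stands in for now()
i_911  <- tm_911 %--% tm_ref

# Number of completed periods within the interval:
n_yrs <- i_911 %/% years(1)
n_yrs
#> [1] 19
n_mts <- i_911 %/% months(1)
n_mts
#> [1] 230
```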
Having learned about three different types of time spans and their combinations, we finally are in a position to answer our original question:
- How long ago did the 911-attacks take place?
In most applied contexts, the following estimates — based on a duration, a period, or an interval — would count as informative answers:
    # (1) from a duration:
    t_dur <- as.duration(tm_now - tm_911)
    round(t_dur / dyears(1), 1)
    #> [1] 19.2
    round(t_dur / dmonths(1), 1)
    #> [1] 230.3

    # (2) from a period:
    t_per <- as.period(tm_now - tm_911)
    round(t_per / years(1), 1)
    #> [1] 19.2
    round(t_per / months(1), 1)
    #> [1] 230.3

    # (3) from an interval:
    t_int <- tm_911 %--% tm_now
    t_int %/% years(1)
    #> [1] 19
    t_int %/% months(1)
    #> [1] 230
We see once more, that — in R, as in life — many different ways can yield satisfactory results. Which way is best depends on many additional details, but it’s good to know what our options are.
Choosing the right time span
Given three different time spans, which one should we use? As always, this depends on the task that we want to do.
Chapter 16: Dates and times of r4ds recommends always using the simplest type that solves our problem. When our primary concern is for amounts of time elapsed in terms of seconds, we use durations. When time spans are to be measured in common-sense units, periods typically provide the best solutions. And if we need to measure time spans that are bounded by calendar times, we use intervals, or combine several time spans.
Most everyday questions about time spans can be solved by either computing durations, periods, or intervals, or by dividing time intervals by durations or periods. Keep in mind that not all combinations of the different time span concepts and arithmetic operations make sense. (Figure 16.1 provides an overview of the arithmetic operations that are allowed between pairs of date/time classes.)
10.3.4 Other reasons to lubridate
This section collects some additional examples of computing with dates and times with lubridate commands.
Checking date and time objects
Given a multiplicity of object types (a “Date” class and two different date-time classes, i.e., the “POSIXct” and “POSIXlt” classes), it is easy to get confused about which type of date or time we are dealing with. Fortunately, the lubridate package provides convenient test functions that verify the class of a date or time object:
# "difftime"
td  # (from above)
#> Time difference of 7010.129 days
is.difftime(td)
#> [1] TRUE
is.difftime(tday)
#> [1] FALSE

# "Date":
tday
#> [1] "2020-11-20"
is.Date(tday)
#> [1] TRUE
is.POSIXct(tday)
#> [1] FALSE

# "POSIXct" calendar time:
ctnow
#> [1] "2020-11-20 16:51:55 CET"
is.POSIXct(ctnow)
#> [1] TRUE
is.POSIXlt(ctnow)
#> [1] FALSE

# "POSIXlt" local time:
ltnow
#> [1] "2020-11-20 16:51:55 CET"
is.POSIXlt(ltnow)
#> [1] TRUE
is.Date(ltnow)
#> [1] FALSE
Whenever dealing with multiple date-time classes, these functions are very helpful.
Rounding dates and times
When computing with dates, times, and various time spans, our resolution of interest is rarely a specific number of seconds. As we have seen in Section 10.3.3, this issue can often be addressed by performing computations in terms of periods or intervals or by dividing intervals by durations or periods.
For date-times (i.e., objects of the “POSIXct” class), rounding often is an issue as well.
To address this concern, lubridate provides a range of convenient rounding functions that allow setting the direction and the unit used for rounding:
(t <- ymd_hms("2020-11-16 12:01:30"))
#> [1] "2020-11-16 12:01:30 UTC"

# Rounding times (by units):
round_date(t, unit = "second")  # no change
#> [1] "2020-11-16 12:01:30 UTC"
round_date(t, unit = "minute")  # rounded up
#> [1] "2020-11-16 12:02:00 UTC"
round_date(t, unit = "hour")    # rounded down
#> [1] "2020-11-16 12:00:00 UTC"
round_date(t, unit = "day")     # rounded up
#> [1] "2020-11-17 UTC"
round_date(t, unit = "week")    # rounded down
#> [1] "2020-11-15 UTC"

# Force rounding up:
ceiling_date(t, unit = "minute")
#> [1] "2020-11-16 12:02:00 UTC"
ceiling_date(t, unit = "hour")
#> [1] "2020-11-16 13:00:00 UTC"
ceiling_date(t, unit = "day")
#> [1] "2020-11-17 UTC"
ceiling_date(t, unit = "week")
#> [1] "2020-11-22 UTC"

# Force rounding down:
floor_date(t, unit = "minute")
#> [1] "2020-11-16 12:01:00 UTC"
floor_date(t, unit = "hour")
#> [1] "2020-11-16 12:00:00 UTC"
floor_date(t, unit = "day")
#> [1] "2020-11-16 UTC"
floor_date(t, unit = "week")
#> [1] "2020-11-15 UTC"
Time zone conversions
In Section 10.2.4, we noted the importance of time zones and mentioned that many base R functions include a tz argument for setting them (see Sys.timezone() for your current system setting and OlsonNames() for available options).
When not explicitly specifying any time zone information, any date-times created in R either use our local system setting (here: “Europe/Berlin”, which may or may not include daylight saving time, DST) or default to “UTC” (Coordinated Universal Time).
For instance, when scheduling dates for the next four quarters (starting from now() in a time zone with DST), the summer dates will automatically include DST information:
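A minimal sketch of this behavior, assuming a fixed (hypothetical) starting point in the “Europe/Berlin” time zone instead of now(), so that the result is reproducible:

```r
library(lubridate)

# Hypothetical starting point (used instead of now()):
start <- ymd_hms("2020-11-20 12:00:00", tz = "Europe/Berlin")

# Dates of the next four quarters (adding periods of 0, 3, 6, and 9 months):
quarters <- start + months(c(0, 3, 6, 9))
quarters

# Which of these dates fall into daylight saving time?
dst(quarters)
#> [1] FALSE FALSE  TRUE  TRUE
```

As periods are added, the nominal time (12:00:00) remains the same for all four dates, but the dates in May and August are displayed as CEST rather than CET.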
The lubridate functions for parsing date-times also have a tz argument.
For instance, here are three specific date-time definitions (with different time zones):
When computing their differences (as objects of the “difftime” class), we realize that t1, t2, and t3 actually denote the same point (instant or moment) in time:
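A minimal sketch of this situation, using three hypothetical date-times (x1 to x3, with time zones that are assumptions chosen here for illustration) that all coincide at noon UTC:

```r
library(lubridate)

# Three hypothetical date-times in different time zones:
x1 <- ymd_hms("2020-12-24 13:00:00", tz = "Europe/Berlin")     # UTC+1
x2 <- ymd_hms("2020-12-24 07:00:00", tz = "America/New_York")  # UTC-5
x3 <- ymd_hms("2020-12-25 01:00:00", tz = "Pacific/Auckland")  # UTC+13

# Their differences (as "difftime" objects) are zero:
x1 - x2
x2 - x3
```

Although the three objects are displayed with different dates and times, their pairwise differences are zero, as all three represent the same instant.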
However, when manipulating times (e.g., by creating a new vector), information regarding time zones is often lost (or unified, based on the initial time zone):
The vector t4 also shows that t1, t2, and t3 all denote the same moment in time, a fact that was only obscured by displaying this time for different locations. However, the vector only shows this time for one particular time zone (specifically, the time zone of its first element, tz(t1)). Thus, we can still wonder: How should we best express this particular time?
The need to (re-)introduce time zone information to time objects creates two distinct tasks, with corresponding solutions:
Change time zone information by keeping the actual time points the same, but changing their representation (i.e., display fixed time points for a different time zone).
Change time zone information by keeping the representation the same, but changing the actual time points (i.e., display different time points that have the same nominal appearance for a different time zone).
The with_tz() function addresses the first task: It changes time zone information (and thus changes the nominal time display) without changing the underlying point in time that is being represented:
This shows that the three identical times (which were merely expressed differently by t1, t2, and t3) all denote noon on 2020-Dec-24 when expressed in terms of UTC (Coordinated Universal Time).
By contrast, the force_tz() function addresses the second task: It preserves the appearance of its input times (i.e., the nominal time displayed), but changes the actual time points that are being represented:
The difference between both tasks and functions is subtle, but important:
with_tz() only changes the appearance of time points, but keeps the time points intact.
force_tz() preserves the appearance of time, but changes the time actually represented.
When converting times into different time zones, we typically only want to change the appearance of time (i.e., the “sense” of time, or the particular way in which fixed time points are displayed to us), rather than the point in time actually denoted (i.e., the “referent” or “meaning” of the time displayed). Thus, we typically want to use with_tz(), rather than force_tz(), when converting some given times into a different time zone.
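The contrast between both functions can be sketched with a single date-time. The object x and its time zone are hypothetical values chosen for illustration:

```r
library(lubridate)

# A hypothetical date-time in Berlin (UTC+1 in winter):
x <- ymd_hms("2020-12-24 13:00:00", tz = "Europe/Berlin")

# (1) Same moment in time, different display:
x_with <- with_tz(x, tzone = "UTC")
x_with
#> [1] "2020-12-24 12:00:00 UTC"

# (2) Same display, different moment in time:
x_force <- force_tz(x, tzone = "UTC")
x_force
#> [1] "2020-12-24 13:00:00 UTC"

# Differences to the original time point:
difftime(x_with, x, units = "hours")   # 0 hours
difftime(x_force, x, units = "hours")  # 1 hour
```

Only the result of force_tz() differs from the original time point (here: by one hour), which is why with_tz() is usually the safer choice for conversions.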
Checking for leap years
A good question to ask is: Is some specific year y a leap year?
Many people can answer this question for the current year (e.g., “Yes, the year 2020 had a February 29.”).
But what about the year 2066?
What about the year of dt_Titanic (i.e., 1912)?
And what about the year MCMLXXXIV?
The hard-core solution to this problem consists in studying the definition of a leap year and then implementing it as a command or function. The corresponding definition (Wikipedia: leap year) reads:
…in the Gregorian calendar, each leap year has 366 days instead of 365,
by extending February to 29 days rather than the common 28.
These extra days occur in each year which is an integer multiple of 4
(except for years evenly divisible by 100, which are not leap years
unless evenly divisible by 400).
In R, we could implement this definition as follows:
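One possible base R sketch of this definition is shown here. The function name is_leap_year() is a hypothetical helper introduced for illustration, not an existing base R or lubridate function:

```r
# Hypothetical helper (not part of base R or lubridate):
is_leap_year <- function(y) {
  # Multiples of 4 are leap years, except for multiples of 100
  # that are not also multiples of 400:
  (y %% 4 == 0 & y %% 100 != 0) | (y %% 400 == 0)
}

is_leap_year(c(1900, 1912, 2000, 2020, 2066))
#> [1] FALSE  TRUE  TRUE  TRUE FALSE
```

As the function only uses vectorized arithmetic and logical operators, it works for individual years as well as for vectors of years.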
However, since we have learned about time points and time spans, we can solve such tasks by using heuristics.
For instance, we could define the interval from January 1st of year y to January 1st of year y+1 and determine the number of days (as durations or periods) that fit into this interval:
If the solution is 366, the year y is a leap year; if it is 365, it is not a leap year.
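A minimal sketch of this interval-based heuristic, in which days_in_year() is a hypothetical helper name and dates are created in UTC (so that no DST shifts interfere):

```r
library(lubridate)

# Hypothetical helper: number of days from Jan 1 of year y to Jan 1 of year y + 1
days_in_year <- function(y) {
  span <- interval(make_date(y, 1, 1), make_date(y + 1, 1, 1))
  span / ddays(1)  # dividing an interval by a duration of 1 day
}

days_in_year(c(2019, 2020))
#> [1] 365 366
```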
Similar solutions can be achieved by rounding dates, measuring and comparing their duration in other time units, or by trying to define the date of February 29 of year y and checking whether this succeeds:
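The last of these ideas can be sketched as follows. The helper name has_feb_29() is hypothetical, and the sketch relies on ymd() returning NA for dates that do not exist:

```r
library(lubridate)

# Hypothetical helper: Does February 29 exist in year y?
has_feb_29 <- function(y) {
  # quiet = TRUE suppresses the warning for dates that fail to parse:
  feb_29 <- ymd(paste0(y, "-02-29"), quiet = TRUE)
  !is.na(feb_29)
}

has_feb_29(c(2019, 2020))
#> [1] FALSE  TRUE
```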
All these solutions should yield the same result, as long as we can rely on R’s internal date-time definitions, any functions used in our derivation, and our ability to correctly use the corresponding commands and understand their results.
A much simpler solution is finding a function that solves the task. The lubridate package actually provides a leap_year() function that gets the job done:
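For instance, applying leap_year() to some of the years mentioned above (a short sketch with years chosen for illustration):

```r
library(lubridate)

leap_year(2020)
#> [1] TRUE

# The function is vectorized over years:
leap_year(c(1900, 1912, 2000, 2066))
#> [1] FALSE  TRUE  TRUE FALSE
```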
The benefits of using an existing R function are two-fold:
it saves us effort and time, and
it can be used flexibly with other features of our programming language:
# Was some given Date in a leap year?
leap_year(year(dt_Titanic))
#> [1] TRUE

# Is the current year a leap year?
leap_year(today())
#> [1] TRUE

# Is FALSE (as a number) a leap year?
leap_year(as.numeric(FALSE))
#> [1] TRUE

# What about the next 10 years?
leap_year(today() + years(0:10))
#> [1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
The price of using existing functions is that we need to trust that their author(s) knew what they were doing.
In the case of the lubridate package (Spinu et al., 2020), it’s very likely that the authors can be trusted, as the package is well-established and has been widely used (though it has also changed quite a bit over the years).
Incidentally, the definition of the leap_year function contains a line:
which looks very much like our leap year definition and initial base R solution from above.
The lubridate package defines many other nifty functions:
am(dt), pm(dt): Does a date-time object dt occur am or pm?
days_in_month(dt): Get the number of days in the month of dt
dst(dt): Get daylight saving time indicator of dt
format_ISO8601(dt): Format in ISO8601 character format
rollback(dt): Roll back date to last day of previous month
date_decimal(n): Converts a decimal number n to the corresponding date
Here are some examples of their results:
(dt <- now())
#> [1] "2020-11-20 16:52:00 CET"

# Other functions:
am(dt)
#> [1] FALSE
pm(dt)
#> [1] TRUE
days_in_month(dt)
#> Nov
#>  30
dst(dt)
#> [1] FALSE
format_ISO8601(dt)
#> [1] "2020-11-20T16:52:00"
rollback(dt)
#> [1] "2020-10-31 16:52:00 CET"
date_decimal(2020.531)
#> [1] "2020-07-13 08:18:14 UTC"
We conclude this section with some practice tasks that recapitulate the date and time functionality of the lubridate package.
Solve the following tasks by using lubridate functions:
- Local conventions and names of weekdays:
- Predict, evaluate, and explain the results of the following commands:
wday(sat_noon)
# => 7
# Explanation: Saturday is day 7, so the week appears to start on Sunday (1)
wday(sat_noon, label = FALSE, week_start = 7)
wday(sat_noon, label = TRUE)

# Changing week_start to Monday (1):
wday(sat_noon, label = TRUE, abbr = FALSE, week_start = 1)

# Setting locale (to get German language names):
wday(sat_noon, label = TRUE, abbr = FALSE, week_start = 1,
     locale = Sys.setlocale("LC_TIME", "de_DE.UTF-8"))

# Re-setting locale (back to en_US default):
Sys.setlocale(category = "LC_ALL", locale = "")
- Full circle with date-time-dates:
We learned that the as_date() and as_datetime() functions allow us to convert between times and dates.
- Predict, evaluate, and explain the results of the following commands:
time_2 is created from
date_1 (i.e., a “Date” object), it lacks the time information of
- How can we repair
- Durations vs. periods:
- Predict, evaluate, and explain the results of the following commands:
- Predict, evaluate, and explain the results of the following two commands:
d <- ymd("2020-01-20") assigns a particular date (i.e., an instant in time) to d.
To this, we add a time span (of 1 year) in two different ways:
+ years(1) adds the period of 1 year (in human units), yielding the same date a year later.
+ dyears(1) adds the duration of 1 year (as an exact number of seconds).
As 2020 is a leap year (i.e., containing a date of “2020-02-29” and a total number of 366 days), the two additions yield different results.
Thus, when dealing with common-sense units of time, adding periods typically yields intuitively more plausible results.
Note also the default time zone settings to UTC (Coordinated Universal Time).
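The two additions can be sketched as follows, assuming a lubridate version in which dyears(1) corresponds to 365.25 days (as the divisions by dyears(1) above suggest):

```r
library(lubridate)

d <- ymd("2020-01-20")

# Adding a period of 1 year (same calendar date, one year later):
d_per <- d + years(1)
d_per
#> [1] "2021-01-20"

# Adding a duration of 1 year (a fixed number of seconds):
d_dur <- d + dyears(1)
d_dur

# Number of seconds added by the duration:
as.numeric(as_datetime(d_dur)) - as.numeric(as_datetime(d))
```

Under this assumption, the duration adds 31557600 seconds (i.e., 365.25 days), so that the result falls 18 hours short of the date obtained by adding the period, as the leap year 2020 spans 366 days.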
- Explain the different results of the following two commands:
sat_noon is assigned to a particular date-time point (i.e., instant or moment) in time: Sat, 2020-03-28 12:00:00 CET (+0100 from UTC).
The tz specification ensures that the time zone is set to CET (i.e., corresponds to a specific location).
A difference between adding a duration of ddays(1) and adding a period of days(1) implies that a time shift has occurred.
In this case, Germany introduced daylight saving time (DST) on “2020-03-29”:
At 2am, the clocks are set forwards by 1 hour. Thus, adding the duration of 1 day (as in + ddays(1)) yields a later time than adding the period of 1 day (as in + days(1)). Again, adding periods yields more predictable results.
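A minimal sketch of this contrast, in which the time zone “Europe/Berlin” is an assumption standing in for the tz specification described above:

```r
library(lubridate)

# Saturday noon, one day before the switch to DST (time zone is an assumption):
sat_noon <- ymd_hms("2020-03-28 12:00:00", tz = "Europe/Berlin")

# Adding a duration of 1 day (exactly 86400 seconds):
sat_noon + ddays(1)
#> [1] "2020-03-29 13:00:00 CEST"

# Adding a period of 1 day (same clock time on the next day):
sat_noon + days(1)
#> [1] "2020-03-29 12:00:00 CEST"
```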
- Durations, periods, and intervals:
- Predict, evaluate, and explain the different results of the following two commands:
- Evaluate and explain the result of the following expression in terms of their notions of time and in common-sense terms:
Answer: We can re-construct the answer in 4 steps:
- day_before_yesterday was defined as an interval (see above).
- Shifting this interval by a period of 2 days yields the interval of today (from 00:00:00 to 24:00:00).
- Adding a period of 12 hours to the start of today marks a specific date-time point: noon today.
- Subtracting a duration of 30 seconds yields a date-time point precisely 30 seconds before noon today.
Note that this example involves four different notions of time: Date-time points (i.e., instants, moments, or “POSIXct” objects) and three different types of time spans.
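The four steps can be retraced in a sketch that rests on several assumptions: a fixed reference day (ref_day) stands in for today, the interval is re-created under a hypothetical name (day_before), and int_shift() is used for shifting it, which may differ from the original expression:

```r
library(lubridate)

# Hypothetical reference day (instead of today(), for reproducibility):
ref_day <- ymd("2020-11-20", tz = "UTC")

# (1) An interval covering the day before yesterday:
day_before <- interval(ref_day - days(2), ref_day - days(1))

# (2) Shifting this interval by a period of 2 days (yields the reference day):
shifted <- int_shift(day_before, by = days(2))

# (3) Adding a period of 12 hours to its start (noon):
noon <- int_start(shifted) + hours(12)

# (4) Subtracting a duration of 30 seconds:
result <- noon - dseconds(30)
result
#> [1] "2020-11-20 11:59:30 UTC"
```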
- Predict, evaluate, and explain the results of the following expressions:
- Leap years in Roman numerals:
In Section 10.3.4 above, we left the leap year question regarding MCMLXXXIV unresolved:
- Was the year MCMLXXXIV (represented in Roman numerals) a leap year?
A bit of experimentation with the R utils function as.roman() reveals that the character sequence “MCMLXXXIV” represents the calendar year 1984 in Roman numerals. As it turns out, this happens to be a leap year:
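One way of checking this (a short sketch that combines as.roman() from the utils package with the leap_year() function of lubridate):

```r
library(lubridate)

# Convert the Roman numeral into an integer:
y <- as.integer(as.roman("MCMLXXXIV"))
y
#> [1] 1984

# Check whether this year is a leap year:
leap_year(y)
#> [1] TRUE
```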
Colebourne, S., & O’Neill, B. (2010). Joda-time: Java date and time API. Release, 1(2), 4–1. Retrieved from https://www.joda.org/joda-time/
Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. Journal of Statistical Software, Articles, 40(3), 1–25. https://doi.org/10.18637/jss.v040.i03
Spinu, V., Grolemund, G., & Wickham, H. (2020). lubridate: Make dealing with dates a little easier. Retrieved from https://CRAN.R-project.org/package=lubridate
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. Retrieved from http://r4ds.had.co.nz
The advantage of this approach is that we start with a set of date-times dt that we later want to re-create from its components (i.e., by using the
In Section 16.4.5 Exercises of r4ds, the first question asks: “Why is there months() but no dmonths()?” Thus, I can only guess that the function dmonths() was absent from earlier versions of lubridate, but then was added later. And since the value of dyears(1) suffers from the same problem (as leap years are a day longer than non-leap years), it seems ok to provide an average for estimation purposes.