10.3 Dates and times with lubridate
The previous section showed that base R provides basic support for handling dates and times, but the corresponding classes and functions can be confusing, so working with them remains challenging. To facilitate working with dates and times, the lubridate package (Spinu, Grolemund, & Wickham, 2021) provides a more coherent and user-friendly framework. This section illustrates key lubridate commands and concepts.
As we only need lubridate in those sessions in which we are dealing with dates and times, the package is not part of the core tidyverse. Hence, we need to load it in addition to the core tidyverse packages when we want to use it:
library(tidyverse) # loads core tidyverse packages
library(lubridate) # loads this particular package
Just as base R provides two separate functions for obtaining the current date and time (specifically, Sys.Date() and Sys.time()), the lubridate package provides two corresponding functions:
today() # today's date
#> [1] "2022-04-08"
now() # current date-time
#> [1] "2022-04-08 17:22:33 CEST"
To learn about the internal representation of both objects, we can inspect their class in R:
class(today()) # a "Date" object
#> [1] "Date"
class(now()) # a "POSIXct" object
#> [1] "POSIXct" "POSIXt"
We see that the lubridate package uses the two key classes discussed in Section 10.2.2:
- today() returns the current date (as a “Date” object).
- now() returns the current calendar time (as a date-time, i.e., “POSIXct” object).
Both the today() and now() functions also accept a tzone argument for specifying a time zone.
To see what time zones are used by default, we can apply the tz() function to the results of both functions:
tz(today()) # uses "UTC" (Coordinated Universal Time) as default
#> [1] "UTC"
tz(now()) # uses local system setting as default
#> [1] ""
Sys.timezone() # show local system setting
#> [1] "Europe/Berlin"
Thus, both today() and now() are convenient shortcuts, but should be handled with care when dealing with different time zones.
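As a minimal sketch of the tzone argument mentioned above, the following lines request the current date and time for explicit time zones. The zone names are arbitrary examples, and the printed values depend on when the code is run:

```r
library(lubridate)

# Request the current date and time for explicit time zones
# (the zone names are arbitrary examples, not taken from the text):
today(tzone = "Pacific/Auckland")   # current date in Auckland
now(tzone = "UTC")                  # current instant, shown in UTC
now(tzone = "America/New_York")     # same instant, shown for New York

# The time zone is stored with the date-time object:
tz(now(tzone = "UTC"))
```

Stating the time zone explicitly avoids relying on the local system setting.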
10.3.1 Parsing dates and times
When learning how to read and retrieve (elements of) dates and times in base R, we spent a lot of time and effort on conversion specifications (e.g., strings like "%Y-%m-%d" or "%H:%M:%S") that specified parsing and formatting instructions in the so-called POSIX standard (see Section 10.2.2). Although it is good to know POSIX, as it is widely used and powerful, it can also seem a bit cumbersome and clumsy. Thus, lubridate adopts a more intuitive approach to parsing dates and times.
To create new dates or times, lubridate provides functions that parse (i.e., read or scan) them from various other objects. Specifically, we can read dates or times
- from character strings (representing dates or times);
- from variables (denoting date or time components);
- from other types (i.e., date or time objects).
The next sections introduce the lubridate functions for each of these object types.
1. Read from character strings
In Chapter 6 on Importing data, we encountered some readr functions that parse character vectors into dates or times (see the parse_date(), parse_datetime() and parse_time() functions in Section 6.2.1).
The lubridate package provides even simpler tools for reading in dates and times. The function names are combinations of the initial letters of basic date and time components:
- date components are: y year, m month, d day
- time components are: h hour, m minute, s second
The order of these components in the function name determines how the arguments (provided as strings) are interpreted. Here are some examples:
- Dates from strings: Without any further specification, a date-denoting string like "02 04 06" would be highly ambiguous (see Table 10.1 of Section 10.2.2). To read this string into a date, lubridate allows us to use a combination of d, m, and y to indicate which date-related element each numeric component describes:
ymd("06 02 04")
#> [1] "2006-02-04"
ymd("20060204") # works as well
#> [1] "2006-02-04"
ymd("2006 2 4")
#> [1] "2006-02-04"
ymd("2006 : 2///04")
#> [1] "2006-02-04"
ymd(060204) # works with numbers
#> [1] "2006-02-04"
dmy("4 2 06")
#> [1] "2006-02-04"
dmy("04 02 2006")
#> [1] "2006-02-04"
mdy("02 04 06")
#> [1] "2006-02-04"
mdy(020406) # works with numbers
#> [1] "2006-02-04"
# Even "abysmal" orders are possible:
myd("06 02 04")
#> [1] "2002-06-04"
dym("06 02 04")
#> [1] "2002-04-06"
# Resulting representation:
class(ymd(Sys.Date())) # a "Date" object
#> [1] "Date"
- Dates with times (i.e., date-times) from strings:
ymd_hms("2020-04-01 10:30:13")
#> [1] "2020-04-01 10:30:13 UTC"
ymd_hm("2020/04/01 10.30")
#> [1] "2020-04-01 10:30:00 UTC"
# Resulting representation:
class(ymd_hms("2020-04-01 10:30:13")) # a "POSIXct" object
#> [1] "POSIXct" "POSIXt"
class(ymd_hms(Sys.time()))
#> [1] "POSIXct" "POSIXt"
- Times from strings:
hms("15:16:17")
#> [1] "15H 16M 17S"
hms("10 11 12")
#> [1] "10H 11M 12S"
hm("13 14")
#> [1] "13H 14M 0S"
# Resulting representation:
class(hms("15:16:17")) # a lubridate "period" object
#> [1] "Period"
#> attr(,"package")
#> [1] "lubridate"
Note that the particular representation of date and time objects (e.g., as the columns of a tibble) varies with the functions that created the corresponding variable (column):
dt <- c("10:05 29/02/2020", "20:10 24/12/2020")
tb <- tibble::tibble(datetime = dt)
tb
#> # A tibble: 2 × 1
#> datetime
#> <chr>
#> 1 10:05 29/02/2020
#> 2 20:10 24/12/2020
tb2 <- tb %>%
  separate(datetime, into = c("time", "day"), sep = " ", remove = FALSE) %>%
  separate(time, into = c("hour", "min"), sep = ":", remove = FALSE) %>%
  mutate(dt_1 = paste(day, time), # a character string
         dt_2 = dmy_hm(dt_1),     # a <dttm> object
         ti_2 = hm(time)          # a <period> object
         )
tb2
#> # A tibble: 2 × 8
#> datetime time hour min day dt_1 dt_2 ti_2
#> <chr> <chr> <chr> <chr> <chr> <chr> <dttm> <Period>
#> 1 10:05 29/02/2020 10:05 10 05 29/02/2020 29/0… 2020-02-29 10:05:00 10H 5M 0S
#> 2 20:10 24/12/2020 20:10 20 10 24/12/2020 24/1… 2020-12-24 20:10:00 20H 10M 0S
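The parsing functions shown above return UTC date-times by default and yield NA for strings that do not denote a valid date. The following sketch illustrates both points with made-up input strings. It relies on the tz and quiet arguments of the lubridate parsing functions:

```r
library(lubridate)

# Parse a date-time and state its time zone explicitly
# (without tz, the result would be in "UTC"):
ymd_hms("2020-04-01 10:30:13", tz = "Europe/Berlin")

# Strings that do not denote a valid date yield NA
# (quiet = TRUE suppresses the corresponding warning):
ymd("2020-02-30", quiet = TRUE)   # there is no February 30
dmy("31/04/2020", quiet = TRUE)   # April has only 30 days
```

Checking parsed columns for unexpected NA values is a cheap way to detect malformed entries in real data.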
2. Read from date and time variables
Many datasets already contain variables that denote date components (i.e., values for years, months, weeks, or days) or time components (i.e., values for hours, minutes, or seconds). Given our skills in dealing with Strings of text (from Chapter 9), we could first paste these variables into a character string and then parse this string into a date or time variable. However, the lubridate package also provides more direct functions for converting such variables into dates or calendar times:
make_date() expects inputs to year, month, and day arguments to create an object of the “Date” class:
make_date(year = 2020, month = 7, day = 13) # as numbers
#> [1] "2020-07-13"
make_date(year = "2020", month = "7", day = "13") # as characters
#> [1] "2020-07-13"
make_date(year = 2020, month = "007", day = "013") # mixed
#> [1] "2020-07-13"
# Note defaults for missing elements:
make_date(year = 2020, month = 7) # day = 1
#> [1] "2020-07-01"
make_date(month = 2, day = 13) # year = 1970
#> [1] "1970-02-13"
make_date(year = 2020, day = 13) # month = 1
#> [1] "2020-01-13"
# Representation as "Date":
is.Date(make_date(year = 2020))
#> [1] TRUE
is.POSIXct(make_date(year = 2020))
#> [1] FALSE
Note that make_date() accepts a variety of input types and fills in default values for missing elements.
make_datetime() expects the same inputs as make_date() (i.e., year, month, and day arguments), plus additional inputs to its time-related arguments (hour, min, sec, and a time zone tz, which defaults to UTC) to create a calendar time (i.e., “POSIXct”) object:
make_datetime(year = 2020, month = 7, day = 13,
hour = 10, min = 30, sec = 45, tz = "Europe/Zurich")
#> [1] "2020-07-13 10:30:45 CEST"
## Note: Inputs need to be numeric
# make_datetime(year = "2020", month = "7", day = "13",
# hour = "10", min = "30", sec = "45")
# Note defaults for missing elements:
make_datetime(year = 2020)
#> [1] "2020-01-01 UTC"
make_datetime(sec = 33)
#> [1] "1970-01-01 00:00:33 UTC"
# Representation as "POSIXct":
is.Date(make_datetime(year = 2020))
#> [1] FALSE
is.POSIXct(make_datetime(year = 2020))
#> [1] TRUE
Note that make_datetime() is less flexible than make_date() in expecting that all its arguments are numeric.
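When date and time components are stored as character strings, one possible workaround is to convert them to numbers before calling make_datetime(). The following sketch uses hypothetical character values (not part of the data used in this section):

```r
library(lubridate)

# Hypothetical date and time components stored as character strings:
yr <- "2020"; mt <- "7";  dy <- "13"
hr <- "10";   mi <- "30"; sc <- "45"

# Convert each component into a number before calling make_datetime():
dt <- make_datetime(year = as.integer(yr), month = as.integer(mt), day = as.integer(dy),
                    hour = as.integer(hr), min = as.integer(mi), sec = as.numeric(sc),
                    tz = "UTC")
dt
```

Within a tibble, the same conversions could be applied to the corresponding columns inside a mutate() call.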
The lubridate package additionally includes a make_difftime() function for creating difftime objects in various units of time (see the section on durations below).
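As a brief sketch of make_difftime(), the following calls create time differences from a number of seconds or from named components. The numeric values are arbitrary examples:

```r
library(lubridate)

# From a number of seconds (the unit is chosen automatically):
make_difftime(3600)

# The same time span, expressed in minutes:
x <- make_difftime(3600, units = "minute")
x

# From named components:
y <- make_difftime(day = 1, hour = 12)
as.numeric(y, units = "hours")   # 36 hours
```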
To demonstrate these functions for creating dates or times from variables, we need some data that contains date and time variables.
Lacking such a dataset, we can create one. The following code snippet does this by working backwards: We first use the sample_time() function from ds4psy (to draw random samples of calendar times within a specific range of time) and then use a dplyr pipe to extract their date- and time-related components.
# Sample N calendar dates within the current year:
N <- 25
t_end <- ceiling_date(now(), "year") # ymd_hms("2020-12-31 23:59:59")
t_start <- t_end - years(1)
set.seed(246) # for reproducible randomness
dt_org <- sort(ds4psy::sample_time(from = t_start, to = t_end, size = N))
# Tibble with date and time components (as columns):
dt_tb <- tibble::tibble(dt_org) %>%
  mutate(yr = year(dt_org),
         mt = month(dt_org),
         dy = day(dt_org),
         hr = hour(dt_org),
         mi = minute(dt_org),
         sc = second(dt_org)
         ) %>%
  select(-dt_org)
Just copy and run this code chunk and note that the resulting tibble dt_tb consists of numeric columns that contain date and time components:
# Data:
knitr::kable(head(dt_tb), caption = "Data containing typical date and time variables.")
yr | mt | dy | hr | mi | sc |
---|---|---|---|---|---|
2022 | 1 | 1 | 5 | 22 | 30 |
2022 | 2 | 10 | 3 | 15 | 39 |
2022 | 3 | 8 | 8 | 17 | 17 |
2022 | 4 | 1 | 19 | 5 | 35 |
2022 | 6 | 5 | 6 | 35 | 19 |
2022 | 6 | 5 | 22 | 25 | 1 |
Given this data, we can use make_date() for creating dates, or make_datetime() for creating calendar times:
dt_tb <- dt_tb %>%
  mutate(date = make_date(year = yr, month = mt, day = dy),
         datetime = make_datetime(year = yr, month = mt, day = dy,
                                  hour = hr, min = mi, sec = sc)
         )
head(dt_tb)
#> # A tibble: 6 × 8
#> yr mt dy hr mi sc date datetime
#> <dbl> <dbl> <int> <int> <int> <dbl> <date> <dttm>
#> 1 2022 1 1 5 22 30 2022-01-01 2022-01-01 05:22:30
#> 2 2022 2 10 3 15 39 2022-02-10 2022-02-10 03:15:39
#> 3 2022 3 8 8 17 17 2022-03-08 2022-03-08 08:17:17
#> 4 2022 4 1 19 5 35 2022-04-01 2022-04-01 19:05:35
#> 5 2022 6 5 6 35 19 2022-06-05 2022-06-05 06:35:19
#> 6 2022 6 5 22 25 1 2022-06-05 2022-06-05 22:25:01
To evaluate our success, we can verify whether the datetime column in dt_tb managed to re-construct the original vector dt_org created above:
# Verify equality:
all.equal(dt_org, dt_tb$datetime)
#> [1] "Mean absolute difference: 5616"
Oops, this is awkward: Why do we not get back the original times dt_org that we fed into the table?
A first hypothesis could be that calendar times (of the “POSIXct” class) are numeric objects and hence some differences may be due to rounding. We can check this by using the round_date() function to round both our original vector (i.e., dt_org) and the newly created one (dt_tb$datetime) to the same units (e.g., “sec”):
# Round to nearest second:
all.equal(round_date(dt_org, unit = "sec"), round_date(dt_tb$datetime, unit = "sec"))
#> [1] "Mean absolute difference: 5616"
So rounding did not diminish the difference, and the discrepancies observed here are far too large to be due to rounding differences.
A clue to solving this puzzle is provided by computing the time difference between our newly created times dt_tb$datetime and the original times dt_org:
# Inspect differences:
head(unclass(dt_tb$datetime)) - head(unclass(as.POSIXct(dt_org)))
#> [1] 3600 3600 3600 7200 7200 7200
This shows that our new calendar times dt_tb$datetime are either 1 or 2 hours (i.e., 3600 seconds or 7200 seconds) later than our original times dt_org. To detect the source of this difference, let’s look more closely at both vectors again:
# Inspect details:
head(dt_org)
#> [1] "2022-01-01 05:22:30 CET" "2022-02-10 03:15:39 CET"
#> [3] "2022-03-08 08:17:17 CET" "2022-04-01 19:05:35 CEST"
#> [5] "2022-06-05 06:35:19 CEST" "2022-06-05 22:25:01 CEST"
head(dt_tb$datetime)
#> [1] "2022-01-01 05:22:30 UTC" "2022-02-10 03:15:39 UTC"
#> [3] "2022-03-08 08:17:17 UTC" "2022-04-01 19:05:35 UTC"
#> [5] "2022-06-05 06:35:19 UTC" "2022-06-05 22:25:01 UTC"
We can see that both vectors show the same clock dates and times, but in different time zones.
Specifically, make_datetime() used “UTC” by default, whereas our original vector dt_org automatically used the current setting of our local system (here: “Europe/Berlin”):
Sys.timezone() # current system time zone
#> [1] "Europe/Berlin"
Thus, to prevent such problems, we need to be explicit about the appropriate time zone when calling the make_datetime() function:
dt_tb <- dt_tb %>%
  mutate(date = make_date(year = yr, month = mt, day = dy),
         datetime = make_datetime(year = yr, month = mt, day = dy,
                                  hour = hr, min = mi, sec = sc,
                                  tz = "") # tz = "" uses Sys.timezone()
         )
head(dt_tb)
#> # A tibble: 6 × 8
#> yr mt dy hr mi sc date datetime
#> <dbl> <dbl> <int> <int> <int> <dbl> <date> <dttm>
#> 1 2022 1 1 5 22 30 2022-01-01 2022-01-01 05:22:30
#> 2 2022 2 10 3 15 39 2022-02-10 2022-02-10 03:15:39
#> 3 2022 3 8 8 17 17 2022-03-08 2022-03-08 08:17:17
#> 4 2022 4 1 19 5 35 2022-04-01 2022-04-01 19:05:35
#> 5 2022 6 5 6 35 19 2022-06-05 2022-06-05 06:35:19
#> 6 2022 6 5 22 25 1 2022-06-05 2022-06-05 22:25:01
Note that setting tz = "" is a shortcut for using our local system time zone in make_datetime(), rather than its “UTC” default. We could have been even more explicit by stating tz = Sys.timezone() or tz = "Europe/Berlin".
As both vectors now use the same time zone (i.e., Europe/Berlin), they should contain the same date-time points:
all.equal(dt_org, dt_tb$datetime)
#> [1] TRUE
This is reassuring, and it is good that we compared our result to the original date-time vector. The important lesson to learn here is:
- Always watch out for time zones when working with times.
We will reconsider this issue in Exercise 2 (see Section 10.6.2).
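To make the time zone lesson more concrete, the following sketch contrasts two lubridate functions that were not used above: with_tz() keeps the instant and changes only its display, whereas force_tz() keeps the clock time and changes the instant. The date-time is taken from the example data, and the choice of functions is our own illustration:

```r
library(lubridate)

# A calendar time in the "Europe/Berlin" time zone:
t_berlin <- ymd_hms("2022-06-05 06:35:19", tz = "Europe/Berlin")

# with_tz(): same instant, displayed for another time zone
t_same <- with_tz(t_berlin, tzone = "UTC")
t_same
t_same == t_berlin    # TRUE: only the display has changed

# force_tz(): same clock time, but a different instant
t_forced <- force_tz(t_berlin, tzone = "UTC")
t_forced
t_forced - t_berlin   # 2 hours (the UTC offset of CEST)
```

The 2-hour difference corresponds to the 7200 seconds observed when make_datetime() interpreted our summer clock times as UTC.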
3. Read from dates or times
Given that R distinguishes between dates (e.g., of class “Date,” see Section 10.2.3) and dates with times (e.g., calendar times of class “POSIXct,” see Section 10.2.4), it is often necessary to switch between these formats. Let’s first re-create a date and a date-time object to work with:
tday <- today() # a date
is.Date(tday)
#> [1] TRUE
tnow <- now() # a date-time/calendar time
is.POSIXct(tnow)
#> [1] TRUE
The lubridate functions as_date() and as_datetime() facilitate such conversions:
- as_date() converts date-times (i.e., calendar times) into dates.
- as_datetime() converts dates (of class “Date”) into date-times (if possible).
Converting date-times into dates is straightforward, as it merely drops the time-related information:
# 1. Get dates from times:
tday_2 <- as_date(tnow)
tday_2
#> [1] "2022-04-08"
is.Date(tday_2)
#> [1] TRUE
Converting dates into date-times is trickier, as dates lack information about times:
# 2. Get times from dates:
tnow_2 <- as_datetime(tday)
tnow_2
#> [1] "2022-04-08 UTC"
is.POSIXct(tnow_2)
#> [1] TRUE
feb_29 <- as_datetime(dmy("29-02-2020"))
feb_29
#> [1] "2020-02-29 UTC"
is.POSIXct(feb_29)
#> [1] TRUE
We see that tnow_2 and feb_29 are calendar times (i.e., dates with times of class “POSIXct”), but still seem to lack time information.
Nevertheless, the time information is there: the time of each date was set to the default “00:00:00 UTC.”
This becomes apparent when explicating the time object by supplying a more detailed format argument:
# explicate_time <- "%a, %Y-%m-%d %H:%M:%S %Z (%z from UTC)" # (defined above)
format(tnow_2, format = explicate_time)
#> [1] "Fri, 2022-04-08 00:00:00 UTC (+0000 from UTC)"
format(feb_29, format = explicate_time)
#> [1] "Sat, 2020-02-29 00:00:00 UTC (+0000 from UTC)"
Again, we see: Always watch out for time zones when working with times.
When the as_date() and as_datetime() functions receive numeric inputs, they add a corresponding number of increments to the Unix epoch at “1970-01-01 00:00:00 UTC” (see Wikipedia: Unix_time for details).
Note that date increments are interpreted as days, whereas time increments are interpreted as seconds:
# Dates from numeric inputs:
as_date(0) # Unix epoch
#> [1] "1970-01-01"
as_date(1) # increment: +1 day
#> [1] "1970-01-02"
as_date(365) # +1 year
#> [1] "1971-01-01"
# Times from numeric inputs:
as_datetime(0) # Unix epoch
#> [1] "1970-01-01 UTC"
as_datetime(1) # increment: +1 second
#> [1] "1970-01-01 00:00:01 UTC"
as_datetime(5 * 60 * 60) # +5 hours
#> [1] "1970-01-01 05:00:00 UTC"
as_datetime(24 * 60 * 60) # +1 day
#> [1] "1970-01-02 UTC"
as_datetime(365 * 24 * 60 * 60) # +1 year
#> [1] "1971-01-01 UTC"
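The numeric interpretation also works in the opposite direction. As a small sketch (using an arbitrary example date), we can convert a date and a date-time into numbers and back again:

```r
library(lubridate)

# From dates and times to numbers:
d_num  <- as.numeric(ymd("2020-01-01"))                # days since 1970-01-01
dt_num <- as.numeric(ymd_hms("2020-01-01 00:00:00"))   # seconds since the Unix epoch

d_num    # 18262
dt_num   # 1577836800 (i.e., 18262 * 24 * 60 * 60)

# From numbers back to dates and times:
as_date(d_num)
as_datetime(dt_num)
```

This shows that dates count days and date-times count seconds, both relative to the same origin.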
10.3.2 Get and set date and time components
Having succeeded in creating date-time objects (from strings, other variables, or a date), we can ask additional questions:
- How can we get or set individual date and time components (of date-time objects)?
To illustrate this, we can use our tnow scalar, which is an object of the “POSIXct” class:
tnow
#> [1] "2022-04-08 17:22:34 CEST"
is.POSIXct(tnow)
#> [1] TRUE
Actually, we have already encountered a pretty nifty way of retrieving individual date and time components (see the conversion functions of the POSIX standard in Section 10.2.4 above, or evaluate ?strptime):
# Using format (see POSIX above):
format(tnow, "%A")
#> [1] "Friday"
format(tnow, "%F")
#> [1] "2022-04-08"
format(tnow, "%T %Z")
#> [1] "17:22:34 CEST"
# explicate_time <- "%a, %Y-%m-%d %H:%M:%S %Z (%z from UTC)" # (defined above)
format(tnow, format = explicate_time)
#> [1] "Fri, 2022-04-08 17:22:34 CEST (+0200 from UTC)"
However, using the format() function with a format argument according to the POSIX standard is pretty geeky.
If lubridate lives up to its name, its functions should flow a bit more fluently.
Getting date components
In fact, the following lubridate functions are pretty self-explanatory in retrieving individual date and time components from a date-time object (of the “POSIXct” or “POSIXlt” class):
# Get date components (from POSIX times):
year(tnow) # full year (numeric)
#> [1] 2022
month(tnow) # month in year (numeric)
#> [1] 4
mday(tnow) # day in month (numeric)
#> [1] 8
# Noteworthy date components:
yday(tnow) # day of year (numeric). See format(tnow, "%j")
#> [1] 98
wday(tnow) # day of week (numeric, week_start = 7 by default), see format(tnow, "%w")
#> [1] 6
For date components with names, using the label and abbr arguments yields names (as levels of an ordered factor) rather than numbers as outputs:
# Get names instead of numbers:
month(tnow, label = TRUE, abbr = TRUE) # month in year (name)
#> [1] Apr
#> 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
wday(tnow, label = TRUE, abbr = FALSE) # day of week (name)
#> [1] Friday
#> 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
It is noteworthy and re-assuring that these date-related functions also work for dates (of class “Date”):
# Get date components (from Date objects):
tday
#> [1] "2022-04-08"
is.Date(tday)
#> [1] TRUE
year(tday)
#> [1] 2022
month(tday, label = FALSE)
#> [1] 4
mday(tday)
#> [1] 8
yday(tday)
#> [1] 98
wday(tday, label = TRUE)
#> [1] Fri
#> Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
Getting time components
Getting time-related components from date-time objects (i.e., “POSIXct” times) is just as easy:
# Get time components (from "POSIXt" objects):
tnow # a POSIXct calendar time
#> [1] "2022-04-08 17:22:34 CEST"
hour(tnow)
#> [1] 17
minute(tnow)
#> [1] 22
second(tnow)
#> [1] 34.8103
tz(tnow) # time zone
#> [1] ""
Note the following defaults when asking for time components of dates:
# Get time components (from "Date" objects):
tday # a Date
#> [1] "2022-04-08"
hour(tday)
#> [1] 0
minute(tday)
#> [1] 0
second(tday)
#> [1] 0
tz(tday) # time zone
#> [1] "UTC"
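Beyond the basic components shown above, lubridate provides further accessor functions for derived date components. The following sketch uses a fixed date (rather than today()) to keep the results reproducible. The selection of functions is our own:

```r
library(lubridate)

x <- ymd("2022-04-08")   # a fixed "Date" object

quarter(x)      # quarter of the year: 2
leap_year(x)    # is 2022 a leap year? FALSE

# Number of days in the month of a date:
days_in_month(ymd("2020-02-10"))   # 29 (as 2020 is a leap year)
```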
Setting date and time components
Interestingly, the same lubridate functions that get date and time components can also be used to set those components:
# Copy:
new_tnow <- tnow
new_tnow # check
#> [1] "2022-04-08 17:22:34 CEST"
# SET date components:
year(new_tnow) <- 2022
month(new_tnow) <- 12
day(new_tnow) <- 31
new_tnow # check
#> [1] "2022-12-31 17:22:34 CET"
# SET time components:
hour(new_tnow) <- 23
minute(new_tnow) <- 45
second(new_tnow) <- 59
tz(new_tnow) <- "Australia/Sydney"
new_tnow # check
#> [1] "2022-12-31 23:45:59 AEDT"
What if we re-set a date component that depends on the date?
new_tnow
#> [1] "2022-12-31 23:45:59 AEDT"
format(new_tnow, explicate_time)
#> [1] "Sat, 2022-12-31 23:45:59 AEDT (+1100 from UTC)"
# day of week:
wday(new_tnow)
#> [1] 7
wday(new_tnow) <- 4 # CHANGE the weekday (4 is "Wed")
new_tnow # Note that date has shifted to accommodate wday
#> [1] "2022-12-28 23:45:59 AEDT"
wday(new_tnow, label = TRUE)
#> [1] Wed
#> Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
# day in year:
yday(new_tnow)
#> [1] 362
yday(new_tnow) <- 366 # CHANGE the yday (note value)
new_tnow # Note that date has shifted (to the next year!) to accommodate yday
#> [1] "2023-01-01 23:45:59 AEDT"
yday(new_tnow)
#> [1] 1
Thus, we can use the same functions that get information from dates and date-times to set their elements. However, beware that setting date-time components can have unintended consequences. As date-time components are not independent of each other, setting some components typically affects other components.
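Rather than setting components one at a time, several components can be changed in a single call. The following sketch uses the update() method that lubridate provides for date-time objects. The fixed input time mirrors the example above, but is defined in UTC to keep the result independent of the local system:

```r
library(lubridate)

t0 <- ymd_hms("2022-04-08 17:22:34", tz = "UTC")

# Set several components in one call:
t1 <- update(t0, month = 12, mday = 31, hour = 23, minute = 45)
t1   # "2022-12-31 23:45:34 UTC"
t0   # the original object remains unchanged
```

As update() returns a new object, the original date-time is only replaced when we assign the result to its name.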
10.3.3 Working with time spans
In Section 10.2.1, we distinguished between time points (often called instants or moments) and time spans (aka. durations, intervals, or time periods). However, different usages of these time span terms actually imply different concepts. To enable accurate computations, we need to distinguish more carefully between the different types of time spans.
In fact, lubridate implements its own ontology of time spans.
Beyond time points (i.e., a particular instant or moment in time), the package distinguishes between 3 types of time spans:
- durations are time spans in exact numbers of seconds
- periods are time spans in human units (e.g., days, months, years)
- intervals are time spans with a given start and end point in time
These different time span concepts were inspired by the Joda Time project (Colebourne & O’Neill, 2010) (see the original article by Grolemund & Wickham (2011) for background information).
Essentially, both durations and intervals express physical time spans (a specific number of seconds), whereas periods express time spans in human units that may vary based on context (e.g., not every day has the same number of hours and not every month or year has the same number of days). Intervals are durations that are anchored in calendar time (i.e., intervals have start and end points that are real date-times).
We will consider each type of time span to see how they are created and find out what we can do with them. To motivate our explorations, consider the following example:
On Tuesday, September 11, 2001, the terrorist group al-Qaeda attacked several targets in the United States in a coordinated fashion. At 08:46 a.m., five hijackers crashed an American Airlines plane into the northern facade of the World Trade Center in New York City. Many remember the vivid images of this particular event (a so-called dread risk event), even though it happened many years ago. This raises the question:
- How long ago did the 9/11-attacks take place?
Please take a moment (or rather: some time span) to think about potential answers to this question: What would you accept as an informative answer? How does this answer depend on when or where the question is being asked? What kind of accuracy would you expect? And which temporal unit(s) would an answer be expressed in?
1. Durations
As a first approach for answering the question “How long ago did the 9/11-attacks take place?” we can enter the particular time point of this event and subtract it from now() to compute a time difference object in R:
# Define 2 date-times (calendar times):
tm_911 <- ymd_hm("2001 September 11, 8:46", tz = "US/Eastern")
tm_now <- now() # re-set current time

# Time difference:
td <- tm_now - tm_911
td
#> Time difference of 7514.109 days
The time difference td represents a duration as an R object of the “difftime” class, which is automatically displayed as a count of “days.” The corresponding difftime() function (see Section 10.2.4) offers a range of units varying from “secs” to “weeks,” which are all rather limited in this case:
# Representation:
class(td)
#> [1] "difftime"
is.difftime(td)
#> [1] TRUE
is.numeric(td)
#> [1] FALSE
# difftime units:
difftime(tm_now, tm_911, units = "weeks")
#> Time difference of 1073.444 weeks
difftime(tm_now, tm_911, units = "auto")
#> Time difference of 7514.109 days
difftime(tm_now, tm_911, units = "secs")
#> Time difference of 649218996 secs
For time spans exceeding a few months, the duration class provided by lubridate is a better alternative: The lubridate notion of duration measures time spans as the number of elapsed seconds.
# Define duration d:
d <- as.duration(tm_now - tm_911)
d
#> [1] "649218995.759913s (~20.57 years)"
Durations are internally defined as a special class of object and record time spans in numeric form (as numbers of seconds):
# Representation:
class(d)
#> [1] "Duration"
#> attr(,"package")
#> [1] "lubridate"
is.duration(d)
#> [1] TRUE
is.numeric(d)
#> [1] TRUE
unclass(d)
#> [1] 649218996
# Note:
unclass(d) == as.numeric(difftime(tm_now, tm_911, units = "secs"))
#> [1] TRUE
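As durations are numbers of seconds, they can be converted into other units by division or by as.numeric() with a unit name. The following sketch re-creates the duration from the number of seconds printed above (rounded to whole seconds), so that it does not depend on the current time:

```r
library(lubridate)

# Duration from a number of seconds (value taken from the output above):
d <- duration(649218996, units = "seconds")

d / ddays(1)             # elapsed time in days
d / dyears(1)            # elapsed time in (average) years of 365.25 days
as.numeric(d, "hours")   # numeric value in a unit of our choice
```

Dividing a duration by another duration yields a plain number, which is why the results no longer carry a unit.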
There are several constructor functions (all starting with d) that facilitate defining durations:
# Creating durations:
dseconds(10)
#> [1] "10s"
dminutes(30)
#> [1] "1800s (~30 minutes)"
dhours(1)
#> [1] "3600s (~1 hours)"
ddays(2)
#> [1] "172800s (~2 days)"
dweeks(3)
#> [1] "1814400s (~3 weeks)"
dmonths(1) # ?: Does this make sense?
#> [1] "2629800s (~4.35 weeks)"
dyears(2)
#> [1] "63115200s (~2 years)"
Note that all these definitions internally create “Duration” objects that denote numbers of elapsed seconds, but are printed in a more human-readable fashion.
The dmonths() function is flagged (with ?) as its underlying notion is a bit tricky. We just learned that durations are defined as time spans measuring an exact number of seconds, but how many seconds are there in a month? The answer clearly depends on the month in question (e.g., July is longer than June, and both are longer than February) and can only be determined when the particular month is known. Thus, dmonths(1) (evaluating to 2629800 seconds) can only be an estimate and should be handled with care in practical applications.
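The value of 2629800 seconds can be reproduced by simple arithmetic: it corresponds to one twelfth of an average year of 365.25 days. The following sketch verifies this and contrasts the estimate with the exact lengths of two actual months (the variable names are our own):

```r
library(lubridate)

secs_per_day   <- 24 * 60 * 60
secs_per_year  <- 365.25 * secs_per_day   # average year (allowing for leap years)
secs_per_month <- secs_per_year / 12      # average month

secs_per_month                             # 2629800
as.numeric(dmonths(1)) == secs_per_month   # TRUE

# Actual months deviate from this average:
feb <- as.duration(ymd("2021-02-01") %--% ymd("2021-03-01"))   # 28 days
jul <- as.duration(ymd("2021-07-01") %--% ymd("2021-08-01"))   # 31 days
feb
jul
```

The %--% operator used here creates an interval between two time points and is introduced in the section on intervals.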
As they are numbers, durations can be used in arithmetic expressions:
# Math with durations:
dmonths(2) + dweeks(3) - ddays(4)
#> [1] "6728400s (~11.12 weeks)"
(dyears(1) + dhours(6))/dmonths(6)
#> [1] 2.001369
dyears(2) == 4 * dmonths(6)
#> [1] TRUE
dyears(2) == 2 * 365.25 * ddays(1)
#> [1] TRUE
However, we need to keep in mind that they represent abstract time spans (in numbers of seconds). Thus, adding durations to date-time objects (i.e., calendar times of the “POSIXct” class) can yield unexpected results:
# Apparent oddities with durations:
t1 <- ymd_hms("2020-03-29 00:00:00", tz = "Europe/Berlin")
t1 + dhours(10) # add 10 hours (as duration)
#> [1] "2020-03-29 11:00:00 CEST"
t1 + ddays(1) # add 1 day (as duration)
#> [1] "2020-03-30 01:00:00 CEST"
t2 <- ymd("2020-02-28")
t2 + dyears(1) # add a year (as duration)
#> [1] "2021-02-27 06:00:00 UTC"
We see that adding a duration of 10 hours or 1 day to t1 seemingly created a difference of 11 or 25 hours in clock time. However, note the switch from CET to CEST: due to daylight saving time (DST), the clocks were set forward by one hour on this day. Similarly, adding a year’s worth of seconds to t2 did not yield the same date in the following year: dyears(1) amounts to 365.25 days, whereas the calendar year starting on 2020-02-28 includes February 29 and lasts 366 days, so the result falls 18 hours short of 2021-02-28. Both results are correct, of course, if we really meant to add time spans as a specific number of seconds (i.e., durations).
But as we often mean something else when thinking “ten hours later,” “tomorrow,” or “next year,” durations are rather limited when calculating time spans in human units. But that’s ok, as that’s exactly what periods are for.
2. Periods
When asking our original question:
- How long ago did the 9/11-attacks take place?
receiving the number 7514.109 (of days) as its answer would be precise, but would probably not satisfy us. This is because we typically do not think about longer periods of time in terms of an exact number of days or seconds. Instead, we tend to provide counts of various units of time so that their sum fills out the period of time we are dealing with.
In lubridate, periods are time spans that are expressed in common-sense human units of time (e.g., hours, days, months, years). Importantly, the length of a period (when expressed as a duration, i.e., a number of seconds) varies with its context; only periods defined in seconds are exempt from this. For instance, the leap year 2020 is 366 days long (as it contains a February 29, 2020), whereas the year 2021 is only 365 days long. Flexible periods turn into fixed time spans (of various lengths) when added to a specific time point (date-time or calendar time).
# Define a period p:
p <- as.period(tm_now - tm_911)
p
#> [1] "7514d 2H 36M 35.7599129676819S"
# Representation:
class(p)
#> [1] "Period"
#> attr(,"package")
#> [1] "lubridate"
is.period(p)
#> [1] TRUE
is.numeric(p)
#> [1] TRUE
As a consequence of their nature, periods are suited to set and track the change in the “clock time” between two events (date-times).
Periods are expressed and measured in common time units (ranging from seconds to years) and provide each unit as integer values (though seconds can be non-integers). Periods are created by simple constructor functions (that are all plural versions of the desired time unit):
# Creating periods:
# from time differences:
as.period(tm_now - tm_911)
#> [1] "7514d 2H 36M 35.7599129676819S"
# in units of integers:
seconds(7)
#> [1] "7S"
minutes(5)
#> [1] "5M 0S"
hours(2)
#> [1] "2H 0M 0S"
days(5)
#> [1] "5d 0H 0M 0S"
weeks(2)
#> [1] "14d 0H 0M 0S"
months(6)
#> [1] "6m 0d 0H 0M 0S"
years(10)
#> [1] "10y 0m 0d 0H 0M 0S"
As objects of class “Period” are numbers, they can be used in computing arithmetic expressions:
# Math with periods:
now() + years(1) + months(1) + days(1) + hours(12)
#> [1] "2023-05-10 05:22:36 CEST"
hours(1) == 60 * seconds(60)
#> [1] TRUE
hours(3) == 3 * minutes(60)
#> [1] TRUE
days(1) == hours(24)
#> [1] TRUE
weeks(2) == days(14)
#> [1] TRUE
When computing with periods, each unit is applied separately. The distribution of periods among units is non-trivial (e.g., some days, months, or years are longer than others), but this complexity is hidden from us. In fact, as we tend to represent dates and times in terms of periods (at least as long as we think of calendar time), using periods in calculations typically yields more intuitive results than adding durations (see above):
# Intuitive results with periods:
t1 <- ymd_hms("2020-03-29 00:00:00", tz = "Europe/Berlin")
t1 + hours(10) # add 10 hours (as period)
#> [1] "2020-03-29 10:00:00 CEST"
t1 + years(1) # add 1 year (as period)
#> [1] "2021-03-29 CEST"
t2 <- ymd("2020-02-28")
t2 + years(1) # add a year (as period)
#> [1] "2021-02-28"
Thus, when reckoning with times and dates in various human-based units (like days, weeks, or months), periods are most likely the type of time span that we want to use.
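One caveat of computing with periods concerns dates that do not exist. As a short sketch (with an arbitrary example date), adding one month to January 31 yields NA, whereas the lubridate operator %m+% (which is not covered in the text above) rolls back to the last day of the target month:

```r
library(lubridate)

jan31 <- ymd("2020-01-31")

jan31 + months(1)      # NA, as "2020-02-31" does not exist
jan31 %m+% months(1)   # "2020-02-29": last day of February 2020
jan31 + days(30)       # "2020-03-01": adding days always yields a valid date
```

Which of these results is appropriate depends on what “one month later” is supposed to mean in a given application.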
3. Intervals
In lubridate, intervals are time spans that are bound by two time points that are real date-times (or calendar times). Thus, intervals are durations anchored in date-times (or calendar times) and provide a bridge between durations (i.e., number of seconds) and periods (i.e., common-sense time units) when at least one point in calendar time is known.
A first way of defining an interval requires a time span x (which can be a time difference, duration, or period) and a start date (typically a date-time or “POSIXct” object):
# Define an interval i1:
<- (tm_now - tm_911) # from difftime:
tdiff <- as.interval(x = tdiff, start = tm_911)
i1
i1#> [1] 2001-09-11 08:46:00 EDT--2022-04-08 11:22:35 EDT
<- as.duration(tm_now - tm_911) # from duration
tdur <- as.interval(x = tdur, start = tm_911)
i2 == i1
i2 #> [1] TRUE
<- as.period(tm_now - tm_911) # from period
tper <- as.interval(x = tper, start = tm_911)
i3 == i1
i3 #> [1] TRUE
An alternative way of defining an interval uses its start and end points (as date-time objects) and places a special operator %--% between them (using “infix” notation):
# Define interval by start %--% end:
i4 <- tm_911 %--% tm_now
i4 == i1
#> [1] TRUE
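The infix notation has a functional counterpart in lubridate's interval() function. The following sketch uses two fixed date-times (hypothetical examples, chosen so that the results do not depend on the current time) to check that both notations define the same interval:

```r
library(lubridate)

# Two fixed date-times (hypothetical examples):
t_start <- ymd_hms("2021-01-01 00:00:00", tz = "UTC")
t_end   <- ymd_hms("2021-03-01 00:00:00", tz = "UTC")

i_fun   <- interval(t_start, t_end)  # function notation
i_infix <- t_start %--% t_end        # infix notation

i_fun == i_infix    # both notations define the same interval
int_length(i_fun)   # length in seconds (January and February 2021)
```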
Internally, lubridate represents intervals as objects of class “Interval,” which is numeric in nature:
# Representation:
class(i1)
#> [1] "Interval"
#> attr(,"package")
#> [1] "lubridate"
is.interval(i1)
#> [1] TRUE
is.numeric(i1)
#> [1] TRUE
Since an interval is anchored firmly in calendar time, both the exact number of seconds that passed (i.e., a duration) and the number of variable length time units that occurred during the interval (i.e., a period) can be calculated from a given interval.
For accurately converting intervals into durations or periods, we can use the as.duration() and as.period() functions:
as.duration(i1)
#> [1] "649218995.759913s (~20.57 years)"
as.period(i1)
#> [1] "20y 6m 28d 2H 36M 35.7599129676819S"
Multiple transformations between durations, periods, and intervals yield the expected results (except for differences that arise when units of variable length, like months and years, are converted by using average lengths):
as.duration(as.period(i1))
#> [1] "649359395.759913s (~20.58 years)"
as.period(as.duration(i1))
#> [1] "20y 0m 209d 2H 36M 35.7599129676819S"
but returning from durations or periods to an interval requires specifying a start date (as an anchor):
# interval requires a start date:
as.interval(as.duration(as.period(i1)), start = tm_911)
#> [1] 2001-09-11 08:46:00 EDT--2022-04-10 02:22:35 EDT
as.interval(as.period(as.duration(i1)), start = tm_911)
#> [1] 2001-09-11 08:46:00 EDT--2022-04-08 11:22:35 EDT
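The role of the anchor can be illustrated with a shorter interval. The following sketch assumes a hypothetical interval i that spans January and February of 2021; converting its period back into an interval recovers the original end point only because we supply the original start:

```r
library(lubridate)

i <- ymd("2021-01-01") %--% ymd("2021-03-01")  # January and February 2021

as.duration(i)             # exact: 59 days (expressed in seconds)
as.period(i)               # 2 months
as.duration(as.period(i))  # an estimate, as "2 months" has no fixed length

# Re-anchoring the period at the original start recovers the interval:
i_back <- as.interval(as.period(i), start = int_start(i))
int_end(i_back)
```

Without an anchor, a period of 2 months can only be translated into seconds by assuming some typical month length, which explains why such round trips need not return the original values.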
Intervals can be thought of as lines with given start and end points on a linear axis of time. Thus, we can ask and answer a range of interesting questions when dealing with one or more intervals:
# Define intervals:
# (a) based on durations:
next_6m_dur <- tm_now %--% (tm_now + dmonths(6))
last_6m_dur <- (tm_now - dmonths(6)) %--% tm_now
# (b) based on periods:
next_6m_per <- tm_now %--% (tm_now + months(6))
day_before_yesterday <- (today() - days(2)) %--% (today() - days(1))
# Note:
next_6m_per > next_6m_dur
#> [1] TRUE
# Querying intervals: ----
# Start and end points:
int_start(next_6m_per)
#> [1] "2022-04-08 17:22:35 CEST"
int_end(day_before_yesterday)
#> [1] "2022-04-07 UTC"
# Length (in seconds):
int_length(next_6m_dur)
#> [1] 15778800
int_length(next_6m_per)
#> [1] 15811200
# Flip or shift interval:
int_flip(day_before_yesterday)
#> [1] 2022-04-07 UTC--2022-04-06 UTC
int_shift(day_before_yesterday, by = days(3))
#> [1] 2022-04-09 UTC--2022-04-10 UTC
# Overlaps:
int_overlaps(next_6m_dur, next_6m_per)
#> [1] TRUE
int_overlaps(next_6m_dur, day_before_yesterday)
#> [1] FALSE
# Same start or end point:
int_aligns(next_6m_dur, next_6m_per)
#> [1] TRUE
int_aligns(next_6m_dur, last_6m_dur)
#> [1] FALSE
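As the intervals above depend on the current time, their results change from session to session. The following sketch repeats some of these queries with fixed dates; the three bookings are hypothetical examples:

```r
library(lubridate)

# Three hypothetical bookings, defined as intervals:
booking_a <- ymd("2022-05-01") %--% ymd("2022-05-10")
booking_b <- ymd("2022-05-08") %--% ymd("2022-05-15")
booking_c <- ymd("2022-06-01") %--% ymd("2022-06-05")

int_overlaps(booking_a, booking_b)  # share 2022-05-08 to 2022-05-10
int_overlaps(booking_a, booking_c)  # no common time points

# Shifting an interval moves both of its end points:
booking_c_early <- int_shift(booking_c, by = days(-25))
int_start(booking_c_early)
int_overlaps(booking_a, booking_c_early)
```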
The infix operator x %within% y allows checking whether an interval or date-time x lies within an interval or list of intervals y:
# within:
tm_now %within% next_6m_dur
#> [1] TRUE
day_before_yesterday %within% last_6m_dur
#> [1] TRUE
day_before_yesterday %within% next_6m_dur
#> [1] FALSE
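The %within% operator also works for vectors of dates. The following sketch assumes a hypothetical term (defined as an interval) and checks which dates of a vector fall into it:

```r
library(lubridate)

# A hypothetical term, defined as an interval:
term <- ymd("2022-04-01") %--% ymd("2022-07-31")

# Which of these dates fall within the term?
dates <- ymd(c("2022-03-15", "2022-04-01", "2022-06-30", "2022-08-01"))
dates %within% term

# An interval lies within another one only if it is fully contained:
exam_week <- ymd("2022-07-25") %--% ymd("2022-07-29")
exam_week %within% term
```

Note that the start point of an interval counts as lying within it.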
The int_diff() function is similar to the base R function diff(), but returns the intervals that occur between the elements of a vector of date-times:
# A vector of date-time points:
tv <- now() + months(1:4)
tv
#> [1] "2022-05-08 17:22:37 CEST" "2022-06-08 17:22:37 CEST"
#> [3] "2022-07-08 17:22:37 CEST" "2022-08-08 17:22:37 CEST"
# Intervals between date-times:
int_diff(tv)
#> [1] 2022-05-08 17:22:37 CEST--2022-06-08 17:22:37 CEST
#> [2] 2022-06-08 17:22:37 CEST--2022-07-08 17:22:37 CEST
#> [3] 2022-07-08 17:22:37 CEST--2022-08-08 17:22:37 CEST
as.duration(int_diff(tv))
#> [1] "2678400s (~4.43 weeks)" "2592000s (~4.29 weeks)" "2678400s (~4.43 weeks)"
as.period(int_diff(tv))
#> [1] "1m 0d 0H 0M 0S" "1m 0d 0H 0M 0S" "1m 0d 0H 0M 0S"
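The same logic applies to vectors of dates. The following sketch uses three fixed month-end dates (hypothetical examples) and expresses the intervals between them as exact numbers of days:

```r
library(lubridate)

# Hypothetical month-end dates:
month_ends <- ymd(c("2022-01-31", "2022-02-28", "2022-03-31"))

gaps <- int_diff(month_ends)  # 2 intervals between 3 dates
gaps

gaps / ddays(1)  # exact lengths in days
```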
Dividing time spans
When asking “How long…” or “How old…” questions, we often are looking for answers that express a time span in terms of another one.
For instance, we can determine how many durations or periods fall into a given interval of time by dividing intervals by other time spans. This is straightforward for durations:
next_6m_dur # an interval
#> [1] 2022-04-08 17:22:35 CEST--2022-10-08 08:22:35 CEST
is.interval(next_6m_dur)
#> [1] TRUE
# Dividing intervals by durations:
next_6m_dur / dyears(1)
#> [1] 0.5
next_6m_dur / dmonths(1)
#> [1] 6
next_6m_dur / dweeks(1)
#> [1] 26.08929
next_6m_dur / ddays(1)
#> [1] 182.625
Although dividing by durations may yield unexpected results (e.g., a fractional number of days), dividing intervals also works for periods:
next_6m_per # an interval
#> [1] 2022-04-08 17:22:35 CEST--2022-10-08 17:22:35 CEST
is.interval(next_6m_per)
#> [1] TRUE
# Dividing intervals by periods:
next_6m_per / years(1)
#> [1] 0.5013699
next_6m_per / months(1)
#> [1] 6
next_6m_per / weeks(1)
#> [1] 26.14286
next_6m_per / days(1)
#> [1] 183
However, we cannot divide time differences or durations by periods, or periods by durations:
td <- (tm_now - tm_911) # difftime (from above)
# would NOT work:
td / years(1) # cannot divide difftime / Period
as.duration(td) / years(1) # cannot divide Duration / Period
as.period(td) / dyears(1) # cannot divide Period / Duration
In practical contexts, we often do not care about exact durations, but are primarily interested in the number of completed time periods. These can be computed by dividing time intervals by periods (by using integer division):
next_6m_per # an interval
#> [1] 2022-04-08 17:22:35 CEST--2022-10-08 17:22:35 CEST
is.interval(next_6m_per)
#> [1] TRUE
# Dividing intervals by periods:
next_6m_per %/% years(1)
#> [1] 0
next_6m_per %/% months(1)
#> [1] 6
next_6m_per %/% weeks(1)
#> [1] 26
next_6m_per %/% days(1)
#> [1] 183
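A typical application of this integer division is computing someone's age in completed years. The following is a minimal sketch; the helper function completed_years() and the dates are hypothetical and not part of lubridate:

```r
library(lubridate)

# Completed years between two dates (hypothetical helper):
completed_years <- function(from, to) {
  interval(from, to) %/% years(1)
}

birth <- ymd("1990-06-15")
completed_years(birth, ymd("2022-03-01"))  # before the birthday in 2022
completed_years(birth, ymd("2022-09-01"))  # after the birthday in 2022
```

As the interval is anchored in calendar time, leap years between both dates are taken into account.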
Having learned about three different types of time spans and their combinations, we finally are in a position to answer our original question:
- How long ago did the 9/11 attacks take place?
In most applied contexts, the following estimates — based on a duration, a period, or an interval — would count as informative answers:
# (1) from a duration:
t_dur <- as.duration(tm_now - tm_911)
round(t_dur / dyears(1), 1)
#> [1] 20.6
round(t_dur / dmonths(1), 1)
#> [1] 246.9
# (2) from a period:
t_per <- as.period(tm_now - tm_911)
round(t_per / years(1), 1)
#> [1] 20.6
round(t_per / months(1), 1)
#> [1] 246.9
# (3) from an interval:
t_int <- tm_911 %--% tm_now
t_int %/% years(1)
#> [1] 20
t_int %/% months(1)
#> [1] 246
We see once more that, in R as in life, many different ways can yield satisfactory results. Which way is best depends on many additional details, but it’s good to know what our options are.
Choosing the right time span
Given three different time spans, which one should we use? As always, this depends on the task that we want to do.
Chapter 16: Dates and times of r4ds recommends always using the simplest type that solves our problem. When our primary concern is the amount of time elapsed in terms of seconds, we use durations. When time spans are to be measured in common-sense units, periods typically provide the best solutions. And if we need to measure time spans that are bounded by calendar times, we use intervals, or combine several time spans.
Most everyday questions about time spans can be solved by computing durations, periods, or intervals, or by dividing time intervals by durations or periods. Keep in mind that not all combinations of the different time span concepts and arithmetic operations make sense. (Figure 16.1 provides an overview of the arithmetic operations that are allowed between pairs of date/time classes.)
10.3.4 Other reasons to lubridate
This section collects some additional examples of computing with dates and times with lubridate commands.
Checking date and time objects
Given a multiplicity of object types (a “Date” class and two different date-time classes, i.e., the “POSIXct” and “POSIXlt” classes), it is easy to get confused about which type of date or time we are dealing with. Fortunately, the lubridate package provides convenient test functions that verify the class of a date or time object:
# "difftime"
td # (from above)
#> Time difference of 7514.109 days
is.difftime(td)
#> [1] TRUE
is.difftime(tday)
#> [1] FALSE
# "Date":
tday
#> [1] "2022-04-08"
is.Date(tday)
#> [1] TRUE
is.POSIXct(tday)
#> [1] FALSE
# "POSIXct" calendar time:
ctnow
#> [1] "2022-04-08 17:22:30 CEST"
is.POSIXct(ctnow)
#> [1] TRUE
is.POSIXlt(ctnow)
#> [1] FALSE
# "POSIXlt" local time:
ltnow
#> [1] "2022-04-08 17:22:30 CEST"
is.POSIXlt(ltnow)
#> [1] TRUE
is.Date(ltnow)
#> [1] FALSE
Whenever dealing with multiple date-time classes, these functions are very helpful.
Rounding dates and times
When computing with dates, times, and various time spans, our resolution of interest is rarely a specific number of seconds. As we have seen in Section 10.3.3, this issue can often be addressed by performing computations in terms of periods or intervals or by dividing intervals by durations or periods.
For date-times (i.e., objects of the “POSIXct” class), rounding often is an issue as well.
To address this concern, lubridate provides a range of convenient rounding functions that allow setting the direction and the unit used for rounding:
(t <- ymd_hms("2020-11-16 12:01:30"))
#> [1] "2020-11-16 12:01:30 UTC"
# Rounding times (by units):
round_date(t, unit = "second") # no change
#> [1] "2020-11-16 12:01:30 UTC"
round_date(t, unit = "minute") # rounded up
#> [1] "2020-11-16 12:02:00 UTC"
round_date(t, unit = "hour") # rounded down
#> [1] "2020-11-16 12:00:00 UTC"
round_date(t, unit = "day") # rounded up
#> [1] "2020-11-17 UTC"
round_date(t, unit = "week") # rounded down
#> [1] "2020-11-15 UTC"
# Force rounding up:
ceiling_date(t, unit = "minute")
#> [1] "2020-11-16 12:02:00 UTC"
ceiling_date(t, unit = "hour")
#> [1] "2020-11-16 13:00:00 UTC"
ceiling_date(t, unit = "day")
#> [1] "2020-11-17 UTC"
ceiling_date(t, unit = "week")
#> [1] "2020-11-22 UTC"
# Force rounding down:
floor_date(t, unit = "minute")
#> [1] "2020-11-16 12:01:00 UTC"
floor_date(t, unit = "hour")
#> [1] "2020-11-16 12:00:00 UTC"
floor_date(t, unit = "day")
#> [1] "2020-11-16 UTC"
floor_date(t, unit = "week")
#> [1] "2020-11-15 UTC"
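The same functions also work for “Date” objects and larger units, which is often useful for grouping dates by month or week. The following sketch uses a fixed date d (a hypothetical example) and the week_start argument to let weeks begin on a Monday:

```r
library(lubridate)

d <- ymd("2020-11-16")  # a Monday

floor_date(d, unit = "month")    # first day of the month
ceiling_date(d, unit = "month")  # first day of the next month

# By default, weeks start on Sunday; week_start = 1 uses Monday instead:
floor_date(d, unit = "week")
floor_date(d, unit = "week", week_start = 1)
```

The default for week_start can be changed by the lubridate.week.start option, so the third result assumes that this option has not been set.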
Time zone conversions
In Section 10.2.4, we noted the importance of time zones and mentioned that many base R functions include a tz argument for setting them (see Sys.timezone() for your current system setting and OlsonNames() for available options).
Sys.timezone() # current system setting
OlsonNames() # time zone options
When not explicitly specifying any time zone information, any date-times created in R either use our local system setting (here: “Europe/Berlin,” which may or may not include daylight saving time, DST) or default to “UTC” (Coordinated Universal Time).
For instance, when scheduling four dates in steps of four months (starting now() in a time zone with DST), the summer dates will automatically include DST information:
(quarters <- now(tzone = "Europe/Berlin") + months(seq(0, 12, by = 4)))
#> [1] "2022-04-08 17:22:37 CEST" "2022-08-08 17:22:37 CEST"
#> [3] "2022-12-08 17:22:37 CET" "2023-04-08 17:22:37 CEST"
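Rather than reading the DST status off the time zone abbreviation, we can query it with the dst() function. The following sketch uses two fixed date-times (hypothetical examples) in a time zone that observes DST:

```r
library(lubridate)

# Noon on two fixed dates in a time zone that observes DST:
noons <- ymd_hms(c("2022-04-08 12:00:00", "2022-12-08 12:00:00"),
                 tz = "Europe/Berlin")

dst(noons)           # is DST in effect on each date?
format(noons, "%Z")  # time zone abbreviations
```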
The lubridate functions for parsing date-times also have a tz argument.
For instance, here are three specific date-time definitions (with different time zones):
(t1 <- ymd_hms("2020-12-24 23:00:00", tz = "Australia/Melbourne"))
#> [1] "2020-12-24 23:00:00 AEDT"
(t2 <- ymd_hms("2020-12-24 13:00:00", tz = "Europe/Berlin"))
#> [1] "2020-12-24 13:00:00 CET"
(t3 <- ymd_hms("2020-12-24 07:00:00", tz = "US/Eastern"))
#> [1] "2020-12-24 07:00:00 EST"
When computing their differences (as objects of the “difftime” class), we realize that t1, t2, and t3 actually denote the same point (instant or moment) in time:
t1 - t2
#> Time difference of 0 secs
t1 - t3
#> Time difference of 0 secs
However, when manipulating times (e.g., by creating a new vector), information regarding time zones is often lost (or unified, based on the initial time zone):
(t4 <- c(t1, t2, t3))
#> [1] "2020-12-24 23:00:00 AEDT" "2020-12-24 23:00:00 AEDT"
#> [3] "2020-12-24 23:00:00 AEDT"
The vector t4 also shows that t1, t2, and t3 all denote the same moment in time, a fact that was only obscured by displaying this time for different locations. However, the vector only shows this time for one particular time zone (specifically, the time zone of its first element tz(t1)). Thus, we can still wonder: How should we best express this particular time?
The need to (re-)introduce time zone information to time objects creates two distinct tasks, with corresponding solutions:
Change time zone information by keeping the actual time points the same, but changing their representation (i.e., display fixed time points for a different time zone).
Change time zone information by keeping the representation the same, but changing the actual time points (i.e., display different time points that have the same nominal appearance for a different time zone).
The with_tz() function addresses the first task: It changes time zone information (and thus changes the nominal time display) without changing the underlying point in time that is being represented:
# same ACTUAL time (but different display in different tz):
with_tz(t4, tz = "UTC")
#> [1] "2020-12-24 12:00:00 UTC" "2020-12-24 12:00:00 UTC"
#> [3] "2020-12-24 12:00:00 UTC"
This shows that the three identical times (which were merely expressed differently by t1, t2, and t3) all denote noon on 2020-Dec-24 when expressed in terms of UTC (Coordinated Universal Time).
By contrast, the force_tz() function addresses the second task: It preserves the appearance of its input times (i.e., the nominal time displayed), but changes the actual time points that are being represented:
# same NOMINAL time (but different actual time in different tz):
force_tz(t4, tz = "UTC")
#> [1] "2020-12-24 23:00:00 UTC" "2020-12-24 23:00:00 UTC"
#> [3] "2020-12-24 23:00:00 UTC"
The difference between both tasks and functions is subtle, but important:
with_tz() only changes the appearance of time points, but keeps the time points intact.
By contrast, force_tz() preserves the appearance of time, but changes the time actually represented.
When converting times into different time zones, we typically only want to change the appearance of time (a.k.a. the “sense” of time, i.e., the particular way in which fixed time points are being displayed to us), rather than the actually denoted point in time (a.k.a. the “referent” or “meaning” of the time displayed). Thus, we typically want to use with_tz(), rather than force_tz(), when converting some given times into a different time zone.
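The difference between both functions can be verified by comparing their results with the original time point. The following sketch uses a single fixed date-time t_berlin (a hypothetical name):

```r
library(lubridate)

t_berlin <- ymd_hms("2020-12-24 13:00:00", tz = "Europe/Berlin")

t_with  <- with_tz(t_berlin, "UTC")   # same instant, different display
t_force <- force_tz(t_berlin, "UTC")  # same display, different instant

t_with == t_berlin                            # still the same moment
difftime(t_force, t_berlin, units = "hours")  # shifted by the UTC offset
```

Here, force_tz() moves the time point by one hour, which corresponds to the offset between CET and UTC on that date.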
Checking for leap years
A good question to ask is: Is some specific year y a leap year?
Many people can answer this question for the current year (e.g., “Yes, the year 2020 had a February 29.”).
But what about the year 2066?
What about the year of dt_Titanic (i.e., 1912)?
And what about the year MCMLXXXIV?
The hard-core solution to this problem consists of studying the definition of a leap year and then implementing it as a command or function. The corresponding definition (see Wikipedia: leap year) reads:
…in the Gregorian calendar, each leap year has 366 days instead of 365,
by extending February to 29 days rather than the common 28.
These extra days occur in each year which is an integer multiple of 4
(except for years evenly divisible by 100, which are not leap years
unless evenly divisible by 400).
In R, we could implement this definition as follows:
y <- 2020
(y %% 4 == 0) & ((y %% 100 != 0) | (y %% 400 == 0))
#> [1] TRUE
y <- 2066
(y %% 4 == 0) & ((y %% 100 != 0) | (y %% 400 == 0))
#> [1] FALSE
However, since we have learned about time points and time spans, we can solve such tasks by using heuristics.
For instance, we could define the interval from January 1st of year y to January 1st of year y+1 and determine the number of days (as durations or periods) that fit into this interval:
jan01_2020 <- ymd("2020 01 01")
jan01_2021 <- ymd("2021 01 01")
(jan01_2020 %--% jan01_2021) / ddays(1) # durations
#> [1] 366
(jan01_2020 %--% jan01_2021) %/% days(1) # periods
#> [1] 366
If the solution is 366, the year y is a leap year; if it is 365, it is not a leap year.
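This interval-based heuristic can be wrapped into a small function that works for entire vectors of years. The following is a minimal sketch; the function days_in_year() is a hypothetical helper and not part of lubridate:

```r
library(lubridate)

# Number of days in year y (hypothetical helper), based on the interval
# from January 1st of year y to January 1st of year y + 1:
days_in_year <- function(y) {
  jan01_this <- make_date(year = y, month = 1, day = 1)
  jan01_next <- make_date(year = y + 1, month = 1, day = 1)
  (jan01_this %--% jan01_next) / ddays(1)
}

days_in_year(c(1900, 2000, 2020, 2066))
days_in_year(c(1900, 2000, 2020, 2066)) == 366  # leap years?
```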
Similar solutions can be achieved by rounding dates, measuring and comparing their duration in other time units, or by trying to define the date of February 29 of year y and checking whether this succeeds:
(feb29_2020 <- ymd("2020 02 29"))
#> [1] "2020-02-29"
(feb29_2066 <- ymd("2066 02 29"))
#> [1] NA
All these solutions should yield the same result, as long as we can rely on R’s internal date-time definitions, any functions used in our derivation, and our ability to correctly use the corresponding commands and understand their results.
A much simpler solution is finding a function that solves the task. The lubridate package actually provides a leap_year() function that gets the job done:
leap_year(2020)
#> [1] TRUE
leap_year(2066)
#> [1] FALSE
The benefits of using an existing R function are two-fold:
it saves us effort and time, and
it can be used flexibly with other features of our programming language:
# Was some given Date in a leap year?
leap_year(year(dt_Titanic))
#> [1] TRUE
# Is the current year a leap year?
leap_year(today())
#> [1] FALSE
# Is FALSE (as a number) a leap year?
leap_year(as.numeric(FALSE))
#> [1] TRUE
# What about the next 10 years?
leap_year(today() + years(0:10))
#> [1] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE
Thus, functions are a pretty big deal, which is why we will learn more about them in Chapter 11 (on writing functions).
The price of using existing functions is that we need to trust that their author(s) knew what they were doing.
In the case of the lubridate package (Spinu, Grolemund, & Wickham, 2021), it’s very likely that the authors can be trusted, as the package is well-established and has been widely used (though it has also changed quite a bit over the years).
Incidentally, the definition of the leap_year function contains a line:
(year%%4 == 0) & ((year%%100 != 0) | (year%%400 == 0))
which looks very much like our leap year definition and initial base R solution from above.
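To check that our own definition agrees with the lubridate function, we can wrap it into a function and compare both for a range of years. In the following sketch, the function name is_leap() and the range of years are hypothetical choices:

```r
library(lubridate)

# The leap year rule as a function (hypothetical name):
is_leap <- function(y) {
  (y %% 4 == 0) & ((y %% 100 != 0) | (y %% 400 == 0))
}

years_to_check <- 1800:2400
all(is_leap(years_to_check) == leap_year(years_to_check))

# Century years are the interesting cases:
is_leap(c(1900, 2000, 2100))
```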
Other functions
The lubridate package defines many other nifty functions:
- am(dt)/pm(dt): Does a date-time object dt occur am or pm?
- days_in_month(dt): Get the number of days in the month of dt
- dst(dt): Get daylight saving time indicator of dt
- format_ISO8601(dt): Format in ISO8601 character format
- rollback(dt): Roll back date to last day of previous month
- date_decimal(n): Converts a decimal number n to the corresponding date
Here are some examples of their results:
(dt <- now())
#> [1] "2022-04-08 17:22:38 CEST"
# Other functions:
am(dt)
#> [1] FALSE
pm(dt)
#> [1] TRUE
days_in_month(dt)
#> Apr
#> 30
dst(dt)
#> [1] TRUE
format_ISO8601(dt)
#> [1] "2022-04-08T17:22:38"
rollback(dt)
#> [1] "2022-03-31 17:22:38 CEST"
date_decimal(2020.531)
#> [1] "2020-07-13 08:18:14 UTC"
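As the results above depend on the current time, the following sketch repeats two of these functions for a fixed date d (a hypothetical example) that touches upon a leap year:

```r
library(lubridate)

d <- ymd("2020-03-31")

days_in_month(d)                   # March has 31 days
rollback(d)                        # last day of February 2020
rollback(d, roll_to_first = TRUE)  # first day of March 2020
days_in_month(rollback(d))         # 29, as 2020 was a leap year
```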
We conclude this section with some practice tasks that recapitulate the date and time functionality of the lubridate package.
Practice
Solve the following tasks by using lubridate functions:
- Local conventions and names of weekdays:
- Predict, evaluate, and explain the results of the following commands:
wday(sat_noon) # => 7
# Explanation: By default, the week starts on Sunday (1), so that Saturday is day 7
wday(sat_noon, label = FALSE, week_start = 7)
wday(sat_noon, label = TRUE)
# Changing week_start to Monday (1):
wday(sat_noon, label = TRUE, abbr = FALSE, week_start = 1)
# Setting locale (to get German language names):
wday(sat_noon, label = TRUE, abbr = FALSE, week_start = 1,
locale = Sys.setlocale("LC_TIME", "de_DE.UTF-8"))
# Re-setting locale (back to en_US default):
Sys.setlocale(category = "LC_ALL", locale = "")
- Full circle with date-time-dates:
We learned that the as_date() and as_datetime() functions allow us to convert between times and dates.
- Predict, evaluate, and explain the results of the following commands:
time_1 <- ymd_hms("2020-12-31 23:30:45")
date_1 <- as_date(time_1)
time_2 <- as_datetime(date_1)
time_2 == time_1
# format(time_2, explicate_time)
Answer: As time_2 is created from date_1 (i.e., a “Date” object), it lacks the time information of time_1.
- How can we repair time_2 to match time_1?
Solution
(time_2 + hms("23:30:45")) == time_1
- Durations vs. periods:
- Predict, evaluate, and explain the results of the following commands:
months(1) # period
dmonths(1) # duration
months(1) / days(1)
months(1) / ddays(1)
dmonths(1) / ddays(1)
dmonths(1) / days(1)
- Predict, evaluate, and explain the results of the following two commands:
d <- ymd("2020-01-30")
d
d + dyears(1) # => "2021-01-29" (adding a duration)
d + years(1) # => "2021-01-30" (adding a period)
Answer:
The command d <- ymd("2020-01-30") assigns d to a particular date (i.e., an instant in time).
To this, we add a time span (of 1 year) in two different ways:
+ years(1) adds the period of 1 year (in human units), yielding the same date a year later.
By contrast, + dyears(1) adds the duration of 1 year (as an exact number of seconds).
As 2020 is a leap year (i.e., containing a date of “2020-02-29” and a total number of 366 days), the two additions yield different results.
Thus, when dealing with common-sense units of time, adding periods typically yields intuitively more plausible results.
Note also that the time zone defaults to UTC (Coordinated Universal Time).
- Explain the different results of the following two commands:
sat_noon <- ymd_hms("2020-03-28 12:00:00", tz = "Europe/Berlin")
# format(sat_noon, explicate_time)
# Adding time spans:
(sat_noon + ddays(1))
#> [1] "2020-03-29 13:00:00 CEST"
(sat_noon + days(1))
#> [1] "2020-03-29 12:00:00 CEST"
Answer:
sat_noon is assigned to a particular date-time point (i.e., instant or moment) in time: Sat, 2020-03-28 12:00:00 CET (+0100 from UTC).
The tz specification ensures that the time zone is set to CET (i.e., corresponds to a specific location).
A difference between adding a duration of ddays(1) and adding a period of days(1) implies that a time shift has occurred.
In this case, Germany switched to daylight saving time (DST) on “2020-03-29”:
At 2am, the clocks were set forward by 1 hour. Thus, adding the duration of 1 day (as in + ddays(1)) yields a later time than adding the period of 1 day (as in + days(1)). Again, adding periods yields more predictable results.
- Durations, periods, and intervals:
- Predict, evaluate, and explain the different results of the following two commands:
# interval ends:
int_end(next_6m_dur)
#> [1] "2022-10-08 08:22:35 CEST"
int_end(next_6m_per)
#> [1] "2022-10-08 17:22:35 CEST"
- Evaluate and explain the result of the following expression in terms of its notions of time and in common-sense terms:
int_start(int_shift(day_before_yesterday, by = days(2))) + hours(12) - dseconds(30)
Answer: We can re-construct the answer in 4 steps:
- The day_before_yesterday was defined as an interval (see above).
- Shifting this interval by a period of 2 days yields the interval of today (from 00:00:00 to 24:00:00).
- Adding a period of 12 hours to the start of today marks a specific date-time point: noon today.
- Subtracting a duration of 30 seconds yields a date-time point precisely 30 seconds before noon today.
Note that this example involves four different notions of time: Date-time points (i.e., instants, moments, or “POSIXct” objects) and three different types of time spans.
- Predict, evaluate, and explain the results of the following expressions:
one_year_ago <- today() - years(1)
# Dividing time spans:
dyears(1) / ddays(365) # durations
years(1) / days(1) # periods
# Dividing intervals:
(one_year_ago %--% today()) / ddays(1) # durations
(one_year_ago %--% today()) %/% days(1) # periods
Hint: These examples are inspired by Section 16.4.3 Intervals (Wickham & Grolemund, 2017), which also provides a short explanation. However, note that some definitions seem to have changed.
- Leap years in Roman numerals:
In Section 10.3.4 above, we left the leap year question regarding MCMLXXXIV unresolved:
- Was the year MCMLXXXIV (represented in Roman numerals) a leap year?
Answer:
A bit of experimentation with the R utils function as.roman() reveals that the character sequence “MCMLXXXIV” represents the calendar year 1984 in Roman numerals. As it turns out, this happens to be a leap year:
(y_Roman <- utils::as.roman(1984))
#> [1] MCMLXXXIV
(y_num <- as.numeric(y_Roman))
#> [1] 1984
leap_year(y_num)
#> [1] TRUE
References
The advantage of this approach is that we start with a set of date-times dt that we later want to re-create from its components (i.e., by using the make_date() and make_datetime() functions).
In Section 16.4.5 Exercises of r4ds, the first question asks: “Why is there months() but no dmonths()?” Thus, I can only guess that the function dmonths() was absent from earlier versions of lubridate, but then was added later. And since the value of dyears(1) suffers from the same problem (as leap years are a day longer than non-leap years), it seems ok to provide an average for estimation purposes.