## 10.3 Dates and times with lubridate

The previous section has shown that **base** R provides basic support for handling dates and times, but as the corresponding classes and functions can be confusing, this remains challenging. To facilitate working with dates and times, the **lubridate** package (Spinu et al., 2020) provides a more coherent and user-friendly framework. This section illustrates key **lubridate** commands and concepts.

As we only need **lubridate** in those sessions in which we are dealing with dates and times, the package is *not* part of the core tidyverse. Hence, we need to load it in addition to the core tidyverse packages when we want to use it:

```
library(tidyverse) # loads core tidyverse packages
library(lubridate) # loads this particular package
```

Just as **base** R provided two separate functions for obtaining the current date and time (specifically, `Sys.Date()` and `Sys.time()`), the **lubridate** package provides two corresponding functions:

```
today() # today's date
#> [1] "2020-07-07"
now() # current date-time
#> [1] "2020-07-07 13:02:15 CEST"
```

To learn about the internal representation of both objects, we can inspect their class in R:

```
class(today()) # a "Date" object
#> [1] "Date"
class(now()) # a "POSIXct" object
#> [1] "POSIXct" "POSIXt"
```

We see that the **lubridate** package uses the two key classes discussed in Section 10.2.2:

- `today()` returns the current date (as a “Date” object).

- `now()` returns the current calendar time (as a date-time, i.e., “POSIXct” object).

Both the `today()` and `now()` functions also accept a `tzone` argument for specifying a time zone.
To see what time zones are used by default, we can apply the `tz()` function to the results of both functions:

```
tz(today()) # uses "UTC" (Coordinated Universal Time) as default
#> [1] "UTC"
tz(now()) # uses local system setting as default
#> [1] ""
Sys.timezone() # show local system setting
#> [1] "Europe/Berlin"
```

Thus, both `today()` and `now()` are convenient shortcuts, but should be handled with care when dealing with different time zones.
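As a brief sketch of the `tzone` argument mentioned above, the following calls request the current date and time for explicitly named time zones. The zone names are merely illustrative choices, and most outputs are omitted because they depend on when the code is run:

```r
library(lubridate)

# Current date-time, expressed in two explicitly named time zones:
now_utc   <- now(tzone = "UTC")
now_tokyo <- now(tzone = "Asia/Tokyo")

tz(now_utc)    # "UTC"
tz(now_tokyo)  # "Asia/Tokyo"

# Current date in a given time zone
# (may differ from the local date around midnight):
today_tokyo <- today(tzone = "Asia/Tokyo")
today_tokyo
```

Both date-time objects denote (almost) the same instant, but they are printed as different clock times.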

### 10.3.1 Parsing dates and times

When learning how to read and retrieve (elements of) dates and times in **base** R, we spent a lot of time and effort on conversion specifications (e.g., strings like `"%Y-%m-%d"` or `"%H:%M:%S"`) that specified parsing and formatting instructions in the so-called POSIX standard (see Section 10.2.2). Although it is good to know POSIX, as it is widely used and powerful, it can also seem a bit cumbersome and clumsy. Thus, **lubridate** adopts a more intuitive approach to parsing dates and times.

To create new dates or times, **lubridate** provides functions that parse (i.e., read or scan) them from various other objects.
Specifically, we can read dates or times

- from character strings (representing dates or times);

- from variables (denoting date or time components);

- from other types (i.e., date or time objects).

The next sections introduce the **lubridate** functions for each of these object types.

#### 1. Read from character strings

In Chapter 6 on Importing data, we encountered some **readr** functions that parse character vectors into dates or times (see the `parse_date()`, `parse_datetime()` and `parse_time()` functions in Section 6.2.1).

The **lubridate** package provides even simpler tools for reading in dates and times.
The function names are combinations of the initial letters of basic date and time components:

- date components are: `y` year, `m` month, `d` day

- time components are: `h` hour, `m` minute, `s` seconds

The order of these components in the function name determines how the arguments (provided as strings) are interpreted. Here are some examples:

- Dates from strings: Without any further specification, a date-denoting string like `"02 04 06"` would be highly ambiguous (see **Table 10.1** of Section 10.2.2). To read this string into a date, **lubridate** allows us to use a combination of `d`, `m`, and `y` to indicate which date-related element each numeric component describes:

```
ymd("06 02 04")
#> [1] "2006-02-04"
ymd("20060204") # works as well
#> [1] "2006-02-04"
ymd("2006 2 4")
#> [1] "2006-02-04"
ymd("2006 : 2///04")
#> [1] "2006-02-04"
ymd(060204) # works with numbers
#> [1] "2006-02-04"
dmy("4 2 06")
#> [1] "2006-02-04"
dmy("04 02 2006")
#> [1] "2006-02-04"
mdy("02 04 06")
#> [1] "2006-02-04"
mdy(020406) # works with numbers
#> [1] "2006-02-04"
# Even "abysmal" orders are possible:
myd("06 02 04")
#> [1] "2002-06-04"
dym("06 02 04")
#> [1] "2002-04-06"
# Resulting representation:
class(ymd(Sys.Date())) # a "Date" object
#> [1] "Date"
```

- Dates with times (i.e., date-times) from strings:

```
ymd_hms("2020-04-01 10:30:13")
#> [1] "2020-04-01 10:30:13 UTC"
ymd_hm("2020/04/01 10.30")
#> [1] "2020-04-01 10:30:00 UTC"
# Resulting representation:
class(ymd_hms("2020-04-01 10:30:13")) # a "POSIXct" object
#> [1] "POSIXct" "POSIXt"
class(ymd_hms(Sys.time()))
#> [1] "POSIXct" "POSIXt"
```

- Times from strings:

```
hms("15:16:17")
#> [1] "15H 16M 17S"
hms("10 11 12")
#> [1] "10H 11M 12S"
hm("13 14")
#> [1] "13H 14M 0S"
# Resulting representation:
class(hms("15:16:17")) # a lubridate "period" object
#> [1] "Period"
#> attr(,"package")
#> [1] "lubridate"
```
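The parsing functions above also work on entire vectors, which is how they are typically used on data columns. The following sketch uses made-up input strings to illustrate two points: The separators within a vector may differ, and strings that do not denote a valid date are turned into `NA` (with a warning, which is suppressed here):

```r
library(lubridate)

# Parse a vector of date strings (with different separators) at once:
dates <- dmy(c("29/02/2020", "24.12.2020", "01-01-2021"))
dates

# Invalid dates cannot be parsed and become NA:
bad <- suppressWarnings(ymd(c("2020-02-29", "2021-02-29")))
is.na(bad)  # FALSE TRUE (2021 is not a leap year)
```

Checking for `NA` values after parsing is a cheap way to detect malformed or impossible dates in raw data.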

Note that the particular representation of date and time objects (e.g., as the columns of a tibble) varies with the functions that created the corresponding variable (column):

```
dt <- c("10:05 29/02/2020", "20:10 24/12/2020")
tb <- tibble::tibble(datetime = dt)
tb
#> # A tibble: 2 x 1
#> datetime
#> <chr>
#> 1 10:05 29/02/2020
#> 2 20:10 24/12/2020
tb2 <- tb %>%
  separate(datetime, into = c("time", "day"), sep = " ", remove = FALSE) %>%
  separate(time, into = c("hour", "min"), sep = ":", remove = FALSE) %>%
  mutate(dt_1 = paste(day, time),  # a character string
         dt_2 = dmy_hm(dt_1),      # a <dttm> object
         ti_2 = hm(time)           # a <period> object
  )
tb2
#> # A tibble: 2 x 8
#> datetime time hour min day dt_1 dt_2 ti_2
#> <chr> <chr> <chr> <chr> <chr> <chr> <dttm> <Period>
#> 1 10:05 29/0… 10:05 10 05 29/02/… 29/02/20… 2020-02-29 10:05:00 10H 5M 0S
#> 2 20:10 24/1… 20:10 20 10 24/12/… 24/12/20… 2020-12-24 20:10:00 20H 10M 0S
```

#### 2. Read from date and time variables

Many datasets already contain variables that denote date components (i.e., values for years, months, weeks, or days) or time components (i.e., values for hours, minutes, or seconds).
Given our skills in dealing with Text data (from Chapter 9) we *could* first paste these variables into a character string and then parse this string into a date or time variable.
However, the **lubridate** package also provides more direct functions for converting such variables into dates or calendar times:

`make_date()` expects inputs to `year`, `month`, and `day` arguments to create an object of the “Date” class:

```
make_date(year = 2020, month = 7, day = 13) # as numbers
#> [1] "2020-07-13"
make_date(year = "2020", month = "7", day = "13") # as characters
#> [1] "2020-07-13"
make_date(year = 2020, month = "007", day = "013") # mixed
#> [1] "2020-07-13"
# Note defaults for missing elements:
make_date(year = 2020, month = 7) # day = 1
#> [1] "2020-07-01"
make_date(month = 2, day = 13) # year = 1970
#> [1] "1970-02-13"
make_date(year = 2020, day = 13) # month = 1
#> [1] "2020-01-13"
# Representation as "Date":
is.Date(make_date(year = 2020))
#> [1] TRUE
is.POSIXct(make_date(year = 2020))
#> [1] FALSE
```

Note that `make_date()` accepts a variety of input types and fills in default values for missing elements.

`make_datetime()` expects the same inputs as `make_date()` (i.e., `year`, `month`, and `day` arguments), plus additional inputs to its time-related arguments (`hour`, `min`, `sec`, and a time zone `tz`, which defaults to UTC) to create a calendar time (i.e., “POSIXct”) object:

```
make_datetime(year = 2020, month = 7, day = 13,
hour = 10, min = 30, sec = 45, tz = "Europe/Zurich")
#> [1] "2020-07-13 10:30:45 CEST"
## Note: Inputs need to be numeric
# make_datetime(year = "2020", month = "7", day = "13",
# hour = "10", min = "30", sec = "45")
# Note defaults for missing elements:
make_datetime(year = 2020)
#> [1] "2020-01-01 UTC"
make_datetime(sec = 33)
#> [1] "1970-01-01 00:00:33 UTC"
# Representation as "POSIXct":
is.Date(make_datetime(year = 2020))
#> [1] FALSE
is.POSIXct(make_datetime(year = 2020))
#> [1] TRUE
```

Note that `make_datetime()` is less flexible than `make_date()` in expecting that all its arguments are numeric.
The **lubridate** package additionally includes a `make_difftime()` function for creating `difftime` objects in various units of time (see the section on *durations* below).
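As a minimal sketch of `make_difftime()`, the following calls create two time differences: one from a number of seconds, and one from a named time unit. The object names are chosen for illustration only:

```r
library(lubridate)

one_hour <- make_difftime(3600)     # from a number of seconds
one_day  <- make_difftime(day = 1)  # from a named time unit

class(one_hour)                       # "difftime"
as.numeric(one_hour, units = "secs")  # 3600
as.numeric(one_day,  units = "secs")  # 86400
```

As the results are ordinary `difftime` objects, they can be used with the **base** R functions discussed in Section 10.2.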

To demonstrate these functions for creating dates or times from variables, we need some data that contains date and time variables.
Lacking such a dataset, we can create one. The following code snippet does this by working backwards:
We first use the `sample_time()` function from **ds4psy** (to draw random samples of calendar times within a specific range of time) and then use a **dplyr** pipe to extract its date- and time-related components.^{46}

```
# Sample N calendar dates within the current year:
N <- 25
t_end <- ceiling_date(now(), "year") # ymd_hms("2020-12-31 23:59:59")
t_start <- t_end - years(1)
set.seed(246) # for reproducible randomness
dt_org <- sort(ds4psy::sample_time(n = N, from = t_start, to = t_end))
# Tibble with date and time components (as columns):
dt_tb <- tibble::tibble(dt_org) %>%
  mutate(yr = year(dt_org),
         mt = month(dt_org),
         dy = day(dt_org),
         hr = hour(dt_org),
         mi = minute(dt_org),
         sc = second(dt_org)
  ) %>%
  select(-dt_org)
```

Just copy and run this code chunk and note that the resulting tibble `dt_tb` consists of numeric columns that hold date and time components:

| yr | mt | dy | hr | mi | sc |
|---|---|---|---|---|---|
| 2020 | 1 | 1 | 5 | 22 | 30 |
| 2020 | 2 | 10 | 3 | 15 | 39 |
| 2020 | 3 | 7 | 8 | 17 | 17 |
| 2020 | 3 | 31 | 19 | 5 | 35 |
| 2020 | 6 | 4 | 6 | 35 | 19 |
| 2020 | 6 | 4 | 22 | 25 | 1 |

Given this data, we can use `make_date()` for creating dates, or `make_datetime()` for creating calendar times:

```
dt_tb <- dt_tb %>%
  mutate(date = make_date(year = yr, month = mt, day = dy),
         datetime = make_datetime(year = yr, month = mt, day = dy,
                                  hour = hr, min = mi, sec = sc)
  )
head(dt_tb)
#> # A tibble: 6 x 8
#> yr mt dy hr mi sc date datetime
#> <dbl> <dbl> <int> <int> <int> <dbl> <date> <dttm>
#> 1 2020 1 1 5 22 30 2020-01-01 2020-01-01 05:22:30
#> 2 2020 2 10 3 15 39 2020-02-10 2020-02-10 03:15:39
#> 3 2020 3 7 8 17 17 2020-03-07 2020-03-07 08:17:17
#> 4 2020 3 31 19 5 35 2020-03-31 2020-03-31 19:05:35
#> 5 2020 6 4 6 35 19 2020-06-04 2020-06-04 06:35:19
#> 6 2020 6 4 22 25 1 2020-06-04 2020-06-04 22:25:01
```

To evaluate our success, we can verify whether the `datetime` column in `dt_tb` managed to re-construct the original vector `dt_org` created above:
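A check of this kind could look like the following self-contained sketch. Rather than the sampled vector `dt_org`, it uses a single stand-in time (`org`, a hypothetical value), extracts its components, and re-assembles them with `make_datetime()`:

```r
library(lubridate)

# Hypothetical stand-in for one element of dt_org (a local time in Berlin):
org <- ymd_hms("2020-06-04 06:35:19", tz = "Europe/Berlin")

# Re-assemble the date-time from its components (using all defaults):
rebuilt <- make_datetime(year = year(org), month = month(org), day = day(org),
                         hour = hour(org), min = minute(org), sec = second(org))

rebuilt == org  # FALSE: both objects denote different points in time
```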

Oops, this is awkward: Why do we not get back the original times `dt_org` that we fed into the table?
A first hypothesis could be that calendar times (of the “POSIXct” class) are numeric objects and hence some differences may be due to rounding. We can check this by using the `round_date()` function to round both our original vector (i.e., `dt_org`) and the newly created one (`dt_tb$datetime`) to the same units (e.g., “sec”):

```
# Round to nearest second:
all.equal(round_date(dt_org, unit = "sec"), round_date(dt_tb$datetime, unit = "sec"))
#> [1] "Mean absolute difference: 5472"
```

So rounding did not diminish the difference, and the discrepancies observed here are far too large to be due to rounding differences.
A clue to solving this puzzle is provided by computing the time difference between our newly created times `dt_tb$datetime` and the original times `dt_org`:

```
# Inspect differences:
head(unclass(dt_tb$datetime)) - head(unclass(as.POSIXct(dt_org)))
#> [1] 3600 3600 3600 7200 7200 7200
```

This shows that our new calendar times `dt_tb$datetime` are either 1 or 2 hours (i.e., 3600 seconds or 7200 seconds) later than our original times `dt_org`. To detect the source of this difference, let’s look more closely at both vectors again:

```
# Inspect details:
head(dt_org)
#> [1] "2020-01-01 05:22:30 CET" "2020-02-10 03:15:39 CET"
#> [3] "2020-03-07 08:17:17 CET" "2020-03-31 19:05:35 CEST"
#> [5] "2020-06-04 06:35:19 CEST" "2020-06-04 22:25:01 CEST"
head(dt_tb$datetime)
#> [1] "2020-01-01 05:22:30 UTC" "2020-02-10 03:15:39 UTC"
#> [3] "2020-03-07 08:17:17 UTC" "2020-03-31 19:05:35 UTC"
#> [5] "2020-06-04 06:35:19 UTC" "2020-06-04 22:25:01 UTC"
```

We can see that both vectors show the same dates and times, but for different *time zones*.
Specifically, `make_datetime()` used “UTC” by default, whereas our original vector `dt_org` automatically used the current setting of our local system (here: “Europe/Berlin”).

Thus, to prevent such problems, we need to be explicit about the appropriate time zone when calling the `make_datetime()` function:

```
dt_tb <- dt_tb %>%
  mutate(date = make_date(year = yr, month = mt, day = dy),
         datetime = make_datetime(year = yr, month = mt, day = dy,
                                  hour = hr, min = mi, sec = sc,
                                  tz = "")  # tz = "" uses Sys.timezone()
  )
head(dt_tb)
#> # A tibble: 6 x 8
#> yr mt dy hr mi sc date datetime
#> <dbl> <dbl> <int> <int> <int> <dbl> <date> <dttm>
#> 1 2020 1 1 5 22 30 2020-01-01 2020-01-01 05:22:30
#> 2 2020 2 10 3 15 39 2020-02-10 2020-02-10 03:15:39
#> 3 2020 3 7 8 17 17 2020-03-07 2020-03-07 08:17:17
#> 4 2020 3 31 19 5 35 2020-03-31 2020-03-31 19:05:35
#> 5 2020 6 4 6 35 19 2020-06-04 2020-06-04 06:35:19
#> 6 2020 6 4 22 25 1 2020-06-04 2020-06-04 22:25:01
```

Note that setting `tz = ""` is a shortcut for using our local system time zone in `make_datetime()`, rather than its “UTC” default. We could have been even more explicit by stating `tz = Sys.timezone()` or `tz = "Europe/Berlin"`.

As both vectors now used the same time zone (i.e., Europe/Berlin), they should contain the same date-time points:
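The following sketch illustrates such a comparison for a single stand-in time (`org`, a hypothetical value rather than the sampled `dt_org` vector). Passing the original time zone to `make_datetime()` recovers the same point in time:

```r
library(lubridate)

# Hypothetical stand-in for one element of dt_org (a local time in Berlin):
org <- ymd_hms("2020-06-04 06:35:19", tz = "Europe/Berlin")

# Re-assemble the date-time, but pass the original time zone explicitly:
rebuilt <- make_datetime(year = year(org), month = month(org), day = day(org),
                         hour = hour(org), min = minute(org), sec = second(org),
                         tz = tz(org))

rebuilt == org          # TRUE: same point in time
tz(rebuilt) == tz(org)  # TRUE: same time zone
```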

This is reassuring — and good that we compared our result to the original date-time vector. The important lesson to learn here is:

- Always watch out for time zones when working with times.

We will reconsider this issue in **Exercise 2** (see Section 10.6.2).

#### 3. Read from dates or times

Given that R distinguishes between dates (e.g., of class “Date”, see Section 10.2.3) and dates with times (e.g., calendar times of class “POSIXct”, see Section 10.2.4), it is often necessary to switch between these formats. Let’s first re-create a date and a date-time object to work with:

```
tday <- today() # a date
is.Date(tday)
#> [1] TRUE
tnow <- now() # a date-time/calendar time
is.POSIXct(tnow)
#> [1] TRUE
```

The **lubridate** functions `as_date()` and `as_datetime()` facilitate such conversions:

- `as_date()` converts date-times (i.e., calendar times) into dates.

- `as_datetime()` converts dates (of class “Date”) into date-times (if possible).

Converting date-times into dates is straightforward, as it merely drops the time-related information:

```
# 1. Get dates from times:
tday_2 <- as_date(tnow)
tday_2
#> [1] "2020-07-07"
is.Date(tday_2)
#> [1] TRUE
```

Converting dates into date-times is trickier, as dates lack information about times:

```
# 2. Get times from dates:
tnow_2 <- as_datetime(tday)
tnow_2
#> [1] "2020-07-07 UTC"
is.POSIXct(tnow_2)
#> [1] TRUE
feb_29 <- as_datetime(dmy("29-02-2020"))
feb_29
#> [1] "2020-02-29 UTC"
is.POSIXct(feb_29)
#> [1] TRUE
```

We see that `tnow_2` and `feb_29` are calendar times (i.e., dates with times of class “POSIXct”), but still seem to lack time information.
The time information is there, however: The time of day was set to the default of “00:00:00 UTC”.
This becomes apparent when explicating the time object by supplying a more detailed `format` argument:

```
# explicate_time <- "%a, %Y-%m-%d %H:%M:%S %Z (%z from UTC)" # (defined above)
format(tnow_2, format = explicate_time)
#> [1] "Tue, 2020-07-07 00:00:00 UTC (+0000 from UTC)"
format(feb_29, format = explicate_time)
#> [1] "Sat, 2020-02-29 00:00:00 UTC (+0000 from UTC)"
```

Again, we see: Always watch out for time zones when working with times.

When the `as_date()` and `as_datetime()` functions receive *numeric* inputs, they add a corresponding number of increments to the Unix epoch at “1970-01-01 00:00:00 UTC” (see Wikipedia: Unix_time for details).
Note that date increments are interpreted as *days*, whereas time increments are interpreted as *seconds*:

```
# Dates from numeric inputs:
as_date(0) # Unix epoch
#> [1] "1970-01-01"
as_date(1) # increment: +1 day
#> [1] "1970-01-02"
as_date(365) # +1 year
#> [1] "1971-01-01"
# Times from numeric inputs:
as_datetime(0) # Unix epoch
#> [1] "1970-01-01 UTC"
as_datetime(1) # increment: +1 second
#> [1] "1970-01-01 00:00:01 UTC"
as_datetime(5 * 60 * 60) # +5 hours
#> [1] "1970-01-01 05:00:00 UTC"
as_datetime(24 * 60 * 60) # +1 day
#> [1] "1970-01-02 UTC"
as_datetime(365 * 24 * 60 * 60) # +1 year
#> [1] "1971-01-01 UTC"
```

### 10.3.2 Get and set date and time components

Having succeeded in *creating* date-time objects (from strings, other variables, or a date), we can ask additional questions:

- How can we *get* or *set* individual date and time components (of date-time objects)?

To illustrate this, we can use our `tnow` scalar, which is an object of the “POSIXct” class.

Actually, we have already encountered a pretty nifty way of retrieving individual date and time components (see the conversion functions of the POSIX standard in Section 10.2.4 above, or evaluate `?strptime`):

```
# Using format (see POSIX above):
format(tnow, "%A")
#> [1] "Tuesday"
format(tnow, "%F")
#> [1] "2020-07-07"
format(tnow, "%T %Z")
#> [1] "13:02:16 CEST"
# explicate_time <- "%a, %Y-%m-%d %H:%M:%S %Z (%z from UTC)" # (defined above)
format(tnow, format = explicate_time)
#> [1] "Tue, 2020-07-07 13:02:16 CEST (+0200 from UTC)"
```

However, using the `format()` function with a `format` argument according to the POSIX standard is pretty geeky.
If **lubridate** lives up to its name, its functions should flow a bit more fluently.

#### Getting date components

In fact, the following **lubridate** functions are pretty self-explanatory in retrieving individual date and time components from a date-time object (of the “POSIXct” or “POSIXlt” class):

```
# Get date components (from POSIX times):
year(tnow) # full year (numeric)
#> [1] 2020
month(tnow) # month in year (numeric)
#> [1] 7
mday(tnow) # day in month (numeric)
#> [1] 7
# Noteworthy date components:
yday(tnow) # day of year (numeric). See format(tnow, "%j")
#> [1] 189
wday(tnow) # day of week (numeric, week_start = 7 by default), see format(tnow, "%w")
#> [1] 3
```

For date components with names, using the `label` and `abbr` arguments yields names rather than numbers as outputs (returned as ordered factors, as the levels shown below indicate):

```
# Get names instead of numbers:
month(tnow, label = TRUE, abbr = TRUE) # month in year (name)
#> [1] Jul
#> 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
wday(tnow, label = TRUE, abbr = FALSE) # day of week (name)
#> [1] Tuesday
#> 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
```

It is noteworthy and re-assuring that these date-related functions also work for dates (of class “Date”):
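A minimal sketch of this, using a fixed date (`a_date`, a name chosen for illustration) rather than `today()`, so that the results are reproducible:

```r
library(lubridate)

a_date <- ymd("2020-07-07")  # a "Date" object (a Tuesday)

year(a_date)   # 2020
month(a_date)  # 7
mday(a_date)   # 7
yday(a_date)   # 189
wday(a_date, week_start = 7)  # 3 (with weeks starting on Sunday)
```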

#### Getting time components

Getting time-related components from date-time objects (i.e., “POSIXct” times) is just as easy:

```
# Get time components (from "POSIXt" objects):
tnow # a POSIXct calendar time
#> [1] "2020-07-07 13:02:16 CEST"
hour(tnow)
#> [1] 13
minute(tnow)
#> [1] 2
second(tnow)
#> [1] 16.07128
tz(tnow) # time zone
#> [1] ""
```

Note the following defaults when asking for time components of dates:
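As a sketch of these defaults, we can ask for the time components of a fixed date (`a_date`, a name chosen for illustration). As dates contain no time information, all time components are zero and the time zone is reported as “UTC”:

```r
library(lubridate)

a_date <- ymd("2020-07-07")  # a "Date" object without time information

hour(a_date)    # 0
minute(a_date)  # 0
second(a_date)  # 0
tz(a_date)      # "UTC"
```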

#### Setting date and time components

Interestingly, the same **lubridate** functions that *get* date and time components can also be used to *set* those components:

```
# Copy:
new_tnow <- tnow
new_tnow # check
#> [1] "2020-07-07 13:02:16 CEST"
# SET date components:
year(new_tnow) <- 2022
month(new_tnow) <- 12
day(new_tnow) <- 31
new_tnow # check
#> [1] "2022-12-31 13:02:16 CET"
# SET time components:
hour(new_tnow) <- 23
minute(new_tnow) <- 45
second(new_tnow) <- 59
tz(new_tnow) <- "Australia/Sydney"
new_tnow # check
#> [1] "2022-12-31 23:45:59 AEDT"
```

What if we re-set a date component that depends on the date?

```
new_tnow
#> [1] "2022-12-31 23:45:59 AEDT"
format(new_tnow, explicate_time)
#> [1] "Sat, 2022-12-31 23:45:59 AEDT (+1100 from UTC)"
# day of week:
wday(new_tnow)
#> [1] 7
wday(new_tnow) <- 4 # CHANGE the weekday (4 is "Wed")
new_tnow # Note that date has shifted to accommodate wday
#> [1] "2022-12-28 23:45:59 AEDT"
wday(new_tnow, label = TRUE)
#> [1] Wed
#> Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
# day in year:
yday(new_tnow)
#> [1] 362
yday(new_tnow) <- 366 # CHANGE the yday (note value)
new_tnow # Note that date has shifted (to the next year!) to accommodate yday
#> [1] "2023-01-01 23:45:59 AEDT"
yday(new_tnow)
#> [1] 1
```

Thus, we can use the same functions that get information from dates and date-times to set their elements. However, beware that setting date-time components can have unintended consequences. As date-time components are not independent of each other, setting some components typically affects other components.

### 10.3.3 Working with time spans

In Section 10.2.1, we distinguished between *time points* (often called *instants* or *moments*)
and *time spans* (aka. *durations*, *intervals*, or *time periods*).
However, different usages of these time span terms actually imply different concepts.
To enable accurate computations, we need to distinguish more carefully between the different types of time spans.

In fact, **lubridate** implements its own ontology of time spans.

Beyond *time points* (i.e., a particular instant or moment in time), the package distinguishes between 3 types of *time spans*:

- *durations* are time spans in exact numbers of seconds

- *periods* are time spans in human units (e.g., days, months, years)

- *intervals* are time spans with a given start and end point in time

These different time span concepts were inspired by the Joda Time project (Colebourne & O’Neill, 2010) (see the original article by Grolemund & Wickham (2011) for background information).

Essentially, both *durations* and *intervals* express physical time spans (a specific number of seconds), whereas *periods* express time spans in human units that may vary based on context (e.g., not every day has the same number of hours and not every month or year has the same number of days). *Intervals* are *durations* that are anchored in calendar time (i.e., intervals have start and end points that are real date-times).
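To make this distinction more tangible, the following sketch defines an interval for one particular calendar day: the day on which daylight saving time started in Germany in 2020. The object names are chosen for illustration. In human units, the interval spans one day, but physically it only lasts 23 hours:

```r
library(lubridate)

# The day on which daylight saving time started in Germany in 2020:
day_start <- ymd("2020-03-29", tz = "Europe/Berlin")
day_end   <- ymd("2020-03-30", tz = "Europe/Berlin")
dst_day   <- interval(day_start, day_end)

as.duration(dst_day)  # physical time: 82800 seconds (i.e., 23 hours)
as.period(dst_day)    # human units: 1 day
```

The functions used here are introduced in more detail in the following sections.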

We will consider each type of time span to see how they are created and find out what we can do with them. To motivate our explorations, consider the following example:

On Tuesday, September 11, 2001, the terrorist group al-Qaeda attacked several targets in the United States in a coordinated fashion.
At 08:46 a.m., five hijackers crashed an American Airlines plane into the northern facade of the World Trade Center in New York City.
Many remember the vivid images of this particular event (a so-called *dread risk* event), even though it happened many years ago.
This raises the question:

*How long ago* did the 911-attacks take place?

Please take a moment (or rather: some time span) to think about potential answers to this question: What would you accept as an informative answer? How does this answer depend on when or where the question is being asked? What kind of accuracy would you expect? And which temporal unit(s) would an answer be expressed in?

#### 1. Durations

As a first approach for answering the question “How long ago did the 911-attacks take place?”, we can enter the particular time point of this event and subtract it from `now()` to compute a time difference object in R:

```
# Define 2 date-times (calendar times):
t_911 <- ymd_hm("2001 September 11, 8:46", tz = "US/Eastern")
t_now <- now() # re-set current time
# Time difference:
td <- t_now - t_911
td
#> Time difference of 6873.928 days
```

The time difference `td` represents a duration as an R object of class “difftime”, which is automatically displayed as a count of “days”.
The corresponding `difftime()` function (see Section 10.2.4) offers a range of `units` varying from “secs” to “weeks”, which are all rather limited in this case:

```
# Representation:
class(td)
#> [1] "difftime"
is.difftime(td)
#> [1] TRUE
is.numeric(td)
#> [1] FALSE
# difftime units:
difftime(t_now, t_911, units = "weeks")
#> Time difference of 981.9897 weeks
difftime(t_now, t_911, units = "auto")
#> Time difference of 6873.928 days
difftime(t_now, t_911, units = "secs")
#> Time difference of 593907377 secs
```

For time spans exceeding a few months, the duration class provided by **lubridate** is a better alternative:
The **lubridate** notion of *duration* measures time spans as the number of elapsed seconds.

Durations are internally defined as a special class of object and record time spans in numeric form (as numbers of seconds):

```
# Define a duration d (from the time difference above):
d <- as.duration(t_now - t_911)
# Representation:
class(d)
#> [1] "Duration"
#> attr(,"package")
#> [1] "lubridate"
is.duration(d)
#> [1] TRUE
is.numeric(d)
#> [1] TRUE
unclass(d)
#> [1] 593907377
# Note:
unclass(d) == as.numeric(difftime(t_now, t_911, units = "secs"))
#> [1] TRUE
```

There are several constructor functions (all starting with `d`) that facilitate defining durations:

```
# Creating durations:
dseconds(10)
#> [1] "10s"
dminutes(30)
#> [1] "1800s (~30 minutes)"
dhours(1)
#> [1] "3600s (~1 hours)"
ddays(2)
#> [1] "172800s (~2 days)"
dweeks(3)
#> [1] "1814400s (~3 weeks)"
dmonths(1) # ?: Does this make sense?
#> [1] "2629800s (~4.35 weeks)"
dyears(2)
#> [1] "63115200s (~2 years)"
```

Note that all these definitions internally create “Duration” objects that denote numbers of elapsed seconds, but are printed in a more human-readable fashion.
The `dmonths()` function is flagged (with `?`) as its underlying notion is a bit tricky. We just learned that *durations* are defined as time spans measuring an exact number of seconds, but how many seconds *are* there in a month? The answer clearly depends on the month in question (e.g., July is longer than June, and both are longer than February) and can only be determined when the particular month is known.^{47} Thus, `dmonths(1)` (evaluating to 2629800) can only be an estimate and should be handled with care in practical applications.
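The following sketch contrasts this estimate with the exact lengths of two specific months. It anchors both months in calendar time by defining them as intervals (see below) and expresses all results in days. The object names are chosen for illustration, and the value of `dmonths(1)` assumes the **lubridate** version used in this section:

```r
library(lubridate)

# Exact lengths of two specific months (anchored in calendar time):
feb_2020 <- interval(ymd("2020-02-01"), ymd("2020-03-01"))
jul_2020 <- interval(ymd("2020-07-01"), ymd("2020-08-01"))

as.duration(feb_2020) / ddays(1)  # 29 days
as.duration(jul_2020) / ddays(1)  # 31 days

# The generic estimate (one twelfth of an average year of 365.25 days):
dmonths(1) / ddays(1)  # 30.4375 days
```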

As they are numbers, durations can be used in arithmetic expressions:

```
# Math with durations:
dmonths(2) + dweeks(3) - ddays(4)
#> [1] "6728400s (~11.12 weeks)"
(dyears(1) + dhours(6))/dmonths(6)
#> [1] 2.001369
dyears(2) == 4 * dmonths(6)
#> [1] TRUE
dyears(2) == 2 * 365.25 * ddays(1)
#> [1] TRUE
```

However, we need to keep in mind that they represent abstract time spans (in numbers of seconds). Thus, adding durations to date-time objects (i.e., calendar times of the “POSIXct” class) can yield unexpected results:

```
# Apparent oddities with durations:
t1 <- ymd_hms("2020-03-29 00:00:00", tz = "Europe/Berlin")
t1 + dhours(10) # add 10 hours (as duration)
#> [1] "2020-03-29 11:00:00 CEST"
t1 + ddays(1) # add 1 day (as duration)
#> [1] "2020-03-30 01:00:00 CEST"
t2 <- ymd("2020-02-28")
t2 + dyears(1) # add a year (as duration)
#> [1] "2021-02-27 06:00:00 UTC"
```

We see that adding a duration of 10 hours or 1 day to `t1` seemingly created a difference of 11 or 25 hours in calendar time (but note the switch of time zone from CET to CEST, due to daylight saving time, DST). Similarly, adding a year’s worth of seconds (i.e., 365.25 days) to `t2` yielded a date that is one day earlier than expected (due to 2020 being a leap year). Both results are correct, of course, if we really meant to add time spans as a specific number of seconds (i.e., durations).

But as we often *mean* something else when thinking “ten hours later”, “tomorrow”, or “next year”, *durations* are rather limited when calculating time spans in human units. But that’s ok — for that’s exactly what *periods* are for.

#### 2. Periods

When asking our original question:

*How long ago* did the 911-attacks take place?

receiving a number of days (6873.928) or seconds (593907377) as its answer would be precise, but would probably not satisfy us. This is because we typically do not think about longer periods of time in terms of an exact number of seconds. Instead, we tend to provide counts of various units of time so that their sum fills out the period of time we are dealing with.

In **lubridate**, *periods* are time spans that are expressed in human common-sense units of time (e.g., hours, days, months, years).
Importantly, the length of a period (when expressed as a duration, i.e., a number of seconds) varies with its context, except for periods defined in seconds. For instance, the leap year 2020 is 366 days long (as it contains a February 29, 2020), whereas the year 2021 is only 365 days long. Flexible periods turn into fixed time spans (of various lengths) when added to a specific time point (date-time or calendar time).

```
# Define a period p:
p <- as.period(t_now - t_911)
p
#> [1] "6873d 22H 16M 16.6333210468292S"
# Representation:
class(p)
#> [1] "Period"
#> attr(,"package")
#> [1] "lubridate"
is.period(p)
#> [1] TRUE
is.numeric(p)
#> [1] TRUE
```

As a consequence of their nature, periods are suited to set and track the change in the “clock time” between two events (date-times).
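A minimal sketch of this context-dependence adds the same period of one year to two different start dates (the object names are chosen for illustration):

```r
library(lubridate)

start_2020 <- ymd("2020-01-01")  # start of a leap year
start_2021 <- ymd("2021-01-01")  # start of a common year

# The same period of 1 year spans different numbers of days:
span_2020 <- (start_2020 + years(1)) - start_2020
span_2021 <- (start_2021 + years(1)) - start_2021

span_2020  # Time difference of 366 days
span_2021  # Time difference of 365 days
```

In both cases, the clock time and the calendar date are preserved, whereas the physical length of the resulting time span differs.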

Periods are expressed and measured in common time units (ranging from seconds to years) and provide each unit as integer values (though seconds can be non-integers). Periods are created by simple constructor functions (that are all plural versions of the desired time unit):

```
# Creating periods:
# from time differences:
as.period(t_now - t_911)
#> [1] "6873d 22H 16M 16.6333210468292S"
# in units of integers:
seconds(7)
#> [1] "7S"
minutes(5)
#> [1] "5M 0S"
hours(2)
#> [1] "2H 0M 0S"
days(5)
#> [1] "5d 0H 0M 0S"
weeks(2)
#> [1] "14d 0H 0M 0S"
months(6)
#> [1] "6m 0d 0H 0M 0S"
years(10)
#> [1] "10y 0m 0d 0H 0M 0S"
```

As objects of class “Period” are numbers, they can be used in computing arithmetic expressions:

```
# Math with periods:
now() + years(1) + months(1) + days(1) + hours(12)
#> [1] "2021-08-09 01:02:16 CEST"
hours(1) == 60 * seconds(60)
#> [1] TRUE
hours(3) == 3 * minutes(60)
#> [1] TRUE
days(1) == hours(24)
#> [1] TRUE
weeks(2) == days(14)
#> [1] TRUE
```

When computing with periods, each unit is applied separately. The distribution of periods among units is non-trivial (e.g., some days, months, or years are longer than others), but this complexity is hidden from us. In fact, as we tend to represent dates and times in terms of periods (at least as long as we think of calendar time), using periods in calculations typically yields more intuitive results than adding durations (see above):

```
# Intuitive results with periods:
t1 <- ymd_hms("2020-03-29 00:00:00", tz = "Europe/Berlin")
t1 + hours(10) # add 10 hours (as period)
#> [1] "2020-03-29 10:00:00 CEST"
t1 + years(1) # add 1 year (as period)
#> [1] "2021-03-29 CEST"
t2 <- ymd("2020-02-28")
t2 + years(1) # add a year (as period)
#> [1] "2021-02-28"
```

Thus, when reckoning with times and dates in various human-based units (like days, weeks, or months), periods are most likely the type of time span that we want to use.

#### 3. Intervals

In **lubridate**, *intervals* are time spans that are bound by two time points that are real date-times (or calendar times).
Thus, intervals are durations anchored in date-times (or calendar times) and provide a bridge between durations (i.e., number of seconds) and periods (i.e., common-sense time units) when at least one point in calendar time is known.

A first way of defining an interval requires a time span `x` (which can be a time difference, duration, or period) and a `start` date (typically a date-time or “POSIXct” object):

```
# Define an interval i1:
tdiff <- (t_now - t_911) # from difftime:
i1 <- as.interval(x = tdiff, start = t_911)
i1
#> [1] 2001-09-11 08:46:00 EDT--2020-07-07 07:02:16 EDT
tdur <- as.duration(t_now - t_911) # from duration
i2 <- as.interval(x = tdur, start = t_911)
i2 == i1
#> [1] TRUE
tper <- as.period(t_now - t_911) # from period
i3 <- as.interval(x = tper, start = t_911)
i3 == i1
#> [1] TRUE
```

An alternative way of defining an interval uses its `start` and `end` points (as date-time objects) and places a special operator `%--%` between them (using “infix” notation):
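As a minimal sketch of this notation (using hypothetical time points `t_start` and `t_end`, rather than the `t_911` and `t_now` objects from above), the following defines the same interval with the `%--%` operator and with the `interval()` function:

```r
library(lubridate)

# Hypothetical time points (assumed for illustration only):
t_start <- ymd_hms("2020-01-01 00:00:00", tz = "UTC")
t_end   <- ymd_hms("2020-03-01 12:00:00", tz = "UTC")

# Interval via the infix operator:
i_a <- t_start %--% t_end
i_a

# Same interval via the interval() function:
i_b <- interval(start = t_start, end = t_end)

# Both share the same start, end, and length (in seconds):
int_start(i_a) == int_start(i_b)
int_end(i_a) == int_end(i_b)
int_length(i_a) == int_length(i_b)
```

Both notations should be interchangeable, so the choice between them is mostly a matter of readability.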

Internally, **lubridate** represents intervals as objects of class “Interval”, which is numeric in nature:
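To peek at this representation, the following sketch inspects a hypothetical one-day interval `i_day`. The slot names `.Data` and `start` are an assumption about the S4 class definition of the installed package version and may differ in other versions:

```r
library(lubridate)

# Hypothetical one-day interval (assumed for illustration only):
i_day <- ymd_hms("2020-01-01 00:00:00", tz = "UTC") %--%
  ymd_hms("2020-01-02 00:00:00", tz = "UTC")

class(i_day)        # class of the object
is.interval(i_day)  # test function
int_length(i_day)   # length of the interval (in seconds)

# Assumed slots of the S4 object:
i_day@.Data         # numeric part (length in seconds)
i_day@start         # anchoring start point (a date-time)
```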

Since an interval is anchored firmly in calendar time, both the exact number of seconds that passed (i.e., a *duration*) and the number of variable length time units that occurred during the interval (i.e., a *period*) can be calculated from a given interval.
For accurately converting intervals into durations or periods, we can use the `as.duration()` and `as.period()` functions:
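A minimal sketch of both conversions, assuming a hypothetical interval `i_long` that spans a leap year (not one of the intervals defined above):

```r
library(lubridate)

# Hypothetical interval from 2020-01-01 to 2021-03-01 (includes 2020-02-29):
i_long <- ymd("2020-01-01") %--% ymd("2021-03-01")

# Exact amount of time (in seconds):
as.duration(i_long)

# Calendar units (years, months, days, ...):
as.period(i_long)

# The duration corresponds to 366 + 31 + 28 = 425 days:
as.duration(i_long) == ddays(425)
# The period corresponds to 1 year and 2 months:
as.period(i_long) == years(1) + months(2)
```

Because the interval is anchored in calendar time, the period can account for the leap day, whereas the duration simply counts seconds.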

Multiple transformations between durations, periods, and intervals yield the expected results (except for rounding differences), but returning from durations or periods to an interval requires specifying a start date (as an anchor):

```
# interval requires start date:
as.interval(as.duration(as.period(i)), start = t_911)
#> [1] 2001-09-11 08:46:00 EDT--2001-09-11 08:47:40 EDT
as.interval(as.period(as.duration(i)), start = t_911)
#> [1] 2001-09-11 08:46:00 EDT--2001-09-11 08:47:40 EDT
```

Intervals can be thought of as lines with given start and end points on a linear axis of time. Thus, we can ask and answer a range of interesting questions when dealing with one or more intervals:

```
# Define intervals:
# (a) based on durations:
next_6m_dur <- t_now %--% (t_now + dmonths(6))
last_6m_dur <- (t_now - dmonths(6)) %--% t_now
# (b) based on periods:
next_6m_per <- t_now %--% (t_now + months(6))
day_before_yesterday <- (today() - days(2)) %--% (today() - days(1))
# Note:
next_6m_per > next_6m_dur
#> [1] TRUE
# Querying intervals: ----
# Start and end points:
int_start(next_6m_per)
#> [1] "2020-07-07 13:02:16 CEST"
int_end(day_before_yesterday)
#> [1] "2020-07-06 UTC"
# Length (in seconds):
int_length(next_6m_dur)
#> [1] 15778800
int_length(next_6m_per)
#> [1] 15901200
# Flip or shift interval:
int_flip(day_before_yesterday)
#> [1] 2020-07-06 UTC--2020-07-05 UTC
int_shift(day_before_yesterday, by = days(3))
#> [1] 2020-07-08 UTC--2020-07-09 UTC
# Overlaps:
int_overlaps(next_6m_dur, next_6m_per)
#> [1] TRUE
int_overlaps(next_6m_dur, day_before_yesterday)
#> [1] FALSE
# Same start or end point:
int_aligns(next_6m_dur, next_6m_per)
#> [1] TRUE
int_aligns(next_6m_dur, last_6m_dur)
#> [1] FALSE
```

The infix operator `x %within% y` allows checking whether an interval or date-time `x` lies within an interval or list of intervals `y`:

```
# within:
t_now %within% next_6m_dur
#> [1] TRUE
day_before_yesterday %within% last_6m_dur
#> [1] TRUE
day_before_yesterday %within% next_6m_dur
#> [1] FALSE
```

The `int_diff()` function is similar to the **base** R function `diff()`, but returns the intervals that occur between the elements of a vector of date-times:

```
# A vector of date-time points:
tv <- now() + months(1:4)
tv
#> [1] "2020-08-07 13:02:17 CEST" "2020-09-07 13:02:17 CEST"
#> [3] "2020-10-07 13:02:17 CEST" "2020-11-07 13:02:17 CET"
# Intervals between date-times:
int_diff(tv)
#> [1] 2020-08-07 13:02:17 CEST--2020-09-07 13:02:17 CEST
#> [2] 2020-09-07 13:02:17 CEST--2020-10-07 13:02:17 CEST
#> [3] 2020-10-07 13:02:17 CEST--2020-11-07 13:02:17 CET
as.duration(int_diff(tv))
#> [1] "2678400s (~4.43 weeks)" "2592000s (~4.29 weeks)" "2682000s (~4.43 weeks)"
as.period(int_diff(tv))
#> [1] "1m 0d 0H 0M 0S" "1m 0d 0H 0M 0S" "1m 0d 0H 0M 0S"
```

#### Dividing time spans

When asking “How long…” or “How old…” questions, we often are looking for answers that express a time span in terms of another one.

For instance, we can determine how many durations or periods fall into a given *interval* of time by dividing intervals by other time spans. This is straightforward for *durations*:

```
next_6m_dur # an interval
#> [1] 2020-07-07 13:02:16 CEST--2021-01-06 03:02:16 CET
is.interval(next_6m_dur)
#> [1] TRUE
# Dividing intervals by durations:
next_6m_dur / dyears(1)
#> [1] 0.5
next_6m_dur / dmonths(1)
#> [1] 6
next_6m_dur / dweeks(1)
#> [1] 26.08929
next_6m_dur / ddays(1)
#> [1] 182.625
```

Note that dividing intervals by durations may yield unexpected results (e.g., fractional numbers of weeks or days). Dividing intervals also works for *periods*:

```
next_6m_per # an interval
#> [1] 2020-07-07 13:02:16 CEST--2021-01-07 13:02:16 CET
is.interval(next_6m_per)
#> [1] TRUE
# Dividing intervals by periods:
next_6m_per / years(1)
#> [1] 0.5042237
next_6m_per / months(1)
#> [1] 6
next_6m_per / weeks(1)
#> [1] 26.28571
next_6m_per / days(1)
#> [1] 184
```

However, we *cannot* divide time differences or durations by periods, or periods by durations:

```
td <- (t_now - t_911) # difftime (from above)
# would NOT work:
td / years(1) # cannot divide difftime / Period
as.duration(td) / years(1) # cannot divide Duration / Period
as.period(td) / dyears(1) # cannot divide Period / Duration
```

In practical contexts, we often do not care about exact durations, but are primarily interested in the number of *completed* time periods. These can be computed by dividing time *intervals* by *periods* (by using integer division):

```
next_6m_per # an interval
#> [1] 2020-07-07 13:02:16 CEST--2021-01-07 13:02:16 CET
is.interval(next_6m_per)
#> [1] TRUE
# Dividing intervals by periods:
next_6m_per %/% years(1)
#> [1] 0
next_6m_per %/% months(1)
#> [1] 6
next_6m_per %/% weeks(1)
#> [1] 26
next_6m_per %/% days(1)
#> [1] 184
```

Having learned about three different types of *time spans* and their combinations, we finally are in a position to answer our original question:

*How long ago* did the 911-attacks take place?

In most applied contexts, the following estimates — based on a *duration*, a *period*, or an *interval* — would count as informative answers:

```
# (1) from a duration:
t_dur <- as.duration(t_now - t_911)
round(t_dur / dyears(1), 1)
#> [1] 18.8
round(t_dur / dmonths(1), 1)
#> [1] 225.8
# (2) from a period:
t_per <- as.period(t_now - t_911)
round(t_per / years(1), 1)
#> [1] 18.8
round(t_per / months(1), 1)
#> [1] 225.8
# (3) from an interval:
t_int <- t_911 %--% t_now
t_int %/% years(1)
#> [1] 18
t_int %/% months(1)
#> [1] 225
```

We see once more that, in R as in life, many different ways can yield satisfactory results. Which way is best depends on many additional details, but it’s good to know what our options are.

#### Choosing the right time span

Given three different time spans, which one should we use? As always, this depends on the task that we want to do.

Chapter 16: Dates and times of r4ds recommends always using the simplest type that solves our problem.
When our primary concern is for amounts of time elapsed in terms of seconds, we use *durations*.
When time spans are to be measured in common-sense units, *periods* typically provide the best solutions.
And if we need to measure time spans that are bounded by calendar times, we use *intervals*, or combine several time spans.

Most everyday questions about time spans can be solved by either computing durations, periods, or intervals, or by dividing time intervals by durations or periods. Keep in mind that not all combinations of the different time span concepts and arithmetic operations make sense. (Figure 16.1 provides an overview of the arithmetic operations that are allowed between pairs of date/time classes.)
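As a rough illustration of these options, the following sketch answers a typical “How old…” question in three ways. The dates `born` and `ref_day` are hypothetical and chosen only for illustration:

```r
library(lubridate)

# Hypothetical dates (assumed for illustration only):
born    <- ymd("1990-05-15")
ref_day <- ymd("2020-07-07")

# An interval anchors the time span in calendar time:
life <- born %--% ref_day

# (1) Duration: exact amount of time elapsed (in seconds):
as.duration(life)

# (2) Period: time elapsed in common-sense units:
as.period(life)

# (3) Interval divided by a period: number of completed years:
life %/% years(1)
```

For a question about age in completed years, the integer division of an interval by a period is probably the most direct of the three.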

### 10.3.4 Other reasons to lubridate

This section collects some additional examples of computing with dates and times with **lubridate** commands.

#### Checking date and time objects

Given a multiplicity of object types, namely a “Date” class and two different date-time classes (i.e., the “POSIXct” and “POSIXlt” classes), it is easy to get confused about which type of date or time we are dealing with. Fortunately, the **lubridate** package provides convenient test functions that verify the class of a date or time object:

```
# "difftime"
td # (from above)
#> Time difference of 6873.928 days
is.difftime(td)
#> [1] TRUE
is.difftime(tday)
#> [1] FALSE
# "Date":
tday
#> [1] "2020-07-07"
is.Date(tday)
#> [1] TRUE
is.POSIXct(tday)
#> [1] FALSE
# "POSIXct" calendar time:
ctnow
#> [1] "2020-07-07 13:02:13 CEST"
is.POSIXct(ctnow)
#> [1] TRUE
is.POSIXlt(ctnow)
#> [1] FALSE
# "POSIXlt" local time:
ltnow
#> [1] "2020-07-07 13:02:13 CEST"
is.POSIXlt(ltnow)
#> [1] TRUE
is.Date(ltnow)
#> [1] FALSE
```

Whenever dealing with multiple date-time classes, these functions are very helpful.

#### Rounding dates and times

When computing with dates, times, and various time spans, our resolution of interest is rarely a specific number of seconds. As we have seen in Section 10.3.3, this issue can often be addressed by performing computations in terms of periods or intervals or by dividing intervals by durations or periods.

For date-times (i.e., objects of the “POSIXct” class), *rounding* often is an issue as well.
To address this concern, **lubridate** provides a range of convenient rounding functions that allow setting the direction and the `unit` used for rounding:

```
(t <- ymd_hms("2020-11-16 12:01:30"))
#> [1] "2020-11-16 12:01:30 UTC"
# Rounding times (by units):
round_date(t, unit = "second") # no change
#> [1] "2020-11-16 12:01:30 UTC"
round_date(t, unit = "minute") # rounded up
#> [1] "2020-11-16 12:02:00 UTC"
round_date(t, unit = "hour") # rounded down
#> [1] "2020-11-16 12:00:00 UTC"
round_date(t, unit = "day") # rounded up
#> [1] "2020-11-17 UTC"
round_date(t, unit = "week") # rounded down
#> [1] "2020-11-15 UTC"
# Force rounding up:
ceiling_date(t, unit = "minute")
#> [1] "2020-11-16 12:02:00 UTC"
ceiling_date(t, unit = "hour")
#> [1] "2020-11-16 13:00:00 UTC"
ceiling_date(t, unit = "day")
#> [1] "2020-11-17 UTC"
ceiling_date(t, unit = "week")
#> [1] "2020-11-22 UTC"
# Force rounding down:
floor_date(t, unit = "minute")
#> [1] "2020-11-16 12:01:00 UTC"
floor_date(t, unit = "hour")
#> [1] "2020-11-16 12:00:00 UTC"
floor_date(t, unit = "day")
#> [1] "2020-11-16 UTC"
floor_date(t, unit = "week")
#> [1] "2020-11-15 UTC"
```
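The `unit` argument is not limited to single units. As a brief sketch (using a hypothetical date-time `dt_x` that matches the time point used above), the rounding functions also accept multiples of units, such as 15-minute steps:

```r
library(lubridate)

# Hypothetical date-time (same time point as in the example above):
dt_x <- ymd_hms("2020-11-16 12:01:30")

# Rounding to multiples of a unit (here: 15 minutes):
floor_date(dt_x, unit = "15 minutes")    # previous quarter of an hour
ceiling_date(dt_x, unit = "15 minutes")  # next quarter of an hour
round_date(dt_x, unit = "15 minutes")    # nearest quarter of an hour
```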

#### Time zone conversions

In Section 10.2.4, we noted the importance of time zones and mentioned that many **base** R functions include a `tz` argument for setting them (see `Sys.timezone()` for your current system setting and `OlsonNames()` for available options).

When not explicitly specifying any time zone information, any date-times created in R either use our local system setting (here: “Europe/Berlin”, which may or may not include daylight saving time, DST) or default to “UTC” (Coordinated Universal Time).
For instance, when scheduling dates in four-month steps over the next year (starting `now()` in a time zone with DST), the summer dates will automatically include DST information:

```
(quarters <- now(tzone = "Europe/Berlin") + months(seq(0, 12, by = 4)))
#> [1] "2020-07-07 13:02:17 CEST" "2020-11-07 13:02:17 CET"
#> [3] "2021-03-07 13:02:17 CET" "2021-07-07 13:02:17 CEST"
```

The **lubridate** functions for parsing date-times also have a `tz` argument.
For instance, here are three specific date-time definitions (with different time zones):

```
(t1 <- ymd_hms("2020-12-24 23:00:00", tz = "Australia/Melbourne"))
#> [1] "2020-12-24 23:00:00 AEDT"
(t2 <- ymd_hms("2020-12-24 13:00:00", tz = "Europe/Berlin"))
#> [1] "2020-12-24 13:00:00 CET"
(t3 <- ymd_hms("2020-12-24 07:00:00", tz = "US/Eastern"))
#> [1] "2020-12-24 07:00:00 EST"
```

When computing their differences (as `difftime` objects), we realize that `t1`, `t2`, and `t3` actually denote the *same* point (instant or moment) in time:
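A minimal sketch of such a comparison, which repeats the three definitions from above so that it can be run on its own:

```r
library(lubridate)

# Same definitions as above:
t1 <- ymd_hms("2020-12-24 23:00:00", tz = "Australia/Melbourne")
t2 <- ymd_hms("2020-12-24 13:00:00", tz = "Europe/Berlin")
t3 <- ymd_hms("2020-12-24 07:00:00", tz = "US/Eastern")

# Time differences (as "difftime" objects):
t1 - t2
t2 - t3

# Equality of time points:
t1 == t2
t2 == t3
```

All differences amount to zero seconds, as the three date-times only differ in the time zone used for displaying them.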

However, when manipulating times (e.g., by creating a new vector), information regarding time zones is often lost (or unified, based on the initial time zone):

```
(t4 <- c(t1, t2, t3))
#> [1] "2020-12-24 23:00:00 AEDT" "2020-12-24 23:00:00 AEDT"
#> [3] "2020-12-24 23:00:00 AEDT"
```

The vector `t4` also shows that `t1`, `t2`, and `t3` all denote the *same* moment in time, a fact that was only obscured by displaying this time for different locations. However, the vector only shows this time for one particular time zone (specifically, the time zone of its first element, `tz(t1)`). Thus, we can still wonder: How should we best express this particular time?

The need to (re-)introduce time zone information to time objects creates two distinct tasks, with corresponding solutions:

1. Change time zone information by keeping the actual time points the same, but changing their representation (i.e., display fixed time points for a different time zone).

2. Change time zone information by keeping the representation the same, but changing the actual time points (i.e., display different time points that have the same nominal appearance for a different time zone).

The `with_tz()` function addresses the first task: It changes time zone information (and thus changes the nominal time display) without changing the underlying point in time that is being represented:

```
# same ACTUAL time (but different display in different tz):
with_tz(t4, tz = "UTC")
#> [1] "2020-12-24 12:00:00 UTC" "2020-12-24 12:00:00 UTC"
#> [3] "2020-12-24 12:00:00 UTC"
```

This shows that the three identical times (which were merely expressed differently by `t1`, `t2`, and `t3`) all denote noon on 2020-Dec-24 when expressed in terms of UTC (Coordinated Universal Time).

By contrast, the `force_tz()` function addresses the second task: It preserves the appearance of its input times (i.e., the nominal time displayed), but changes the actual time points that are being represented:

```
# same NOMINAL time (but different actual time in different tz):
force_tz(t4, tz = "UTC")
#> [1] "2020-12-24 23:00:00 UTC" "2020-12-24 23:00:00 UTC"
#> [3] "2020-12-24 23:00:00 UTC"
```

The difference between both tasks and functions is subtle, but important:
`with_tz()` only changes the appearance of time points, but keeps the time points intact.
By contrast, `force_tz()` preserves the appearance of time, but changes the time actually represented.
When converting times into different time zones, we typically only want to change the appearance of time (a.k.a. the “sense” of time, i.e., the particular way in which fixed time points are being displayed to us), rather than the actually denoted point in time (a.k.a. the “referent” or “meaning” of the time displayed). Thus, we typically want to use `with_tz()`, rather than `force_tz()`, when converting some given times into a different time zone.
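The following sketch makes this difference measurable for a single hypothetical date-time `x` (which matches `t2` from above): the result of `with_tz()` is still equal to `x`, whereas the result of `force_tz()` is shifted by the offset between both time zones:

```r
library(lubridate)

# Hypothetical date-time (assumed for illustration only):
x <- ymd_hms("2020-12-24 13:00:00", tz = "Europe/Berlin")

x_with  <- with_tz(x, tzone = "UTC")   # same time point, new display
x_force <- force_tz(x, tzone = "UTC")  # same display, new time point

x_with
x_force

# Comparing both results to x:
x_with == x                            # still the same moment
difftime(x_force, x, units = "hours")  # shifted by the UTC offset
```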

#### Checking for leap years

A good question to ask is: Is some specific year `y` a leap year?

Many people can answer this question for the current year (e.g., “Yes, the year 2020 had a February 29.”).
But what about the year 2066?
What about the year of `Titanic_sinks` (i.e., 1912)?
And what about the year MCMLXXXIV?

The hard-core solution to this problem consists of studying the definition of a *leap year* and then implementing it as a command or function. The corresponding definition (from Wikipedia: leap year) reads:

> …in the Gregorian calendar, each leap year has 366 days instead of 365, by extending February to 29 days rather than the common 28. These extra days occur in each year which is an integer multiple of 4 (except for years evenly divisible by 100, which are not leap years unless evenly divisible by 400).

In R, we could implement this definition as follows:

```
y <- 2020
(y %% 4 == 0) & ((y %% 100 != 0) | (y %% 400 == 0))
#> [1] TRUE
y <- 2066
(y %% 4 == 0) & ((y %% 100 != 0) | (y %% 400 == 0))
#> [1] FALSE
```
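To avoid repeating this expression for every year, we could wrap it into a small function. The following is only a sketch: the name `is_leap()` is a hypothetical choice and not a function from **base** R or **lubridate**:

```r
# Hypothetical helper function (name chosen for illustration only):
is_leap <- function(y) {
  # Divisible by 4, but not by 100 (unless also divisible by 400):
  (y %% 4 == 0) & ((y %% 100 != 0) | (y %% 400 == 0))
}

# Works for single years and for vectors of years:
is_leap(2020)
is_leap(c(1900, 1912, 2000, 2020, 2066))
```

As the logical operators `&` and `|` are vectorized, the function can check several years at once.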

However, since we have learned about time points and time spans, we can solve such tasks by using heuristics.
For instance, we could define the interval from January 1st of year `y` to January 1st of year `y+1` and determine the number of days (as durations or periods) that fit into this interval:

```
jan01_2020 <- ymd("2020 01 01")
jan01_2021 <- ymd("2021 01 01")
(jan01_2020 %--% jan01_2021) / ddays(1) # durations
#> [1] 366
(jan01_2020 %--% jan01_2021) %/% days(1) # periods
#> [1] 366
```

If the solution is 366, the year `y` is a leap year; if it is 365, it is not.

Similar solutions can be achieved by rounding dates, measuring and comparing their duration in other time units, or by trying to define the date of February 29 of year `y` and checking whether this succeeds:
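A sketch of the last of these ideas, assuming that failing to parse a non-existent date yields a missing value (`NA`). The helper name `has_feb29()` is hypothetical:

```r
library(lubridate)

# Hypothetical helper: Can we define February 29 of year y?
has_feb29 <- function(y) {
  # quiet = TRUE suppresses the warning for dates that fail to parse:
  feb29 <- ymd(paste0(y, "-02-29"), quiet = TRUE)
  !is.na(feb29)
}

has_feb29(2020)
has_feb29(2066)
```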

All these solutions should yield the same result, as long as we can rely on R’s internal date-time definitions, any functions used in our derivation, and our ability to correctly use the corresponding commands and understand their results.

A much simpler solution is finding a function that solves the task. The **lubridate** package actually provides a `leap_year()` function that gets the job done:
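As a brief sketch, here is the function applied to the years mentioned above:

```r
library(lubridate)

# Checking years (as numbers):
leap_year(2020)
leap_year(2066)
leap_year(1912)

# Checking the year of a date:
leap_year(ymd("2020-07-07"))
```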

The benefits of using an existing R function are two-fold:

- it saves us effort and time, and
- it can be used flexibly with other features of our programming language:

```
# Was some given Date in a leap year?
leap_year(year(Titanic_sinks))
#> [1] TRUE
# Is the current year a leap year?
leap_year(today())
#> [1] TRUE
# Is FALSE (as a number) a leap year?
leap_year(as.numeric(FALSE))
#> [1] TRUE
# What about the next 10 years?
leap_year(today() + years(0:10))
#> [1] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
```

Thus, functions are a pretty big deal, which is why we will learn more about them in the next chapter (Chapter 11 on writing functions).

The price of using existing functions is that we need to trust that their author(s) knew what they were doing.
In the case of the **lubridate** package (Spinu et al., 2020), it’s very likely that the authors can be trusted, as the package is well-established and widely used (though it has also changed quite a bit over the years).
Incidentally, the definition of the `leap_year()` function contains a line which looks very much like our leap year definition and initial **base** R solution from above.
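To check this for ourselves, we can inspect the definition of the function. The following sketch only shows how to look; the exact source code depends on the installed package version:

```r
library(lubridate)

# Evaluating a function's name (without parentheses) prints its definition:
leap_year

# Extract only the body of the function:
body(leap_year)
```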

#### Other functions

The **lubridate** package defines many other nifty functions:

- `am(dt)`/`pm(dt)`: Does a date-time object `dt` occur am or pm?
- `days_in_month(dt)`: Get the number of days in the month of `dt`
- `dst(dt)`: Get daylight saving time indicator of `dt`
- `format_ISO8601(dt)`: Format in ISO8601 character format
- `rollback(dt)`: Roll back date to last day of previous month
- `date_decimal(n)`: Converts a decimal number `n` to the corresponding date

Here are some examples of their results:

```
(dt <- now())
#> [1] "2020-07-07 13:02:18 CEST"
# Other functions:
am(dt)
#> [1] FALSE
pm(dt)
#> [1] TRUE
days_in_month(dt)
#> Jul
#> 31
dst(dt)
#> [1] TRUE
format_ISO8601(dt)
#> [1] "2020-07-07T13:02:18"
rollback(dt)
#> [1] "2020-06-30 13:02:18 CEST"
date_decimal(2020.531)
#> [1] "2020-07-13 08:18:14 UTC"
```

We conclude this section with some practice tasks that recapitulate the date and time functionality of the **lubridate** package.

#### Practice

Solve the following tasks by using **lubridate** functions:

- Local conventions and names of weekdays:

- Predict, evaluate, and explain the results of the following commands:

```
wday(sat_noon) # => 7
# Explanation: By default, weeks start on Sunday (1), so that Saturday is day 7
wday(sat_noon, label = FALSE, week_start = 7)
wday(sat_noon, label = TRUE)
# Changing week_start to Monday (1):
wday(sat_noon, label = TRUE, abbr = FALSE, week_start = 1)
# Setting locale (to get German language names):
wday(sat_noon, label = TRUE, abbr = FALSE, week_start = 1,
locale = Sys.setlocale("LC_TIME", "de_DE.UTF-8"))
# Re-setting locale (back to the system default, e.g., en_US):
Sys.setlocale(category = "LC_ALL", locale = "")
```

- Full circle with date-time-dates:

We learned that the `as_date()` and `as_datetime()` functions allow us to convert between times and dates.

- Predict, evaluate, and explain the results of the following commands:

```
time_1 <- ymd_hms("2020-12-31 23:30:45")
date_1 <- as_date(time_1)
time_2 <- as_datetime(date_1)
time_2 == time_1
# format(time_2, explicate_time)
```

**Answer:** As `time_2` is created from `date_1` (i.e., a “Date” object), it lacks the time information of `time_1`.

- How can we repair `time_2` to match `time_1`?

#### Solution
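One possible solution (a sketch, and not necessarily the intended one) extracts the missing time components from `time_1` and adds them to `time_2` as periods. The name `time_2_fixed` is a hypothetical choice:

```r
library(lubridate)

# Definitions from the task above:
time_1 <- ymd_hms("2020-12-31 23:30:45")
date_1 <- as_date(time_1)
time_2 <- as_datetime(date_1)  # midnight of the same day

# Add the missing hours, minutes, and seconds (as periods):
time_2_fixed <- time_2 +
  hours(hour(time_1)) +
  minutes(minute(time_1)) +
  seconds(second(time_1))

time_2_fixed
time_2_fixed == time_1
```

Of course, this repair only works as long as `time_1` is still available, as the “Date” object `date_1` no longer contains any time information.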

- Durations vs. periods:

- Predict, evaluate, and explain the results of the following commands:

```
months(1) # period
dmonths(1) # duration
months(1) / days(1)
months(1) / ddays(1)
dmonths(1) / ddays(1)
dmonths(1) / days(1)
```

- Predict, evaluate, and explain the results of the following two commands:

```
d <- ymd("2020-01-30")
d
d + dyears(1) # => "2021-01-29" (adding a duration)
d + years(1) # => "2021-01-30" (adding a period)
```

**Answer:**
The command `d <- ymd("2020-01-30")` assigns a particular date (i.e., an *instant* in time) to `d`.
To this, we add a time span (of 1 year) in two different ways:
`+ years(1)` adds the *period* of 1 year (in human units), yielding the same date a year later.
By contrast, `+ dyears(1)` adds the *duration* of 1 year (as an exact number of seconds).
As 2020 is a leap year (i.e., containing a date of “2020-02-29” and a total number of 366 days), both additions yield different results.
Thus, when dealing with common-sense units of time, adding periods typically yields intuitively more plausible results.

Note also that the time zone defaults to UTC (Coordinated Universal Time).

- Explain the different results of the following two commands:

```
sat_noon <- ymd_hms("2020-03-28 12:00:00", tz = "Europe/Berlin")
# format(sat_noon, explicate_time)
# Adding time spans:
(sat_noon + ddays(1))
#> [1] "2020-03-29 13:00:00 CEST"
(sat_noon + days(1))
#> [1] "2020-03-29 12:00:00 CEST"
```

**Answer:**
`sat_noon` is assigned a particular *date-time point* (i.e., *instant* or *moment*) in time: Sat, 2020-03-28 12:00:00 CET (+0100 from UTC).
The `tz` specification ensures that the time zone is set to `CET` (i.e., corresponds to a specific location).
A difference between adding a *duration* of `ddays(1)` and adding a *period* of `days(1)` implies that a time shift has occurred.
In this case, Germany switched to daylight saving time (DST) on “2020-03-29”:
At 2am, the clocks were set forward by 1 hour. Thus, adding the *duration* of 1 day (as in `+ ddays(1)`) yields a later time than adding the *period* of 1 day (as in `+ days(1)`). Again, adding periods yields more predictable results.

- Durations, periods, and intervals:

- Predict, evaluate, and explain the different results of the following two commands:

```
# interval ends:
int_end(next_6m_dur)
#> [1] "2021-01-06 03:02:16 CET"
int_end(next_6m_per)
#> [1] "2021-01-07 13:02:16 CET"
```

- Evaluate and explain the result of the following expression in terms of their notions of time and in common-sense terms:

**Answer:** We can re-construct the answer in 4 steps:

1. The `day_before_yesterday` was defined as an *interval* (see above).
2. Shifting this interval by a *period* of 2 days yields the *interval* of *today* (from 00:00:00 to 24:00:00).
3. Adding a *period* of 12 hours to the start of *today* marks a specific date-time *point*: *noon today*.
4. Subtracting a *duration* of 30 seconds yields a date-time *point* precisely *30 seconds before noon today*.

Note that this example involves four different notions of time:
Date-time *points* (i.e., *instants*, *moments*, or “POSIXct” objects) and three different types of *time spans*.
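The four steps of this answer can be sketched as follows. This is a reconstruction that is consistent with the steps described above, not necessarily the original expression, and the names `today_int`, `noon_today`, and `almost_noon` are hypothetical:

```r
library(lubridate)

# Step 1: An interval (defined as above):
day_before_yesterday <- (today() - days(2)) %--% (today() - days(1))

# Step 2: Shifting the interval by a period of 2 days:
today_int <- int_shift(day_before_yesterday, by = days(2))

# Step 3: Adding a period of 12 hours to the start of today:
noon_today <- int_start(today_int) + hours(12)

# Step 4: Subtracting a duration of 30 seconds:
almost_noon <- noon_today - dseconds(30)
almost_noon
```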

- Predict, evaluate, and explain the results of the following expressions:

```
one_year_ago <- today() - years(1)
# Dividing time spans:
dyears(1) / ddays(365) # durations
years(1) / days(1) # periods
# Dividing intervals:
(one_year_ago %--% today()) / ddays(1) # durations
(one_year_ago %--% today()) %/% days(1) # periods
```

**Hint:** These examples are inspired by Section 16.4.3 Intervals (Wickham & Grolemund, 2017), which also provides a short explanation. However, note that some definitions seem to have changed.

- Leap years in Roman numerals:

In Section 10.3.4 above, we left the leap year question regarding MCMLXXXIV unresolved:

- Was the year MCMLXXXIV (represented in Roman numerals) a leap year?

**Answer:**
A bit of experimentation with the R **utils** function `as.roman()`

reveals that the character sequence “MCMLXXXIV” represents the calendar year 1984 in Roman numerals. As it turns out, this happens to be a leap year:
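A minimal sketch of this check, which converts the Roman numeral into an integer and passes it to `leap_year()`. The name `y_roman` is a hypothetical choice:

```r
library(lubridate)

# Convert the Roman numeral into an integer:
y_roman <- as.integer(as.roman("MCMLXXXIV"))
y_roman

# Check for a leap year:
leap_year(y_roman)
```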

### References

Colebourne, S., & O’Neill, B. (2010). Joda-time: Java date and time API. *Release*, *1*(2), 4–1. Retrieved from https://www.joda.org/joda-time/

Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. *Journal of Statistical Software, Articles*, *40*(3), 1–25. https://doi.org/10.18637/jss.v040.i03

Spinu, V., Grolemund, G., & Wickham, H. (2020). *lubridate: Make dealing with dates a little easier*. Retrieved from https://CRAN.R-project.org/package=lubridate

Wickham, H., & Grolemund, G. (2017). *R for data science: Import, tidy, transform, visualize, and model data*. Retrieved from http://r4ds.had.co.nz

The advantage of this approach is that we start with a set of date-times `dt` that we later want to re-create from its components (i.e., by using the `make_date()` and `make_datetime()` functions).

In Section 16.4.5 Exercises of r4ds, the first question asks: “Why is there `months()` but no `dmonths()`?” Thus, I can only guess that the function `dmonths()` was absent from earlier versions of **lubridate**, but then was added later. And since the value of `dyears(1)` suffers from the same problem (as leap years are a day longer than non-leap years), it seems ok to provide an average for estimation purposes.