10.4 Applications
Now that we appreciate the difficulties of dealing with dates and times, we are ready for some good news: The ds4psy package provides simple and straightforward functions that address the most common date- and time-related tasks. As ds4psy provides no new object types for specific notions of dates or times, it is less powerful than lubridate and related packages. Nevertheless, it provides a range of simple functions that probably cover a large proportion of ordinary use cases.
10.4.1 Simple date and time functions
Assuming that we are dealing with data that contains dates and times, what do we typically want to do with them? Most of the time, we either want to get the current date or time, extract some specific element (e.g., the weekday, month, or year of a date), or perform simple calculations (e.g., to compute differences between dates and times).
The Date and POSIXt objects of R support all these tasks, but the corresponding syntax and conversion specifications are cryptic and can be hard to remember (see Section 10.2.4). If we want to satisfice and are primarily interested in querying or displaying existing dates and times (rather than entering new dates and times), we do not need to mess with POSIXt. Instead, the ds4psy package provides a small and opinionated collection of functions that probably cover the vast majority of practical use cases for most people.
In the following, we briefly introduce functions that address the following tasks:
Querying dates and times:
- get the current date and time
- get some component of dates and times
Computing with dates and times:
- converting times between time zones
- checking for temporal idiosyncrasies
- computing date and time differences
- sampling dates and times
Let’s load the package first:
library(ds4psy)
10.4.2 Querying dates and times
Some simple functions allow asking for the current date and time, or extracting elements from existing dates and times.
Getting the current date and time
In my world, about 50% of all use cases are covered by two basic functions that ask for the current date or time:
- cur_date(): Get the current date, in two different orders (and with an optional separator)
- cur_time(): Get the current time (with an optional display of seconds and a separator)
# Get/show current date:
cur_date()
#> [1] "2022-09-10"
# Options:
cur_date(rev = TRUE) # reverse order
#> [1] "10-09-2022"
cur_date(rev = TRUE, sep = ".") # set separator
#> [1] "10.09.2022"
# Get/show current time:
cur_time()
#> [1] "14:13"
# Options:
cur_time(seconds = TRUE) # include seconds
#> [1] "14:13:24"
cur_time(sep = "-") # set separator
#> [1] "14-13"
To avoid conflicts with base R and other packages, which already provide functions named date(), time(), today(), and now(), the ds4psy functions start with cur_ (for “current”).
By default, these functions return a character string denoting the current date or time.
However, by setting as_string = FALSE
they can be switched to returning “Date” or “POSIXct” objects:
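The practical difference between the two return types can be illustrated with base R alone. The following minimal sketch uses a fixed example date (an assumption made here so that the results are reproducible) and does not show how cur_date() is implemented:

```r
# A fixed example date (assumed for illustration):
d <- as.Date("2022-09-10")

# As a character string (useful for display only):
s <- format(d, "%d.%m.%Y")  # reversed order, "." as separator
class(s)                    # "character"

# As a "Date" object (useful for computing):
class(d)                    # "Date"
d + 30                      # 30 days later: "2022-10-10"

# Number of days from d until 2022-12-24:
days_until <- as.numeric(as.Date("2022-12-24") - d)
days_until                  # 105
```

A string is convenient for printing, whereas arithmetic such as adding days or subtracting dates requires a “Date” or “POSIXct” object.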
Getting date and time components
An additional 30% of all use cases are covered by some functions that ask what_
questions about the identity or position of some temporal unit in some larger continuum of time:
- what_wday(): weekday of some date (default: today), as name (abbr or full)
- what_week(): week number (in units of month, or year; as char or as integer)
- what_month(): month name (abbr or full) or month number (as char or as integer)
- what_year(): year as number (abbr or full, char or integer)
The what_
functions follow a simple heuristic: What is it that we usually want to hear as an answer x
when asking “What x
is it today?” or “What x
is it right now?”
In contrast to the cur_date()
and cur_time()
functions, all of the what_
-functions accept a when
argument as input.
If a what_
question is asked without a when
argument, the current date (i.e., Sys.Date()
) or time (i.e., Sys.time()
) is used:
# what_ defaults:
what_date()
#> [1] "2022-09-10"
what_time()
#> [1] "14:13"
what_wday()
#> [1] "Saturday"
what_week()
#> [1] "36"
what_month()
#> [1] "September"
what_year()
#> [1] "2022"
The what_
functions offer some options that provide the most common versions of these date and time components:
# what_ options:
what_date(rev = TRUE, sep = ".", month_form = "%b")
#> [1] "10.Sep.2022"
what_time(seconds = TRUE)
#> [1] "14:13:25"
what_time(seconds = TRUE, sep = ".", tz = "US/Pacific")
#> [1] "05.13.25"
what_wday(abbr = TRUE)
#> [1] "Sat"
what_week(unit = "month")
#> [1] "2"
what_month(abbr = TRUE)
#> [1] "Sep"
what_month(as_integer = TRUE)
#> [1] 9
what_year(abbr = TRUE, as_integer = TRUE)
#> [1] 22
By supplying other dates or times to when
(as scalars or vectors), we can ask and answer the what_
questions for arbitrary dates:
ds <- as.Date(c("2020-02-29", "2020-12-24", "2020-12-31"))
what_date(when = ds, month_form = "%b")
#> [1] "2020-Feb-29" "2020-Dec-24" "2020-Dec-31"
what_time(ds, tz = "UTC")
#> [1] "00:00" "00:00" "00:00"
what_wday(ds)
#> [1] "Saturday" "Thursday" "Thursday"
what_week(ds)
#> [1] "09" "52" "53"
what_month(ds)
#> [1] "February" "December" "December"
what_year(ds)
#> [1] "2020" "2020" "2020"
or arbitrary times:
ts <- as.POSIXct(c("2020-02-29 10:00:00", "2020-12-24 11:30:45", "2020-12-31 23:59:59"),
tz = "UTC")
what_date(when = ts, month_form = "%B") # Date (from time)
#> [1] "2020-February-29" "2020-December-24" "2021-January-01"
what_time(ts, tz = "US/Eastern") # Note: Time display shifted 5 hours
#> [1] "05:00" "06:30" "18:59"
what_wday(ts, abbr = TRUE) # weekday
#> [1] "Sat" "Thu" "Fri"
what_week(ts, unit = "month") # week (within month)
#> [1] "5" "4" "1"
what_month(ts, as_integer = TRUE) # month (as integer)
#> [1] 2 12 1
what_year(ts, as_integer = TRUE) # year (as integer)
#> [1] 2020 2020 2021
Internally, all these functions provide handy shortcuts for date- and time-related commands and formatting instructions of the “POSIX” standard (see the date-time conversion functions in Section 10.2.4 and ?strftime for details). Hence, anyone who feels comfortable with POSIX will not need these functions. For the rest of us, who cannot or do not care to remember POSIX, these functions can be useful.
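For comparison, here is a minimal base R sketch of the kind of conversion specifications that such shortcuts spare us from remembering. It only illustrates the format codes documented in ?strftime; which codes the what_ functions use internally is not shown here:

```r
# Example dates (taken from the examples above):
ds <- as.Date(c("2020-02-29", "2020-12-24", "2020-12-31"))

# Numeric components (independent of the locale):
year_full <- as.integer(format(ds, "%Y"))  # 2020 2020 2020
year_abbr <- as.integer(format(ds, "%y"))  # 20 20 20
month_num <- as.integer(format(ds, "%m"))  # 2 12 12
iso_week  <- format(ds, "%V")              # ISO 8601 week of the year

# Names of weekdays and months (their language depends on the locale):
format(ds, "%A")  # full weekday name
format(ds, "%a")  # abbreviated weekday name
format(ds, "%B")  # full month name
format(ds, "%b")  # abbreviated month name
```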
10.4.3 Computing with dates and times
We have seen that R internally represents dates and times as numbers. As a consequence, many sophisticated computations with dates and times are possible in the numeric realm (i.e., without requiring special time-related data types or commands). However, practical applications often require that dates and times are expressed in temporal units. Things tend to get tricky when using dates and times for computing new dates and times. Some of the main tasks that appear in this context are:
- converting times between time zones
- checking for temporal idiosyncrasies
- computing date and time differences
- sampling dates and times
In the following sections, we will briefly consider each of these tasks and illustrate some ds4psy functions that facilitate their solutions.
Converting times
Perhaps another 10% of all use cases for dealing with dates and times concern time conversions between different locations. As we have seen in Section 10.3.4, we need to distinguish between an actual point in time (i.e., a particular instant in history) and a specific time display (i.e., a particular configuration on a clock). Importantly, we can either express the same actual time in different time zones (by changing the time display) or show a fixed time display in different time zones (by changing the times). The following ds4psy functions correspond to these tasks:
- change_tz() converts a fixed actual time into the corresponding display in a different time zone.
- change_time() converts a fixed time display into the corresponding time in a different time zone.
Both functions accept a time
(as a scalar or vector) as input (and will aim to parse any input of dates or character strings as times).
change_tz()
will express this time in terms of the time zone tz
:
# One minute before midnight (UTC):
t <- as.POSIXct("2020-12-31 23:59:00", tz = "UTC")
# Same time t in different time zones:
change_tz(t, tz = "NZ")
#> [1] "2021-01-01 12:59:00 NZDT"
change_tz(t, tz = "Europe/Berlin")
#> [1] "2021-01-01 00:59:00 CET"
change_tz(t, tz = "US/Hawaii")
#> [1] "2020-12-31 13:59:00 HST"
By contrast, change_time()
provides the (varying) points in time at which different time zones tz
show a given time display:
# Same time display in different time zones:
change_time(t, tz = "NZ")
#> [1] "2020-12-31 23:59:00 NZDT"
change_time(t, tz = "Europe/Berlin")
#> [1] "2020-12-31 23:59:00 CET"
change_time(t, tz = "US/Hawaii")
#> [1] "2020-12-31 23:59:00 HST"
Note that the last three time displays look very similar (except for the time zone), but in fact represent different points in time (and even different dates). To verify these differences, we need to convert all three different times into the same time zone:
# Convert 3 times into the same time zone:
change_tz(change_time(t, tz = "NZ"), tz = "UTC")
#> [1] "2020-12-31 10:59:00 UTC"
change_tz(change_time(t, tz = "Europe/Berlin"), tz = "UTC")
#> [1] "2020-12-31 22:59:00 UTC"
change_tz(change_time(t, tz = "US/Hawaii"), tz = "UTC")
#> [1] "2021-01-01 09:59:00 UTC"
Internally, the change_tz()
function returns a local time object of class “POSIXlt”, whereas the change_time()
function returns a calendar time object of class “POSIXct”. Ideally, users should not have to worry about these differences.
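The two kinds of conversion can also be sketched in base R. This is an illustration of the underlying distinction under simple assumptions (one time zone, no input parsing), not the implementation of change_tz() or change_time():

```r
# One minute before midnight (UTC), as in the example above:
t <- as.POSIXct("2020-12-31 23:59:00", tz = "UTC")

# (1) Same instant, different display:
#     Only the time zone used for printing changes.
same_instant <- format(t, "%Y-%m-%d %H:%M", tz = "Europe/Berlin")
same_instant  # "2021-01-01 00:59"

# (2) Same display, different instant:
#     Re-interpret the clock reading in another time zone.
clock <- format(t, "%Y-%m-%d %H:%M:%S", tz = "UTC")
same_display <- as.POSIXct(clock, tz = "Europe/Berlin")

# Both instants differ by Berlin's UTC offset in winter (1 hour):
offset_hours <- as.numeric(difftime(t, same_display, units = "hours"))
offset_hours  # 1
```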
10.4.4 Checking temporal idiosyncrasies
The tasks described so far cover about 90% of all use cases. The remaining 10% require additional machinery (e.g., some of the details from above, see Section 10.2.2, or the versatile toolbox provided by the lubridate package, see Section 10.3).
Overall, computing with dates and times would be easy if all units always had the same size. However, due to the existence of leap years and leap seconds, years and months have different durations, which also affects computations in other units (e.g., weeks, days, seconds). As a consequence, a common element of more sophisticated tasks involving dates and times is that we need to account for temporal idiosyncrasies. Three of them are addressed by the following functions:
- is_leap_year(): Is a particular year (provided as a number or a “Date”) a leap year?
- days_in_month(): The number of days in a given month (to be provided as a “Date”).
- diff_tz(): The time difference between two time zones (in “HH:MM” format or number of minutes).
The is_leap_year()
function simply asks whether a particular year (provided as a scalar or a vector of numbers or dates) is a leap year:
is_leap_year(2024) # year 2024, as a number
#> [1] TRUE
is_leap_year("2025") # year 2025, as a character
#> [1] FALSE
# Robustness: Works with dates
last_this_next_year <- c(cur_date(as_string = FALSE) - 365.25,
cur_date(as_string = FALSE),
cur_date(as_string = FALSE) + 365.25)
is_leap_year(last_this_next_year) # for dates
#> [1] FALSE TRUE FALSE
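The rule behind such a check is the Gregorian leap year rule: A year is a leap year if it is divisible by 4, unless it is divisible by 100 but not by 400. A minimal base R sketch (leap_year_sketch() is a hypothetical helper that only accepts numeric years, not the ds4psy function):

```r
# Gregorian leap year rule (hypothetical helper, numeric years only):
leap_year_sketch <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

leap_year_sketch(c(1900, 2000, 2024, 2025))
# FALSE TRUE TRUE FALSE
```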
The days_in_month()
function provides the number of days of a particular month. This is a constant for most months, but the number of days in February varies for leap years:
# in 2024:
(m_2024 <- as.Date("2024-01-15") + (0:11 * 30))
#> [1] "2024-01-15" "2024-02-14" "2024-03-15" "2024-04-14" "2024-05-14"
#> [6] "2024-06-13" "2024-07-13" "2024-08-12" "2024-09-11" "2024-10-11"
#> [11] "2024-11-10" "2024-12-10"
days_in_month(m_2024) # a leap year:
#> Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 31 29 31 30 31 30 31 31 30 31 30 31
# in 2025:
(m_2025 <- as.Date("2025-01-15") + (0:11 * 30))
#> [1] "2025-01-15" "2025-02-14" "2025-03-16" "2025-04-15" "2025-05-15"
#> [6] "2025-06-14" "2025-07-14" "2025-08-13" "2025-09-12" "2025-10-12"
#> [11] "2025-11-11" "2025-12-11"
days_in_month(m_2025) # no leap year:
#> Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 31 28 31 30 31 30 31 31 30 31 30 31
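One possible way to obtain such numbers in base R is to measure the distance between the first day of a month and the first day of the following month. The helper n_days_sketch() below is a hypothetical illustration of this idea and makes no claim about how days_in_month() works internally:

```r
# Number of days in the month of each date (hypothetical helper):
n_days_sketch <- function(dates) {
  vapply(seq_along(dates), function(i) {
    first <- as.Date(format(dates[i], "%Y-%m-01"))       # 1st of this month
    nxt   <- seq(first, by = "month", length.out = 2)[2] # 1st of next month
    as.integer(nxt - first)                              # days in between
  }, integer(1))
}

n_days_sketch(as.Date(c("2024-02-14", "2025-02-14", "2024-12-10")))
# 29 28 31
```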
The diff_tz()
function provides the time difference between two times t1
and t2
that is exclusively due to both times being in different time zones.
This allows assessing the time-zone based differences between two “POSIXct” times that are located in different time zones:
# Time zones differences:
tm <- "2024-01-01 01:00:00" # a nominal time
t1 <- as.POSIXct(tm, tz = "NZ")
t2 <- as.POSIXct(tm, tz = "Europe/Berlin")
t3 <- as.POSIXct(tm, tz = "US/Hawaii")
# as character (in "HH:MM"):
diff_tz(t1, t2)
#> [1] "-12:00"
diff_tz(t2, t3)
#> [1] "-11:00"
diff_tz(t1, t3)
#> [1] "-23:00"
# numeric difference (in minutes):
diff_tz(t1, t2, in_min = TRUE)
#> [1] -720
diff_tz(t2, t3, in_min = TRUE)
#> [1] -660
diff_tz(t1, t3, in_min = TRUE)
#> [1] -1380
Similarly, we can assess the time-zone based differences between two local “POSIXlt” times that are in different time zones:
# Compare 2 local times (POSIXlt):
now <- cur_time(as_string = FALSE)
t4 <- as.POSIXlt(now, tz = "NZ")
t5 <- as.POSIXlt(now, tz = "Europe/Berlin")
diff_tz(t4, t5)
#> [1] "-10:00"
diff_tz(t4, t5, in_min = TRUE)
#> [1] -600
Note that the diff_tz()
function — by providing only the time-zone based time difference — ignores any differences in the actual times:
# Ignore shift in actual time:
(t5_one_hour_later <- t5 + 3600)
#> [1] "2022-09-10 15:13:25 CEST"
diff_tz(t4, t5_one_hour_later) # as before
#> [1] "-10:00"
Shifts in time zones often coincide with shifts in locations. However, two times at the same location can nevertheless include time-zone based differences if the location happens to have switched to daylight saving time (DST) between both times:
# DST shift: Spring ahead (on 2024-03-31: 02:00:00 > 03:00:00):
s6 <- "2024-03-31 01:00:00 CET" # before DST switch
s7 <- "2024-03-31 11:00:00 CEST" # after DST switch
t6 <- as.POSIXct(s6, tz = "Europe/Berlin") # CET
t7 <- as.POSIXct(s7, tz = "Europe/Berlin") # CEST
diff_tz(t6, t7) # 1 hour forwards
#> [1] "01:00"
diff_tz(t6, t7, in_min = TRUE)
#> [1] 60
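A time-zone based difference of this kind corresponds to a difference in UTC offsets, which base R can display with the "%z" format code. The following sketch uses a hypothetical helper utc_offset_min() and is meant as an illustration, not as the implementation of diff_tz():

```r
# UTC offset of a time in minutes (hypothetical helper):
utc_offset_min <- function(time) {
  z <- format(time, "%z")  # e.g., "+0100"
  sign <- ifelse(substr(z, 1, 1) == "-", -1, 1)
  sign * (as.integer(substr(z, 2, 3)) * 60 + as.integer(substr(z, 4, 5)))
}

t6 <- as.POSIXct("2024-03-31 01:00:00", tz = "Europe/Berlin")  # CET
t7 <- as.POSIXct("2024-03-31 11:00:00", tz = "Europe/Berlin")  # CEST

utc_offset_min(t6)  # 60
utc_offset_min(t7)  # 120
utc_offset_min(t7) - utc_offset_min(t6)  # 60 (one hour forwards)
```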
10.4.5 Computing date and time differences
We may occasionally encounter use cases that require computing date or time differences (i.e., ages, expressed in various units of time). As long as we are happy to express actual time differences in terms of weeks, days, hours, minutes, or seconds, subtracting dates or times (or using the difftime() function) yields a “difftime” object (see Section 10.2.4):
# Difference between two dates:
d1 <- as.Date("2023-12-31")
d2 <- as.Date("2024-12-31")
d2 - d1
#> Time difference of 366 days
# Difference between two times:
t1 <- as.POSIXct("2024-01-01 00:00:01")
t2 <- as.POSIXct("2024-12-31 23:55:59")
t2 - t1
#> Time difference of 365.9972 days
# Using difftime() with various units:
difftime(t2, t1, units = "weeks")
#> Time difference of 52.28531 weeks
difftime(t2, t1, units = "days")
#> Time difference of 365.9972 days
difftime(t2, t1, units = "hours")
#> Time difference of 8783.933 hours
difftime(t2, t1, units = "mins")
#> Time difference of 527036 mins
difftime(t2, t1, units = "secs")
#> Time difference of 31622158 secs
Things get tricky when we want to express time differences in larger measurement units of human time (i.e., periods of years and months). As both years and months have a variable number of days, expressing time differences in these units must account for idiosyncrasies of time (like leap years, the exact number of days in each month, and leap seconds).
10.4.5.1 Example: Computing someone’s age
A surprisingly difficult problem consists in determining someone’s age when their date of birth (DOB) is known. The reason is not that some people are secretive about revealing their age, but that we cannot simply subtract their year of birth from the current year. Instead, we need to take into account whether they have already had their birthday in the current year.
Additionally, the difficulty of the problem depends on the units in which we want to express the solution. In the following, we will first consider two solutions that only take into account the number of completed years.
Let’s load some data to illustrate the problem and how we can solve it.
The dataset exp_num_dt (available in the ds4psy package or as a CSV file from rpository.com) contains the birth dates and study participation times of 1000 fictitious people.
We read the data file into a tibble dt
and select only its date-related variables:
# dt <- readr::read_csv("http://rpository.com/ds4psy/data/dt.csv") # online
dt <- ds4psy::exp_num_dt # ds4psy package
# Select only its date-time related variables:
dt <- dt %>% select(name:byear)
# dt # 1000 x 5
# Show head of dt:
knitr::kable(head(dt), caption = "Date-related data in `dt`.")
name | gender | bday | bmonth | byear |
---|---|---|---|---|
I.G. | male | 14 | 12 | 1968 |
O.B. | male | 10 | 4 | 1974 |
M.M. | male | 28 | 9 | 1987 |
V.J. | female | 15 | 2 | 1978 |
O.E. | male | 18 | 5 | 1985 |
Q.W. | male | 1 | 3 | 1968 |
Here are two possible approaches to this task:
Basic solution
Avoiding any package specialized on time-based data, we can do the following:
1. Start from today’s date and use “POSIX” format commands to extract the current year, month, and day.
2. Create a variable that tests whether someone has already had her/his birthday this year.
3. Calculate the person’s age as the difference of the current year to her/his year of birth, subtracting 1 if the person has not yet had her/his birthday this year.
Check whether a person already had her/his birthday in the current year:
## A. Compute age in (completed) years:
# 1. Describe today's date:
# (a) with base R:
today <- Sys.Date()
cur_year <- as.numeric(format(today, "%Y"))
cur_month <- as.numeric(format(today, "%m"))
cur_day <- as.numeric(format(today, "%d"))
# # (b) with lubridate:
# today <- today()
# cur_year <- lubridate::year(today)
# cur_month <- lubridate::month(today)
# cur_day <- lubridate::day(today)
# # (c) with ds4psy:
# today <- cur_date(as_string = FALSE)
# cur_year <- what_year(today, as_integer = TRUE)
# cur_month <- what_month(today, as_integer = TRUE)
# cur_day <- as.numeric(substr(today, 9, 10))
# 2. Check whether someone had his birthday this year and
# 3. subtract "TRUE" (=1) from difference in years if not:
dt2 <- dt %>%
mutate(had_bday_this_year = ((bmonth < cur_month) | ((bmonth == cur_month) & (bday <= cur_day))),
age = (cur_year - byear) - !had_bday_this_year) %>%
arrange(bmonth, bday) %>%
select(name:byear, age, had_bday_this_year, everything()) # re-arrange variables
# # Check:
# today
# tail(dt2 %>% filter(had_bday_this_year == TRUE))
# head(dt2 %>% filter(had_bday_this_year == FALSE))
Note that we used a dplyr pipe to compute the two variables had_bday_this_year
and age
, but could also have done this by computing new variables in base R.
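As a sketch of the base R alternative, the same logic can be wrapped into a small function. The function name age_in_years() and the fixed reference date are assumptions made here for illustration (a fixed date keeps the results reproducible):

```r
# Age in completed years (hypothetical helper, base R only):
age_in_years <- function(byear, bmonth, bday, ref = Sys.Date()) {
  cur_year  <- as.numeric(format(ref, "%Y"))
  cur_month <- as.numeric(format(ref, "%m"))
  cur_day   <- as.numeric(format(ref, "%d"))
  # Has the birthday already occurred in the year of ref?
  had_bday <- (bmonth < cur_month) | ((bmonth == cur_month) & (bday <= cur_day))
  # Subtract 1 (i.e., TRUE) if the birthday is still to come:
  (cur_year - byear) - !had_bday
}

# First two people of dt, evaluated on a fixed reference date:
ref <- as.Date("2022-09-10")
age_in_years(byear = c(1968, 1974), bmonth = c(12, 4), bday = c(14, 10), ref = ref)
# 53 48
```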
A lubridate solution
When using lubridate, we can compute someone’s age (in full years) as an interval and convert the result into a meaningful common-sense unit. (See Section 10.3.3 for different notions of time spans.)
# B. A lubridate solution:
library(lubridate)
td <- today()
# Exploration:
bd <- td - years(18) # today, 18 years ago (period)
(bd %--% td) # interval (of life)
#> [1] 2004-09-10 UTC--2022-09-10 UTC
(bd %--% td) / years(1) # in terms of years (as period)
#> [1] 18
(bd %--% td) %/% years(1) # integer division: completed years (as period)
#> [1] 18
# Computing age_2:
dt2 <- dt2 %>%
mutate(bdate = make_date(year = byear, month = bmonth, day = bday),
life = bdate %--% today(), # interval from bday to today()
age_2 = life %/% years(1) # integer division (into a period of full years)
) %>%
arrange(bmonth, bday) %>%
select(name:byear, bdate, age, age_2, everything())
# # Check:
# today()
# tail(dt2 %>% filter(had_bday_this_year == TRUE))
# head(dt2 %>% filter(had_bday_this_year == FALSE))
Both solutions should yield identical results:
Many additional solutions to the surprisingly difficult task of computing someone’s age (in completed years) are possible by other ways of computing and rounding time spans. In Exercise 9 of Chapter 11, we will write a function that computes someone’s age (see Section 11.6.9).
A ds4psy solution
For situations in which the base R functionality of difftime() is insufficient and the sophistication of various lubridate time spans seems unwarranted, the two following ds4psy functions are useful:
- diff_dates() computes the difference between two dates in common units of human time measurement (years, months, and days).
- diff_times() computes the difference between two times in common units of human time measurement (years, months, days, hours, minutes, and seconds).
The following examples illustrate the diff_dates()
function:
dt_1 <- as.Date("2000-01-01")
dt_2 <- as.Date("2024-12-31")
diff_dates(from_date = dt_1, to_date = dt_2)
#> [1] "24y 11m 30d"
diff_dates(dt_1, dt_2, unit = "month")
#> [1] "299m 30d"
diff_dates(dt_1, dt_2, unit = "day")
#> [1] "9131d"
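To see why such differences are harder to compute than plain day counts, consider the following base R sketch. It counts the completed months between two dates and then the remaining days. Both helpers (add_months() and diff_ymd()) are hypothetical, assume that the first date is not later than the second, and use one of several possible conventions for month ends. They are not the algorithm of diff_dates():

```r
# Shift a date by n months, clamping the day to the last day
# of the target month (e.g., Jan 31 + 1 month = Feb 29 in 2024):
add_months <- function(date, n) {
  lt <- as.POSIXlt(date)
  total <- (lt$year + 1900) * 12 + lt$mon + n  # months since year 0
  y <- total %/% 12
  m <- total %% 12 + 1
  first <- as.Date(sprintf("%04d-%02d-01", y, m))
  last_day <- as.integer(format(seq(first, by = "month", length.out = 2)[2] - 1, "%d"))
  as.Date(sprintf("%04d-%02d-%02d", y, m, min(lt$mday, last_day)))
}

# Difference between two dates in years, months, and days (from <= to):
diff_ymd <- function(from, to) {
  lf <- as.POSIXlt(from)
  lt <- as.POSIXlt(to)
  n <- (lt$year - lf$year) * 12 + (lt$mon - lf$mon)  # candidate months
  if (add_months(from, n) > to) n <- n - 1           # last month incomplete
  d <- as.integer(to - add_months(from, n))          # remaining days
  c(y = n %/% 12, m = n %% 12, d = d)
}

diff_ymd(as.Date("2000-01-01"), as.Date("2024-12-31"))
# y = 24, m = 11, d = 30
```

For the two dates used above, this sketch agrees with the output "24y 11m 30d". Other conventions for handling month ends can yield different results in edge cases.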
By default, diff_dates()
expects that from_date
and to_date
are dates (i.e., R objects of type “Date”).
If to_date
remains unspecified, the function uses the current date as its reference point:
diff_dates(from_date = dt_1)
#> [1] "22y 8m 9d"
diff_dates(from_date = dt_1, cur_date())
#> [1] "22y 8m 9d"
Actually, we have seen above that cur_date()
yields a character string.
Thus, the diff_dates()
function also works when entering strings that can be interpreted as valid dates:
If both date arguments are reversed, the result is negated:
The diff_dates()
function also works with vectors of dates.
When setting as_character = FALSE
, the results are returned numerically (as the columns of a tibble):
to_dates <- paste("2024", 1:12, "11", sep = "-")
diff_dates(from_date = "2000-01-01", to_dates, as_character = FALSE)
#> from_date to_date neg y m d
#> 1 2000-01-01 2024-01-11 24 0 10
#> 2 2000-01-01 2024-02-11 24 1 10
#> 3 2000-01-01 2024-03-11 24 2 10
#> 4 2000-01-01 2024-04-11 24 3 10
#> 5 2000-01-01 2024-05-11 24 4 10
#> 6 2000-01-01 2024-06-11 24 5 10
#> 7 2000-01-01 2024-07-11 24 6 10
#> 8 2000-01-01 2024-08-11 24 7 10
#> 9 2000-01-01 2024-09-11 24 8 10
#> 10 2000-01-01 2024-10-11 24 9 10
#> 11 2000-01-01 2024-11-11 24 10 10
#> 12 2000-01-01 2024-12-11 24 11 10
The diff_times()
function is an extension of diff_dates()
that includes temporal units within days (i.e., hours, minutes, and seconds):
t1 <- "2024-07-01 12:00:00"
t2 <- "2024-08-10 20:30:45"
diff_times(t1, t2, unit = "months")
#> [1] "1m 9d 8H 30M 45S"
diff_times()
is a somewhat experimental function that internally uses two different ways of computing the number of days between two times and provides a feedback message if the two derivations yield conflicting results. For instance, the following example yields two conflicting results:
t1 <- "2024-06-01 12:00:00"
t2 <- "2024-12-01 11:00:00"
diff_times(t1, t2, unit = "year")
#> [1] "0y 6m -1d 23H 0M 0S"
Without additional constraints, t2
can be viewed as either 5 months and 29 days or as 6 months and \(-1\) day later than t1
.
The solution provided here corresponds to the period of a lubridate interval:
Note that diff_times()
and lubridate::as.period()
can yield results that seem wrong in physical time units, but are right in human calendar sense.
For instance, when an interval contains a time shift (e.g., a switch in clock times from a winter to a summer season), we obtain two different results:
t1 <- "2024-03-31 01:00:00 CET" # before DST switch (in 2024)
t2 <- "2024-03-31 11:00:00 CET" # after DST switch (to CEST)
# Time zone and physical time difference:
diff_tz(t1, t2) # difference in time zones (1h)
#> [1] "01:00"
difftime(t2, t1) # physical time units: 9 hours
#> Time difference of 9 hours
# In human time units:
diff_times(t1, t2) # in human calendar units: 10 hours
#> [1] "0d 10H 0M 0S"
lubridate::as.period(lubridate::interval(t1, t2), unit = "hours")
#> [1] "10H 0M 0S"
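The two notions of difference can be reproduced in base R. In this sketch, the time zone "Europe/Berlin" is set explicitly (an assumption that matches the CET/CEST labels above), and wall_clock() is a hypothetical helper that re-reads a clock display as if it were UTC, so that the shift in daylight saving time is ignored:

```r
zone <- "Europe/Berlin"
t1 <- as.POSIXct("2024-03-31 01:00:00", tz = zone)  # before DST switch
t2 <- as.POSIXct("2024-03-31 11:00:00", tz = zone)  # after DST switch

# Physical time: seconds actually elapsed between both instants.
physical <- difftime(t2, t1, units = "hours")
as.numeric(physical)  # 9

# Calendar time: difference between both clock displays.
wall_clock <- function(time) {
  as.POSIXct(format(time, "%Y-%m-%d %H:%M:%S", tz = zone), tz = "UTC")
}
calendar <- difftime(wall_clock(t2), wall_clock(t1), units = "hours")
as.numeric(calendar)  # 10
```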
Sampling dates and times
R’s ability to draw random samples is an important tool for simulations (see the sample()
function in Section 1.6.4).
The ds4psy package extends this functionality to dates and times:
- sample_date() draws random samples of dates (in a specified range)
- sample_time() draws random samples of times (in a specified range)
By default, both functions draw random dates or times since the Unix epoch (i.e., “1970-01-01 00:00:00”):
# Sampling default ranges:
sample_date(size = 5)
#> [1] "2009-12-31" "1997-07-03" "1984-06-23" "2004-11-12" "2003-05-09"
sample_time(size = 5)
#> [1] "2022-05-08 03:32:53 CEST" "1973-11-27 18:43:50 CET"
#> [3] "1995-06-07 22:03:07 CEST" "1983-03-31 23:28:50 CEST"
#> [5] "1997-05-26 00:04:26 CEST"
but can be constrained to specific ranges:
# Restrict range to 1990s (1990 to 1999):
the_90s <- sample_date(from = "1990-01-01", to = "1999-12-31", size = 5)
sort(the_90s)
#> [1] "1991-08-20" "1993-03-18" "1996-05-29" "1997-08-17" "1999-11-22"
# Restrict range to last minute (allowing for duplicates):
last_min <- sample_time(from = Sys.time() - 60, to = Sys.time(),
size = 5, replace = TRUE, tz = "US/Eastern")
sort(last_min)
#> [1] "2022-09-10 14:12:30 EDT" "2022-09-10 14:13:10 EDT"
#> [3] "2022-09-10 14:13:13 EDT" "2022-09-10 14:13:18 EDT"
#> [5] "2022-09-10 14:13:20 EDT"
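As dates and times are numbers internally, sampling them can be reduced to sampling from a numeric range. The following base R sketch illustrates this idea under the assumption of uniform sampling. The variable names are chosen for this example, and the code is not the implementation of sample_date() or sample_time():

```r
set.seed(1)  # for reproducible samples

# Sampling dates: draw from the sequence of all days in a range.
from <- as.Date("1990-01-01")
to   <- as.Date("1999-12-31")
all_days  <- seq(from, to, by = "day")
some_days <- sample(all_days, size = 5)  # without replacement
sort(some_days)

# Sampling times: draw seconds since the Unix epoch uniformly.
t_from <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
t_to   <- as.POSIXct("2024-01-01 00:01:00", tz = "UTC")
secs <- runif(5, min = as.numeric(t_from), max = as.numeric(t_to))
some_times <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
sort(some_times)
```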
Before wrapping up this chapter, here are some practice exercises on the commands of this section.
Practice
- Converting times or time displays:
Compare the ds4psy functions change_tz()
and change_time()
with the lubridate functions with_tz()
and force_tz()
.
(See Exercise 4, Section 10.6.4.)
- Counting on leap years:
What does the expression 365 + is_leap_year(cur_date()) represent and why?
# Number of days in current year:
365 + is_leap_year(cur_date())
# Explanation:
cur_date() # current date (as a string)
is_leap_year(cur_date()) # is TRUE or FALSE (by coercing string into date/year)
365 + is_leap_year(cur_date()) # adds Boolean expression as either 1 or 0
# Note: Number of days in last year:
365 + is_leap_year(cur_date(as_string = FALSE) - 365.25)
# Number of days in the year of some particular date:
365 + is_leap_year(as.Date("1984-09-11"))
- Shifting dates and times in time zones:
Predict, evaluate, and explain the result of the following expressions:
# (1) Same date and different times in same time zone:
t1 <- "2024-12-01 00:00:00 NZ"
t2 <- "2024-12-01 23:59:59 NZ"
diff_tz(t1, t2) # same date and tz
# (2) Different dates and times in same time zone:
tx <- "2024-06-30 00:00:00 Europe/London"
ty <- "2024-12-01 23:59:59 Europe/London"
diff_tz(tx, ty)
- Time elapsed since historic events
Earlier in this chapter, we computed the amount of time elapsed since
- the sinking of the Titanic (defined as dt_Titanic above)
- the 9/11 attacks (defined as tm_911 above)
Use the ds4psy functions diff_dates()
and diff_times()
to compute the amount of time elapsed since these events (in some meaningful human unit) and compare the results to the base R and lubridate solutions presented earlier.
- For the amount of time elapsed since
dt_Titanic
we get:
# Sinking of the Titanic:
dt_Titanic # a "Date" object
# (a) Using ds4psy:
now <- cur_time(as_string = FALSE)
diff_dates(dt_Titanic, now)
diff_dates(dt_Titanic, now, unit = "days")
diff_times(dt_Titanic, now, unit = "years")
diff_times(dt_Titanic, now, unit = "days")
# (b) Using base R:
base::difftime(now, dt_Titanic)
# (c) Using lubridate:
lubridate::as.period(lubridate::interval(dt_Titanic, now), unit = "years")
lubridate::as.period(lubridate::interval(dt_Titanic, now), unit = "days")
- For the amount of time elapsed since
tm_911
we get:
# 9/11 attacks:
tm_911 # a "POSIXct" object
diff_dates(tm_911, now)
diff_dates(tm_911, now, unit = "days")
diff_times(tm_911, now, unit = "years")
diff_times(tm_911, now, unit = "days")
# Using base R:
base::difftime(now, tm_911)
# Using lubridate:
lubridate::as.period(lubridate::interval(tm_911, now), unit = "years")
lubridate::as.period(lubridate::interval(tm_911, now), unit = "days")
- What explains the differences between some results?
Answer: The differences in hour counts between the solutions of the ds4psy function diff_times()
and the periods of lubridate intervals can partly be explained by handling shifts in time zones in different ways:
- Checking date and time sampling functions
What kind of distributions do the ds4psy functions for sampling dates and times yield?
- Check the distributions of months and weekdays when using the sample_date() function.
- Check the distributions of hours and minutes when using the sample_time() function.
Draw a sufficiently large number of samples and then use dplyr to count the number of categorical cases (e.g., of each month, weekday, hour, etc.).
- Analysis for
sample_date()
:
N <- 100000
dts <- sample_date(from = "2000-01-01", to = "2024-12-31", size = N, replace = TRUE)
# tibble with month and wday:
dtb <- tibble(date = dts) %>%
mutate(month = what_month(date, abbr = TRUE),
wday = what_wday(date, abbr = TRUE))
# Turn variables into factors:
dtb$month <- factor(dtb$month, levels = month.abb)
dtb$wday <- factor(dtb$wday, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
# Counts: Percentages of total N:
dtb %>% group_by(month) %>% summarise(pc = n()/N * 100) # by month
dtb %>% group_by(wday) %>% summarise(pc = n()/N * 100) # by wday
Answer: The distribution of weekdays appears to be uniform.
Note that the shape of the distribution for months approximates days_in_month()
(e.g., with a minimum for February).
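The expected shape can also be derived without sampling. Assuming that every day in the range is drawn with equal probability (an assumption suggested by the simulation, not a documented guarantee), the share of each month equals its share of days in the range:

```r
# All days in the sampled range:
all_days <- seq(as.Date("2000-01-01"), as.Date("2024-12-31"), by = "day")
length(all_days)  # 9132

# Number of days per month (labels "01" to "12"):
counts <- table(format(all_days, "%m"))
counts[["01"]]  # 775 (25 Januaries with 31 days each)
counts[["02"]]  # 707 (25 Februaries, 7 of them in leap years)

# Expected percentages under uniform sampling of days:
round(100 * counts / length(all_days), 2)
```

Under this assumption, February is expected to account for about 7.7% of all samples, and each month with 31 days for about 8.5%.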
- Analogously, for
sample_time()
:
N <- 100000
ts <- sample_time(from = "2024-01-01", to = "2024-12-31", size = N, replace = TRUE)
# tibble with hour and min:
tbs <- tibble(time = ts) %>%
mutate(ts = what_time(time),
hour = substring(ts, 1, 2),
min = substring(ts, 4, 5))
# Counts: Percentages of total N:
tbs %>% group_by(hour) %>% summarise(pc = n()/N * 100) # by hour
tbs %>% group_by(min) %>% summarise(pc = n()/N * 100) # by min
Answer: The distributions of hours and minutes appear to be uniform.
10.4.6 Limitations
Currently, ds4psy mostly assumes that the variables with dates and times already exist in our data (as “Date” or “POSIXct”/“POSIXlt” objects). Thus, it provides no dedicated parsing functions for dates and times. However, its functions aim to parse date and time objects from a variety of formats (e.g., strings). For entering dates and times or generating them from other data, either use the POSIXt standard or consult one of many nifty solutions (e.g., anytime, readr, and related packages).
Also, calculations with dates and times beyond date and time differences are not supported by the ds4psy package. However, you can often calculate with R’s elementary time classes (see Section 10.2.2) and then convert the result into more convenient formats or units.
This concludes our overview of date- and time-related functions. Let’s wrap up this chapter and do some exercises to check what we have learned.