10.4 Applications

Now that we appreciate the difficulties of dealing with dates and times, we are ready for some good news: The ds4psy package provides simple and straightforward functions that address the most common date- and time-related tasks. As ds4psy provides no new object types for specific notions of dates or times, it is less powerful than lubridate and related packages. Nevertheless, it provides a range of simple functions that probably cover a large proportion of ordinary use cases.

10.4.1 Simple date and time functions

Assuming that we are dealing with data that contains dates and times, what do we typically want to do with them? Most of the time, we either want to get the current date or time, extract some specific element (e.g., the weekday, month, or year of a date), or perform simple calculations (e.g., to compute differences between dates and times).

The Date and POSIXt objects of R support all these tasks, but the corresponding syntax and conversion specifications are cryptic and can be hard to remember (see Section 10.2.4). If we want to satisfice and are primarily interested in querying or displaying existing dates and times (rather than entering new dates and times), we do not need to mess with POSIXt. Instead, the ds4psy package provides a small and opinionated collection of functions that probably cover the vast majority of practical use cases for most people.

In the following, we briefly introduce functions that address the following tasks:

  1. Querying dates and times:

    • get the current date and time
    • get some component of dates and times
  2. Computing with dates and times:

    • converting times between time zones
    • checking for temporal idiosyncrasies
    • computing date and time differences
    • sampling dates and times

Let’s load the package first:

10.4.2 Querying dates and times

Some simple functions allow asking for the current date and time, or extracting elements from existing dates and times.

Getting the current date and time

In my world, about 50% of all use cases are covered by two basic functions that ask for the current date or time:

  • cur_date(): Get the current date, in 2 different orders (and an optional separator)
  • cur_time(): Get the current time (with an optional display of seconds and a separator)

To avoid conflicts with base R and other packages — which already provide functions named date(), time(), today(), and now() — the ds4psy functions start with cur_ (for “current”).

By default, these functions return a character string denoting the current date or time. However, by setting as_string = FALSE they can be switched to returning “Date” or “POSIXct” objects:
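The difference between a character string and a date or time object can be seen in base R alone. The following is a minimal sketch that uses only base functions (not the ds4psy functions themselves); the format strings are just one plausible choice:

```r
# Current date and time as R objects:
d   <- Sys.Date()  # an object of class "Date"
now <- Sys.time()  # an object of class "POSIXct"

# The same information as character strings:
d_chr <- format(d, "%Y-%m-%d")  # e.g., "2020-07-01"
t_chr <- format(now, "%H:%M")   # e.g., "14:05"

class(d)      # "Date"
class(d_chr)  # "character"
```

Strings are convenient for display, whereas “Date” and “POSIXct” objects are needed for further computations.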

Getting date and time components

An additional 30% of all use cases are covered by some functions that ask what_ questions about the identity or position of some temporal unit in some larger continuum of time:

  • what_wday(): weekday of some date (default: today), as name (abbr or full)
  • what_week(): week number (in units of month, or year; as char or as integer)
  • what_month(): month name (abbr or full) or month number (as char or as integer)
  • what_year(): year as number (abbr or full, char or integer)

The what_ functions follow a simple heuristic: What is it that we usually want to hear as an answer x when asking “What x is it today?” or “What x is it right now?”

In contrast to the cur_date() and cur_time() functions, all of the what_-functions accept a when argument as input. If a what_ question is asked without a when argument, the current date (i.e., Sys.Date()) or time (i.e., Sys.time()) is used:

The what_ functions offer some options that provide the most common versions of these date and time components:

By supplying other dates or times to when (as scalars or vectors), we can ask and answer the what_ questions for arbitrary dates:

or arbitrary times:

Internally, all these functions provide handy shortcuts for date- and time-related commands and formatting instructions of the “POSIX” standard (see the date-time conversion functions in Section 10.2.4 and ?strftime for details). Hence, anyone who feels comfortable with POSIX will not need these functions. For the rest of us, who cannot or do not care to remember POSIX, these functions can be useful.
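To get a feeling for the POSIX formatting instructions that the what_ functions spare us, here is a minimal base R sketch. The date is an arbitrary example, and the selection of conversion specifications is our own (see ?strftime for the full list):

```r
when <- as.Date("2020-07-01")  # an arbitrary example date (a Wednesday)

format(when, "%Y")  # year with century: "2020"
format(when, "%y")  # year without century: "20"
format(when, "%m")  # month as a number: "07"
format(when, "%u")  # weekday as a number (1 = Monday): "3"
format(when, "%A")  # full weekday name (depends on the locale)
format(when, "%b")  # abbreviated month name (depends on the locale)

# format() always returns strings, so numbers require a conversion:
as.integer(format(when, "%m"))  # 7
```

Note that names of weekdays and months depend on the locale of the system, whereas the numeric components do not.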

10.4.3 Computing with dates and times

We have seen that R internally represents dates and times as numbers. As a consequence, many sophisticated computations with dates and times are possible in the numeric realm (i.e., without requiring special time-related data types or commands). However, practical applications often require that dates and times are expressed in temporal units. Things tend to get tricky when using dates and times for computing new dates and times. Some of the main tasks that appear in this context are:

  • converting times between time zones
  • checking for temporal idiosyncrasies
  • computing date and time differences
  • sampling dates and times

In the following sections, we will briefly consider each of these tasks and illustrate some ds4psy functions that facilitate their solutions.

Converting times

Perhaps another 10% of all use cases for dealing with dates and times concern time conversions between different locations. As we have seen in Section 10.3.4, we need to distinguish between an actual point in time (i.e., a particular instant in history) and a specific time display (i.e., a particular configuration on a clock). Importantly, we can either express the same actual time in different time zones (by changing the time display) or show a fixed time display in different time zones (by changing the times). The following ds4psy functions correspond to these tasks:

  1. change_tz() converts a fixed actual time into the corresponding display in a different time zone.

  2. change_time() converts a fixed time display into the corresponding time in a different time zone.

Both functions accept a time (as a scalar or vector) as input (and will aim to parse any input of dates or character strings as times). change_tz() will express this time in terms of the time zone tz:

By contrast, change_time() provides the (varying) points in time at which different time zones tz show a given time display:

Note that the last three time displays look very similar (except for the time zone), but in fact represent different points in time (and even different dates). To verify these differences, we need to convert all three different times into the same time zone:

Internally, the change_tz() function returns a local time object of class “POSIXlt”, whereas the change_time() function returns a calendar time object of class “POSIXct”. Ideally, users should not have to worry about these differences.
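The distinction between changing a time display and changing the actual time can also be sketched in base R. The following example does not use change_tz() or change_time(), but illustrates the two tasks with as.POSIXct() and format(); the time and the time zones are arbitrary choices:

```r
# A fixed point in time, defined in UTC:
t_utc <- as.POSIXct("2020-07-01 12:00:00", tz = "UTC")

# (a) Same actual time, different time displays:
format(t_utc, "%H:%M", tz = "Europe/Berlin")     # "14:00"
format(t_utc, "%H:%M", tz = "America/New_York")  # "08:00"

# (b) Same time display, different actual times:
t_berlin <- as.POSIXct("2020-07-01 12:00:00", tz = "Europe/Berlin")
t_ny     <- as.POSIXct("2020-07-01 12:00:00", tz = "America/New_York")

# Noon in New York occurs 6 hours after noon in Berlin:
difftime(t_ny, t_berlin, units = "hours")
```

In (a), the underlying number of seconds since the Unix epoch stays the same and only its display changes. In (b), the display stays the same, but the underlying number differs.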

10.4.4 Checking temporal idiosyncrasies

The tasks described so far cover about 90% of all use cases. The remaining 10% require additional machinery (e.g., some of the details from above, see Section 10.2.2, or the versatile toolbox provided by the lubridate package, see Section 10.3).

Overall, computing with dates and times would be easy if all units always had the same size. However, months differ in length and, due to the existence of leap years and leap seconds, years and months have no fixed duration, which also affects computations in other units (e.g., weeks, days, seconds). As a consequence, a common element of more sophisticated tasks involving dates and times is that we need to account for temporal idiosyncrasies. Three of them are addressed by the following functions:

  • is_leap_year(): Is a particular year (provided as a number or a “Date”) a leap year?
  • days_in_month(): The number of days in a given month (to be provided as a “Date”).
  • diff_tz(): The time difference between two time zones (in “HH:MM” format or number of minutes).

The is_leap_year() function simply asks whether a particular year (provided as a scalar or a vector of numbers or dates) is a leap year:
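The rule behind such a check is the leap year rule of the Gregorian calendar: A year is a leap year if it is divisible by 4, unless it is divisible by 100 but not by 400. A minimal base R sketch of this rule (the helper leap_year_sketch() is hypothetical and only accepts numbers, not dates):

```r
# Hypothetical helper: Gregorian leap year rule for numeric years
leap_year_sketch <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

leap_year_sketch(c(1900, 2000, 2020, 2021))  # FALSE TRUE TRUE FALSE
```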

The days_in_month() function provides the number of days of a particular month. This is a constant for most months, but the number of days in February varies for leap years:
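One way to obtain the number of days in a month with base R is to compute the difference between the first day of this month and the first day of the next month. The helper n_days_sketch() below is hypothetical and merely illustrates this idea; it is not how days_in_month() is necessarily implemented:

```r
# Hypothetical helper: number of days in the month of each date
n_days_sketch <- function(date) {
  vapply(seq_along(date), function(i) {
    first      <- as.Date(format(date[i], "%Y-%m-01"))       # 1st of this month
    next_first <- seq(first, by = "month", length.out = 2)[2] # 1st of next month
    as.integer(next_first - first)                            # difference in days
  }, integer(1))
}

dates <- as.Date(c("2020-02-10", "2021-02-10", "2021-04-30"))
n_days_sketch(dates)  # 29 28 30
```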

The diff_tz() function provides the time difference between two times t1 and t2 that is exclusively due to both times being in different time zones. This allows assessing the time-zone based differences between “POSIXct” times in different time zones:

Similarly, we can assess the time-zone based differences between “POSIXlt” times in different time zones:

Note that the diff_tz() function — by only measuring the time-zone based time difference — ignores any differences in the actual times:

Shifts in time zones often coincide with shifts in locations. However, two times at the same location can also include time-zone based differences if the location happens to have switched to daylight saving time (DST) between both times:
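The effect of DST at a single location can be made visible in base R by displaying the offset from UTC with the conversion specification "%z". This sketch uses two arbitrary times in the time zone "Europe/Berlin" and does not rely on diff_tz():

```r
t_winter <- as.POSIXct("2020-01-15 12:00:00", tz = "Europe/Berlin")
t_summer <- as.POSIXct("2020-07-15 12:00:00", tz = "Europe/Berlin")

format(t_winter, "%z")  # "+0100" (standard time)
format(t_summer, "%z")  # "+0200" (daylight saving time)

# Both displays show 12:00, but the actual time difference
# is one hour short of 182 full days (182 * 24 = 4368 hours):
as.numeric(difftime(t_summer, t_winter, units = "hours"))  # 4367
```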

10.4.5 Computing date and time differences

We occasionally encounter use cases that require computing date or time differences (i.e., ages, expressed in various units of time). As long as we are happy to express actual time differences in terms of weeks, days, hours, minutes, or seconds, subtracting dates or times (or using the difftime() function) yields a “difftime” object (see Section 10.2.4):
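As a brief reminder of how this works in base R, here is a minimal example with two arbitrary times (both defined in UTC to avoid time zone issues):

```r
t1 <- as.POSIXct("2020-07-01 08:00:00", tz = "UTC")
t2 <- as.POSIXct("2020-07-03 20:30:00", tz = "UTC")

t2 - t1  # a "difftime" object (here: in units of days)

# difftime() allows specifying the unit:
as.numeric(difftime(t2, t1, units = "hours"))  # 60.5
as.numeric(difftime(t2, t1, units = "mins"))   # 3630
```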

Things get tricky when we want to express time differences in larger measurement units of human time (i.e., periods of years and months). As both years and months have a variable number of days, expressing time differences in these units must account for idiosyncrasies of time (like leap years, the exact number of days in each month, and leap seconds).
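A few base R subtractions show why a month or a year cannot simply be converted into a fixed number of days (the dates are arbitrary examples):

```r
# "One month" can correspond to different numbers of days:
as.numeric(as.Date("2020-03-01") - as.Date("2020-02-01"))  # 29
as.numeric(as.Date("2020-04-01") - as.Date("2020-03-01"))  # 31

# "One year" can correspond to different numbers of days:
as.numeric(as.Date("2021-01-01") - as.Date("2020-01-01"))  # 366
as.numeric(as.Date("2022-01-01") - as.Date("2021-01-01"))  # 365
```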

10.4.5.1 Example: Computing someone’s age

A surprisingly difficult problem consists in determining someone’s age when their date of birth (DOB) is known. The reason is not that some people are secretive about revealing their age, but that we cannot simply subtract their year of birth from the current year. Instead, we need to take into account whether they have already had their birthday in the current year.

Additionally, the difficulty of the problem depends on the units in which we want to express the solution. In the following, we will first consider two solutions that only take into account the number of completed years.

Let’s load some data to illustrate the problem and how we can solve it. The dataset exp_num_dt (available in the ds4psy package or as a CSV-file from rpository.com) contains the birth dates and study participation times of 1000 fictitious people.

We read the data file into a tibble dt and select only its date-related variables:

Table 10.2: Date-related data in dt.
name   gender   bday   bmonth   byear
I.G.   male       14       12    1968
O.B.   male       10        4    1974
M.M.   male       28        9    1987
V.J.   female     15        2    1978
O.E.   male       18        5    1985
Q.W.   male        1        3    1968

Here are two possible approaches to this task:

A base R solution

  1. Start from today’s date and use “POSIX” format commands to extract the current year, month, and day.

  2. Create a variable that tests whether someone has already had her/his birthday this year.

  3. Calculate the person’s age as the difference between the current year and her/his year of birth, subtracting 1 if the person has not yet had her/his birthday this year.

Check whether a person already had her/his birthday in the current year.

Note that we used a dplyr pipe to compute the two variables had_bday_this_year and age, but could also have done this by computing new variables in base R.
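The three steps can also be sketched entirely in base R. The following example is a sketch under two assumptions: It re-creates only the first three rows of Table 10.2 as a data frame, and it uses a fixed reference date ref (rather than Sys.Date()) to keep the results reproducible:

```r
# First three rows of Table 10.2:
dt <- data.frame(name   = c("I.G.", "O.B.", "M.M."),
                 bday   = c(14, 10, 28),
                 bmonth = c(12, 4, 9),
                 byear  = c(1968, 1974, 1987))

# 1. Extract year, month, and day of a reference date:
ref   <- as.Date("2020-07-01")  # assumption: a fixed stand-in for Sys.Date()
cur_y <- as.integer(format(ref, "%Y"))
cur_m <- as.integer(format(ref, "%m"))
cur_d <- as.integer(format(ref, "%d"))

# 2. Did the person already have her/his birthday this year?
dt$had_bday_this_year <- (dt$bmonth < cur_m) |
  (dt$bmonth == cur_m & dt$bday <= cur_d)

# 3. Age in completed years:
dt$age <- cur_y - dt$byear - ifelse(dt$had_bday_this_year, 0, 1)

dt$age  # 51 46 32
```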

A ds4psy solution

For situations in which the base R functionality of difftime() is insufficient and the sophistication of the various lubridate time spans seems unwarranted, the following two ds4psy functions are useful:

  • diff_dates() computes the difference between two dates in common units of human time measurement (years, months, and days).

  • diff_times() computes the difference between two times in common units of human time measurement (years, months, days, hours, minutes, and seconds).

The following examples illustrate the diff_dates() function:

By default, diff_dates() expects that from_date and to_date are dates (i.e., R objects of type “Date”). If to_date remains unspecified, the function uses the current date as its reference point:

Actually, we have seen above that cur_date() yields a character string. Thus, the diff_dates() function also works when entering strings that can be interpreted as valid dates:

If both date arguments are reversed, the result is negated:

The diff_dates() function also works with vectors of dates. When setting as_character = FALSE, the results are returned numerically (as the columns of a tibble):

The diff_times() function is an extension of diff_dates() that includes temporal units within days (i.e., hours, minutes, and seconds):

diff_times() is a somewhat experimental function that internally uses two different ways of computing the number of days between two times and provides a feedback message if the two derivations yield conflicting results. For instance, the following example yields two conflicting results:

Without additional constraints, t2 can be viewed as either 5 months and 29 days or as 6 months and -1 day later than t1. The solution provided here corresponds to the period of a lubridate interval:

Sampling dates and times

R’s ability to draw random samples is an important tool for simulations (see the sample() function in Section 1.6.4). The ds4psy package extends this functionality to dates and times:

  • sample_date() draws random samples of dates (in a specified range)
  • sample_time() draws random samples of times (in a specified range)

By default, both functions draw random dates or times since the Unix epoch (i.e., “1970-01-01 00:00:00”):

but can be constrained to specific ranges:
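The basic idea of sampling dates and times from a range can be sketched with sample() in base R. This sketch illustrates the principle and is not the implementation of sample_date() or sample_time(); the ranges and the seed are arbitrary choices:

```r
set.seed(42)  # for reproducible samples

# Sampling dates: draw from a sequence of all days in a range
from <- as.Date("2020-01-01")
to   <- as.Date("2020-12-31")
s_dates <- sample(seq(from, to, by = "day"), size = 5, replace = TRUE)

# Sampling times: add a random number of seconds to a start time
t_from <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC")
t_to   <- as.POSIXct("2020-12-31 23:59:59", tz = "UTC")
n_secs <- as.numeric(difftime(t_to, t_from, units = "secs"))
s_times <- t_from + (sample.int(n_secs + 1, size = 5, replace = TRUE) - 1)

s_dates
s_times
```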

Before wrapping up this chapter, here are some practice exercises on the commands of this section.

Practice

  1. Converting times or time displays:

Compare the ds4psy functions change_tz() and change_time() with the lubridate functions with_tz() and force_tz(). (See Exercise 4, Section 10.6.4.)

  2. Counting on leap years:

What does the expression 365 + is_leap_year(cur_date()) represent and why?

  3. Shifting dates and times in time zones:

Predict, evaluate, and explain the result of the following expressions:

  4. Time elapsed since historic events:

Earlier in this chapter, we computed the amount of time elapsed

  • since the sinking of the Titanic (defined as dt_Titanic above)

  • since the 9/11 attacks (defined as tm_911 above)

Use the ds4psy functions diff_dates() and diff_times() to compute the amount of time elapsed since these events (in some meaningful human unit) and compare the results to the base R and lubridate solutions presented earlier.

  • For the amount of time elapsed since dt_Titanic we get:
  • For the amount of time elapsed since tm_911 we get:
  • What explains the differences between some results?

Answer: The differences in hour counts between the solutions of the ds4psy function diff_times() and the periods of lubridate intervals can partly be explained by handling shifts in time zones in different ways:

  5. Checking date and time sampling functions:

What kind of distributions do the ds4psy functions for sampling dates and times yield?

  • Check the distributions of months and weekdays when using the sample_date() function.
  • Check the distributions of hours and minutes when using the sample_time() function.

Draw a sufficiently large number of samples and then use dplyr to count the number of categorical cases (e.g., of each month, weekday, hour, etc.).

  • Analysis for sample_date():

Answer: The distribution of weekdays appears to be uniform. Note that the shape of the distribution for months approximates days_in_month() (e.g., with a minimum for February).
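The same kind of check can be sketched without ds4psy and dplyr. The following base R example samples uniformly from all days in an arbitrary range and tabulates months and weekdays with table(); as it uses sample() rather than sample_date(), it only illustrates the logic of the analysis:

```r
set.seed(1)  # for reproducible samples

all_days <- seq(as.Date("1970-01-01"), as.Date("2019-12-31"), by = "day")
s <- sample(all_days, size = 10000, replace = TRUE)

n_month <- table(format(s, "%m"))  # counts per month (01 to 12)
n_wday  <- table(format(s, "%u"))  # counts per weekday (1 = Monday)

n_month
n_wday
```

When every day is equally likely, shorter months tend to be sampled less often than longer ones, whereas all weekdays occur about equally often.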

  • Analogously, for sample_time():

Answer: The distributions of hours and minutes appear to be uniform.

10.4.6 Limitations

Currently, ds4psy mostly assumes that the variables with dates and times exist in our data (as “Date” or “POSIXct”/“POSIXlt” objects). Thus, it currently provides no dedicated parsing functions for dates and times. However, functions aim to parse date and time objects from a variety of formats (e.g., strings). For entering dates and times or generating them from other data, either use the POSIXt standard, or consult one of many nifty solutions (e.g., anytime, readr, and related packages).

Also, calculations with dates and times beyond date and time differences are not supported by the ds4psy package. However, you can often calculate with R’s elementary time classes (see Section 10.2.2) and then convert the result into more convenient formats or units.
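For instance, adding numbers to dates or times works with R’s elementary classes, as “Date” objects count in units of days and “POSIXct” objects count in units of seconds. A minimal sketch with arbitrary example values:

```r
d <- as.Date("2020-02-15")
d_later <- d + 30            # 30 days later
d_later                      # "2020-03-16"
format(d_later, "%d.%m.%Y")  # "16.03.2020"

tm <- as.POSIXct("2020-02-15 23:30:00", tz = "UTC")
tm_later <- tm + 3600        # 3600 seconds (i.e., one hour) later
format(tm_later, "%Y-%m-%d %H:%M", tz = "UTC")  # "2020-02-16 00:30"
```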

This concludes our overview of date- and time-related functions. Let’s wrap up this chapter and do some exercises to check what we have learned.