10.2 Essentials on dates and times

This chapter deals with date- and time-related data in R. This section briefly makes some terminological distinctions that provide a better conceptual grasp of notions of date and time (Section 10.2.1). The bulk of this section introduces base R classes and functions (Section 10.2.2) for creating and computing with dates (Section 10.2.3) and times (Section 10.2.4).

In a later section (Section 10.3), we will see that the lubridate package (Spinu et al., 2020) provides more convenient functions for the same tasks and goes beyond the base R functionality in providing several notions of time spans.

10.2.1 Terminology

This chapter is called “Time data”, but dates and times are different notions. When explicating our common-sense understanding of the terms “date” and “time”, we can distinguish between them as follows:

  • A date denotes a particular day, that is, a time span or period of time that has a beginning and an end. Although its exact demarcations and duration can be controversial, we typically think of a date as a 24-hour period of time at a particular location.

  • A time denotes a particular instant or moment within a longer period of time. Thus, a time is typically thought of as a point within a day. Whereas a day is 24 hours long, a point in time has no duration. Measuring and reporting times inevitably raises issues about conventions (e.g., AM/PM), precision (e.g., minutes, seconds, milliseconds), and location (e.g., time shifts and zones).

Based on this conceptualization, any date contains an infinite number of times, just like a line contains an infinite number of points. But when scaling up to longer periods of time (e.g., years, or decades, or centuries), we can view each date as a particular point on a larger continuum of time.

Interestingly, the complexities are reversed when defining data structures that represent dates and times. Precisely specifying a particular time also requires specifying the date within which it is embedded. Thus, our definitions of time objects will necessarily be more complex than those of date objects.

Additional challenges are provided by the specific tasks that we want to tackle with date- and time-related data. We typically want to do many different things with time: Beyond denoting specific points in time, we mostly want to transform times (e.g., between time zones) and describe and calculate with time spans. A time span can also be referred to as a duration, a time period, or as an interval. As we will see, these concepts are based on slightly different notions of time, which are distinguished by the lubridate package (see Section 10.3.3).

10.2.2 Base R classes for dates and times

Base R answers St. Augustine’s question “What then is time?” with no fewer than three classes of objects for representing dates and times:

  1. Date represents the date of days without considering times;
  2. POSIXct is the main date-time class for representing calendar time;
  3. POSIXlt is an additional date-time class for representing local time (as lists).

In the following sections, we will consider each of these classes in turn. Entering ?Dates in the R Console shows the documentation on the “Date” class; ?DateTimeClasses provides documentation on the two “POSIXt” classes.

10.2.3 Dates

The Date class is used when we are only dealing with dates, not times. As representing times is more complex than representing dates (see above), we should always use this class when we are only interested in dates (e.g., the day on which some event \(x\) happened), rather than more specific details (e.g., the time of \(x\)).

Obtaining the current date

In R, the current date can be obtained by evaluating the Sys.Date() function:

Calling Sys.Date() prints a string, but actually returns an object of class Date that can be assigned to a variable (e.g., dt):
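A minimal sketch of both steps (the variable name dt follows the text; the printed date depends on the day of evaluation):

```r
# Print the current date (the output depends on when this is run):
Sys.Date()

# Assign the returned object to a variable and inspect its class:
dt <- Sys.Date()
class(dt)  # "Date"
```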

Evaluating ?Dates in the R Console provides documentation on the “Date” class.

Figuring out what kind of object dt is, is non-trivial. Curiously, dates appear somewhat similar to numbers, but not entirely:

To reveal the internal representation of dates in R, we can apply the base R function unclass() to a date:

Curiously, evaluating unclass(dt) yields the number 18473. To find out what kind of number this is, we can apply the ds4psy function is.wholenumber() to the date dt:
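These steps can be retraced in base R. To keep the output reproducible, the sketch below assumes that dt was created on 2020-07-30 (the date that corresponds to the value 18473) and replaces the ds4psy helper with a simple base R comparison:

```r
# Assumption: a fixed date stands in for the value of Sys.Date().
dt <- as.Date("2020-07-30")

# Dates are not regarded as ordinary numbers, although their mode is numeric:
is.numeric(dt)  # FALSE
mode(dt)        # "numeric"

# Strip the class attribute to reveal the internal representation:
unclass(dt)     # 18473

# Base R stand-in for a whole-number check:
unclass(dt) == round(unclass(dt))  # TRUE

# The value is stored as a double, not as an integer:
typeof(dt)      # "double"
```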

Thus, objects of the class Date are internally represented as numbers. More specifically, dates in R are whole numbers (i.e., integer values, although R typically stores them as type double). But this insight only raises further questions:

  • How are the values of these integers being determined?

  • More specifically, what do dates count, and where does the counting begin?

When viewing integers as counts of something, these questions ask about the mechanism, the unit, and the origin of counting. The following code snippets answer these questions:
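One way of answering them in base R is sketched here (again assuming the fixed date 2020-07-30 as a stand-in for today's date). Adding 1 reveals the unit of counting, and going back by a date's internal value leads to the origin:

```r
dt <- as.Date("2020-07-30")  # assumed stand-in for Sys.Date()

# Unit: adding 1 moves the date forward by one day.
dt + 1                        # "2020-07-31"

# Origin: going back by the internal count yields the date with count zero.
origin_date <- dt - as.numeric(dt)
origin_date                   # "1970-01-01"
as.numeric(origin_date)       # 0

# Weekday as a number (0 = Sunday, ..., 4 = Thursday), independent of locale:
as.POSIXlt(origin_date)$wday  # 4
```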

Thus, the “original” date (corresponding to a count of zero) is defined as “1970-01-01” or “Thursday, 1970-01-01” and the internal value of a date denotes the number of days elapsed since then.

But this creates yet another conundrum:

  • What about dates before 1970? Do we need another system for historic dates?

Fortunately, the answer is no, as the following example shows:
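A sketch of such an example, using the date Titanic_sinks that the text refers to later (the Titanic sank on 1912-04-15):

```r
# A date before the Unix epoch:
Titanic_sinks <- as.Date("1912-04-15")
Titanic_sinks           # "1912-04-15"

# Its internal representation is a negative number of days:
unclass(Titanic_sinks)  # -21080
```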

When dates are represented as the number of days since “1970-01-01”, any earlier date is simply described by counting backwards and using negative integer values to indicate the direction of the count.

Incidentally, the date of “1970-01-01” is also known as the Unix epoch (see Wikipedia: Unix_time for details) and can be obtained by evaluating origin (of the lubridate package):

Creating (and representing) dates

So far, we have learned that Sys.Date() shows the current date and dates are internally represented as integers. However, we also need some way of entering arbitrary dates (i.e., dates that are not today’s date). And once we have created a date, we will want to query it for particular parts (e.g., its month, or weekday). As we will see, entering and interpreting dates in base R can be tricky, as it requires knowing some vocabulary of a date-time description language called “POSIX” (see below).

When dealing with dates, we essentially face two distinct tasks:

  1. We want to create new date objects by entering or parsing dates (as strings).

  2. We want to retrieve dates (or particular elements of dates) from existing date objects.

Both of these tasks require that we can denote and label particular elements of dates. That will be the job of “POSIX” and so-called “conversion specifications”. But before we learn to enter arbitrary dates, our insight that dates are actually represented as integers raises another interesting question:

  • If dates are numbers, are we to enter dates as numbers as well?

The following shows that entering dates as numbers is possible:
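A minimal sketch of entering dates as numbers. The origin is stated explicitly here, as older versions of R require it for numeric input; the values 18473 and -21080 are the day counts of 2020-07-30 and 1912-04-15:

```r
# Convert day counts (relative to the origin) into dates:
d_zero    <- as.Date(0, origin = "1970-01-01")
d_recent  <- as.Date(18473, origin = "1970-01-01")
d_titanic <- as.Date(-21080, origin = "1970-01-01")

d_zero     # "1970-01-01"
d_recent   # "2020-07-30"
d_titanic  # "1912-04-15"
```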

Thus, the date of the number 0 in R is January 1st of 1970 (i.e., the Unix epoch). However, as we typically do not think of dates in terms of integer distances from the Unix epoch, always entering dates as numbers would be rather inconvenient.

As we have already seen above (e.g., when defining the date Titanic_sinks), R accommodates our human need for depicting dates in terms of years, months, and days. Note that the input to the as.Date() function used to define the date Titanic_sinks actually was a character string:

It is good to know that this seems to work. However, it also raises further questions:

  • How does R figure out which number refers to the year, month, and day?

  • What else does R know about a given “Date” object?

The first of these questions either calls for some rule or convention (i.e., always enter and display dates in a specific order) or for a way of telling R which particular interpretation is desired by us. The second question hints at additional aspects of dates that we may want to find out, even though we did not specify them when entering the date. For instance, on which day of the week did the Titanic sink? What is the 4th month of the year called in our local language?

Both of these questions are addressed by the so-called “POSIX” standard that applies to both dates and times and will be discussed below (in Section 10.2.4). Effectively, this standard provides the elements of a language for describing and querying date and time objects to enter or obtain specific information.

For instance, by querying our Titanic_sinks date, we can obtain:
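The sketch below illustrates such queries with format() and two base R helper functions. The particular selection of elements is an assumption made for illustration; names of weekdays and months depend on the system locale:

```r
Titanic_sinks <- as.Date("1912-04-15")

# Numeric elements (independent of the system locale):
format(Titanic_sinks, "%Y")  # "1912" (4-digit year)
format(Titanic_sinks, "%m")  # "04"   (month as number)
format(Titanic_sinks, "%d")  # "15"   (day of month)

# Names depend on the locale (e.g., "Monday" and "April" in English):
format(Titanic_sinks, "%A")  # full weekday name
format(Titanic_sinks, "%B")  # full month name
weekdays(Titanic_sinks)      # same as "%A"
months(Titanic_sinks)        # same as "%B"
```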

Actually, the date of “1912-04-15” is pretty unambiguous to begin with. We automatically recognize the 4-digit number as a year; and as the number 15 can hardly refer to a month, we have no trouble parsing the date as the 15th day of the month April. However, computers are not so good at making such intuitive judgments, and our intuition can also fail us when confronted with other examples. For instance, how should we (or R) interpret the following strings of text:

In the year 2020, we are likely to interpret these strings as denoting a date in the year 2001. But do they denote the 2nd day of March, or the 3rd day of February? And does it matter for the interpretation whether a string uses the symbol “-” or “/” as a separator between numbers? (Guess the results before viewing them below.)

Here is how R interprets the strings when evaluating them as the argument of the as.Date() function:
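As a hedged illustration, the sketch below uses two strings that differ only in their separator. They are assumed stand-ins that fit the description above, not necessarily the strings of the original example:

```r
# Assumption: two strings that differ only in their separator.
s1 <- "01-02-03"
s2 <- "01/02/03"

# Without a format argument, as.Date() tries "%Y-%m-%d" and then "%Y/%m/%d":
d1 <- as.Date(s1)
d2 <- as.Date(s2)

identical(d1, d2)  # TRUE: the separator makes no difference
unclass(d1)        # -719129: a date in the year 1, not in 2001
```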

As it turns out, we were mistaken about the year, and the symbol used as separator did not matter. But without further details or examples, we still would not know which number denotes the day and which the month.

Thus, interpreting dates from data implies an enormous potential for confusion, and adding times (in Section 10.2.4) will only make matters more complicated. Essentially, these considerations illustrate an important point:

  • When representations are ambiguous, we typically rely on heuristics and conventions for their interpretation.

In the case of dates and times, such conventions regulate the identity and order of time-related units (i.e., the basic “elements” of dates, such as days, months, and years). Unfortunately, however, conventions do not yet solve our problems. As it turns out, representations of dates and times follow different conventions in different times and locations. Thus, a second insight comes into focus:

  • When heuristics and conventions yield conflicting results, we must define and rely on (ideally smart) standards.

But even if an excellent standard were created and ratified, not everyone would follow it. Thus, while technology is well-advised to adhere to standards, we still require tools for dealing with violations.

Overall, all this shows that issues of representation, interpretation, and standardization are closely intertwined. Thinking about ways of representing data raises comparative and normative questions. And even if established standards exist, gaining technological expertise requires skills and tools that allow us to cope with any inconsistencies, glitches, and outright violations that we may encounter. In short, dealing with dates and times requires handling a notoriously messy part of real-world data.

The elements of dates

This section uses an extended example that further illustrates common problems when recording dates in data and the solutions provided by R. As the example is rather long and convoluted, here is a sneak preview of its key points:

  • A single character string allows for many alternative representations of dates that correspond to different interpretations.

  • This abundance of options nurtures ambiguity, which is typically resolved by local conventions (e.g., in the EU and US). In the case of dates, even existing conventions are conflicting and confusing, as they involve inconsistent idiosyncrasies and vary between locations.

  • Overall, the best way of entering dates in data uses a 4-digit year (yyyy) and arranges the units for years, months, and days in decreasing magnitude (yyyy-mm-dd). In R, this order is used by default and explicated by the conversion specification "%Y-%m-%d".

Example:

Assume the following assignment of x:

When being told that x represents a date, we should ask two questions:

  • What date does the character string x represent?

  • Which date will the expression as.Date(x) translate into?

Note that these points really address two distinct questions: The first asks about the meaning of x (i.e., a particular date in history), whereas the second asks about the default assumptions implemented in the as.Date() function of the R programming language. Ideally, there should be a correspondence between both issues, but the very fact that we can state and check this implies that they really are different.

To answer the first question, both R and any human interpreter would need to know what the three 2-digit sequences in the character string "01-02-03" represent. It is pretty clear that one of them will denote the year (expressed in R as %y if it only contains 2 digits, or %Y if it uses the common 4-digit format), one will denote the month (%m), and one will denote the day (%d) within the current month. But knowing this still leaves room for six different orders of these three elements and corresponding interpretations (see Table 10.1 below).

To see what R would do by default, we can evaluate as.Date("01-02-03"):

Thus, R interprets as.Date(x) as 0001-02-03. This denotes the historical date of February 03, 0001 (a Saturday), is internally represented in R as the negative integer -719129, and is probably not the date we wanted to enter.

To prevent such issues and errors, we need to specify how the string "01-02-03" (or any other argument x) is to be interpreted. To this purpose, the as.Date() function takes a format argument that allows specifying a so-called “conversion specification” (provided as a character string). This second string tells R how to parse and interpret the first argument x (also provided as a string). When specifying the day, month, and year by a 2-digit number each (using %d for the day, %m for the month, and %y for the year), we can arrange these elements in 6 different orders.
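The effect of the format argument can be sketched by parsing the same string under all six orders. The vector name formats is chosen here for illustration:

```r
x <- "01-02-03"

# The six possible orders of %y, %m, and %d:
formats <- c("%y-%m-%d", "%y-%d-%m", "%m-%y-%d",
             "%m-%d-%y", "%d-%y-%m", "%d-%m-%y")

# Parse the same string under each conversion specification:
dates <- as.Date(rep(x, length(formats)), format = formats)
data.frame(format = formats, date = dates)
```

Each format yields a different date, ranging from 2001-02-03 (for "%y-%m-%d") to 2003-02-01 (for "%d-%m-%y").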

Table 10.1: Possible orders and interpretations of three date elements (%y, %m, and %d).
Nr. | Format     | Date (in R) | Details                      | Evaluation
1   | "%y-%m-%d" | 2001-02-03  | February 03, 2001 (Saturday) | best (decreasing units)
2   | "%y-%d-%m" | 2001-03-02  | March 02, 2001 (Friday)      | bad
3   | "%m-%y-%d" | 2002-01-03  | January 03, 2002 (Thursday)  | abysmal
4   | "%m-%d-%y" | 2003-01-02  | January 02, 2003 (Thursday)  | bad (despite US convention)
5   | "%d-%y-%m" | 2002-03-01  | March 01, 2002 (Friday)      | abysmal
6   | "%d-%m-%y" | 2003-02-01  | February 01, 2003 (Saturday) | ok (increasing units, EU convention)

Table 10.1 shows the 6 possible candidate representations and corresponding interpretations of the string “01-02-03” when varying the orders of three 2-digit date elements (%y, %m, and %d). The table demonstrates that order really matters for the meaning of dates. This affects how we should enter dates into data and raises the question:

  • What should we do: Which is the best order to use?

To answer this normative question, let’s inspect and evaluate the 6 candidate interpretations:

  • Orders 3 and 5 both place the digits denoting the year (%y) in the middle between day and month. This makes absolutely no sense and is therefore classified as abysmal.

  • Orders 2 and 4 are similarly bad, by putting the day (%d) in the middle between month and year. This is also pretty bad and confusing, as it lacks any internal logic and relies on memorizing this specific sequence. Nevertheless, Order 4 (mm-dd-yy or %m-%d-%y) happens to be the dominant U.S. convention.

  • Order 6 (dd-mm-yy or %d-%m-%y) puts the temporal units in (increasing) order. This at least shows some consistency (in ordering the units by increasing magnitude), and happens to correspond to the convention used in most European countries. However, the EU-convention is still suboptimal.

  • The best solution clearly is provided by Order 1 (yy-mm-dd or %y-%m-%d): By arranging the three temporal units in decreasing magnitude, a set of dates would automatically be sorted (from older to newer dates). In analogy to the alphabetic order of words that helps locating them in a dictionary, such an order is called lexicographic. In the case of dates, ordering units by decreasing magnitude (i.e., year-month-day) also happens to be the ISO standard — and is used by default in R.

To sum up, this example teaches us not only how R deals with dates, but also two important lessons for avoiding ambiguity when recording dates in all kinds of data (not just in R):

  • Lesson 1: Always enter years in the 4-digit format ("%Y").

Although using a 2-digit format for years is possible (provided that "%y" is specified), using the more explicit 4-digit format ("%Y") reduces the potential for confusion. As any numeric elements denoting the day and month parts of dates will maximally contain two digits, the four digits denoting a year are always distinguishable.

However, even when it is unambiguous which number denotes the year, the digits for month "%m" and day "%d" can still be confused (unless the value for day is greater than 12, in which case it cannot be misinterpreted as a month). This motivates our second guideline for representing dates in data:

  • Lesson 2: Use a reasonable order of units when entering dates (ideally "%Y-%m-%d", as R does).

As we have seen in Table 10.1, only two of the six possible orders exhibit some sort of logical consistency (beyond random conventions):

  1. arrange units by increasing magnitude: dd-mm-yyyy or "%d-%m-%Y" (EU convention)

  2. arrange units by decreasing magnitude: yyyy-mm-dd or "%Y-%m-%d" (ISO standard, R)

Of these orders, arranging units by decreasing magnitude has the edge by being lexicographic and implemented in international standards.

Thus, whenever dealing with dates in data and entering the realms of R, we should prefer logical consistency over our ordinary habits or loyalties (e.g., shed the local conventions used in the EU or US). In short:

  • Always enter dates in the format yyyy-mm-dd (i.e., "%Y-%m-%d" in R).

Parsing and retrieving weekdays and months in dates

In the introduction (in Section 10.1), we mentioned that dates often contain elements that are not numbers. Apart from marks of punctuation, an important part of dates are the names for the seven days of a week (e.g., Monday, Tuesday, etc.) and the twelve months (e.g., January, February, etc.).

A neat aspect of encoding dates in R is that we can retrieve these elements in the same fashion as we obtain their numeric components. In the context of the date Titanic_sinks (defined above), we used the command

to explicate its weekday, month, week number, and year (in the 4-digit format).

The names of weekdays and months are often abbreviated and vary between languages. To determine and display our current system setting, we can evaluate:

To refer to the names of weekdays and months in R’s format argument, we need to specify the following conversion specifications:

  • %a: Abbreviated weekday name (in the current system locale). Options are: Mon, Tue, Wed, Thu, Fri, Sat, Sun

  • %A: Full weekday name (…):
    Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday

  • %b: Abbreviated month name (…):
    Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec

  • %B: Full month name (…):
    January, February, March, April, May, June, July, August, September, October, November, December

For the date Titanic_sinks, using these shortcuts as part of a character string in the format argument yields:

More generally, the details of an R object encoded as a “Date” can be revealed as follows:

Just as we can retrieve the elements of dates, we can also read strings that contain dates by using the same conversion specifications. Importantly, any character in the format string that is not part of a conversion specification is interpreted literally.

Taken together with the numeric conversion specifications for days, months, and years (above), these rules and tools provide great flexibility in reading a variety of dates (given as character strings) into R objects of type “Date”:
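A few hedged examples of reading such strings. As month and weekday names depend on the locale, the sketch temporarily switches to the "C" locale (an assumption made to obtain English names on any system) and restores the original setting afterwards. The input strings are made up for illustration:

```r
# Assumption: use the "C" locale so that English names are recognized.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# Full month name, day, and 4-digit year:
d_a <- as.Date("April 15, 1912", format = "%B %d, %Y")

# Abbreviated weekday and month names:
d_b <- as.Date("Mon, 15 Apr 1912", format = "%a, %d %b %Y")

# Literal text in the format string must match the input exactly:
d_c <- as.Date("Sunk on 15.04.1912", format = "Sunk on %d.%m.%Y")

c(d_a, d_b, d_c)  # three times "1912-04-15"

# Restore the original locale setting:
invisible(Sys.setlocale("LC_TIME", old_locale))
```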

Computing with dates

Given that R objects of type “Date” are internally represented as integers, it is no surprise that we can calculate with them. However, computing with dates can be tricky, as soon as we depart from relatively small-scale units of seconds or days (see Section 10.3.3 for details and solutions).

Subtracting one date from another yields the difference between two dates (as a difftime object):

Date (and time) differences can also be computed by providing two dates (or times) to the difftime() function. Unfortunately, its options for the argument units stop at “weeks”, limiting its use for large date differences:
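Both ways of computing date differences are sketched below, assuming the fixed date 2020-07-30 as a stand-in for today's date. The divisor of 365.25 days per year is a common heuristic, not an exact method:

```r
Titanic_sinks <- as.Date("1912-04-15")
dt <- as.Date("2020-07-30")  # assumed stand-in for today's date

# Subtracting two dates yields a difftime object (in days):
d_days <- dt - Titanic_sinks
d_days                       # Time difference of 39553 days
class(d_days)                # "difftime"

# The same difference via difftime(), in the largest available unit:
d_weeks <- difftime(dt, Titanic_sinks, units = "weeks")
as.numeric(d_weeks)          # 39553 / 7, i.e., about 5650.4 weeks

# Larger units require a heuristic, e.g., an average year of 365.25 days:
as.numeric(d_days) / 365.25  # roughly 108.3 years
```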

As long as we know that dates are integers, we can easily add or subtract other numbers of days to obtain new dates:

Note that the results of these arithmetic operations are still dates (of class “Date”):
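A short sketch of such arithmetic, again assuming 2020-07-30 as a stand-in for today's date:

```r
dt <- as.Date("2020-07-30")  # assumed stand-in for today's date

# Adding or subtracting numbers of days:
next_week <- dt + 7     # "2020-08-06"
last_month <- dt - 30   # "2020-06-30"

# The results are still dates:
class(next_week)   # "Date"
class(last_month)  # "Date"
```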

Interestingly, we can apply our usual R vector magic with dates:

and an important R function like seq() even accommodates temporal units in its arguments:
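Both points can be sketched as follows (the start date is an arbitrary choice for illustration):

```r
start <- as.Date("2020-01-01")

# Vector arithmetic: seven consecutive days as one vector of dates.
week1 <- start + 0:6
week1

# seq() understands temporal units in its by argument:
by_month <- seq(start, by = "month", length.out = 3)
by_month     # "2020-01-01" "2020-02-01" "2020-03-01"

by_2weeks <- seq(start, as.Date("2020-01-31"), by = "2 weeks")
by_2weeks    # "2020-01-01" "2020-01-15" "2020-01-29"
```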

As we will see below, the same flexibility also allows for computing with times.

Practice

Here are some practice tasks on dates:

  1. Predicting dates in R:
  • Predict, evaluate, and explain the result of the following commands.

  • Which format corresponds to the conventions used in the US, EU, and R, respectively?

Hint: Note the details of the different format arguments.

Solution

  1. Reading dates into R:

Read the following dates (provided in different formats) into R (by using the as.Date function with an appropriate format):

  • "2020-07-01"
  • "1/7/2020"
  • "July 1st, 2020"
  • "2nd January, 03"
  • "Feb 29, 2020"
  • "February 29, 2010"

Solution

The three 2-digit numbers must denote the three date components %y %m and %d. In principle, these three components can be ordered in 6 different ways. However, we can easily exclude 4 such orders:

  • For 2 of them, the number 29 would denote the year %y (2029), which would be in the future (given that our survey happens in 2020).

  • For 2 others, the number 29 would denote the month %m, which does not make sense (as there are only 12 months in a year).

This leaves us with only 2 candidate interpretations:

  • "%m-%d-%y": February 29, 2010
  • "%y-%d-%m": October 29, 2002

Assuming that teenagers can take part in the survey, both of these seem possible. However, we saw in the previous exercise that there is no February 29, 2010 (as 2010 was no leap year). Hence, we have to conclude that the date denotes October 29, 2002.

Trying out the 6 possible orders in R yields:
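A sketch of this check. The string "02-29-10" is an assumption, reconstructed from the two candidate interpretations listed above:

```r
# Assumption: the date string in question (reconstructed from the text).
x <- "02-29-10"

formats <- c("%y-%m-%d", "%y-%d-%m", "%m-%y-%d",
             "%m-%d-%y", "%d-%y-%m", "%d-%m-%y")

# Impossible dates (e.g., a month 29 or February 29, 2010) yield NA:
res <- as.Date(rep(x, length(formats)), format = formats)
data.frame(format = formats, date = res)
```

Under these assumptions, three formats yield NA, two yield dates in 2029, and only "%y-%d-%m" yields a date in the past (2002-10-29).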

  1. Name ranges of weekdays and months:

We have seen how to retrieve the name of the weekday or month for a particular date.

  • Can you also construct an R expression that prints all possible names of weekdays and months?

Hint: Define a date for a Monday in mid-January and then use R’s vector arithmetic to define the dates of the next 0:6 days and in all 12 months. Then format the resulting vectors with an appropriate conversion specification.

Solution

  1. Personal life times:

Enter the date of birth and death (or current date) of some person (e.g., a celebrity or yourself) and compute their age in various units of time (years, months, weeks, days, hours, minutes, seconds).

Hint: As the difftime() function does not allow for units beyond “weeks”, use heuristics and arithmetic for computing larger units (months, years). (We will discuss more precise methods for dealing with such situations in Section 10.3.)

Solution

  1. Leap year scheduling:

As 2020 is a leap year (see Wikipedia), there was a February 29, 2020. Imagine a child was born on this date.

  • When are the next 10 years with this particular date? Use seq() to create a sequence that contains the corresponding dates of the next 10 leap years.

  • What happens when you start a yearly sequence of dates on a February 29 (for the years that are not leap years)?

  • We want more parties and presents: Can we schedule the dates for bi-annual celebrations (i.e., every 6 months)?

10.2.4 Times

This section deals with evaluating, reading, and computing with times in R. As we have noted above, times are more complicated than dates, as specifying a time includes specifying its date. Fortunately, much of what we learned about dates carries over to times. Thus, we can proceed more quickly in this section and focus on additional aspects.

One new aspect is that base R provides not only one, but two classes for representing times:

  • POSIXct is the main date-time class for representing calendar time;

  • POSIXlt is an additional date-time class for representing local time (as lists).

In the vast majority of use cases, we will use “POSIXct” objects (i.e., calendar times). As both “POSIXct” and “POSIXlt” are instances of a larger “POSIXt” class, the commands for setting and displaying (parts of) time objects apply to objects of both classes.

Obtaining the current time

By default, the Sys.time() function yields a “POSIXct” object that provides a calendar time or date-time (i.e., a combination of date and time, in a particular time zone):

Note that Sys.time() automatically added a time zone to tnow. This was chosen according to the settings of our local system (here: Evaluating Sys.timezone() yields “Europe/Berlin”).

As we have seen for Sys.Date(), calling Sys.time() prints a string, but actually returns an object of class POSIXct that can be assigned to a variable (e.g., tnow):
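A minimal sketch (the printed time and time zone depend on the system and the moment of evaluation):

```r
# Obtain the current calendar time and assign it to a variable:
tnow <- Sys.time()
tnow            # current date-time, printed in the local time zone

class(tnow)     # "POSIXct" "POSIXt"
Sys.timezone()  # name of the system's time zone (e.g., "Europe/Berlin")
```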

Entering ?DateTimeClasses in the R Console provides an overview of both “POSIXt” classes.

And just as we saw with dates, figuring out what kind of object tnow is, is non-trivial. Just like dates, times appear somewhat similar to numbers, but not entirely:

To reveal the internal representation of times in R, we can apply the base R function unclass() and the ds4psy function is.wholenumber() to tnow:
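In base R, this can be sketched as follows (the whole-number check via round() is a base R stand-in for the ds4psy helper):

```r
tnow <- Sys.time()

# Internal representation: a single number of type double.
secs <- unclass(tnow)
secs
typeof(secs)  # "double"

# Typically FALSE, as Sys.time() includes fractions of a second:
secs == round(secs)
```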

This shows that tnow is represented as a number as well, but not an integer. As it turns out, time in R is internally measured in terms of the number of seconds elapsed since some origin. This raises a profound question:

  • What is the origin of time?

with a prosaic and predictable answer (when only considering R):

  • Time is counted since the Unix epoch at “1970-01-01 00:00:00 UTC”.

This can be demonstrated by using the difftime() function as follows:
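One possible demonstration (the name t_origin is chosen here for illustration):

```r
tnow <- Sys.time()
t_origin <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")

# Seconds elapsed between the origin and now:
d <- difftime(tnow, t_origin, units = "secs")
d

# This matches the internal number of tnow:
all.equal(as.numeric(d), as.numeric(tnow))  # TRUE

# The origin itself corresponds to a count of zero:
as.numeric(t_origin)                        # 0
```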

Thus, just like Sys.Date() represented today’s date by counting the number of days elapsed since “1970-01-01”, the Sys.time() function internally represents time as the number of seconds elapsed since “1970-01-01 00:00:00 UTC” (with UTC denoting the standard time zone of “Coordinated Universal Time”).

Creating (and representing) times

As we have seen for dates (i.e., objects of class “Date”, covered above in Section 10.2.3), we essentially face two distinct tasks when dealing with times:

  1. We want to create new time objects by entering or parsing times (as strings).

  2. We want to retrieve times (or particular elements of times) from existing time objects.

Both of these tasks require that we can denote and label particular elements of time. Base R specifies the elements of times (and dates) according to the POSIX standard.

The POSIX standard

The acronym POSIX stands for “Portable Operating System Interface” and defines standards for maintaining compatibility between different computer operating systems (with the “X” implying Unix, due to its independence from a particular manufacturer, see Wikipedia: POSIX for details).

This standard allows specifying particular components of calendar times (of class “POSIXct”) or local times (of class “POSIXlt”) by providing so-called “conversion specifications”. A conversion specification is a code for a particular element of date or time objects. In code, it is used inside a character string and introduced by the symbol %, usually followed by a single letter (or O or E and then a single letter). Any character appearing in a format string that is not part of a conversion specification is interpreted literally (and the double-symbol %% yields %).

Despite the POSIX standard, the exact details of the formats used in dates and times are platform-specific. However, the following conversion specifications are likely to be widely available:

  • %a: Abbreviated weekday name (in the current locale on this platform): Thu

  • %A: Full weekday name (…): Thursday

  • %b: Abbreviated month name (…): Jul

  • %B: Full month name (…): July

  • %C: Century (00–99): the integer part of the year divided by 100.

  • %d: Day of the month as decimal number (01–31).

  • %e: Day of the month as decimal number (1–31), with a prefix space for a single-digit number.

  • %h: Equivalent to %b.

  • %H: Hours as decimal number (00–23).

  • %I: Hours as decimal number (01–12).

  • %j: Day of year as decimal number (001–366).

  • %m: Month as decimal number (01–12).

  • %M: Minute as decimal number (00–59).

  • %p: AM/PM indicator in the locale. Used in conjunction with %I and not with %H.

  • %S: Second as integer (00–61), allowing for up to two leap-seconds.

  • %u: Weekday as a decimal number (1–7, Monday is 1).

  • %U: Week of the year as decimal number (00–53), using Sunday as the first day of the week (US convention).

  • %V: Week of the year as decimal number (01–53) as defined in ISO 8601.

  • %w: Weekday as decimal number (0–6, Sunday is 0).

  • %W: Week of the year as decimal number (00–53), using Monday as the first day of week (UK convention).

  • %y: Year without century (00–99). On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19 – that is the behaviour specified by the POSIX standards, but they do also state that “in a future version the default century inferred from a 2-digit year will change”.

  • %Y: Year with century. Note that whereas there was no zero in the original Gregorian calendar, the ISO 8601:2004 standard defines it as being valid (interpreted as 1BC, see Wikipedia: zero (year)). As input, only years 0:9999 are accepted.

In practice, here are some examples of how we can use conversion specifications for retrieving particular elements of a given time object (e.g., tnow):
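To keep the output reproducible, the following sketch assumes the fixed calendar time 2020-07-30 20:19:13 (in the time zone "Europe/Berlin") as a stand-in for the value of Sys.time():

```r
# Assumption: a fixed calendar time stands in for the value of Sys.time().
tnow <- as.POSIXct("2020-07-30 20:19:13", tz = "Europe/Berlin")

format(tnow, "%Y-%m-%d")  # "2020-07-30" (date)
format(tnow, "%H:%M:%S")  # "20:19:13"   (time, 24-hour clock)
format(tnow, "%I:%M %p")  # 12-hour clock with AM/PM (locale-specific)
format(tnow, "%j")        # "212"        (day of the year)
format(tnow, "%u")        # "4"          (weekday, Monday is 1)
format(tnow, "%d.%m.%y")  # "30.07.20"   (literal dots, 2-digit year)
```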

Despite its length, this list is partial — and most people do not remember it (but look up ?strftime when they need a particular element). The following codes and shortcuts for combinations are most useful in practice:

Table 10.2: Useful options for formatting the calendar time 2020-07-30 20:19:13 (see ?strftime for details).
Option | Details                                                                   | Example
%c     | Date and time (locale-specific on output, %a %b %e %H:%M:%S %Y on input) | Thu Jul 30 20:19:13 2020
%D     | Date formatted as %m/%d/%y                                                | 07/30/20
%F     | Date equivalent to %Y-%m-%d (ISO 8601 date format)                        | 2020-07-30
%T     | Time equivalent to %H:%M:%S                                               | 20:19:13
%x     | Date (locale-specific output, %y/%m/%d on input)                          | 07/30/2020
%X     | Time (locale-specific output, %H:%M:%S on input)                          | 20:19:13
%z     | Signed offset from UTC (i.e., positive values are ahead of UTC)           | +0200
%Z     | Time zone abbreviation (empty if not available)                           | CEST

In practice, here are some examples of useful conversion specifications that are worth remembering:
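A sketch of some of these shortcuts, assuming the same fixed calendar time as a stand-in for the current time (the outputs noted for %z and %Z assume that the system's time zone database is available):

```r
# Assumption: a fixed calendar time stands in for the value of Sys.time().
tnow <- as.POSIXct("2020-07-30 20:19:13", tz = "Europe/Berlin")

format(tnow, "%F")     # "2020-07-30" (ISO 8601 date)
format(tnow, "%T")     # "20:19:13"   (time)
format(tnow, "%F %T")  # "2020-07-30 20:19:13"
format(tnow, "%D")     # "07/30/20"   (US-style date)
format(tnow, "%z")     # "+0200"      (offset from UTC)
format(tnow, "%Z")     # "CEST"       (time zone abbreviation)
```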

Evaluate ?strftime for a more comprehensive list.

Parsing and retrieving times

Armed with the POSIX standard, we can create time objects from text strings and interpret them in a variety of ways.

Here are some examples for parsing character strings into time objects (of the “POSIXt” class):

Note that these times also have a date (and therefore all date-related elements, like a year, month, weekdays, etc.). As “POSIXct” times actually are calendar times (aka. date-times), the missing elements were filled in by today’s date. Thus, rather than entering only times, it typically makes more sense to enter date-times:
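The following sketch contrasts both cases. The input strings and variable names are made up for illustration:

```r
# Time only: unspecified date elements are filled in with the current date.
t1 <- as.POSIXct("10:30", format = "%H:%M")
t1

# Full date-time in the default format, with an explicit time zone:
t2 <- as.POSIXct("2020-07-30 20:19:13", tz = "Europe/Berlin")

# Non-standard input requires a matching conversion specification:
t3 <- as.POSIXct("30.07.2020, 20:19", format = "%d.%m.%Y, %H:%M",
                 tz = "Europe/Berlin")

# strptime() parses in the same way, but returns a "POSIXlt" object:
t4 <- strptime("2020-07-30 20:19:13", format = "%Y-%m-%d %H:%M:%S",
               tz = "Europe/Berlin")
class(t4)  # "POSIXlt" "POSIXt"
```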

Having defined a time object, we can apply the same standards to retrieve particular elements of these times:

Specifying time zones

Different time zones add a local dimension to the notion of time and provide endless sources of complications for computing with dates and times. Many R functions that deal with dates or times provide a time zone argument (typically called tz or tzone). If this is unspecified (or tz = ""), most systems either use the time zone of our current system (which can be determined by evaluating Sys.timezone()) or default to “UTC”. UTC stands for “Coordinated Universal Time” (see Wikipedia: UTC for details) and is a standard time zone often used in scientific contexts and roughly equivalent to “GMT” (Greenwich Mean Time). As a bonus, UTC does not use any daylight saving time (DST).

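The base R function OlsonNames() lists the time zones available in R on the current system (593 at the time of writing; the exact number depends on the installed time zone database).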
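A brief sketch of how this list can be inspected and searched (the search pattern is only an example):

```r
tz_all <- OlsonNames()  # a character vector of time zone names

length(tz_all)          # number of time zones (varies between systems)
head(tz_all)            # the first few names

# Search for particular names:
grep("Berlin|New_York", tz_all, value = TRUE)

# Check whether a given name is valid on this system:
"Europe/Berlin" %in% tz_all
```

Checking a name with %in% before using it can be worthwhile, as R may fall back to UTC (with a warning) when given an unknown time zone.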

Why bother with specifying time zones? There are two main reasons for doing so:

  1. Specifying time zones allows for automatic adjustments for daylight saving time (DST) when computing time differences (for the same location):
  2. Specifying time zones allows computing time differences between particular times (at different locations):
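Both reasons can be illustrated by a small sketch. The dates and object names are chosen for illustration; "America/New_York" is used as the time zone name for New York City:

```r
# 1. DST: In 2020, clocks in Germany moved forward on March 29.
a <- as.POSIXct("2020-03-28 12:00:00", tz = "Europe/Berlin")
b <- as.POSIXct("2020-03-29 12:00:00", tz = "Europe/Berlin")
difftime(b, a, units = "hours")
#> Time difference of 23 hours

# The same nominal times in UTC (which has no DST) are 24 hours apart:
a_utc <- as.POSIXct("2020-03-28 12:00:00", tz = "UTC")
b_utc <- as.POSIXct("2020-03-29 12:00:00", tz = "UTC")
difftime(b_utc, a_utc, units = "hours")
#> Time difference of 24 hours

# 2. Different locations: noon in Berlin vs. 9 am in New York City
t_ber <- as.POSIXct("2020-12-24 12:00:00", tz = "Europe/Berlin")
t_nyc <- as.POSIXct("2020-12-24 09:00:00", tz = "America/New_York")
difftime(t_nyc, t_ber, units = "hours")
#> Time difference of 3 hours
```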

The following example illustrates a special case: Same (nominal) time at two different locations:
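A sketch of this special case, with an illustrative time and two illustrative locations:

```r
# Same nominal time, entered for two different time zones:
t_1 <- as.POSIXct("2020-12-24 18:00:00", tz = "Europe/Berlin")
t_2 <- as.POSIXct("2020-12-24 18:00:00", tz = "America/New_York")

t_1 == t_2  # not the same point in time
#> [1] FALSE
difftime(t_2, t_1, units = "hours")
#> Time difference of 6 hours
```

Although both objects show 18:00:00, the time in New York City occurs six hours after the time in Berlin.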

We will return to the issue of time zone conversions below (when considering “POSIXlt” objects) and in discussing related lubridate functions (in Section 10.3.4).

Computing with times

Given that times are internally represented as numbers, it is clear that we can use them in numeric expressions.

Here are examples for comparing times and for using existing times for computing new times:
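A minimal sketch with two illustrative times. As “POSIXct” objects count seconds, adding a number to a time shifts it by that many seconds:

```r
# Two illustrative times (in UTC):
t1 <- as.POSIXct("2020-07-30 20:19:13", tz = "UTC")
t2 <- as.POSIXct("2020-12-24 18:15:30", tz = "UTC")

# Comparing times:
t1 < t2
#> [1] TRUE
t1 == t2
#> [1] FALSE

# Computing new times (by adding seconds):
format(t1 + 60, "%Y-%m-%d %H:%M:%S")             # 1 minute later
#> [1] "2020-07-30 20:20:13"
format(t1 + 60 * 60, "%Y-%m-%d %H:%M:%S")        # 1 hour later
#> [1] "2020-07-30 21:19:13"
format(t1 + 60 * 60 * 24, "%Y-%m-%d %H:%M:%S")   # 24 hours later
#> [1] "2020-07-31 20:19:13"
```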

As we have seen for dates, we can compute time differences as difftime objects and express them in multiple units (but topping out at “weeks”):
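The following sketch uses two illustrative times that are exactly one week apart:

```r
t1 <- as.POSIXct("2020-12-24 18:00:00", tz = "UTC")
t2 <- as.POSIXct("2020-12-31 18:00:00", tz = "UTC")

t2 - t1  # units are chosen automatically
#> Time difference of 7 days

difftime(t2, t1, units = "mins")
#> Time difference of 10080 mins
difftime(t2, t1, units = "hours")
#> Time difference of 168 hours
difftime(t2, t1, units = "weeks")
#> Time difference of 1 weeks

# Larger units (e.g., "months" or "years") are not supported by difftime().
# Converting a difftime object into a plain number:
as.numeric(difftime(t2, t1, units = "hours"))
#> [1] 168
```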

A useful function that also works with times is trunc(). It takes a time object x and a units argument that specifies the time unit of interest, and sets all smaller units to zero:
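A short sketch with an illustrative time (named tx here). An explicit format is used for printing, as R abbreviates the default display of times at midnight:

```r
tx <- as.POSIXct("2020-07-30 20:19:13", tz = "UTC")

format(trunc(tx, units = "mins"),  "%Y-%m-%d %H:%M:%S")
#> [1] "2020-07-30 20:19:00"
format(trunc(tx, units = "hours"), "%Y-%m-%d %H:%M:%S")
#> [1] "2020-07-30 20:00:00"
format(trunc(tx, units = "days"),  "%Y-%m-%d %H:%M:%S")
#> [1] "2020-07-30 00:00:00"
```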

Note that we were only considering calculations involving existing calendar times (or date-time points) here. We will cover calculations involving particular time spans (i.e., adding or subtracting specific durations or time periods, specified in terms of minutes, hours, days, weeks, months, or years) in Section 10.3.

Local times

Time objects using the “POSIXlt” class are called local times. In contrast to the calendar time objects (of class “POSIXct”), local times (of class “POSIXlt”) are stored as lists. This allows for an easier extraction of specific components of a time.

We can convert a calendar time (e.g., tnow) into a local time by the as.POSIXlt() function:
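A minimal sketch of this conversion, assuming a fixed stand-in for tnow (rather than the result of Sys.time()) and the hypothetical name tnow_lt for the result:

```r
# Assumption: a fixed stand-in for tnow (instead of Sys.time())
tnow <- as.POSIXct("2020-07-30 20:19:13", tz = "Europe/Berlin")

tnow_lt <- as.POSIXlt(tnow)  # convert calendar time into local time

class(tnow)
#> [1] "POSIXct" "POSIXt"
class(tnow_lt)
#> [1] "POSIXlt" "POSIXt"

tnow == tnow_lt  # both denote the same point in time
#> [1] TRUE
```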

Superficially, both time constructs look the same. Their differences really are “under the hood”, in the way the corresponding time is being represented:
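A sketch of how to look under the hood, again assuming a fixed stand-in for tnow and the hypothetical name tnow_lt for its local time counterpart:

```r
# Assumption: a fixed stand-in for tnow (instead of Sys.time())
tnow    <- as.POSIXct("2020-07-30 20:19:13", tz = "Europe/Berlin")
tnow_lt <- as.POSIXlt(tnow)

# A calendar time is a single number (of seconds since the Unix epoch):
typeof(tnow)
#> [1] "double"
as.numeric(tnow)
#> [1] 1596133153

# A local time is a list of components:
typeof(tnow_lt)
#> [1] "list"
tnow_lt$hour
#> [1] 20
```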

We can create new local time objects using the POSIX standard (described above). For instance, let’s define xm_lt as 18:15:30 (in Oslo, Norway) on Xmas eve 2020:
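A possible definition of xm_lt, assuming "Europe/Oslo" as the time zone name for Oslo, Norway:

```r
xm_lt <- as.POSIXlt("2020-12-24 18:15:30", tz = "Europe/Oslo")

format(xm_lt, "%Y-%m-%d %H:%M:%S %Z")
#> [1] "2020-12-24 18:15:30 CET"
class(xm_lt)
#> [1] "POSIXlt" "POSIXt"
```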

Internally, POSIXlt objects are lists:
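The list structure can be inspected as follows (assuming the Christmas Eve time described above as xm_lt; the exact set of components may vary slightly between R versions):

```r
xm_lt <- as.POSIXlt("2020-12-24 18:15:30", tz = "Europe/Oslo")

is.list(unclass(xm_lt))  # without its class attribute, xm_lt is a list
#> [1] TRUE
names(unclass(xm_lt))    # names of its components
str(unclass(xm_lt))      # structure and values of all components
```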

The representation of local times as lists allows for an easier extraction of specific components:
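A sketch of extracting components with the $ operator (assuming the Christmas Eve time described above as xm_lt):

```r
xm_lt <- as.POSIXlt("2020-12-24 18:15:30", tz = "Europe/Oslo")

xm_lt$hour  # hours
#> [1] 18
xm_lt$min   # minutes
#> [1] 15
xm_lt$mday  # day of the month
#> [1] 24
xm_lt$mon   # month (counted from 0)
#> [1] 11
xm_lt$year  # years (since 1900)
#> [1] 120
xm_lt$wday  # day of the week (0-6, Sunday is 0)
#> [1] 4
xm_lt$yday  # day of the year (counted from 0)
#> [1] 358

# Correcting for the offsets:
xm_lt$mon + 1
#> [1] 12
xm_lt$year + 1900
#> [1] 2020
```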

Note that some values may be unexpected. This is due to certain definitions that may cause surprises:

  • mon runs from 0–11 (rather than 1–12)
  • year counts the years since 1900 (e.g., 2020 is stored as 120)

Note that truncating times (i.e., rounding off to different units) also works with POSIXlt times:

Beyond being internally stored as lists, we can ask: What are local times for? To answer this, suppose you wanted to solve the following task:

  • Task: Convert a given (calendar) time into the corresponding time of a different time zone.

Our first attempt to solve this task (by using only “POSIXct” objects) would fail as follows:
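As the original code is not shown here, the following sketch is one way of reproducing this failure. It assumes an illustrative time for t_Berlin and re-enters its nominal time with a different time zone ("America/New_York" is used as the time zone name for New York City):

```r
t_Berlin <- as.POSIXct("2020-12-24 18:00:00", tz = "Europe/Berlin")

# Attempt: Use the nominal time of t_Berlin to define a time in NYC
t_NYC_x <- as.POSIXct(format(t_Berlin, "%Y-%m-%d %H:%M:%S"),
                      tz = "America/New_York")

format(t_Berlin, "%Y-%m-%d %H:%M:%S %Z")
#> [1] "2020-12-24 18:00:00 CET"
format(t_NYC_x, "%Y-%m-%d %H:%M:%S %Z")
#> [1] "2020-12-24 18:00:00 EST"

t_NYC_x == t_Berlin  # not the same point in time
#> [1] FALSE
difftime(t_NYC_x, t_Berlin, units = "hours")
#> Time difference of 6 hours
```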

We see that the two calendar time objects t_Berlin and t_NYC_x have different time zones, but show the same nominal time. This demonstrates that simply taking a calendar time of a specific time zone to create a calendar time of a different time zone changes the time zone, but not the nominal time. As a result, the two objects denote different points in time.

Thus, to convert times (or rather, the display of times by location), we need a different approach. This is where the notion of local times (as in “POSIXlt”) comes into play:
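A minimal sketch of this approach, assuming an illustrative time for t_Berlin and "America/New_York" as the time zone name for New York City:

```r
t_Berlin <- as.POSIXct("2020-12-24 18:00:00", tz = "Europe/Berlin")

# Convert into the local time of a different time zone:
t_NYC <- as.POSIXlt(t_Berlin, tz = "America/New_York")

format(t_NYC, "%Y-%m-%d %H:%M:%S %Z")
#> [1] "2020-12-24 12:00:00 EST"

t_NYC == t_Berlin  # the same point in time
#> [1] TRUE
```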

Importantly, using the as.POSIXlt() function has shifted the time display of t_NYC (i.e., the displayed number of hours moved back by 6 units), but kept the actual point in time identical to that of t_Berlin. When expressing both times (as “POSIXct” objects) in terms of UTC, we can see that they really denote the same point in time:
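This can be checked by a sketch that displays both times in UTC (assuming an illustrative time for t_Berlin and "America/New_York" as the time zone name for New York City):

```r
t_Berlin <- as.POSIXct("2020-12-24 18:00:00", tz = "Europe/Berlin")
t_NYC    <- as.POSIXlt(t_Berlin, tz = "America/New_York")

# Display both as calendar times in UTC:
format(t_Berlin, "%Y-%m-%d %H:%M:%S", tz = "UTC")
#> [1] "2020-12-24 17:00:00"
format(as.POSIXct(t_NYC), "%Y-%m-%d %H:%M:%S", tz = "UTC")
#> [1] "2020-12-24 17:00:00"

# Both correspond to the same number of seconds since the Unix epoch:
as.numeric(t_Berlin) == as.numeric(as.POSIXct(t_NYC))
#> [1] TRUE
```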

Thus, taking a “POSIXct” object and a local time zone tz as the input of as.POSIXlt() changes both the time zone and the nominal time displayed into the corresponding time display of the local time zone.

To wrap up this section, here are some examples of converting a current (local) time — stored as the “POSIXct” object tnow — into the corresponding time display in lots of different time zones:

# Convert a time into different time zones:
tnow  # a "POSIXct" object
#> [1] "2020-07-30 20:19:13 CEST"

as.POSIXlt(tnow, tz = "NZ")
#> [1] "2020-07-31 06:19:13 NZST"
as.POSIXlt(tnow, tz = "Australia/Melbourne")
#> [1] "2020-07-31 04:19:13 AEST"

as.POSIXlt(tnow, tz = "Japan")
#> [1] "2020-07-31 03:19:13 JST"
as.POSIXlt(tnow, tz = "Asia/Shanghai")
#> [1] "2020-07-31 02:19:13 CST"
as.POSIXlt(tnow, tz = "Hongkong")
#> [1] "2020-07-31 02:19:13 HKT"

as.POSIXlt(tnow, tz = "Asia/Bangkok")
#> [1] "2020-07-31 01:19:13 +07"
as.POSIXlt(tnow, tz = "Asia/Jakarta")
#> [1] "2020-07-31 01:19:13 WIB"
as.POSIXlt(tnow, tz = "Asia/Calcutta")
#> [1] "2020-07-30 23:49:13 IST"
as.POSIXlt(tnow, tz = "Asia/Damascus")
#> [1] "2020-07-30 21:19:13 EEST"
as.POSIXlt(tnow, tz = "Asia/Tel_Aviv")
#> [1] "2020-07-30 21:19:13 IDT"

as.POSIXlt(tnow, tz = "Europe/Moscow")
#> [1] "2020-07-30 21:19:13 MSK"
as.POSIXlt(tnow, tz = "Europe/Berlin")
#> [1] "2020-07-30 20:19:13 CEST"
as.POSIXlt(tnow, tz = "Europe/London")
#> [1] "2020-07-30 19:19:13 BST"

as.POSIXlt(tnow, tz = "Iceland")
#> [1] "2020-07-30 18:19:13 GMT"

as.POSIXlt(tnow, tz = "US/Eastern")
#> [1] "2020-07-30 14:19:13 EDT"
as.POSIXlt(tnow, tz = "US/Central")
#> [1] "2020-07-30 13:19:13 CDT"
as.POSIXlt(tnow, tz = "US/Pacific")
#> [1] "2020-07-30 11:19:13 PDT"
as.POSIXlt(tnow, tz = "US/Hawaii")
#> [1] "2020-07-30 08:19:13 HST"
as.POSIXlt(tnow, tz = "Canada/Central")
#> [1] "2020-07-30 13:19:13 CDT"

as.POSIXlt(tnow, tz = "Brazil/West")
#> [1] "2020-07-30 14:19:13 -04"
as.POSIXlt(tnow, tz = "Pacific/Fiji")
#> [1] "2020-07-31 06:19:13 +12"
as.POSIXlt(tnow, tz = "Antarctica/South_Pole")
#> [1] "2020-07-31 06:19:13 NZST"
as.POSIXlt(tnow, tz = "Zulu")
#> [1] "2020-07-30 18:19:13 UTC"

We will re-consider the issue of time conversions below, when discussing the difference between the lubridate functions with_tz() and force_tz() (see Section 10.3.4).

Practice

Here are some practice tasks on dealing with dates and times in base R:

  1. Rethinking dates and times:

    • If we were to write a function for today() or now(), which types of object (or R classes) should they return? Why?

    • Why should we typically use the “Date” class — rather than the classes “POSIXct” or “POSIXlt” — to record someone’s date of birth (DOB) or date of death (DOD)?

    • Under which circumstances should we use “POSIXct” objects to record DOB or DOD data?

    • Under which circumstances should we use “POSIXlt” objects to record date- or time-related data?

    • Based on our insights into the internal representation of dates and times in R (as numbers of days or seconds from the Unix epoch, using “1970-01-01 00:00:00 UTC” as a shared reference point), what numbers do we expect to represent the date and time of barely_69 <- "1969-12-31 23:59:59"?

Solution
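For the last question, a possible sketch (assuming that the string barely_69 is interpreted in the UTC time zone) is:

```r
barely_69 <- "1969-12-31 23:59:59"

# As a date: 1 day before the Unix epoch
as.numeric(as.Date(barely_69))
#> [1] -1

# As a calendar time (in UTC): 1 second before the Unix epoch
as.numeric(as.POSIXct(barely_69, tz = "UTC"))
#> [1] -1
```

Both numbers are -1, but they use different units: days for “Date” objects and seconds for “POSIXct” objects. When interpreting the string in another time zone, the number of seconds would be shifted by the corresponding offset from UTC.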

  2. Explaining Sys.time()

    • Predict, evaluate, and explain the following expressions in your console:

    • What time zone was used to define st?

    • Predict, evaluate, and explain the following expressions:

  3. Reading and interpreting time strings:

Enter the following times into “POSIXct” objects:

  • “20:15” (of today)
  • “6.30 pm” on Christmas Eve (this year)
  • “23:59:59” on New Year’s Eve (this year, in New Zealand)
  • “8.30 am” (yesterday)
  • noon (in one week)

Hint: The last two times can be entered by specifying their dates or by adding or subtracting an appropriate time difference.

Solution

  4. Flight duration: Different times at different locations.

On December 10, 2020, a Lufthansa flight is scheduled to depart from Frankfurt am Main (Germany, FRA) at 10:50 and arrive in New York City (USA, JFK) at 13:40.

  • Compute the duration of the flight (in hours, minutes, and days).

Solution
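A possible solution sketch, assuming that both times are local times and using "Europe/Berlin" and "America/New_York" as the time zone names for Frankfurt am Main and New York City (the names dep and arr are chosen for illustration):

```r
# Departure and arrival as calendar times (in local time zones):
dep <- as.POSIXct("2020-12-10 10:50", tz = "Europe/Berlin")
arr <- as.POSIXct("2020-12-10 13:40", tz = "America/New_York")

# Flight duration (in different units):
difftime(arr, dep, units = "hours")
#> Time difference of 8.833333 hours
difftime(arr, dep, units = "mins")
#> Time difference of 530 mins
difftime(arr, dep, units = "days")
```

Without the time zones, the nominal times would suggest a flight duration of only 2 hours and 50 minutes.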

  5. Time travels:

Suppose we want to watch the New Year fireworks of Sydney, Australia, on a TV screen in

  • New Zealand (NZ)
  • Calcutta (Asia)
  • Paris (Europe)
  • New York City (USA)
  • Hawaii (USA)

What time is it at these locations?

Solution

We first define the time of interest (at the location of interest) as a calendar time (of class “POSIXct”):

In a second step, we transform this time into local time objects (of class “POSIXlt”) with the appropriate time zone tz:
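Both steps can be sketched as follows. As the task does not specify a year, the code assumes the start of the year 2021. The time zone names for the five locations are assumptions as well:

```r
# Step 1: Time of interest (as a "POSIXct" object, year is an assumption)
t_Sydney <- as.POSIXct("2021-01-01 00:00:00", tz = "Australia/Sydney")

# Step 2: Corresponding local times (as "POSIXlt" objects)
tzs <- c(NZ       = "Pacific/Auckland",
         Calcutta = "Asia/Kolkata",
         Paris    = "Europe/Paris",
         NYC      = "America/New_York",
         Hawaii   = "Pacific/Honolulu")

for (loc in names(tzs)) {
  t_loc <- as.POSIXlt(t_Sydney, tz = tzs[[loc]])
  cat(loc, ":", format(t_loc, "%Y-%m-%d %H:%M %Z"), "\n")
}
#> NZ : 2021-01-01 02:00 NZDT
#> Calcutta : 2020-12-31 18:30 IST
#> Paris : 2020-12-31 14:00 CET
#> NYC : 2020-12-31 08:00 EST
#> Hawaii : 2020-12-31 03:00 HST
```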

  6. Clustering time zones:

Note the large variety of European cities in OlsonNames().

  • Can you cluster them into categories so that all cities in a cluster use the same local time (i.e., share a time zone)?

Hint: An automated solution of this task would first use our skills in using regular expressions (see Appendix E) to detect or extract all OlsonNames() containing “Europe”, then convert a given time into local times in all these time zones, and finally sort them into clusters that show the same (local) times.
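The automated strategy of this hint could be sketched as follows. The reference time t_ref and all object names are assumptions. As the clusters are based on a single point in time (in winter), time zones that differ only in their DST rules would not be distinguished:

```r
# 1. Detect all time zone names that start with "Europe/":
tz_eu <- grep("^Europe/", OlsonNames(), value = TRUE)

# 2. Convert a reference time into the local time of each time zone:
t_ref <- as.POSIXct("2020-12-24 12:00:00", tz = "UTC")
t_loc <- vapply(tz_eu,
                function(z) format(t_ref, format = "%H:%M", tz = z),
                character(1))

# 3. Cluster time zone names by their local time:
clusters <- split(names(t_loc), t_loc)

names(clusters)          # the local times found
lengths(clusters)        # number of time zones per cluster
head(clusters[["13:00"]])  # some time zones sharing the local time of Berlin
```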

Here is a trial-and-error solution that only considers some time zones:

Solution

  • tc1 evaluates to 2020-12-24 13:30:45 — a local time (i.e., a “POSIXlt” object), as a list.
    The time is the result of converting t_x (in Europe/London) into the corresponding time at a different location (e.g., NYC, using US/Eastern time zone).

  • tc2 evaluates to 2020-12-24 18:30:45 — a calendar time (i.e., a “POSIXct” object). It took a local time of t_x (in London, UK) as its input and provided the same time in a different location (in the US/Eastern time zone).

  • tc3 evaluates to 2020-12-24 23:30:45 — a local time (i.e., a “POSIXlt” object), as a list.
    This time is the result of converting the calendar time tc2 (in US/Eastern) into the corresponding time at a different location (in Europe/London).

As these differences are subtle, but important, here are some code snippets that help to explore these time objects:
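As the definitions of these objects are not shown here, the following sketch reconstructs them from the values described above. The definition of t_x and the exact commands are assumptions, and "America/New_York" is used in place of US/Eastern:

```r
# Assumed reconstruction of the time objects:
t_x <- as.POSIXct("2020-12-24 18:30:45", tz = "Europe/London")

tc1 <- as.POSIXlt(t_x, tz = "America/New_York")  # convert time display
tc2 <- as.POSIXct(format(t_x, "%Y-%m-%d %H:%M:%S"),
                  tz = "America/New_York")       # same nominal time
tc3 <- as.POSIXlt(tc2, tz = "Europe/London")     # convert time display

format(tc1, "%Y-%m-%d %H:%M:%S %Z")
#> [1] "2020-12-24 13:30:45 EST"
format(tc2, "%Y-%m-%d %H:%M:%S %Z")
#> [1] "2020-12-24 18:30:45 EST"
format(tc3, "%Y-%m-%d %H:%M:%S %Z")
#> [1] "2020-12-24 23:30:45 GMT"

# Which objects denote the same point in time?
tc1 == t_x
#> [1] TRUE
tc2 == t_x
#> [1] FALSE
tc3 == tc2
#> [1] TRUE
```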

Overall, we have seen that base R provides ample support for creating and computing with dates and times. At the same time, dealing with three different classes and their relations to other data types (specifically: numbers) is challenging. Although most of the complications come from the inherent complexities of dates and times, the base R classes and commands for dates and times have been developing over time and appear somewhat dated today.45

In the next section, we will see that the lubridate package provides not only a more modern, but also a more user-friendly approach to dealing with dates and date-times.

References

Spinu, V., Grolemund, G., & Wickham, H. (2020). lubridate: Make dealing with dates a little easier. Retrieved from https://CRAN.R-project.org/package=lubridate


  41. See the discussions on Zeno’s paradoxes (e.g., in SEP or Wikipedia) for the philosophical implications of these concepts, and Section 10.7.5 for readable articles on the philosophy of time.

  42. As R is an object-oriented language, its objects are instances of particular classes (e.g., the integer 2 is an instance of the class “numeric”, the letter A is an instance of class “character”, etc.). The function unclass(x) removes the class attribute from an object x and provides us with a glimpse of what lies “under the hood”. When using R (or any other symbol system), we usually do not want to reflect on the details of the underlying representation. Even when aware that there exist multiple layers of representation, we normally do not want to talk about or reflect upon them. Instead, we typically use objects as objects without wondering or caring about how they are implemented. However, using unclass(x) becomes useful when we pause our normal usage and want to look beyond (or behind) the ordinary object to reveal the nature of its representation (i.e., see what x really is, in terms of its implementation).

  43. The ds4psy function is.wholenumber() does what is.integer() only appears to do: It checks whether its argument is an integer or “whole” number.

  44. For dating advice, better do that internet thing.

  45. Pardon the timely puns.