10.2 Essentials of dates and times
This chapter deals with date- and time-related data in R. This section briefly makes some terminological distinctions that provide a better conceptual grasp of notions of date and time (Section 10.2.1). The bulk of this section introduces base R classes and functions (Section 10.2.2) for creating and computing with dates (Section 10.2.3) and times (Section 10.2.4).
In a later section (Section 10.3), we will see that the lubridate package (Spinu et al., 2023) provides more convenient functions for the same tasks and goes beyond the base R functionality in providing several notions of time spans.
10.2.1 Terminology
This chapter is called “Dates and times”, but dates and times are different concepts. When explicating our common-sense understanding of the terms “date” and “time”, we can distinguish between them as follows:
A date denotes a particular day, which denotes a time span or period of time that has a beginning and a different end. Although its exact demarcations and duration can be controversial, we typically think of a date as a 24-hour period of time at a particular location.
A time denotes a particular instant or moment within a longer period of time. Thus, a time is typically thought of as a point within a day. Whereas a typical day is 24 hours long, a point in time has no duration (i.e., its beginning and end are identical). Measuring and reporting times inevitably raises many issues about conventions (e.g., AM/PM), precision (e.g., minutes, seconds, milliseconds), and location (e.g., time shifts and zones).
Based on this conceptualization, any date contains an infinite number of times, just like a line contains an infinite number of points. But when scaling up to longer periods of time (e.g., years, or decades, or centuries), we can view each date as a particular point on a larger continuum of time.57
Interestingly, the complexities are reversed when defining data structures that represent dates and times: precisely specifying a particular time also requires specifying the date within which it is embedded. Thus, our definitions of time objects will necessarily be more complex than those of date objects.
Additional challenges are provided by the specific tasks that we want to tackle with date- and time-related data. We typically want to do many different things with time: Beyond denoting specific points in time, we mostly want to transform times (e.g., between time zones) and describe and calculate with time spans. A time span can also be referred to as a duration, a time period, or as an interval. As we will see, these concepts are based on slightly different notions of time, which are distinguished by the lubridate package (see Section 10.3.3).
10.2.2 Base R classes for dates and times
The base R package answers St. Augustine’s question “What then is time?” with no fewer than three classes of objects for representing dates and times:
Date
represents the date of days without considering times;
POSIXct
is the main date-time class for representing calendar time;
POSIXlt
is an additional date-time class for representing local time (as lists).
In the following sections, we consider each of these classes in turn.
Entering ?Dates
in the R Console shows the documentation on the “Date” class;
?DateTimeClasses
provides documentation on the two “POSIXt” classes.
10.2.3 Dates
The “Date” class is used when we are only dealing with dates, not times.
As we have seen above, representing times is more complex than representing dates. We should therefore always use this class when we are only interested in dates (e.g., the day on which some event \(x\) happened), rather than in more specific details (e.g., the time of \(x\)).
Obtaining the current date
In R, the current date can be obtained by evaluating the Sys.Date()
function:
Sys.Date()
#> [1] "2022-09-10"
# Related functions:
base::date()
#> [1] "Sat Sep 10 14:13:13 2022"
lubridate::today()
#> [1] "2022-09-10"
ds4psy::cur_date()
#> [1] "2022-09-10"
Calling Sys.Date()
prints a string, but actually returns an object of class Date
that can be assigned to a variable (e.g., dt
):
# What kind of object?
dt <- Sys.Date()
dt
#> [1] "2024-10-16"
class(dt)
#> [1] "Date"
# ?Dates # shows documentation
Evaluating ?Dates
in the R Console provides documentation on the “Date” class.
Figuring out what kind of object dt
is, is non-trivial.
Curiously, dates appear somewhat similar to numbers, but also differ from them in some respects:
# What kind of object is dt?
typeof(dt)
#> [1] "double"
is.numeric(dt)
#> [1] FALSE
mode(dt)
#> [1] "numeric"
To reveal the internal representation of dates in R, we can apply the base R function unclass()
to a date:58
Curiously, evaluating unclass(dt)
yields the number 20012.
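The corresponding code is not shown here. As a minimal sketch, the same check can be run on a fixed date, so that the result does not change from day to day. The date 2024-10-16 is assumed because it matches the value of dt printed above; the name dt_fixed is introduced for illustration only.

```r
# Assumption: a fixed date instead of Sys.Date(), so the result is reproducible
dt_fixed <- as.Date("2024-10-16")

# Strip the class attribute to expose the underlying number:
unclass(dt_fixed)
#> [1] 20012
```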
To find out what kind of number this is, we can apply the ds4psy function is_wholenumber()
to the date dt
:59
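If the ds4psy package is not available, a rough base R equivalent is to test whether the number has a fractional part. The following sketch is not the ds4psy implementation; the name dt_num is introduced for illustration only.

```r
dt <- Sys.Date()
dt_num <- unclass(dt)   # underlying number of the date

# A whole number leaves no remainder when divided by 1:
dt_num %% 1 == 0
#> [1] TRUE

# Note: the value is whole, but its storage type is still "double":
typeof(dt_num)
#> [1] "double"
```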
Thus, objects of the class Date
are internally represented as numbers.
More specifically, dates in R are actually integers (i.e., whole numbers, even though their storage type is “double”, as typeof() showed above).
But this insight only raises further questions:
How are the values of these integers being determined?
More specifically, what do dates count, and where does the counting begin?
When viewing integers as counts of something, these questions ask about the mechanism, the unit, and the origin of counting. The following code snippets answer both questions:
unclass(as.Date("1970-01-01")) # date of 0 "zero"
#> [1] 0
unclass(as.Date("1970-01-02")) # counting "days"
#> [1] 1
# The following is always TRUE:
unclass(Sys.Date() - as.Date("1970-01-01")) == Sys.Date()
#> [1] TRUE
Thus, the “original” date (corresponding to a count of zero) is defined as “1970-01-01” or “Thursday, 1970-01-01” and the internal value of a date denotes the number of days elapsed since then.
But this creates yet another conundrum:
- What about dates before 1970? Do we need another system for historic dates?
Fortunately, the answer is no, as the following example shows:
dt_Titanic <- as.Date("1912-04-15")
dt_Titanic
#> [1] "1912-04-15"
# Reveal internal representation:
unclass(dt_Titanic)
#> [1] -21080
When dates are represented as the number of days since “1970-01-01”, any earlier date is simply described by counting backwards and using negative integer values to indicate the direction of the count.
Incidentally, the date of “1970-01-01” is also known as the Unix epoch (see Wikipedia: Unix_time for details) and can be obtained by evaluating origin (of the lubridate package):
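A minimal sketch, assuming that the lubridate package is installed (the requireNamespace() check is only a safeguard added here). The last lines use base R to confirm that this date has the internal value zero; the name epoch_days is introduced for illustration.

```r
# Unix epoch as provided by lubridate (only evaluated if the package is installed):
if (requireNamespace("lubridate", quietly = TRUE)) {
  print(lubridate::origin)
}

# Base R check: the epoch date corresponds to a count of 0 days
epoch_days <- as.numeric(as.Date("1970-01-01"))
epoch_days
#> [1] 0
```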
Creating (and representing) dates
So far, we have learned that Sys.Date()
shows the current date and dates are internally represented as integers.
However, we also need some way of entering arbitrary dates (i.e., dates that are not today’s date).
And once we have created a date, we will want to query it for particular parts (e.g., its month, or weekday).
As we will see, entering and interpreting dates in base R can be tricky, as it requires knowing some vocabulary of a date-time description language called “POSIX” (see below).
When dealing with dates, we essentially face two distinct tasks:
We want to create new date objects by entering or parsing dates (as strings).
We want to retrieve dates (or particular elements of dates) from existing date objects.
Both of these tasks require that we can denote and label particular elements of dates. That will be the job of “POSIX” and so-called “conversion specifications”. But before we learn to enter arbitrary dates, our insight that dates are actually represented as integers raises another interesting question:
- If dates are numbers, are we to enter dates as numbers as well?
The following shows that entering dates as numbers is possible:
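The original example is not shown here. As a sketch, as.Date() accepts a number of days together with an origin argument. Recent versions of R default to the Unix epoch when origin is omitted, but stating it explicitly also works in older versions.

```r
# Dates entered as numbers of days relative to the origin:
as.Date(0, origin = "1970-01-01")
#> [1] "1970-01-01"
as.Date(1, origin = "1970-01-01")
#> [1] "1970-01-02"
as.Date(-21080, origin = "1970-01-01")  # the value of unclass(dt_Titanic)
#> [1] "1912-04-15"
```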
Thus, the date of the number 0 in R is the 1st of January in 1970 (i.e., the Unix epoch). However, as we typically do not think of dates in terms of integer distances from the Unix epoch, always entering dates as numbers would be rather inconvenient.
As we have already seen above (e.g., when defining a date like dt_Titanic), R accommodates our human need for depicting dates in terms of years, months, and days. Note that the input to the as.Date() function used to define the date dt_Titanic actually was a character string:
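For reference, here is the earlier definition again, with a check of the input type (the name titanic_string is introduced here for illustration only):

```r
titanic_string <- "1912-04-15"   # a character string, not a number
is.character(titanic_string)
#> [1] TRUE

dt_Titanic <- as.Date(titanic_string)
class(dt_Titanic)
#> [1] "Date"
```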
It is good to know that this seems to work. However, it also raises further questions:
How does R figure out which number refers to the year, month, and day?
What else does R know about a given “Date” object?
The first of these questions either calls for some rule or convention (i.e., always enter and display dates in a specific order) or for a way of telling R which particular interpretation is desired by us. The second question hints at additional aspects of dates that we may want to find out, even though we did not specify them when entering the date. For instance, on which day of the week did the Titanic sink? What is the 4th month of the year called in our local language?
Both of these questions are addressed by the so-called “POSIX” standard that applies to both dates and times and will be discussed below (in Section 10.2.4). Effectively, this standard provides the elements of a language for describing and querying date and time objects to enter or obtain specific information.
For instance, by querying our dt_Titanic
date, we can obtain:
format(dt_Titanic, "The date %F denotes a %A in %B, in the %V. week of the year %Y.")
#> [1] "The date 1912-04-15 denotes a Monday in April, in the 16. week of the year 1912."
Actually, the date of “1912-04-15” is pretty unambiguous to begin with. We automatically recognize the 4-digit number as a year; and as the number 15 can hardly refer to a month, we have no trouble parsing the date as the 15th day of the month April. However, computers are not so good at making such intuitive judgments, and our intuition can also fail us when we are confronted with other examples. For instance, how should we (or R) interpret the following strings of text:
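The strings themselves are not shown here. Judging from the discussion that follows, they are presumably of the following kind (the names s1 and s2 are assumptions made for this sketch):

```r
s1 <- "01-02-03"   # numbers separated by "-"
s2 <- "01/02/03"   # numbers separated by "/"
```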
In the year 2024, we are likely to interpret these strings as denoting a date in the year 2001. But do they denote the 2nd day of March, or the 3rd day of February? And does it matter for the interpretation whether a string uses the symbol “-” or “/” as a separator between numbers? (Guess the results before viewing them below.)
Here is how R interprets the strings when evaluating them as the argument of the as.Date()
function:
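A sketch of this evaluation, assuming the two strings "01-02-03" and "01/02/03". Because the printed form of years with fewer than four digits can differ between platforms, the year is extracted numerically here rather than shown as printed output.

```r
d1 <- as.Date("01-02-03")
d2 <- as.Date("01/02/03")

# Both separators lead to the same date:
d1 == d2
#> [1] TRUE

# The first number was read as a full year (i.e., the year 1), not as 2001:
as.POSIXlt(d1)$year + 1900
#> [1] 1
```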
As it turns out, we were mistaken about the year, and the symbol used as separator did not matter. But without further details or examples, we still would not know which number denotes the day and which the month.
Thus, interpreting dates from data implies an enormous potential for confusion, and adding times (in Section 10.2.4) will only make matters more complicated. Essentially, these considerations illustrate an important point:
- When representations are ambiguous, we typically rely on heuristics and conventions for their interpretation.
In the case of dates and times, such conventions regulate the identity and order of time-related units (i.e., the basic “elements” of dates, such as days, months, and years). Unfortunately, however, conventions do not yet solve our problems. As it turns out, representations of dates and times follow different conventions in different times and locations. Thus, a second insight comes into focus:
- When heuristics and conventions yield conflicting results, we must define and rely on (ideally smart) standards.
But even if an excellent standard was created and ratified, not everyone would follow it. Thus, while technology is well-advised to adhere to standards, we still require tools for dealing with violations.
Overall, all this shows that issues of representation, interpretation, and standardization are closely intertwined. Thinking about ways of representing data raises comparative and normative questions. And even if established standards exist, gaining technological expertise requires skills and tools that allow us to cope with any inconsistencies, glitches, and outright violations that we may encounter. In short, dealing with dates and times requires handling a notoriously messy part of real-world data.
The elements of dates
This section uses an extended example that further illustrates common problems when recording dates in data and the solutions provided by R. As the example is rather long and convoluted, here is a sneak preview of its key points:
A single character string allows for many alternative representations of dates that correspond to different interpretations.
This abundance of options nurtures ambiguity, which is typically resolved by local conventions (e.g., in the EU and US). In the case of dates, even existing conventions are conflicting and confusing, as they involve inconsistent idiosyncrasies and vary between locations.
Overall, the best way of entering dates in data uses a 4-digit year (yyyy) and arranges the units for years, months, and days in decreasing magnitude (yyyy-mm-dd). In R, this order is used by default and explicated by the conversion specification
"%Y-%m-%d"
.
Example:
Assume the following assignment of x:
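The assignment itself is missing here. Given the string discussed in the following paragraphs, it is presumably:

```r
x <- "01-02-03"   # a character string that is meant to represent a date
```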
When being told that x
represents a date, we should ask two questions:
What date does the character string x represent?
Which date will the expression as.Date(x) translate into?
Note that these points really address two distinct questions: The first asks about the meaning of x
(i.e., a particular date in history), whereas the second asks about the default assumptions implemented in the as.Date()
function of the R programming language. Ideally, there should be a correspondence between both issues, but the very fact that we can state and check this implies that they really are different.
To answer the first question, both R and any human interpreter would need to know what the three 2-digit sequences in the character string "01-02-03"
represent. It is pretty clear that one of them will denote the year (expressed in R as %y
if it only contains 2 digits, or %Y
if it used the common 4-digit format), one will denote the month (%m
), and one will denote the day (%d
) within the current month. But knowing this still leaves room for six different orders of these three elements and corresponding interpretations (see Table 10.1 below).
To see what R would do by default, we can evaluate as.Date("01-02-03")
:
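A sketch of this default behavior. The printed date may appear as "0001-02-03" or "1-02-03", depending on the platform, so the example inspects the internal number and the parsed month and day instead (the name d_default is introduced for illustration):

```r
x <- "01-02-03"
d_default <- as.Date(x)   # no format given: R tries "%Y-%m-%d" first

# Internal representation (days relative to 1970-01-01):
as.numeric(d_default)
#> [1] -719129

# Month and day were parsed as February 03:
format(d_default, "%m-%d")
#> [1] "02-03"
```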
Thus, R interprets as.Date(x) as 1-02-03, which denotes the historical date of February 03, 0001 (Saturday) and is internally represented in R as the negative integer -719129. This is probably not the date we wanted to enter.
To prevent such issues and errors, we need to specify how the string "01-02-03"
(or any other argument x
) is to be interpreted.
To this purpose, the as.Date()
function takes a format
argument that allows specifying a so-called “conversion specification” (provided as a character string). This second string tells R how to parse and interpret the first argument x
(also provided as a string). When specifying the day, month, and year by a 2-digit number each (using %d for the day, %m for the month, and %y for the year), we can arrange these elements in six different orders.
Nr. | Format: | Date (in R): | Details: | Evaluation: |
---|---|---|---|---|
1 | "%y-%m-%d" | 2001-02-03 | February 03, 2001 (Saturday) | best (decreasing units) |
2 | "%y-%d-%m" | 2001-03-02 | March 02, 2001 (Friday) | bad |
3 | "%m-%y-%d" | 2002-01-03 | January 03, 2002 (Thursday) | abysmal |
4 | "%m-%d-%y" | 2003-01-02 | January 02, 2003 (Thursday) | bad (despite US convention) |
5 | "%d-%y-%m" | 2002-03-01 | March 01, 2002 (Friday) | abysmal |
6 | "%d-%m-%y" | 2003-02-01 | February 01, 2003 (Saturday) | ok (increasing units, EU convention) |
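The six dates listed in the table can be reproduced with a short loop over the candidate formats. This is only a sketch: the names formats and parsed, and the use of sapply(), are choices made here for illustration.

```r
x <- "01-02-03"

# All six orders of the 2-digit elements %y, %m, and %d:
formats <- c("%y-%m-%d", "%y-%d-%m", "%m-%y-%d",
             "%m-%d-%y", "%d-%y-%m", "%d-%m-%y")

# Parse x under each format and show the resulting date in ISO format:
parsed <- sapply(formats, function(f) format(as.Date(x, format = f), "%Y-%m-%d"))
parsed
```

The result is a named character vector in which each format is paired with the date it produces, in the same order as the rows of the table.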
Table 10.1 shows the six possible candidate representations and corresponding interpretations of the string “01-02-03” when varying the orders of three 2-digit date elements (%y
, %m
, and %d
). The table demonstrates that order really matters for the meaning of dates. This affects how we should enter dates into data and raises the question:
- What should we do: Which is the best order to use?
To answer this normative question, let’s inspect and evaluate the 6 candidate interpretations:
Orders 3 and 5 both place the digits denoting the year (%y) in the middle between day and month. This makes absolutely no sense and is therefore classified as abysmal.
Orders 2 and 4 are similarly bad, by putting the day (%d) in the middle between month and year. This is also pretty bad and confusing, as it lacks any internal logic and relies on memorizing this specific sequence. Nevertheless, Order 4 (mm-dd-yy or %m-%d-%y) happens to be the dominant U.S. convention.
Order 6 (dd-mm-yy or %d-%m-%y) puts the temporal units in (increasing) order. This at least shows some consistency (in ordering the units by increasing magnitude), and happens to correspond to the convention used in most European countries. However, the EU convention is still suboptimal.
The best solution clearly is provided by Order 1 (yy-mm-dd or %y-%m-%d): By arranging the three temporal units in decreasing magnitude, a set of dates would automatically be sorted (from older to newer dates). In analogy to the alphabetic order of words that helps locating them in a dictionary, such an order is called lexicographic. In the case of dates, ordering units by decreasing magnitude (i.e., year-month-day) also happens to be the ISO standard, and is used by default in R.
To sum up, this example teaches us not only how R deals with dates, but also two important lessons for avoiding ambiguity when recording dates in all kinds of data (not just in R):
- Lesson 1: Always enter years in the 4-digit format (
"%Y"
).
Although using a 2-digit format for years is possible (provided that "%y"
is specified), using the more explicit 4-digit format ("%Y"
) reduces the potential for confusion. As any numeric elements denoting the day and month parts of dates will maximally contain two digits, the four digits denoting a year are always distinguishable.
However, even when it is unambiguous which number denotes the year, the digits for month "%m"
and day "%d"
can still be confused (unless the value for day is greater than 12, in which case it cannot be misinterpreted as a month).
This motivates our second guideline for representing dates in data:
- Lesson 2: Use a reasonable order of units when entering dates (ideally
"%Y-%m-%d"
, as R does).
As we have seen in Table 10.1, only two of the six possible orders exhibit some sort of logical consistency (beyond random conventions):
arrange units by increasing magnitude: dd-mm-yyyy or "%d-%m-%Y" (EU convention)
arrange units by decreasing magnitude: yyyy-mm-dd or "%Y-%m-%d" (ISO standard, R)
Of these orders, arranging units by decreasing magnitude has the edge by being lexicographic and implemented in international standards.
Thus, whenever dealing with dates in data and entering the realms of R, we should prefer logical consistency over our ordinary habits or loyalties (e.g., shed the local conventions used in the EU or US). In short:
- Always enter dates in the format yyyy-mm-dd (i.e.,
"%Y-%m-%d"
in R).
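The practical benefit of the decreasing order shows when dates are stored as plain character strings. In the following sketch (with made-up example dates and illustrative variable names), sorting the ISO-formatted strings yields chronological order, whereas sorting the same dates in dd-mm-yyyy format does not:

```r
dates <- as.Date(c("2021-03-05", "2019-12-20", "2020-07-10"))

iso_strings <- format(dates, "%Y-%m-%d")   # decreasing units
eu_strings  <- format(dates, "%d-%m-%Y")   # increasing units

sort(iso_strings)   # chronological order
#> [1] "2019-12-20" "2020-07-10" "2021-03-05"
sort(eu_strings)    # ordered by day of the month, not by time
#> [1] "05-03-2021" "10-07-2020" "20-12-2019"
```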
Parsing and retrieving weekdays and months in dates
In the introduction (in Section 10.1), we mentioned that dates often contain elements that are not numbers. Apart from marks of punctuation, an important part of dates are the names for the seven days of a week (e.g., Monday, Tuesday, etc.) and the twelve months (e.g., January, February, etc.).
A neat aspect of encoding dates in R is that we can retrieve these elements in the same fashion as we obtain their numeric components.
In the context of the date dt_Titanic
(defined above), we used the command
format(dt_Titanic, "The date %F denotes a %A in %B, in the %V. week of the year %Y.")
#> [1] "The date 1912-04-15 denotes a Monday in April, in the 16. week of the year 1912."
to explicate its weekday, month, week number, and year (in the 4-digit format).
The names of weekdays and months are often abbreviated and vary between languages. To determine and display our current system setting, we can evaluate:
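The original command is not shown here. One way of inspecting the relevant setting in base R is Sys.getlocale(), whose category "LC_TIME" governs the names of weekdays and months. The output depends on the system and is therefore omitted.

```r
# Locale category used for formatting dates and times:
Sys.getlocale(category = "LC_TIME")

# All locale settings at once:
Sys.getlocale()
```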
To refer to the names of weekdays and months in R’s format
argument, we need to specify the following conversion specifications:
%a: Abbreviated weekday name (in the current system locale). Options are: Mon, Tue, Wed, Thu, Fri, Sat, Sun
%A: Full weekday name (…): Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
%b: Abbreviated month name (…): Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
%B: Full month name (…): January, February, March, April, May, June, July, August, September, October, November, December
For the date dt_Titanic
, using these shortcuts as part of a character string in the format
argument yields:
dt_Titanic # a date:
#> [1] "1912-04-15"
format(dt_Titanic, format = "%a") # abbr. weekday name:
#> [1] "Mon"
format(dt_Titanic, format = "%A") # full weekday name:
#> [1] "Monday"
format(dt_Titanic, format = "%b") # abbr. month name:
#> [1] "Apr"
format(dt_Titanic, format = "%B") # full month name:
#> [1] "April"
More generally, the details of an R object encoded as a “Date” can be revealed as follows:
explicate_date <- "The date '%y-%m-%d' denotes a %A in %B, %Y."
format(Sys.Date(), format = explicate_date)
#> [1] "The date '22-09-10' denotes a Saturday in September, 2022."
Just as we can retrieve the elements of dates, we can also read strings that contain dates by using the same conversion specifications. Importantly, any character in the format string that is not part of a conversion specification is interpreted literally.
Taken together with the numeric conversion specifications for days, months, and years (above), these rules and tools provide great flexibility in reading a variety of dates (given as character strings) into R objects of type “Date”:
as.Date("Monday, 2020-06-29", format = "%A, %Y-%m-%d")
#> [1] "2020-06-29"
as.Date("Tue, 20/06/30", format = "%a, %y/%m/%d")
#> [1] "2020-06-30"
as.Date("Jul 1, 2020", format = "%b %d, %Y")
#> [1] "2020-07-01"
as.Date("the 2nd day of July (in 2020)", format = "the %dnd day of %B (in %Y)")
#> [1] "2020-07-02"
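Because literal characters must match exactly, even a small mismatch between the string and its format makes parsing fail. As a sketch with made-up inputs, as.Date() then returns NA rather than raising an error:

```r
# The separator in the format ("-") differs from the one in the string ("/"):
as.Date("2020/07/01", format = "%Y-%m-%d")
#> [1] NA

# With a matching format, the same string is parsed as expected:
as.Date("2020/07/01", format = "%Y/%m/%d")
#> [1] "2020-07-01"
```

Checking results for NA values (e.g., with is.na()) is therefore a useful habit after reading dates from data.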
Computing with dates
Given that R objects of type “Date” are internally represented as integers, it is no surprise that we can calculate with them. However, computing with dates can be tricky, as soon as we depart from relatively small-scale units of seconds or days (see Section 10.3.3 for details and solutions).
Subtracting one date from another yields the difference between the two dates (as a difftime object):
dt1 <- Sys.Date()
dt2 <- as.Date("2020-01-01")
(dtd <- dt1 - dt2) # difftime (in days)
#> Time difference of 983 days
class(dtd)
#> [1] "difftime"
Date (and time) differences can also be computed by providing two dates (or times) to the difftime()
function.
Unfortunately, its options for the argument units
stop at “weeks”, limiting its use for large date differences:
# Other units:
difftime(dt1, dt2, units = "weeks")
#> Time difference of 140.4286 weeks
difftime(dt1, dt2, units = "days")
#> Time difference of 983 days
difftime(dt1, dt2, units = "hours")
#> Time difference of 23592 hours
difftime(dt1, dt2, units = "mins")
#> Time difference of 1415520 mins
difftime(dt1, dt2, units = "secs")
#> Time difference of 84931200 secs
# larger date difference:
difftime(dt1, dt_Titanic, units = "weeks")
#> Time difference of 5760.714 weeks
As long as we know that dates are integers, we can easily add or subtract other numbers of days to obtain new dates:
# Create new dates by adding/subtracting days (as integers):
dt1 + 14 # 14 days later
#> [1] "2022-09-24"
dt2 - 8 # 8 days earlier
#> [1] "2019-12-24"
Note that the results of these arithmetic operations are still dates (of class “Date”):
format(dt1 + 14, explicate_date) # 14 days later
#> [1] "The date '22-09-24' denotes a Saturday in September, 2022."
format(dt2 - 8, explicate_date) # 8 days earlier
#> [1] "The date '19-12-24' denotes a Tuesday in December, 2019."
Interestingly, we can apply our usual R vector magic with dates:
# Add a numeric vector to a date:
(next_week <- Sys.Date() + 0:6)
#> [1] "2024-10-16" "2024-10-17" "2024-10-18" "2024-10-19" "2024-10-20"
#> [6] "2024-10-21" "2024-10-22"
and an important R function like seq()
even accommodates temporal units in its arguments:
# Create a sequence of calendar dates:
(every_2_weeks <- seq(dt1, length = 10, by = "2 weeks"))
#> [1] "2022-09-10" "2022-09-24" "2022-10-08" "2022-10-22" "2022-11-05"
#> [6] "2022-11-19" "2022-12-03" "2022-12-17" "2022-12-31" "2023-01-14"
(every_2_months <- seq(dt1, length = 3, by = "2 months"))
#> [1] "2022-09-10" "2022-11-10" "2023-01-10"
(every_quarter <- seq(dt1, length = 4, by = "quarter"))
#> [1] "2022-09-10" "2022-12-10" "2023-03-10" "2023-06-10"
As we will see below, the same flexibility also allows for computing with times.
Practice
Here are some practice tasks on dates:60
- Predicting dates in R:
Predict, evaluate, and explain the result of the following commands.
Which format corresponds to the conventions used in the US, EU, and R, respectively?
Hint: Note the details of the different format
arguments.
Solution
as.Date("10 11 12", format = "%m %d %y")
as.Date("10/11/12", format = "%y/%d/%m")
as.Date("10:11:12", format = "%d:%m:%y")
as.Date("10-11-12", format = "%y-%m-%d")
as.Date("10/11/12", format = "%y %m %d")
- Reading dates into R:
Read the following dates (provided in different formats) into R (by using the as.Date
function with an appropriate format
):
"2024-07-01"
"1/7/2024"
"July 1st, 2024"
"2nd January, 04"
"Feb 29, 2024"
"February 29, 2025"
Solution
as.Date("2024-07-01") # works as default
as.Date("2024-07-01", format = "%Y-%m-%d")
# Non-standard formats:
as.Date("1/7/2024", format = "%d/%m/%Y")
as.Date("July 1st, 2024", format = "%B %dst, %Y")
as.Date("2nd January, 04", format = "%dnd %B, %y")
as.Date("Feb 29, 2024", format = "%b %d, %Y") # 2024 is a leap year
as.Date("February 29, 2025", format = "%B %d, %Y") # NA: date does not exist
# See
# ?strptime # for a list of possible format symbols
# Sys.getlocale() # for current locale settings of your system
- The date detective:
In an online survey, a person notes that his or her birthday is on “02-29-10”.
- Assuming that there is no error, which date does the string “02-29-10” denote?
Hint: This puzzle can be solved without R, but R can help settling some issues.
Solution
The three 2-digit numbers must denote the three date components %y, %m, and %d.
In principle, these three components can be ordered in 6 different ways.
However, we can easily exclude 4 of these orders:
For 2 of them, the number 29 would denote the year %y (2029), which would be in the future (given that our survey occurs in 2024).
For 2 others, the number 29 would denote the month %m, which does not make sense (as there are only 12 months in a year).
This leaves us with only 2 candidate interpretations:
"%m-%d-%y": February 29, 2010
"%y-%d-%m": October 29, 2002
Assuming that teenagers can take part in the survey, both of these seem possible. However, there is no February 29, 2010 (as 2010 was not a leap year; see the analogous case of February 29, 2025 in the previous exercise). Hence, we have to conclude that the date denotes October 29, 2002.
Trying out the 6 possible orders in R yields:
d3 <- "02-29-10"
as.Date(d3, format = "%m-%y-%d") # possible future date
#> [1] "2029-02-10"
as.Date(d3, format = "%d-%y-%m") # possible future date
#> [1] "2029-10-02"
as.Date(d3, format = "%y-%m-%d") # non-sensical month
#> [1] NA
as.Date(d3, format = "%d-%m-%y") # non-sensical month
#> [1] NA
as.Date(d3, format = "%m-%d-%y") # does not exist (2010 no leap year)
#> [1] NA
as.Date(d3, format = "%y-%d-%m") # only possible date
#> [1] "2002-10-29"
- Name ranges of weekdays and months:
We have seen how to retrieve the name of the weekday or month for a particular date.
- Can you also construct an R expression that prints all possible names of weekdays and months?
Hint: Define a date for a Monday in mid-January and then use R’s vector arithmetic to define the dates of the next 0:6 days and in all 12 months. Then format the resulting vectors with an appropriate conversion specification.
Solution
explicate_date <- "The date '%Y-%m-%d' denotes a %A in %B."
# Some Monday in mid-January:
mon_jan <- "2020-01-20"
format(as.Date(mon_jan), format = explicate_date)
# Dates:
d_weekdays <- as.Date(mon_jan) + (0:6)
d_months <- as.Date(mon_jan) + (30 * 0:11)
# Solution:
format(d_weekdays, "%A/%a") # all 7 names of days
format(d_months, "%B/%b") # all 12 names of months
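Adding multiples of 30 days works here because the start date lies in mid-month, so the accumulated drift never leaves the intended month. An alternative sketch uses seq() with calendar months, which does not rely on this property (the name d_months_seq is introduced for illustration; the month names depend on the system locale and are therefore not shown):

```r
mon_jan <- "2020-01-20"

# One date per calendar month, starting in January:
d_months_seq <- seq(as.Date(mon_jan), length.out = 12, by = "month")

format(d_months_seq, "%m")      # month numbers "01" to "12"
format(d_months_seq, "%B/%b")   # month names in the current locale
```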
- Personal life times:
Enter the date of birth and death (or current date) of some person (e.g., a celebrity or yourself) and compute their age in various units of time (years, months, weeks, days, hours, minutes, seconds).
Hint: As the difftime()
function does not allow for units beyond “weeks”, use heuristics and arithmetic for computing larger units (months, years). (We will discuss more precise methods for dealing with such situations in Section 10.3.)
Solution
# Sigmund Freud:
# born: 6 May 1856
# died: 23 September 1939 (aged 83)
# Source: <https://en.wikipedia.org/wiki/Sigmund_Freud>
sf_DOB <- as.Date("1856-05-06")
sf_DOD <- as.Date("1939-09-23")
# Explicate dates:
format(sf_DOB, explicate_date)
#> [1] "The date '56-05-06' denotes a Tuesday in May, 1856."
format(sf_DOD, explicate_date)
#> [1] "The date '39-09-23' denotes a Saturday in September, 1939."
# Time difference (in days):
sf_days <- sf_DOD - sf_DOB
sf_days
#> Time difference of 30454 days
# Note:
difftime(sf_DOD, sf_DOB, units = "weeks")
#> Time difference of 4350.571 weeks
difftime(sf_DOD, sf_DOB, units = "days")
#> Time difference of 30454 days
difftime(sf_DOD, sf_DOB, units = "hours")
#> Time difference of 730896 hours
# Compute approximate values:
# Approximate for 2 reasons:
# 1. Dates did not provide times (up to 24 hours of variation from true value).
# 2. Conversions from days into other units use heuristics (e.g., 365 days per year)
# larger units:
as.numeric(sf_days)/365 # years
#> [1] 83.43562
as.numeric(sf_days)/(365) * 12 # months
#> [1] 1001.227
as.numeric(sf_days)/7 # weeks
#> [1] 4350.571
# given unit:
as.numeric(sf_days) # days
#> [1] 30454
# smaller units:
as.numeric(sf_days) * 24 # hours
#> [1] 730896
as.numeric(sf_days) * (24 * 60) # minutes
#> [1] 43853760
as.numeric(sf_days) * (24 * 60 * 60) # seconds
#> [1] 2631225600
- Leap year scheduling:
As 2024 is a leap year (see Wikipedia), there is a February 29, 2024. Imagine a child is born on this date.
When are the next ten years with this particular date? Use seq() to create a sequence that contains the corresponding dates of the next ten leap years.
What happens when you start a yearly sequence of dates on a February 29 (for the years that are not leap years)?
We want more parties and presents: Can we schedule the dates for bi-annual celebrations (i.e., every six months)?
Solution
leap_years <- seq(as.Date("2024-02-29"), length = 11, by = "4 years")
leap_years[-1] # remove first date
#> [1] "2028-02-29" "2032-02-29" "2036-02-29" "2040-02-29" "2044-02-29"
#> [6] "2048-02-29" "2052-02-29" "2056-02-29" "2060-02-29" "2064-02-29"
yearly_feb29 <- seq(as.Date("2024-02-29"), length = 9, by = "year")
yearly_feb29
#> [1] "2024-02-29" "2025-03-01" "2026-03-01" "2027-03-01" "2028-02-29"
#> [6] "2029-03-01" "2030-03-01" "2031-03-01" "2032-02-29"
biannual_feb29 <- seq(as.Date("2024-02-29"), length = 9, by = "2 quarters")
biannual_feb29
#> [1] "2024-02-29" "2024-08-29" "2025-03-01" "2025-08-29" "2026-03-01"
#> [6] "2026-08-29" "2027-03-01" "2027-08-29" "2028-02-29"
Note that the existence of leap years causes difficulties when dealing with dates, as it is not immediately obvious whether a particular year falls into this category. We will re-visit the leap year issue in the context of lubridate commands below (in Section 10.3.4).
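For reference, the Gregorian leap-year rule can be written as a short base R function. This is a sketch with a hypothetical name (is_leap_year), not a function from base R or lubridate: a year is a leap year if it is divisible by 4, except for century years, which must also be divisible by 400.

```r
# Hypothetical helper: TRUE for leap years in the Gregorian calendar
is_leap_year <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

is_leap_year(c(1900, 2000, 2010, 2024, 2025))
#> [1] FALSE  TRUE FALSE  TRUE FALSE
```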
10.2.4 Times
This section deals with evaluating, reading, and computing with times in R. As we have noted above, times are more complicated than dates, as specifying a time includes specifying its date. Fortunately, much of what we learned about dates carries over to times. Thus, we can proceed more quickly in this section and focus on additional aspects.
One new aspect is that base R provides not only one, but two classes for representing times:
- POSIXct is the main date-time class for representing calendar time;
- POSIXlt is an additional date-time class for representing local time (as lists).
In the vast majority of use cases, we will use “POSIXct” objects (i.e., calendar times). As both “POSIXct” and “POSIXlt” objects also inherit from a common “POSIXt” class, the commands for setting and displaying (parts of) time objects apply to objects of both classes.
Obtaining the current time
By default, the Sys.time() function yields a “POSIXct” object that provides a calendar time or date-time (i.e., a combination of date and time, in a particular time zone):
# Current time (as "POSIXct"):
Sys.time()
#> [1] "2022-09-10 14:13:15 CEST"
# related functions:
lubridate::now()
#> [1] "2022-09-10 14:13:15 CEST"
ds4psy::cur_time()
#> [1] "14:13"
Note that Sys.time() automatically added a time zone to its output. This was chosen according to the settings of our local system (here: evaluating Sys.timezone() yields “Europe/Berlin”).
As we have seen for Sys.Date(), calling Sys.time() prints a string, but actually returns an object of class POSIXct that can be assigned to a variable (e.g., tnow):
# What kind of object?
tnow <- Sys.time()
tnow
#> [1] "2024-10-16 22:07:17 CEST"
class(tnow)
#> [1] "POSIXct" "POSIXt"
# ?DateTimeClasses # shows documentation
Entering ?DateTimeClasses in the R Console provides an overview of both “POSIXt” classes.
And just as we saw with dates, figuring out what kind of object tnow is, is non-trivial.
Just like dates, times appear somewhat similar to numbers, but not entirely:
# What kind of object is tnow?
typeof(tnow)
#> [1] "double"
is.numeric(tnow)
#> [1] FALSE
mode(tnow)
#> [1] "numeric"
To reveal the internal representation of times in R, we can apply the base R function unclass() and the ds4psy function is_wholenumber() to tnow:
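A minimal sketch of such a check, using only base R: The fixed time t_demo (with fractional seconds) is an assumption made for reproducibility, and the comparison x == round(x) merely stands in for the ds4psy function is_wholenumber().

```r
# A fixed example time with fractional seconds (instead of the current tnow):
t_demo <- as.POSIXct(1662811995.73, origin = "1970-01-01", tz = "UTC")

unclass(t_demo)         # removes the class: a number (plus a "tzone" attribute)
x <- as.numeric(t_demo) # the bare number of seconds
is.numeric(x)           # without its class, the time is a number
#> [1] TRUE
x == round(x)           # base R check for a whole number
#> [1] FALSE
```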
This shows that tnow is represented as a number as well, but not as an integer.
As it turns out, time in R is internally measured in terms of the number of seconds (and sub-seconds) elapsed since some origin. This raises a profound question:
- What is the origin of time?
with a prosaic and predictable answer (when only considering R):
- Time is counted since the Unix epoch at “1970-01-01 00:00:00 UTC”.
This can be demonstrated by using the difftime() function as follows:
(tx <- unclass(tnow)) # number of seconds (since epoch)
#> [1] 1662811996
difftime(tnow, as.POSIXct("1970-01-01 00:00:00", tz = "UTC"), units = "secs")
#> Time difference of 1662811996 secs
tx == difftime(tnow, as.POSIXct("1970-01-01 00:00:00", tz = "UTC"), units = "secs")
#> [1] TRUE
Thus, just like Sys.Date() represented today’s date by counting the number of days elapsed since “1970-01-01”, the Sys.time() function internally represents time as the number of seconds elapsed since “1970-01-01 00:00:00 UTC” (with UTC denoting the standard time zone of “Coordinated Universal Time”).
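The correspondence between numbers and times can also be checked in the opposite direction. The following sketch (with object names chosen here for illustration) turns numbers of seconds into times and verifies that converting a time into a number and back again yields the same point in time:

```r
# From numbers (seconds since the epoch) to times:
epoch   <- as.POSIXct(0, origin = "1970-01-01", tz = "UTC")
one_day <- as.POSIXct(24 * 60 * 60, origin = "1970-01-01", tz = "UTC")
format(epoch, "%Y-%m-%d %H:%M:%S")
#> [1] "1970-01-01 00:00:00"
format(one_day, "%Y-%m-%d %H:%M:%S")
#> [1] "1970-01-02 00:00:00"

# Round trip (time -> number -> time):
t_fixed <- as.POSIXct("2020-07-01 07:13:59", tz = "UTC")
secs    <- as.numeric(t_fixed)
t_back  <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
t_back == t_fixed
#> [1] TRUE

# Dates count days from the same origin:
as.numeric(as.Date("1970-01-02"))
#> [1] 1
```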
Creating (and representing) times
As we have seen for dates (i.e., objects of class “Date”, covered above in Section 10.2.3), we essentially face two distinct tasks when dealing with times:
We want to create new time objects by entering or parsing times (as strings).
We want to retrieve times (or particular elements of times) from existing time objects.
Both of these tasks require that we can denote and label particular elements of time. In base R, the system for specifying elements of times (and dates) is based on the POSIX standard.
The POSIX standard
The acronym POSIX stands for “Portable Operating System Interface” and defines standards for maintaining compatibility between different computer operating systems (with the “X” implying Unix, due to its independence from a particular manufacturer, see Wikipedia: POSIX for details).
This standard allows specifying particular components of calendar times (of class “POSIXct”) or local times (of class “POSIXlt”) by providing so-called “conversion specifications”. A conversion specification is a code for a particular element of date or time objects. In code, it is used inside a character string and introduced by the symbol %, usually followed by a single letter (or O or E and then a single letter). Any character appearing in a format string that is not part of a conversion specification is interpreted literally (and the double-symbol %% yields %).
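A small sketch may illustrate how literal characters and conversion specifications mix within a single format string. The time t_demo and the strings are made up for this example; only numeric specifications are used, so that the results do not depend on the locale:

```r
t_demo <- as.POSIXct("2024-10-16 22:07:17", tz = "UTC")

# Output: literal text is copied, each specification is replaced, %% yields %:
format(t_demo, "Battery at 100%% on %Y-%m-%d at %H:%M")
#> [1] "Battery at 100% on 2024-10-16 at 22:07"

# Input: the same logic applies when parsing a string into a time:
t_parsed <- as.POSIXct("Date: 2024-10-16, Time: 22:07",
                       format = "Date: %Y-%m-%d, Time: %H:%M", tz = "UTC")
format(t_parsed, "%H:%M")
#> [1] "22:07"
```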
Despite the POSIX standard, the exact details of the formats used in dates and times are platform-specific.
However, the following conversion specifications are likely to be widely available (the example after the colon evaluates each specification for our current tnow value of Wednesday, 2024-10-16, 22:07:17):
- %a: Abbreviated weekday name (in the current locale on this platform): Wed
- %A: Full weekday name (…): Wednesday
- %b: Abbreviated month name (…): Oct
- %B: Full month name (…): October
- %C: Century as two-digit decimal number (00–99): the integer part of the year divided by 100: 20
- %d: Day of the month as two-digit decimal number (01–31): 16
- %e: Day of the month as an integer (1–31), with a prefix space for a single-digit number: 16
- %h: Equivalent to %b: Oct
- %H: Hours as two-digit decimal number (00–23): 22
- %I: Hours as two-digit decimal number (01–12): 10
- %j: Day of year as three-digit decimal number (001–366): 290
- %m: Month as two-digit decimal number (01–12): 10
- %M: Minute as two-digit decimal number (00–59): 07
- %p: AM/PM indicator in the locale. Used in conjunction with %I and not with %H: PM
- %S: Second as two-digit integer (00–61), allowing for up to two leap-seconds: 17
- %u: Weekday as single-digit decimal number (1–7, Monday is 1): 3
- %U: Week of the year as two-digit decimal number (00–53), using Sunday as the first day of the week (US convention): 41
- %V: Week of the year as two-digit decimal number (01–53) as defined in ISO 8601: 42
- %w: Weekday as single-digit decimal number (0–6, Sunday is 0): 3
- %W: Week of the year as two-digit decimal number (00–53), using Monday as the first day of the week (UK convention): 42
- %y: Year without century (00–99), as two-digit decimal number. On input, values 00 to 68 are prefixed by 20 and 69 to 99 by 19. This is the behavior specified by the POSIX standard, which also states that “in a future version the default century inferred from a 2-digit year will change”: 24
- %Y: Year with century. Note that whereas there was no zero in the original Gregorian calendar, the ISO 8601:2004 standard defines it as being valid (interpreted as 1BC, see Wikipedia: zero (year)). As input, only years 0:9999 are accepted: 2024
For details on additional specifications, evaluate ?strftime on your system.
In practice, most users only use a few formatting strings: those that correspond to the conventions in their specific locale.
Here are examples that show the variants of conversion specifications for retrieving particular elements of a given time object (e.g., tnow):
tnow # current time and date (defined above)
#> [1] "2024-10-16 22:07:17 CEST"
# Displaying date elements:
format(tnow, "%A/%a/%u") # weekday name/nr
#> [1] "Wednesday/Wed/3"
format(tnow, "%B/%b/%m") # month name/nr
#> [1] "October/Oct/10"
format(tnow, "%Y/%y/%C") # year 4-digits/2-digits/century
#> [1] "2024/24/20"
format(tnow, "%V/%j") # week/day of year
#> [1] "42/290"
# Displaying time:
format(tnow, "%H:%M:%S (%Z, %z)") # hour:min:sec (time zone info)
#> [1] "22:07:17 (CEST, +0200)"
format(tnow, "%I:%M %p") # 12-hour time with AM/PM
#> [1] "10:07 PM"
Despite its length, the list above is only partial, and most people do not remember it (but look up ?strftime when they need a particular element).
The following codes and shortcuts for combinations are most useful in practice:
Option | Details | Example |
---|---|---|
%c | Date and time (locale-specific on output, %a %b %e %H:%M:%S %Y on input) | Wed Oct 16 22:07:17 2024 |
%D | Date formatted as %m/%d/%y | 10/16/24 |
%F | Date equivalent to %Y-%m-%d (ISO 8601 date format) | 2024-10-16 |
%T | Time equivalent to %H:%M:%S | 22:07:17 |
%x | Date (locale-specific output, %y/%m/%d on input) | 10/16/2024 |
%X | Time (locale-specific output, %H:%M:%S on input) | 22:07:17 |
%z | Signed offset from UTC (i.e., positive values are ahead of UTC) | +0200 |
%Z | Time zone abbreviation (empty if not available) | CEST |
These are some of the conversion specifications that are worth remembering in practice. Evaluate ?strftime for a more comprehensive list.
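The equivalences claimed for the shortcuts %F, %T, and %D can be verified directly. The following sketch uses a fixed time t_demo (an assumption made here for reproducibility) and only considers the shortcuts that do not depend on the locale. As the details of formats are platform-specific, the results may differ on unusual systems:

```r
t_demo <- as.POSIXct("2024-10-16 22:07:17", tz = "UTC")

# Each shortcut should match its spelled-out equivalent:
format(t_demo, "%F") == format(t_demo, "%Y-%m-%d")
format(t_demo, "%T") == format(t_demo, "%H:%M:%S")
format(t_demo, "%D") == format(t_demo, "%m/%d/%y")

# Combining shortcuts (ISO 8601 date, followed by the time):
format(t_demo, "%F %T")
#> [1] "2024-10-16 22:07:17"
```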
Parsing and retrieving times
Armed with the POSIX standard, we can create time objects from text strings and interpret them in a variety of ways.
Here are some examples for parsing character strings into time objects (of the “POSIXt” class):
# parse times:
(tm1 <- as.POSIXct("10:30:45", format = "%H:%M:%S"))
#> [1] "2022-09-10 10:30:45 CEST"
(tm2 <- as.POSIXct("16:45", format = "%H:%M"))
#> [1] "2022-09-10 16:45:00 CEST"
(tm3 <- as.POSIXct("2.30 am", format = "%I.%M %p"))
#> [1] "2022-09-10 02:30:00 CEST"
(tm4 <- as.POSIXct("5 pm", format = "%I %p"))
#> [1] "2022-09-10 17:00:00 CEST"
Note that these times also have a date (and therefore all date-related elements, like a year, month, weekdays, etc.). As “POSIXct” times actually are calendar times (aka. date-times), the missing elements were filled in by today’s date. Thus, rather than entering only times, it typically makes more sense to enter date-times:
# parse date-times:
(dt1 <- as.POSIXct("2020-07-01 07:13:59"))
#> [1] "2020-07-01 07:13:59 CEST"
(dt2 <- as.POSIXct("24 12 20 19:29:60", format = "%d %m %y %H:%M:%S"))
#> [1] "2020-12-24 19:30:00 CET"
(dt3 <- as.POSIXct("24.12. 2020 7:30 AM", format = "%d.%m. %Y %I:%M %p", tz = "UTC"))
#> [1] "2020-12-24 07:30:00 UTC"
(dt4 <- as.POSIXct("2020-12-31 23:59:59", tz = "US/Eastern"))
#> [1] "2020-12-31 23:59:59 EST"
Having defined a time object, we can apply the same standards to retrieve particular elements of these times:
# Use POSIX standard to retrieve elements of times:
# tnow <- Sys.time() # (defined above)
tnow
#> [1] "2022-09-10 14:13:15 CEST"
# Default interpretation:
strptime(tnow, "%Y-%m-%d %H:%M:%S")
#> [1] "2022-09-10 14:13:15 CEST"
# ?strptime # for documentation
# Extracting parts of times:
format(tnow, "%H:%M:%S") # 24-hour time with seconds
#> [1] "14:13:15"
format(tnow, "%I:%M %p") # 12-hour time
#> [1] "02:13 PM"
format(tnow, "%Z (%z)") # time zone
#> [1] "CEST (+0200)"
# Extracting date elements from times:
format(tnow, "%Y") # 4-digit year
#> [1] "2022"
format(tnow, "%C") # century
#> [1] "20"
format(tnow, "%B") # month name
#> [1] "September"
format(tnow, "%A") # weekday name
#> [1] "Saturday"
# Combining elements:
explicate_time <- "%a, %Y-%m-%d %H:%M:%S %Z (%z from UTC)"
format(tnow, format = explicate_time)
#> [1] "Sat, 2022-09-10 14:13:15 CEST (+0200 from UTC)"
Specifying time zones
Different time zones add a local dimension to the notion of time and provide endless sources of complications for computing with dates and times. Many R functions that deal with dates or times provide a time zone argument (typically called tz or tzone).
If this is unspecified (or tz = ""), most systems either use the time zone of our current system (which can be determined by evaluating Sys.timezone()) or default to “UTC” (which stands for “Coordinated Universal Time”, see Wikipedia: UTC for details), a standard time zone often used in scientific contexts and roughly equivalent to “GMT” (Greenwich Mean Time). As a bonus, UTC does not use any daylight saving time (DST).
The base R function OlsonNames() lists the time zones available in R (597 on our system; the exact number depends on the system's time zone database).
## Specify time zones:
Sys.timezone() # current system time zone
#> [1] "Europe/Berlin"
# OlsonNames() # all valid time zones
as.POSIXct("2020-12-24 23:59:59", tz = "GMT")
#> [1] "2020-12-24 23:59:59 GMT"
as.POSIXct("2020-12-24 23:59:59", tz = "Europe/London")
#> [1] "2020-12-24 23:59:59 GMT"
as.POSIXct("2020-12-24 23:59:59", tz = "Europe/Berlin")
#> [1] "2020-12-24 23:59:59 CET"
as.POSIXct("2020-12-24 23:59:59", tz = "Europe/Moscow")
#> [1] "2020-12-24 23:59:59 MSK"
as.POSIXct("2020-12-24 23:59:59", tz = "Africa/Addis_Ababa")
#> [1] "2020-12-24 23:59:59 EAT"
as.POSIXct("2020-12-24 23:59:59", tz = "Australia/Sydney")
#> [1] "2020-12-24 23:59:59 AEDT"
as.POSIXct("2020-12-24 23:59:59", tz = "US/Hawaii")
#> [1] "2020-12-24 23:59:59 HST"
Why bother with specifying time zones? There are two main reasons for doing so:
- Specifying time zones allows for automatic adjustments for daylight saving time (DST) when computing time differences (for the same location):
# Note: automatically adjusts for daylight saving time (DST):
t1 <- "2020-03-29 00:00:00" # before switch to DST
t2 <- "2020-03-29 03:00:00" # after switch to DST
# UTC (without time shift):
as.POSIXct(t2, tz = "UTC") - as.POSIXct(t1, tz = "UTC")
#> Time difference of 3 hours
# Germany (with daylight saving time, DST):
as.POSIXct(t2, tz = "Europe/Berlin") - as.POSIXct(t1, tz = "Europe/Berlin")
#> Time difference of 2 hours
- Specifying time zones allows computing time differences between particular times (at different locations):
The following example illustrates a special case: Same (nominal) time at two different locations:
# Shifting Xmas eve across locations:
txmas <- "2020-12-24 20:00:00"
as.POSIXct(txmas, tz = "UTC") - as.POSIXct(txmas, tz = "Australia/Sydney")
#> Time difference of 11 hours
as.POSIXct(txmas, tz = "UTC") - as.POSIXct(txmas, tz = "Europe/Berlin")
#> Time difference of 1 hours
as.POSIXct(txmas, tz = "UTC") - as.POSIXct(txmas, tz = "US/Pacific")
#> Time difference of -8 hours
We will return to the issue of time zone conversions below (when considering “POSIXlt” objects) and in discussing related lubridate functions (in Section 10.3.4).
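Conversely, different nominal times at different locations can denote the same point in time. The following sketch (with object names chosen for illustration) defines 20:00 UTC on Xmas eve 2020 in three ways and checks that the time differences between them are zero:

```r
# The same instant, expressed as nominal times at three locations:
t_utc    <- as.POSIXct("2020-12-24 20:00:00", tz = "UTC")
t_berlin <- as.POSIXct("2020-12-24 21:00:00", tz = "Europe/Berlin")    # UTC + 1
t_sydney <- as.POSIXct("2020-12-25 07:00:00", tz = "Australia/Sydney") # UTC + 11

t_utc == t_berlin
#> [1] TRUE
difftime(t_sydney, t_utc, units = "hours")
#> Time difference of 0 hours

# Internally, all three are stored as the same number of seconds:
as.numeric(t_utc) == as.numeric(t_sydney)
#> [1] TRUE
```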
Computing with times
Given that times are internally represented as numbers, it is clear that we can use them in numeric expressions.
Here are examples for comparing times and for using existing times for computing new times:
# compare times:
tm2 > tm1
#> [1] TRUE
tm3 > tm2
#> [1] FALSE
# Add or subtract:
tm1 - 30 # seconds
#> [1] "2022-09-10 10:30:15 CEST"
tm1 + (30 * 60) # minutes
#> [1] "2022-09-10 11:00:45 CEST"
tm1 + (2 * 60 * 60) # hours
#> [1] "2022-09-10 12:30:45 CEST"
As we have seen for dates, we can compute time differences as difftime objects and express them in multiple units (but topping out at “weeks”):
# Time differences:
(td1 <- tm2 - tm1) # hours
#> Time difference of 6.2375 hours
(td2 <- dt4 - tm2) # days
#> Time difference of -617.4063 days
(td3 <- dt4 - tm1) # days
#> Time difference of -617.1464 days
# => difftime objects:
lubridate::is.difftime(td1)
#> [1] TRUE
# Note:
difftime(tm2, tm1, units = "weeks")
#> Time difference of 0.03712798 weeks
difftime(tm2, tm1, units = "days")
#> Time difference of 0.2598958 days
difftime(tm2, tm1, units = "hours")
#> Time difference of 6.2375 hours
# But note:
td3 == (td1 + td2) # is TRUE (both sides are compared in common units)
#> [1] TRUE
# But:
all.equal(td1, (td2 + td3)) # different quantities: mismatch in units and values
#> [1] "Attributes: < Component \"units\": 1 string mismatch >"
#> [2] "Mean relative difference: 198.9243"
dplyr::near(td1, (td2 + td3)) # is FALSE
#> [1] FALSE
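As the units of a difftime object are chosen automatically, it is often safer to request specific units when a plain number is needed. A minimal sketch, using two fixed times that are defined here only for illustration:

```r
t_start <- as.POSIXct("2022-09-10 10:30:45", tz = "UTC")
t_end   <- as.POSIXct("2022-09-10 16:45:00", tz = "UTC")

td <- t_end - t_start  # units are chosen automatically
units(td)
#> [1] "hours"

# Extract a number in explicit units:
as.numeric(td, units = "mins")
#> [1] 374.25

# Change the units of an existing difftime object:
units(td) <- "secs"
td
#> Time difference of 22455 secs
```

Converting both objects to the same units before comparing them avoids the mismatch reported by all.equal().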
A useful function that also works with times is trunc(). It takes an argument x, allows specifying the time-related units of interest, and truncates all smaller units (i.e., resets them to their minimum values):
dt4 # from above
#> [1] "2020-12-31 23:59:59 EST"
# Truncating times:
trunc(dt4, units = "mins")
#> [1] "2020-12-31 23:59:00 EST"
trunc(dt4, units = "hours")
#> [1] "2020-12-31 23:00:00 EST"
trunc(dt4, "days")
#> [1] "2020-12-31 EST"
trunc(dt4, "months")
#> [1] "2020-12-01 EST"
trunc(dt4, "years")
#> [1] "2020-01-01 EST"
Note that we were only considering calculations involving existing calendar times (or date-time points) here. We will cover calculations involving particular time spans (i.e., adding or subtracting specific durations or time periods, specified in terms of minutes, hours, days, weeks, months, or years) in Section 10.3.
Local times
Time objects using the “POSIXlt” class are called local times. In contrast to the calendar time objects (of class “POSIXct”), local times (of class “POSIXlt”) are stored as lists. This allows for an easier extraction of specific components of a time.
We can convert a calendar time (e.g., tnow) into a local time by the as.POSIXlt() function:
ctnow <- tnow
ltnow <- as.POSIXlt(tnow)
# Both times seem similar:
ctnow
#> [1] "2022-09-10 14:13:15 CEST"
ltnow
#> [1] "2022-09-10 14:13:15 CEST"
Superficially, both time constructs look the same. Their differences really are “under the hood”, in the way the corresponding time is being represented:
# yet are objects of different classes:
class(ctnow)
#> [1] "POSIXct" "POSIXt"
class(ltnow)
#> [1] "POSIXlt" "POSIXt"
lubridate::is.POSIXct(ctnow)
#> [1] TRUE
lubridate::is.POSIXlt(ltnow)
#> [1] TRUE
# and represented differently:
unclass(ctnow)
#> [1] 1662811996
unlist(ltnow)
#> sec min hour mday
#> "15.7323849201202" "13" "14" "10"
#> mon year wday yday
#> "8" "122" "6" "252"
#> isdst zone gmtoff
#> "1" "CEST" "7200"
We can create new local time objects using the POSIX standard (described above).
For instance, let’s define xm_lt as 18:15:30 (in Oslo, Norway) on Xmas eve 2020:
xm_lt <- as.POSIXlt("2020-12-24 18:15:30", tz = "Europe/Oslo")
xm_lt
#> [1] "2020-12-24 18:15:30 CET"
Internally, POSIXlt objects are lists:
# unclass(xm_lt) # returns all elements
unlist(xm_lt) # is more readable
#> sec min hour mday mon year wday yday isdst zone gmtoff
#> "30" "15" "18" "24" "11" "120" "4" "358" "0" "CET" NA
The representation of local times as lists allows for an easier extraction of specific components:
xm_lt$hour
#> [1] 18
xm_lt$mday
#> [1] 24
xm_lt$mon # note value!
#> [1] 11
xm_lt$year # note value!
#> [1] 120
Note that some values may be unexpected. This is due to certain definitions that may cause surprises:
- mon runs from 0–11 (rather than 1–12)
- year is counted forwards from 1900
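To obtain conventional values, we can either correct the list elements by hand or use format() with the corresponding conversion specifications. The following sketch defines its own local time xm (in UTC, an assumption that keeps the example independent of the system settings):

```r
xm <- as.POSIXlt("2020-12-24 18:15:30", tz = "UTC")

# Correcting list elements by hand:
xm$mon + 1      # months are counted from 0
#> [1] 12
xm$year + 1900  # years are counted from 1900
#> [1] 2020
xm$yday + 1     # days of the year are counted from 0 as well
#> [1] 359

# Using conversion specifications instead:
as.integer(format(xm, "%m"))
#> [1] 12
as.integer(format(xm, "%Y"))
#> [1] 2020
as.integer(format(xm, "%j"))
#> [1] 359
```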
Note that truncating times (i.e., rounding off to different units) also works with POSIXlt times:
trunc(xm_lt, "mins")
trunc(xm_lt, "hours")
trunc(xm_lt, "days")
trunc(xm_lt, "months")
trunc(xm_lt, "years")
Beyond being internally stored as lists, we can ask: What are local times for? To answer this, suppose you wanted to solve the following task:
- Task: Convert a given (calendar) time into the corresponding time of a different time zone.
Our first attempt to solve this task (by using only “POSIXct” objects) would fail as follows:
# Attempt to convert a time into different time zones:
# tnow <- Sys.time() # (defined above)
tnow
#> [1] "2022-09-10 14:13:15 CEST"
(t_Berlin <- as.POSIXct(tnow, tz = "Europe/Berlin"))
#> [1] "2022-09-10 14:13:15 CEST"
(t_NYC_x <- as.POSIXct(t_Berlin, tz = "US/Eastern"))
#> [1] "2022-09-10 14:13:15 CEST"
We see that the two calendar times t_Berlin and t_NYC_x show the same nominal time (and, in the output above, even the same time zone label).
This demonstrates that simply passing a calendar time of a specific time zone to as.POSIXct() with a different tz argument does not yield the corresponding nominal time at the other location.
Thus, to actually convert times (or, actually, the display of times by location), we need a different approach. This is where the notion of local times (as in “POSIXlt”) comes into play:
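A minimal sketch of such a conversion: To remain reproducible, t_Berlin is defined as a fixed time here (matching the value of tnow printed above), and the time zone “America/New_York” is used as the canonical name of the “US/Eastern” zone:

```r
# A fixed calendar time in Berlin:
t_Berlin <- as.POSIXct("2022-09-10 14:13:15", tz = "Europe/Berlin")

# Convert it into a local time in New York City:
t_NYC <- as.POSIXlt(t_Berlin, tz = "America/New_York")
class(t_NYC)
#> [1] "POSIXlt" "POSIXt"
format(t_NYC, "%Y-%m-%d %H:%M:%S")
#> [1] "2022-09-10 08:13:15"

# Both objects still denote the same point in time:
as.POSIXct(t_NYC) == t_Berlin
#> [1] TRUE
```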
Importantly, using the as.POSIXlt() function has shifted the time display of t_NYC (i.e., the displayed number of hours moved back by 6 units), but kept the actual point in time identical to that of t_Berlin. When expressing both times (as “POSIXct” objects) in terms of UTC, we can see that they really denote the same point in time:
as.POSIXlt(t_Berlin, tz = "UTC")
#> [1] "2022-09-10 12:13:15 UTC"
as.POSIXlt(as.POSIXct(t_NYC), tz = "UTC")
#> [1] "2022-09-10 12:13:15 UTC"
Thus, taking a “POSIXct” object and a local time zone tz as the input of as.POSIXlt() changes both the time zone and the nominal time displayed into the corresponding time display of the local time zone.
To wrap up this section, here are some examples of converting a current (local) time (stored as the “POSIXct” object tnow) into the corresponding time display in lots of different time zones:
# Convert a time into different time zones:
tnow # a "POSIXct" object
#> [1] "2022-09-10 14:13:15 CEST"
as.POSIXlt(tnow, tz = "NZ")
#> [1] "2022-09-11 00:13:15 NZST"
as.POSIXlt(tnow, tz = "Australia/Melbourne")
#> [1] "2022-09-10 22:13:15 AEST"
as.POSIXlt(tnow, tz = "Japan")
#> [1] "2022-09-10 21:13:15 JST"
as.POSIXlt(tnow, tz = "Asia/Shanghai")
#> [1] "2022-09-10 20:13:15 CST"
as.POSIXlt(tnow, tz = "Hongkong")
#> [1] "2022-09-10 20:13:15 HKT"
as.POSIXlt(tnow, tz = "Asia/Bangkok")
#> [1] "2022-09-10 19:13:15 +07"
as.POSIXlt(tnow, tz = "Asia/Jakarta")
#> [1] "2022-09-10 19:13:15 WIB"
as.POSIXlt(tnow, tz = "Asia/Calcutta")
#> [1] "2022-09-10 17:43:15 IST"
as.POSIXlt(tnow, tz = "Asia/Damascus")
#> [1] "2022-09-10 15:13:15 EEST"
as.POSIXlt(tnow, tz = "Asia/Tel_Aviv")
#> [1] "2022-09-10 15:13:15 IDT"
as.POSIXlt(tnow, tz = "Europe/Moscow")
#> [1] "2022-09-10 15:13:15 MSK"
as.POSIXlt(tnow, tz = "Europe/Berlin")
#> [1] "2022-09-10 14:13:15 CEST"
as.POSIXlt(tnow, tz = "Europe/London")
#> [1] "2022-09-10 13:13:15 BST"
as.POSIXlt(tnow, tz = "Iceland")
#> [1] "2022-09-10 12:13:15 GMT"
as.POSIXlt(tnow, tz = "US/Eastern")
#> [1] "2022-09-10 08:13:15 EDT"
as.POSIXlt(tnow, tz = "US/Central")
#> [1] "2022-09-10 07:13:15 CDT"
as.POSIXlt(tnow, tz = "US/Pacific")
#> [1] "2022-09-10 05:13:15 PDT"
as.POSIXlt(tnow, tz = "US/Hawaii")
#> [1] "2022-09-10 02:13:15 HST"
as.POSIXlt(tnow, tz = "Canada/Central")
#> [1] "2022-09-10 07:13:15 CDT"
as.POSIXlt(tnow, tz = "Brazil/West")
#> [1] "2022-09-10 08:13:15 -04"
as.POSIXlt(tnow, tz = "Pacific/Fiji")
#> [1] "2022-09-11 00:13:15 +12"
as.POSIXlt(tnow, tz = "Antarctica/South_Pole")
#> [1] "2022-09-11 00:13:15 NZST"
as.POSIXlt(tnow, tz = "Zulu")
#> [1] "2022-09-10 12:13:15 UTC"
We will re-consider the issue of time conversions below, when discussing the difference between the lubridate functions with_tz() and force_tz() (see Section 10.3.4).
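If we only want to display a calendar time in another time zone, base R offers two alternatives that keep the object a “POSIXct” time: the tz argument of format() and the "tzone" attribute of the object. The following sketch uses a fixed time t_demo and the time zone “America/New_York” (both chosen here for illustration):

```r
t_demo <- as.POSIXct("2022-09-10 14:13:15", tz = "Europe/Berlin")

# 1. Format the time for another time zone (returns a character string):
format(t_demo, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")
#> [1] "2022-09-10 08:13:15"

# 2. Change the "tzone" attribute (changes the display, not the instant):
t_nyc <- t_demo
attr(t_nyc, "tzone") <- "America/New_York"
format(t_nyc, "%H:%M")
#> [1] "08:13"
t_nyc == t_demo
#> [1] TRUE
```

Both variants only change how the time is shown, so the underlying number of seconds remains the same.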
Practice
Here are some practice tasks on dealing with dates and times in base R:
Rethinking dates and times:
- If we were to write a function for today() or now(), which types of object (or R classes) should they return? Why?
- Why should we typically use the “Date” class (rather than the classes “POSIXct” or “POSIXlt”) to record someone’s date of birth (DOB) or date of death (DOD)?
- Under which circumstances should we use “POSIXct” objects to record DOB or DOD data?
- Under which circumstances should we use “POSIXlt” objects to record date- or time-related data?
Based on our insights into the internal representation of dates and times in R (as numbers of days or seconds from the Unix epoch, using “1970-01-01 00:00:00 UTC” as a shared reference point), what numbers do we expect to represent the date and time of
barely_69 <- "1969-12-31 23:59:59"
?
Hint: Evaluate the following expressions
barely_69 <- "1969-12-31 23:59:59"
unclass(as.Date(barely_69))
unclass(as.POSIXct(barely_69)) # Note tzone effect!
- Explaining
Sys.time()
- Predict, evaluate, and explain the following expressions in your console:
- What time zone was used to define
st
?
- Predict, evaluate, and explain the following expressions:
- Reading and interpreting time strings:
Enter the following times into “POSIXct” objects:
- “20:15” (of today)
- “6.30 pm” on Christmas Eve (this year)
- “23:59:59” on New Year’s Eve (this year, in New Zealand)
- “8.30 am” (yesterday)
- noon (in one week)
Hint: The last two times can be entered by specifying their dates or by adding or subtracting an appropriate time difference.
Solution
# parse date-times:
(t1 <- as.POSIXct("20:15", format = "%H:%M"))
(t2 <- as.POSIXct("2020-12-24 6.30 pm", format = "%Y-%m-%d %I.%M %p"))
(t3 <- as.POSIXct("2020-12-31 23:59:59", format = "%Y-%m-%d %H:%M:%S", tz = "NZ"))
(t4 <- as.POSIXct("8.30 am", format = "%I.%M %p") - (24 * 60 * 60))
(t5 <- as.POSIXct("12:00", format = "%H:%M") + (7 * 24 * 60 * 60))
Now retrieve the year, abbreviated month name, full weekday name, the week and day number (within the year), time zone, and the offset from UTC for each of these times.
Solution
# Define all requested details (as POSIX format string):
date_details <- "year: %Y, month: %b, weekday: %A, week: %V, day: %j, tz=%Z (%z)"
format(t1, date_details)
format(t2, date_details)
format(t3, date_details)
format(t4, date_details)
format(t5, date_details)
- Flight duration: Different times at different locations.
On December 10, 2020, a Lufthansa flight is scheduled to depart from Frankfurt am Main (Germany, FRA) at 10:50 and arrive in New York City (USA, JFK) at 13:40.
- Compute the duration of the flight (in hours, minutes, and days).
Solution
# Flight duration:
t_departure <- as.POSIXct("2020-12-10 10:50:00", tz = "Europe/Berlin")
t_arrival <- as.POSIXct("2020-12-10 13:40:00", tz = "US/Eastern")
# time difference:
t_arrival - t_departure
difftime(t_arrival, t_departure, units = "hours")
# other units:
difftime(t_arrival, t_departure, units = "days")
difftime(t_arrival, t_departure, units = "mins")
- Time travels:
Suppose we want to watch the New Year fireworks of Sydney, Australia, on a TV screen in
- New Zealand (NZ)
- Calcutta (Asia)
- Paris (Europe)
- New York City (USA)
- Hawaii (USA)
What time is it at these locations?
Solution
We first define the time of interest (at the location of interest) as a calendar time (of class “POSIXct”):
(t_Sydney <- as.POSIXct("2020-01-01 00:00:01", tz = "Australia/Sydney"))
#> [1] "2020-01-01 00:00:01 AEDT"
In a second step, we transform this time into local time objects (of class “POSIXlt”) with the appropriate time zone tz:
as.POSIXlt(t_Sydney, tz = "NZ") # New Zealand
as.POSIXlt(t_Sydney, tz = "Asia/Calcutta") # Calcutta, Asia
as.POSIXlt(t_Sydney, tz = "Europe/Paris") # Paris, EU
as.POSIXlt(t_Sydney, tz = "US/Eastern") # NYC, USA
as.POSIXlt(t_Sydney, tz = "US/Hawaii") # Hawaii, USA
- Clustering time zones:
Note the large variety of European cities in OlsonNames().
- Can you cluster them into categories so that all cities in a cluster use the same local time (i.e., share a time zone)?
Hint: An automated solution of this task would first use our skills in using regular expressions (see Appendix E) to detect or extract all OlsonNames() containing “Europe”, then convert a given time into local times in all these time zones, and finally sort them into clusters that show the same (local) times.
Here is a trial-and-error solution that only considers some time zones:
Solution
(UTC_12 <- as.POSIXct("12:00:00", format = "%H:%M:%S", tz = "UTC")) # define some date-time
# Practice task: Cluster time zones in Europe:
as.POSIXlt(UTC_12, tz = "Europe/Moscow")
as.POSIXlt(UTC_12, tz = "Europe/Athens")
as.POSIXlt(UTC_12, tz = "Europe/Istanbul")
as.POSIXlt(UTC_12, tz = "Europe/Vienna")
as.POSIXlt(UTC_12, tz = "Europe/Rome")
as.POSIXlt(UTC_12, tz = "Europe/Oslo")
as.POSIXlt(UTC_12, tz = "Europe/Stockholm")
as.POSIXlt(UTC_12, tz = "Europe/Madrid")
as.POSIXlt(UTC_12, tz = "Europe/Lisbon")
as.POSIXlt(UTC_12, tz = "Europe/London")
as.POSIXlt(UTC_12, tz = "Europe/Dublin")
Note: As we failed to set a date in defining UTC_12, its date-related elements were automatically set to today’s date (i.e., Sys.Date(), evaluating to 2024-10-16). During the summer months, the results of our conversions into local times also incorporate time shifts due to daylight saving time (DST) in the locations specified.
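The automated approach mentioned in the hint could be sketched as follows. The reference time t_ref and all object names are assumptions made for this example, and the resulting clusters depend on the date chosen (due to DST) and on the time zone database of the system:

```r
# 1. Extract all time zone names starting with "Europe/":
tz_europe <- grep("^Europe/", OlsonNames(), value = TRUE)

# 2. Convert a fixed reference time into the local time of each zone:
t_ref <- as.POSIXct("2024-10-16 12:00:00", tz = "UTC")
local_time <- vapply(tz_europe,
                     function(tz) format(t_ref, "%H:%M", tz = tz),
                     character(1))

# 3. Cluster the zone names by their local time:
clusters <- split(names(local_time), local_time)
lengths(clusters)  # number of zones per local time
clusters[["14:00"]] # e.g., all zones showing 14:00 at this moment
```

Clustering by the local time at a single moment is a simplification: two zones may agree on one date, but differ on another (e.g., if only one of them uses DST).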
Re-thinking calendar times vs. local times:
- Predict, compare, and explain the results of the following commands:
Hint: Watch out for the differences between calendar times (“ct”) and local times (“lt”).
t_x <- "2020-12-24 18:30:45" # a character string
(tc1 <- as.POSIXlt(as.POSIXct(t_x, tz = "Europe/London"), tz = "US/Eastern"))
#> [1] "2020-12-24 13:30:45 EST"
(tc2 <- as.POSIXct(as.POSIXlt(t_x, tz = "Europe/London"), tz = "US/Eastern"))
#> [1] "2020-12-24 18:30:45 EST"
(tc3 <- as.POSIXlt(tc2, tz = "Europe/London"))
#> [1] "2020-12-24 23:30:45 GMT"
Solution
- tc1 evaluates to 2020-12-24 13:30:45, a local time (i.e., a “POSIXlt” object, stored as a list). This time is the result of converting t_x (in Europe/London) into the corresponding time at a different location (e.g., NYC, using the US/Eastern time zone).
- tc2 evaluates to 2020-12-24 18:30:45, a calendar time (i.e., a “POSIXct” object). It took a local time of t_x (in London, UK) as its input and provided the same nominal time at a different location (in the US/Eastern time zone).
- tc3 evaluates to 2020-12-24 23:30:45, a local time (i.e., a “POSIXlt” object, stored as a list). This time is the result of converting the calendar time tc2 (in US/Eastern) into the corresponding time at a different location (in Europe/London).
As these differences are subtle, but important, here are some code snippets that help exploring these time objects:
# Reading the string t_x as a calendar time:
as.POSIXct(t_x, tz = "Europe/London")
# Converting 18:30:45 (London) to a local time in Eastern USA (e.g., NYC):
tc1 # is a local time (POSIXlt) object
unlist(tc1) # tc1 is a list
tc2 # took a local time as its input, but returns the same time
# as a calendar time `POSIXct` with tz = "US/Eastern"
# lubridate::is.POSIXct(tc2)
as.POSIXlt(t_x, tz = "Europe/London") # input was a local time
unclass(tc2) # a calendar time
# Converting "2020-12-24 18:30:45 EST" to a local time in London, UK:
tc3
unlist(tc3) # a local time
Overall, we have seen that base R provides ample support for creating and computing with dates and times. At the same time, dealing with three different classes and their relations to other data types (specifically: numbers) is challenging. Although most of the complications come from the inherent complexities of dates and times, the base R classes and commands for dates and times have been developing over time and appear somewhat dated today.61
In the next section, we will see that the lubridate package provides not only a more modern, but also a more user-friendly approach to dealing with dates and date-times.
References
See the discussions on Zeno’s paradoxes (e.g., in SEP or Wikipedia) for the philosophical implications of these concepts, and Section 10.7.5 for readable articles on the philosophy of time.
As R is an object-oriented language, its objects are instances of particular classes (e.g., the integer 2 is an instance of the class “numeric”, the letter A is an instance of class “character”, etc.). The function unclass(x) removes the class attribute from an object x and provides us with a glimpse of what lies “under the hood”. When using R (or any other symbol system), we usually do not want to reflect on the details of the underlying representation. Even when aware that there exist multiple layers of representation, we normally do not want to talk about or reflect upon them. Instead, we typically use objects as objects without wondering or caring about how they are implemented. However, using unclass(x) becomes useful when we pause our normal usage and want to look beyond (or behind) the ordinary object to reveal the nature of its representation (i.e., see what x really is, in terms of its implementation).
The ds4psy function is_wholenumber() does what is.integer() only appears to do: It checks whether its argument is an integer or “whole” number.
For dating advice, better do that internet thing.
Pardon the timely puns.