What then is time?
If no one asks me, I know.
If I wish to explain it to one that asketh, I know not…
St. Augustine (~400 AD), The Confessions, Book XI
Having braved the challenges of text data (in Chapter 9), we still face at least two other prominent data types that we have not yet addressed: dates and times. As we have seen for text, our ignorance of these data types is partly due to our number-centric perspective on computers and science. Another reason why time-related data appears so late in this book (and is often absent from introductions to data or computer science) is that working with dates and times tends to be difficult and complicated. In fact, any serious attempt at dealing with time will sooner or later reach the cognitive impasse expressed in the popular quote of St. Augustine: We seem to know an awful lot about time, yet struggle to explicate its nature and the relevant details.
Considering some examples illustrates some of the challenges inherent in quantifying dates and times:
The year “2020” is defined by the Gregorian calendar used in most of the world today (i.e., as of 2020-07-30). This calendar was introduced in October 1582 as a revision of the Julian calendar, which was introduced in 46 BC as a reform of the Roman calendar, which was based on ancient Greek calendars, etc. The Gregorian date of any event happening in the years from 1901 to 2099 is 13 days ahead of the corresponding date in the Julian calendar.
In our calendar, every year that is exactly divisible by four is a leap year, except for years that are exactly divisible by 100, but these centurial years are leap years if they are exactly divisible by 400. For example, the years 1800, 1900, and 2100 are not leap years, but the years 1600, 2000, and 2400 are leap years.
The year “2020” can be expressed as “MMXX” in Roman numerals. More generally, dates and times contain numerals that represent numbers (like “2” or “20” in “2020”, or “10”, “3”, and “0” in “10:30”), but are also represented as character strings. These strings often contain additional names (e.g., “Monday” or “July”) in a specific language and abstract symbols (e.g., punctuation marks “:” or “/”). Despite this wild mix of numbers, alphabetic characters, and other symbols, we often want to calculate with dates and times.
When representing dates and times in numeric form, doing arithmetic is complicated by local differences, a multiplicity of units, and countless cultural and historic conventions. If scheduling meetings often seems hard, so is calculating their dates from a computer’s perspective. When agreeing to meet “every 2 weeks”, the time difference between subsequent meetings is 14 days. But when agreeing to meet “every month”, the time difference between meetings varies based on the number of days of each month. And for solving a basic task like computing someone’s current age (e.g., in years), we first need to know our current coordinates in time and space to obtain an accurate result. And if these tasks do not yet seem daunting, start thinking about time zones, leap years, and leap seconds…
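The leap-year rule and the contrast between fixed and calendar-based meeting intervals described above can be sketched in base R. The helper name is_leap() and the start date are hypothetical choices made for illustration; they are not taken from this book or from any package:

```r
# Hypothetical helper (an assumption, not a package function):
# the Gregorian leap-year rule as stated in the text.
is_leap <- function(year) {
  (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
}

years <- c(1600, 1800, 1900, 2000, 2020, 2100, 2400)
leap  <- is_leap(years)
names(leap) <- years
leap

# Meetings "every 2 weeks" vs. "every month"
# (the start date is an arbitrary assumption):
start    <- as.Date("2020-01-06")
biweekly <- seq(start, by = "2 weeks", length.out = 5)
monthly  <- seq(start, by = "month",   length.out = 5)

gaps_biweekly <- as.numeric(diff(biweekly))  # days between biweekly meetings
gaps_monthly  <- as.numeric(diff(monthly))   # days between monthly meetings
gaps_biweekly
gaps_monthly
```

The biweekly gaps are always 14 days, whereas the monthly gaps follow the lengths of the months involved (including the 29 days of February in the leap year 2020).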
Thus, dates and times are context-based and can get complicated, for many good reasons:
- they depend on various units of magnitude and measurement (e.g., years, months, days, etc., for dates, and hours, minutes, seconds, etc., for times);
- their values depend on locations (e.g., time zones, daylight saving time);
- their values and elements depend on cultural conventions (e.g., weeks starting on Sunday vs. on Monday) and languages (e.g., “Saturday”, “Samstag”, or “samedi”);
- they depend on pragmatic considerations, like our current perspective (e.g., an event \(x\) that happened “yesterday” has a specific time, while “let’s do \(x\) tomorrow” typically denotes the day within which \(x\) is to take place) and communicative intents (e.g., “in two weeks” or “in a fortnight” differs from “bi-monthly”).
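Two of these dependencies, location and cultural convention, can be illustrated with a minimal base R sketch. The chosen instant and dates are arbitrary, and the example assumes that the time zone names used here are available in the system's time zone database:

```r
# One instant in time, defined in UTC:
t_utc <- as.POSIXct("2020-07-30 12:00:00", tz = "UTC")

# The same instant shown as local clock time in two other time zones
# (assumption: these zone names exist on the current system):
ny    <- format(t_utc, tz = "America/New_York", usetz = TRUE)
tokyo <- format(t_utc, tz = "Asia/Tokyo",       usetz = TRUE)
ny
tokyo

# Week conventions: the number assigned to a Sunday depends on the convention.
sunday <- as.Date("2020-08-02")
wd_iso <- format(sunday, "%u")  # Monday = 1, ..., Sunday = 7
wd_us  <- format(sunday, "%w")  # Sunday = 0, ..., Saturday = 6
wd_iso
wd_us

# Weekday names depend on the language of the current locale:
weekdays(sunday)
```

The same moment thus corresponds to different clock times in different places, and the same day receives a different number or name depending on the convention and language in use.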
Fortunately, R provides a range of pretty good answers to St. Augustine’s question. From a computer’s perspective, dealing with all this messy human stuff requires appropriate data types, standardization, and a range of tools of sufficient technical sophistication (e.g., for computing time spans and rounding to different units of time). When dealing with text data (in Chapter 9), we saw that a key skill for detecting and handling patterns is mastering regular expressions (see Appendix E). For dates and times, the corresponding cornerstone is the so-called “POSIX” standard (see Section 10.2). As we will see, R provides rich functionality for objects that conform to this standard, along with functions for entering and extracting date- and time-related information.
But beyond the technical details of particular standards, tools, and R packages, dealing with dates and times also requires that we think as clearly as possible about the domain of time and its corresponding tasks. This chapter identifies some of these tasks and challenges, but mostly covers the key data types and various tools for dealing with dates and times.
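As a brief preview, the following base R sketch shows how date-time objects are stored as numbers and how time spans and rounding can be computed. The specific dates and times are arbitrary assumptions chosen for illustration:

```r
# POSIXct objects store date-times as seconds since 1970-01-01 00:00:00 UTC:
t1   <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
secs <- as.numeric(t1)   # number of seconds in one day
secs

# Date objects store dates as days since 1970-01-01:
days <- as.numeric(as.Date("1970-01-10"))
days

# Computing a time span between two dates (in days):
span <- difftime(as.Date("2020-07-30"), as.Date("2020-01-01"), units = "days")
as.numeric(span)

# Rounding and truncating a time to different units:
t2  <- as.POSIXct("2020-07-30 10:47:12", tz = "UTC")
fmt <- "%Y-%m-%d %H:%M:%S"
t2_hour <- format(round(t2, "hours"), fmt)         # nearest full hour
t2_day  <- format(trunc(t2, units = "days"), fmt)  # start of the day
t2_hour
t2_day
```

Because dates and times are stored as numbers relative to a fixed origin, arithmetic on them reduces to arithmetic on numbers, while formatting functions translate the results back into human-readable form.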
After working through this chapter, you should be able to:
- understand the basic units of dates and times,
- distinguish between different time zones and notions of time spans,
- understand and use essential date and time classes of base R,
- understand and use key date and time functions of lubridate,
- create date and time variables from various inputs,
- perform basic computations with date and time variables,
- use simple ds4psy functions to query dates and times.
This chapter mostly uses the functions and some data provided by the lubridate package (Spinu et al., 2020).
In addition, the sample_time() functions from the ds4psy package (Neth, 2020) are used for creating practice data.
10.1.3 Getting ready
This chapter formerly assumed that you had read and worked through Chapter 16: Dates and times of the r4ds book (Wickham & Grolemund, 2017). It can now be read by itself, but reading Chapter 16 of r4ds is still recommended.
Please do the following to get started:
- Structure your document by inserting headings and empty lines between different parts. Here’s an example of how your initial file could look:
- Create an initial code chunk below the header of your .Rmd file that loads the R packages of the tidyverse (and see Section F.3.3 if you want to get rid of the messages and warnings of this chunk in your HTML output).
- Save your file (e.g., as nr_name.Rmd in the R folder of your current project) and remember to save and knit it regularly as you keep adding content to it.
Neth, H. (2020). ds4psy: Data science for psychologists. Retrieved from https://CRAN.R-project.org/package=ds4psy
Spinu, V., Grolemund, G., & Wickham, H. (2020). lubridate: Make dealing with dates a little easier. Retrieved from https://CRAN.R-project.org/package=lubridate
Wickham, H., & Grolemund, G. (2017). R for data science: Import, tidy, transform, visualize, and model data. Retrieved from http://r4ds.had.co.nz