Chapter 40 Dates and Times in R

Dates and times (especially times and time zones) are a bane of computing. Calendars have changed by government fiat (Julian to Gregorian, etc.) several times in history, and time zones and daylight saving time practices change even more frequently. To add to the complications, there are many, many date formats around the world, with different countries and cultures mixing and matching days, months, and years willy-nilly to indicate a particular date. For this reason, there is an international standard, ISO 8601, for dates and times. The standard calls for listing all dates in YYYY-MM-DD format, with numbers for the 4-digit year, 2-digit month, and 2-digit day, as in 2021-04-17 for April 17, 2021. All numbers are zero-padded (zeroes added to the left as needed) to maintain four or two digits. Use this format whenever you can, for consistency and to avoid errors that will be painful later. Insist that all of your collaborators use ISO 8601 format, and your life will be vastly improved.

Time in ISO 8601 uses a 24-hour clock and is formatted in one of two ways: the basic format is T[hh][mm][ss] and the extended format is T[hh]:[mm]:[ss]. Each number in a time is two digits, zero-padded when needed, so that one second after midnight is T00:00:01.
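As a minimal sketch of these formats in base R, the chunk below builds one example date-time (the value is arbitrary and chosen only for illustration) and prints it as an ISO 8601 date and as basic and extended ISO 8601 times, using format() with strptime-style codes.

```r
# An arbitrary example: one second after midnight on April 17, 2021, in UTC
x <- as.POSIXct("2021-04-17 00:00:01", tz = "UTC")

# ISO 8601 calendar date: zero-padded YYYY-MM-DD
iso_date <- format(x, "%Y-%m-%d")

# ISO 8601 time in the basic and extended formats
iso_time_basic    <- format(x, "T%H%M%S")
iso_time_extended <- format(x, "T%H:%M:%S")

iso_date           # "2021-04-17"
iso_time_basic     # "T000001"
iso_time_extended  # "T00:00:01"
```

The literal T in the format string is passed through unchanged; only the codes that start with % are replaced.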

When possible, use only dates in your data analysis. Times are complex and surprisingly painful, due to unusual cases that frequently cause trouble. These include:

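  • GMT (Greenwich Mean Time) is close to, but not the same as, UTC. GMT is a time zone, while Coordinated Universal Time (UTC) is the time standard that time zones are defined against. Neither observes daylight saving time; the United Kingdom switches from GMT to British Summer Time (BST) in the summer.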

  • There are leap years, and leap seconds, to keep track of.

  • January 30 + 1 month can return NA (for example, in the lubridate package), because there is no February 30.

  • There are more than 24 time zones, because of episodic local changes. For example, tz = "America/Detroit" is not quite the same as Eastern Standard Time, as Michigan had daylight saving time from 1916-1968, not from 1969-72, then returned to DST in 1973 and thereafter.

  • With all of the odd geopolitical time zone and DST variations, there are now roughly 600 valid time zone names that R keeps track of (the exact count depends on your installation), and you can print these with the base R function OlsonNames(). Arthur David Olson was the original maintainer of the time zone database, which is now published through the Internet Assigned Numbers Authority (IANA) and coordinated by Paul Eggert at UCLA.
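A few of these cases can be checked directly in base R. This is only a sketch: the count of time zone names depends on the time zone data installed on your system, so no expected value is shown for it. Note that base R does not return NA for month arithmetic; that result comes from add-on packages such as lubridate (mentioned here as background). Base R's seq() instead rolls a nonexistent date forward into the next month.

```r
# Time zone names known to this R installation (count varies by system)
tz_names <- OlsonNames()
length(tz_names)
"America/Detroit" %in% tz_names

# Leap years: 2020 has a February 29, 2021 does not
leap_day     <- as.Date("2020-02-28") + 1   # "2020-02-29"
non_leap_day <- as.Date("2021-02-28") + 1   # "2021-03-01"

# Base R month arithmetic: "February 30" is counted forward into March
jan30 <- as.Date("2021-01-30")
plus_month <- seq(jan30, by = "month", length.out = 2)[2]
plus_month                                   # "2021-03-02"
```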

Unless you absolutely need time stamps to track intervals of less than a day, I strongly urge you to convert all date-times to simple ISO 8601 dates.
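A minimal sketch of that conversion is shown below, using an invented evening time stamp in Detroit. The point it illustrates is that the calendar date you get depends on the time zone you convert in, so it is safest to pass tz explicitly to as.Date().

```r
# A hypothetical time stamp: 10 pm in Detroit on January 16, 2018
stamp <- as.POSIXct("2018-01-16 22:00:00", tz = "America/Detroit")

# Convert to a plain date in the local time zone
local_date <- as.Date(stamp, tz = "America/Detroit")

# The same instant is already the next day in UTC
utc_date <- as.Date(stamp, tz = "UTC")

format(local_date)  # "2018-01-16"
format(utc_date)    # "2018-01-17"
```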


40.1 Data Types for Dates and Times

Data for dates and times comes in several classes:

  • POSIXct - calendar time (also called continuous time or count time, as it counts seconds)
  • POSIXlt - local time (also called list time)
  • Date - a calendar date without a time of day, stored as the number of days since 1970-01-01

POSIX is probably unfamiliar to you. It stands for Portable Operating System Interface, a family of standards for keeping Unix-like operating systems compatible with one another; the X is a nod to Unix.

POSIX date-times use the reference point of 1970-01-01T00:00:00 UTC (midnight at the start of the year 1970 in Coordinated Universal Time) as zero seconds. All date-times after that are counted in seconds in POSIXct. Date-times prior to 1970 are counted in negative seconds. POSIXct is very memory-efficient for storing date-times, but not very human-readable.
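The reference point can be checked with a short sketch in base R. The three time stamps below are chosen only to land exactly on, just before, and one day after the epoch; the last line shows that the Date class counts days rather than seconds from the same starting date.

```r
# Exactly at the reference point: zero seconds
epoch <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
as.numeric(epoch)      # 0

# One second before the reference point: negative seconds
before <- as.POSIXct("1969-12-31 23:59:59", tz = "UTC")
as.numeric(before)     # -1

# One day later: 24 * 60 * 60 seconds
one_day <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
as.numeric(one_day)    # 86400

# The Date class counts days, not seconds
as.numeric(as.Date("1970-01-02"))  # 1
```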

In the code chunk below, we will start with an ISO 8601 date-time, with a time zone. We will then print it out, and then show it as it is stored, using the unclass function. Copy and run the code chunk to see the results.

date1 <- as.POSIXct("2018-01-16 04:34:45", 
                    tz = "America/Detroit")
date1
## [1] "2018-01-16 04:34:45 EST"
unclass(date1)
## [1] 1516095285
## attr(,"tzone")
## [1] "America/Detroit"

You get a nicely formatted ISO 8601 date-time when you print it, but when it is unclassed, you get a large number, which is the number of seconds since midnight UTC on January 1, 1970. POSIXct date-times are stored as a numeric (double) count of seconds, which leaves room for fractional seconds.
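To make the storage concrete, here is a small sketch that inspects the underlying type and then rebuilds the same date-time from its raw count of seconds. It assumes the "America/Detroit" time zone is available on your system, as in the chunk above.

```r
date1 <- as.POSIXct("2018-01-16 04:34:45", tz = "America/Detroit")

# The underlying storage type is a double, not an integer
typeof(date1)              # "double"

# Pull out the raw count of seconds since the reference point
secs <- as.numeric(date1)  # 1516095285

# Rebuild the date-time from the count of seconds
rebuilt <- as.POSIXct(secs, origin = "1970-01-01", tz = "America/Detroit")
format(rebuilt)            # "2018-01-16 04:34:45"

# Because the count is a double, fractional seconds can be stored
half_second_later <- date1 + 0.5
as.numeric(half_second_later) - secs  # 0.5
```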

40.2 Using POSIXlt

POSIXlt is R's other POSIX date-time class, and it stores the same information in a more human-readable format. The ‘lt’ stands for local time, but it is often thought of as list time, as the data of POSIXlt is stored as a list of human-readable items.

POSIXlt stores separate items in a list. The list includes 11 distinct items:

  • seconds (sec)
  • minutes (min)
  • hours (hour)
  • day of month (mday)
  • month (mon) - numeric, with January as zero, and December as 11
  • year (year) - since 1900 as zero, so the year 2018 = 118
  • weekday numeric (wday), with Sunday = 0, Saturday = 6
  • day of year (yday), with January 1 = 0
  • daylight saving time (isdst), 0 for no, 1 for yes
  • time zone (zone)
  • offset from UTC in seconds (gmtoff)

In the code chunk below, we will start with the same ISO 8601 date-time, with a time zone. We will then print it out, and then show it as it is stored, using the unclass function. We will then extract the month and year. Copy and run the code chunk to see the results.

date2 <- as.POSIXlt(date1)
date2
## [1] "2018-01-16 04:34:45 EST"
unclass(date2)
## $sec
## [1] 45
## 
## $min
## [1] 34
## 
## $hour
## [1] 4
## 
## $mday
## [1] 16
## 
## $mon
## [1] 0
## 
## $year
## [1] 118
## 
## $wday
## [1] 2
## 
## $yday
## [1] 15
## 
## $isdst
## [1] 0
## 
## $zone
## [1] "EST"
## 
## $gmtoff
## [1] -18000
## 
## attr(,"tzone")
## [1] "America/Detroit" "EST"             "EDT"            
## attr(,"balanced")
## [1] TRUE
date2$mon
## [1] 0
date2$year
## [1] 118

Again, you get a nicely formatted ISO 8601 date-time when you print it, but when it is unclassed, you get a list of date items. When you extract the month and year, the values can be surprising (January = 0, Year 2018 = 118). POSIXlt dates are stored as a list of date items.
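One way to avoid being surprised by the zero-based month and the 1900-based year is to add the offsets yourself, or to let format() do the conversion. The sketch below recreates date2 so that it runs on its own; the day_names vector is a helper introduced here for illustration.

```r
date2 <- as.POSIXlt("2018-01-16 04:34:45", tz = "America/Detroit")

# Convert the stored components to the values people expect
month_number <- date2$mon + 1      # 1 (January)
full_year    <- date2$year + 1900  # 2018
day_of_year  <- date2$yday + 1     # 16

# wday is zero-based with Sunday = 0, so add 1 to index a name vector
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
day_names[date2$wday + 1]          # "Tuesday"

# format() returns the human-readable values directly, as text
format(date2, "%m")                # "01"
format(date2, "%Y")                # "2018"
```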

40.3 Formatting dates

  • as.Date
  • strptime
  • formatting codes
  • the parsedate package
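As a brief sketch of the first three topics, the chunk below reads dates written in non-ISO formats with as.Date() and strptime(), using base R formatting codes (%d = day, %m = month, %Y = 4-digit year, %y = 2-digit year, %H:%M:%S = time). The input strings are invented for illustration, and the parsedate package is not used here.

```r
# ISO 8601 input needs no format argument
d1 <- as.Date("2021-04-17")

# Other layouts need formatting codes that describe the input
d2 <- as.Date("17/04/2021", format = "%d/%m/%Y")
d3 <- as.Date("04-17-21",   format = "%m-%d-%y")

d1 == d2   # TRUE
d1 == d3   # TRUE

# strptime() parses date-times and returns a POSIXlt object
lt <- strptime("17/04/2021 16:05:09",
               format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
class(lt)                         # "POSIXlt" "POSIXt"

# format() uses the same codes to write a date-time back out
format(lt, "%Y-%m-%dT%H:%M:%S")   # "2021-04-17T16:05:09"
```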

40.3.1 Code Chunk Icons

Code Chunk Icons

| Icon | Uses |
|---|---|
| Settings Gear | Allows you to name the code chunk and set options for this chunk |
| Run Chunks Above (down arrow) | Runs all of the preceding code chunks, including the setup chunk |
| Run Chunk (rightward arrow) | Runs the entire current chunk |

40.4 Including Plots

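library(dplyr)    # provides the %>% pipe
library(ggplot2)  # plotting functions

# COVID testing data from the {medicaldata} package
covid <- medicaldata::covid_testing

# Scatterplot of cycle threshold by pandemic day
covid %>% 
  ggplot() +
  aes(x = pan_day, y = ct_result) +
  geom_point() +
  labs(title = "COVID Testing in First 100 Days of Pandemic",
       x = "Pandemic Day, 2020",
       y = "Cycle Threshold \n45 is a Negative Test")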

40.5 Including Tables

library(dplyr)  # count() and the %>% pipe
library(gt)     # table formatting

# Uses the covid data frame created in the plotting chunk above
covid %>% 
  count(demo_group, gender) %>% 
  gt() %>% 
  tab_header(title = "Demographics of COVID Testing",
             subtitle = "By Group and Gender") %>% 
  tab_source_note(source_note = "From CHOP, 2020") %>% 
  cols_label(demo_group = "Group",
             gender = "Gender",
             n = "Count")

Demographics of COVID Testing
By Group and Gender

| Group | Gender | Count |
|---|---|---|
| client | female | 314 |
| client | male | 290 |
| misc adult | female | 1214 |
| misc adult | male | 1227 |
| other adult | female | 126 |
| other adult | male | 97 |
| patient | female | 6178 |
| patient | male | 6077 |
| unidentified | male | 1 |

From CHOP, 2020

40.6 Other languages in code chunks

40.7 Code Chunk Options

Code Chunk Options

| Option | Values | Output |
|---|---|---|
| eval | TRUE/FALSE | Whether or not the code is run. |
| echo | TRUE/FALSE | Show or hide the code. |
| include | TRUE/FALSE | Whether or not the resulting output of a code chunk is displayed in the document. FALSE means that the code will run, but will not display results. include = FALSE is often used for the setup chunk. |
| warning | TRUE/FALSE | Whether warnings generated from your code will be displayed in the document. |
| message | TRUE/FALSE | Whether messages generated from your code will be displayed in the document. |
| fig.align | default, left, right, center | Where on the page the output figure should align. Text options should be in quotes, like fig.align = "right" |
| fig.width | default = 7 | Figure width in inches. |
| fig.height | default = 7 | Figure height in inches. |
| error | TRUE/FALSE | If TRUE, will not stop building the document if there is an error in a code chunk. |
| cache | TRUE/FALSE | If TRUE, will store the results and not re-run the chunk. Helpful for long, slow calculations. But watch out for this if your data change and your results do not(!!). |
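These options can be set for one chunk in its header, or for every chunk at once. As a hedged sketch of the second approach, the lines below use knitr::opts_chunk$set(), typically placed in the setup chunk. The particular values are arbitrary choices for illustration, and the call is wrapped in a check so that it only runs when the knitr package is installed.

```r
# Set document-wide defaults for chunk options (values are examples only)
if (requireNamespace("knitr", quietly = TRUE)) {
  knitr::opts_chunk$set(
    echo = TRUE,          # show the code
    warning = FALSE,      # hide warnings
    message = FALSE,      # hide messages
    fig.align = "center", # text options are quoted
    fig.width = 7,        # inches
    fig.height = 5        # inches
  )

  # Check one of the defaults that was just set
  knitr::opts_chunk$get("fig.align")  # "center"
}
```

Options written in an individual chunk header still override these defaults for that chunk.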

40.8 Try Out Other Chunk Options

Try adding different chunk options, including

  • include = FALSE

  • eval = FALSE, echo = TRUE

  • eval = TRUE, echo = FALSE

40.9 The setup chunk

40.10 The Easy Button - Visual Markdown Editing

The mean cycle threshold in this study was


The standard deviation of the cycle threshold in this study was



40.11 A Quick Quiz

  1. Which code chunk option hides the code?
  2. Which code chunk always comes first, and includes libraries and data import steps?
  3. What is the name of the code block (and the markup language it is written in), set off with three hyphens (---) at the very top of your Rmarkdown document, that tells pandoc how to format the final document?
  4. What symbols do you use to make text bold in Rmarkdown?
  5. Which {knitr} function do you use to add images to your document with a code chunk?