12.5 Lab 14: Time in R
The standard format for date-times is 2011-01-14 03:01:22 GMT (see the difference between POSIXct and POSIXlt), where GMT is the time zone. An overview of the time zones in R can be found here:
https://www.rdocumentation.org/packages/splusTimeDate/versions/2.5.0-137/topics/timeZoneList
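The difference between the two date-time classes can be sketched in a few lines of base R. POSIXct stores a date-time as a single number (seconds since 1970-01-01 UTC), while POSIXlt stores it as a list of components. The variable names `ct` and `lt` are only illustrative.

```r
# POSIXct: one number, the seconds elapsed since 1970-01-01 00:00:00 UTC
ct <- as.POSIXct("2011-01-14 03:01:22", tz = "GMT")
as.numeric(ct)
## [1] 1294974082

# POSIXlt: a list of components (year counts from 1900, mon from 0)
lt <- as.POSIXlt(ct, tz = "GMT")
lt$hour
## [1] 3
lt$year + 1900
## [1] 2011
```

POSIXct is usually the more convenient class for columns in a data frame, while POSIXlt is handy when individual components are needed.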
The standard function to parse dates is as.Date(), whose format argument accepts the following codes:
%d: Day of the month (decimal number)
%m: Month (decimal number)
%b: Month (abbreviated)
%B: Month (full name)
%y: Year (2 digit)
%Y: Year (4 digit)
Some examples:
as.Date('1915-6-16')
## [1] "1915-06-16"
as.Date('1990/02/17')
## [1] "1990-02-17"
as.Date('April 26, 2001',format='%B %d, %Y')
## [1] "2001-04-26"
as.Date('1/15/2001',format='%m/%d/%Y')
## [1] "2001-01-15"
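The examples above do not use %y, so here is a small additional sketch in base R. It also shows that format() accepts the same codes to turn a Date back into text, and what happens when the string does not match the format. The input strings are made up for illustration.

```r
# Parse a two-digit year with %y (values 69-99 are read as 19xx)
d <- as.Date("17.02.90", format = "%d.%m.%y")
d
## [1] "1990-02-17"

# format() uses the same codes to turn a Date back into text
format(d, "%d/%m/%Y")
## [1] "17/02/1990"

# A string that does not match the format yields NA rather than an error
as.Date("1990-02-17", format = "%d.%m.%y")
## [1] NA
```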
Another option is strptime(), which also parses the time of day and returns a POSIXlt object:
# How could we convert the following? "Sat Jun 06 08:22:34 PDT 2009"
x <- "Fri, 14 Jan 2011 03:01:22 +0000"
strptime(x, "%a, %d %b %Y %H:%M:%S %z", tz = "GMT")
## [1] "2011-01-14 03:01:22 GMT"
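x <- "Sat Jun 06 08:22:34 PDT 2009"
# %z expects a numeric offset such as -0700, so the abbreviation "PDT" fails
strptime(x, "%a %b %d %H:%M:%S %z %Y")
## [1] NA
# Remove the abbreviation and pass the time zone explicitly instead
x <- gsub("PDT", "", x)
strptime(x, "%a %b %d %H:%M:%S %Y", tz = "US/Pacific")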
## [1] "2009-06-06 08:22:34 PDT"
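An alternative to deleting the abbreviation is to replace it with the numeric offset that %z understands. The following is only a sketch: it assumes that every timestamp carries the same abbreviation (PDT, i.e. 7 hours behind UTC) and that the day and month names are in English, which is why the time locale is set to "C" first.

```r
# Assumption: English day and month names, since %a and %b are locale-specific
invisible(Sys.setlocale("LC_TIME", "C"))

x <- "Sat Jun 06 08:22:34 PDT 2009"
# PDT is 7 hours behind UTC, so substitute the numeric offset that %z understands
x_offset <- sub("PDT", "-0700", x, fixed = TRUE)
t_utc <- as.POSIXct(strptime(x_offset, "%a %b %d %H:%M:%S %z %Y", tz = "UTC"))
format(t_utc, "%Y-%m-%d %H:%M:%S", tz = "UTC")
## [1] "2009-06-06 15:22:34"
```

The result is the same instant as 08:22:34 in US Pacific daylight time, expressed in UTC.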
The package lubridate (examples taken from here) facilitates working with dates in R. First we import the dataset we already know.
library(DBI)
db <- dbConnect(RSQLite::SQLite(), "./www/tweets-sentiment-db.sqlite")
dbListTables(db)
## [1] "table_tweets"
dbListFields(db, "table_tweets")
## [1] "target" "ids" "date" "flag" "user" "text"
data <- dbGetQuery(db, "SELECT target, ids, date, user, text
FROM table_tweets
ORDER BY RANDOM()
LIMIT 10000")
names(data)
## [1] "target" "ids" "date" "user" "text"
# Close the connection for now: the query result
# is already stored in the data object
dbDisconnect(db)
lubridate contains functions for parsing dates, such as ymd() and mdy().
library(lubridate)
ymd(20101215)
## [1] "2010-12-15"
mdy("4/1/17")
## [1] "2017-04-01"
## heterogeneous formats
x <- c(20100101120101, "2009-01-02 12-01-02", "2009.01.03 12:01:03",
"2009-1-4 12-1-4",
"2009-1, 5 12:1, 5",
"200901-08 1201-08",
"2009 arbitrary 1 non-decimal 6 chars 12 in between 1 !!! 6",
"OR collapsed formats: 20090107 120107 (as long as prefixed with zeros)",
"Automatic wday, Thu, detection, 10-01-10 10:01:10 and p format: AM",
"Created on 10-01-11 at 10:01:11 PM")
ymd_hms(x)
## [1] "2010-01-01 12:01:01 UTC" "2009-01-02 12:01:02 UTC"
## [3] "2009-01-03 12:01:03 UTC" "2009-01-04 12:01:04 UTC"
## [5] "2009-01-05 12:01:05 UTC" "2009-01-08 12:01:08 UTC"
## [7] "2009-01-06 12:01:06 UTC" "2009-01-07 12:01:07 UTC"
## [9] "2010-01-10 10:01:10 UTC" "2010-01-11 22:01:11 UTC"
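Because the name of each parsing function encodes the expected order of year, month and day, choosing the wrong one can silently give a different date. A minimal sketch with made-up input strings:

```r
library(lubridate)

# The same string gives two different dates depending on the assumed order
dmy("01/02/2017")
## [1] "2017-02-01"
mdy("01/02/2017")
## [1] "2017-01-02"

# Impossible components give NA (with a warning) instead of a wrong date
suppressWarnings(mdy("13/02/2017"))
## [1] NA
```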
It contains functions to get and set components of a date-time, such as year(), month(), mday(), hour(), minute() and second():
bday <- dmy("14/10/1979")
month(bday)
## [1] 10
wday(bday, label = TRUE)
## [1] Sun
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
year(bday) <- 2016
wday(bday, label = TRUE)
## [1] Fri
## Levels: Sun < Mon < Tue < Wed < Thu < Fri < Sat
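The same accessors work on the time components of a date-time. A short sketch, reusing the timestamp from the beginning of this lab (the variable name `dt` is only illustrative):

```r
library(lubridate)

dt <- ymd_hms("2011-01-14 03:01:22")
hour(dt)
## [1] 3
mday(dt)
## [1] 14

# Assigning to an accessor changes only that component
minute(dt) <- 30
dt
## [1] "2011-01-14 03:30:22 UTC"
```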
It contains helper functions to handle time zones:
time <- ymd_hms("2010-12-13 15:30:30")
time
## [1] "2010-12-13 15:30:30 UTC"
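# Changes only how the time is printed; the instant stays the same
with_tz(time, "America/Chicago")
## [1] "2010-12-13 09:30:30 CST"
# Changes the instant: same clock time, but in a different time zone
force_tz(time, "America/Chicago")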
## [1] "2010-12-13 15:30:30 CST"
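The difference between the two helpers becomes visible when the results are compared with the original object. A minimal sketch (the names `shown` and `forced` are only illustrative):

```r
library(lubridate)

time <- ymd_hms("2010-12-13 15:30:30")
shown  <- with_tz(time, "America/Chicago")
forced <- force_tz(time, "America/Chicago")

# with_tz() keeps the same instant, so the two objects compare as equal
shown == time
## [1] TRUE

# force_tz() keeps the clock reading, which in Chicago happens 6 hours later
as.numeric(difftime(forced, time, units = "hours"))
## [1] 6
```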
Finally, it provides the functions duration(), which measures the exact time between two points; period(), which accurately tracks clock times despite leap years, leap seconds, and daylight saving time; and interval(), which summarizes the time information between two points.
duration(90, "seconds") # Separate period and units vectors
## [1] "90s (~1.5 minutes)"
duration(second = 90) # Units as arguments
## [1] "90s (~1.5 minutes)"
duration("2M 1sec") # parsing
## [1] "121s (~2.02 minutes)"
period(c(90, 5), c("second", "minute")) # Separate period and units vectors
## [1] "5M 90S"
period(second = 90, minute = 5) # Units as arguments
## [1] "5M 90S"
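The practical difference between the three shows up around a daylight saving change. The sketch below assumes the US rule under which clocks in Chicago moved forward on 13 March 2016; the variable names are only illustrative.

```r
library(lubridate)

# Noon on the day before US clocks moved forward (13 March 2016)
start <- ymd_hms("2016-03-12 12:00:00", tz = "America/Chicago")

# A duration adds exactly 86400 seconds, so the clock reading shifts by an hour
format(start + ddays(1), "%Y-%m-%d %H:%M")
## [1] "2016-03-13 13:00"

# A period adds one calendar day, so the clock reading stays at noon
format(start + days(1), "%Y-%m-%d %H:%M")
## [1] "2016-03-13 12:00"

# An interval is anchored to both end points; this one is only 23 hours long
span <- interval(start, start + days(1))
span / dhours(1)
## [1] 23
```

As a rule of thumb, durations suit physical elapsed time, while periods suit calendar arithmetic such as "same time tomorrow".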