4.2 Date-time components
This section focuses on the accessor functions that let you get and set individual components of a date or date-time.
4.2.1 Accessing components
You can pull out individual parts of a date with the accessor functions year(), month(), mday() (day of the month), yday() (day of the year), wday() (day of the week), hour(), minute(), and second():
datetime <- ymd_hms("2016-07-08 12:34:56")
year(datetime)
#> [1] 2016
month(datetime)
#> [1] 7
mday(datetime)
#> [1] 8
yday(datetime)
#> [1] 190
wday(datetime)
#> [1] 6
hour(datetime)
#> [1] 12
minute(datetime)
#> [1] 34
second(datetime)
#> [1] 56
For month() and wday() you can set label = TRUE to return the abbreviated name of the month or day of the week as an ordered factor. Set abbr = FALSE to return the full name. This is useful when plotting with ggplot2, because the factor levels keep months and weekdays in calendar order rather than alphabetical order:
month(datetime, label = TRUE)
#> [1] Jul
#> 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
wday(datetime, label = TRUE, abbr = FALSE)
#> [1] Friday
#> 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
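To see why the ordered factor matters for plotting, the following minimal sketch compares sorting the labelled factor with sorting plain character labels. The `dates` vector is made up for illustration, and the printed month names assume an English locale.

```r
library(lubridate)

# Three dates deliberately given out of calendar order (illustrative values)
dates <- ymd(c("2016-12-25", "2016-01-15", "2016-07-08"))

# label = TRUE returns an ordered factor whose levels run January to December
m <- month(dates, label = TRUE)
is.ordered(m)

# Sorting the factor follows calendar order: January, July, December
sort(m)

# Sorting the plain character labels falls back to alphabetical order
sort(as.character(m))
```

ggplot2 lays out a discrete axis by factor level, so a labelled month or weekday column is drawn in calendar order without any extra work.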
Using wday(), we can see that more flights depart on weekdays than on weekends.
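The mechanics of that comparison can be sketched on a small, made-up vector. Here `dep_time` is a hypothetical stand-in for the departure times in flights_dt, with exactly one departure per day, so it shows only how the counting works and not the actual pattern in the flights data.

```r
library(lubridate)

# Hypothetical stand-in for flights_dt$dep_time: one departure per day for two weeks
dep_time <- ymd_hms("2013-01-01 08:00:00") + days(0:13)

# Count departures per day of the week
table(wday(dep_time, label = TRUE))

# With the week starting on Sunday, wday() returns 1 for Sunday and 7 for Saturday
is_weekend <- wday(dep_time, week_start = 7) %in% c(1, 7)
table(is_weekend)
```

On the real data, the same wday() column could be passed to ggplot2 as the x aesthetic of a bar chart.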
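Here is another example, this time using minute() to extract the minute component. Suppose we want to know how the minute of the departure time relates to the average arrival delay: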
flights_dt %>%
  mutate(minute = minute(dep_time)) %>%
  group_by(minute) %>%
  summarize(avg_delay = mean(arr_delay, na.rm = TRUE)) %>%
  ggplot(aes(minute, avg_delay)) +
  geom_line()
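An interesting pattern emerges: flights that depart in minutes 20 to 30 and 50 to 60 of the hour appear to have much lower arrival delays than flights that depart at other times.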
4.2.2 Rounding
An alternative approach to plotting individual components is to round the date to a nearby unit of time with round_date(), floor_date(), and ceiling_date(). Each function takes a vector of dates to adjust and then the name of the unit to round down to (floor), round up to (ceiling), or round to the nearest of. This, for example, allows us to plot the number of flights per week:
flights_dt %>%
  transmute(dep_time,
            week = floor_date(dep_time, "week")) %>%
  ggplot(aes(week)) +
  geom_bar()
Note that, unlike the accessor functions, the rounding functions still return a complete date-time rather than an individual component.
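As a small illustration of that difference, the sketch below applies an accessor and the three rounding functions to the same value. The date-time is the one used in the accessor examples earlier in this section.

```r
library(lubridate)

x <- ymd_hms("2016-07-08 12:34:56")

# An accessor extracts a single number
hour(x)                  # 12

# The rounding functions return a full date-time at the chosen precision
floor_date(x, "hour")    # 2016-07-08 12:00:00 UTC
round_date(x, "hour")    # 2016-07-08 13:00:00 UTC
ceiling_date(x, "hour")  # 2016-07-08 13:00:00 UTC
```

Because the result is still a date-time, it can go straight onto a continuous time axis, which is what the weekly plot relies on.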
More examples:
x <- ymd_hms("2009-08-03 12:01:59.23")
round_date(x, ".5s")
#> [1] "2009-08-03 12:01:59 UTC"
round_date(x, "sec")
#> [1] "2009-08-03 12:01:59 UTC"
round_date(x, "second")
#> [1] "2009-08-03 12:01:59 UTC"
round_date(x, "minute")
#> [1] "2009-08-03 12:02:00 UTC"
round_date(x, "5 mins")
#> [1] "2009-08-03 12:00:00 UTC"
round_date(x, "hour")
#> [1] "2009-08-03 12:00:00 UTC"
round_date(x, "2 hours")
#> [1] "2009-08-03 12:00:00 UTC"
round_date(x, "day")
#> [1] "2009-08-04 UTC"
round_date(x, "week")
#> [1] "2009-08-02 UTC"
round_date(x, "month")
#> [1] "2009-08-01 UTC"
round_date(x, "bimonth") # rounds to the start of Jan, Mar, May, Jul, Sep, or Nov
#> [1] "2009-09-01 UTC"
round_date(x, "quarter") == round_date(x, "3 months")
#> [1] TRUE
round_date(x, "halfyear")
#> [1] "2009-07-01 UTC"
round_date(x, "year")
#> [1] "2010-01-01 UTC"
4.2.3 Setting components
You can also use each accessor function to set the components of a date-time:
datetime <- ymd_hms("2016-07-08 12:34:56")
year(datetime) <- 2020
month(datetime) <- 11
mday(datetime) <- 5
hour(datetime) <- 1
datetime
#> [1] "2020-11-05 01:34:56 UTC"
Alternatively, rather than modifying in place, you can create a new date-time with update(). This also lets you set multiple values at once; its interface is similar to that of make_datetime().
datetime <- ymd_hms("2016-07-08 12:34:56")
update(datetime, year = 2000, month = 11, mday = 5, hour = 1)
#> [1] "2000-11-05 01:34:56 UTC"
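The following sketch checks two points from the paragraph above: update() leaves its input untouched, and the result matches what make_datetime() would build from the same components. The variable names `changed` and `built` are only illustrative.

```r
library(lubridate)

datetime <- ymd_hms("2016-07-08 12:34:56")

# update() returns a modified copy ...
changed <- update(datetime, year = 2000, month = 11, mday = 5, hour = 1)

# ... so the original value is left untouched
datetime

# The same date-time built from scratch; note that make_datetime() names its
# arguments day, min, and sec rather than mday, minute, and second
built <- make_datetime(year = 2000, month = 11, day = 5, hour = 1, min = 34, sec = 56)
changed == built
```

The setter form (`year(datetime) <- 2020`) suits a one-off change to an existing variable, while update() fits more naturally inside a pipeline or a mutate() call.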
Modifying yday is equivalent to modifying mday and month at the same time:
datetime <- ymd_hms("2016-07-08 12:34:56")
update(datetime, yday = 1)
#> [1] "2016-01-01 12:34:56 UTC"
If values are too big, they will roll over:
ymd("2015-02-01") %>%
update(mday = 30)
#> [1] "2015-03-02"
ymd("2015-02-01") %>%
update(hour = 400)
#> [1] "2015-02-17 16:00:00 UTC"
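update() also has a rather clever use. Suppose we want to visualize how the departure times of all flights in the year are distributed over the course of a day: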
flights_dt %>%
  transmute(dep_hour = update(dep_time, yday = 1)) %>%
  ggplot(aes(dep_hour)) +
  geom_freqpoly(binwidth = 60 * 5) + # 1 bin per 5 minutes
  scale_x_datetime(breaks = scales::breaks_width("3 hours"),
                   labels = scales::label_date_short()) +
  labs(title = "Departure times of all flights over the course of a day")
Without update(), we would probably need to extract the three components with hour(), minute(), and second() first, and then recombine them with make_datetime().
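The two routes can be compared side by side in a minimal sketch. Here `dep_time` is a made-up two-element vector standing in for the flights_dt column, and the year is kept from the original value because update(yday = 1) does not change it.

```r
library(lubridate)

# Hypothetical departure times on two different days of the same year
dep_time <- ymd_hms(c("2013-03-15 06:45:10", "2013-11-02 18:05:00"))

# Route 1: move every value to the first day of its year in one call
via_update <- update(dep_time, yday = 1)

# Route 2: extract the time-of-day components and rebuild the date-time by hand
via_make <- make_datetime(
  year  = year(dep_time),
  month = 1,
  day   = 1,
  hour  = hour(dep_time),
  min   = minute(dep_time),
  sec   = second(dep_time)
)

# Both routes give the same result
all(via_update == via_make)
```

Both versions discard the calendar date and keep the time of day, but update() does it in a single call and is harder to get wrong.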
4.2.4 Exercises
flights_dt %>%
  mutate(month = month(dep_time, label = TRUE), # month is now an ordered factor
         dep_time = update(dep_time, yday = 1)) %>% # any yday works, as long as it is the same for every row
  ggplot(aes(dep_time)) +
  geom_freqpoly(binwidth = 60 * 60) + # 1 bin per hour
  scale_x_time(labels = scales::label_time()) +
  facet_wrap(vars(month), nrow = 4)
flights_dt %>%
  mutate(weekday = wday(dep_time, label = TRUE, abbr = TRUE)) %>%
  group_by(weekday) %>%
  summarize(delay_prob = mean(arr_delay > 0, na.rm = TRUE)) %>% # share of flights arriving late
  ggplot(aes(weekday, delay_prob)) +
  geom_line(aes(group = 1))