Chapter 5 Data extraction, transformation, and loading: Overview

Topics:

Tidy data
tsibble objects for storing, manipulating, and visualizing time series data. The index argument: the time variable, which determines the frequency of the series. The key argument(s): the variable(s) that identify each individual series.
Applying dplyr verbs to tsibble objects: filter, select, mutate, group_by, summarize
Periodicity. Seasonal periods.
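As a minimal sketch of these topics, the code below inspects the index, key, and interval of a tsibble and then chains the dplyr verbs listed above. It assumes the `tourism` dataset that ships with the tsibble package (quarterly overnight trips in Australia, recorded in thousands); the object name `vic_trips` and the column `Trips_millions` are made up for illustration.

```r
library(tsibble)
library(dplyr)

# The index is the time variable; the keys identify each series
index_var(tourism)   # name of the index column
key_vars(tourism)    # names of the key columns
interval(tourism)    # interval (frequency) derived from the index

# Chain the usual dplyr verbs; the result is still a tsibble
vic_trips <- tourism %>%
  filter(State == "Victoria") %>%            # keep the rows for one state
  group_by(Purpose) %>%                      # one series per travel purpose
  summarise(Trips = sum(Trips)) %>%          # sum over regions, per quarter
  mutate(Trips_millions = Trips / 1000) %>%  # Trips is recorded in thousands
  select(Quarter, Purpose, Trips_millions)   # keep the columns of interest

vic_trips
```

Note that `summarise()` on a tsibble keeps the index, so the totals are computed per quarter rather than collapsed to a single row per group.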

Readings:

FPP, Section 2.1
Optional: To learn how to wrangle and visualize data with the Tidyverse packages, you may find it useful to work through the "Tidyverse Fundamentals with R" modules on DataCamp. DataCamp also offers a range of other learning modules for developing data science skills in R.

5.1 ETL strategy: Design your end-point data table(s)

Starting point: Multiple source files, messy data, inconsistent formats, etc. This is real life as a data scientist!

What’s your desired end point?

Data ETL is a creative activity! (Your jobs are secure.)

| Date + time | Series | Value_1 | Value_2 | Value_3 |
|---|---|---|---|---|
| 2020-02-01 | “Virginia” | 33.57 | 29 | “friendly” |
| 2020-02-01 | “Idaho” | 0.22 | 18 | “hostile” |
| … | … | … | … | … |
| `index` | `key` | | | |
| [date] | [fctr] | [dbl] | [int] | [fctr] |
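The end-point table above could be declared as a tsibble roughly as follows. This is a sketch under assumptions: the column names (`date`, `series`, `value_1`, ...) are my own, the two rows dated 2020-02-01 come from the table, and the two rows dated 2020-02-02 are invented purely so that the daily interval can be detected.

```r
library(tsibble)
library(dplyr)

end_point <- tibble(
  # 2020-02-01 rows are from the table; 2020-02-02 rows are hypothetical
  date    = as.Date(c("2020-02-01", "2020-02-01", "2020-02-02", "2020-02-02")),
  series  = factor(c("Virginia", "Idaho", "Virginia", "Idaho")),
  value_1 = c(33.57, 0.22, 34.10, 0.25),
  value_2 = c(29L, 18L, 30L, 17L),
  value_3 = factor(c("friendly", "hostile", "friendly", "hostile"))
) %>%
  # date is the index (time variable); series is the key (one series per state)
  as_tsibble(index = date, key = series)

end_point
```

Printing the object shows the column types under the column names, which is a quick way to check that the result matches the design.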

Then wrangle your data to get to your desired end point.
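One common wrangling path is sketched below, assuming a hypothetical source file that arrives in wide layout (one column per series) with dates stored as text. The data frame `raw_wide` and all of its values are invented for illustration, and `pivot_longer()` comes from the tidyr package.

```r
library(dplyr)
library(tidyr)
library(tsibble)

# Hypothetical messy source: one column per series, dates as character strings
raw_wide <- tibble(
  day      = c("2020-02-01", "2020-02-02"),
  Virginia = c(33.57, 34.10),
  Idaho    = c(0.22, 0.25)
)

tidy_ts <- raw_wide %>%
  # Reshape to tidy (long) form: one row per date and series
  pivot_longer(cols = -day, names_to = "series", values_to = "value_1") %>%
  # Give each column the type planned in the end-point design
  mutate(date = as.Date(day), series = factor(series)) %>%
  select(date, series, value_1) %>%
  # Declare the time index and the series key
  as_tsibble(index = date, key = series)

tidy_ts
```

`as_tsibble()` raises an error if any combination of key and index is duplicated, so this last step also acts as a sanity check on the wrangled data.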

Recommended practices: