Chapter 10 Reading in data from source files

10.1 First, examine text files in a text editor

It’s generally a good idea to examine the contents of an unfamiliar text data file visually, in a text editor, before starting to work on it. You can open smaller files inside RStudio. For files too big to open inside RStudio, you can use a dedicated text editor such as Atom.

This code prints out the first ten lines of the text file “PetroStocks.csv”.

# Print out the first ten lines of the file "data/PetroStocks.csv"
library(magrittr) # provides the %>% pipe
readLines("data/PetroStocks.csv", n = 10) %>% paste0(collapse = "\n") %>% cat()
## Back to Contents,Data 1: Petroleum Stocks,,,,,,,,,,,,,,,,,,,
## Sourcekey,WCRSTUS1,WCESTUS1,WCSSTUS1,WGTSTUS1,WGRSTUS1,WG4ST_NUS_1,WBCSTUS1,W_EPOOXE_SAE_NUS_MBBL,WKJSTUS1,WDISTUS1,WD0ST_NUS_1,WD1ST_NUS_1,WDGSTUS1,WRESTUS1,WPRSTUS1,W_EPPO6_SAE_NUS_MBBL,WUOSTUS1,WTTSTUS1,WTESTUS1,
## Date,Weekly U.S. Ending Stocks of Crude Oil  (Thousand Barrels),Weekly U.S. Ending Stocks excluding SPR of Crude Oil  (Thousand Barrels),Weekly U.S. Ending Stocks of Crude Oil in SPR  (Thousand Barrels),Weekly U.S. Ending Stocks of Total Gasoline  (Thousand Barrels),Weekly U.S. Ending Stocks of Reformulated Motor Gasoline  (Thousand Barrels),Weekly U.S. Ending Stocks of Conventional Motor Gasoline  (Thousand Barrels),Weekly U.S. Ending Stocks of Gasoline Blending Components  (Thousand Barrels),Weekly U.S. Ending Stocks of Fuel Ethanol  (Thousand Barrels),Weekly U.S. Ending Stocks of Kerosene-Type Jet Fuel  (Thousand Barrels),Weekly U.S. Ending Stocks of Distillate Fuel Oil  (Thousand Barrels),"Weekly U.S. Ending Stocks of Distillate Fuel Oil, 0 to 15 ppm Sulfur  (Thousand Barrels)","Weekly U.S. Ending Stocks of Distillate Fuel Oil, Greater than 15 to 500 ppm Sulfur  (Thousand Barrels)","Weekly U.S. Ending Stocks of Distillate Fuel Oil, Greater Than 500 ppm Sulfur  (Thousand Barrels)",Weekly U.S. Ending Stocks of Residual Fuel Oil  (Thousand Barrels),Weekly U.S. Propane and Propylene Ending Stocks Excluding Propylene at Terminal (Thousand Barrels),Weekly U.S. Ending Stocks of Other Oils (Excluding Fuel Ethanol)  (Thousand Barrels),Weekly U.S. Ending Stocks of Unfinished Oils  (Thousand Barrels),Weekly U.S. Ending Stocks of Crude Oil and Petroleum Products  (Thousand Barrels),Weekly U.S. Ending Stocks excluding SPR of Crude Oil and Petroleum Products  (Thousand Barrels),
## "Aug 20, 1982",609219,338764,270455,,,,,,33523,149415,,,,51168,,,119293,,,
## "Aug 27, 1982",608741,336138,272603,,,,,,33897,154589,,,,48544,,,119863,,,
## "Sep 24, 1982",612419,335586,276833,,,,,,34949,158684,,,,57813,,,117686,,,
## "Oct 01, 1982",612419,334786,277633,,,,,,33919,154461,,,,60828,,,118759,,,
## "Oct 08, 1982",613985,335260,278725,,,,,,32347,158242,,,,60381,,,119017,,,
## "Oct 15, 1982",607781,326979,280802,,,,,,33382,161578,,,,60929,,,117603,,,
## "Oct 22, 1982",617763,334370,283393,,,,,,34280,162934,,,,61688,,,115236,,,

What do you see?
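A quick programmatic check can complement the visual inspection. The sketch below is a minimal illustration, not part of the original workflow: it writes a small stand-in file to a temporary path (the layout mimics the output above, but the column names are abbreviated and only two data rows are included, so it is not the real file) and uses base R's count.fields() to count the comma-separated fields on each line. If every line reports the same count, the file is at least rectangular, even though its first lines hold labels rather than data.

```r
# Hypothetical stand-in for "data/PetroStocks.csv": same layout, fewer columns
sample_lines <- c(
  'Back to Contents,Data 1: Petroleum Stocks,,',
  'Sourcekey,WCRSTUS1,WCESTUS1,',
  'Date,Crude Oil,Crude Oil excluding SPR,',
  '"Aug 20, 1982",609219,338764,',
  '"Aug 27, 1982",608741,336138,'
)
sample_path <- tempfile(fileext = ".csv")
writeLines(sample_lines, sample_path)

# Count fields per line. quote = "\"" makes the comma inside the quoted
# dates part of the field instead of a separator.
n_fields <- count.fields(sample_path, sep = ",", quote = "\"")
print(n_fields)

# TRUE when every line has the same number of fields
same_width <- length(unique(n_fields)) == 1
print(same_width)
```

Pointing count.fields() at the real file path instead of sample_path would apply the same check to the full data set.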

10.2 Read in CSV files

The following example reads the file with readr::read_csv():

# Data Source: https://www.eia.gov/petroleum/supply/weekly/
library(readr)

# Original code:
PetroStocksData <- readr::read_csv("data/PetroStocks.csv")
names(PetroStocksData) <- PetroStocksData[2,] # use the second row (Date, ...) as column names
PetroStocksData <- PetroStocksData[-1,]       # drop the Sourcekey row
PetroStocksData <- PetroStocksData[-1,]       # drop the row that supplied the column names
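One consequence of reading the file without skipping its preamble is that every column comes in as text, because the first rows hold labels rather than numbers. The sketch below illustrates this on a small stand-in file (hypothetical, with abbreviated column names and two data rows; not the real data set). It then promotes the label row to column names and uses readr::type_convert() to re-guess the column types. It assumes the readr package is installed.

```r
library(readr)

# Hypothetical stand-in file with the same three-line preamble as the real file
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  'Back to Contents,Data 1: Petroleum Stocks,',
  'Sourcekey,WCRSTUS1,WCESTUS1',
  'Date,Crude Oil,Crude Oil excluding SPR',
  '"Aug 20, 1982",609219,338764',
  '"Aug 27, 1982",608741,336138'
), sample_path)

# Read without skipping anything; col_types = cols() lets readr guess quietly
raw_tbl <- read_csv(sample_path, col_types = cols())

# Every column is character, because label rows are mixed in with the numbers
col_classes <- vapply(raw_tbl, function(x) class(x)[1], character(1))
print(col_classes)

# Promote the second row to column names, drop the two label rows,
# then let readr re-guess the types of the remaining values
names(raw_tbl) <- unlist(raw_tbl[2, ], use.names = FALSE)
fixed_tbl <- type_convert(raw_tbl[-(1:2), ])
print(fixed_tbl)
```

After the conversion the stock columns are numeric, while the Date column is still text, since its "Aug 20, 1982" format is not one readr recognizes when guessing.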

10.2.1 Declaring data types

Declare the data type of each column as you read in the data, rather than leaving read_csv() to guess. Choose your data types deliberately!

Reference: R4DS Ch. 15, 16

### Revised: use readr::read_csv() to skip the two leading metadata rows and declare data types
### Declare column types within the read_csv() function call, to save yourself trouble later:
ps_tbl <- readr::read_csv("data/PetroStocks.csv",
                          skip = 2,
                          col_types = cols(.default = col_double(),
                                           Date     = col_character()))

### Reference: https://readr.tidyverse.org/reference/read_delim.html
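The call above keeps Date as text. As a possible next step, the sketch below declares the column as a date at read time with col_date(), using the format string "%b %d, %Y" to match values such as "Aug 20, 1982". This is an illustrative variation rather than the chapter's own code, and it runs on a small stand-in file (hypothetical, with abbreviated column names) instead of the real data set. It assumes the readr package is installed.

```r
library(readr)

# Hypothetical stand-in file: two metadata rows, a header row, two data rows
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  'Back to Contents,Data 1: Petroleum Stocks,',
  'Sourcekey,WCRSTUS1,WCESTUS1',
  'Date,Crude Oil,Crude Oil excluding SPR',
  '"Aug 20, 1982",609219,338764',
  '"Aug 27, 1982",608741,336138'
), sample_path)

# skip = 2 drops the metadata rows, so the "Date,..." line becomes the header.
# %b = abbreviated month name, %d = day of month, %Y = four-digit year.
ps_sample <- read_csv(
  sample_path,
  skip = 2,
  col_types = cols(
    .default = col_double(),
    Date     = col_date(format = "%b %d, %Y")
  )
)
print(ps_sample)
```

Parsing the dates on the way in means the column can be sorted, filtered by range, and plotted on a time axis without a separate conversion step.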