Chapter 13 Saving and loading data files

Your real analytic work can begin when you have a prepared, cleaned data file ready to load into memory. You probably don’t want to re-do all the data preparation steps each time you start work.

If you are working with a small data file, re-running the code that prepares it is probably not a problem: it only takes a second. With a larger file, however, re-doing the data prep for each work session is a hassle.

Instead, you may wish to separate your data preparation code into its own script, then save the resulting prepared file to disk. Then, when you sit down to do analytic work, you can load your prepared file directly into memory.
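As a minimal sketch of this two-script workflow, the example below uses base R's saveRDS() and readRDS(). The script names in the comments, the toy data frame, and the temporary file location are all hypothetical, chosen only for illustration; in practice you would save to a stable path inside your project.

```r
# --- prepare_data.R (hypothetical script name) ---
# Toy raw data standing in for your real source file
raw <- data.frame(id = 1:5, score = c(10, NA, 30, 40, 50))

# Example cleaning step: drop rows with a missing score
prepared <- raw[!is.na(raw$score), ]

# Save the prepared data to disk (a temporary directory here, for illustration)
prepared_path <- file.path(tempdir(), "prepared.rds")
saveRDS(prepared, prepared_path)

# --- analysis.R (hypothetical script name) ---
# Load the prepared file directly, skipping the preparation steps
prepared_loaded <- readRDS(prepared_path)
head(prepared_loaded)
```

The analysis script only needs to know where the prepared file lives, so the preparation code runs again only when the raw data or the cleaning steps change.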

13.1 Fast file reading and writing: The arrow package

For saving and reading larger files, I recommend the feather format supported by the arrow package. Arrow’s read_feather() and write_feather() functions are typically much faster than base R’s text-based read and write functions, such as read.csv() and write.csv(). Arrow also provides good support for working with large files, and it plays well with dplyr.
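A short sketch of a feather round trip follows, assuming the arrow package is installed. The data frame and the temporary file path are made up for illustration.

```r
library(arrow)

# Toy data frame standing in for a prepared data set
df <- data.frame(
  id    = 1:3,
  group = c("a", "b", "c"),
  value = c(1.5, 2.5, 3.5)
)

# Write to a feather file (temporary path, for illustration only)
feather_path <- tempfile(fileext = ".feather")
write_feather(df, feather_path)

# Read it back; by default the result is returned as a data frame (tibble)
df_loaded <- read_feather(feather_path)
print(df_loaded)
```

Column names and types are stored in the file itself, so nothing needs to be re-specified when the data is loaded again.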