2.9 Summary

We have started to explore the vast and diverse range of data sources. These are fundamental to data analysis because they provide the raw material for downstream analysis. Very often the challenges relate to integration and to what is sometimes called data wrangling, i.e., reorganising the raw data into formats suitable for R. Typically we need to produce what is often called rectangular data, i.e., data frames (tables) where the rows are instances such as customers and the columns (vectors) are different variables (customer attributes). Every row must have the same width, hence the data set is rectangular. We have also looked at how data can be programmatically imported and exported as text files, and briefly considered more complex settings such as databases or very large data sets (“Big Data”).
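As a brief reminder of these ideas, the following minimal sketch builds a small rectangular data set and round-trips it through a text file using base R. The customer values, the column names (`id`, `name`, `spend`) and the temporary file path are all invented for illustration and do not come from the chapter.

```r
# Hypothetical customer data: each row is one customer,
# each column (vector) is one customer attribute.
customers <- data.frame(
  id    = c(1L, 2L, 3L),
  name  = c("Ana", "Ben", "Caz"),
  spend = c(120.5, 80.0, 42.25),
  stringsAsFactors = FALSE
)

# Every row has the same number of columns, so the data are rectangular.
dim(customers)

# Export to a text (CSV) file in a temporary location, then import it again.
csv_path <- tempfile(fileext = ".csv")
write.csv(customers, csv_path, row.names = FALSE)
customers_in <- read.csv(csv_path, stringsAsFactors = FALSE)

# Check that the round trip preserved the shape and the column names.
identical(dim(customers), dim(customers_in))
identical(names(customers), names(customers_in))
```

Writing to `tempfile()` keeps the example self-contained; in practice you would supply a path within your own project directory.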

Please check your understanding of this chapter via this quick quiz (8 simple multiple-choice questions).

Next week we move on to consider the difference between engineering and hacking, and the importance of understandable and reproducible data analysis.