4.4 Summary
In this chapter we have introduced the idea of exploratory data analysis (EDA) and shown that it is an iterative process of interacting with a data set in order to find (and start to answer) questions. Although summary statistics such as the mean and median can be a starting point, it is important not to neglect the dispersion or distribution of variables. Also, although we may start with univariate analysis, it is often interesting to see how variables co-vary. Finally, we have seen that even simple visual techniques can be very powerful in helping the analyst to understand the data better.
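As a minimal sketch of why a measure of centre alone can mislead, the code below uses two small made-up vectors, `x` and `y` (hypothetical illustrative data, not taken from this chapter). They share the same mean and median but differ greatly in spread. It then uses R's built-in `mtcars` data set to move from univariate summaries to a simple measure of how two variables co-vary.

```r
# Hypothetical data: two variables with the same centre but different spread
x <- c(4, 5, 5, 6)
y <- c(0, 2, 8, 10)

# Both vectors have mean 5 and median 5 ...
c(mean(x), mean(y))
c(median(x), median(y))

# ... but their dispersion is very different
c(sd(x), sd(y))
c(IQR(x), IQR(y))

# Beyond univariate summaries: how do car weight and fuel economy co-vary?
# (mtcars ships with base R in the datasets package)
cor(mtcars$wt, mtcars$mpg)
```

Reporting only `mean()` or `median()` would make `x` and `y` look identical, whereas `sd()` and `IQR()` expose the difference. The correlation for `mtcars` is negative: heavier cars tend to travel fewer miles per gallon.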
By the end of this week you should be able to:
- explain the purpose(s) and nature of exploratory data analysis
- apply some simple techniques in R to describe and understand your data
- generate simple plots and diagrams to help visualise different variables in your data
- take and adapt ggplot examples to produce your own customised charts
The next chapter looks at how exploratory data analysis and visualisation can help with the task of checking the quality of data.
4.4.1 Further Reading
- Grolemund, Garrett, and Hadley Wickham. 2018. “R for Data Science.” Chapter 7.
- Kabacoff, Robert. 2015. “R in Action: Data Analysis and Graphics with R.” 2nd ed. Manning. Chapters 6 and 7.1–7.2.
- Kabacoff, Robert. 2019. “Data Visualization with R.” https://rkabacoff.github.io/datavis/. Chapter 2.
Please check your understanding of this chapter via this quick quiz, based on 8 multiple-choice questions. Some feedback is provided when incorrect answers are given.