11 Data in the sky at night

“The proportions and relations of things are just as much facts as the things themselves.”

— Dorothy L. Sayers

Background The diamonds dataset from the R package ggplot2 provides data on each of almost 54000 round diamonds that were offered for sale.

Questions Do the data need cleaning? How are the individual variables distributed? Are there any special features?

Sources The data were originally scraped from the web in 2007 by Hadley Wickham (Wickham (2007)), who made them available in his R package ggplot2. Many of the results in this chapter were found and written up by him at the time, in an article that was rejected by a journal that should have known better.

Structure The dataset covers 53940 round diamonds with information on price, dimensions, and quality.