Chapter 2 Data Sets
Throughout the semester, we will use various data sets to teach important data science techniques.
(Computer Labs 1B+)
In many weeks, we will use the penguins data set from the palmerpenguins R package (Horst, Hill, and Gorman 2020).
This is an interesting data set on the characteristics of three species of penguin living on the Dream, Biscoe, and Torgersen islands in the Palmer Archipelago, off the coast of Antarctica.
The three species of penguin are Gentoo Penguins,² Chinstrap Penguins,³ and Adelie Penguins.⁴
The penguins data set contains measurements of several characteristics of these penguins; take a look at Table 2.1 below.
Namely, for each penguin, we have data on their species, the island on which they live, their bill length, bill depth and flipper length (all measured in mm), their body mass (measured in grams), their sex, and the year in which the recordings were made.
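To see these variables for yourself, the short sketch below loads the data and prints a summary of its structure. It assumes the palmerpenguins package is already installed (for example via install.packages("palmerpenguins")); the explicit palmerpenguins:: prefix is a precaution, since recent versions of R also ship a differently named copy of this data set.

```r
# Assumption: the palmerpenguins package is installed.
# If it is not, run install.packages("palmerpenguins") first.

# Take the data set explicitly from the palmerpenguins package
penguins <- palmerpenguins::penguins

# Number of rows (penguins) and columns (variables)
dim(penguins)

# Name and type of each variable
str(penguins)

# First few rows of the data
head(penguins)

# How many penguins were recorded for each species
table(penguins$species)
```

Some measurements are missing (recorded as NA) for a few penguins, so functions such as mean() may need the argument na.rm = TRUE when applied to this data.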
Over the course of the first four weeks, we will use this data to look at various data visualisation methods that can help us quickly identify the differences between these species.
1. “Antarctica 2013: Journey to the Crystal Desert” by Christopher.Michel is licensed under CC BY 2.0
2. “Gentoo Penguins” by D-Stanley is licensed under CC BY 2.0
3. “Chinstrap Penguins” by D-Stanley is licensed under CC BY 2.0
4. “Adelie Penguin (Pygoscelis adeliae)” by Gregory ‘Slobirdr’ Smith is licensed under CC BY-SA 2.0