Chapter 2 Data Sets
Throughout the semester, we will use various data sets to teach important data science techniques.
2.1 Penguins
(Computer Labs 1B+)
In many weeks, we will use the penguins data set from the palmerpenguins R package (Horst, Hill, and Gorman 2020).
This is an interesting data set on the characteristics of three species of penguin living on the Dream, Biscoe, and Torgersen islands in the Palmer archipelago, off the coast of Antarctica.
The three species of penguin are Gentoo Penguins,[2] Chinstrap Penguins,[3] and Adelie Penguins.[4]
The penguins data set contains measurements of several characteristics of these penguins; Table 2.1 below shows a sample of rows.
species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex | year |
---|---|---|---|---|---|---|---|
Adelie | Torgersen | 39.1 | 18.7 | 181 | 3750 | male | 2007 |
Adelie | Dream | 42.3 | 21.2 | 191 | 4150 | male | 2007 |
Gentoo | Biscoe | 50.5 | 15.9 | 225 | 5400 | male | 2008 |
Gentoo | Biscoe | 46.9 | 14.6 | 222 | 4875 | female | 2009 |
Chinstrap | Dream | 50.6 | 19.4 | 193 | 3800 | male | 2007 |
Chinstrap | Dream | 50.7 | 19.7 | 203 | 4050 | male | 2009 |
For each penguin, the data set records its species, the island on which it lives, its bill length, bill depth, and flipper length (all measured in millimetres), its body mass (measured in grams), its sex, and the year in which the measurements were recorded.
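These columns can be inspected directly in R. The following is a minimal sketch, assuming the palmerpenguins package is already installed; the particular inspection functions chosen here are illustrative rather than prescribed by the course. Note that `head()` shows the first six rows of the data, which are not the same rows as those sampled in Table 2.1.

```r
# Assumes the package is installed, e.g. via install.packages("palmerpenguins")
# Load the penguins data set explicitly from the palmerpenguins package
data("penguins", package = "palmerpenguins")

# Show the first six rows of the data set
head(penguins)

# List each column together with its type
str(penguins)

# Count how many penguins were recorded for each species
table(penguins$species)
```

Loading the data with an explicit `package` argument avoids any ambiguity if another attached package also provides an object called `penguins`.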
Over the first couple of weeks, we will look at various data visualisation methods that help us quickly identify the differences between these species using this data.
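As a rough illustration of the idea, the sketch below draws a scatterplot of two measurements, coloured by species. It assumes the ggplot2 and palmerpenguins packages are installed; the choice of ggplot2, of these two variables, and of the name `species_plot` are assumptions made for this example, not necessarily what the computer labs use.

```r
library(ggplot2)

# Load the penguins data set explicitly from the palmerpenguins package
data("penguins", package = "palmerpenguins")

# Scatterplot of bill length against flipper length, one colour per species.
# na.rm = TRUE silently drops penguins with missing measurements.
species_plot <- ggplot(
  penguins,
  aes(x = flipper_length_mm, y = bill_length_mm, colour = species)
) +
  geom_point(na.rm = TRUE) +
  labs(
    x = "Flipper length (mm)",
    y = "Bill length (mm)",
    colour = "Species"
  )

# Draw the plot on the active graphics device
print(species_plot)
```

Mapping species to colour lets any grouping in the measurements be judged by eye, without computing summary statistics first.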
References
[1] “Antarctica 2013: Journey to the Crystal Desert” by Christopher.Michel is licensed under CC BY 2.0
[2] “Gentoo Penguins” by D-Stanley is licensed under CC BY 2.0
[3] “Chinstrap Penguins” by D-Stanley is licensed under CC BY 2.0
[4] “Adelie Penguin (Pygoscelis adeliae)” by Gregory ‘Slobirdr’ Smith is licensed under CC BY-SA 2.0