Chapter 3 Stats used in eeb I
Many approaches and critical thinking heuristics in ecology & evolutionary biology (eeb) are relevant to other disciplines.
Learning outcomes
- Develop your data viz skills.
- Hone your statistical critical thinking by iteratively plotting and modeling a dataset.
- Do a regression analysis.
Critical thinking
Clean, simple graphics are powerful tools in statistics (and in scientific communication). Tufte (2006) and others have influenced how data scientists and statisticians develop plotting libraries, standards, and assumptions for the graphical representation of data. Data viz should highlight differences, show underlying data structures, and provide insights into the specific research project. R is highly customizable in all these respects. There are at least two major current paradigms (there are more, but these are the two dominant idea sets). Base R plots are simple, relatively flexible, and very easy. However, their grammar, i.e. their rules of coding, is not modern. ggplot2 and related libraries invoke a new, formal grammar of graphics (Leland 2005) that is more logical and more flexible but divergent from base R code. It is worth the time to understand the differences and know when to use each.
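To make the contrast between the two paradigms concrete, here is a minimal sketch that draws the same scatterplot in base R and in ggplot2. The `toy` data frame is simulated purely for illustration; its column names are borrowed from the dataset used later in this chapter, but the values are assumptions, not real observations.

```r
library(ggplot2)

# Toy data, simulated for illustration only (not the chapter's dataset)
set.seed(1)
toy <- data.frame(steps = round(runif(50, 5000, 20000)))
toy$mins_asleep <- 420 + 0.002 * toy$steps + rnorm(50, sd = 20)

# Base R: a single function call; arguments control the appearance
plot(mins_asleep ~ steps, data = toy,
     xlab = "Steps per day", ylab = "Minutes asleep")

# ggplot2: data and aesthetic mappings first, then a geometry layer
p <- ggplot(toy, aes(x = steps, y = mins_asleep)) +
  geom_point() +
  labs(x = "Steps per day", y = "Minutes asleep")
print(p)
```

The base R call is shorter for a quick look, whereas the ggplot2 version stores the plot as an object (`p`) that can be extended with further layers.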
Plotting in statistics, and in R in particular, evolved from base R to lattice and then to grammar-of-graphics libraries such as ggplot2 (Wickham 2016) and, later, ggvis. Base R is certainly useful in some contexts, as are the lattice and latticeExtra libraries. However, ggplot2 now covers most of these capacities with a much simpler grammar (i.e. rules and order). Nonetheless, you should be able to read base R code for plots and write some as well. The philosophy or grammar of modern graphics is well articulated and includes the following key principles (Wickham 2010).
- Plots are built in layers.
- Primacy of ideas: simple first, then more complex, i.e. you build up your plots.
- Data are mapped to aesthetic attributes and geometric objects.
- Data come first, then statistics, even in plots.
This directly supports statistical critical thinking because it promotes depth (literally), precision, and accuracy in the decisions you make to show your evidence.
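The layering idea can be sketched in a few lines of ggplot2. The example below builds one plot step by step: data and aesthetic mappings, then the raw data as points, then a statistical summary on top. The `toy` data frame is simulated and is only an assumption for illustration.

```r
library(ggplot2)

# Toy data, simulated for illustration only
set.seed(2)
toy <- data.frame(steps = round(runif(100, 5000, 20000)))
toy$mins_asleep <- 420 + 0.002 * toy$steps + rnorm(100, sd = 20)

# Step 0: data and aesthetic mappings only (draws empty axes)
p0 <- ggplot(toy, aes(x = steps, y = mins_asleep))

# Step 1: data first, shown as a layer of points
p1 <- p0 + geom_point(alpha = 0.5)

# Step 2: statistics second, a linear fit drawn over the points
p2 <- p1 + geom_smooth(method = "lm", formula = y ~ x)
print(p2)
```

Printing `p0`, `p1`, and `p2` in turn shows how each layer adds information without changing the layers beneath it.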
Adventure time
Here is a deeper set of quantified-life data. Explore whether movement predicts total sleep or sleep efficiency. Plot some patterns first, then do a regression.
Deeper dive: explore the residuals and try the cooks.distance function to check for influential outliers.
library(tidyverse)
life <- read_csv(url("https://ndownloader.figshare.com/files/28920729"))
life
## # A tibble: 4,561 × 7
##    simple_date  year steps mins_asleep efficiency lagged_sleep lagged_efficiency
##    <date>      <dbl> <dbl>       <dbl>      <dbl>        <dbl>             <dbl>
##  1 2011-01-25   2011 13900         481         96          504                99
##  2 2011-01-26   2011 19229         478         96          481                96
##  3 2011-01-27   2011 13103         474         96          478                96
##  4 2011-01-28   2011  7374         491         96          474                96
##  5 2011-01-29   2011 19132         436         96          491                96
##  6 2011-01-30   2011 17157         447         98          436                96
##  7 2011-01-31   2011 19759         456         99          447                98
##  8 2011-02-01   2011 18157         455         98          456                99
##  9 2011-02-02   2011  8768         465         97          455                98
## 10 2011-02-03   2011  9150         411         98          465                97
## # … with 4,551 more rows
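One possible way to approach the exercise is sketched below: plot first, fit a simple linear regression, then inspect residuals and Cook's distance. This is a starting point under stated assumptions, not a definitive analysis. If `life` has not been loaded, the code simulates a stand-in data frame with the same column names so that it runs offline; those simulated values are hypothetical. The 4/n cutoff for Cook's distance is a common rule of thumb, not a formal test.

```r
library(ggplot2)

# Stand-in data so the sketch runs offline; this block is skipped
# when `life` has already been loaded with read_csv().
if (!exists("life")) {
  set.seed(3)
  life <- data.frame(steps = round(runif(200, 3000, 20000)))
  life$mins_asleep <- 430 + 0.001 * life$steps + rnorm(200, sd = 25)
}

# 1. Plot the pattern first: raw points with a linear trend on top
p <- ggplot(life, aes(x = steps, y = mins_asleep)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", formula = y ~ x)
print(p)

# 2. Then fit the regression: do steps predict minutes asleep?
m1 <- lm(mins_asleep ~ steps, data = life)
summary(m1)

# 3. Residuals versus fitted values: look for curvature or funnels
plot(fitted(m1), resid(m1),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# 4. Cook's distance: flag observations with large influence
cooks <- cooks.distance(m1)
flagged <- which(cooks > 4 / length(cooks))  # rule-of-thumb cutoff
head(sort(cooks[flagged], decreasing = TRUE))
```

To explore sleep efficiency instead, swap `mins_asleep` for `efficiency` in the plot and in the model formula.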