Chapter 3 Stats used in eeb I

Many approaches and critical thinking heuristics in ecology & evolutionary biology (eeb) are relevant to other disciplines.

Learning outcomes

  1. Develop your data viz skills.
  2. Hone your statistical critical thinking by iteratively plotting and modeling a dataset.
  3. Do a regression analysis.

Critical thinking

Clean, simple graphics are powerful tools in statistics (and in scientific communication). Tufte (Tufte 2006) and others have shaped how data scientists and statisticians develop libraries, standards, and assumptions for graphical representations of data. Data viz must highlight differences, show underlying data structures, and provide insights into the specific research project. R is highly customizable in all these respects. There are at least two major current paradigms (there are more, but these are the two dominant idea sets). Base R plots are simple, relatively flexible, and very easy to produce. However, their grammar, i.e. their rules of coding, is not modern. ggplot2 and related libraries invoke a newer, formal grammar of graphics (Leland 2005) that is more logical and more flexible but diverges from base R code. It is worth the time to understand the differences and know when to use each.
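To make the contrast between the two paradigms concrete, here is a minimal sketch that draws the same scatterplot twice. The built-in iris data and the object name p_iris are stand-ins chosen for illustration and are not part of this chapter's dataset.

```r
library(ggplot2)

# Base R: a single function call, styled through its arguments
plot(Sepal.Width ~ Sepal.Length, data = iris,
     xlab = "Sepal length (cm)", ylab = "Sepal width (cm)",
     pch = 19, col = "grey40")

# ggplot2: map columns to aesthetics first, then add a geometric layer
p_iris <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(colour = "grey40") +
  labs(x = "Sepal length (cm)", y = "Sepal width (cm)")
p_iris
```

The base R call is shorter for a quick look, while the ggplot2 version stores the plot as an object that can be extended with further layers later.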

Plotting in R evolved from base R to lattice and then to the grammar-of-graphics family of libraries, of which ggplot2 (Wickham 2016) is now the most widely used; ggvis is a later, interactive relative. Base R is certainly useful in some contexts, as are the lattice and latticeExtra libraries. However, ggplot2 now encompasses these capacities with a much simpler grammar (i.e. rules and order). Nonetheless, you should be able to read base R code for plots and write some as well. The philosophy or grammar of modern graphics is well articulated (Wickham 2010) and includes the following key principles.

  1. Plots are built in layers.
  2. Ideas have primacy: simple first, then more complex, i.e. you build up your plots.
  3. Data are mapped to aesthetic attributes and geometric objects.
  4. Data come first, then statistics, even in plots.

This directly supports statistical critical thinking because it promotes depth (literally), precision, and accuracy in the decisions you make to show your evidence.
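The layering idea described above can be sketched as a plot that is built up one step at a time. This is an illustrative example only: it uses the built-in trees data (black cherry measurements) rather than this chapter's dataset, and the object names p_base, p_points, and p_model are hypothetical.

```r
library(ggplot2)

# Step 0: data mapped to aesthetics (nothing is drawn yet)
p_base <- ggplot(trees, aes(x = Girth, y = Volume))

# Step 1: the raw data shown as a geometric object (points)
p_points <- p_base + geom_point()

# Step 2: a statistical summary (linear fit) drawn on top of the data
p_model <- p_points + geom_smooth(method = "lm", formula = y ~ x, se = TRUE)

# Final touch: readable axis labels
p_model + labs(x = "Girth (in)", y = "Volume (cubic ft)")
```

Printing p_points before p_model is a simple way to look at the data first and the statistics second.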

Adventure time

Here is a deeper set of quantified-life data. Explore whether movement predicts total sleep or its efficiency. Plot out some patterns first, then do a regression.

Deeper dive: explore residuals and try the cooks.distance function for outliers.

library(tidyverse)
life <- read_csv(url("https://ndownloader.figshare.com/files/28920729"))
life
## # A tibble: 4,561 × 7
##    simple_date  year steps mins_asleep efficiency lagged_sleep lagged_efficiency
##    <date>      <dbl> <dbl>       <dbl>      <dbl>        <dbl>             <dbl>
##  1 2011-01-25   2011 13900         481         96          504                99
##  2 2011-01-26   2011 19229         478         96          481                96
##  3 2011-01-27   2011 13103         474         96          478                96
##  4 2011-01-28   2011  7374         491         96          474                96
##  5 2011-01-29   2011 19132         436         96          491                96
##  6 2011-01-30   2011 17157         447         98          436                96
##  7 2011-01-31   2011 19759         456         99          447                98
##  8 2011-02-01   2011 18157         455         98          456                99
##  9 2011-02-02   2011  8768         465         97          455                98
## 10 2011-02-03   2011  9150         411         98          465                97
## # … with 4,551 more rows
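One possible starting point for the exercise, offered as a sketch rather than a model answer: three small helper functions that plot first, then fit a regression, then flag influential points. The function names are hypothetical, the choice of mins_asleep as the response is only one of the two options mentioned above, and the 4/n cutoff for Cook's distance is a common rule of thumb assumed here, not a value taken from this chapter.

```r
library(ggplot2)

# Plot first: raw points plus a straight-line fit as a visual check
plot_steps_sleep <- function(data) {
  ggplot(data, aes(x = steps, y = mins_asleep)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "lm", formula = y ~ x) +
    labs(x = "Steps", y = "Minutes asleep")
}

# Then model: simple linear regression of sleep on movement
fit_steps_sleep <- function(data) {
  lm(mins_asleep ~ steps, data = data)
}

# Deeper dive: return Cook's distances above a cutoff
# (assumed rule of thumb: 4 / number of observations used in the fit);
# the names of the result are the row labels of the flagged observations
flag_influential <- function(model, cutoff = 4 / nobs(model)) {
  d <- cooks.distance(model)
  d[which(d > cutoff)]
}
```

With the tibble loaded above, plot_steps_sleep(life) draws the figure, fit <- fit_steps_sleep(life) followed by summary(fit) reports the regression, plot(fit, which = 1) shows residuals against fitted values, and flag_influential(fit) lists the days worth a closer look. Swapping mins_asleep for efficiency in the helpers addresses the second half of the question.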

Reflection questions

  1. When do you use regression versus correlation?
  2. How could you incorporate time into your plots or statistical models?
  3. Did the visualization highlight some of the criteria associated with critical thinking statistically more than others?