Chapter 2 Literate coding

Your code is a story too. Use your code and annotation of decisions (en)coded in your data manipulations, calculations, models, and plots to communicate clarity, logic, relevance, and depth. This story is not just for your collaborators - it is for you. Writing down your ideas and work down makes it more clear. It also reminds you later, even a week later, why you elected to make a particular decision in your workflow. Tidy data and tidy thinking make for better science.

Learning outcomes

  1. Practice writing code and using annotation.
  2. Consolidate your understanding of tidy data and critical thinking statistically.
  3. Do an ANOVA.

Critical thinking

Tidy data make your life easier. Data structures should match intuition and common sense. Data should have logical structure. Rows are are observations, columns are variables. Tidy data also increase the viability that others can use your data, do better science, reuse science, and help you and your ideas survive and thrive.

Literate coding (Knuth 1992) should capture a workflow that includes the wrangling you did to get your data ready. Literate code should be able to read by a human AND a machine. If data are already very clean in a spreadsheet, they can easily become a literate, logical dataframe. Nonetheless, you should still use annotation within the introductory code to explain the meta-data of your data to some extent and what you did pre-R to get it tidy. The philosophy here is very similar to the data viz lesson forthcoming that promotes critical thinking statistically through documented and described steps that are replicable and clear.

Adventure time

Many years ago in a galaxy far, far away, a student sowed seeds in the desert at different densities for their PhD research. Here are the data, and here is the publication too (Christopher J. Lortie and Turkington 2002). This student was not strong in the force, but it was a good adventure in beginning to understand the relative importance of significance biologically and statistically by exploring critical thinking. For your adventure, test whether a set of groups differ from one another. For instance, test whether transects, or years, or even the density of seeds planted differs in an outcome measure such as mean plant size.

Deeper dive: Check for homoscedasticity or do a post-hoc test.

density <- read_csv(url(""))
## # A tibble: 152 × 6
##     year transect seed_density_pe… final_plant_den… survivorship mean_plant_size
##    <dbl>    <dbl>            <dbl>            <dbl>        <dbl>           <dbl>
##  1  1998        1           0.0625               41        0.461           0.554
##  2  1998        1           0.0625               47        0.712           0.356
##  3  1998        1           0.0625               60        0.698           0.301
##  4  1998        1           0.25                 31        0.525           0.808
##  5  1998        1           0.25                 50        0.505           0.212
##  6  1998        1           0.25                 58        0.563           0.148
##  7  1998        1           1                    30        0.273           0.578
##  8  1998        1           1                    42        0.243           1.28 
##  9  1998        1           1                    73        0.619           0.719
## 10  1998        1           2                    46        0.263           0.652
## # … with 142 more rows

Reflection questions

  1. What is the difference between a t-test and an ANOVA?
  2. What is the difference between an ANOVA and GLM?
  3. What are some of the ways that these simple data can be further analyzed?
  4. When you explored annotation and describing your decisions and workflow for these data adventure, was it logical and clear to you if you ignored the R code?