BIOL 3272/5272 Applied Biostats
A book of (com)passion
I began writing this in early September 2020, and am thinking about you, my students for the fall 2021 term. Currently much is uncertain: we are beginning a term in the wake of George Floyd’s murder [among many others], which has resulted in protests, uprisings, rioting and armed confrontation between protestors, armed citizens and police. We are in the midst of an uncertain economy and a vicious presidential election. Most of us feel isolated as COVID has pushed most of us home, and some of us are sick and/or have had family members fall ill. Who knows if classes will be happening in person or not in 2021? Global climate change is threatening our future, etc. You get it: there’s just so much happening, and so much uncertainty about the future of this class, the University of Minnesota, the United States, and the world.
Mid-September update: Now much of the West Coast is ablaze.
Why do I bring this up? Well, I know you’re dealing with a lot. Every year students are dealing with a lot, from jobs, to supporting family, to the everyday of being in college and living life, and this year there’s even more. I too have a lot: a 14-month-old daughter, research and life pressures, teaching, etc. Yet we are all trying to make the most of life in this era. We want to teach, learn, and grow.
What’s more, I believe this content is more important now than it has ever been. Statistics is obsessed with the critical evaluation of claims in the face of data, and is therefore particularly useful in uncertain times. Given this focus, and given that you all have different energies, motivations and backgrounds, I am restructuring this course slightly from previous years. The biggest change is a continued de-emphasis on math and programming. That doesn’t mean I’m eliminating these features, but rather that I am streamlining the required math and programming to what I believe are the essentials. For those who want more mathematical and/or computational details (either because you want to push yourself or because you need this to make sense of things), I am including a bunch of optional content and support.
I LOVE TEACHING THIS COURSE – the content is very important to me. I also care deeply about you. I want to make sure you get all you can / all you need from this course, while recognizing the many challenges we are all facing. One tangible thing I leave you with is this book, which I hope you find useful as you go on in your life. Another thing I leave you with is my concern for your well-being and understanding – please contact me with any suggestions about the pace / content / you of this course and/or any life updates which may change how and when you can complete the work.
Course philosophy / goals
Hi! I’m a statistician. You might know me from my greatest hits including, “Have you tried plotting the data?”, “You’re not adequately powered to answer that question”, and “Correlation is not causation (except when it is 😉)” https://t.co/MpEHfqwHY8— Lucy D’Agostino McGowan (@LucyStats) January 19, 2019
My motivating goal for this course is to empower you to produce, present, and critically evaluate statistical evidence, especially as applied to biological topics. You should know that stats models are only models and that models are imperfect abstractions of reality. You should be able to think about how a biological question could be formulated as a statistical question, present graphs which show how data speak to this question, be aware of any shortcomings of that model, and explain how statistical analysis of a data set can be brought back into our biological discussion.
By the end of this course…
- Students should be statistical thinkers. Students will recognize that data are comprised of observations that partially reflect chance sampling, & that a major goal of statistics is to incorporate this idea of chance into our interpretation of observations. Thinking this way can be challenging because it is a fundamentally new way to think about the world. Once this is mastered, much of the material follows naturally. Until then, it’s more confusing.
- Students should think about probability quantitatively. That chance influences observations is CRITICAL to statistics (see above). Quantitatively translating these probabilities into distributions and associated statistical tests allows for mastery of the topic.
- Students should recognize how bias can influence our results. Not only are results influenced by chance, but factors outside of our focus can also drive results. Identifying subtle biases and non-independence is key to conducting and interpreting statistics.
- Students should become familiar with standard statistical tools / approaches and when to use them. What is the difference between Bayesian and frequentist thinking? How can data be visualized effectively? What is the difference between statistical and real-world significance? How do we responsibly present / interpret statistical results? We will grapple with & answer these questions over the term.
- Students should have familiarity with foundational statistical values and concepts. Students will gain an intuitive feel for the meaning of stats words like variance, standard error, p-value, t-statistic, and F-statistic, and will be able to read and interpret graphs and translate linear models into sentences.
- Students should be able to conduct the entire process of data analysis in R. Students will be able to utilize the statistical language, R, to summarize, analyze, and combine data to make appropriate visualizations and to conduct appropriate statistical tests.
R, RStudio, and the tidyverse
We will be using R in this course, in the RStudio environment. My goal is to have you empowered to make figures, run analyses, and be well positioned for future work in R, with as much fun and as little pain as possible. RStudio is an environment, and the tidyverse is a set of R packages that makes R’s powers more accessible without the need to learn a bunch of computer programming.
Some of you might have experience with R and some may not. Some of this experience might be in the tidyverse or not. There will be ups and downs: the frustration of not understanding and/or it not working, and the joy of small successes. Remember to be patient, forgiving and kind to yourself, your peers, and me. Ask for help from the internet, your friends, Hailey, and Yaniv.
You will need R version 4.0.0 or above, and tidyverse version 1.3.0 or above.
- You can download these onto your computer (make sure R is version 4.0.0 or above). Next, download/update RStudio from here. Finally, open RStudio and type library(tidyverse) to ensure this worked.
- Alternatively, you can simply join the course via RStudioCloud. This could be desirable if you don’t want to install R and RStudio on your own computer, or if you have trouble doing so.
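If you install things on your own computer, a quick way to confirm the setup is to compare your installed versions against the minimums listed above. The following is a minimal sketch, not an official course script: the variable names (min_r, min_tidyverse, r_ok, tidyverse_ok) are made up for illustration, and it only uses base R functions so it runs even if the tidyverse is not installed yet.

```r
# Minimum versions stated in the text above
min_r <- "4.0.0"
min_tidyverse <- "1.3.0"

# Is the running version of R new enough?
r_ok <- getRversion() >= min_r

# Is the tidyverse installed, and if so, is it new enough?
tidyverse_installed <- requireNamespace("tidyverse", quietly = TRUE)
tidyverse_ok <- tidyverse_installed &&
  packageVersion("tidyverse") >= min_tidyverse

if (!r_ok) {
  message("R is older than ", min_r, ": please update R.")
}
if (!tidyverse_installed) {
  message("tidyverse is not installed: try install.packages(\"tidyverse\").")
} else if (!tidyverse_ok) {
  message("tidyverse is older than ", min_tidyverse,
          ": try install.packages(\"tidyverse\") to update it.")
}
if (r_ok && tidyverse_ok) {
  message("All set: R and tidyverse meet the course minimums.")
}
```

If both checks pass, typing library(tidyverse) in the RStudio console should load the packages without errors.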
What is this ‘book’ and how will we use it?
This ‘book’ functions as an extensive syllabus and course notes. I will embed YouTube videos, app-based demonstrations and class exercises. As noted above, there are some things I will include that will not be necessary for everyone, and I will clearly mark these sections.
I hope that this book provides clear and useful background for the course, and I advise you to regularly go through each book ‘chapter’ for the relevant week. In general, the text and videos will be somewhat redundant, with the videos being more useful and having more explanation, and the text being available for a quick summary and useful definitions / equations / examples / links. So approach these as you see fit, but be sure you get familiar with the content BEFORE class.
Note that this ‘book’ is not the entirety of the course content, and is not an original piece of my own effort – in addition to borrowing from a few other courses online (with attribution), I also make heavy use of these texts:
- The Analysis of Biological Data Third Edition (Whitlock and Schluter 2020): This is the official book of this course, and is a standard biostats textbook, with many useful resources available online. The writing is great, as are the examples. Most of my material originates here (although I occasionally do things a bit differently). This book is officially optional, but students consistently tell me that it is extremely helpful. So, I highly recommend buying it. You can get the newest edition here, but any edition will be pretty useful.
- Calling Bullshit (Bergstrom and West 2020): This book is not technical, but points to the big picture concerns of statisticians. It is very practical and well written. I will occasionally assign readings from this book, and/or point you to videos on their website. All readings will be made available for you, but you might want to buy a physical copy.
- Fundamentals of Data Visualization (Wilke 2019): This book is free online, and is very helpful for thinking about graphing data. In my view, graphing is among the most important skills in statistical reasoning, so I reference it regularly.
- R for Data Science (Wickham and Grolemund 2017): This book is free online, and is very helpful for doing the sorts of things we do in R regularly. This is a great resource.
- I will introduce other resources as we go.
How will this term work / look?
Nobody knows! But I have a vision.
I aim to have a majority of the course content available before the term begins. As such, you can plan around your likely conflicts.
Prep for ‘class’. This class is flipped with asynchronous content delivery and synchronous meetings.
Be sure to look over the assigned readings and/or videos, and complete the short low-stakes homework BEFORE each class.
During class time, I will address questions, make announcements, and get you started on in-class work. Hailey & I will bounce around your breakout rooms to provide help and check in. If you cannot make the class, you could do this on your own time without help, but we do not recommend this as a class strategy.
The help of your classmates and the environment they create is one of the best parts of this class. Help each other.
In addition to low-stakes work before and in class, there will be a few more intense assignments, some collaborative projects and a summative project as the term ends. There will be no ‘high-stakes’ in-class timed tests.
## ─ Session info ──────────────────────────────────────────
##  setting  value
##  version  R version 4.0.2 (2020-06-22)
##  os       macOS High Sierra 10.13.6
##  system   x86_64, darwin17.0
##  ui       RStudio
##  language (EN)
##  collate  en_US.UTF-8
##  ctype    en_US.UTF-8
##  tz       America/Chicago
##  date     2020-09-30
##
## ─ Packages ──────────────────────────────────────────────
##  package     * version    date       lib source
##  assertthat    0.2.1      2019-03-21     CRAN (R 4.0.0)
##  backports     1.1.9      2020-08-24     CRAN (R 4.0.2)
##  blogdown    * 0.20       2020-06-23     CRAN (R 4.0.2)
##  bookdown      0.20       2020-06-23     CRAN (R 4.0.2)
##  callr         3.4.4      2020-09-07     CRAN (R 4.0.2)
##  cli           2.0.2      2020-02-28     CRAN (R 4.0.0)
##  crayon        1.3.4      2017-09-16     CRAN (R 4.0.0)
##  curl          4.3        2019-12-02     CRAN (R 4.0.0)
##  desc          1.2.0      2018-05-01     CRAN (R 4.0.0)
##  devtools      2.3.1      2020-07-21     CRAN (R 4.0.2)
##  digest        0.6.25     2020-02-23     CRAN (R 4.0.0)
##  ellipsis      0.3.1      2020-05-15     CRAN (R 4.0.0)
##  emo         * 0.0.0.9000 2020-09-18     Github (hadley/emo@3f03b11)
##  evaluate      0.14       2019-05-28     CRAN (R 4.0.0)
##  fansi         0.4.1      2020-01-08     CRAN (R 4.0.0)
##  fs            1.5.0      2020-07-31     CRAN (R 4.0.2)
##  generics      0.0.2      2018-11-29     CRAN (R 4.0.0)
##  glue          1.4.2      2020-08-27     CRAN (R 4.0.2)
##  highr         0.8        2019-03-20     CRAN (R 4.0.0)
##  htmltools     0.5.0      2020-06-16     CRAN (R 4.0.2)
##  httr          1.4.2      2020-07-20     CRAN (R 4.0.2)
##  jsonlite      1.7.1      2020-09-07     CRAN (R 4.0.2)
##  knitr       * 1.29       2020-06-23     CRAN (R 4.0.2)
##  lubridate     1.7.9      2020-06-08     CRAN (R 4.0.2)
##  magrittr      1.5        2014-11-22     CRAN (R 4.0.0)
##  memoise       1.1.0      2017-04-21     CRAN (R 4.0.2)
##  pkgbuild      1.1.0      2020-07-13     CRAN (R 4.0.2)
##  pkgload       1.1.0      2020-05-29     CRAN (R 4.0.0)
##  prettyunits   1.1.1      2020-01-24     CRAN (R 4.0.0)
##  processx      3.4.4      2020-09-03     CRAN (R 4.0.2)
##  ps            1.3.4      2020-08-11     CRAN (R 4.0.2)
##  purrr         0.3.4      2020-04-17     CRAN (R 4.0.0)
##  R6            2.4.1      2019-11-12     CRAN (R 4.0.0)
##  Rcpp          1.0.5      2020-07-06     CRAN (R 4.0.2)
##  remotes       2.2.0      2020-07-21     CRAN (R 4.0.2)
##  rlang         0.4.7      2020-07-09     CRAN (R 4.0.2)
##  rmarkdown     2.3        2020-06-18     CRAN (R 4.0.2)
##  rprojroot     1.3-2      2018-01-03     CRAN (R 4.0.0)
##  rsconnect   * 0.8.16     2019-12-13     CRAN (R 4.0.2)
##  rstudioapi    0.11       2020-02-07     CRAN (R 4.0.0)
##  sessioninfo   1.1.1      2018-11-05     CRAN (R 4.0.2)
##  stringi       1.5.3      2020-09-09     CRAN (R 4.0.2)
##  stringr       1.4.0      2019-02-10     CRAN (R 4.0.0)
##  testthat      2.3.2      2020-03-02     CRAN (R 4.0.0)
##  tweetrmd    * 0.0.8      2020-09-13     Github (gadenbuie/tweetrmd@50bcdf8)
##  usethis       1.6.1      2020-04-29     CRAN (R 4.0.2)
##  withr         2.2.0      2020-04-20     CRAN (R 4.0.0)
##  xfun          0.17       2020-09-09     CRAN (R 4.0.2)
##  yaml          2.2.1      2020-02-01     CRAN (R 4.0.0)
##
##  /Library/Frameworks/R.framework/Versions/4.0/Resources/library
Bergstrom, C. T., and J. D. West. 2020. Calling Bullshit: The Art of Skepticism in a Data-Driven World. Random House Publishing Group. https://www.callingbullshit.org.
Whitlock, M. C., and D. Schluter. 2020. The Analysis of Biological Data. Macmillan Learning. https://whitlockschluter3e.zoology.ubc.ca/.
Wickham, H., and G. Grolemund. 2017. R for Data Science. O’Reilly Media. https://r4ds.had.co.nz/.
Wilke, C. O. 2019. Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures. O’Reilly Media. https://clauswilke.com/dataviz/.