Hi! I’m a statistician. You might know me from my greatest hits including, “Have you tried plotting the data?”, “You’re not adequately powered to answer that question”, and “Correlation is not causation (except when it is 😉)” https://t.co/MpEHfqwHY8
— Lucy D’Agostino McGowan (@LucyStats) January 19, 2019
My motivating goal for this course is to empower you to produce, present, and critically evaluate statistical evidence, especially as applied to biological topics. You should know that statistical models are only models, and that models are imperfect abstractions of reality. You should be able to formulate a biological question as a statistical question, present graphs that show how the data speak to this question, recognize the shortcomings of the model you used, and bring the statistical analysis of a data set back into our biological discussion.
By the end of this course…
Students should be statistical thinkers. Students will recognize that data are composed of observations that partially reflect chance sampling, and that a major goal of statistics is to incorporate this idea of chance into our interpretation of observations. Thinking this way can be challenging because it is a fundamentally new way to think about the world. Once this is mastered, much of the material follows naturally. Until then, the material can feel confusing.
Students should think about probability quantitatively. That chance influences observations is CRITICAL to statistics (see above). Quantitatively translating these probabilities into distributions and associated statistical tests allows for mastery of the topic.
Students should recognize how bias can influence our results. Not only are results influenced by chance, but factors outside of our focus can also drive results. Identifying subtle biases and non-independence is key to conducting and interpreting statistics.
Students should become familiar with standard statistical tools and approaches, and know when to use them. What is the difference between Bayesian and frequentist thinking? How can data be visualized effectively? What is the difference between statistical and real-world significance? How do we responsibly present and interpret statistical results? We will grapple with and answer these questions over the term.
Students should have familiarity with foundational statistical values and concepts. Students will gain an intuitive feel for the meaning of stats words like variance, standard error, p-value, t-statistic, and F-statistic, and will be able to read and interpret graphs and translate linear models into sentences.
Students should be able to conduct the entire process of data analysis in R. Students will be able to use the statistical language R to summarize, analyze, and combine data, make appropriate visualizations, and conduct appropriate statistical tests.
R, RStudio, and the tidyverse
We will be using R in this course, in the RStudio environment. My goal is to have you empowered to make figures, run analyses, and be well positioned for future work in R, with as much fun and as little pain as possible. RStudio is an environment and the tidyverse is a set of R packages that makes R’s powers more accessible without the need to learn a bunch of computer programming.
Some of you might have experience with R and some may not, and that experience may or may not include the tidyverse. There will be ups and downs: the frustration of not understanding or of code not working, and the joy of small successes. Remember to be patient, forgiving, and kind to yourself, your peers, and me. Ask for help from the internet, your friends, Brooke, and Yaniv.
We will be using R version 4.2.1 or above, and tidyverse version 1.3.2 or above.
You can download these onto your computer (make sure your R is version 4.2.1 or above).
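If you are not sure which versions you have, you can ask R directly. The snippet below is a minimal sketch rather than part of the official course setup. It relies on base R's `getRversion()` and `packageVersion()`, and the variable names (`min_r`, `min_tidyverse`, `r_ok`, `tidyverse_ok`) are made up for illustration.

```r
# Minimum versions named in the text above
min_r <- "4.2.1"
min_tidyverse <- "1.3.2"

# getRversion() returns the running R version as a comparable version object
r_ok <- getRversion() >= min_r

# system.file() returns "" when a package is not installed,
# so packageVersion() is only called when the tidyverse is present
tidyverse_installed <- nzchar(system.file(package = "tidyverse"))
tidyverse_ok <- tidyverse_installed &&
  packageVersion("tidyverse") >= min_tidyverse

message("R ", as.character(getRversion()),
        if (r_ok) ": OK" else ": please update R")
message("tidyverse",
        if (tidyverse_ok) ": OK" else ": please install or update it")

# To install or update the tidyverse, uncomment and run the next line:
# install.packages("tidyverse")
```

If either message asks you to update, install the newer version and run the check again before the first class.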