Chapter 19: Probability and NHST review

Let us consolidate our understanding.

This is an opportunity to review our understanding of the material to date. I am providing you with five options for how to do this. Each one is fun! Pick one that is fun to you, making sure that no option is picked more than twice in your group. Sign up on this spreadsheet.

Be sure to complete the associated homework for your option.

Option 1: Resampling-based methods for biologists

For your first option, you can read “Resampling-based methods for biologists” (Fieberg, Vitense, and Johnson 2020), a fun paper by my colleague discussing the scientific and pedagogical reasons for teaching statistics through the concepts of permutation and bootstrapping. Also check out the NotebookLM-generated podcast, embedded below.

Figure 2: The accompanying quiz (for option 1) link
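If you want to see the two resampling ideas in action before (or while) you read, below is a minimal sketch in R using simulated data. The group names, sample sizes, and effect sizes are arbitrary choices for illustration, not values taken from the paper.

```r
# A minimal sketch of the two resampling ideas the paper discusses, using
# simulated data. Group names, sample sizes, and effect sizes are arbitrary.
set.seed(42)
group_a <- rnorm(20, mean = 10, sd = 2)   # simulated measurements, group A
group_b <- rnorm(20, mean = 11, sd = 2)   # simulated measurements, group B
observed_diff <- mean(group_b) - mean(group_a)

# Permutation test: shuffle the group labels to build the null distribution
# of the difference in means, then see how extreme the observed difference is.
pooled <- c(group_a, group_b)
perm_diffs <- replicate(10000, {
  shuffled <- sample(pooled)
  mean(shuffled[21:40]) - mean(shuffled[1:20])
})
p_value <- mean(abs(perm_diffs) >= abs(observed_diff))

# Bootstrap: resample within each group with replacement to get a
# percentile confidence interval for the difference in means.
boot_diffs <- replicate(10000, {
  mean(sample(group_b, replace = TRUE)) - mean(sample(group_a, replace = TRUE))
})
quantile(boot_diffs, c(0.025, 0.975))

c(observed_diff = observed_diff, permutation_p = p_value)
```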

Option 2: Consider The Science of Doubt.

Watch the video below. As you do, reflect on the types of errors commonly associated with scientific research. You should be able to think critically about these and ask insightful questions regarding them.

Additionally, be prepared to discuss:

Figure 3: Watch this hour-long video on The Science of Doubt by Michael Whitlock.

A brief word on publication bias: Scientists are overworked and have too much to do. They get more rewards for publishing statistically significant results, so those usually sit higher on the to-do list. The result is the file drawer effect, in which non-significant results are less likely to be submitted for publication. Watch this video from Calling Bullshit for more on publication bias.
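To make the file drawer effect concrete, here is a rough simulation in R (my own illustration, not something from the video). Every simulated lab studies a true null effect, and all of the numbers are arbitrary.

```r
# A rough simulation of the file drawer effect. Every "lab" below studies a
# true null effect; if only the significant results get written up, the
# published record is badly misleading.
set.seed(1)
n_studies <- 1000
results <- replicate(n_studies, {
  x <- rnorm(30)                  # treatment group, true effect is zero
  y <- rnorm(30)                  # control group
  test <- t.test(x, y)
  c(p = test$p.value, diff = mean(x) - mean(y))
})
published <- results["p", ] < 0.05           # pretend only these are submitted
mean(published)                               # roughly 5% false positives...
mean(abs(results["diff", published]))         # ...with inflated apparent effects
mean(abs(results["diff", ]))                  # compared with all studies run
```

Even though nothing real is going on in this toy world, the “published” record contains only significant findings, and the effect sizes it reports are inflated relative to the full set of studies that were actually run.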

Figure 4: The accompanying quiz (for option 2) link

Option 3: The so-called “reproducibility crisis”

Consider these two papers.

Read these two papers on the reproducibility crisis. [Link to Estimating the Reproducibility of Psychological Science](https://scholar.google.com/scholar_url?url=https://www.science.org/doi/pdf/10.1126/science.aac4716%3Fcasa_token%3Do4U0Bsj6hQIAAAAA:fIZehG3uKxYm94VYRt2UEznTqFU8okUjIhW6wvtXbtT4HJM51ufz4bDIY5zlq03o4UuGwIJEAtrxjQ&hl=en&sa=T&oi=ucasa&ct=ucasa&ei=nT0PZ_CHKse16rQP4OTT0Qs&scisig=AFWwaearWdlmN_gw5c6BVykKnH36). [Link to What Should Researchers Expect When They Replicate Studies?](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4968573/).

Figure 5: Read these two papers on the reproducibility crisis. Link to Estimating Reproducibility. Link to What should researchers expect.
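As a warm-up for the second paper, here is a back-of-the-envelope simulation in R of how often an exact replication of a real effect comes out statistically significant. The effect size and per-group sample size are arbitrary, illustrative choices.

```r
# A back-of-the-envelope look at what to expect from exact replications:
# even when an effect is real, a replication "fails" (p >= 0.05) whenever
# the study happens to be underpowered.
set.seed(7)
one_replication <- function(n = 30, effect = 0.4) {
  x <- rnorm(n, mean = effect)    # treatment group with a true effect
  y <- rnorm(n, mean = 0)         # control group
  t.test(x, y)$p.value < 0.05     # did the replication "succeed"?
}
mean(replicate(5000, one_replication()))           # share of successful replications
power.t.test(n = 30, delta = 0.4, sd = 1)$power    # analytic power for comparison
```

Try a few different values of n and effect to see how strongly the apparent “replication rate” depends on power, even when the effect being studied is perfectly real.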

If you’re having too much fun and want more, there is a great related set of videos from Calling Bullshit: Science is amazing, but…, Reproducibility, A Replication Crisis, Publication Bias, and Science is not Bullshit.

Figure 6: The accompanying quiz (for option 3) link

Option 4: Work with a Personalized Tutor

Use your favorite large language model (LLM) to help deepen your understanding of the concepts we’ve covered so far. Here’s how you can approach this:

Once you’ve provided the LLM with the necessary context, interact with it in a way that best supports your learning. You can:

Please spend at least one hour in conversation with the LLM, focusing on areas where you feel you need the most support. Note that this will work best if you actively engage with the LLM: ask follow-up questions, persist until you really get it, ask for another explanation, and so on. Also, this is an experiment, so see what works for you!

After your session, complete the accompanying quiz (linked below).

Figure 7: The accompanying quiz (for option 4) link

Option 5: Develop exam questions.

Write about ten high-quality exam questions that assess different aspects of the course material without focusing on R programming knowledge. For each question, provide the following:

  1. Rationale: Explain why you chose this question. What was your thought process in crafting it, and how does it tie into the course’s key concepts? Focus on testing students’ ability to think critically about the material, analyze concepts, and apply knowledge, rather than just rote memorization or incantation.

  2. Concept Evaluated: Identify the specific concept or idea the question aims to test. What deeper understanding or skill does it draw upon from the course? Make sure your questions cover a wide range of topics or concepts discussed throughout the term.

  3. Difficulty: Indicate how challenging you expect the question to be. Is it something that tests basic comprehension, or does it require a more advanced or analytical approach? Include a mix of easy, medium, and hard questions to differentiate students with varying levels of understanding.

  4. Importance: Evaluate how critical this question is for assessing mastery of the course content. Does it test a foundational concept or a more nuanced understanding?

  5. Example Answers: When providing examples of “Good,” “Ok,” and “Bad” answers, ensure there is a clear distinction in the quality of the responses, showing exactly what makes one answer better than the others. Also, the bad answers should not be cartoons; imagine a bad answer that you think about 10% of the students in this class would actually write.

    • Good Answer: Provide an example of a strong response that reflects mastery of the material.
    • Ok Answer: Give an example of a mediocre or incomplete answer that demonstrates partial understanding.
    • Bad Answer: Illustrate what a weak or incorrect answer might look like, perhaps from someone who hasn’t adequately prepared.

When you’re done, reflect on this exercise and on whether and how it helped you better understand the material.

Figure 8: The accompanying quiz (for option 5) link

References

Fieberg, John R., Kelsey Vitense, and Douglas H. Johnson. 2020. “Resampling-Based Methods for Biologists.” PeerJ 8: e9089. https://doi.org/10.7717/peerj.9089.
Open Science Collaboration. 2015. “Estimating the Reproducibility of Psychological Science.” Science 349 (6251): aac4716. https://doi.org/10.1126/science.aac4716.
Patil, Prasad, Roger D. Peng, and Jeffrey T. Leek. 2016. “What Should Researchers Expect When They Replicate Studies? A Statistical View of Replicability in Psychological Science.” Perspectives on Psychological Science 11 (4): 539–44. https://doi.org/10.1177/1745691616646366.