Preface
This text was written for use in a second biostatistics course for Master of Public Health students; however, students in any field will find it useful. Students in many disciplines take an introductory statistics course, which provides foundational competencies but perhaps not enough to use more advanced methods without additional training. There is a plethora of textbooks covering topics such as linear regression, logistic regression, and survival analysis that are aimed at those with a background in mathematical statistics, that lack a specific focus on public health, and/or that do not focus on using R statistical software. The goal of this text is to provide a gentle introduction to regression methods, using R, that covers all the basics and a bit more, with examples drawn from public health data.
The text began in 2016 as course notes and evolved over time into what you see here. My hope is that what you learn from this book will give you the knowledge and skills to understand and carry out appropriate basic regression analyses, as well as the foundation and confidence to go deeper. When you are ready to do so, there are excellent texts that treat each of the methods covered herein, as well as R programming, in much greater detail (e.g., Faraway 2016; Fox 2015; Fox and Weisberg 2019; Harrell 2015; Klein and Moeschberger 2010; Kleinman and Horton 2014; Lohr 2021; Lumley 2010; van Buuren 2018; Weisberg 2014; H. Wickham, Çetinkaya-Rundel, and Grolemund 2023; Hadley Wickham 2019). Additionally, improvements over standard regression methods are available using hierarchical (multilevel, random coefficient) and related shrinkage estimation procedures such as parametric empirical-Bayes/semi-Bayes and penalized-likelihood methods (Efron 2013, 2023; Greenland 2000; Harrell 2015).
NOTE:
In the online version, references appear at the bottom of each page. However, some appear with no author. This happens not because the author is the same as in the preceding entry of the Reference list on this page, but because the author is the same as in the preceding entry of the Reference list at the end of the book.
For example, in the Reference list on this page, "Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models" is by Julian J. Faraway, "When Should Epidemiologic Regressions Use Random Coefficients?" is by Sander Greenland, and "Applied Linear Regression" is by Sanford Weisberg. This issue does not occur in the print version, where the references appear only at the end. If anyone knows how to fix this issue for the HTML version, please let me know!