Introduction to Regression Methods for Public Health Using R
Published with bookdown
Chapter 2 Overview of Regression Methods
In this chapter, you will learn:
A general definition of regression;
Reasons to use regression; and
How to distinguish between commonly used regression methods.