Chapter 2 Introduction

This book tries to avoid math and formulas, but they are needed in some places. Where they appear, and in keeping with the goal of being a bridge to more advanced texts, they are consistent with what you would see in those texts.

Treating regression like a black box (throw all the variables in and see what happens) can lead to many mistakes. How is each variable coded? Are there interactions? Understanding the meaning of the terms in the model is essential to properly fitting and interpreting a regression model.
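As a small illustration of why variable coding matters, the sketch below fits the same outcome against a three-level grouping variable twice: once stored as a number and once as a factor. The data are simulated for this example only (they are not one of the book's datasets), and the names d, group, and y are made up.

```r
# Simulated data for illustration only (not from the book's datasets)
set.seed(1)
d <- data.frame(
  group = rep(c(1, 2, 3), each = 20),  # three groups stored as numbers
  y     = rnorm(60)
)

# Numeric coding: a single slope, which assumes a linear trend across 1, 2, 3
fit_numeric <- lm(y ~ group, data = d)

# Factor coding: separate indicator terms comparing levels 2 and 3 to level 1
fit_factor <- lm(y ~ factor(group), data = d)

# The two models have different terms, so their coefficients mean different things
names(coef(fit_numeric))
names(coef(fit_factor))
```

The first model has an intercept and one slope; the second has an intercept and two indicator terms. Neither is wrong in general, but they answer different questions, which is why it helps to check how each variable is stored before fitting.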

2.1 R and R Studio

  • Refer to the appendix
  • There are many resources for learning R programming
  • Using ? to get help
  • Sometimes I use base R, sometimes tidyverse, and sometimes both
  • Libraries, Using ::
  • Brief intro to using %>%
  • Using an R project
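The items above can be sketched in a few lines of R. This is a minimal example, assuming the magrittr package is installed (it supplies %>% and is also attached when you load the tidyverse); the numbers are arbitrary.

```r
# Open the help page for a function (same as help("mean")).
# Wrapped in interactive() so the script also runs non-interactively.
if (interactive()) {
  ?mean
}

# Call a function from a package without attaching it, using ::
med <- stats::median(c(1, 5, 9))

# Attach a package so its functions can be called by name
library(magrittr)  # provides %>%

# The pipe passes the left-hand side as the first argument of the
# right-hand side, so this is the same as mean(c(1, 5, 9))
m <- c(1, 5, 9) %>% mean()
```

Using :: makes it explicit which package a function comes from, which can help when two attached packages export functions with the same name.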

2.2 Datasets

In all the code used in the book, data are loaded from a folder called “Data” located in the same folder as my R project. If you download the data from [INSERT SITE], create an R project, and place the data in a folder called “Data” inside the project folder, you should be able to run all the code as-is.
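With that layout, file paths can be written relative to the project folder. The sketch below assumes the R project is open (so the working directory is the project folder); the file name example.csv is hypothetical and stands in for whichever data file you are loading.

```r
# Build a path relative to the project folder.
# "example.csv" is a hypothetical file name used only for illustration.
data_path <- file.path("Data", "example.csv")

if (file.exists(data_path)) {
  mydat <- read.csv(data_path)
  str(mydat)
} else {
  # A common cause is that the working directory is not the project folder
  message("File not found: ", data_path,
          " (working directory: ", getwd(), ")")
}
```

Checking file.exists() first is optional, but the message makes it easier to spot a working-directory problem than the default read error does.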

List them here, but put full descriptions in the Appendix.

  • NHANES
    • 20% subset used in some cases
    • Explain how smoker and income were derived (put in Appendix)

Include code for downloading data and loading into R (some of them are SAS XPT files).
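One possible shape for that code is sketched below. It assumes the haven package is installed and uses its read_xpt() function to read SAS transport (XPT) files. The helper name load_xpt and the URL in the usage comment are hypothetical placeholders, not the book's actual code or download location.

```r
# Hypothetical helper: download a file into the Data folder (unless it is
# already there) and read it as a SAS transport (XPT) file.
load_xpt <- function(url, dest_dir = "Data") {
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir)
  }
  dest <- file.path(dest_dir, basename(url))
  if (!file.exists(dest)) {
    # mode = "wb" keeps binary files intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  haven::read_xpt(dest)
}

# Usage (the URL is a placeholder, not a real address):
# demo <- load_xpt("https://example.org/path/to/FILE.XPT")
```

Skipping the download when the file already exists means the code can be rerun without fetching the data again.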