Preface
Stylistic Conventions
R code that is evaluated
R code in a script file (not evaluated)
I Learning R
1
Getting Started With R
1.1
What is R?
1.2
What is RStudio?
1.3
Who should learn R?
1.4
Why should I learn R?
1.5
Where can I get R?
1.6
How do I use R?
1.6.1
Using R on our computer
1.6.2
Does R have epidemiology programs?
1.6.3
How should I use these notes?
1.7
Just do it!
1.7.1
Using R as your calculator
1.7.2
Useful R concepts
1.7.3
Useful R functions
1.7.4
How do I get help?
1.7.5
Is there anything else that I need?
1.7.6
What’s ahead?
1.8
Exercises
1.8.1
1.8.2
1.8.3
1.8.4
1.8.5
1.8.6
1.8.7
1.8.8
1.8.9
1.8.10
1.8.11
1.8.12
2
Working with vectors, matrices, and arrays
2.1
Data objects in R
2.1.1
Atomic vs. recursive data objects
2.1.2
Assessing the structure of data objects
2.2
A vector is a collection of like elements
2.2.1
Understanding vectors
2.2.2
Creating vectors
2.2.3
Naming vectors
2.2.4
Indexing vectors
2.2.5
Replacing vector elements (by indexing and assignment)
2.2.6
Operating on vectors
2.3
A matrix is a 2-dimensional table of like elements
2.3.1
Understanding matrices
2.3.2
Creating matrices
2.3.3
Naming matrix components
2.3.4
Indexing a matrix
2.3.5
Replacing matrix elements
2.3.6
Operating on a matrix
2.4
An array is an n-dimensional table of like elements
2.4.1
Understanding arrays
2.4.2
Creating arrays
2.4.3
Naming arrays
2.4.4
Indexing arrays
2.4.5
Replacing array elements
2.4.6
Operating on an array
2.5
Exercises
2.5.1
2.5.2
2.5.3
2.5.4
2.5.5
2.5.6
3
Working with lists and data frames
3.1
A list is a collection of like or unlike data objects
3.1.1
Understanding lists
3.1.2
Creating lists
3.1.3
Naming lists
3.1.4
Indexing lists
3.1.5
Replacing list components
3.1.6
Operating on lists
3.2
A data frame is a list in a 2-dimensional tabular form
3.2.1
Understanding data frames and factors
3.2.2
Creating data frames
3.2.3
Naming data frames
3.2.4
Indexing data frames
3.2.5
Replacing data frame components
3.2.6
Operating on data frames
3.3
Managing data objects
3.4
Managing our workspace
3.5
Exercises
3.5.1
3.5.2
3.5.3
3.5.4
3.5.5
4
Managing epidemiologic data in R
4.1
Entering and importing data
4.1.1
Entering data
4.1.2
Importing data from a file
4.1.3
Importing data using a URL
4.2
Editing data
4.2.1
Text editor
4.2.2
The data.entry, edit, or fix functions
4.2.3
Vectorized approach
4.2.4
Text processing
4.3
Sorting data
4.4
Indexing (subsetting) data
4.4.1
Indexing
4.4.2
Using the subset function
4.5
Transforming data
4.5.1
Numerical transformation
4.5.2
Recoding vector values
4.5.3
Creating categorical variables (factors)
4.5.4
Recoding factor levels (categorical variables)
4.5.5
Use factors instead of dummy variables
4.5.6
Conditionally transforming the elements of a vector
4.6
Merging data
4.7
Executing commands from, and directing output to, a file
4.7.1
The source function
4.7.2
The sink and capture.output functions
4.8
Working with missing and “not available” values
4.8.1
Testing, indexing, replacing, and recoding
4.8.2
Importing missing values with the read.table function
4.8.3
Working with NA values in data frames and factors
4.8.4
Viewing number of missing values in tables
4.8.5
Setting default NA behaviors in statistical models
4.8.6
Working with finite, infinite, and NaN numbers
4.9
Working with dates and times
4.9.1
Date functions in the base package
4.9.2
Date functions in the chron and survival packages
4.10
Exporting data objects
4.10.1
Exporting to a generic ASCII text file
4.10.2
Exporting to R ASCII text file
4.10.3
Exporting to R binary file
4.10.4
Exporting to non-R ASCII text and binary files
4.11
Working with regular expressions
4.11.1
Single characters
4.11.2
Character class
4.11.3
Concatenation
4.11.4
Repetition
4.11.5
Alternation
4.11.6
Repetition > Concatenation > Alternation
4.11.7
Metacharacters
4.11.8
4.11.9
5
Programming and R functions
5.1
Basic programming
5.2
Intermediate programming
5.2.1
Control statements
5.2.2
Vectorized approach
5.2.3
Looping
5.3
Writing R functions
5.3.1
Arguments with default values
5.3.2
Passing optional arguments using the ... function
5.4
Lexical scoping
5.4.1
Functions with a mutable state
5.5
Recursive functions
5.6
Debugging and profiling R code
5.7
Example: Bootstrap of risk ratio data
5.8
Exercises
5.8.1
5.8.2
5.8.3
5.8.4
5.8.5
5.8.6
5.8.7
6
Graphical display of data
II Data Science
7
Population Health Approach
7.1
Introduction
7.2
Epidemiologic approach
7.3
Epidemiologic analyses for 2-by-2 tables
7.3.1
Cohort studies with risk data or prevalence data
7.4
Epidemiologic analyses for stratified 2-by-2 tables
7.4.1
Cohort studies with binomial (risk or prevalence) data
8
Epidemiologic display of data
9
Descriptive analysis
9.1
Period data
9.2
Cohort data
9.3
Spatial data
10
Predictive analysis
10.1
topic 1
10.2
topic 2
10.3
topic 3
11
Explanatory analysis
11.1
Causal graphical models
11.2
Estimating intervention effects
11.3
Counterfactual models
12
Prescriptive analysis
12.1
Linear and integer optimization
12.2
Decision analysis with Bayesian networks
12.3
Cost-effectiveness analysis
13
Simulation analysis
13.1
Markov modeling of lifecourse events
13.2
Epidemic modeling with social networks
III Appendixes
14
Mathematics review
14.1
Fundamentals
14.2
Matrix algebra
14.3
Calculus
15
Probability review
15.1
Axioms and theorems
15.2
Probability distributions
15.3
Independence and conditional probability
15.4
Bayes’ theorem
16
Available data sets
Latina Mothers and their Newborn
Oswego County (outbreak)
Western Collaborative Group Study (cohort)
Evans County (cohort)
Myocardial infarction case-control study
AIDS surveillance cases
Hepatitis B surveillance cases
Measles surveillance cases
West Nile virus surveillance cases, California 2004
University Group Diabetes Program
Novel influenza A (H1N1) pandemic
United States reported cases and deaths as of July 23, 2009
Bibliography
Population Health Data Science with R
Chapter 15
Probability review
15.1
Axioms and theorems
15.2
Probability distributions
15.3
Independence and conditional probability
15.4
Bayes’ theorem