A Book Example
Preface
About the Authors
Stylistic Conventions
R code that is evaluated
R code in a script file (not evaluated)
I Learning R
1
Getting Started With R
1.1
What is R?
1.2
What is RStudio?
1.3
Who should learn R?
1.4
Why should I learn R?
1.5
Where can I get R?
1.6
How do I use R?
1.6.1
Using R on your computer
1.6.2
Does R have epidemiology programs?
1.6.3
How should I use these notes?
1.7
Just do it!
1.7.1
Using R as your calculator
1.7.2
Useful R concepts
1.7.3
Useful R functions
1.7.4
How do I get help?
1.7.5
Is there anything else that I need?
1.7.6
What’s ahead?
1.8
What are graphical models?
1.9
Precision and number types?
1.10
Exercises
2
Working with vectors, matrices, and arrays
2.1
Data objects in R
2.1.1
Atomic vs. recursive data objects
2.1.2
Assessing the structure of data objects
2.2
A vector is a collection of like elements
2.2.1
Understanding vectors
2.2.2
Creating vectors
2.2.3
Naming vectors
2.2.4
Indexing vectors
2.2.5
Replacing vector elements (by indexing and assignment)
2.2.6
Operating on vectors
2.2.7
Converting vectors into factors (categorical variables)
2.3
A matrix is a 2-dimensional table of like elements
2.3.1
Understanding matrices
2.3.2
Creating matrices
2.3.3
Naming matrix components
2.3.4
Indexing a matrix
2.3.5
Replacing matrix elements
2.3.6
Operating on a matrix
2.4
An array is an n-dimensional table of like elements
2.4.1
Understanding arrays
2.4.2
Creating arrays
2.4.3
Naming arrays
2.4.4
Indexing arrays
2.4.5
Replacing array elements
2.4.6
Operating on an array
2.5
Graphical models—the story behind the data
2.5.1
Causal graphs are a type of Bayesian network
2.6
Problem exercises
3
Working with lists and data frames
3.1
A list is a collection of like or unlike data objects
3.1.1
Understanding lists
3.1.2
Creating lists
3.1.3
Naming lists
3.1.4
Indexing lists
3.1.5
Replacing list components
3.1.6
Operating on lists
3.2
A data frame is a list in a 2-dimensional tabular form
3.2.1
Understanding data frames and factors
3.2.2
Creating data frames
3.2.3
Naming data frames
3.2.4
Indexing data frames
3.2.5
Replacing data frame components
3.2.6
Operating on data frames
3.3
Managing data objects and workspace
3.4
Exercises
3.4.1
3.4.2
3.4.3
3.4.4
3.4.5
4
Managing epidemiologic data in R
4.1
Entering and importing data
4.1.1
Entering data
4.1.2
Importing data from a file
4.1.3
Importing data using a URL
4.2
Editing data
4.2.1
Text editor
4.2.2
The data.entry, edit, or fix functions
4.2.3
Vectorized approach
4.2.4
Text processing
4.3
Sorting data
4.4
Indexing (subsetting) data
4.4.1
Indexing
4.4.2
Using the subset function
4.5
Transforming data
4.5.1
Numerical transformation
4.5.2
Recoding vector values
4.5.3
Creating categorical variables (factors)
4.5.4
Recoding factor levels (categorical variables)
4.5.5
Use factors instead of dummy variables
4.5.6
Conditionally transforming the elements of a vector
4.6
Merging data
4.7
Executing commands from, and directing output to, a file
4.7.1
The source function
4.7.2
The sink and capture.output functions
4.8
Working with missing and “not available” values
4.8.1
Testing, indexing, replacing, and recoding
4.8.2
Importing missing values with the read.table function
4.8.3
Working with NA values in data frames and factors
4.8.4
Viewing number of missing values in tables
4.8.5
Setting default NA behaviors in statistical models
4.8.6
Working with finite, infinite, and NaN numbers
4.9
Working with dates and times
4.9.1
Date functions in the base package
4.9.2
Date functions in the chron and survival packages
4.10
Exporting data objects
4.10.1
Exporting to a generic ASCII text file
4.10.2
Exporting to R ASCII text file
4.10.3
Exporting to R binary file
4.10.4
Exporting to non-R ASCII text and binary files
4.11
Working with regular expressions
4.11.1
Single characters
4.11.2
Character class
4.11.3
Concatenation
4.11.4
Repetition
4.11.5
Alternation
4.11.6
Repetition > Concatenation > Alternation
4.11.7
Metacharacters
4.12
Problem exercises
5
Programming with R—An introduction
5.1
Basic programming
5.2
Intermediate programming
5.2.1
Control statements
5.2.2
Vectorized approach
5.2.3
Looping
5.3
Writing R functions
5.3.1
Arguments with default values
5.3.2
Passing optional arguments using the ... function
5.4
Lexical scoping
5.4.1
Functions with a mutable state
5.5
Recursive functions
5.6
Debugging and profiling R code
5.7
Example: Bootstrap of risk ratio data
5.8
Exercises
5.8.1
5.8.2
5.8.3
5.8.4
5.8.5
5.8.6
5.8.7
6
Displaying data in R—An introduction
II Population health data science
7
Population health approach
7.1
Introduction
7.2
Epidemiologic approach
7.3
Epidemiologic analyses for 2-by-2 tables
7.3.1
Cohort studies with risk data or prevalence data
7.4
Epidemiologic analyses for stratified 2-by-2 tables
7.4.1
Cohort studies with binomial (risk or prevalence) data
8
Understanding epidemiologic measures
9
Displaying epidemiologic data in R
10
Conducting descriptive analysis
10.1
Period data
10.2
Cohort data
10.3
Spatial data
11
Conducting predictive analysis
11.1
topic 1
11.2
topic 2
11.3
topic 3
12
Conducting causal analysis
12.1
Causal graphical models
12.2
Estimating intervention effects
12.3
Counterfactual models
13
Gaining insights with simulations
13.1
Markov modeling of lifecourse events
13.2
Epidemic modeling with social networks
14
Optimizing decision quality (DQ)
14.1
Linear and integer optimization
14.2
Decision analysis with Bayesian networks
14.3
Cost-effectiveness analysis
Appendix
A
Rmarkdown review
B
Probability review
B.1
Axioms and theorems
B.2
Probability distributions
B.3
Independence and conditional probability
B.4
Bayes' theorem
C
Mathematics review
C.1
Fundamentals
C.2
Matrix algebra
C.3
Calculus
D
Available data sets
Latina Mothers and their Newborn
Oswego County (outbreak)
Western Collaborative Group Study (cohort)
Evans County (cohort)
Myocardial infarction case-control study
AIDS surveillance cases
Hepatitis B surveillance cases
Measles surveillance cases
West Nile virus surveillance cases, California 2004
University Group Diabetes Program
Novel influenza A (H1N1) pandemic
United States reported cases and deaths as of July 23, 2009
References
Population Health Data Science with R
A
Rmarkdown review