A Book Example
Preface
About the Authors
Stylistic Conventions
R code that is evaluated
R code in a script file (not evaluated)
I Learning R
1
Getting Started With R
1.1
What is R?
1.2
What is RStudio?
1.3
Who should learn R?
1.4
Why should I learn R?
1.5
Where can I get R?
1.6
How do I use R?
1.6.1
Using R on our computer
1.6.2
Does R have epidemiology programs?
1.6.3
How should I use these notes?
1.7
Just do it!
1.7.1
Using R as your calculator
1.7.2
Useful R concepts
1.7.3
Useful R functions
1.7.4
How do I get help?
1.7.5
Is there anything else that I need?
1.7.6
What’s ahead?
1.8
What are graphical models?
1.9
Precision and number types?
1.10
Problem exercises
1.10.1
Workspace
1.10.2
Math operations
1.10.3
Body mass index
1.10.4
Logarithm
1.10.5
Risk and risk odds
1.10.6
HIV transmission probabilities
1.10.7
Sourcing files
2
Working with vectors, matrices, and arrays
2.1
Data objects in R
2.1.1
Atomic vs. recursive data objects
2.1.2
Assessing the structure of data objects
2.2
A vector is a collection of like elements
2.2.1
Understanding vectors
2.2.2
Creating vectors
2.2.3
Naming vectors
2.2.4
Indexing vectors
2.2.5
Replacing vector elements (by indexing and assignment)
2.2.6
Operating on vectors
2.2.7
Converting vectors into factors (categorical variables)
2.3
A matrix is a 2-dimensional table of like elements
2.3.1
Understanding matrices
2.3.2
Creating matrices
2.3.3
Naming matrix components
2.3.4
Indexing a matrix
2.3.5
Replacing matrix elements
2.3.6
Operating on a matrix
2.4
An array is an n-dimensional table of like elements
2.4.1
Understanding arrays
2.4.2
Creating arrays
2.4.3
Naming arrays
2.4.4
Indexing arrays
2.4.5
Replacing array elements
2.4.6
Operating on an array
2.5
Graphical models—the story behind the data
2.5.1
Causal graphs are a type of Bayesian network
2.6
Problem exercises
3
Working with lists and data frames
3.1
A list is a collection of like or unlike data objects
3.1.1
Understanding lists
3.1.2
Creating lists
3.1.3
Naming lists
3.1.4
Indexing lists
3.1.5
Replacing list components
3.1.6
Operating on lists
3.2
A data frame is a list in a 2-dimensional tabular form
3.2.1
Understanding data frames and factors
3.2.2
Creating data frames
3.2.3
Naming data frames
3.2.4
Indexing data frames
3.2.5
Replacing data frame components
3.2.6
Operating on data frames
3.3
Managing data objects and workspace
3.4
Exercises
3.4.1
3.4.2
3.4.3
3.4.4
3.4.5
4
Managing epidemiologic data in R
4.1
Entering and importing data
4.1.1
Entering data
4.1.2
Importing data from a file
4.1.3
Importing data using a URL
4.2
Editing data
4.2.1
Text editor
4.2.2
The data.entry, edit, or fix functions
4.2.3
Vectorized approach
4.2.4
Text processing
4.3
Sorting data
4.4
Indexing (subsetting) data
4.4.1
Indexing
4.4.2
Using the subset function
4.5
Transforming data
4.5.1
Numerical transformation
4.5.2
Recoding vector values
4.5.3
Creating categorical variables (factors)
4.5.4
Recoding factor levels (categorical variables)
4.5.5
Use factors instead of dummy variables
4.5.6
Conditionally transforming the elements of a vector
4.6
Merging data
4.7
Executing commands from, and directing output to, a file
4.7.1
The source function
4.7.2
The sink and capture.output functions
4.8
Working with missing and “not available” values
4.8.1
Testing, indexing, replacing, and recoding
4.8.2
Importing missing values with the read.table function
4.8.3
Working with NA values in data frames and factors
4.8.4
Viewing number of missing values in tables
4.8.5
Setting default NA behaviors in statistical models
4.8.6
Working with finite, infinite, and NaN numbers
4.9
Working with dates and times
4.9.1
Date functions in the base package
4.9.2
Date functions in the chron and survival packages
4.10
Exporting data objects
4.10.1
Exporting to a generic ASCII text file
4.10.2
Exporting to R ASCII text file
4.10.3
Exporting to R binary file
4.10.4
Exporting to non-R ASCII text and binary files
4.11
Working with regular expressions
4.11.1
Single characters
4.11.2
Character class
4.11.3
Concatenation
4.11.4
Repetition
4.11.5
Alternation
4.11.6
Repetition > Concatenation > Alternation
4.11.7
Metacharacters
4.12
Problem exercises
5
Programming and R functions
5.1
Basic programming
5.2
Intermediate programming
5.2.1
Control statements
5.2.2
Vectorized approach
5.2.3
Looping
5.3
Writing R functions
5.3.1
Arguments with default values
5.3.2
Passing optional arguments using the ... function
5.4
Lexical scoping
5.4.1
Functions with a mutable state
5.5
Recursive functions
5.6
Debugging and profiling R code
5.7
Example: Bootstrap of risk ratio data
5.8
Exercises
5.8.1
5.8.2
5.8.3
5.8.4
5.8.5
5.8.6
5.8.7
Appendix
A
Rmarkdown review
B
Probability review
C
Mathematics review
D
Glossary
E
Solutions
II Appendixes
F
Mathematics review
F.1
Fundamentals
F.2
Matrix algebra
F.3
Calculus
G
Probability review
G.1
Axioms and theorems
G.2
Probability distributions
G.3
Independence and conditional probability
G.4
Bayes’ theorem
H
Available data sets
Latina Mothers and their Newborn
Oswego County (outbreak)
Western Collaborative Group Study (cohort)
Evans County (cohort)
Myocardial infarction case-control study
AIDS surveillance cases
Hepatitis B surveillance cases
Measles surveillance cases
West Nile virus surveillance cases, California 2004
University Group Diabetes Program
Novel influenza A (H1N1) pandemic: United States reported cases and deaths as of July 23, 2009
References
Population Health Data Science with R
E
Solutions