Data Analysis in R
Part I: Pre-Analysis
1 From Science to Data
1.1 Multiple Working Hypotheses
1.2 Falsifiability
1.3 Strong Inference
1.4 Experiments and Observations
1.5 Epistemological Domains
1.6 Error
1.7 Replication and Reproducibility
1.8 Recipe for Adult Science
2 Learning R
2.1 Introduction
2.2 Up and Running
2.2.1 Final thoughts before getting into R
2.3 Data Uploading and Manipulation
2.3.1 Data types
2.4 Objects
2.4.1 Vectors
2.4.2 Arrays
2.4.3 Data Frames
2.4.4 Lists
2.4.5 Factors
2.4.6 Changing/Coercing Objects
2.4.7 Inspecting Data/Objects
2.4.8 Sort Data
2.4.9 Subset Data
2.4.10 Tabulate Data
2.4.11 Logical Operators
2.5 Probability Distributions
2.5.1 Common Probability Distributions
2.5.2 Properties and Functions of Probability Distributions
2.5.3 Example Uses
2.6 Loops and Iterating Functions
2.6.1 Apply Functions
2.7 Plotting
2.7.1 High-level Plotting
2.7.2 Low-level Plotting
2.7.3 Plotting Panels
2.7.4 Graphical Parameters
2.8 Other Resources for Learning R
2.8.1 Online Interactive Training
2.8.2 R Packages
2.8.3 Books
2.8.4 Websites
3 Working with Data
3.1 Motivating Data Collection and Management
3.1.1 Historic data collection
3.1.2 Data recording today
3.1.3 A word about Excel
3.2 Data Recording suggestions
3.2.1 Tidy data
3.3 Data files organization
3.4 Metadata
3.5 Data Archiving Suggestions
3.6 Further Reading
4 Exploratory Data Analysis
4.1 Peng’s 10 Steps
4.2 Visually Exploring Your Data
4.2.1 Making Comparisons
4.3 Plot Types
4.3.1 Bar Plot
4.3.2 Histogram
4.3.3 Scatterplot
4.3.4 Boxplot
4.3.5 Line Plot
4.3.6 Other Considerations
4.4 Data Manipulation
4.4.1 dplyr
Verbs
5 Models and Estimation
5.1 Introduction
5.2 Models
5.3 Model Complexity
5.4 Estimation
5.4.1 Monte Carlo Estimation
5.4.2 Frequentist vs Bayesian
5.4.3 Why the frequentist popularity?
5.4.4 The problem with p-values
5.4.5 What we can agree on
5.5 The Mechanics of Estimation
5.5.1 Maximum Likelihood
5.5.2 Bayesian Estimation
5.6 Final Advice
Part II: Analysis
6 The Linear Model
6.1 Terms
6.2 Components of the Linear Model
6.2.1 The Response Component
6.2.2 The Stochastic Component
6.2.3 The Linear Component
6.3 Linear Models
6.3.1 Model of the Mean
6.3.2 t-test
6.3.3 Simple Linear Regression
6.3.4 Analysis of Variance (ANOVA)
6.4 Pre-fitting: Before You Fit a Model
6.5 Post-fitting: After You Fit a Model
6.5.1 Post-hoc Means: Multiple Comparisons
6.6 ANCOVA: Analysis of Co-Variance
6.6.1 Interactions
7 Understanding ANOVA in R
7.1 Introduction
7.2 ANOVA Mechanics
7.3 Generate ANOVA Data
7.4 ANOVA using lm()
7.5 ANOVA using aov()
7.6 Multiple Comparisons
7.7 Means Parameterization of ANOVA
8 GLMs: Generalized Linear Models
8.1 Overview
8.2 Poisson linear regression
8.2.1 Poisson linear Regression Example
8.3 Binomial linear regression
8.3.1 Binomial Linear Regression Example
9 Random Effects
9.1 Introduction
9.1.1 A note on terminology
9.2 Variance
9.3 Fixed and random effects
9.3.1 Fixed Effects
9.3.2 Random Effects
9.4 When are random effects appropriate?
9.4.1 Partial pooling and shrinkage
9.5 PLD Example
9.6 Types of models with random effects
9.6.1 Mixed effects models
9.7 Should I Consider Random Effects?
10 Model Selection
10.1 Implicit and explicit model selection
10.2 Model Balance
10.3 Information criterion: The ICs
10.4 Cross validation
10.5 Comparison of AIC and CV
10.6 Which model selection should I use?
Part III: Post-Analysis
11 Visualizing Data
12 Mapping
13 Infographics
References
Steve Midway
Data Analysis in R
Chapter 13 Infographics
Content for this chapter is forthcoming.