Data Science in R: A Gentle Introduction
Welcome!
Part I: Data exploration
1 Getting started in R
1.1 Download R and RStudio
1.2 First steps
Interacting with R
How you’ll get feedback
R as a calculator
R is case sensitive
1.3 Objects
1.4 Scripts
Creating and running scripts
A slightly more interesting script
Why can’t I just point and click?
1.5 Getting help
1.6 Libraries
Installing a library
Loading a library
Dealing with installation errors
2 Data
2.1 Importing a data set
2.2 The vocabulary of data
2.3 A short, simple data analysis
2.4 Importing data from the command line
3 Counting
3.1 Getting started: ACL Fest
3.2 Simple probabilities
Using xtabs alone
Using prop.table
Using pipes
3.3 Joint probabilities
3.4 Conditional probabilities
Study questions
4 Plots
4.1 The grammar of graphics
4.2 The five basic plots
Scatter plots
Line graphs
Histograms
Boxplots
Bar plots
4.3 Customizing plots
Changing titles and labels
Color scales
Font size
Flipping the x and y axes
Plotting cheat sheet
Study questions
5 Summaries
5.1 The typical value
5.2 Variation
5.3 Extremes and quantiles
5.4 z-scores
Study questions
6 Data wrangling
6.1 Key data verbs
group_by
filter
select
mutate
arrange
6.2 Complex summaries
Example 1: the five coldest months
Example 2: survival on the Titanic
Example 3: toy imports
6.3 Summary shortcuts
7 Basic regression
7.1 What is a regression model?
Simple regression models
Multiple regression models
7.2 Fitting regression models
7.3 Using and interpreting regression models
Summarizing a relationship
Making predictions
Making fair comparisons
Decomposing variation
Part II: Statistical inference
8 Statistical uncertainty
8.1 Sources of uncertainty
8.2 Real-world vs. statistical uncertainty
8.3 Sampling distributions
Example 1: dessert
What the sampling distribution tells us
Example 2: fishing
Summary
8.4 The truth about statistical uncertainty
Example 1: commuting
Example 2: dessert again
When is statistical inference useful?
Study questions
9 The bootstrap
9.1 The bootstrap sampling distribution
9.2 Bootstrapping summaries
Example 1: sample mean
Bootstrap standard errors and confidence intervals
The biggest bootstrapping gotcha
Example 2: sample proportion
9.3 Bootstrapping differences
Example 1: sleep hours by gender
Example 2: smoking and depression
9.4 Bootstrapping regression models
Example 1: sleep versus age
Statistical vs. practical significance
Example 2: West Campus rents
9.5 Bootstrapping usually, but not always, works
What “confidence” means
Example 1: sample mean
Example 2: the sample minimum
Closing advice
Study questions: the bootstrap
10 p-values
10.1 Example 1: did the Patriots cheat?
10.2 The four steps of hypothesis testing
10.3 Example 2: a disease cluster?
10.4 Interpreting p-values
11 Large-sample inference
Part III: Models
12 Experiments
13 Matching
14 Multiple regression
15 Prediction
16 Probability models
References
Lesson 14 Multiple regression