Stat 340
Module 0: Welcome to Stat 340!
How to use this site/document/thing
I Lecture Notes
1 M1: MLR fundamentals
1.1 Categorical Predictors
1.1.1 Example: athlete data
1.1.2 Encoding a categorical predictor
1.1.3 Multiple levels
1.2 Multiple Linear Regression Fundamentals
1.2.1 The goal of MLR
1.2.2 The MLR equation
1.2.3 MLR interpretation
1.3 Interactions
1.3.1 What’s an interaction?
1.3.2 Athlete example
1.3.3 Multiple predictors
1.3.4 An interaction effect
1.4 MLR Conditions and Diagnostics
1.4.1 Some old friends
1.4.2 Added-variable plots
1.4.3 Multicollinearity
1.5 Correlation
1.5.1 Two quantitative variables
1.5.2 Correlation: the basics
1.5.3 Calculating correlation
1.5.4 Correlation caution
1.6 Simple Linear Regression Fundamentals
1.6.1 Describing a relationship
1.6.2 The linear regression equation
1.6.3 Interpreting coefficients
1.6.4 Regression caution
1.7 SLR: Least Squares
1.7.1 What is a “good” line?
1.7.2 Least squares fits
1.7.3 Slope and correlation
1.7.4 An important point (literally)
1.8 Goodness of Fit (SLR Edition)
1.8.1 The missing piece
1.8.2 The spread of the residuals
1.8.3 R-squared
1.8.4 P.S. (for the curious)
1.9 Regression Assumptions and Conditions (SLR Edition)
1.9.1 Linearity
1.9.2 Constant error variance
1.9.3 Normal errors
1.9.4 Transformations
1.10 Outliers and Special Points
1.10.1 Outliers
1.10.2 Leverage
1.10.3 Influence
1.10.4 Dealing with it
1.11 Inference for a Regression Slope
1.11.1 Return of the inference framework
1.11.2 Some notation
1.11.3 Hypotheses
1.11.4 Check conditions
1.11.5 Test stat and sampling distribution
1.11.6 Confidence interval
1.12 Confidence and Prediction Intervals
1.12.1 CIs and PIs for means
1.12.2 Prediction in regression
1.12.3 Confidence interval for average value
1.12.4 Prediction interval for individual value
1.12.5 Looking at many \(x\) values
2 M2: MLR with matrices
2.1 Random variables and distributions
2.1.1 Discrete RVs
2.1.2 Continuous RVs
2.2 Moments
2.2.1 First moment: Expected value (mean) – a measure of center
2.2.2 Second moment: Variance – a measure of spread
2.3 Joint distributions
2.3.1 Covariance and correlation: joint moments
2.4 Matrix form of the regression equation
2.5 Random vectors and matrices
2.5.1 Joint moments for vectors and matrices
2.6 General matrix fax and trix
2.6.1 Math for variances
2.6.2 Handy linear algebra concepts
3 M3: MLR estimation
3.1 The fitted regression model
3.2 The normal equations
3.3 Matrix algebra for the normal equations
3.4 Sums of squares
3.5 Least squares for simple linear regression, matrix edition
3.6 Three versions of \(b_1\)
3.7 The hat matrix
3.8 But really, why squared?
3.9 Maximum likelihood
3.10 Maximum likelihood and least squares
4 M4: MLR inference
4.1 Intro to the \(t\) distribution
4.1.1 What’s the goal here?
4.1.2 Why not Normal?
4.1.3 What does the \(t\) do?
4.2 Inference framework I: hypothesis testing
4.2.1 Introduction
4.2.2 Frequentist inference: the big picture
4.2.3 The inference framework
4.3 Inference framework II: CIs and reporting
4.3.1 Where were we?
4.3.2 Confidence intervals in general
4.3.3 CIs and sampling distributions
4.3.4 Back to the inference framework
4.4 Hypothesis Testing: Tips and Troubles
4.4.1 Introduction
4.4.2 Choosing alpha
4.4.3 One-sided tests
4.4.4 Things not to say about confidence intervals
4.4.5 Things not to say about p-values
4.5 Inference for a Regression Slope: SLR edition
4.5.1 Return of the inference framework
4.5.2 Some notation
4.5.3 Hypotheses
4.5.4 Check conditions
4.5.5 Test stat and sampling distribution
4.5.6 Confidence interval
4.6 Moments of coefficient estimates, MLR matrix edition
4.7 The t test for regression, with details
4.7.1 Student’s \(t\) distribution
4.7.2 Okay, so what about the slope?
4.7.3 The t test in multiple regression
4.8 The overall F test for regression
4.8.1 Mean Squares
4.8.2 Null distribution time!
4.8.3 WTF is the F?
4.9 An optional historical side-note: Gosset and the t
4.10 Degrees of freedom
4.10.1 Introduction
4.10.2 Example time!
4.10.3 The benefits of extra df
4.11 Confidence and prediction intervals, one predictor
4.11.1 CIs and PIs for means
4.11.2 Prediction in regression
4.11.3 Confidence interval for average value
4.11.4 Prediction interval for individual value
4.11.5 Looking at many \(x\) values
4.12 CIs and PIs in multiple regression
4.12.1 CIs for \(\beta\)’s
4.12.2 CIs and PIs for points
4.12.3 Variance of predictions at a new point
5 M5: Model evaluation
5.1 Model selection criteria
5.1.1 \(t\)-tests
5.1.2 Nested F-tests
5.1.3 SSE
5.1.4 \(R^2\)
5.1.5 Adjusted \(R^2\)
5.1.6 AIC
5.1.7 BIC
5.1.8 Mallows’ \(C_p\)
5.2 Bias, variance, and estimators
5.2.1 Optional math moment: proof of the MSE
5.2.2 Okay we’re back
5.2.3 Predictions
6 M6: Structure of GLMs
6.1 Logistic Regression: Why and How?
6.1.1 Example: the Challenger O-ring data
6.1.2 A new variation: logistic regression
6.1.3 Back to the Challenger
6.1.4 Heads-up: interpreting coefficients
6.2 Interpreting Logistic Regression
6.2.1 Grad school dataset
6.2.2 Coefficient interpretation
6.2.3 Adding more predictors
6.3 Poisson Regression
6.3.1 The problem
6.3.2 Example: squirrels!
6.3.3 Predictions and distributions
6.3.4 Poisson regression models
6.3.5 R and interpretation
7 M7: GLM Inference and Prediction
7.1 Prediction and Residuals
7.1.1 Grad school admissions data
7.1.2 Getting predictions
7.1.3 Residuals
7.1.4 What about Poisson/count regression?
7.2 Prediction and ROC Curves
7.2.1 The story so far (review)
7.2.2 Predicting the response
7.2.3 Being wrong (or not)
7.2.4 The ROC curve
7.2.5 The AUC
7.3 Conditions for Logistic Regression
7.3.1 Why worry?
7.3.2 Checking conditions
7.4 GLM Inference Tests
7.4.1 Wald tests
7.4.2 Likelihood Ratio Tests
7.5 Deviance and Residuals
7.5.1 Pearson residuals
7.5.2 Deviance: a new approach
7.5.3 Deviance residuals
8 M8: GAMs
9 M9: Dimension Reduction
10 M10: Optimization Choices
10.1 Weighted and Reweighted Least Squares
10.1.1 Non-constant variance
10.1.2 How To Kind of Ignore Things, But Rigorously
10.1.3 IRLS for non-constant variance
10.2 Influential points: IRLS robust regression
10.2.1 Influential points
10.2.2 The robust regression procedure
11 M11: Classification
12 M12: Trees
II Reading Notes
Appendices
A Syllabus
A.1 Course philosophy
A.2 Course objectives: what is this course for?
A.2.1 The big picture
A.2.2 Specific objectives
A.3 Course structure
A.3.1 The Day-to-Day
A.3.2 What You’ll Do
A.4 Practical matters
A.4.1 Expectations
A.4.2 Access and accommodations
A.4.3 Materials
A.4.4 Legal notes
A.5 Grades and Other Inconveniences
A.5.1 Assessments
A.5.2 Engagement credit details
A.5.3 Fine Print
A.5.4 Actual numbers
A.5.5 Special cases
A.5.6 Flexibility and exceptions
A.6 Assignment guide
A.6.1 Lectures and readings
A.6.2 Pre-class questions
A.6.3 In-class activities/labs
A.6.4 Practice problems
A.6.5 Target Assessments
A.6.6 The Project
B Resource links
B.1 Contact me/the TAs
B.2 Other folks to talk to
B.3 FAQs
B.3.1 What week is it?
B.3.2 Prerequisites
B.3.3 Whyyyy?
B.3.4 Feedback/scoring questions
B.3.5 Getting stuck
B.3.6 Staying on track
B.3.7 Study buddies
B.3.8 Accommodations
B.3.9 Schedule conflicts
B.3.10 Sleep!
B.4 Helpful worksheets
B.4.1 Engagement worksheet
Stat 340 Notes: Fall 2021