4 Day 4 (June 6)
4.1 Announcements
Tutoring for program R
- Dickens Hall room 108
- 12:30 - 1:30 Monday - Friday
Recommended reading
- Chapters 1 and 2 (pgs 1 - 28) in Linear Models with R
- Chapter 2 in Applied Regression and ANOVA Using SAS
Final project is posted
Assignment 2 is posted and is due Wednesday, June 12
Special in-class event on Friday!
4.2 Introduction to linear models
What is a model?
What is a linear model?
Most widely used model in science, engineering, and statistics
Vector form: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$
Matrix form: $y = X\beta + \varepsilon$
Which part of the model is the mathematical model?
Which part of the model makes the linear model a “statistical” model? (see the R sketch below)
Visual
Which of the four models below are linear models?
- $y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$
- $y = \beta_0 + \beta_1 x_1 + \beta_2 \log(x_1) + \varepsilon$
- $y = \beta_0 + \beta_1 e^{\beta_2 x_1} + \varepsilon$
- $y = \beta_0 + \beta_1 x_1 + \log(\beta_2) x_1 + \varepsilon$
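Below is a minimal R sketch (all numbers are made up for illustration) of the matrix form $y = X\beta + \varepsilon$: the systematic part $X\beta$ is the mathematical model, and the random error $\varepsilon$ is what makes the linear model a statistical model.
set.seed(1)
n <- 10
x1 <- runif(n)
x2 <- runif(n)
X <- cbind(1, x1, x2)                      # design matrix: intercept column plus two predictors
beta <- c(0.5, 1, -2)                      # assumed "true" parameter values (made up)
epsilon <- rnorm(n, mean = 0, sd = 0.25)   # random error: the "statistical" part
y <- X %*% beta + epsilon                  # responses: mathematical part plus error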
Why study the linear model?
- Building block for more complex models (e.g., GLMs, mixed models, machine learning, etc.)
- We know the most about it
4.3 Estimation
- Three options to estimate β
- Minimize a loss function
- Maximize a likelihood function
- Find the posterior distribution
- Each option requires different assumptions (the first two options are compared in the short sketch after this list)
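As a quick illustration (a sketch only, reusing the toy data from the example later in these notes), the first two options give the same estimate of $\beta$ for simple linear regression with normal errors: minimizing the sum of squared errors and maximizing the normal likelihood lead to the same $\hat{\beta}$.
y <- c(0.16, 2.82, 2.24)
x <- c(1, 2, 3)

# Option 1: minimize a loss function (sum of squared errors)
sse <- function(beta) sum((y - (beta[1] + beta[2] * x))^2)
fit.loss <- optim(c(0, 0), sse)

# Option 2: maximize a likelihood function (normal errors), i.e.,
# minimize the negative log-likelihood over (beta0, beta1, log sigma)
negloglik <- function(theta) {
  -sum(dnorm(y, mean = theta[1] + theta[2] * x, sd = exp(theta[3]), log = TRUE))
}
fit.lik <- optim(c(0, 0, 0), negloglik)

fit.loss$par       # approximately (-0.34, 1.04)
fit.lik$par[1:2]   # same estimates of beta, up to numerical error
The third option (finding the posterior distribution) additionally requires a prior on $\beta$ and is not sketched here.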
4.4 Loss function approach
- Define a measure of discrepancy between the data and the mathematical model
- Find the values of β that make Xβ “closest” to y
- Visual
- Classic example: $\operatorname{argmin}_{\beta} \sum_{i=1}^{n} (y_i - x_i'\beta)^2$, or in matrix form $\operatorname{argmin}_{\beta} \, (y - X\beta)'(y - X\beta)$, which results in $\hat{\beta} = (X'X)^{-1}X'y$ (see the numerical check at the end of this section)
- Three ways to do it in program R
- Using scalar calculus and algebra (kind of)
y <- c(0.16, 2.82, 2.24)
x <- c(1, 2, 3)
y.bar <- mean(y)
x.bar <- mean(x)

# Estimate the slope parameter
beta1.hat <- sum((x - x.bar) * (y - y.bar)) / sum((x - x.bar)^2)
beta1.hat
## [1] 1.04
# Estimate the intercept parameter
beta0.hat <- y.bar - sum((x - x.bar) * (y - y.bar)) / sum((x - x.bar)^2) * x.bar
beta0.hat
## [1] -0.34
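As a check on the closed-form result $\hat{\beta} = (X'X)^{-1}X'y$ from above, here is a short sketch on the same toy data using matrix operations and lm() (these may or may not be the remaining two "ways" covered in class).
y <- c(0.16, 2.82, 2.24)
x <- c(1, 2, 3)
X <- cbind(1, x)                       # design matrix: intercept column and x

# Closed-form least squares estimate
beta.hat <- solve(t(X) %*% X) %*% t(X) %*% y
beta.hat                               # (-0.34, 1.04), matching the scalar calculation

# Built-in fit for comparison
coef(lm(y ~ x))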