3 Day 3 (June 7)
3.1 Announcements
Meet-and-greet tomorrow
A comment about email help!
If office hours times don’t work for you, let me know
Recommended reading
- Chapters 1 and 2 (pgs 1 - 28) in Linear Models with R
- Chapter 2 in Applied Regression and ANOVA Using SAS
3.2 Intro to statistical modelling
Quick review of major ideas from Day 2 notes
Example: disease introduction
3.3 Matrix algebra
- Column vectors
- In R
y <- matrix(c(1,2,3),nrow=3,ncol=1)
y
##      [,1]
## [1,]    1
## [2,]    2
## [3,]    3
- Matrices
- In R
X <- matrix(c(1,2,3,4,5,6),nrow=3,ncol=2,byrow=FALSE)
X
##      [,1] [,2]
## [1,]    1    4
## [2,]    2    5
## [3,]    3    6
- Vector multiplication
- In R
t(y)%*%y
##      [,1]
## [1,]   14
- Matrix by vector multiplication
- In R
t(X)%*%y
##      [,1]
## [1,]   14
## [2,]   32
- Matrix by matrix multiplication
- In R
t(X)%*%X
##      [,1] [,2]
## [1,]   14   32
## [2,]   32   77
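Each product above works because the inner dimensions match. A minimal sketch that checks the dimensions of the three products and shows what happens when they do not conform; the name `bad` is introduced here only for illustration:

```r
# Same objects as in the notes above
y <- matrix(c(1,2,3),nrow=3,ncol=1)
X <- matrix(c(1,2,3,4,5,6),nrow=3,ncol=2,byrow=FALSE)

dim(t(y)%*%y)   # 1 x 1: (1x3)(3x1)
dim(t(X)%*%y)   # 2 x 1: (2x3)(3x1)
dim(t(X)%*%X)   # 2 x 2: (2x3)(3x2)

# X is 3x2 and y is 3x1, so X %*% y is not conformable; capture the error
bad <- tryCatch(X%*%y, error=function(e) e)
inherits(bad, "error")
```

This is why the transpose `t()` appears in every product in this section.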
- Matrix inversion
- In R
solve(t(X)%*%X)
##            [,1]       [,2]
## [1,]  1.4259259 -0.5925926
## [2,] -0.5925926  0.2592593
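One way to check an inverse is to multiply it by the original matrix and compare the result with the identity. The sketch below does that and also shows the two-argument form of `solve()`, which solves a linear system without forming the inverse. The names `XtX`, `XtX_inv`, `b`, and `z` are assumptions made for this example:

```r
X <- matrix(c(1,2,3,4,5,6),nrow=3,ncol=2,byrow=FALSE)
XtX <- t(X)%*%X
XtX_inv <- solve(XtX)

# Product of a matrix and its inverse should be the identity (up to rounding)
XtX_inv%*%XtX
all.equal(XtX_inv%*%XtX, diag(2))

# solve(A, b) solves A %*% z = b directly, without forming the inverse
b <- c(1,0)
z <- solve(XtX, b)
all.equal(as.vector(XtX%*%z), b)
```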
- Determinant of a matrix
- In R
I <- diag(1,3)
I
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
det(I)
## [1] 1
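The determinant connects back to inversion: a square matrix has an inverse only when its determinant is nonzero. A small sketch, where the matrix `S` is a hypothetical example constructed to be singular:

```r
X <- matrix(c(1,2,3,4,5,6),nrow=3,ncol=2,byrow=FALSE)
det(t(X)%*%X)   # 14*77 - 32*32 = 54, nonzero, so the inverse exists

# Hypothetical example: the second column is twice the first, so the
# columns are linearly dependent and the determinant is zero
S <- matrix(c(1,2,2,4),nrow=2,ncol=2)
det(S)

# solve() signals an error for a singular matrix; capture it
singular <- tryCatch(solve(S), error=function(e) e)
inherits(singular, "error")
```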
- Quadratic form
- Derivative of a quadratic form (Note: the matrix in the quadratic form is assumed to be symmetric; e.g., $\mathbf{X}'\mathbf{X}$)
- Other useful derivatives
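A standard result is that for a symmetric matrix $\mathbf{A}$, the derivative of the quadratic form $\mathbf{x}'\mathbf{A}\mathbf{x}$ with respect to $\mathbf{x}$ is $2\mathbf{A}\mathbf{x}$. The symbols $\mathbf{A}$ and $\mathbf{x}$ are assumptions chosen for this sketch and may differ from those used in lecture. The code below compares that formula with a finite-difference approximation at an arbitrary point:

```r
# A is symmetric by construction (A = X'X); x is an arbitrary, hypothetical point
X <- matrix(c(1,2,3,4,5,6),nrow=3,ncol=2,byrow=FALSE)
A <- t(X)%*%X
x <- c(0.5,-1)

# Quadratic form x'Ax returned as a plain number
q <- function(x) as.numeric(t(x)%*%A%*%x)

# Analytic gradient for symmetric A
grad_analytic <- as.vector(2*A%*%x)

# Central finite-difference approximation of the gradient
h <- 1e-6
grad_numeric <- sapply(1:2, function(j){
  e <- rep(0,2); e[j] <- h
  (q(x+e)-q(x-e))/(2*h)
})

grad_analytic
grad_numeric
```

The two vectors should agree up to numerical error, which is a quick sanity check before using the formula in a derivation.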
3.4 Introduction to linear models
What is a model?
What is a linear model?
Most widely used model in science, engineering, and statistics
Vector form:
Matrix form:
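As a sketch in common textbook notation, the two forms are often written as below. The symbols $\mathbf{y}$, $\mathbf{x}_j$, $\beta_j$, $\boldsymbol{\varepsilon}$, and $p$ are assumptions made here and are not necessarily the ones used in lecture:

```latex
% Vector form: the response is a weighted sum of p predictor vectors plus error
\mathbf{y} = \beta_0 \mathbf{1} + \beta_1 \mathbf{x}_1 + \cdots + \beta_p \mathbf{x}_p + \boldsymbol{\varepsilon}

% Matrix form: collect the columns as X = [1, x_1, ..., x_p] and the
% coefficients as beta = (beta_0, beta_1, ..., beta_p)'
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}
```

In this notation the column of ones in $\mathbf{X}$ carries the intercept, which matches how `X` is built in the R examples in these notes.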
Which part of the model is the mathematical model?
Which part of the model makes the linear model a “statistical” model?
Visual
Which of the four below are linear models?
Why study the linear model?
- Building block for more complex models (e.g., GLMs, mixed models, machine learning, etc.)
- We know the most about it
3.5 Estimation
- Three options to estimate $\boldsymbol{\beta}$
- Minimize a loss function
- Maximize a likelihood function
- Find the posterior distribution
- Each option requires different assumptions
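To illustrate how the assumptions differ, the sketch below estimates the coefficients for the small data set used in these notes in two ways. The loss function needs no distributional assumption. The likelihood version assumes independent normal errors, which is an assumption made here for illustration. The names `loss`, `negloglik`, `fit_loss`, and `fit_lik` are hypothetical:

```r
# Data and design matrix from the example in these notes
y <- c(0.16,2.82,2.24)
X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)

# Option 1: squared-error loss; no distribution is assumed for the errors
loss <- function(beta) sum((y-X%*%beta)^2)
fit_loss <- optim(par=c(0,0),fn=loss,method="BFGS")

# Option 2: negative log-likelihood, ASSUMING independent normal errors
# theta = (beta0, beta1, log(sigma)); the log scale keeps sigma positive
negloglik <- function(theta){
  -sum(dnorm(y,mean=X%*%theta[1:2],sd=exp(theta[3]),log=TRUE))
}
fit_lik <- optim(par=c(0,0,0),fn=negloglik,method="BFGS")

fit_loss$par       # estimate of beta from the loss function
fit_lik$par[1:2]   # estimate of beta from the likelihood
```

Under the normal assumption the two sets of coefficient estimates should agree closely; the likelihood approach additionally yields an estimate of the error standard deviation.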
3.6 Loss function approach
Define a measure of discrepancy between the data and the mathematical model
- Find the values of $\boldsymbol{\beta}$ that make $\mathbf{X}\boldsymbol{\beta}$ “closest” to $\mathbf{y}$
- Visual
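Classic example: $\sum_{i=1}^{n}(y_i-\mathbf{x}_i'\boldsymbol{\beta})^2$, or in matrix form $(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta})$, which results in $\hat{\boldsymbol{\beta}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$
- In R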
y <- c(0.16,2.82,2.24)
X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)
solve(t(X)%*%X)%*%t(X)%*%y
##       [,1]
## [1,] -0.34
## [2,]  1.04
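The closed-form expression evaluated above can be motivated with the derivative results from the matrix algebra section. A sketch of the usual argument, assuming $\mathbf{X}'\mathbf{X}$ is invertible and writing the loss as $\ell(\boldsymbol{\beta})$ (a label chosen here for illustration):

```latex
\begin{align*}
\ell(\boldsymbol{\beta})
  &= (\mathbf{y}-\mathbf{X}\boldsymbol{\beta})'(\mathbf{y}-\mathbf{X}\boldsymbol{\beta}) \\
  &= \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y}
     + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \\
\frac{\partial \ell(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}
  &= -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\boldsymbol{\beta} \\
\text{Setting the derivative to zero:}\quad
\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} &= \mathbf{X}'\mathbf{y}
  \quad\Longrightarrow\quad
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
\end{align*}
```

The third line uses the derivative of a quadratic form with the symmetric matrix $\mathbf{X}'\mathbf{X}$.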
optim(par=c(0,0),method = c("Nelder-Mead"),fn=function(beta){t(y-X%*%beta)%*%(y-X%*%beta)})
## $par
## [1] -0.3399977  1.0399687
## 
## $value
## [1] 1.7496
## 
## $counts
## function gradient 
##       61       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
lm(y~X-1)
## 
## Call:
## lm(formula = y ~ X - 1)
## 
## Coefficients:
##    X1     X2  
## -0.34   1.04
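All three calls give essentially the same estimate. As a further check, the sketch below evaluates the loss at the closed-form estimate and confirms that the residuals are orthogonal to the columns of `X`. The names `beta_hat`, `fitted_y`, and `e` are introduced here for illustration, and the alternative value `c(0,1)` is a hypothetical comparison point:

```r
y <- c(0.16,2.82,2.24)
X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)

beta_hat <- solve(t(X)%*%X)%*%t(X)%*%y   # closed-form estimate
fitted_y <- X%*%beta_hat                 # fitted values
e <- y-fitted_y                          # residuals

sum(e^2)      # loss at the estimate; compare with $value from optim()
t(X)%*%e      # both entries are zero up to rounding

# Another choice of beta gives a larger loss, e.g. the hypothetical c(0,1)
sum((y-X%*%c(0,1))^2)
```

The value of `sum(e^2)` should match the `$value` reported by `optim()` above, since both are the minimized loss.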