8 Day 8 (June 18)
8.1 Announcements
Assignment 2 questions?
Holiday tomorrow and workday on Friday
Assignment 3 and the final project are posted
Recommended reading
- Chapters 2 and 3 (pp. 13-46) in Linear Models with R
8.2 Selected questions from journals
- “I’m having some trouble understanding the Nelder-Mead optimization method” (see the first sketch below)
- “We’ve talked a lot about how important the error term is in linear statistical models to account for randomness or variability in a specific dataset. So, in the “lm” function in R Studio, and in the example we discussed using the deer dataset, how can we include the error term inside this lm function?” (see the second sketch below)
- “When we split the data into training and testing sets, what ratio is generally used?” (see the third sketch below)
- “Is it correct to use adjusted R-squared when comparing multiple models to determine whether adding a variable improves the model, and to use R-squared when evaluating a single model to describe how well it fits? When doing basic linear regression, is it more common to use adjusted R-squared or AIC?” (see the fourth sketch below)
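On the Nelder-Mead question: the method is a derivative-free simplex search, so optim() only ever asks for values of the objective function, never gradients. A minimal sketch on a made-up quadratic (the objective and starting point here are invented for illustration, not from class):

# Minimal Nelder-Mead sketch: a made-up quadratic with minimum at (2,-1).
# No gradient is ever supplied or computed.
f <- function(par) (par[1] - 2)^2 + (par[2] + 1)^2
out <- optim(par = c(0, 0), fn = f, method = "Nelder-Mead")
out$par     # close to c(2, -1)
out$counts  # function evaluations only; the gradient count is NA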
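On the error term question: lm() never takes the errors as an input. The model assumes y = Xβ + ε, and ε is estimated from the residuals after the fit. A sketch with simulated data (all numbers invented for illustration):

# Sketch: we simulate the errors ourselves, fit with lm(), and recover
# estimates of them as residuals; lm() has no "error" argument.
set.seed(1)
x <- 1:20
y <- 1 + 2*x + rnorm(20, mean = 0, sd = 1.5)  # the rnorm term plays the role of epsilon
m <- lm(y ~ x)
head(residuals(m))  # estimates of the unobserved errors
summary(m)$sigma    # estimate of the error standard deviation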
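On the train/test question: 70/30 and 80/20 splits are both common defaults, though the right ratio depends on the sample size. A sketch of an 80/20 split (the data frame here is a made-up placeholder):

# Sketch: randomly assign 80% of rows to training and 20% to testing.
set.seed(2)
dat <- data.frame(x = rnorm(100), y = rnorm(100))  # placeholder data
train_idx <- sample(nrow(dat), size = floor(0.8 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]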
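On the adjusted R-squared vs. AIC question: both penalize added parameters and both are routinely used to compare linear models, while plain R-squared is best kept as a description of fit for a single model. A sketch comparing nested models (data simulated; x2 is pure noise, so the larger model should not look better):

# Sketch: compare a model with a useless extra predictor using
# adjusted R-squared (bigger is better) and AIC (smaller is better).
set.seed(3)
n  <- 50
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 2*x1 + rnorm(n)
mA <- lm(y ~ x1)
mB <- lm(y ~ x1 + x2)
c(summary(mA)$adj.r.squared, summary(mB)$adj.r.squared)
c(AIC(mA), AIC(mB))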
8.3 Maximum Likelihood Estimation
- Maximum likelihood estimation for the linear model
- Assume that $y \sim \mathcal{N}(X\beta, \sigma^2 I)$
- Write out the likelihood function $L(\beta,\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(y_i - x_i'\beta)^2}$.
- Maximize the likelihood function
- First take the natural log of $L(\beta,\sigma^2)$, which gives $\ell(\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2$.
- Recall that $\sum_{i=1}^{n}(y_i - x_i'\beta)^2 = (y - X\beta)'(y - X\beta)$
- Maximize the log-likelihood function $\operatorname*{arg\,max}_{\beta,\sigma^2}\; -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$ (the closed-form solution is given after this list)
- Visual for data $y = (0.16, 2.82, 2.24)'$ and $x = (1, 2, 3)'$
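Setting the gradient of $\ell(\beta,\sigma^2)$ to zero and solving gives the standard closed-form maximizers (the intermediate calculus is omitted here); these are the estimators computed in the first code example below:

$$\hat{\beta} = (X'X)^{-1}X'y, \qquad \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta})$$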
- Three ways to do it in R
- Using matrix calculus and algebra
# Data
y <- c(0.16,2.82,2.24)
X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)
n <- length(y)

# Maximum likelihood estimation using closed form estimators
beta <- solve(t(X)%*%X)%*%t(X)%*%y
sigma2 <- (1/n)*t(y-X%*%beta)%*%(y-X%*%beta)

beta
##       [,1]
## [1,] -0.34
## [2,]  1.04

sigma2

##        [,1]
## [1,] 0.5832
- Using modern (circa 1970s) optimization techniques
# Maximum likelihood estimation using the Nelder-Mead algorithm
y <- c(0.16,2.82,2.24)
X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)

negloglik <- function(par){
  beta <- par[1:2]
  sigma2 <- par[3]
  -sum(dnorm(y,X%*%beta,sqrt(sigma2),log=TRUE))
}

optim(par=c(0,0,1),fn=negloglik,method = "Nelder-Mead")
## $par
## [1] -0.3397162  1.0398552  0.5831210
## 
## $value
## [1] 3.447978
## 
## $counts
## function gradient 
##      150       NA 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
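Here a convergence code of 0 means optim reports successful convergence, and the estimates match the closed-form values above to about three decimal places. The counts line shows 150 function evaluations and an NA gradient count, consistent with Nelder-Mead being derivative-free.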
- Using modern and user-friendly statistical computing software
# Maximum likelihood estimation using the lm function
# Note: the estimate of sigma2 is not the MLE!
df <- data.frame(y = c(0.16,2.82,2.24), x = c(1,2,3))
m1 <- lm(y~x, data=df)
coef(m1)
## (Intercept)           x 
##       -0.34        1.04

summary(m1)$sigma^2  # lm's variance estimate is RSS/(n - p), not the MLE RSS/n

## [1] 1.7496

summary(m1)

## 
## Call:
## lm(formula = y ~ x, data = df)
## 
## Residuals:
##     1     2     3 
## -0.54  1.08 -0.54 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.3400     2.0205  -0.168    0.894
## x             1.0400     0.9353   1.112    0.466
## 
## Residual standard error: 1.323 on 1 degrees of freedom
## Multiple R-squared:  0.5529, Adjusted R-squared:  0.1057 
## F-statistic: 1.236 on 1 and 1 DF,  p-value: 0.4663