10 Day 10 (June 16)

10.1 Announcements

  • Read Ch. 3 in Linear Models with R

10.2 Maximum Likelihood Estimation

  • Maximum likelihood estimation for the linear model
    • Assume that $y \sim \text{N}(X\beta, \sigma^2 I)$
    • Write out the likelihood function $L(\beta,\sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{1}{2\sigma^2}(y_i - x_i'\beta)^2}$.
    • Maximize the likelihood function
      1. First take the natural log of $L(\beta,\sigma^2)$, which gives $\ell(\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - x_i'\beta)^2$.
      2. Recall that $\sum_{i=1}^{n}(y_i - x_i'\beta)^2 = (y - X\beta)'(y - X\beta)$
      3. Maximize the log-likelihood function: $\underset{\beta,\,\sigma^2}{\operatorname{argmax}} \; -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$ (the resulting closed-form estimators are sketched just after this list)
    • Visual for data $y = (0.16, 2.82, 2.24)$ and $x = (1, 2, 3)$
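    • Filling in the maximization step (a standard derivation, added here rather than worked out in the original notes): setting the partial derivatives of $\ell(\beta,\sigma^2)$ to zero gives the closed-form estimators used in the code below.
      $$\frac{\partial \ell}{\partial \beta} = \frac{1}{\sigma^2}X'(y - X\beta) = 0 \quad\Rightarrow\quad \hat{\beta} = (X'X)^{-1}X'y$$
      $$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'(y - X\beta) = 0 \quad\Rightarrow\quad \hat{\sigma}^2 = \frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta})$$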
  • Three ways to do it in R
    • Using matrix calculus and algebra
    # Data 
    y <- c(0.16,2.82,2.24)
    X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)
    n <- length(y)
    
    # Maximum likelihood estimation using closed-form estimators
    beta <- solve(t(X)%*%X)%*%t(X)%*%y 
    sigma2 <- (1/n)*t(y-X%*%beta)%*%(y-X%*%beta)
    
    beta
    ##       [,1]
    ## [1,] -0.34
    ## [2,]  1.04
    sigma2
    ##        [,1]
    ## [1,] 0.5832
    • Using modern (circa 1970s) optimization techniques
    # Maximum likelihood estimation using the Nelder-Mead algorithm
    y <- c(0.16,2.82,2.24)
    X <- matrix(c(1,1,1,1,2,3),nrow=3,ncol=2,byrow=FALSE)
    
    negloglik <- function(par){
      beta <- par[1:2]
      sigma2 <- par[3]
      # negative log-likelihood: negated sum of normal log-densities
      -sum(dnorm(y, X %*% beta, sqrt(sigma2), log = TRUE))
    }
    optim(par=c(0,0,1),fn=negloglik,method = "Nelder-Mead")
    ## $par
    ## [1] -0.3397162  1.0398552  0.5831210
    ## 
    ## $value
    ## [1] 3.447978
    ## 
    ## $counts
    ## function gradient 
    ##      150       NA 
    ## 
    ## $convergence
    ## [1] 0
    ## 
    ## $message
    ## NULL
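    One caveat worth noting (an addition to these notes, not part of the original): nothing stops Nelder-Mead from trying sigma2 <= 0, where dnorm() returns NaN. A common workaround is to optimize over log(sigma2) instead; a minimal sketch, reusing y and X from above:
    # Hypothetical variant: search over log(sigma2) so the variance stays positive
    negloglik2 <- function(par){
      beta <- par[1:2]
      log.sigma2 <- par[3]
      -sum(dnorm(y, X %*% beta, sqrt(exp(log.sigma2)), log = TRUE))
    }
    fit <- optim(par = c(0, 0, 0), fn = negloglik2, method = "Nelder-Mead")
    fit$par[1:2]      # beta estimates
    exp(fit$par[3])   # sigma2 estimate, back on the original scale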
    • Using modern and user-friendly statistical computing software
    # Maximum likelihood estimation using lm function
    # Note: estimate of sigma2 is not the MLE! 
    df <- data.frame(y = c(0.16,2.82,2.24),x = c(1,2,3))
    m1 <- lm(y~x,data=df)
    
    coef(m1)
    ## (Intercept)           x 
    ##       -0.34        1.04
    summary(m1)$sigma^2
    ## [1] 1.7496
    summary(m1)
    ## 
    ## Call:
    ## lm(formula = y ~ x, data = df)
    ## 
    ## Residuals:
    ##     1     2     3 
    ## -0.54  1.08 -0.54 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)
    ## (Intercept)  -0.3400     2.0205  -0.168    0.894
    ## x             1.0400     0.9353   1.112    0.466
    ## 
    ## Residual standard error: 1.323 on 1 degrees of freedom
    ## Multiple R-squared:  0.5529, Adjusted R-squared:  0.1057 
    ## F-statistic: 1.236 on 1 and 1 DF,  p-value: 0.4663
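    Since the note above flags that lm() reports the unbiased variance estimate rather than the MLE, here is a small sketch (added to the notes) of how the two relate: the MLE divides the residual sum of squares by n, while summary()$sigma^2 divides by n - p.
    # Rescale the unbiased estimate (divides by n - p) to recover the MLE (divides by n)
    n <- nrow(df)
    p <- length(coef(m1))
    summary(m1)$sigma^2 * (n - p)/n   # should match the 0.5832 obtained earlier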
  • Live example

10.3 Confidence intervals for parameters

  • Example

    y <- c(63, 68, 61, 44, 103, 90, 107, 105, 76, 46, 60, 66, 58, 39, 64, 29, 37,
           27, 38, 14, 38, 52, 84, 112, 112, 97, 131, 168, 70, 91, 52, 33, 33, 27,
           18, 14, 5, 22, 31, 23, 14, 18, 23, 27, 44, 18, 19)
    year <- 1965:2011
    df <- data.frame(y = y, year = year)
    plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count", main = "",
         col = "brown", pch = 20, xlim = c(1965, 2040))

    • Is the population size really decreasing?
    • Write out the model we should use to answer this question
    • How can we assess the uncertainty in our estimate of $\beta_1$?
    • Confidence intervals in R
    m1 <- lm(y~year,data=df)
    coef(m1)
    ## (Intercept)        year 
    ##  2356.48797    -1.15784
    confint(m1,level=0.95)    
    ##                 2.5 %       97.5 %
    ## (Intercept) 929.80699 3783.1689540
    ## year         -1.87547   -0.4402103
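    One way to tie the interval back to the scatterplot (an added sketch, not from the original notes) is to overlay the fitted regression line; its negative slope is the point estimate whose uncertainty confint() just summarized.
    # Redraw the scatterplot and add the estimated regression line
    plot(x = df$year, y = df$y, xlab = "Year", ylab = "Annual count",
         col = "brown", pch = 20)
    abline(m1)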
    • Note $\hat{\beta} \sim \text{N}(\beta, \sigma^2 (X'X)^{-1})$ and let $\hat{\beta}_1 \sim \text{N}(\beta_1, \sigma^2_{\beta_1})$, where $\sigma^2_{\beta_1} = \sigma^2 c'(X'X)^{-1}c$ and $c = (0, 1, 0, 0, \ldots, 0)'$. In R we can extract the estimated $\hat{\sigma}^2(X'X)^{-1}$ using vcov().
    vcov(m1)  
    ##             (Intercept)         year
    ## (Intercept) 501753.2825 -252.3792372
    ## year          -252.3792    0.1269513

    Note that

    diag(vcov(m1))^0.5
    ## (Intercept)        year 
    ## 708.3454542   0.3563023

    corresponds to the Std. Error column from summary().

    summary(m1)
    ## 
    ## Call:
    ## lm(formula = y ~ year, data = df)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -45.333 -20.597  -9.754  14.035 117.929 
    ## 
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)   
    ## (Intercept) 2356.4880   708.3455   3.327  0.00176 **
    ## year          -1.1578     0.3563  -3.250  0.00219 **
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 33.13 on 45 degrees of freedom
    ## Multiple R-squared:  0.1901, Adjusted R-squared:  0.1721 
    ## F-statistic: 10.56 on 1 and 45 DF,  p-value: 0.00219
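    To connect the Std. Error column back to the formula $\sigma^2_{\beta_1} = \sigma^2 c'(X'X)^{-1}c$, here is a sketch (added to the notes) that rebuilds the slope's standard error from scratch; with only an intercept and a slope, $c = (0, 1)'$.
    # sigma^2_hat * c'(X'X)^{-1} c, computed directly from the design matrix
    X <- model.matrix(m1)                                   # columns: intercept, year
    sigma2.hat <- sum(residuals(m1)^2)/(nrow(X) - ncol(X))  # estimate of sigma^2
    c.vec <- c(0, 1)                                        # picks off the slope
    sqrt(t(c.vec) %*% (sigma2.hat * solve(t(X) %*% X)) %*% c.vec)
    # should agree with the 0.3563 standard error for year shown above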
    • When $\sigma_{\beta_1}$ is known: $\hat{\beta}_1 \sim \text{N}(\beta_1, \sigma^2_{\beta_1})$, so $\hat{\beta}_1 - \beta_1 \sim \text{N}(0, \sigma^2_{\beta_1})$ and $\frac{\hat{\beta}_1 - \beta_1}{\sigma_{\beta_1}} \sim \text{N}(0, 1)$
    • When $\sigma_{\beta_1}$ is estimated: $\frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}_{\beta_1}} \sim t(\nu)$, where $\nu = n - p$
    • Deriving the confidence interval for $\beta_1$:
      $$P\left(a < \tfrac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}_{\beta_1}} < b\right) = 1 - \alpha$$
      $$P\left(a\hat{\sigma}_{\beta_1} < \hat{\beta}_1 - \beta_1 < b\hat{\sigma}_{\beta_1}\right) = 1 - \alpha \quad \text{(multiply through by } \hat{\sigma}_{\beta_1}\text{)}$$
      $$P\left(-\hat{\beta}_1 + a\hat{\sigma}_{\beta_1} < -\beta_1 < -\hat{\beta}_1 + b\hat{\sigma}_{\beta_1}\right) = 1 - \alpha \quad \text{(subtract } \hat{\beta}_1\text{)}$$
      $$P\left(\hat{\beta}_1 - b\hat{\sigma}_{\beta_1} < \beta_1 < \hat{\beta}_1 - a\hat{\sigma}_{\beta_1}\right) = 1 - \alpha \quad \text{(multiply by } -1 \text{ and reverse the inequalities)}$$
    • What are $a$ and $b$ such that $P\left(a < \tfrac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}_{\beta_1}} < b\right) = 1 - \alpha$? Note that $P(a < z < b) = \int_a^b [z \mid \nu]\, dz$
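    The bracket notation $[z \mid \nu]$ above denotes the $t(\nu)$ density, so the probability is an area under that curve; a quick numerical check in R (added here, not from the original notes):
    # Area under the t density between the 2.5% and 97.5% quantiles should be 0.95
    nu <- length(df$y) - 2
    pt(qt(0.975, df = nu), df = nu) - pt(qt(0.025, df = nu), df = nu)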
    • “By hand” in R
    nu <- length(df$y) - 2
    a <- qt(p = 0.025,df = nu)
    a
    ## [1] -2.014103
    b <- qt(p = 0.975,df = nu)
    b
    ## [1] 2.014103
    beta1.hat <- coef(m1)[2]
    sigma.beta1.hat <- (diag(vcov(m1))^0.5)[2]
    
    beta1.hat - b*sigma.beta1.hat
    ##     year 
    ## -1.87547
    beta1.hat - a*sigma.beta1.hat
    ##       year 
    ## -0.4402103
    confint(m1)[2,]
    ##      2.5 %     97.5 % 
    ## -1.8754696 -0.4402103