4.2 Parameter Estimation

There are two model parameters to estimate: \(\hat{\beta}\) estimates the coefficient vector \(\beta\), and \(\hat{\sigma}\) estimates \(\sigma\), the standard deviation of the residuals around the regression line.

Derive the coefficient estimators by minimizing the sum of squared residuals \(SSE = (y - X \hat{\beta})' (y - X \hat{\beta})\). The result is

\[\hat{\beta} = (X'X)^{-1}X'y.\]
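
Setting the derivative of \(SSE\) with respect to \(\hat{\beta}\) to zero gives the normal equations,

\[\frac{\partial SSE}{\partial \hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0 \quad\Longrightarrow\quad (X'X)\hat{\beta} = X'y,\]

which have the unique solution above whenever \(X'X\) is invertible, i.e. whenever the columns of \(X\) are linearly independent.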

The residual standard error (RSE) estimates \(\sigma\), the standard deviation around the population regression line. (Think of each value of \(X\) along the regression line as defining a subpopulation of responses with mean \(E(y \mid x_i)\) and variance \(\sigma^2\). This variance is assumed to be the same for all \(X\).)

\[\hat{\sigma} = \sqrt{(n-k-1)^{-1} \hat{e}'\hat{e}},\]

where \(\hat{e} = y - X\hat{\beta}\) is the vector of residuals, \(n\) is the number of observations, and \(k\) is the number of predictors. Dividing by \(n - k - 1\) rather than \(n\) accounts for the \(k + 1\) estimated coefficients and makes \(\hat{\sigma}^2\) an unbiased estimator of \(\sigma^2\).

The standard errors of the coefficient estimators are the square roots of the diagonal elements of the estimated variance matrix \(\hat{\sigma}^2 (X'X)^{-1}\).

\[SE(\hat{\beta}) = \sqrt{\text{diag}\left(\hat{\sigma}^2 (X'X)^{-1}\right)}.\]
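
This expression follows from treating \(\hat{\beta}\) as a linear transformation of \(y\): with \(\text{Var}(y) = \sigma^2 I\),

\[\text{Var}(\hat{\beta}) = (X'X)^{-1}X' \left(\sigma^2 I\right) X (X'X)^{-1} = \sigma^2 (X'X)^{-1},\]

and the standard errors are the square roots of its diagonal elements, with \(\sigma^2\) replaced by its estimate \(\hat{\sigma}^2\).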

4.2.0.1 Example

The mtcars dataset contains the response variable mpg (fuel consumption) and 10 aspects of automobile design and performance for 32 automobiles. What is the relationship between the response variable and its predictors?

library(dplyr)  # for %>%, mutate(), and glimpse()

d <- mtcars %>%
  mutate(vs = factor(vs, labels = c("V", "S")),
         am = factor(am, labels = c("automatic", "manual")),
         cyl = ordered(cyl),
         gear = ordered(gear), 
         carb = ordered(carb))
glimpse(d)
## Rows: 32
## Columns: 11
## $ mpg  <dbl> 21, 21, 23, 21, 19, 18, 14, 24, 23, 19, 18, 16, 17, 15, 10, 10...
## $ cyl  <ord> 6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8, 8, 4, 4, 4, 4,...
## $ disp <dbl> 160, 160, 108, 258, 360, 225, 360, 147, 141, 168, 168, 276, 27...
## $ hp   <dbl> 110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, ...
## $ drat <dbl> 3.9, 3.9, 3.8, 3.1, 3.1, 2.8, 3.2, 3.7, 3.9, 3.9, 3.9, 3.1, 3....
## $ wt   <dbl> 2.6, 2.9, 2.3, 3.2, 3.4, 3.5, 3.6, 3.2, 3.1, 3.4, 3.4, 4.1, 3....
## $ qsec <dbl> 16, 17, 19, 19, 17, 20, 16, 20, 23, 18, 19, 17, 18, 18, 18, 18...
## $ vs   <fct> V, V, S, S, V, S, V, S, S, S, S, V, V, V, V, V, V, S, S, S, S,...
## $ am   <fct> manual, manual, manual, automatic, automatic, automatic, autom...
## $ gear <ord> 4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3,...
## $ carb <ord> 4, 4, 1, 1, 2, 1, 4, 2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1,...

The data consist of 32 observations. A correlation matrix of the numeric variables shows that the strongest individual association with mpg is from wt (corr = -0.87), followed by disp (corr = -0.85) and hp (corr = -0.78); drat is moderately correlated with mpg (corr = 0.68), and qsec is only weakly correlated (corr = 0.42).

library(corrplot)

corrplot(cor(subset(d, select = c(mpg, disp, hp, drat, wt, qsec))), 
         type = "upper", 
         method = "number")
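
A scatterplot matrix gives a complementary view of the same pairwise relationships (a minimal base-R sketch; the original analysis shows only the correlation plot):

# scatterplot matrix of mpg and the numeric predictors, with smoothers
pairs(subset(d, select = c(mpg, disp, hp, drat, wt, qsec)),
      lower.panel = panel.smooth)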

Many of the predictor variables are strongly correlated with each other. Boxplots of the categorical variables show differences across levels, although the ordinal variables gear and carb do not have a monotonic relationship with mpg.

library(ggplot2)
library(gridExtra)  # for grid.arrange()

# one boxplot of mpg per categorical predictor
p_list <- list()
for(i in c("cyl", "vs", "am", "gear", "carb")) {
  p <- ggplot(d, aes(x = .data[[i]], y = mpg)) + geom_boxplot()
  p_list <- c(p_list, list(p))
}
do.call("grid.arrange", c(p_list, ncol = 2))

I’ll drop the gear and carb predictors and fit a model with the remaining predictors.

m <- lm(mpg ~ ., data = d[,1:9])
summary(m)
## 
## Call:
## lm(formula = mpg ~ ., data = d[, 1:9])
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.998 -1.355 -0.311  1.199  4.110 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 19.54098   14.14642    1.38    0.181  
## cyl.L        0.34256    2.76483    0.12    0.903  
## cyl.Q        1.38843    1.11210    1.25    0.225  
## disp         0.00669    0.01351    0.49    0.626  
## hp          -0.02914    0.01718   -1.70    0.104  
## drat         0.58806    1.50311    0.39    0.699  
## wt          -3.15525    1.42023   -2.22    0.037 *
## qsec         0.52324    0.69013    0.76    0.456  
## vsS          1.23780    2.10606    0.59    0.563  
## ammanual     3.00091    1.85340    1.62    0.120  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.5 on 22 degrees of freedom
## Multiple R-squared:  0.877,	Adjusted R-squared:  0.826 
## F-statistic: 17.4 on 9 and 22 DF,  p-value: 0.0000000481

The summary() function shows \(\hat{\beta}\) as Estimate, \(SE({\hat{\beta}})\) as Std. Error, and \(\hat{\sigma}\) as Residual standard error. You can verify these values by performing the calculations manually with matrix algebra (see the matrix algebra in R notes at R for Dummies). Here are the coefficient estimators, \(\hat{\beta} = (X'X)^{-1}X'y\).

X <- model.matrix(m)  # design matrix: intercept plus contrast-coded predictors
y <- d$mpg

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
round(beta_hat, 5)
##                [,1]
## (Intercept) 19.5410
## cyl.L        0.3426
## cyl.Q        1.3884
## disp         0.0067
## hp          -0.0291
## drat         0.5881
## wt          -3.1553
## qsec         0.5232
## vsS          1.2378
## ammanual     3.0009
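
These estimates should agree with the coefficients that lm() computed. A quick sanity check (an optional verification using base R functions, not part of the original output):

# the manual estimates should match lm()'s coefficients
all.equal(as.numeric(beta_hat), as.numeric(coef(m)))  # expect TRUE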

Here is the residual standard error, \(\hat{\sigma} = \sqrt{(n-k-1)^{-1} \hat{e}'\hat{e}}\).

n <- nrow(X)
k <- ncol(X) - 1  # exclude the intercept term
y_hat <- X %*% beta_hat
sse <- sum((y - y_hat)^2)
rse <- sqrt(sse / (n - k - 1))
cat("Residual standard error: ", round(rse, 3), " on ", (n - k - 1), " degrees of freedom.")
## Residual standard error:  2.5  on  22  degrees of freedom.
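
The same quantity is available directly from the fitted model; sigma() extracts the residual standard error of an lm object (a quick cross-check, not shown in the original):

# rse should match the model's residual standard error
all.equal(rse, sigma(m))  # expect TRUE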

Use the residual standard error to derive the standard errors of the coefficients, \(SE(\hat{\beta}) = \sqrt{\text{diag}\left(\hat{\sigma}^2 (X'X)^{-1}\right)}\).

se_beta_hat <- sqrt(diag(rse^2 * solve(t(X) %*% X)))
matrix(round(se_beta_hat, 5), dimnames = list(names(se_beta_hat), "Std. Error"))
##             Std. Error
## (Intercept)     14.146
## cyl.L            2.765
## cyl.Q            1.112
## disp             0.014
## hp               0.017
## drat             1.503
## wt               1.420
## qsec             0.690
## vsS              2.106
## ammanual         1.853
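
These should match the Std. Error column reported by summary(m). A small verification sketch using stats functions (not part of the original output):

# vcov(m) is lm()'s estimate of sigma^2 (X'X)^{-1}
all.equal(se_beta_hat, sqrt(diag(vcov(m))))  # expect TRUE
# or read the standard errors straight from the summary table
summary(m)$coefficients[, "Std. Error"]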