Chapter 9 Advanced Regression and Nonparametric Approaches
9.1 Ridge and Lasso Regression
The lm.ridge() command, in the MASS package, can be used to fit a ridge regression model.
We first need to standardize each quantitative variable. This is done using the scale() command in R.
library(dplyr)   # provides %>% and mutate_if()
Train_sc <- Train %>% mutate_if(is.numeric, scale)   # rescale all numeric columns
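As a quick check, each rescaled column should now have mean 0 and standard deviation 1. A minimal sketch, assuming Train_sc was created as above:
# Sketch: every standardized numeric column should have mean ~0 and sd 1
Train_sc %>% summarise_if(is.numeric, ~ round(mean(.x), 3))
Train_sc %>% summarise_if(is.numeric, ~ round(sd(.x), 3))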
9.1.1 Fitting a Ridge Regression Model
We can perform ridge regression using the lm.ridge() command in the MASS package.
library(MASS)
M_Ridge1 <- lm.ridge(price ~ ., data = Train_sc, lambda = 1)
head(M_Ridge1$coef)
## id property_typeCondominium property_typeHouse
## 0.036320094 -0.013032610 -0.006899487
## property_typeTownhouse property_typeOther room_typePrivate room
## -0.018020782 0.058907892 -0.288091484
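The shrinkage effect is easiest to see by refitting over a grid of \(\lambda\) values and plotting the coefficient paths, using the plot() method that MASS provides for ridgelm objects. A sketch, with a hypothetical grid from 0 to 100:
# Sketch: trace each coefficient as lambda grows
M_Ridge_path <- lm.ridge(price ~ ., data = Train_sc, lambda = seq(0, 100, by = 1))
plot(M_Ridge_path)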
9.1.2 Cross-Validation with Ridge
We perform cross-validation to determine the optimal value of \(\lambda\). The command 10^seq(-3, 3, length = 100) defines 100 candidate values of \(\lambda\) between 0.001 and 1000, which will be tested in cross-validation. This requires the glmnet package, as well as the caret package for the train() and trainControl() commands.
library(glmnet)
library(caret)   # provides train() and trainControl()
control = trainControl("repeatedcv", number = 5, repeats = 5)
l_vals = 10^seq(-3, 3, length = 100)
set.seed(11162020)
AirBnB_ridge <- train(price ~ ., data = Train_sc, method = "glmnet", trControl = control,
tuneGrid = expand.grid(alpha = 0, lambda = l_vals))  # alpha = 0 gives ridge
Identify the optimal \(\lambda\).
AirBnB_ridge$bestTune$lambda
## [1] 0.4641589
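The coefficient estimates at this optimal \(\lambda\) can be pulled from the underlying glmnet fit that caret stores in finalModel. A sketch; coef() returns a sparse matrix:
# Sketch: coefficients at the cross-validated optimal lambda
coef(AirBnB_ridge$finalModel, s = AirBnB_ridge$bestTune$lambda)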
Plot of RMSPE for each value of \(\lambda\).
lambda <- AirBnB_ridge$results$lambda
RMSPE <- AirBnB_ridge$results$RMSE
ggplot(data=data.frame(lambda, RMSPE), aes(x=lambda, y=RMSPE))+
geom_line() + xlim(c(0,2)) + ylim(c(0.75, 0.82)) +
ggtitle("Ridge Regression Cross Validation Results")
9.1.3 Cross-Validation with Lasso Regression
For lasso regression, we set alpha=1 in the train() command (alpha=0 corresponds to ridge regression).
control = trainControl("repeatedcv", number = 5, repeats = 5)
l_vals = 10^seq(-3, 3, length = 100)
set.seed(11162020)
AirBnB_lasso <- train(price ~ ., data = Train_sc, method = "glmnet", trControl = control,
tuneGrid = expand.grid(alpha = 1, lambda = l_vals))  # alpha = 1 gives lasso
Identify the optimal \(\lambda\).
AirBnB_lasso$bestTune$lambda
## [1] 0.04977024
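Unlike ridge, lasso can shrink coefficients exactly to zero, so the coefficients at the optimal \(\lambda\) show which predictors are dropped from the model. A sketch, mirroring the ridge extraction above:
# Sketch: entries printed as "." have been shrunk exactly to zero
coef(AirBnB_lasso$finalModel, s = AirBnB_lasso$bestTune$lambda)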
Plot of RMSPE for each value of \(\lambda\).
lambda <- AirBnB_lasso$results$lambda
RMSPE <- AirBnB_lasso$results$RMSE
ggplot(data=data.frame(lambda, RMSPE), aes(x=lambda, y=RMSPE))+
geom_line() + xlim(c(0,0.2)) + ylim(c(0.75, 0.82)) +
ggtitle("Lasso Regression Cross Validation Results")
9.2 Decision Trees
We use the rpart package to grow trees, and the rpart.plot package to visualize them.
library(rpart)
library(rpart.plot)
The cp parameter is a complexity parameter that determines the depth of the tree: a split is attempted only if it improves the fit by at least a factor of cp, so the smaller the value of cp, the deeper the tree.
9.2.1 Decision Tree Example
tree <- rpart(price~., data=Train, cp=0.02)
rpart.plot(tree, box.palette="RdBu", shadow.col="gray", nn=TRUE, cex=1, extra=1)
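The pruning table behind a fitted tree can be inspected with printcp(), which reports the error at each candidate value of cp. A sketch:
# Sketch: CP table with relative error and cross-validated error (xerror) per split
printcp(tree)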
We can use cross-validation to determine the optimal value of cp.
cp_vals = 10^seq(-3, 3, length = 100)
set.seed(11162020)
AirBnB_Tree <- train(price ~ ., data = Train_sc, method = "rpart", trControl = control,
tuneGrid = expand.grid(cp = cp_vals))
AirBnB_Tree$bestTune
## cp
## 13 0.005336699
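caret refits the tree on the full training set at this optimal cp and stores it in finalModel. A sketch of how to visualize it (note that caret dummy-codes factor predictors, so variable names may differ slightly from the rpart() fit above):
# Sketch: plot the tree grown at the cross-validated optimal cp
rpart.plot(AirBnB_Tree$finalModel, box.palette="RdBu", shadow.col="gray", nn=TRUE)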