Chapter 9 Advanced Regression and Nonparametric Approaches

9.1 Ridge and Lasso Regression

The lm.ridge() command in the MASS package can be used to fit a ridge regression model.
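Ridge regression estimates the coefficients by minimizing the residual sum of squares plus the penalty $\lambda \sum_j \beta_j^2$. When the predictors and the response are centered, this has the closed-form solution $\hat{\beta} = (X^T X + \lambda I)^{-1} X^T y$.

The sketch below computes that formula directly in base R. It uses the built-in mtcars data, which is an illustrative choice and not the AirBnB data. It is meant to show what is being estimated, not to reproduce lm.ridge(). That function applies its own internal scaling, so its coefficients are not directly comparable to these.

```r
# Minimal sketch: closed-form ridge estimate on standardized data.
# mtcars and the predictors wt, hp, disp are illustrative assumptions.
ridge_coef <- function(X, y, lambda) {
  # Solve (X'X + lambda * I) beta = X'y without forming an explicit inverse
  p <- ncol(X)
  solve(crossprod(X) + lambda * diag(p), crossprod(X, y))
}

X <- scale(as.matrix(mtcars[, c("wt", "hp", "disp")]))  # standardized predictors
y <- as.numeric(scale(mtcars$mpg))                       # standardized response

b0  <- ridge_coef(X, y, lambda = 0)    # lambda = 0 gives ordinary least squares
b10 <- ridge_coef(X, y, lambda = 10)   # a larger lambda shrinks the coefficients

# Least squares fit for comparison (no intercept: the data are already centered)
ols <- coef(lm(y ~ X - 1))
print(cbind(lambda0 = drop(b0), lambda10 = drop(b10), ols = ols))
```

With lambda = 0 the estimate coincides with least squares. Increasing lambda shrinks the overall size of the coefficient vector toward zero.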

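The ridge and lasso penalties depend on the scale of the coefficients, so we first standardize each quantitative variable to have mean 0 and standard deviation 1. This is done using the scale() command in R.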

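library(dplyr)
# Standardize every numeric column (mean 0, sd 1); as.numeric() drops the matrix attributes scale() returns
Train_sc <- Train %>% mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))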
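scale() centers a column at its mean and divides by its sample standard deviation. The base R sketch below spells out that computation on a small made-up data frame. The columns price, beds and room are hypothetical. Like the dplyr pipeline, it standardizes only the numeric columns and leaves the others untouched.

```r
# Hypothetical toy data; only the numeric columns should be standardized
df <- data.frame(price = c(120, 80, 200, 150),
                 beds  = c(2, 1, 3, 2),
                 room  = c("Entire", "Private", "Entire", "Shared"))

z_score <- function(x) (x - mean(x)) / sd(x)   # same centering and scaling as scale()

num_cols <- vapply(df, is.numeric, logical(1))  # TRUE for price and beds
df_sc <- df
df_sc[num_cols] <- lapply(df[num_cols], z_score)

print(df_sc)
# Each standardized column now has mean 0 and standard deviation 1
print(sapply(df_sc[num_cols], function(x) c(mean = mean(x), sd = sd(x))))
```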

9.1.1 Fitting a Ridge Regression Model

We fit a ridge regression model to the standardized training data, with the penalty parameter set to lambda = 1.

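library(MASS)  # note: loading MASS masks dplyr::select()
# Fit a ridge regression model with penalty lambda = 1
M_Ridge1 <- lm.ridge(price ~ ., data = Train_sc, lambda = 1)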
head(M_Ridge1$coef)
##                       id property_typeCondominium       property_typeHouse
##              0.036320094             -0.013032610             -0.006899487
##   property_typeTownhouse       property_typeOther    room_typePrivate room
##             -0.018020782              0.058907892             -0.288091484

9.1.2 Cross Validation with Ridge

We use cross-validation to choose the value of $\lambda$. The command 10^seq(-3, 3, length = 100) defines 100 candidate values of $\lambda$, evenly spaced on a log scale from 0.001 to 1000. Each of these is tested in cross-validation.

This step requires two packages:
- caret, which provides train() and trainControl();
- glmnet, which caret uses to fit the penalized models.

trainControl("repeatedcv", number = 5, repeats = 5) requests 5-fold cross-validation repeated 5 times. Setting alpha = 0 in the tuning grid selects the ridge penalty.

library(caret)
library(glmnet)
# 5-fold cross-validation, repeated 5 times
control = trainControl("repeatedcv", number = 5, repeats = 5)
# 100 candidate lambda values from 10^-3 to 10^3
l_vals = 10^seq(-3, 3, length = 100)
set.seed(11162020)
AirBnB_ridge <- train(price ~ ., data = Train_sc, method = "glmnet",
                      trControl = control,
                      tuneGrid = expand.grid(alpha = 0, lambda = l_vals))

Identify the optimal $\lambda$, that is, the value with the smallest cross-validated RMSPE.

AirBnB_ridge$bestTune$lambda
## [1] 0.4641589

We plot the cross-validated RMSPE (reported by caret as RMSE) for each value of $\lambda$.

library(ggplot2)
lambda <- AirBnB_ridge$results$lambda
RMSPE <- AirBnB_ridge$results$RMSE
# xlim() and ylim() zoom in near the minimum; points outside the limits are dropped with a warning
ggplot(data = data.frame(lambda, RMSPE), aes(x = lambda, y = RMSPE)) +
  geom_line() +
  xlim(c(0, 2)) + ylim(c(0.75, 0.82)) +
  ggtitle("Ridge Regression Cross Validation Results")

9.1.3 Cross-validation with Lasso Regression

For lasso regression, set alpha = 1 in the tuning grid. Everything else is the same as for ridge.

control = trainControl("repeatedcv", number = 5, repeats = 5)
l_vals = 10^seq(-3, 3, length = 100)
set.seed(11162020)
# alpha = 1 selects the lasso penalty
AirBnB_lasso <- train(price ~ ., data = Train_sc, method = "glmnet",
                      trControl = control,
                      tuneGrid = expand.grid(alpha = 1, lambda = l_vals))

Identify the optimal $\lambda$.

AirBnB_lasso$bestTune$lambda
## [1] 0.04977024

We plot the cross-validated RMSPE for each value of $\lambda$.

lambda <- AirBnB_lasso$results$lambda
RMSPE <- AirBnB_lasso$results$RMSE
ggplot(data = data.frame(lambda, RMSPE), aes(x = lambda, y = RMSPE)) +
  geom_line() +
  xlim(c(0, 0.2)) + ylim(c(0.75, 0.82)) +
  ggtitle("Lasso Regression Cross Validation Results")

9.2 Decision Trees

We use the rpart package to grow trees, and the rpart.plot package to visualize them.
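A regression tree is grown by repeatedly choosing the split that most reduces the residual sum of squares within the two resulting groups. The base R sketch below searches every cutpoint of a single numeric predictor. It uses the built-in mtcars data as an illustrative assumption.

rpart() performs this kind of search over all predictors and then recurses on each child node. It also applies additional stopping rules, such as cp and a minimum node size, which this sketch omits.

```r
# Minimal sketch: best single split of one numeric predictor for a regression tree.
# mtcars, mpg, and wt are illustrative choices, not part of the AirBnB example.
best_split <- function(x, y) {
  sse <- function(v) sum((v - mean(v))^2)        # within-node sum of squares
  cuts <- sort(unique(x))
  cuts <- (head(cuts, -1) + tail(cuts, -1)) / 2  # midpoints between observed values
  # Total within-node SSE of the two children for every candidate cutpoint
  total <- sapply(cuts, function(s) sse(y[x < s]) + sse(y[x >= s]))
  i <- which.min(total)
  list(cutpoint     = cuts[i],
       sse_parent   = sse(y),
       sse_children = total[i],
       left_mean    = mean(y[x < cuts[i]]),
       right_mean   = mean(y[x >= cuts[i]]))
}

s <- best_split(mtcars$wt, mtcars$mpg)
print(s)
```

Each child node predicts the mean response of its observations, which is why the split criterion is the sum of squared deviations from those means.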
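library(rpart)
library(rpart.plot)

The cp parameter is a complexity parameter that controls the size of the tree. A split is attempted only if it improves the overall fit (for regression, the $R^2$) by at least cp. The smaller the value of cp, the more splits are allowed and the deeper the tree.

9.2.1 Decision Tree Example

# Grow a regression tree for price with complexity parameter cp = 0.02
tree <- rpart(price ~ ., data = Train, cp = 0.02)
# Draw the tree: red-to-blue node shading, gray shadows, node numbers (nn = TRUE),
# and the number of observations in each node (extra = 1)
rpart.plot(tree, box.palette = "RdBu", shadow.col = "gray", nn = TRUE, cex = 1, extra = 1)

We can use cross-validation to determine the optimal value of cp.

# 100 candidate cp values from 10^-3 to 10^3; values of 1 or more allow no splits
cp_vals = 10^seq(-3, 3, length = 100)
set.seed(11162020)
# Reuses the repeated 5-fold cross-validation settings (control) defined above
AirBnB_Tree <- train(price ~ ., data = Train_sc, method = "rpart",
                     trControl = control, tuneGrid = expand.grid(cp = cp_vals))
AirBnB_Tree$bestTune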
##             cp
## 13 0.005336699
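train() with method = "rpart" evaluates each candidate cp by resampling. As a rough illustration of what that involves, the sketch below runs a single (non-repeated) 5-fold cross-validation by hand with rpart(). It uses the built-in iris data and a small hypothetical grid of cp values.

This differs from caret in several ways:
- Fold assignment is different.
- The folds are not repeated.
- Squared errors are pooled over all held-out rows rather than averaged per fold.

The numbers are therefore not comparable to those above.

```r
# Minimal sketch: choose rpart's cp by 5-fold cross-validation (single repeat).
# iris and the cp grid are illustrative assumptions.
library(rpart)

set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(iris)))   # random fold label for each row
cp_grid <- c(0.001, 0.005, 0.01, 0.05, 0.1)

cv_rmspe <- sapply(cp_grid, function(cp) {
  sq_err <- numeric(0)
  for (f in 1:k) {
    train_f <- iris[folds != f, ]                    # fit on the other k - 1 folds
    test_f  <- iris[folds == f, ]                    # predict the held-out fold
    fit  <- rpart(Sepal.Length ~ ., data = train_f,
                  control = rpart.control(cp = cp))
    pred <- predict(fit, newdata = test_f)
    sq_err <- c(sq_err, (test_f$Sepal.Length - pred)^2)
  }
  sqrt(mean(sq_err))   # RMSPE pooled over all held-out observations
})

results <- data.frame(cp = cp_grid, RMSPE = cv_rmspe)
print(results)
best_cp <- results$cp[which.min(results$RMSPE)]
print(best_cp)
```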