Chapter 9 Advanced Regression and Nonparametric Approaches
9.1 Ridge and Lasso Regression
The lm.ridge() command, in the MASS package, can be used to fit a ridge regression model.
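Recall that ridge regression chooses the coefficient estimates to minimize the usual sum of squared residuals plus a penalty on the size of the coefficients:

\[
\hat{\beta}^{ridge} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 + \lambda \sum_{j=1}^{p}\beta_j^2 \right\}.
\]

Larger values of \(\lambda\) shrink the coefficients more strongly toward zero, while \(\lambda = 0\) recovers ordinary least squares.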
We first need to standardize each quantitative variable. This is done using the scale() command in R.
library(dplyr)    # for mutate_if() and the pipe

Train_sc <- Train %>% mutate_if(is.numeric, scale)
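As a quick sanity check, each standardized column should now have mean 0 and standard deviation 1. The sketch below assumes price is one of the numeric columns in Train_sc:

round(mean(Train_sc$price), 10)   # should be essentially 0
sd(Train_sc$price)                # should be 1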
9.1.1 Fitting a Ridge Regression Model
We can perform ridge regression using the lm.ridge() command in the MASS package.
library(MASS)
M_Ridge1 <- lm.ridge(price ~ ., data = Train_sc, lambda = 1)
head(M_Ridge1$coef)
## id property_typeCondominium property_typeHouse
## 0.036320094 -0.013032610 -0.006899487
## property_typeTownhouse property_typeOther room_typePrivate room
## -0.018020782 0.058907892 -0.288091484
9.1.2 Cross-Validation with Ridge
We perform cross-validation to determine the optimal value of \(\lambda\). The command 10^seq(-3, 3, length = 100) defines 100 candidate values of \(\lambda\), evenly spaced on the log scale between 0.001 and 1000, which will be tested in cross-validation.
This requires the glmnet package, along with the caret package for the trainControl() and train() commands.
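As an aside, glmnet can also run the cross-validation itself through cv.glmnet(). A minimal sketch, assuming Train_sc is built as above (the object names here are illustrative):

# Build the predictor matrix and response that glmnet expects
x <- model.matrix(price ~ ., data = Train_sc)[, -1]   # drop the intercept column
y <- Train_sc$price
cv_fit <- cv.glmnet(x, y, alpha = 0, lambda = 10^seq(-3, 3, length = 100))  # alpha = 0 is ridge
cv_fit$lambda.min   # lambda with the smallest cross-validated error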
library(glmnet)
library(caret)    # provides trainControl() and train()

control <- trainControl("repeatedcv", number = 5, repeats = 5)
l_vals <- 10^seq(-3, 3, length = 100)
set.seed(11162020)
AirBnB_ridge <- train(price ~ ., data = Train_sc, method = "glmnet", trControl = control,
                      tuneGrid = expand.grid(alpha = 0, lambda = l_vals))
Identify the optimal \(\lambda\).
AirBnB_ridge$bestTune$lambda
## [1] 0.4641589
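To inspect the fitted coefficients at this value of \(\lambda\), we can query the glmnet fit that caret stores in finalModel (a sketch; the result is a sparse coefficient matrix):

coef(AirBnB_ridge$finalModel, s = AirBnB_ridge$bestTune$lambda)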
Plot of RMSPE for each value of \(\lambda\).
lambda <- AirBnB_ridge$results$lambda
RMSPE <- AirBnB_ridge$results$RMSE
ggplot(data = data.frame(lambda, RMSPE), aes(x = lambda, y = RMSPE)) +
  geom_line() + xlim(c(0, 2)) + ylim(c(0.75, 0.82)) +
  ggtitle("Ridge Regression Cross Validation Results")
9.1.3 Cross-Validation with Lasso Regression
For lasso regression, set alpha=1.
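The alpha argument controls the mix between the two penalties: glmnet fits the elastic net, whose penalty term is

\[
\lambda \left( \alpha \sum_{j=1}^{p} |\beta_j| + \frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 \right),
\]

so alpha=0 yields ridge regression, as above, and alpha=1 yields the lasso.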
control <- trainControl("repeatedcv", number = 5, repeats = 5)
l_vals <- 10^seq(-3, 3, length = 100)
set.seed(11162020)
AirBnB_lasso <- train(price ~ ., data = Train_sc, method = "glmnet", trControl = control,
                      tuneGrid = expand.grid(alpha = 1, lambda = l_vals))
Identify the optimal \(\lambda\).
AirBnB_lasso$bestTune$lambda
## [1] 0.04977024
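Unlike ridge, the lasso shrinks some coefficients exactly to zero, effectively dropping those variables from the model. A quick sketch to count how many are dropped at the optimal \(\lambda\) (lasso_coefs is an illustrative name):

lasso_coefs <- coef(AirBnB_lasso$finalModel, s = AirBnB_lasso$bestTune$lambda)
sum(lasso_coefs == 0)   # number of coefficients set exactly to zero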
Plot of RMSPE for each value of \(\lambda\).
lambda <- AirBnB_lasso$results$lambda
RMSPE <- AirBnB_lasso$results$RMSE
ggplot(data = data.frame(lambda, RMSPE), aes(x = lambda, y = RMSPE)) +
  geom_line() + xlim(c(0, 0.2)) + ylim(c(0.75, 0.82)) +
  ggtitle("Lasso Regression Cross Validation Results")
9.2 Decision Trees
We use the rpart package to grow trees, and the rpart.plot package to visualize them.
library(rpart)
library(rpart.plot)
The cp argument is a complexity parameter that determines the depth of the tree: a split is attempted only if it improves the overall fit by at least a factor of cp, so the smaller the value of cp, the deeper the tree.
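To see this, a small sketch comparing the number of nodes in trees grown at two values of cp (each row of $frame corresponds to one node; the exact counts depend on the data):

nrow(rpart(price ~ ., data = Train, cp = 0.1)$frame)    # shallow tree, few nodes
nrow(rpart(price ~ ., data = Train, cp = 0.001)$frame)  # much deeper tree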
9.2.1 Decision Tree Example
tree <- rpart(price ~ ., data = Train, cp = 0.02)
rpart.plot(tree, box.palette = "RdBu", shadow.col = "gray", nn = TRUE, cex = 1, extra = 1)
We can use cross-validation to determine the optimal value of cp.
cp_vals <- 10^seq(-3, 3, length = 100)

set.seed(11162020)
AirBnB_Tree <- train(price ~ ., data = Train_sc, method = "rpart", trControl = control,
                     tuneGrid = expand.grid(cp = cp_vals))
AirBnB_Tree$bestTune
## cp
## 13 0.005336699
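We can then refit the tree at this cross-validated value of cp and plot it as before (tree_cv is an illustrative name):

tree_cv <- rpart(price ~ ., data = Train_sc, cp = AirBnB_Tree$bestTune$cp)
rpart.plot(tree_cv, box.palette = "RdBu", shadow.col = "gray", nn = TRUE)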