10 Multivariate Adaptive Regression Splines (MARS)
Packages for this section
# Install each package only if it is not already available
if(!require(earth)){install.packages("earth")}
if(!require(caret)){install.packages("caret")}
if(!require(AmesHousing)){install.packages("AmesHousing")}
In previous classes we reviewed extensions of linear regression (nls, polynomial regression, among others).
There are other variations such as Ridge regression, LASSO and Elastic Net (some of them are covered in the Machine Learning module).
10.1 Introduction
In statistics, MARS is a form of regression analysis introduced by Jerome H. Friedman in 1991.
MARS is a nonparametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions between variables.
The term MARS is a trademark licensed to Salford Systems.
To avoid infringing that trademark, open implementations of MARS are usually called Earth (the earth package in R, for example).
10.1.1 Why use MARS models?
MARS suits users who want results that read like a traditional regression while still capturing the nonlinearities and interactions that matter.
MARS can reveal important patterns in the data that other techniques often miss.
MARS builds its model by joining pieces of straight lines, each with its own slope.
This lets the model approximate a wide range of patterns in the data.
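The piecewise-linear idea can be sketched with hinge functions, the h(...) terms that appear later in the earth output. The knot and coefficients below are hypothetical values chosen only for illustration; they do not come from any fitted model.

```r
# Hinge function used by MARS basis terms: h(z) = max(0, z)
h <- function(z) pmax(0, z)

# Hypothetical knot and coefficients, chosen only for illustration
knot <- 5
beta_right <- 1   # coefficient of the right hinge h(x - knot)
beta_left <- -2   # coefficient of the left hinge h(knot - x)

x <- 0:10
# Two mirrored hinges joined at the knot give a continuous broken line
y_hat <- 3 + beta_right * h(x - knot) + beta_left * h(knot - x)
print(data.frame(x, y_hat))
```

Each side of the knot has its own slope, and the two pieces meet at the knot, so the fitted curve stays continuous.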
It can be used with both quantitative and qualitative response variables.
MARS performs the following steps (all automatically and quickly):
- variable selection.
- variable transformation.
- interaction detection.
- model testing (validation).
Areas where the technique has proven successful:
- Forecasting electricity demand for power generation companies.
- Relating customer satisfaction scores to the technical specifications of a product.
- Modeling in geographic information systems.
MARS is a very versatile regression technique and a necessary tool in our Data Analytics toolbox.
10.2 Example 1
We load the data:
library(earth)
load("~/Documents/Consultorias&Cursos/DataLectures/banckfull.RData")
We build the model from the data:
mars <- earth(y~age+job+marital+education+default+balance+housing+
              loan+contact+day+month+duration+campaign+pdays+previous+poutcome,
              data=bankfull,pmethod="backward",nprune=20, nfold=10)
Note the arguments used in the function call:
- pmethod: the pruning method applied to the model terms. The options include backward, forward, cv (which requires nfold) and exhaustive.
- nprune: the maximum number of basis functions (terms) kept in the pruned model.
In summary, setting up the model requires three elements:
- Define the model (as in any regression).
- Choose the pruning method (pmethod).
- Set the number of basis functions (nprune) and the degree of interaction (degree).
Let us look at the summary:
summary(mars,digits=3)
## Call: earth(formula=y~age+job+marital+education+default+balance+housin...),
## data=bankfull, pmethod="backward", nprune=20, nfold=10)
##
## coefficients
## (Intercept) 0.7775
## housingyes -0.0408
## loanyes -0.0294
## contactunknown -0.0713
## monthdec 0.1876
## monthjun 0.0519
## monthmar 0.3301
## monthoct 0.1916
## monthsep 0.1789
## poutcomesuccess 0.3809
## h(age-27) 0.0072
## h(54-age) 0.0087
## h(duration-375) 0.0003
## h(1080-duration) -0.0004
## h(duration-1080) -0.0004
## h(2-campaign) 0.0268
## h(pdays-53) -0.0020
## h(349-pdays) -0.0016
## h(pdays-349) 0.0061
## h(pdays-425) -0.0044
##
## Selected 20 of 22 terms, and 13 of 42 predictors (nprune=20)
## Termination condition: RSq changed by less than 0.001 at 22 terms
## Importance: duration, poutcomesuccess, monthmar, housingyes, monthoct, ...
## Number of terms at each degree of interaction: 1 19 (additive model)
## GCV 0.0707 RSS 3192 GRSq 0.315 RSq 0.316 CVRSq 0.314
##
## Note: the cross-validation sd's below are standard deviations across folds
##
## Cross validation: nterms 22.10 sd 1.37 nvars 13.90 sd 1.60
##
## CVRSq sd ClassRate sd MaxErr sd
## 0.314 0.015 0.901 0.004 -1.32 1.16
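To read the h(...) rows of the coefficient table, it helps to evaluate them by hand. The sketch below uses the two rounded age coefficients printed above and computes the contribution of age alone to the fitted value; all other terms are ignored, and the example ages are arbitrary.

```r
# Hinge function: h(z) = max(0, z)
h <- function(z) pmax(0, z)

# Rounded coefficients copied from the summary output above
age_effect <- function(age) {
  0.0072 * h(age - 27) + 0.0087 * h(54 - age)
}

# Arbitrary ages chosen to cover both knots (27 and 54)
ages <- c(20, 27, 40, 54, 70)
print(data.frame(age = ages, contribution = age_effect(ages)))
```

With these coefficients the age contribution is lowest at the knot at 54 and increases toward both younger and older ages, a shape that a single linear term could not represent.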
The resulting plot:
plotd(mars)
GCV (generalized cross-validation) is defined as
\[ GCV = \frac{RSS}{N \times \left(1 - \text{EffectiveParams}/N\right)^2} \]
where RSS is the residual sum of squares measured on the training data and N is the number of observations. The effective number of parameters is
\[ \text{EffectiveParams} = \text{NumTerms} + \text{Penalty} \times (\text{NumTerms} - 1)/2 \]
where NumTerms is the number of terms in the MARS model. The penalty is usually about 2 or 3, and the user can choose it.
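As a check, the GCV reported for this model can be recomputed by hand. The values below are read from the output printed in this section (RSS and the number of selected terms from the summary; the number of observations and penalty = 2 from the structure of the fitted object). RSS is rounded in the printout, so the result is only approximate.

```r
# Values read from the printed output for this model
rss <- 3192      # residual sum of squares (rounded in the printout)
n <- 45211       # number of observations
n_terms <- 20    # selected terms, including the intercept
penalty <- 2     # penalty stored in the fitted object

# Effective number of parameters and GCV, following the formulas above
effective_params <- n_terms + penalty * (n_terms - 1) / 2
gcv <- rss / (n * (1 - effective_params / n)^2)

print(effective_params)
print(round(gcv, 4))
```

The result agrees with the GCV of 0.0707 shown in the summary. Because N is large compared with the effective number of parameters, GCV is only slightly larger than RSS/N here.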
10.2.1 Output
The result is an earth object, which holds a great deal of information (see help(earth.object)).
str(mars)
## List of 39
## $ rss : num 3192
## $ rsq : num 0.316
## $ gcv : num 0.0707
## $ grsq : num 0.315
## $ bx : num [1:45211, 1:20] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:20] "(Intercept)" "h(duration-1080)" "h(1080-duration)" "poutcomesuccess" ...
## $ dirs : num [1:22, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:22] "(Intercept)" "h(duration-1080)" "h(1080-duration)" "poutcomesuccess" ...
## .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## $ cuts : num [1:22, 1:42] 0 0 0 0 0 0 0 0 0 54 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:22] "(Intercept)" "h(duration-1080)" "h(1080-duration)" "poutcomesuccess" ...
## .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## $ selected.terms : num [1:20] 1 2 3 4 5 6 7 8 9 11 ...
## $ prune.terms : num [1:22, 1:22] 1 1 1 1 1 1 1 1 1 1 ...
## $ fitted.values : num [1:45211, 1] 0.0261 -0.0314 -0.074 -0.0597 0.0452 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr "yes"
## $ residuals : num [1:45211, 1] -0.0261 0.0314 0.074 0.0597 -0.0452 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr "yes"
## $ coefficients : num [1:20, 1] 0.777457 -0.000382 -0.000402 0.380944 0.330111 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:20] "(Intercept)" "h(duration-1080)" "h(1080-duration)" "poutcomesuccess" ...
## .. ..$ : chr "yes"
## $ rss.per.response : num 3192
## $ rsq.per.response : num 0.316
## $ gcv.per.response : num 0.0707
## $ grsq.per.response : num 0.315
## $ rss.per.subset : num [1:22] 4670 3880 3497 3433 3378 ...
## $ gcv.per.subset : num [1:22] 0.1033 0.0858 0.0774 0.076 0.0747 ...
## $ leverages : num [1:45211] 0.000243 0.000165 0.000299 0.000194 0.00025 ...
## $ pmethod : chr "backward"
## $ nprune : num 20
## $ penalty : num 2
## $ nk : num 85
## $ thresh : num 0.001
## $ termcond : int 4
## $ weights : NULL
## $ call : language earth(formula = y ~ age + job + marital + education + default + balance + housing + loan + contact + day + m| __truncated__ ...
## $ namesx : chr [1:16] "age" "job" "marital" "education" ...
## $ modvars : num [1:16, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:16] "age" "job" "marital" "education" ...
## .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## $ terms :Classes 'terms', 'formula' language y ~ age + job + marital + education + default + balance + housing + loan + contact + day + month + duration | __truncated__
## .. ..- attr(*, "variables")= language list(y, age, job, marital, education, default, balance, housing, loan, contact, day, month, duration, campai| __truncated__
## .. ..- attr(*, "factors")= int [1:17, 1:16] 0 1 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:17] "y" "age" "job" "marital" ...
## .. .. .. ..$ : chr [1:16] "age" "job" "marital" "education" ...
## .. ..- attr(*, "term.labels")= chr [1:16] "age" "job" "marital" "education" ...
## .. ..- attr(*, "order")= int [1:16] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(y, age, job, marital, education, default, balance, housing, loan, contact, day, month, duration, campai| __truncated__
## .. ..- attr(*, "dataClasses")= Named chr [1:17] "factor" "numeric" "factor" "factor" ...
## .. .. ..- attr(*, "names")= chr [1:17] "y" "age" "job" "marital" ...
## $ xlevels :List of 9
## ..$ job : chr [1:12] "admin." "blue-collar" "entrepreneur" "housemaid" ...
## ..$ marital : chr [1:3] "divorced" "married" "single"
## ..$ education: chr [1:4] "primary" "secondary" "tertiary" "unknown"
## ..$ default : chr [1:2] "no" "yes"
## ..$ housing : chr [1:2] "no" "yes"
## ..$ loan : chr [1:2] "no" "yes"
## ..$ contact : chr [1:3] "cellular" "telephone" "unknown"
## ..$ month : chr [1:12] "apr" "aug" "dec" "feb" ...
## ..$ poutcome : chr [1:4] "failure" "other" "success" "unknown"
## $ levels : chr [1:2] "no" "yes"
## $ cv.list :List of 10
## ..$ fold1 :List of 29
## .. ..$ rss : num 2863
## .. ..$ rsq : num 0.319
## .. ..$ gcv : num 0.0705
## .. ..$ grsq : num 0.317
## .. ..$ dirs : num [1:22, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1064)" "h(1064-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:22, 1:42] 0 0 0 0 0 0 0 0 0 55 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1064)" "h(1064-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:21] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40670, 1] 0.0185 -0.0314 -0.0793 -0.0578 0.0394 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:21, 1] 0.416972 -0.000379 -0.000395 0.382794 -0.040878 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:21] "(Intercept)" "h(duration-1064)" "h(1064-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2863
## .. ..$ rsq.per.response : num 0.319
## .. ..$ gcv.per.response : num 0.0705
## .. ..$ grsq.per.response: num 0.317
## .. ..$ rss.per.subset : num [1:22] 4203 3485 3138 3082 3035 ...
## .. ..$ gcv.per.subset : num [1:22] 0.1033 0.0857 0.0772 0.0758 0.0747 ...
## .. ..$ leverages : num [1:40670] 0.000271 0.000181 0.00034 0.000214 0.000286 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 1
## .. ..- attr(*, "class")= chr "earth"
## ..$ fold2 :List of 29
## .. ..$ rss : num 2868
## .. ..$ rsq : num 0.318
## .. ..$ gcv : num 0.0706
## .. ..$ grsq : num 0.316
## .. ..$ dirs : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1123)" "h(1123-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 52 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1123)" "h(1123-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:22] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40709, 1] 0.0437 -0.044 -0.0581 -0.0548 -0.0334 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:22, 1] 0.362388 -0.00038 -0.000397 0.400803 -0.042748 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1123)" "h(1123-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2868
## .. ..$ rsq.per.response : num 0.318
## .. ..$ gcv.per.response : num 0.0706
## .. ..$ grsq.per.response: num 0.316
## .. ..$ rss.per.subset : num [1:23] 4203 3490 3132 3074 3025 ...
## .. ..$ gcv.per.subset : num [1:23] 0.1033 0.0857 0.0769 0.0755 0.0743 ...
## .. ..$ leverages : num [1:40709] 0.000337 0.000192 0.000225 0.000243 0.000182 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 2
## .. ..- attr(*, "class")= chr "earth"
## ..$ fold3 :List of 29
## .. ..$ rss : num 2870
## .. ..$ rsq : num 0.317
## .. ..$ gcv : num 0.0707
## .. ..$ grsq : num 0.316
## .. ..$ dirs : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1076)" "h(1076-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 53 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1076)" "h(1076-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:22] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40685, 1] -0.0529 0.0272 -0.035 0.0527 -0.065 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:22, 1] 0.368647 -0.000394 -0.000393 0.380727 0.347012 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1076)" "h(1076-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2870
## .. ..$ rsq.per.response : num 0.317
## .. ..$ gcv.per.response : num 0.0707
## .. ..$ grsq.per.response: num 0.316
## .. ..$ rss.per.subset : num [1:23] 4203 3491 3153 3090 3041 ...
## .. ..$ gcv.per.subset : num [1:23] 0.1033 0.0858 0.0775 0.076 0.0748 ...
## .. ..$ leverages : num [1:40685] 0.000239 0.000293 0.000182 0.000258 0.000316 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 3
## .. ..- attr(*, "class")= chr "earth"
## ..$ fold4 :List of 29
## .. ..$ rss : num 2857
## .. ..$ rsq : num 0.32
## .. ..$ gcv : num 0.0704
## .. ..$ grsq : num 0.319
## .. ..$ dirs : num [1:27, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:27] "(Intercept)" "h(duration-1063)" "h(1063-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:27, 1:42] 0 0 0 0 0 0 0 0 0 54 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:27] "(Intercept)" "h(duration-1063)" "h(1063-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:25] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40670, 1] 0.0229 -0.0379 -0.0549 -0.0655 0.044 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:25, 1] 1.012913 -0.000343 -0.000406 0.369763 -0.049337 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:25] "(Intercept)" "h(duration-1063)" "h(1063-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2857
## .. ..$ rsq.per.response : num 0.32
## .. ..$ gcv.per.response : num 0.0704
## .. ..$ grsq.per.response: num 0.319
## .. ..$ rss.per.subset : num [1:27] 4203 3496 3158 3100 3051 ...
## .. ..$ gcv.per.subset : num [1:27] 0.1033 0.086 0.0777 0.0763 0.075 ...
## .. ..$ leverages : num [1:40670] 0.000279 0.000192 0.000228 0.000228 0.000305 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 4
## .. ..- attr(*, "class")= chr "earth"
## ..$ fold5 :List of 29
## .. ..$ rss : num 2855
## .. ..$ rsq : num 0.321
## .. ..$ gcv : num 0.0703
## .. ..$ grsq : num 0.319
## .. ..$ dirs : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1119)" "h(1119-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 55 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1119)" "h(1119-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:22] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40722, 1] 0.0246 -0.0309 -0.0817 -0.0568 0.0356 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:22, 1] 0.828384 -0.000402 -0.000398 0.378355 -0.039122 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1119)" "h(1119-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2855
## .. ..$ rsq.per.response : num 0.321
## .. ..$ gcv.per.response : num 0.0703
## .. ..$ grsq.per.response: num 0.319
## .. ..$ rss.per.subset : num [1:23] 4204 3486 3133 3077 3030 ...
## .. ..$ gcv.per.subset : num [1:23] 0.1032 0.0856 0.077 0.0756 0.0744 ...
## .. ..$ leverages : num [1:40722] 0.000274 0.000181 0.000349 0.000213 0.000298 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 5
## .. ..- attr(*, "class")= chr "earth"
## ..$ fold6 :List of 29
## .. ..$ rss : num 2866
## .. ..$ rsq : num 0.318
## .. ..$ gcv : num 0.0705
## .. ..$ grsq : num 0.317
## .. ..$ dirs : num [1:25, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:25] "(Intercept)" "h(duration-1080)" "h(1080-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:25, 1:42] 0 0 0 0 0 0 0 0 0 53 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:25] "(Intercept)" "h(duration-1080)" "h(1080-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:24] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40730, 1] 0.0455 -0.0401 -0.0562 -0.0481 0.0286 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:24, 1] 0.492045 -0.000384 -0.000388 0.377399 0.32602 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:24] "(Intercept)" "h(duration-1080)" "h(1080-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2866
## .. ..$ rsq.per.response : num 0.318
## .. ..$ gcv.per.response : num 0.0705
## .. ..$ grsq.per.response: num 0.317
## .. ..$ rss.per.subset : num [1:25] 4204 3505 3159 3096 3049 ...
## .. ..$ gcv.per.subset : num [1:25] 0.1032 0.0861 0.0776 0.076 0.0749 ...
## .. ..$ leverages : num [1:40730] 0.000335 0.000192 0.000225 0.000243 0.000294 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 6
## .. ..- attr(*, "class")= chr "earth"
## ..$ fold7 :List of 29
## .. ..$ rss : num 2870
## .. ..$ rsq : num 0.317
## .. ..$ gcv : num 0.0707
## .. ..$ grsq : num 0.316
## .. ..$ dirs : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1130)" "h(1130-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 54 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1130)" "h(1130-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:21] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40663, 1] 0.0371 -0.0437 -0.0607 -0.0548 0.0282 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:21, 1] 0.43009 -0.00039 -0.0004 0.37795 0.33266 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:21] "(Intercept)" "h(duration-1130)" "h(1130-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2870
## .. ..$ rsq.per.response : num 0.317
## .. ..$ gcv.per.response : num 0.0707
## .. ..$ grsq.per.response: num 0.316
## .. ..$ rss.per.subset : num [1:23] 4203 3493 3154 3095 3044 ...
## .. ..$ gcv.per.subset : num [1:23] 0.1034 0.0859 0.0776 0.0761 0.0749 ...
## .. ..$ leverages : num [1:40663] 0.000304 0.000187 0.000221 0.000225 0.000292 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 7
## .. ..- attr(*, "class")= chr "earth"
## ..$ fold8 :List of 29
## .. ..$ rss : num 2876
## .. ..$ rsq : num 0.316
## .. ..$ gcv : num 0.0708
## .. ..$ grsq : num 0.314
## .. ..$ dirs : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1123)" "h(1123-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:23, 1:42] 0 0 0 0 0 0 0 0 0 54 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:23] "(Intercept)" "h(duration-1123)" "h(1123-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:22] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40696, 1] 0.0391 -0.0427 -0.0573 0.0323 -0.0322 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:22, 1] 0.966117 -0.000395 -0.000399 0.375818 0.328871 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1123)" "h(1123-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2876
## .. ..$ rsq.per.response : num 0.316
## .. ..$ gcv.per.response : num 0.0708
## .. ..$ grsq.per.response: num 0.314
## .. ..$ rss.per.subset : num [1:23] 4203 3488 3150 3092 3041 ...
## .. ..$ gcv.per.subset : num [1:23] 0.1033 0.0857 0.0774 0.076 0.0747 ...
## .. ..$ leverages : num [1:40696] 0.000333 0.000189 0.000219 0.000288 0.000179 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 8
## .. ..- attr(*, "class")= chr "earth"
## ..$ fold9 :List of 29
## .. ..$ rss : num 2861
## .. ..$ rsq : num 0.319
## .. ..$ gcv : num 0.0705
## .. ..$ grsq : num 0.318
## .. ..$ dirs : num [1:22, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1091)" "h(1091-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:22, 1:42] 0 0 0 0 0 0 0 0 0 53 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1091)" "h(1091-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:21] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40681, 1] 0.0287 -0.0326 -0.079 -0.0598 0.0367 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:21, 1] 0.411886 -0.00039 -0.000403 0.383504 0.32747 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:21] "(Intercept)" "h(duration-1091)" "h(1091-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2861
## .. ..$ rsq.per.response : num 0.319
## .. ..$ gcv.per.response : num 0.0705
## .. ..$ grsq.per.response: num 0.318
## .. ..$ rss.per.subset : num [1:22] 4203 3483 3133 3076 3029 ...
## .. ..$ gcv.per.subset : num [1:22] 0.1033 0.0856 0.077 0.0756 0.0745 ...
## .. ..$ leverages : num [1:40681] 0.000272 0.000184 0.00034 0.00022 0.000286 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 9
## .. ..- attr(*, "class")= chr "earth"
## ..$ fold10:List of 29
## .. ..$ rss : num 2879
## .. ..$ rsq : num 0.315
## .. ..$ gcv : num 0.0709
## .. ..$ grsq : num 0.314
## .. ..$ dirs : num [1:22, 1:42] 0 0 0 0 0 0 0 0 0 1 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1070)" "h(1070-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ cuts : num [1:22, 1:42] 0 0 0 0 0 0 0 0 0 54 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:22] "(Intercept)" "h(duration-1070)" "h(1070-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ selected.terms : num [1:21] 1 2 3 4 5 6 7 8 9 10 ...
## .. ..$ fitted.values : num [1:40673, 1] 0.028 -0.0304 -0.0765 -0.0585 0.0415 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : NULL
## .. .. .. ..$ : chr "yes"
## .. ..$ coefficients : num [1:21, 1] 0.959857 -0.000361 -0.000413 0.375263 0.327172 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:21] "(Intercept)" "h(duration-1070)" "h(1070-duration)" "poutcomesuccess" ...
## .. .. .. ..$ : chr "yes"
## .. ..$ rss.per.response : num 2879
## .. ..$ rsq.per.response : num 0.315
## .. ..$ gcv.per.response : num 0.0709
## .. ..$ grsq.per.response: num 0.314
## .. ..$ rss.per.subset : num [1:22] 4204 3508 3166 3107 3056 ...
## .. ..$ gcv.per.subset : num [1:22] 0.1034 0.0863 0.0779 0.0764 0.0752 ...
## .. ..$ leverages : num [1:40673] 0.000269 0.000184 0.000334 0.000217 0.000281 ...
## .. ..$ pmethod : chr "backward"
## .. ..$ nprune : NULL
## .. ..$ penalty : num 2
## .. ..$ nk : num 85
## .. ..$ thresh : num 0.001
## .. ..$ termcond : int 4
## .. ..$ weights : NULL
## .. ..$ call : language earth(x = infold.x, y = infold.y, weights = infold.weights, wp = wp, subset = subset, pmethod = if (pmethod | __truncated__ ...
## .. ..$ namesx : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ modvars : num [1:42, 1:42] 1 0 0 0 0 0 0 0 0 0 ...
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. .. .. ..$ : chr [1:42] "age" "jobblue-collar" "jobentrepreneur" "jobhousemaid" ...
## .. ..$ levels : num [1:2] 0 1
## .. ..$ icross : int 1
## .. ..$ ifold : int 10
## .. ..- attr(*, "class")= chr "earth"
## $ cv.nterms.selected.by.gcv: Named num [1:11] 21 22 22 25 22 24 21 22 21 21 ...
## ..- attr(*, "names")= chr [1:11] "fold1" "fold2" "fold3" "fold4" ...
## $ cv.nvars.selected.by.gcv : Named num [1:11] 13 13 13 18 13 15 13 14 13 14 ...
## ..- attr(*, "names")= chr [1:11] "fold1" "fold2" "fold3" "fold4" ...
## $ cv.groups : int [1:45211, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "cross" "fold"
## $ cv.rsq.tab : num [1:11, 1:2] 0.302 0.305 0.314 0.334 0.294 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:11] "fold1" "fold2" "fold3" "fold4" ...
## .. ..$ : chr [1:2] "yes" "mean"
## $ cv.maxerr.tab : num [1:11, 1:2] -1.15 -1.32 -1.1 1.16 -1.08 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : chr [1:11] "fold1" "fold2" "fold3" "fold4" ...
## .. ..$ : chr [1:2] "yes" "max"
## $ cv.class.rate.tab : num [1:11, 1:2] 0.902 0.896 0.904 0.908 0.896 ...
## ..- attr(*, "dimnames")=List of 2
## .. ..$ : NULL
## .. ..$ : chr [1:2] "yes" "mean"
## - attr(*, "class")= chr "earth"
From this whole set of components, we will highlight three:
- Variable importance
- Basis functions (the resulting model)
- Curves and surfaces (the contribution of each predictor)
Variable importance
library(caret)
varImp( mars )
## Overall
## duration 100.000000
## poutcomesuccess 68.109084
## monthmar 45.171762
## housingyes 40.087272
## monthoct 35.114270
## contactunknown 31.401977
## monthsep 27.823303
## age 24.185852
## monthjun 21.090675
## pdays 16.010587
## monthdec 14.461722
## campaign 12.631608
## loanyes 5.779968
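As a complement to `varImp()`, the earth package provides its own importance function, `evimp()`, which reports several criteria side by side. The sketch below is only illustrative: it fits a small model on the built-in `trees` data (an assumption made so the block runs on its own, not the bank data used above) and prints the importance table.

```r
library(earth)

# Illustrative model on a built-in data set (not the bank marketing data)
fit_demo <- earth(Volume ~ ., data = trees)

# evimp() ranks predictors by three criteria:
#   nsubsets: number of pruning-pass subsets that include the variable
#   gcv, rss: scaled decrease in GCV / RSS attributed to the variable
ev <- evimp(fit_demo)
print(ev)
```

Calling `evimp(mars)` on the model fitted in this section would give the same kind of table for the bank data.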
Basis functions
mars$coefficients
## yes
## (Intercept) 0.7774569240
## h(duration-1080) -0.0003818948
## h(1080-duration) -0.0004020631
## poutcomesuccess 0.3809444003
## monthmar 0.3301108826
## housingyes -0.0407997273
## monthoct 0.1916481210
## contactunknown -0.0712999709
## monthsep 0.1788583816
## h(54-age) 0.0087017318
## h(duration-375) 0.0003026388
## monthjun 0.0518693660
## h(2-campaign) 0.0268377535
## monthdec 0.1876019796
## h(pdays-349) 0.0061454449
## h(349-pdays) -0.0015968138
## h(age-27) 0.0071639964
## h(pdays-53) -0.0020353430
## h(pdays-425) -0.0043865936
## loanyes -0.0293712807
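Each term of the form `h(...)` is a hinge function, `h(x) = max(0, x)`, so the model is piecewise linear with slope changes at the knots. As a minimal sketch, the code below copies the three `duration` coefficients from the output above and computes the contribution of `duration` to the fitted value, with all other terms held fixed. The helper names `h` and `duration_effect` are introduced here for illustration only.

```r
# Hinge function used by MARS basis terms: h(x) = max(0, x)
h <- function(x) pmax(0, x)

# Contribution of `duration` to the fitted value, using the three duration
# terms and the coefficients printed above (knots at 375 and 1080).
duration_effect <- function(duration) {
  -0.0003818948 * h(duration - 1080) +
  -0.0004020631 * h(1080 - duration) +
   0.0003026388 * h(duration - 375)
}

# Evaluate at a few example values, including both knots
d <- c(0, 180, 375, 1080, 2000)
data.frame(duration = d, effect = duration_effect(d))
```

Adding up the active coefficients gives the slope on each segment: about 0.00040 per unit of `duration` below 375, about 0.00070 between 375 and 1080, and about -0.00008 above 1080. The contribution therefore rises steadily up to the knot at 1080 and then flattens out with a slight decline.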
Curves and surfaces
plotmo(mars, all1 = TRUE)
## plotmo grid: age job marital education default balance housing loan
## 39 blue-collar married secondary no 448 yes no
## contact day month duration campaign pdays previous poutcome
## cellular 16 may 180 2 -1 0 unknown
plot(mars$fitted.values~bankfull$duration)
10.2.2 Your turn
Using the ames_train data, fit a MARS model with the sale price, Sale_Price, as the dependent variable.
library(rsample)
# Create training (70%) and test (30%) sets for the AmesHousing::make_ames() data.
# Use set.seed for reproducibility
set.seed(123)
ames_split <- initial_split(AmesHousing::make_ames(), prop = .7, strata = "Sale_Price")
ames_train <- training(ames_split)
ames_test <- testing(ames_split)