Day 2 Exercises
Missing data
0.5 Loading libraries
library(dplyr) # data manipulation; part of the tidyverse
library(visdat) # exploratory visualisation of whole data frames, built on ggplot2
library(naniar) # tools for summarising and handling missing data
library(simputation) # model-based imputation of missing values
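0.6 Loading data
# Load the data
# stringsAsFactors = TRUE keeps text columns as factors on R >= 4.0, matching the <fct> columns printed below
mis_dataset <- read.csv(url("https://raw.githubusercontent.com/dataoptimal/posts/master/data%20cleaning%20with%20R%20and%20the%20tidyverse/telecom.csv"), stringsAsFactors = TRUE)
head(mis_dataset)    # first six rows
glimpse(mis_dataset) # column types and a preview of every column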
##   customerID MonthlyCharges TotalCharges    PaymentMethod Churn
## 1 7590-VHVEG          29.85        109.9 Electronic check   yes
## 2 5575-GNVDE          56.95           na     Mailed check   yes
## 3 3668-QPYBK             NA       108.15               --   yes
## 4 7795-CFOCW          42.30      1840.75    Bank transfer    no
## 5 9237-HQITU          70.70         <NA> Electronic check    no
## 6 9305-CDSKC            NaN        820.5               --   yes
## Rows: 10
## Columns: 5
## $ customerID     <fct> 7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOCW, 9237-HQ…
## $ MonthlyCharges <dbl> 29.85, 56.95, NA, 42.30, 70.70, NaN, 89.10, NA, 104.80,…
## $ TotalCharges   <fct> 109.9, na, 108.15, 1840.75, NA, 820.5, 1949.4, N/A, 304…
## $ PaymentMethod  <fct> Electronic check, Mailed check, --, Bank transfer, Elec…
## $ Churn          <fct> yes, yes, yes, no, no, yes, no, yes, no, no
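The output that follows shows the same data after a cleaning step: `TotalCharges` has become numeric and the ad hoc markers (`na`, `N/A`, `--`) have become `NA`. The code for that step is not shown, so the following is only a minimal sketch of one way to do it. It rebuilds the six rows printed above inline so that it runs offline; the names `raw`, `clean` and `missing_codes` are illustrative and not from the original.

```r
library(dplyr)

# The six rows printed by head(), rebuilt inline (factors, as in the output above)
raw <- data.frame(
  customerID     = c("7590-VHVEG", "5575-GNVDE", "3668-QPYBK",
                     "7795-CFOCW", "9237-HQITU", "9305-CDSKC"),
  MonthlyCharges = c(29.85, 56.95, NA, 42.30, 70.70, NaN),
  TotalCharges   = c("109.9", "na", "108.15", "1840.75", NA, "820.5"),
  PaymentMethod  = c("Electronic check", "Mailed check", "--",
                     "Bank transfer", "Electronic check", "--"),
  Churn          = c("yes", "yes", "yes", "no", "no", "yes"),
  stringsAsFactors = TRUE
)

# Strings that stand for "missing" in this file (assumed from the printed output)
missing_codes <- c("na", "N/A", "--")

clean <- raw %>%
  mutate(
    # Go through character first: as.numeric() on a factor returns level codes
    TotalCharges  = as.character(TotalCharges),
    TotalCharges  = as.numeric(replace(TotalCharges, TotalCharges %in% missing_codes, NA)),
    PaymentMethod = as.character(PaymentMethod),
    PaymentMethod = factor(replace(PaymentMethod, PaymentMethod %in% missing_codes, NA))
  )

head(clean)
```

Recoding the markers before calling `as.numeric()` avoids the "NAs introduced by coercion" warning and makes the list of accepted missing codes explicit.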
##   customerID MonthlyCharges TotalCharges    PaymentMethod Churn
## 1 7590-VHVEG          29.85       109.90 Electronic check   yes
## 2 5575-GNVDE          56.95           NA     Mailed check   yes
## 3 3668-QPYBK             NA       108.15             <NA>   yes
## 4 7795-CFOCW          42.30      1840.75    Bank transfer    no
## 5 9237-HQITU          70.70           NA Electronic check    no
## 6 9305-CDSKC            NaN       820.50             <NA>   yes
## Rows: 10
## Columns: 5
## $ customerID     <fct> 7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOCW, 9237-HQ…
## $ MonthlyCharges <dbl> 29.85, 56.95, NA, 42.30, 70.70, NaN, 89.10, NA, 104.80,…
## $ TotalCharges   <dbl> 109.90, NA, 108.15, 1840.75, NA, 820.50, 1949.40, NA, 3…
## $ PaymentMethod  <fct> Electronic check, Mailed check, NA, Bank transfer, Elec…
## $ Churn          <fct> yes, yes, yes, no, no, yes, no, yes, no, no
0.7 Graphical representation of missing data
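A minimal sketch of how missingness can be plotted with visdat, assuming the cleaned six-row sample is rebuilt inline (the object names `clean`, `p_types` and `p_miss` are illustrative). Whether the original exercise used exactly these functions is an assumption; `vis_dat()` and `vis_miss()` are the two standard entry points of the package.

```r
library(visdat)

# Cleaned six-row sample, rebuilt inline so the sketch runs offline
clean <- data.frame(
  customerID     = c("7590-VHVEG", "5575-GNVDE", "3668-QPYBK",
                     "7795-CFOCW", "9237-HQITU", "9305-CDSKC"),
  MonthlyCharges = c(29.85, 56.95, NA, 42.30, 70.70, NaN),
  TotalCharges   = c(109.90, NA, 108.15, 1840.75, NA, 820.50),
  PaymentMethod  = c("Electronic check", "Mailed check", NA,
                     "Bank transfer", "Electronic check", NA),
  Churn          = c("yes", "yes", "yes", "no", "no", "yes")
)

# One tile per cell, coloured by column type, with missing cells in grey
p_types <- vis_dat(clean)

# Missing vs. present only, with the share of missing values per column
p_miss <- vis_miss(clean)

p_miss
```

Both functions return ggplot objects, so the usual ggplot2 layers (titles, themes) can be added to them.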
0.8 Missing cases per variable
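A sketch of how missing cases can be counted per variable with naniar, again on the six-row sample rebuilt inline (the object names are illustrative, and the choice of `miss_var_summary()` and `gg_miss_var()` is an assumption about what this section showed).

```r
library(naniar)

# Cleaned six-row sample, rebuilt inline so the sketch runs offline
clean <- data.frame(
  customerID     = c("7590-VHVEG", "5575-GNVDE", "3668-QPYBK",
                     "7795-CFOCW", "9237-HQITU", "9305-CDSKC"),
  MonthlyCharges = c(29.85, 56.95, NA, 42.30, 70.70, NaN),
  TotalCharges   = c(109.90, NA, 108.15, 1840.75, NA, 820.50),
  PaymentMethod  = c("Electronic check", "Mailed check", NA,
                     "Bank transfer", "Electronic check", NA),
  Churn          = c("yes", "yes", "yes", "no", "no", "yes")
)

# One row per variable with the count (n_miss) and percentage (pct_miss) of missing values
var_summary <- miss_var_summary(clean)
var_summary

# The same counts as a plot, one point per variable
p_var <- gg_miss_var(clean)
p_var
```

A base R cross-check that needs no packages is `colSums(is.na(clean))`.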
0.9 Simple imputation
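Simple (univariate) imputation fills every missing value of a column with one summary statistic of its observed values. A minimal sketch with dplyr, assuming the column names `TotalChargesImpMean` and `TotalChargesImpMedian` from the output in this section and the six-row sample rebuilt inline (`clean` and `imputed` are illustrative names):

```r
library(dplyr)

# TotalCharges for the six rows shown by head(), after cleaning
clean <- data.frame(
  TotalCharges = c(109.90, NA, 108.15, 1840.75, NA, 820.50)
)

imputed <- clean %>%
  mutate(
    # coalesce() keeps observed values and fills only the NAs
    TotalChargesImpMean   = coalesce(TotalCharges, mean(TotalCharges, na.rm = TRUE)),
    TotalChargesImpMedian = coalesce(TotalCharges, median(TotalCharges, na.rm = TRUE))
  )

imputed
```

On these six rows the fill values are 719.825 (mean) and 465.2 (median). The output printed in this section was computed on all 10 rows of the dataset, which is why it shows 1175.671 and 820.50 instead.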
##   TotalCharges TotalChargesImpMean TotalChargesImpMedian
## 1       109.90             109.900                109.90
## 2           NA            1175.671                820.50
## 3       108.15             108.150                108.15
## 4      1840.75            1840.750               1840.75
## 5           NA            1175.671                820.50
## 6       820.50             820.500                820.50
0.10 Multivariate imputation
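Multivariate imputation predicts each missing value from other columns of the same row, so different rows receive different fill values. The model used for the output in this section is not shown; the sketch below assumes a linear regression of `TotalCharges` on `MonthlyCharges` fitted with `impute_lm()` from simputation, on the six-row sample rebuilt inline (`clean` and `imputed` are illustrative names).

```r
library(simputation)

# The two numeric columns for the six rows shown by head(), after cleaning
clean <- data.frame(
  MonthlyCharges = c(29.85, 56.95, NA, 42.30, 70.70, NaN),
  TotalCharges   = c(109.90, NA, 108.15, 1840.75, NA, 820.50)
)

# Formula reads "impute TotalCharges using MonthlyCharges as predictor".
# Rows with both columns observed are used to fit the model; rows where
# TotalCharges is missing receive the model's prediction.
imputed <- impute_lm(clean, TotalCharges ~ MonthlyCharges)

imputed
```

Only two of these six rows have both columns observed, so the fitted line here is degenerate and the imputed numbers will not match the printed output, which comes from all 10 rows. A row whose predictor is also missing would stay `NA` under this approach.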
##   TotalCharges TotalChargesImpMean TotalChargesImpMedian
## 1     109.9000             109.900                109.90
## 2     828.0137            1175.671                820.50
## 3     108.1500             108.150                108.15
## 4    1840.7500            1840.750               1840.75
## 5    1748.1025            1175.671                820.50
## 6     820.5000             820.500                820.50