Day 2 Exercises


Missing data


0.5 Loading libraries

library(dplyr)       # data manipulation; included in the tidyverse
library(visdat)      # exploratory data visualization; builds its plots on ggplot2
library(naniar)      # tools for handling missing data
library(simputation) # imputation models for filling in missing data

0.6 Loading data

# Load the data
mis_dataset <- read.csv(url("https://raw.githubusercontent.com/dataoptimal/posts/master/data%20cleaning%20with%20R%20and%20the%20tidyverse/telecom.csv"))
head(mis_dataset)
##   customerID MonthlyCharges TotalCharges    PaymentMethod Churn
## 1 7590-VHVEG          29.85        109.9 Electronic check   yes
## 2 5575-GNVDE          56.95           na     Mailed check   yes
## 3 3668-QPYBK             NA       108.15               --   yes
## 4 7795-CFOCW          42.30      1840.75    Bank transfer    no
## 5 9237-HQITU          70.70         <NA> Electronic check    no
## 6 9305-CDSKC            NaN        820.5               --   yes
## Rows: 10
## Columns: 5
## $ customerID     <fct> 7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOCW, 9237-HQ…
## $ MonthlyCharges <dbl> 29.85, 56.95, NA, 42.30, 70.70, NaN, 89.10, NA, 104.80,…
## $ TotalCharges   <fct> 109.9, na, 108.15, 1840.75, NA, 820.5, 1949.4, N/A, 304…
## $ PaymentMethod  <fct> Electronic check, Mailed check, --, Bank transfer, Elec…
## $ Churn          <fct> yes, yes, yes, no, no, yes, no, yes, no, no
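
The raw output above mixes real `NA`/`NaN` values with placeholder strings (`na`, `N/A`, `--`), which is why `TotalCharges` is read as a factor instead of a number. The source does not show the cleaning step, so the following is only a minimal sketch of one way to recode those placeholders. The data frame `raw` is a hand-typed stand-in for the six rows printed by `head()`, and the names `raw`, `na_codes` and `clean` are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in: the six rows printed by head() above, typed by hand
raw <- data.frame(
  customerID     = c("7590-VHVEG", "5575-GNVDE", "3668-QPYBK",
                     "7795-CFOCW", "9237-HQITU", "9305-CDSKC"),
  MonthlyCharges = c(29.85, 56.95, NA, 42.30, 70.70, NaN),
  TotalCharges   = c("109.9", "na", "108.15", "1840.75", NA, "820.5"),
  PaymentMethod  = c("Electronic check", "Mailed check", "--",
                     "Bank transfer", "Electronic check", "--"),
  Churn          = c("yes", "yes", "yes", "no", "no", "yes"),
  stringsAsFactors = FALSE
)

# Strings treated as missing-value codes (taken from the printed output)
na_codes <- c("na", "N/A", "--")

clean <- raw %>%
  mutate(
    # Recode the placeholder strings as NA, then convert the column to numeric
    TotalCharges  = as.numeric(ifelse(TotalCharges %in% na_codes, NA, TotalCharges)),
    # Recode "--" as NA and keep the column as a factor
    PaymentMethod = factor(ifelse(PaymentMethod %in% na_codes, NA, PaymentMethod))
  )

str(clean)
```

An alternative, if the placeholders are known before loading, is to pass them to `read.csv()` through its `na.strings` argument so the columns are parsed correctly from the start.
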
# Same data after recoding the non-standard missing codes ("na", "N/A", "--") as NA
##   customerID MonthlyCharges TotalCharges    PaymentMethod Churn
## 1 7590-VHVEG          29.85       109.90 Electronic check   yes
## 2 5575-GNVDE          56.95           NA     Mailed check   yes
## 3 3668-QPYBK             NA       108.15             <NA>   yes
## 4 7795-CFOCW          42.30      1840.75    Bank transfer    no
## 5 9237-HQITU          70.70           NA Electronic check    no
## 6 9305-CDSKC            NaN       820.50             <NA>   yes
## Rows: 10
## Columns: 5
## $ customerID     <fct> 7590-VHVEG, 5575-GNVDE, 3668-QPYBK, 7795-CFOCW, 9237-HQ…
## $ MonthlyCharges <dbl> 29.85, 56.95, NA, 42.30, 70.70, NaN, 89.10, NA, 104.80,…
## $ TotalCharges   <dbl> 109.90, NA, 108.15, 1840.75, NA, 820.50, 1949.40, NA, 3…
## $ PaymentMethod  <fct> Electronic check, Mailed check, NA, Bank transfer, Elec…
## $ Churn          <fct> yes, yes, yes, no, no, yes, no, yes, no, no

0.7 Visualizing missing data
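
The plot for this section is not preserved in the text. As a rough sketch of how such a plot could be produced with `visdat`, the block below calls `vis_miss()` on a hand-typed copy of the six cleaned rows shown in section 0.6. The data frame `clean` and the object `p` are hypothetical names, and this is not necessarily the plot the exercise expects.

```r
library(visdat)

# Hypothetical stand-in: the six cleaned rows printed in section 0.6
clean <- data.frame(
  customerID     = c("7590-VHVEG", "5575-GNVDE", "3668-QPYBK",
                     "7795-CFOCW", "9237-HQITU", "9305-CDSKC"),
  MonthlyCharges = c(29.85, 56.95, NA, 42.30, 70.70, NaN),
  TotalCharges   = c(109.90, NA, 108.15, 1840.75, NA, 820.50),
  PaymentMethod  = c("Electronic check", "Mailed check", NA,
                     "Bank transfer", "Electronic check", NA),
  Churn          = c("yes", "yes", "yes", "no", "no", "yes")
)

# vis_miss() draws one cell per value, coloured by whether it is missing
p <- vis_miss(clean)
print(p)
```

Because `vis_miss()` returns a ggplot object, the result can be customized further with the usual ggplot2 layers.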

0.8 Missing values by variable
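
The output for this section is not preserved either. One possible way to count missing values per variable is `miss_var_summary()` from `naniar`, sketched here on a hand-typed copy of the six cleaned rows from section 0.6. The names `clean` and `miss_summary` are hypothetical, and the counts refer to these six rows only, not to the full ten-row file.

```r
library(naniar)

# Hypothetical stand-in: the six cleaned rows printed in section 0.6
clean <- data.frame(
  customerID     = c("7590-VHVEG", "5575-GNVDE", "3668-QPYBK",
                     "7795-CFOCW", "9237-HQITU", "9305-CDSKC"),
  MonthlyCharges = c(29.85, 56.95, NA, 42.30, 70.70, NaN),
  TotalCharges   = c(109.90, NA, 108.15, 1840.75, NA, 820.50),
  PaymentMethod  = c("Electronic check", "Mailed check", NA,
                     "Bank transfer", "Electronic check", NA),
  Churn          = c("yes", "yes", "yes", "no", "no", "yes")
)

# One row per variable with the number and percentage of missing values
miss_summary <- miss_var_summary(clean)
print(miss_summary)

# Base R cross-check: is.na() also flags NaN, so MonthlyCharges counts two
na_counts <- colSums(is.na(clean))
print(na_counts)
```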

0.9 Simple imputation

##   TotalCharges TotalChargesImpMean TotalChargesImpMedian
## 1       109.90             109.900                109.90
## 2           NA            1175.671                820.50
## 3       108.15             108.150                108.15
## 4      1840.75            1840.750               1840.75
## 5           NA            1175.671                820.50
## 6       820.50             820.500                820.50
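
In the output above, every missing `TotalCharges` value is replaced by a single constant: the column mean in `TotalChargesImpMean` and the column median in `TotalChargesImpMedian`. The source does not show the code, so this is a minimal sketch of that idea using `dplyr`, applied to a hand-typed copy of the six values printed above. The names `clean` and `imputed` are hypothetical. Since the sketch sees only six of the ten rows, its fill values will differ from the 1175.671 and 820.50 shown in the output.

```r
library(dplyr)

# Hypothetical stand-in: the six TotalCharges values printed above
clean <- data.frame(TotalCharges = c(109.90, NA, 108.15, 1840.75, NA, 820.50))

imputed <- clean %>%
  mutate(
    # Replace each NA with the mean of the observed values
    TotalChargesImpMean = ifelse(is.na(TotalCharges),
                                 mean(TotalCharges, na.rm = TRUE),
                                 TotalCharges),
    # Replace each NA with the median of the observed values
    TotalChargesImpMedian = ifelse(is.na(TotalCharges),
                                   median(TotalCharges, na.rm = TRUE),
                                   TotalCharges)
  )

print(imputed)
```

Keeping the original column next to the imputed ones, as the output does, makes it easy to see which rows were filled in.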

0.10 Multivariate imputation

##   TotalCharges TotalChargesImpMean TotalChargesImpMedian
## 1     109.9000             109.900                109.90
## 2     828.0137            1175.671                820.50
## 3     108.1500             108.150                108.15
## 4    1840.7500            1840.750               1840.75
## 5    1748.1025            1175.671                820.50
## 6     820.5000             820.500                820.50
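
Here rows 2 and 5 of `TotalCharges` receive different model-based values (828.0137 and 1748.1025) instead of one shared constant. The source does not show which model or predictors produced them. The sketch below only illustrates the general technique with `impute_lm()` from `simputation`, assuming `MonthlyCharges` as the predictor. The data frame `toy` holds hypothetical values (not the telecom data) chosen so that `TotalCharges = 10 * MonthlyCharges + 5` holds exactly, which makes the imputed values easy to check by hand.

```r
library(simputation)

# Hypothetical toy data, NOT the telecom file:
# TotalCharges = 10 * MonthlyCharges + 5 wherever it is observed
toy <- data.frame(
  MonthlyCharges = c(10, 20, 30, 40, 50, 60),
  TotalCharges   = c(105, NA, 305, 405, NA, 605)
)

# Fit a linear model on the complete rows and fill each NA in
# TotalCharges with the model's prediction for that row
imputed <- impute_lm(toy, TotalCharges ~ MonthlyCharges)

print(imputed)
```

Unlike mean or median imputation, each filled value depends on the other variables in that row, so rows with a missing predictor are left unimputed by this approach.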