Chapter 11 Exploratory Data Analyses

Phase I of my “statistics” is usually termed “Data Exploration” or “Exploratory Data Analysis”. The goal of this step is to gain insight into the data: to understand what is going on in it, which parts need to be cleaned, what new features can be built, and which hypotheses should be tested during the model creation/validation phase, or even just to learn some fun facts about the data (src).

A few of my favorite packages for getting a glimpse of the data are:

  1. SmartEDA
  2. DataExplorer
  3. summarytools
  4. dataMaid
  5. janitor and here

11.1 Creating Report with DataExplorer

The DataExplorer package allows you to get a preliminary look at your data. Its create_report() function writes an HTML report that, among other things, checks for missing data:

  DataExplorer::create_report(
    df.fa,                                         # your data frame
    # y = 'heart_disease',                         # optional: name of the response variable
    output_dir   = 'output',                       # where to save the report, relative to your project directory
    output_file  = 'data_explorer_fa_report.html', # the filename for the report
    report_title = 'DTI (FA) Data Description'     # the title of your report
  )
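
When the full HTML report is more than you need, the missing-data check can also be run on its own. The following is a minimal sketch, assuming the built-in airquality data set as a stand-in for your own data frame. introduce(), profile_missing(), and plot_missing() are DataExplorer functions; the data set and the variable names df, overview, and missing_tbl are only illustrative.

```r
library(DataExplorer)

# airquality ships with base R and contains missing values;
# it is used here only as a stand-in for your own data frame
df <- airquality

# one-row overview: number of rows, columns, missing values, complete rows, ...
overview <- introduce(df)
print(overview)

# number and proportion of missing values per column
missing_tbl <- profile_missing(df)
print(missing_tbl)

# the same information as a bar chart
plot_missing(df)
```

Looking at the per-column table first makes it easier to decide which variables need cleaning before the full report is generated.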
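
  # SmartEDA: summary statistics for the numeric variables
  SmartEDA::ExpNumStat(tbl.desc, round = 1)

  # the same statistics with the 10th and 90th percentiles and outlier information
  SmartEDA::ExpNumStat(
    tbl.desc,
    by = "GA",        # "GA" = per group and overall
    gp = "Group",     # the grouping variable
    Qnt = c(.1, .9),
    Outlier = TRUE,
    round = 1
  )

  # plots of the numeric variables by Group
  SmartEDA::ExpNumViz(tbl.desc, target = 'Group')

  # summarytools: one-table summary of every variable, with settings suited to R Markdown
  summarytools::dfSummary(
    tbl.desc,
    varnumbers = FALSE,
    round.digits = 2,
    plain.ascii = FALSE,
    style = "grid",
    graph.magnif = .33,
    valid.col = FALSE,
    tmp.img.dir = "img"
  )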
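
janitor appears in the list of packages at the top of the chapter but is not demonstrated there. As a rough sketch of what it offers for a first look at the data, the example below uses clean_names() and tabyl(), both janitor functions, on a small made-up data frame. The data frame toy and its columns are hypothetical and exist only for illustration.

```r
library(janitor)

# hypothetical toy data with untidy column names, for illustration only
toy <- data.frame(
  "Subject ID" = 1:4,
  "Group"      = c("A", "B", "A", NA),
  check.names  = FALSE
)

# convert the column names to snake_case
toy_clean <- clean_names(toy)
names(toy_clean)

# frequency table of one variable; missing values get their own row
group_counts <- tabyl(toy_clean, group)
print(group_counts)
```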