• 1 NOTE: This version of the book is no longer updated and will be taken down in the next month or so. The new version may be found at this link
  • Welcome to IDEAR
    • 1.1 The State of the Book
    • 1.2 Book Outline
    • 1.3 Other Sources
  • 2 Introduction to R
    • 2.1 Why Does This Book Exist?
    • 2.2 What is R?
    • 2.3 What is coding?
    • 2.4 Conventions of the book
    • 2.5 Things You’ll Need
    • 2.6 Introduction to RStudio
    • 2.7 Your First Program
    • 2.8 The iris Dataset
    • 2.9 Graphing with R
    • 2.10 Exercises
      • 2.10.1 Calculate the following:
  • 3 Visualizing Your Data
    • 3.1 What is a Visualization?
    • 3.2 The Tidyverse Package
    • 3.3 ggplot2
      • 3.3.1 Functions in ggplot
      • 3.3.2 Changing Aesthetics
      • 3.3.3 Faceting
    • 3.4 Diamonds
      • 3.4.1 Visualizing Large Datasets
      • 3.4.2 Axis Transformations
    • 3.5 Other Popular Geoms
      • 3.5.1 Histograms
      • 3.5.2 Bar Charts
      • 3.5.3 Jittered Points
      • 3.5.4 Boxplot
    • 3.6 Designing Good Graphics
    • 3.7 Saving Your Graphics
    • 3.8 More Resources
    • 3.9 Exercises
      • 3.9.1 Graph the following:
      • 3.9.2 Use a new dataset:
      • 3.9.3 Looking ahead:
  • 4 R Functions and Workflow
    • 4.1 Workflow
      • 4.1.1 Scripts
      • 4.1.2 Notebooks
    • 4.2 Memory, Objects, and Names
    • 4.3 Dataframes
    • 4.4 Oddballs
    • 4.5 RStudio Tips and Tricks
    • 4.6 R Functions and Workflow Exercises
      • 4.6.1 Do the following:
  • 5 Data Wrangling
    • 5.1 Thinking with Data
    • 5.2 The Data Analytics Model
    • 5.3 Wrangle
      • 5.3.1 Tidy Data
      • 5.3.2 Tidying Data
      • 5.3.3 Separating Values
    • 5.4 The Pipe
    • 5.5 Data Transformations
      • 5.5.1 Mutate
      • 5.5.2 Tibbles
      • 5.5.3 Subsetting Data
      • 5.5.4 Filtering with the Tidyverse
      • 5.5.5 Working with Groups
    • 5.6 Missing Values
      • 5.6.1 Explicit Missing Values
      • 5.6.2 Implicit Missing Values
    • 5.7 Count Data
      • 5.7.1 Work with other datasets:
  • 6 Introduction to Data Analysis
    • 6.1 Exploratory Data Analysis
      • 6.1.1 Sidenote
      • 6.1.2 The EDA Framework
    • 6.2 gapminder
    • 6.3 Summarizing Data
      • 6.3.1 Sidenote
    • 6.4 Visualizing Data
    • 6.5 Analyzing Patterns
    • 6.6 Exercises
  • 7 Modeling Data
    • 7.1 Why Model?
    • 7.2 Linear Models
    • 7.3 Model Predictions
    • 7.4 Classification
    • 7.5 Logistic Models
    • 7.6 Evaluating and Comparing Models
      • 7.6.1 Confusion Matrices
    • 7.7 Conclusion
    • 7.8 Exercises
  • 8 Achieving Graphical Excellence
    • 8.1 Introduction
    • 8.2 Getting Started
    • 8.3 Themes
    • 8.4 Colors
      • 8.4.1 Viridis
      • 8.4.2 Color Brewer
      • 8.4.3 Other Packages
      • 8.4.4 Making Your Own
    • 8.5 Labels
    • 8.6 Animation
    • 8.7 Specialized Visualizations
      • 8.7.1 Stacked Area Plots
      • 8.7.2 ggridges
      • 8.7.3 Maps
      • 8.7.4 Circular Charts
    • 8.8 Rearranging Groups
    • 8.9 Further Reading
  • 9 Functions and Scripting
    • 9.1 Writing Functions
      • 9.1.1 Our First Function
      • 9.1.2 Returns
      • 9.1.3 More Complicated Functions
    • 9.2 About Names…
    • 9.3 Conditional Statements
    • 9.4 Stops
    • 9.5 Function Dependencies
    • 9.6 Saving and Loading Functions
    • 9.7 Loops
    • 9.8 Mapping Functions
    • 9.9 More Information
    • 9.10 Exercises
  • 10 More Complicated Analyses
    • 10.1 Other Datasets
      • 10.1.1 Importing Your Own Data
      • 10.1.2 Exporting Data
      • 10.1.3 Data Exploration
      • 10.1.4 Modeling Winners
    • 10.2 Logistic Models
    • 10.3 Modeling Metrics
      • 10.3.1 Pseudo-R²
      • 10.3.2 Area Under the ROC Curve (AUC)
      • 10.3.3 Model Comparisons
    • 10.4 More Complicated Analyses
      • 10.4.1 Model Selection
    • 10.5 Relational Data
      • 10.5.1 Inner Join
      • 10.5.2 Left Join
      • 10.5.3 Right Join
      • 10.5.4 Full Join
      • 10.5.5 Semi Join
      • 10.5.6 Anti Join
      • 10.5.7 Specifying Key Columns
      • 10.5.8 Merging Multiple Dataframes
      • 10.5.9 Binding Dataframes
    • 10.6 Exercises
  • 11 Playing Nicely With Others
    • 11.1 R Markdown
      • 11.1.1 Kable
    • 11.2 LaTeX
    • 11.3 Git(Hub)
      • 11.3.1 My First Repository
    • 11.4 Commenting Code
    • 11.5 Further Reading
  • 12 Working with Text
    • 12.1 Working with Stringr
    • 12.2 Regular Expressions
    • 12.3 Case Study
    • 12.4 Further Reading
    • 12.5 Exercises
  • 13 Working with Dates and Times
    • 13.1 Dates in R
    • 13.2 Converting To Dates
    • 13.3 Extracting From Dates
    • 13.4 Math with Dates
    • 13.5 Time Zones
  • 14 What Next
    • 14.1 Machine Learning Methods
    • 14.2 Leaflet Maps
    • 14.3 FlexDashboard
    • 14.4 Bookdown
    • 14.5 Blogdown
    • 14.6 Shiny
  • 15 Basic Statistics (Using R)
    • 15.1 Purpose of the Unit
    • 15.2 Definitions
      • 15.2.1 Data Concepts
      • 15.2.2 Statistical Terms
      • 15.2.3 Models and Tests
      • 15.2.4 How We’ll Compare Models
    • 15.3 Exercises
  • 16 Other Resources
    • Infographics
    • Courses
    • Textbooks
    • Blog Links
    • Data Sources
    • Graphing Aids
  • 17 Frequently Asked Questions
    • 17.1 Why R?
    • 17.2 Why is my code broken?
    • 17.3 Why is (X package) named that?
  • 18 Changelog
    • 18.1 Version 1.1.0
    • 18.2 Version 1.0.1
    • 18.3 Version 1.0.0

Introduction to Data Exploration and Analysis with R

Michael Mahoney

2019-06-27

1 NOTE: This version of the book is no longer updated and will be taken down in the next month or so. The new version may be found at this link