Practical Data Skills

Resources

R Cheat Sheets

  • Base R

  • Caret (Machine Learning)

  • Data Import

  • Data Transformation

  • RStudio IDE

Further Reading

Books

  • Competing in the Age of AI: Strategy and Leadership When Algorithms and Networks Run the World

  • The Power of Experiments: Decision-Making in a Data-Driven World

  • Prediction Machines: The Simple Economics of Artificial Intelligence

Textbooks

  • Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions

  • Data Science for Business

  • An Introduction to Statistical Learning

  • R for Data Science

Data Science Websites

  • Harvard Data Science Review

  • KDnuggets

  • Machine Learning Mastery

  • Stack Overflow