Practical Data Skills
Chapter 8 Supervised Machine Learning

8.1 k-Nearest Neighbors (kNN)
8.2 The Bias-Variance Tradeoff
8.2.1 Train-Test-Holdout
8.2.2 k-Fold Cross Validation
8.3 Regression Modeling (Revised)
8.3.1 Regularization
8.4 CART Models
8.4.1 Decision Trees
8.4.2 Regression Trees
8.5 Random Forest Models
8.6 Neural Networks