Practical Data Skills
Chapter 12 Natural Language Processing (NLP)
[In Progress]