My Data Science Notes
Intro
1
Probability
1.1
Principles
1.2
Discrete Distributions
1.2.1
Bernoulli
1.2.2
Binomial
1.2.3
Poisson
1.2.4
Multinomial
1.2.5
Negative-Binomial
1.2.6
Geometric
1.2.7
Hypergeometric
1.2.8
Gamma
1.3
Continuous Distributions
1.3.1
Normal
1.4
Joint Distributions
1.5
Likelihood
2
Statistical Tests
2.1
Chi-Square Test
2.2
One-Way Tables
2.2.1
Chi-Square Goodness-of-Fit Test
2.2.2
Proportion Test
2.3
Two-Way Tables
2.3.1
Chi-Square Independence Test
2.3.2
Residuals Analysis
2.3.3
Difference in Proportions
2.3.4
Relative Risk
2.3.5
Odds Ratio
2.3.6
Partitioning Chi-Square
2.3.7
Correlation
2.4
K-Way Tables
2.4.1
Odds Ratio
2.4.2
Chi-Square Independence Test
2.5
Continuous Variable Analysis
2.5.1
Correlation
3
Experiment Design
3.1
Single Factor
3.2
Blocking
3.3
Nested
3.4
Split Plot
PART 2: Supervised Machine Learning
4
Ordinary Least Squares
4.1
Linear Regression Model
4.2
Parameter Estimation
4.3
Model Assumptions
4.3.1
Linearity
4.3.2
Multicollinearity
4.3.3
Normality
4.3.4
Equal Variances
4.4
Prediction
4.5
Inference
4.5.1
t-Test
4.5.2
F-Test
4.6
Interpretation
4.7
Model Validation
4.7.1
Accuracy Metrics
4.7.2
Cross-Validation
4.7.3
Gain Curve
4.8
OLS Reference
5
Generalized Linear Models
5.1
Logistic Regression
5.2
Multinomial Logistic Regression
5.3
Ordinal Logistic Regression
5.3.1
Assumptions
5.3.2
Modeling
5.3.3
Case Study
5.4
Poisson Regression
6
Multivariate Statistical Analysis
6.1
Background
6.2
MANOVA
6.3
Repeated Measures
6.4
LDA
6.5
PCA
6.6
Factor Analysis
6.7
Canonical Correlation
6.8
Cluster Analysis
7
Regularization
7.1
Ridge
7.2
Lasso
7.3
Elastic Net
Model Summary
8
Decision Trees
8.1
Classification Tree
8.1.1
Measuring Performance
8.1.2
Training with Caret
8.2
Regression Tree
8.2.1
Training with Caret
8.3
Bagged Trees
8.3.1
Bagged Classification Tree
8.3.2
Bagging Regression Tree
8.4
Random Forests
8.5
Gradient Boosting
8.6
Summary
8.6.1
Classification Trees
8.6.2
Regression Trees
9
Non-linear Models
9.1
Splines
9.2
MARS
9.3
GAM
10
Support Vector Machines
10.1
Maximal Margin Classifier
10.2
Support Vector Classifier
10.3
Support Vector Machines
10.4
Example
10.5
Using Caret
11
Principal Components Analysis
12
Text Mining
12.1
Tidy Text
12.2
Bag of Words
12.3
Sentiment Analysis
12.3.1
N-Grams
12.3.2
Converting to and from non-tidy formats
12.3.3
Example
12.4
Topic Modeling
12.5
Appendix: String Manipulation
12.5.1
stringr package
12.5.2
Regular Expressions
12.6
Reference Links
13
Survival Analysis
13.1
Basic Concepts
13.2
Survival Curve Estimation
13.2.1
Kaplan-Meier
13.2.2
Weibull
13.2.3
Cox
Appendix
Publishing to BookDown
Shiny Apps
Packages
Create a package
Document Functions with roxygen
Create Data
Create Vignette
References
6.7
Canonical Correlation