My Data Science Notes
Intro
1
Probability
1.1
Principles
1.2
Discrete Distributions
1.2.1
Bernoulli
1.2.2
Binomial
1.2.3
Poisson
1.2.4
Multinomial
1.2.5
Negative-Binomial
1.2.6
Geometric
1.2.7
Hypergeometric
1.2.8
Gamma
1.3
Continuous Distributions
1.3.1
Normal
1.4
Joint Distributions
1.5
Likelihood
2
Statistical Tests
2.1
Chi-Square Test
2.2
One-Way Tables
2.2.1
Chi-Square Goodness-of-Fit Test
2.2.2
Proportion Test
2.3
Two-Way Tables
2.3.1
Chi-Square Independence Test
2.3.2
Residuals Analysis
2.3.3
Difference in Proportions
2.3.4
Relative Risk
2.3.5
Odds Ratio
2.3.6
Partitioning Chi-Square
2.3.7
Correlation
2.4
K-Way Tables
2.4.1
Odds Ratio
2.4.2
Chi-Square Independence Test
2.5
Continuous Variable Analysis
2.5.1
Correlation
3
Experiment Design
3.1
Single Factor
3.2
Blocking
3.3
Nested
3.4
Split Plot
PART 2: Supervised Machine Learning
4
Ordinary Least Squares
4.1
Linear Regression Model
4.2
Parameter Estimation
4.3
Model Assumptions
4.3.1
Linearity
4.3.2
Multicollinearity
4.3.3
Normality
4.3.4
Equal Variances
4.4
Prediction
4.5
Inference
4.5.1
t-Test
4.5.2
F-Test
4.6
Interpretation
4.7
Model Validation
4.7.1
Accuracy Metrics
4.7.2
Cross-Validation
4.7.3
Gain Curve
4.8
OLS Reference
5
Generalized Linear Models
5.1
Logistic Regression
5.2
Multinomial Logistic Regression
5.3
Ordinal Logistic Regression
5.3.1
Assumptions
5.3.2
Modeling
5.3.3
Case Study
5.4
Poisson Regression
6
Multivariate Statistical Analysis
6.1
Background
6.2
MANOVA
6.3
Repeated Measures
6.4
LDA
7
Regularization
7.1
Ridge
7.2
Lasso
7.3
Elastic Net
Model Summary
8
Decision Trees
8.1
Classification Tree
8.1.1
Measuring Performance
8.1.2
Training with Caret
8.2
Regression Tree
8.2.1
Training with Caret
8.3
Bagged Trees
8.3.1
Bagged Classification Tree
8.3.2
Bagged Regression Tree
8.4
Random Forests
8.5
Gradient Boosting
8.6
Summary
8.6.1
Classification Trees
8.6.2
Regression Trees
9
Non-linear Models
9.1
Splines
9.2
MARS
9.3
GAM
10
Support Vector Machines
10.1
Maximal Margin Classifier
10.2
Support Vector Classifier
10.3
Support Vector Machines
10.4
My svm Example
10.5
Using Caret
PART 3: Unsupervised Machine Learning
11
Dimensionality Reduction
11.1
PCA
11.2
t-SNE
11.3
SVD
12
Cluster Analysis
12.1
K-Means
12.2
HCA
13
Text Mining
13.1
Tidy Text
13.2
Bag of Words
13.3
Sentiment Analysis
13.3.1
N-Grams
13.3.2
Converting to and from non-tidy formats
13.3.3
Example
13.4
Topic Modeling
13.5
Appendix: String Manipulation
13.5.1
stringr package
13.5.2
Regular Expressions
13.6
Reference Links
14
Survival Analysis
14.1
Basic Concepts
14.2
Survival Curve Estimation
14.2.1
Kaplan-Meier
14.2.2
Weibull
14.2.3
Cox
Appendix
Publishing to BookDown
Shiny Apps
Packages
Create a package
14.2.4
Document Functions with roxygen
Create Data
Create Vignette
References
11.1
PCA