My Data Science Notes
Intro
1 Probability
1.1 Principles
1.2 Discrete Distributions
1.2.1 Bernoulli
1.2.2 Binomial
1.2.3 Poisson
1.2.4 Multinomial
1.2.5 Negative Binomial
1.2.6 Geometric
1.2.7 Hypergeometric
1.2.8 Gamma
1.3 Continuous Distributions
1.3.1 Normal
1.4 Joint Distributions
1.5 Likelihood
2 Categorical Analysis - Nonmodel
2.1 Chi-Square Test
2.2 One-Way Tables
2.2.1 Chi-Square Goodness-of-Fit Test
2.2.2 Proportion Test
2.3 Two-Way Tables
2.3.1 Chi-Square Independence Test
2.3.2 Residuals Analysis
2.3.3 Difference in Proportions
2.3.4 Relative Risk
2.3.5 Odds Ratio
2.3.6 Partitioning Chi-Square
2.3.7 Correlation
2.4 K-Way Tables
2.4.1 Odds Ratio
2.4.2 Chi-Square Independence Test
3 Continuous Variable Analysis
3.0.1 Correlation
4 Experiment Design
4.1 Single Factor
4.2 Blocking
4.3 Nested
4.4 Split Plot
PART 2: Supervised Machine Learning
4.5 Linear Regression Model
4.6 Parameter Estimation
4.7 Model Assumptions
4.7.1 Linearity
4.7.2 Multicollinearity
4.7.3 Normality
4.7.4 Equal Variances
4.8 Prediction
4.9 Inference
4.9.1 t-Test
4.9.2 F-Test
4.10 Interpretation
4.11 Model Validation
4.11.1 Accuracy Metrics
4.11.2 Cross-Validation
4.11.3 Gain Curve
4.12 Reference
5 Generalized Linear Models
5.1 Logistic Regression
5.2 Multinomial Logistic Regression
5.3 Ordinal Logistic Regression
5.3.1 Assumptions
5.3.2 Modeling
5.3.3 Case Study
5.4 Poisson Regression
6 Multivariate Statistical Analysis
6.1 Background
6.2 MANOVA
6.3 Repeated Measures
6.4 LDA
6.5 PCA
6.6 Factor Analysis
6.7 Canonical Correlation
6.8 Cluster Analysis
7 Classification
8 Regularization
8.1 Ridge
8.2 Lasso
8.3 Elastic Net
Model Summary
9 Decision Trees
9.1 Classification Tree
9.1.1 Measuring Performance
9.1.2 Training with caret
9.2 Regression Tree
9.2.1 Training with caret
9.3 Bagged Trees
9.3.1 Bagged Classification Tree
9.3.2 Bagged Regression Tree
9.4 Random Forests
9.5 Gradient Boosting
9.6 Summary
9.6.1 Classification Trees
9.6.2 Regression Trees
9.7 Reference
10 Non-linear Models
10.1 Splines
10.2 MARS
10.3 GAM
11 Support Vector Machines
11.1 Maximal Margin Classifier
11.2 Support Vector Classifier
11.3 Support Vector Machines
11.4 Example
11.5 Using caret
12 Principal Components Analysis
13 Text Mining
13.1 N-Grams
13.2 Converting to and from non-tidy formats
13.3 Example
13.4 stringr package
13.5 Regular Expressions
14 Survival Analysis
14.1 Basic Concepts
14.2 Survival Curve Estimation
14.2.1 Kaplan-Meier
14.2.2 Weibull
14.2.3 Cox
Appendix
Publishing to bookdown
Shiny Apps
Packages
Create a Package
14.2.4 Document Functions with roxygen
Create Data
Create Vignette
References
Chapter 7 Classification