My Data Science Notes
Intro
1
Probability
1.1
Principles
1.2
Discrete Distributions
1.2.1
Bernoulli
1.2.2
Binomial
1.2.3
Poisson
1.2.4
Multinomial
1.2.5
Negative-Binomial
1.2.6
Geometric
1.2.7
Hypergeometric
1.2.8
Gamma
1.3
Continuous Distributions
1.3.1
Normal
1.4
Joint Distributions
1.5
Likelihood
2
Categorical Analysis - Nonmodel
2.1
Chi-Square Test
2.2
One-Way Tables
2.2.1
Chi-Square Goodness-of-Fit Test
2.2.2
Proportion Test
2.3
Two-Way Tables
2.3.1
Chi-Square Independence Test
2.3.2
Residuals Analysis
2.3.3
Difference in Proportions
2.3.4
Relative Risk
2.3.5
Odds Ratio
2.3.6
Partitioning Chi-Square
2.3.7
Correlation
2.4
K-Way Tables
2.4.1
Odds Ratio
2.4.2
Chi-Square Independence Test
3
Continuous Variable Analysis
3.0.1
Correlation
4
Experiment Design
4.1
Single Factor
4.2
Blocking
4.3
Nested
4.4
Split Plot
PART 2: Supervised Machine Learning
4.5
Linear Regression Model
4.6
Parameter Estimation
4.7
Model Assumptions
4.7.1
Linearity
4.7.2
Multicollinearity
4.7.3
Normality
4.7.4
Equal Variances
4.8
Prediction
4.9
Inference
4.9.1
t-Test
4.9.2
F-Test
4.10
Interpretation
4.11
Model Validation
4.11.1
Accuracy Metrics
4.11.2
Cross-Validation
4.11.3
Gain Curve
4.12
Reference
5
Generalized Linear Models
5.1
Logistic Regression
5.2
Multinomial Logistic Regression
5.3
Ordinal Logistic Regression
5.3.1
Assumptions
5.3.2
Modeling
5.3.3
Case Study
5.4
Poisson Regression
6
Multivariate Statistical Analysis
6.1
Background
6.2
MANOVA
6.3
Repeated Measures
6.4
LDA
6.5
PCA
6.6
Factor Analysis
6.7
Canonical Correlation
6.8
Cluster Analysis
7
Classification
8
Regularization
9
Decision Trees
9.1
Classification Tree
9.1.1
Confusion Matrix
9.1.2
ROC Curve
9.1.3
Caret Approach
9.2
Regression Trees
9.2.1
Caret Approach
9.3
Bagging
9.4
Random Forests
9.5
Gradient Boosting
9.6
Summary
9.7
Reference
10
Non-linear Models
10.1
Splines
10.2
MARS
10.3
GAM
11
Support Vector Machines
11.1
Maximal Margin Classifier
11.2
Support Vector Classifier
11.3
Support Vector Machines
11.4
Example
11.5
Using Caret
12
Principal Components Analysis
13
Text Mining
14
Survival Analysis
14.1
Survival Theory
14.2
Survival Curve Estimation
14.2.1
Kaplan-Meier
14.2.2
Weibull
14.2.3
Cox
14.3
Survival Curve Estimation
14.4
Proportional Hazards Model
14.4.1
Xelox
14.4.2
pancreatic
14.4.3
prostateSurvival
14.4.4
pharmacoSmoking
14.4.5
hepatoCellular
14.4.6
GBSG2
14.4.7
Unemp
Appendix
Publishing to BookDown
Shiny Apps
Packages
Create a package
14.4.8
Document Functions with roxygen
Create Data
Create Vignette
References
14.3
Survival Curve Estimation
When