Mastering Statistics with R
Welcome
Prerequisites
Who am I?
Acknowledgement
Progress of this book
Part I: Foundations
1
Probability Concept
1.1
Introduction to Probability
1.1.1
What is probability?
1.1.2
Basic Mathematics
1.1.3
Set Theory
1.1.4
History of probability
1.1.5
Definitions of Probability
1.1.6
Conditional Probability and Independence
1.1.7
Bayes’ Theorem
1.2
Random variables
1.2.1
Random variables and probability functions
1.2.2
Expected values and Variance
1.2.3
Transformation of random variables
1.2.4
Families of distributions
1.3
Multivariate random variables
1.3.1
Joint distributions
1.3.2
Change of variables
1.3.3
Families of multivariate distributions
1.4
System of Moments
1.5
Limit Theorem
1.5.1
Some inequalities
1.5.2
Law of large numbers
2
Basic Statistics
2.1
Descriptive Statistics
2.1.1
Frequency distribution
2.1.2
Measures of statistical characteristics
2.1.3
Exploratory data analysis (EDA)
2.2
Sampling
2.2.1
Random sampling methods (probability sampling techniques)
2.2.2
Nonrandom sampling methods (non-probability sampling techniques)
2.2.3
Other sampling methods
2.3
Estimation
2.3.1
Point Estimation
2.3.2
Interval Estimation
2.4
Testing Hypotheses
2.4.1
Null hypothesis vs. alternative hypothesis
2.4.2
The Neyman-Pearson Lemma
2.5
Some statistical tests
2.5.1
Parametric statistical tests
2.5.2
Non-parametric statistical tests
2.6
Analysis of Variance (ANOVA)
2.6.1
Levene’s test
2.6.2
Bartlett’s test
2.6.3
One-way ANOVA
2.6.4
Two-way ANOVA
2.6.5
Welch’s ANOVA
2.6.6
Kruskal–Wallis test
2.6.7
Friedman test
2.6.8
Normality Test
2.7
Correlation Analysis and Linear Regression
3
Mathematical Statistics
3.1
Limiting Distributions
3.1.1
Convergence in probability
3.1.2
Convergence in distribution
Part II: Data Collection and Processing
4
Data Collection
5
Data Processing
5.1
Data Cleaning
5.2
Handling Missing Data
5.3
Normalization & Standardization
5.4
Feature Engineering
6
Data Management
6.1
Database
6.2
SQL
Part III: Methodology - Beginner
7
Probability Models
7.1
Stochastic Process
7.1.1
Random Walk
7.1.2
Poisson Process
7.1.3
Markov Process
7.1.4
Wiener Process
7.1.5
Lévy Process
7.2
Markov Chain
7.2.1
Semi-Markov Chain
7.2.2
Hidden Markov Models
7.2.3
Mover-Stayer Models
7.3
Ergodic Theory
7.4
Stochastic Calculus
7.4.1
Itô’s Lemma
8
Regression Analysis
8.1
Ordinary Least Squares (OLS)
8.1.1
Variable Selection
8.1.2
Model Selection
8.2
Fixed Effect and Random Effect
8.3
Analysis of Covariance (ANCOVA)
8.4
Logistic Regression
8.5
Fractional Model
9
Categorical Data Analysis
10
Multivariate Analysis
10.1
Multivariate distributions
10.2
General Linear Model
10.3
Multivariate Analysis of Variance (MANOVA)
10.4
Multivariate Analysis of Covariance (MANCOVA)
10.5
Structural Equation Modeling (SEM)
10.6
Dimension Reduction Method
10.6.1
t-SNE
10.6.2
DBSCAN
10.6.3
Locally Linear Embedding
10.6.4
Laplacian Eigenmaps
10.6.5
ISOMAP
10.6.6
Uniform Manifold Approximation and Projection (UMAP)
10.7
Clustering Method
10.7.1
K-means, K-medoids
10.7.2
KNN
10.7.3
Principal Component Analysis (PCA)
10.7.4
Principal Co-ordinates Analysis (PCoA)
10.7.5
Multidimensional Scaling (MDS)
10.7.6
Self-organizing map (SOM)
10.7.7
Spectral clustering
10.7.8
Quantum clustering
10.7.9
Partial Least Squares Discriminant Analysis (PLS-DA)
10.7.10
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
10.8
Factor Analysis
10.8.1
Kaiser–Meyer–Olkin test
10.8.2
Questionnaire
10.9
Canonical-correlation Analysis (CCA)
10.10
Analysis of Similarities (ANOSIM)
11
Time Series Analysis
11.1
Time Series Decomposition
11.2
ACF and PACF
11.3
White Noise
11.4
Autoregressive (AR)
11.5
Moving Average (MA)
11.6
Kalman Filter and Savitzky–Golay filter
11.7
ARMA, ARIMA, SARIMA, SARFIMA
11.8
Granger causality
11.9
VAR
11.10
GARCH
11.11
Factor Model
11.12
Some advanced topics
11.12.1
Lag regression
11.12.2
Mixed-frequency data
Part IV: Methodology - Advanced
12
Generalized Linear Models
12.1
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
12.1.1
Rootogram
12.2
Complex Linear Model
12.3
Generalized Estimating Equation (GEE)
12.4
Hierarchical Linear Model
12.4.1
Instrumental variable
12.5
Multilevel Model
12.6
Quantile Regression
12.7
Non-parametric model
12.7.1
LOESS
13
Spatial Statistics
13.1
Point-referenced Data
13.1.1
Gaussian Process
13.1.2
Exploratory data analysis
13.1.3
Models for spatial dependence
13.1.4
Kriging (Spatial prediction)
13.2
Areal/Lattice Data
13.2.1
Spatial autocorrelation
13.2.2
Conditionally auto-regressive (CAR) and Simultaneously auto-regressive (SAR) models
13.3
Point Pattern Data
13.3.1
Poisson processes
13.3.2
Cox processes
13.3.3
K-functions
13.4
Other Topics
13.4.1
Spatio-temporal models
13.4.2
Frequency domain methods
13.4.3
Deep Kriging
14
Functional Data Analysis
15
Bayesian Analysis
15.1
Laplace Approximation and BIC
16
High Dimensional Data
Part V: Methodology - Others
17
Extreme value theory
18
Directional Statistics
18.1
Circular Regression
Part V: Industrial Statistics
19
Quality Control
19.1
History
19.2
7 tools
19.3
ARL
19.4
\(R\) chart
19.5
\(s\) chart
19.6
\(\bar{X}\) chart
19.7
\(p\) chart
19.8
CUSUM
19.9
EWMA
19.10
Sequential probability ratio test
20
Reliability Analysis
21
Design of Experiments
21.1
Latin hypercube
21.2
Sequential design
21.3
Space-filling design
21.4
Active learning (Optimal experimental design)
21.5
Online machine learning
Part VI: Biostatistics
22
Survival Analysis
22.1
Unobserved data
22.2
Survival Function and Hazard Function
22.3
Kaplan–Meier Estimator
22.4
Log-rank Test
22.5
Proportional Hazards Model
22.6
Accelerated Failure Time (AFT) Model
22.7
Nelson–Aalen Estimator
22.8
Turnbull-Frydman Estimator
22.9
Restricted Mean Survival Time (RMST)
22.10
Firth’s penalized logistic regression
22.11
Competing Risks
23
Biostatistical Data Analysis
23.1
p-value correction
23.1.1
Bonferroni
23.1.2
Tukey’s HSD
23.1.3
Fisher
23.1.4
False Discovery Rate (FDR)
23.1.5
Q-value
23.1.6
E-value
23.2
Trend Tests
23.2.1
Cochran-Armitage test
23.2.2
Jonckheere’s trend test
23.3
Propensity score
23.4
PLINK
23.5
Polygenic Risk Score
23.6
RNA-seq Analysis
23.7
Metabolomics Analysis
23.7.1
SMART
23.7.2
Pareto normalization
23.8
Permutational multivariate analysis of variance (PERMANOVA)
23.9
PERMDISP
24
Causal Inference
24.1
DAG
25
Statistical Designs and Analyses in Clinical Trials
25.1
Phase I
25.2
Phase II
25.3
Phase III
25.4
\(\alpha\) spending function
Part VII: Other Applications
26
Social science
26.1
27
Psychometrics
27.1
Item Response Theory
28
Industry
28.1
Degradation data
Part VI: Computational Statistics
29
Statistical Learning
29.1
Root finding
29.1.1
Newton’s method (Newton–Raphson algorithm)
29.1.2
Gauss–Newton algorithm
29.1.3
Gradient Descent
29.1.4
Conjugate gradient method
29.1.5
Nelder–Mead method
29.1.6
Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
29.2
Information Criteria
29.2.1
AIC
29.2.2
BIC
29.3
Decision Tree and Random Forest
29.4
Bagging
29.5
Boosting
29.5.1
Gradient Boosting Decision Tree (GBDT)
29.5.2
XGBoost
29.5.3
LightGBM
29.5.4
CatBoost
29.5.5
RUSBoost
30
Statistical Computing
30.1
Generate random variables
30.1.1
Inverse transform method
30.1.2
Acceptance-rejection method
30.2
Variance reduction
30.3
Markov chain Monte Carlo (MCMC)
30.4
EM algorithm
30.5
Back-fitting algorithm
30.6
Particle Swarm Optimization (PSO)
Part IV: Deal with Computer Science
31
Data Structure
31.1
Linked list
31.2
Stack
31.3
Queue
31.4
Tree
32
Algorithm
32.1
Graph and tree traversal algorithms
32.1.1
Search
32.1.2
Shortest path
32.1.3
Minimum spanning tree
33
Information Theory
33.1
Entropy
33.2
Data compression
34
Machine Learning
34.1
Double Machine Learning
34.2
Adversarial machine learning (AML)
34.3
Reinforcement Learning
34.4
Computational Learning Theory
34.4.1
Probably Approximately Correct (PAC)
35
Big Data Analytics Techniques and Applications
35.1
Visualization
35.2
Hadoop
35.3
Spark
36
Data Mining
36.1
Online machine learning
37
Image Processing
38
Deep Learning
38.1
Basic concept
38.2
DNN
38.3
CNN
38.4
RNN
38.4.1
Long Short-Term Memory (LSTM)
38.4.2
Gated Recurrent Unit (GRU)
38.5
Generative adversarial networks (GAN)
38.6
Transformer Networks
38.7
Autoencoders & Variational Autoencoders (VAEs)
38.8
Graph Neural Networks (GNNs)
38.9
Physics-informed neural networks (PINNs)
38.10
Deep Q-Networks (DQNs)
38.11
Quantum neural network (QNN)
38.12
Some famous models
38.12.1
LeNet, AlexNet, VGG, NiN
38.12.2
GoogLeNet
38.12.3
ResNet
38.12.4
DenseNet
38.12.5
YOLO
38.13
Modern NN models
38.13.1
Liquid Neural Network (LNN)
38.13.2
Kolmogorov-Arnold Networks (KAN)
Part VIII: Miscellaneous
39
Statistical Education
40
Ethics and Philosophy
Part II: Statistical Theory
41
Statistical Inference
42
Decision Theory
42.1
Regret
43
Probability Theory
43.1
Basics from Measure Theory
43.2
Limit of the sets
43.3
Probability Inequalities
43.4
Stochastic ordering
43.5
Malliavin Calculus
43.6
Regular conditional probability
43.6.1
Markov kernel
43.7
Martingale
43.7.1
Reverse martingale
Appendix
A
Matrix calculus
B
Advanced programming in R
B.1
Techniques for basic operators
B.2
Special operator
B.2.1
Inner function
B.2.2
Super assignment <<-
B.3
Pipe operator
B.3.1
User-defined pipe operator
B.4
Non-standard Evaluation (NSE)
B.4.1
Tidy evaluation
B.5
Functional programming
B.5.1
Helper function
B.6
Progress bar
B.7
Parallel computing
31.1
Linked list