Mastering Statistics with R
Preface
Welcome
Prerequisites
Who am I?
Acknowledgement
Progress of this book
Part I: Foundations
1
Probability Concept
1.1
Introduction to Probability
1.1.1
What is probability?
1.1.2
Basic Mathematics
1.1.3
History of probability
1.1.4
Definitions of Probability
1.1.5
Conditional Probability and Independence
1.1.6
Bayes’ Theorem
1.2
Random Variables
1.2.1
Random variables and probability functions
1.2.2
Expected values and Variance
1.2.3
Transformation of random variables
1.2.4
Families of distributions
1.3
Multivariate Random Variables
1.3.1
Joint distributions
1.3.2
Change of variables
1.3.3
Families of multivariate distributions
1.4
System of Moments
1.5
Limit Theorem
1.5.1
Some inequalities
1.5.2
Law of large numbers
2
Elementary Statistics
2.1
Descriptive Statistics
2.1.1
Frequency distribution
2.1.2
Measures of statistical characteristics
2.1.3
Exploratory data analysis (EDA)
2.2
Sampling
2.2.1
Random sampling methods (probability sampling techniques)
2.2.2
Nonrandom sampling methods (non-probability sampling techniques)
2.2.3
Other sampling methods
2.3
Estimation
2.3.1
Point Estimation
2.3.2
Interval Estimation
2.4
Testing Hypotheses
2.4.1
Null hypothesis vs. alternative hypothesis
2.4.2
The Neyman-Pearson Lemma
2.5
Some statistical tests
2.5.1
Parametric statistical tests
2.5.2
Non-parametric statistical tests
2.6
Analysis of Variance (ANOVA)
2.6.1
Levene’s test
2.6.2
Bartlett’s test
2.6.3
One-way ANOVA
2.6.4
Two-way ANOVA
2.6.5
Welch’s ANOVA
2.6.6
Kruskal–Wallis test
2.6.7
Friedman test
2.6.8
Normality Test
2.7
Correlation Analysis and Linear Regression
3
Mathematical Statistics
3.1
Properties of estimators
3.1.1
Uniformly Minimum Variance Unbiased Estimator (UMVUE)
3.2
Limiting Distributions
3.2.1
Convergence in probability
3.2.2
Convergence in distribution
3.3
Asymptotic Theory
3.4
Hypothesis Testing Theory
3.4.1
MP test and UMP test
3.4.2
Monotone likelihood ratio (MLR)
3.4.3
LR-test, GLRT
3.4.4
Sequential probability ratio test (SPRT)
3.5
Decision Theory
3.5.1
Regret
3.6
Bayesian Tests
Part II: Methodology - Beginner
4
Probability Models
4.1
Review of Probability Computation
4.2
Stochastic Process
4.2.1
Discrete-time stochastic process
4.2.2
Continuous-time stochastic process
4.2.3
Random Walk
4.2.4
Poisson Process
4.2.5
Markov Process
4.2.6
Wiener Process
4.2.7
Lévy Process
4.3
Markov Chain
4.3.1
Semi-Markov Chain
4.3.2
Hidden Markov Models
4.3.3
Mover-Stayer Models
4.4
Ergodic Theory
4.5
Degradation data
4.6
Renewal Theory
4.7
Ruin Theory
4.8
Queueing Theory
4.9
Extreme Value Theory
4.10
Change Detection
4.11
Stochastic Calculus
4.11.1
Itô’s Lemma
5
Regression Analysis
5.1
Ordinary Least Squares (OLS)
5.1.1
Variable Selection
5.1.2
Model Selection
5.2
Fixed Effect and Random Effect
5.3
Analysis of Covariance (ANCOVA)
5.4
Logistic Regression
5.5
Fractional Model
5.6
Isotonic regression
6
Categorical Data Analysis
6.1
Partial least squares regression (PLS)
7
Multivariate Analysis
7.1
Multivariate distributions
7.2
General Linear Model
7.3
Multivariate Analysis of Variance (MANOVA)
7.4
Multivariate Analysis of Covariance (MANCOVA)
7.5
Structural Equation Modeling (SEM)
7.6
Dimension Reduction Method
7.6.1
Random Projection
7.6.2
Linear Discriminant Analysis (LDA)
7.6.3
Principal Component Analysis (PCA)
7.6.4
SVD (Singular Value Decomposition)
7.6.5
Nonnegative Matrix Factorization (NMF)
7.6.6
t-SNE
7.6.7
Locally Linear Embedding
7.6.8
Independent Component Analysis (ICA)
7.6.9
Autoencoders
7.6.10
Laplacian Eigenmaps
7.6.11
ISOMAP
7.6.12
Uniform Manifold Approximation and Projection (UMAP)
7.7
Factor Analysis
7.7.1
Kaiser–Meyer–Olkin test
7.7.2
Questionnaire
7.8
Multidimensional Scaling (MDS)
7.9
Canonical-correlation Analysis (CCA)
7.10
Analysis of Similarities (ANOSIM)
8
Time Series Analysis
8.1
Time Series Decomposition
8.2
ACF and PACF
8.3
White Noise
8.4
Autoregressive (AR)
8.5
Moving Average (MA)
8.6
Kalman Filter and Savitzky–Golay filter
8.7
ARMA, ARIMA, SARIMA, SARFIMA
8.8
Granger causality
8.9
Nonlinear Time Series
8.9.1
Threshold Autoregressive (TAR) Model
8.9.2
GARCH
8.9.3
Smooth Transition Autoregressive (STAR) Model
8.9.4
Non-linear Moving Average (NMA) Model
8.9.5
Polynomial and Exponential Model
8.10
Multivariate Time Series
8.10.1
VAR
8.10.2
Factor Model
8.11
Some Advanced Topics
8.11.1
Lag regression
8.11.2
Mixed-frequency data
Part III: Methodology - Advanced
9
Generalized Linear Models
9.1
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
9.1.1
Rootogram
9.2
Complex Linear Model
9.3
Generalized Estimating Equation (GEE)
9.4
Hierarchical Linear Model
9.4.1
Instrumental variable
9.5
Multilevel Model
10
Spatial Statistics
10.1
Point-referenced Data
10.1.1
Gaussian Process
10.1.2
Exploratory data analysis
10.1.3
Models for spatial dependence
10.1.4
Kriging (Spatial prediction)
10.2
Areal/Lattice Data
10.2.1
Spatial autocorrelation
10.2.2
Conditional autoregressive (CAR) and simultaneous autoregressive (SAR) models
10.3
Point Pattern Data
10.3.1
Poisson processes
10.3.2
Cox processes
10.3.3
K-functions
10.4
Other Topics
10.4.1
Spatio-temporal models
10.4.2
Frequency domain methods
10.4.3
Deep Kriging
11
Functional Data Analysis
12
Bayesian Analysis
12.1
Laplace Approximation and BIC
13
High Dimensional Data Analysis
13.1
Curse of Dimensionality
Part IV: Methodology - Others
14
Nonparametric Method
14.1
Quantile Regression
14.2
LOESS
14.3
Curve estimation
14.3.1
Kernel
15
Directional Statistics
15.1
Circular Distribution
15.2
Circular Regression
16
Topological Data Analysis
Part V: Application - Industrial Statistics
17
Quality Control
17.1
History
17.2
7 tools
17.3
ARL
17.4
\(R\) chart
17.5
\(s\) chart
17.6
\(\bar{X}\) chart
17.7
\(p\) chart
17.8
CUSUM
17.9
EWMA
17.10
Sequential probability ratio test
18
Reliability Analysis
19
Design of Experiments
19.1
Latin hypercube
19.2
Sequential design
19.3
Space-filling design
19.4
Active learning (Optimal experimental design)
19.5
Online machine learning
Part VI: Application - Biostatistics
20
Biostatistical Data Analysis
20.1
p-value correction
20.1.1
Bonferroni
20.1.2
Tukey’s HSD
20.1.3
Fisher
20.1.4
False Discovery Rate (FDR)
20.1.5
Q-value
20.1.6
E-value
20.2
Trend Tests
20.2.1
Cochran-Armitage test
20.2.2
Jonckheere’s trend test
20.3
Propensity score
20.4
PLINK
20.5
Polygenic Risk Score
20.6
RNA-seq Analysis
20.7
Metabolomics Analysis
20.7.1
SMART
20.7.2
Pareto normalization
20.8
Permutational multivariate analysis of variance (PERMANOVA)
20.9
PERMDISP
20.10
Case Study
21
Survival Analysis
21.1
Unobserved data
21.2
Survival Function and Hazard Function
21.3
Kaplan–Meier Estimator
21.4
Log-rank Test
21.5
Proportional Hazards Model
21.6
Accelerated Failure Time (AFT) Model
21.7
Nelson–Aalen Estimator
21.8
Turnbull-Frydman Estimator
21.9
Restricted Mean Survival Time (RMST)
21.10
Firth’s penalized logistic regression
21.11
Competing Risks
22
Causal Inference
22.1
DAG
23
Clinical Trials
23.1
Phase I
23.2
Phase II
23.3
Phase III
23.4
\(\alpha\) spending function
Part VII: Application - Others
24
Financial Statistics
25
Social Statistics
25.1
Human Behaviour
25.2
Sport
26
Psychometrics
26.1
Item Response Theory
Part VIII: Computational Statistics
27
Statistical Learning
27.1
Root finding
27.1.1
Newton’s method (Newton–Raphson algorithm)
27.1.2
Gauss–Newton algorithm
27.1.3
Gradient Descent
27.1.4
Conjugate gradient method
27.1.5
Quasi-Newton method
27.1.6
Nelder–Mead method
27.1.7
Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
27.1.8
Davidon–Fletcher–Powell (DFP) formula
27.2
Model Selection
27.2.1
Information Criteria
27.3
Decision Tree and Random Forest
27.4
Bagging
27.5
Boosting
27.5.1
Gradient Boosting Decision Tree (GBDT)
27.5.2
XGBoost
27.5.3
LightGBM
27.5.4
CatBoost
27.5.5
RUSBoost
27.6
Clustering Method
27.6.1
K-means, K-medoids
27.6.2
Fuzzy C-means
27.6.3
KNN
27.6.4
Mean Shift
27.6.5
Hierarchical Clustering
27.6.6
DBSCAN
27.6.7
Principal Co-ordinates Analysis (PCoA)
27.6.8
Isolation Forest
27.6.9
Self-organizing map (SOM)
27.6.10
Spectral clustering
27.6.11
Quantum clustering
28
Statistical Computing
28.1
Generate random variables
28.1.1
Acceptance-rejection method
28.1.2
Importance Sampling
28.2
Variance reduction
28.3
Gibbs sampling
28.4
Metropolis-Hastings
28.5
Markov chain Monte Carlo (MCMC)
28.6
Statistics Algorithm
28.6.1
EM algorithm
28.6.2
Back-fitting algorithm
28.7
Optimization
28.7.1
Linear Programming
28.7.2
Convex Programming
28.7.3
Non-linear Programming
28.7.4
Integer Programming
28.7.5
Particle Swarm Optimization (PSO)
28.7.6
Approximate Bayesian computation
29
Advanced Machine Learning
29.1
Double Machine Learning
29.2
Adversarial machine learning (AML)
29.3
Reinforcement Learning
29.4
Curriculum learning
29.5
Rule-based machine learning
29.6
Online machine learning
29.7
Knowledge Distillation
29.8
Automated machine learning (AutoML)
29.9
Computational Learning Theory
29.9.1
Probably Approximately Correct (PAC)
30
Deep Learning
30.1
Basic concept
30.2
DNN
30.3
CNN
30.4
RNN
30.4.1
Long Short-Term Memory (LSTM)
30.4.2
Gated Recurrent Unit (GRU)
30.5
Autoencoders & Variational Autoencoders (VAEs)
30.6
Generative adversarial networks (GAN)
30.7
Transformer Networks
30.8
Graph Neural Networks (GNNs)
30.9
Physics-informed neural networks (PINNs)
30.10
Deep Q-Networks (DQNs)
30.11
Quantum neural network (QNN)
30.12
Some famous models
30.12.1
LeNet, AlexNet, VGG, NiN
30.12.2
GoogLeNet
30.12.3
ResNet
30.12.4
DenseNet
30.12.5
U-Net
30.12.6
YOLO
30.13
Modern NN models
30.13.1
Deep Operator Network
30.13.2
Liquid Neural Network (LNN)
30.13.3
Kolmogorov-Arnold Networks (KAN)
30.13.4
Large Language Model (LLM)
Part IX: Computer Science Skills
31
Data Structure and Algorithm
31.1
Data Structure
31.1.1
Linked list
31.1.2
Stack
31.1.3
Queue
31.1.4
Tree
31.2
Algorithm
31.2.1
Graph and tree traversal algorithms
32
Information Theory
32.1
Entropy
32.2
Data compression
33
Big Data Analytics Techniques and Applications
33.1
Visualization
33.2
Hadoop
33.3
Spark
34
Multimodal Data Analysis
34.1
Network Analysis
34.2
Image and Video Analysis
34.3
Audio Analysis
34.4
Text Mining
35
Quantum Computing
35.1
Basic Concept
35.2
Quantum Algorithm
35.3
Quantum Machine Learning
Part X: Data Communication
36
Data Processing
36.1
Data Collection
36.2
Data Preprocessing
36.2.1
Data Cleaning
36.2.2
Handling Missing Data
36.2.3
Normalization & Standardization
36.2.4
Feature Engineering
37
Data Visualization
37.1
Why do we need DataVis?
37.2
Visual Analytics
37.3
Sina plot
37.4
Radar chart
37.5
Parallel coordinates
37.6
Streamplots
37.7
Andrews plot
37.8
Spaghetti plot
37.9
Fish plot
37.10
Volcano plot
37.11
Circle Packing Chart
37.12
Chord diagram (information visualization)
37.13
Climate spiral and Warming stripes
37.14
Bland–Altman plot
37.15
Cherry Blossom Front
37.16
Symbolic data analysis (SDA)
37.17
Cartogram
37.18
Compositional data
38
Data Mining
38.1
Association rule learning
38.1.1
Apriori Algorithm
38.1.2
ECLAT Algorithm
38.1.3
FP-growth algorithm
38.2
Anomaly detection
39
Statistical Consulting
39.1
Garbage in, garbage out
Part XI: Data Integration
40
Data Management
40.1
Database
40.2
SQL
40.3
Data Compression
40.4
Data Integration
41
DataOps
42
Meta Analysis
42.1
Federated Learning
Part XII: Statistical Theory
43
Statistical Inference
43.1
Frequentist inference
43.1.1
Estimation
43.2
Bayesian inference
44
Probability Theory
44.1
Basics from Measure Theory
44.2
Limits of sets
44.3
Probability Inequalities
44.4
Bertrand Paradox
44.5
Stochastic ordering
44.6
Malliavin Calculus
44.7
Regular conditional probability
44.7.1
Markov kernel
44.8
Martingale
44.8.1
Reverse martingale
45
Algebraic Statistics
46
Free Probability Theory
Part XIII: Miscellaneous
47
Statistical Education
47.1
Stories
47.1.1
Buffon’s needle problem
47.1.2
Simpson’s paradox
47.1.3
Berkson’s paradox
47.1.4
Lindley’s paradox
47.1.5
Freedman’s paradox
47.1.6
Texas sharpshooter fallacy
47.1.7
Survivorship bias
47.1.8
All models are wrong
47.1.9
Stein’s phenomenon
47.1.10
German tank problem
47.1.11
Lindy effect
47.1.12
Doomsday argument
48
Ethics and Philosophy
48.1
Benford’s Law
48.2
Differential Privacy
48.3
Attack
Appendix
A
Matrix calculus
B
Advanced programming in R
B.1
Techniques for basic operators
B.2
Special operator
B.2.1
Inner function
B.2.2
Super assignment <<-
B.3
Pipe operator
B.3.1
User-defined pipe operator
B.4
Non-standard Evaluation (NSE)
B.4.1
Tidy evaluation
B.5
Functional programming
B.5.1
Helper function
B.6
Progress bar
B.7
Parallel computing
4.6
Renewal Theory
https://en.wikipedia.org/wiki/Renewal_theory