Mastering Statistics with R
Welcome
Prerequisites
Who am I?
Acknowledgement
Progress of this book
Part I: Foundations
1
Probability Concepts
1.1
Introduction to Probability
1.1.1
What is probability?
1.1.2
Basic Mathematics
1.1.3
Set Theory
1.1.4
History of probability
1.1.5
Definitions of Probability
1.1.6
Conditional Probability and Independence
1.1.7
Bayes’ Theorem
1.2
Random variables
1.2.1
Random variables and probability functions
1.2.2
Expected values and Variance
1.2.3
Transformation of random variables
1.2.4
Families of distributions
1.3
Multivariate random variables
1.3.1
Joint distributions
1.3.2
Change of variables
1.3.3
Families of multivariate distributions
1.4
System of Moments
1.5
Limit Theorems
1.5.1
Some inequalities
1.5.2
Law of large numbers
2
Basic Statistics
2.1
Descriptive Statistics
2.1.1
Frequency distribution
2.1.2
Measures of statistical characteristics
2.1.3
Exploratory data analysis (EDA)
2.2
Sampling
2.2.1
Random sampling methods (probability sampling techniques)
2.2.2
Nonrandom sampling methods (non-probability sampling techniques)
2.2.3
Other sampling methods
2.3
Estimation
2.3.1
Point Estimation
2.3.2
Interval Estimation
2.4
Testing Hypotheses
2.4.1
Null hypothesis vs. alternative hypothesis
2.4.2
The Neyman-Pearson Lemma
2.5
Some statistical tests
2.5.1
Parametric statistical tests
2.5.2
Non-parametric statistical tests
2.6
Analysis of Variance (ANOVA)
2.6.1
Levene’s test
2.6.2
Bartlett’s test
2.6.3
One-way ANOVA
2.6.4
Two-way ANOVA
2.6.5
Welch’s ANOVA
2.6.6
Kruskal–Wallis test
2.6.7
Friedman test
2.6.8
Normality Test
2.7
Correlation Analysis and Linear Regression
3
Mathematical Statistics
3.1
Limiting Distributions
3.1.1
Convergence in probability
3.1.2
Convergence in distribution
Part II: Methodology - Beginner
4
Probability Models
4.1
Stochastic Process
4.1.1
Random Walk
4.1.2
Poisson Process
4.1.3
Markov Process
4.1.4
Wiener Process
4.1.5
Lévy Process
4.2
Markov Chain
4.2.1
Semi-Markov Chain
4.2.2
Hidden Markov Models
4.2.3
Mover-Stayer Models
4.3
Ergodic Theory
4.4
Degradation data
4.5
Stochastic Calculus
4.5.1
Itô’s Lemma
5
Regression Analysis
5.1
Ordinary Least Squares (OLS)
5.1.1
Variable Selection
5.1.2
Model Selection
5.2
Fixed Effects and Random Effects
5.3
Analysis of Covariance (ANCOVA)
5.4
Logistic Regression
5.5
Fractional Model
6
Categorical Data Analysis
7
Multivariate Analysis
7.1
Multivariate distributions
7.2
General Linear Model
7.3
Multivariate Analysis of Variance (MANOVA)
7.4
Multivariate Analysis of Covariance (MANCOVA)
7.5
Structural Equation Modeling (SEM)
7.6
Dimension Reduction Methods
7.6.1
t-SNE
7.6.2
DBSCAN
7.6.3
Locally Linear Embedding
7.6.4
Laplacian Eigenmaps
7.6.5
ISOMAP
7.6.6
Uniform Manifold Approximation and Projection (UMAP)
7.7
Clustering Methods
7.7.1
K-means, K-medoids
7.7.2
KNN
7.7.3
Principal Component Analysis (PCA)
7.7.4
Principal Co-ordinates Analysis (PCoA)
7.7.5
Multidimensional Scaling (MDS)
7.7.6
Self-organizing map (SOM)
7.7.7
Spectral clustering
7.7.8
Quantum clustering
7.7.9
Partial Least Squares Discriminant Analysis (PLS-DA)
7.7.10
Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
7.8
Factor Analysis
7.8.1
Kaiser–Meyer–Olkin test
7.8.2
Questionnaire
7.9
Canonical-correlation Analysis (CCA)
7.10
Analysis of Similarities (ANOSIM)
8
Time Series Analysis
8.1
Time Series Decomposition
8.2
ACF and PACF
8.3
White Noise
8.4
Autoregressive (AR)
8.5
Moving Average (MA)
8.6
Kalman Filter and Savitzky–Golay filter
8.7
ARMA, ARIMA, SARIMA, SARFIMA
8.8
Granger causality
8.9
VAR
8.10
GARCH
8.11
Factor Model
8.12
Some advanced topics
8.12.1
Lag regression
8.12.2
Mixed-frequency data
Part III: Methodology - Advanced
9
Generalized Linear Models
9.1
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
9.1.1
Rootogram
9.2
Complex Linear Model
9.3
Generalized Estimating Equations (GEE)
9.4
Hierarchical Linear Model
9.4.1
Instrumental variable
9.5
Multilevel Model
10
Spatial Statistics
10.1
Point-referenced Data
10.1.1
Gaussian Process
10.1.2
Exploratory data analysis
10.1.3
Models for spatial dependence
10.1.4
Kriging (Spatial prediction)
10.2
Areal/Lattice Data
10.2.1
Spatial autocorrelation
10.2.2
Conditionally auto-regressive (CAR) and Simultaneously auto-regressive (SAR) models
10.3
Point Pattern Data
10.3.1
Poisson processes
10.3.2
Cox processes
10.3.3
K-functions
10.4
Other Topics
10.4.1
Spatio-temporal models
10.4.2
Frequency domain methods
10.4.3
Deep Kriging
11
Functional Data Analysis
12
Bayesian Analysis
12.1
Laplace Approximation and BIC
13
High Dimensional Data
Part IV: Methodology - Others
14
Non-parametric model
14.1
Quantile Regression
14.2
LOESS
14.3
Curve estimation
14.3.1
Kernel
15
Extreme value theory
16
Directional Statistics
16.1
Circular Regression
17
Topological Data Analysis
Part V: Application - Industrial Statistics
18
Quality Control
18.1
History
18.2
7 tools
18.3
ARL
18.4
\(R\) chart
18.5
\(s\) chart
18.6
\(\bar{X}\) chart
18.7
\(p\) chart
18.8
CUSUM
18.9
EWMA
18.10
Sequential probability ratio test
19
Reliability Analysis
20
Design of Experiments
20.1
Latin hypercube
20.2
Sequential design
20.3
Space-filling design
20.4
Active learning (Optimal experimental design)
20.5
Online machine learning
Part VI: Application - Biostatistics
21
Biostatistical Data Analysis
21.1
p-value correction
21.1.1
Bonferroni
21.1.2
Tukey’s HSD
21.1.3
Fisher
21.1.4
False Discovery Rate (FDR)
21.1.5
Q-value
21.1.6
E-value
21.2
Trend Tests
21.2.1
Cochran-Armitage test
21.2.2
Jonckheere’s trend test
21.3
Propensity score
21.4
PLINK
21.5
Polygenic Risk Score
21.6
RNA-seq Analysis
21.7
Metabolomics Analysis
21.7.1
SMART
21.7.2
Pareto normalization
21.8
Permutational multivariate analysis of variance (PERMANOVA)
21.9
PERMDISP
22
Survival Analysis
22.1
Unobserved data
22.2
Survival Function and Hazard Function
22.3
Kaplan–Meier Estimator
22.4
Log-rank Test
22.5
Proportional Hazards Model
22.6
Accelerated Failure Time (AFT) Model
22.7
Nelson–Aalen Estimator
22.8
Turnbull-Frydman Estimator
22.9
Restricted Mean Survival Time (RMST)
22.10
Firth’s penalized logistic regression
22.11
Competing Risks
23
Causal Inference
23.1
DAG
24
Statistical Designs and Analyses in Clinical Trials
24.1
Phase I
24.2
Phase II
24.3
Phase III
24.4
\(\alpha\) spending function
Part VII: Application - Others
25
Financial Statistics
26
Social Statistics
26.1
Human Behaviour
26.2
Sport
27
Psychometrics
27.1
Item Response Theory
Part VIII: Computational Statistics
28
Statistical Learning
28.1
Root finding
28.1.1
Newton’s method (Newton–Raphson algorithm)
28.1.2
Gauss–Newton algorithm
28.1.3
Gradient Descent
28.1.4
Conjugate gradient method
28.1.5
Nelder–Mead method
28.1.6
Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
28.2
Information Criteria
28.2.1
AIC
28.2.2
BIC
28.3
Decision Tree and Random Forest
28.4
Bagging
28.5
Boosting
28.5.1
Gradient Boosting Decision Tree (GBDT)
28.5.2
XGBoost
28.5.3
LightGBM
28.5.4
CatBoost
28.5.5
RUSBoost
29
Statistical Computing
29.1
Generate random variables
29.1.1
Inverse transform method
29.1.2
Acceptance-rejection method
29.1.3
Importance Sampling
29.2
Variance reduction
29.3
Gibbs sampling
29.4
Metropolis-Hastings
29.5
Markov chain Monte Carlo (MCMC)
29.6
EM algorithm
29.7
Back-fitting algorithm
29.8
Particle Swarm Optimization (PSO)
30
Advanced Machine Learning
30.1
Double Machine Learning
30.2
Adversarial machine learning (AML)
30.3
Reinforcement Learning
30.4
Curriculum learning
30.5
Rule-based machine learning
30.6
Online machine learning
30.7
Quantum Machine Learning
30.8
Computational Learning Theory
30.8.1
Probably Approximately Correct (PAC)
31
Deep Learning
31.1
Basic concept
31.2
DNN
31.3
CNN
31.4
RNN
31.4.1
Long Short-Term Memory (LSTM)
31.4.2
Gated Recurrent Unit (GRU)
31.5
Generative adversarial networks (GAN)
31.6
Transformer Networks
31.7
Autoencoders & Variational Autoencoders (VAEs)
31.8
Graph Neural Networks (GNNs)
31.9
Physics-informed neural networks (PINNs)
31.10
Deep Q-Networks (DQNs)
31.11
Quantum neural network (QNN)
31.12
Some famous models
31.12.1
LeNet, AlexNet, VGG, NiN
31.12.2
GoogLeNet
31.12.3
ResNet
31.12.4
DenseNet
31.12.5
YOLO
31.13
Modern NN models
31.13.1
Liquid Neural Network (LNN)
31.13.2
Kolmogorov-Arnold Networks (KAN)
Part IX: Deal with Computer Science
32
Data Structure and Algorithm
32.1
Data Structure
32.1.1
Linked list
32.1.2
Stack
32.1.3
Queue
32.1.4
Tree
32.2
Algorithm
32.2.1
Graph and tree traversal algorithms
33
Information Theory
33.1
Entropy
33.2
Data compression
34
Big Data Analytics Techniques and Applications
34.1
Visualization
34.2
Hadoop
34.3
Spark
35
Multimodal Data Analysis
35.1
Network Analysis
35.2
Image and Video Analysis
35.3
Audio Analysis
35.4
Text Mining
Part X: Data Communication
36
Data Processing
36.1
Data Collection
36.2
Data Preprocessing
36.2.1
Data Cleaning
36.2.2
Handling Missing Data
36.2.3
Normalization & Standardization
36.2.4
Feature Engineering
37
Data Visualization
37.1
Visual Analytics
37.2
Radar chart
37.3
Parallel coordinates
37.4
Andrews plot
37.5
Fish plot
37.6
Circle Packing Chart
37.7
Chord diagram (information visualization)
37.8
Climate spiral and Warming stripes
37.9
Symbolic data analysis (SDA)
38
Data Mining
38.1
Association rule learning
38.2
Anomaly detection
39
Statistical Consulting
Part XI: Data Integration
40
Data Management
40.1
Database
40.2
SQL
40.3
Data Compression
40.4
Data Integration
41
DataOps
42
Meta Analysis
42.1
Federated Learning
Part XII: Statistical Theory
43
Statistical Inference
43.1
Frequentist inference
43.2
Bayesian inference
44
Decision Theory
44.1
Regret
45
Probability Theory
45.1
Basics from Measure Theory
45.2
Limits of sets
45.3
Probability Inequalities
45.4
Stochastic ordering
45.5
Malliavin Calculus
45.6
Regular conditional probability
45.6.1
Markov kernel
45.7
Martingale
45.7.1
Reverse martingale
46
Algebraic Statistics
47
Free Probability Theory
Part XIII: Miscellaneous
48
Statistical Education
49
Ethics and Philosophy
49.1
Differential Privacy
Appendix
A
Matrix calculus
B
Advanced programming in R
B.1
Techniques for basic operators
B.2
Special operator
B.2.1
Inner function
B.2.2
Super assignment <<-
B.3
Pipe operator
B.3.1
User-defined pipe operator
B.4
Non-standard Evaluation (NSE)
B.4.1
Tidy evaluation
B.5
Functional programming
B.5.1
Helper function
B.6
Progress bar
B.7
Parallel computing
20.4
Active learning (Optimal experimental design)