Mastering Statistics with R
Welcome
Prerequisites
Who am I?
Acknowledgement
Progress of this book
Part I: Foundations
1 Probability Concept
1.1 Introduction to Probability
1.1.1 What is probability?
1.1.2 Basic Mathematics
1.1.3 Set Theory
1.1.4 History of probability
1.1.5 Definitions of Probability
1.1.6 Conditional Probability and Independence
1.1.7 Bayes’ Theorem
1.2 Random variables
1.2.1 Random variables and probability functions
1.2.2 Expected values and Variance
1.2.3 Transformation of random variables
1.2.4 Families of distributions
1.3 Multivariate random variables
1.3.1 Joint distributions
1.3.2 Change of variables
1.3.3 Families of multivariate distributions
1.4 System of Moments
1.5 Limit Theorems
1.5.1 Some inequalities
1.5.2 Law of large numbers
2 Basic Statistics
2.1 Descriptive Statistics
2.1.1 Frequency distribution
2.1.2 Measures of statistical characteristics
2.1.3 Exploratory data analysis (EDA)
2.2 Sampling
2.2.1 Random sampling methods (probability sampling techniques)
2.2.2 Nonrandom sampling methods (non-probability sampling techniques)
2.2.3 Other sampling methods
2.3 Estimation
2.3.1 Point Estimation
2.3.2 Interval Estimation
2.4 Testing Hypotheses
2.4.1 Null hypothesis vs. alternative hypothesis
2.4.2 The Neyman-Pearson Lemma
2.5 Some statistical tests
2.5.1 Parametric statistical tests
2.5.2 Non-parametric statistical tests
2.6 Analysis of Variance (ANOVA)
2.6.1 Levene’s test
2.6.2 Bartlett’s test
2.6.3 One-way ANOVA
2.6.4 Two-way ANOVA
2.6.5 Welch’s ANOVA
2.6.6 Kruskal–Wallis test
2.6.7 Friedman test
2.6.8 Normality Test
2.7 Correlation Analysis and Linear Regression
3 Mathematical Statistics
3.1 Limiting Distributions
3.1.1 Convergence in probability
3.1.2 Convergence in distribution
4 Data Processing
4.1 Data Collection
4.2 Data Preprocessing
4.2.1 Data Cleaning
4.2.2 Handling Missing Data
4.2.3 Normalization & Standardization
4.2.4 Feature Engineering
Part II: Methodology - Beginner
5 Probability Models
5.1 Stochastic Process
5.1.1 Random Walk
5.1.2 Poisson Process
5.1.3 Markov Process
5.1.4 Wiener Process
5.1.5 Lévy Process
5.2 Markov Chain
5.2.1 Semi-Markov Chain
5.2.2 Hidden Markov Models
5.2.3 Mover-Stayer Models
5.3 Ergodic Theory
5.4 Stochastic Calculus
5.4.1 Ito’s Lemma
6 Regression Analysis
6.1 Ordinary Least Squares (OLS)
6.1.1 Variable Selection
6.1.2 Model Selection
6.2 Fixed Effect and Random Effect
6.3 Analysis of Covariance (ANCOVA)
6.4 Logistic Regression
6.5 Fractional Model
7 Categorical Data Analysis
8 Multivariate Analysis
8.1 Multivariate distributions
8.2 General Linear Model
8.3 Multivariate Analysis of Variance (MANOVA)
8.4 Multivariate Analysis of Covariance (MANCOVA)
8.5 Structural Equation Modeling (SEM)
8.6 Dimension Reduction Method
8.6.1 t-SNE
8.6.2 DBSCAN
8.6.3 Locally Linear Embedding
8.6.4 Laplacian Eigenmaps
8.6.5 ISOMAP
8.6.6 Uniform Manifold Approximation and Projection (UMAP)
8.7 Clustering Method
8.7.1 K-means, K-medoids
8.7.2 KNN
8.7.3 Principal Component Analysis (PCA)
8.7.4 Principal Co-ordinates Analysis (PCoA)
8.7.5 Multidimensional Scaling (MDS)
8.7.6 Self-organizing map (SOM)
8.7.7 Spectral clustering
8.7.8 Quantum clustering
8.7.9 Partial Least Squares Discriminant Analysis (PLS-DA)
8.7.10 Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
8.8 Factor Analysis
8.8.1 Kaiser–Meyer–Olkin test
8.8.2 Questionnaire
8.9 Canonical-correlation Analysis (CCA)
8.10 Analysis of Similarities (ANOSIM)
9 Time Series Analysis
9.1 Time Series Decomposition
9.2 ACF and PACF
9.3 White Noise
9.4 Autoregressive (AR)
9.5 Moving Average (MA)
9.6 Kalman Filter and Savitzky–Golay filter
9.7 ARMA, ARIMA, SARIMA, SARFIMA
9.8 Granger causality
9.9 VAR
9.10 GARCH
9.11 Factor Model
9.12 Some advanced topics
9.12.1 Lag regression
9.12.2 Mixed-frequency data
Part III: Methodology - Advanced
10 Generalized Linear Models
10.1 Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
10.1.1 Rootogram
10.2 Complex Linear Model
10.3 Generalized Estimating Equation (GEE)
10.4 Hierarchical Linear Model
10.4.1 Instrumental variable
10.5 Multilevel Model
11 Spatial Statistics
11.1 Point-referenced Data
11.1.1 Gaussian Process
11.1.2 Exploratory data analysis
11.1.3 Models for spatial dependence
11.1.4 Kriging (Spatial prediction)
11.2 Areal/Lattice Data
11.2.1 Spatial autocorrelation
11.2.2 Conditionally auto-regressive (CAR) and Simultaneously auto-regressive (SAR) models
11.3 Point Pattern Data
11.3.1 Poisson processes
11.3.2 Cox processes
11.3.3 K-functions
11.4 Other Topics
11.4.1 Spatio-temporal models
11.4.2 Frequency domain methods
11.4.3 Deep Kriging
12 Functional Data Analysis
13 Bayesian Analysis
13.1 Laplace Approximation and BIC
14 High Dimensional Data
Part IV: Methodology - Others
15 Non-parametric model
15.1 Quantile Regression
15.2 LOESS
15.3 Curve estimation
15.3.1 Kernel
16 Extreme value theory
17 Directional Statistics
17.1 Circular Regression
Part V: Industrial Statistics
18 Quality Control
18.1 History
18.2 7 tools
18.3 ARL
18.4 \(R\) chart
18.5 \(s\) chart
18.6 \(\bar{X}\) chart
18.7 \(p\) chart
18.8 CUSUM
18.9 EWMA
18.10 Sequential probability ratio test
19 Reliability Analysis
20 Design of Experiments
20.1 Latin hypercube
20.2 Sequential design
20.3 Space-filling design
20.4 Active learning (Optimal experimental design)
20.5 Online machine learning
Part VI: Biostatistics
21 Survival Analysis
21.1 Unobserved data
21.2 Survival Function and Hazard Function
21.3 Kaplan–Meier Estimator
21.4 Log-rank Test
21.5 Proportional Hazards Model
21.6 Accelerated Failure Time (AFT) Model
21.7 Nelson–Aalen Estimator
21.8 Turnbull-Frydman Estimator
21.9 Restricted Mean Survival Time (RMST)
21.10 Firth’s penalized logistic regression
21.11 Competing Risks
22 Biostatistical Data Analysis
22.1 p-value correction
22.1.1 Bonferroni
22.1.2 Tukey’s HSD
22.1.3 Fisher
22.1.4 False Discovery Rate (FDR)
22.1.5 Q-value
22.1.6 E-value
22.2 Trend Tests
22.2.1 Cochran-Armitage test
22.2.2 Jonckheere’s trend test
22.3 Propensity score
22.4 PLINK
22.5 Polygenic Risk Score
22.6 RNA-seq Analysis
22.7 Metabolomics Analysis
22.7.1 SMART
22.7.2 Pareto normalization
22.8 Permutational multivariate analysis of variance (PERMANOVA)
22.9 PERMDISP
23 Causal Inference
23.1 DAG
24 Statistical Designs and Analyses in Clinical Trials
24.1 Phase I
24.2 Phase II
24.3 Phase III
24.4 \(\alpha\) spending function
Part VII: Other Applications
25 Social science
25.1
26 Psychometrics
26.1 Item Response Theory
27 Industry
27.1 Degradation data
Part VIII: Computational Statistics
28 Statistical Learning
28.1 Root finding
28.1.1 Newton’s method (Newton–Raphson algorithm)
28.1.2 Gauss–Newton algorithm
28.1.3 Gradient Descent
28.1.4 Conjugate gradient method
28.1.5 Nelder–Mead method
28.1.6 Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
28.2 Information Criteria
28.2.1 AIC
28.2.2 BIC
28.3 Decision Tree and Random Forest
28.4 Bagging
28.5 Boosting
28.5.1 Gradient Boosting Decision Tree (GBDT)
28.5.2 XGBoost
28.5.3 LightGBM
28.5.4 CatBoost
28.5.5 RUSBoost
29 Statistical Computing
29.1 Generating random variables
29.1.1 Inverse transform method
29.1.2 Acceptance-rejection method
29.2 Variance reduction
29.3 Markov chain Monte Carlo (MCMC)
29.4 EM algorithm
29.5 Back-fitting algorithm
29.6 Particle Swarm Optimization (PSO)
Part IX: Dealing with Computer Science
30 Data Structure and Algorithm
30.1 Data Structure
30.1.1 Linked list
30.1.2 Stack
30.1.3 Queue
30.1.4 Tree
30.2 Algorithm
30.2.1 Graph and tree traversal algorithms
31 Information Theory
31.1 Entropy
31.2 Data compression
32 Machine Learning
32.1 Double Machine Learning
32.2 Adversarial machine learning (AML)
32.3 Reinforcement Learning
32.4 Curriculum learning
32.5 Rule-based machine learning
32.6 Online machine learning
32.7 Quantum Machine Learning
32.8 Computational Learning Theory
32.8.1 Probably Approximately Correct (PAC)
33 Big Data Analytics Techniques and Applications
33.1 Visualization
33.2 Hadoop
33.3 Spark
34 Image Processing
35 Deep Learning
35.1 Basic concept
35.2 DNN
35.3 CNN
35.4 RNN
35.4.1 Long Short-Term Memory (LSTM)
35.4.2 Gated Recurrent Unit (GRU)
35.5 Generative adversarial networks (GAN)
35.6 Transformer Networks
35.7 Autoencoders & Variational Autoencoders (VAEs)
35.8 Graph Neural Networks (GNNs)
35.9 Physics-informed neural networks (PINNs)
35.10 Deep Q-Networks (DQNs)
35.11 Quantum neural network (QNN)
35.12 Some famous models
35.12.1 LeNet, AlexNet, VGG, NiN
35.12.2 GoogLeNet
35.12.3 ResNet
35.12.4 DenseNet
35.12.5 YOLO
35.13 Modern NN models
35.13.1 Liquid Neural Network (LNN)
35.13.2 Kolmogorov-Arnold Networks (KAN)
Part X: Data Communication
36 Data Visualization
36.1 Visual Analytics
36.2 Radar chart
36.3 Parallel coordinates
36.4 Andrews plot
36.5 Fish plot
36.6 Circle Packing Chart
36.7 Chord diagram (information visualization)
36.8 Climate spiral and Warming stripes
36.9 Symbolic data analysis (SDA)
37 Meta-analysis
38 Data Mining
38.1 Association rule learning
38.2 Anomaly detection
38.3 Data Management
38.3.1 Database
38.3.2 SQL
38.3.3 Data Compression
38.3.4 Data Integration
39 Consulting in Statistics
Part XI: Statistical Theory
40 Statistical Inference
41 Decision Theory
41.1 Regret
42 Probability Theory
42.1 Basics from Measure Theory
42.2 Limits of sets
42.3 Probability Inequalities
42.4 Stochastic ordering
42.5 Malliavin Calculus
42.6 Regular conditional probability
42.6.1 Markov kernel
42.7 Martingale
42.7.1 Reverse martingale
Part XII: Miscellaneous
43 Statistical Education
44 Ethics and Philosophy
44.1 Differential Privacy
Appendix
A Matrix calculus
B Advanced programming in R
B.1 Techniques for basic operators
B.2 Special operators
B.2.1 Inner function
B.2.2 Super assignment <<-
B.3 Pipe operator
B.3.1 User-defined pipe operator
B.4 Non-standard Evaluation (NSE)
B.4.1 Tidy evaluation
B.5 Functional programming
B.5.1 Helper function
B.6 Progress bar
B.7 Parallel computing