Mastering Statistics with R
Welcome
Prerequisites
Who am I?
Acknowledgement
Progress of this book
Part I: Foundations
1
Probability Concept
1.1
Introduction to Probability
1.1.1
What is probability?
1.1.2
Basic Mathematics
1.1.3
Set Theory
1.1.4
History of probability
1.1.5
Definitions of Probability
1.1.6
Conditional Probability and Independence
1.1.7
Bayes’ Theorem
1.2
Random variables
1.2.1
Random variables and probability functions
1.2.2
Expected values and Variance
1.2.3
Transformation of random variables
1.2.4
Families of distributions
1.3
Multivariate random variables
1.3.1
Joint distributions
1.3.2
Change of variables
1.3.3
Families of multivariate distributions
1.4
System of Moments
1.5
Limit Theorem
1.5.1
Some inequalities
1.5.2
Law of large numbers
2
Basic Statistics
2.1
Descriptive Statistics
2.1.1
Frequency distribution
2.1.2
Measures of statistical characteristics
2.1.3
Exploratory data analysis (EDA)
2.2
Sampling
2.2.1
Random sampling methods (probability sampling techniques)
2.2.2
Nonrandom sampling methods (non-probability sampling techniques)
2.2.3
Other sampling methods
2.3
Estimation
2.3.1
Point Estimation
2.3.2
Interval Estimation
2.4
Testing Hypotheses
2.4.1
Null hypothesis vs. alternative hypothesis
2.4.2
The Neyman-Pearson Lemma
2.5
Some statistical tests
2.5.1
Parametric statistical tests
2.5.2
Non-parametric statistical tests
2.6
Analysis of Variance (ANOVA)
2.6.1
Levene’s test
2.6.2
Bartlett’s test
2.6.3
One-way ANOVA
2.6.4
Two-way ANOVA
2.6.5
Welch’s ANOVA
2.6.6
Kruskal–Wallis test
2.6.7
Friedman test
2.6.8
Normality Test
2.7
Correlation Analysis and Linear Regression
3
Mathematical Statistics
3.1
Limiting Distributions
3.1.1
Convergence in probability
3.1.2
Convergence in distribution
4
Data Processing
4.1
Data Collection
4.2
Data Preprocessing
4.2.1
Data Cleaning
4.2.2
Handling Missing Data
4.2.3
Normalization & Standardization
4.2.4
Feature Engineering
Part II: Methodology - Beginner
5
Probability Models
5.1
Stochastic Process
5.1.1
Random Walk
5.1.2
Poisson Process
5.1.3
Markov Process
5.1.4
Wiener Process
5.1.5
Lévy Process
5.2
Markov Chain
5.2.1
Semi-Markov Chain
5.2.2
Hidden Markov Models
5.2.3
Mover-Stayer Models
5.3
Ergodic Theory
5.4
Degradation data
5.5
Stochastic Calculus
5.5.1
Ito’s Lemma
6
Regression Analysis
6.1
Ordinary Least Squares (OLS)
6.1.1
Variable Selection
6.1.2
Model Selection
6.2
Fixed Effect and Random Effect
6.3
Analysis of Covariance (ANCOVA)
6.4
Logistic Regression
6.5
Fractional Model
7
Categorical Data Analysis
8
Multivariate Analysis
8.1
Multivariate distributions
8.2
General Linear Model
8.3
Multivariate Analysis of Variance (MANOVA)
8.4
Multivariate Analysis of Covariance (MANCOVA)
8.5
Structural Equation Modeling (SEM)
8.6
Dimension Reduction Method
8.6.1
t-SNE
8.6.2
DBSCAN
8.6.3
Locally Linear Embedding
8.6.4
Laplacian Eigenmaps
8.6.5
ISOMAP
8.6.6
Uniform Manifold Approximation and Projection (UMAP)
8.7
Clustering Method
8.7.1
K-means, K-medoids
8.7.2
KNN
8.7.3
Principal Component Analysis (PCA)
8.7.4
Principal Co-ordinates Analysis (PCoA)
8.7.5
Multidimensional Scaling (MDS)
8.7.6
Self-organizing map (SOM)
8.7.7
Spectral clustering
8.7.8
Quantum clustering
8.7.9
Partial Least Squares Discriminant Analysis (PLS-DA)
8.7.10
Unweighted Paired-Group Method Using Arithmetic Means (UPGMA)
8.8
Factor Analysis
8.8.1
Kaiser–Meyer–Olkin test
8.8.2
Questionnaire
8.9
Canonical-correlation Analysis (CCA)
8.10
Analysis of Similarities (ANOSIM)
9
Time Series Analysis
9.1
Time Series Decomposition
9.2
ACF and PACF
9.3
White Noise
9.4
Autoregressive (AR)
9.5
Moving Average (MA)
9.6
Kalman Filter and Savitzky–Golay filter
9.7
ARMA, ARIMA, SARIMA, SARFIMA
9.8
Granger causality
9.9
VAR
9.10
GARCH
9.11
Factor Model
9.12
Some advanced topics
9.12.1
Lag regression
9.12.2
Mixed-frequency data
Part III: Methodology - Advanced
10
Generalized Linear Models
10.1
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
10.1.1
Rootogram
10.2
Complex Linear Model
10.3
Generalized Estimating Equation (GEE)
10.4
Hierarchical Linear Model
10.4.1
Instrumental variable
10.5
Multilevel Model
11
Spatial Statistics
11.1
Point-referenced Data
11.1.1
Gaussian Process
11.1.2
Exploratory data analysis
11.1.3
Models for spatial dependence
11.1.4
Kriging (Spatial prediction)
11.2
Areal/Lattice Data
11.2.1
Spatial autocorrelation
11.2.2
Conditionally auto-regressive (CAR) and Simultaneously auto-regressive (SAR) models
11.3
Point Pattern Data
11.3.1
Poisson processes
11.3.2
Cox processes
11.3.3
K-functions
11.4
Other Topics
11.4.1
Spatio-temporal models
11.4.2
Frequency domain methods
11.4.3
Deep Kriging
12
Functional Data Analysis
13
Bayesian Analysis
13.1
Laplace Approximation and BIC
14
High Dimensional Data
Part IV: Methodology - Others
15
Non-parametric model
15.1
Quantile Regression
15.2
LOESS
15.3
Curve estimation
15.3.1
Kernel
16
Extreme value theory
17
Directional Statistics
17.1
Circular Regression
18
Topological Data Analysis
Part V: Application - Industrial Statistics
19
Quality Control
19.1
History
19.2
7 tools
19.3
ARL
19.4
\(R\) chart
19.5
\(s\) chart
19.6
\(\bar{X}\) chart
19.7
\(p\) chart
19.8
CUSUM
19.9
EWMA
19.10
Sequential probability ratio test
20
Reliability Analysis
21
Design of Experiments
21.1
Latin hypercube
21.2
Sequential design
21.3
Space-filling design
21.4
Active learning (Optimal experimental design)
21.5
Online machine learning
Part VI: Application - Biostatistics
22
Biostatistical Data Analysis
22.1
p-value correction
22.1.1
Bonferroni
22.1.2
Tukey’s HSD
22.1.3
Fisher
22.1.4
False Discovery Rate (FDR)
22.1.5
Q-value
22.1.6
E-value
22.2
Trend Tests
22.2.1
Cochran-Armitage test
22.2.2
Jonckheere’s trend test
22.3
Propensity score
22.4
PLINK
22.5
Polygenic Risk Score
22.6
RNA-seq Analysis
22.7
Metabolomics Analysis
22.7.1
SMART
22.7.2
Pareto normalization
22.8
Permutational multivariate analysis of variance (PERMANOVA)
22.9
PERMDISP
23
Survival Analysis
23.1
Unobserved data
23.2
Survival Function and Hazard Function
23.3
Kaplan–Meier Estimator
23.4
Log-rank Test
23.5
Proportional Hazards Model
23.6
Accelerated Failure Time (AFT) Model
23.7
Nelson–Aalen Estimator
23.8
Turnbull-Frydman Estimator
23.9
Restricted Mean Survival Time (RMST)
23.10
Firth’s penalized logistic regression
23.11
Competing Risks
24
Causal Inference
24.1
DAG
25
Statistical Designs and Analyses in Clinical Trials
25.1
Phase I
25.2
Phase II
25.3
Phase III
25.4
\(\alpha\) spending function
Part VII: Application - Others
26
Finance
27
Social science
27.1
28
Psychometrics
28.1
Item Response Theory
29
Sport
Part VIII: Computational Statistics
30
Statistical Learning
30.1
Root finding
30.1.1
Newton’s method (Newton–Raphson algorithm)
30.1.2
Gauss–Newton algorithm
30.1.3
Gradient Descent
30.1.4
Conjugate gradient method
30.1.5
Nelder–Mead method
30.1.6
Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
30.2
Information Criteria
30.2.1
AIC
30.2.2
BIC
30.3
Decision Tree and Random Forest
30.4
Bagging
30.5
Boosting
30.5.1
Gradient Boosting Decision Tree (GBDT)
30.5.2
XGBoost
30.5.3
LightGBM
30.5.4
CatBoost
30.5.5
RUSBoost
31
Statistical Computing
31.1
Generate random variables
31.1.1
Inverse transform method
31.1.2
Acceptance-rejection method
31.1.3
Importance Sampling
31.2
Variance reduction
31.3
Gibbs sampling
31.4
Metropolis-Hastings
31.5
Markov chain Monte Carlo (MCMC)
31.6
EM algorithm
31.7
Back-fitting algorithm
31.8
Particle Swarm Optimization (PSO)
Part IX: Dealing with Computer Science
32
Data Structure and Algorithm
32.1
Data Structure
32.1.1
Linked list
32.1.2
Stack
32.1.3
Queue
32.1.4
Tree
32.2
Algorithm
32.2.1
Graph and tree traversal algorithms
33
Information Theory
33.1
Entropy
33.2
Data compression
34
Machine Learning
34.1
Double Machine Learning
34.2
Adversarial machine learning (AML)
34.3
Reinforcement Learning
34.4
Curriculum learning
34.5
Rule-based machine learning
34.6
Online machine learning
34.7
Quantum Machine Learning
34.8
Computational Learning Theory
34.8.1
Probably Approximately Correct (PAC)
35
Big Data Analytics Techniques and Applications
35.1
Visualization
35.2
Hadoop
35.3
Spark
36
Text Mining
37
Image Processing
38
Audio Analysis
39
Network Analysis
40
Deep Learning
40.1
Basic concept
40.2
DNN
40.3
CNN
40.4
RNN
40.4.1
Long Short-Term Memory (LSTM)
40.4.2
Gated Recurrent Unit (GRU)
40.5
Generative adversarial networks (GAN)
40.6
Transformer Networks
40.7
Autoencoders & Variational Autoencoders (VAEs)
40.8
Graph Neural Networks (GNNs)
40.9
Physics-informed neural networks (PINNs)
40.10
Deep Q-Networks (DQNs)
40.11
Quantum neural network (QNN)
40.12
Some famous models
40.12.1
LeNet, AlexNet, VGG, NiN
40.12.2
GoogLeNet
40.12.3
ResNet
40.12.4
DenseNet
40.12.5
YOLO
40.13
Modern NN models
40.13.1
Liquid Neural Network (LNN)
40.13.2
Kolmogorov-Arnold Networks (KAN)
Part X: Data Communication
41
Data Visualization
41.1
Visual Analytics
41.2
Radar chart
41.3
Parallel coordinates
41.4
Andrews plot
41.5
Fish plot
41.6
Circle Packing Chart
41.7
Chord diagram (information visualization)
41.8
Climate spiral and Warming stripes
41.9
Symbolic data analysis (SDA)
42
Meta-analysis
43
Data Mining
43.1
Association rule learning
43.2
Anomaly detection
43.3
Data Management
43.3.1
Database
43.3.2
SQL
43.3.3
Data Compression
43.3.4
Data Integration
44
Consulting in Statistics
Part XI: Statistical Theory
45
Statistical Inference
46
Decision Theory
46.1
Regret
47
Probability Theory
47.1
Basics from Measure Theory
47.2
Limit of the sets
47.3
Probability Inequalities
47.4
Stochastic ordering
47.5
Malliavin Calculus
47.6
Regular conditional probability
47.6.1
Markov kernel
47.7
Martingale
47.7.1
Reverse martingale
48
Algebraic Statistics
49
Free Probability Theory
Part XII: Miscellaneous
50
Statistical Education
51
Ethics and Philosophy
51.1
Differential Privacy
Appendix
A
Matrix calculus
B
Advanced programming in R
B.1
Techniques for basic operators
B.2
Special operator
B.2.1
Inner function
B.2.2
Super assignment <<-
B.3
Pipe operator
B.3.1
User-defined pipe operator
B.4
Non-standard Evaluation (NSE)
B.4.1
Tidy evaluation
B.5
Functional programming
B.5.1
Helper function
B.6
Progress bar
B.7
Parallel computing
Published with bookdown
Chapter 27 Social science