Mastering Statistics with R
Welcome!
Preface
Acknowledgement
Progress of this book
How to use this book?
Part I: Foundations
1
Probability Concept
1.1
Introduction to Probability
1.1.1
What is probability?
1.1.2
Basic Mathematics
1.1.3
History of probability
1.1.4
Definitions of Probability
1.1.5
Conditional Probability and Independence
1.1.6
Bayes’ Theorem
1.2
Random Variables
1.2.1
Random variables and probability functions
1.2.2
Expected values and Variance
1.2.3
Transformation of random variables
1.2.4
Families of distributions
1.3
Multivariate Random Variables
1.3.1
Joint distributions
1.3.2
Change of variables
1.3.3
Families of multivariate distributions
1.4
System of Moments
1.5
Limit Theorems
1.5.1
Some inequalities
1.5.2
Law of large numbers
2
Elementary Statistics
2.1
Descriptive Statistics
2.1.1
Frequency distribution
2.1.2
Measures of statistical characteristics
2.1.3
Exploratory data analysis (EDA)
2.2
Sampling
2.2.1
Random sampling methods (probability sampling techniques)
2.2.2
Nonrandom sampling methods (non-probability sampling techniques)
2.2.3
Other sampling methods
2.3
Estimation
2.3.1
Point Estimation
2.3.2
Interval Estimation
2.4
Testing Hypotheses
2.4.1
Null hypothesis vs. alternative hypothesis
2.4.2
The Neyman-Pearson Lemma
2.5
Some statistical tests
2.5.1
Parametric statistical tests
2.5.2
Non-parametric statistical tests
2.6
Analysis of Variance (ANOVA)
2.6.1
Levene’s test
2.6.2
Bartlett’s test
2.6.3
One-way ANOVA
2.6.4
Two-way ANOVA
2.6.5
Welch’s ANOVA
2.6.6
Kruskal–Wallis test
2.6.7
Friedman test
2.6.8
Normality Test
2.7
Correlation Analysis and Linear Regression
3
Mathematical Statistics
3.1
Properties of estimators
3.1.1
Uniformly Minimum Variance Unbiased Estimator (UMVUE)
3.2
Limiting Distributions
3.2.1
Convergence in probability
3.2.2
Convergence in distribution
3.3
Asymptotic Theory
3.4
Hypothesis Testing Theory
3.4.1
MP test and UMP test
3.4.2
Monotone likelihood ratio (MLR)
3.4.3
LR-test, GLRT
3.4.4
Sequential probability ratio test (SPRT)
3.5
Decision Theory
3.5.1
Regret
3.6
Bayesian Tests
Part II: Methodology - Beginner
4
Probability Models
4.1
Review of Probability Computation
4.2
Stochastic Process
4.2.1
Discrete-time stochastic process
4.2.2
Continuous-time stochastic process
4.2.3
Random Walk
4.2.4
Poisson Process
4.2.5
Markov Process
4.2.6
Wiener Process
4.2.7
Lévy Process
4.3
Markov Chain
4.3.1
Semi-Markov Chain
4.3.2
Hidden Markov Models
4.3.3
Mover-Stayer Models
4.4
Ergodic Theory
4.5
Degradation data
4.6
Renewal Theory
4.7
Ruin Theory
4.8
Extreme Value Theory
4.9
Change Detection
4.10
Stochastic Calculus
4.10.1
Itô's Lemma
5
Regression Analysis
5.1
Ordinary Least Squares (OLS)
5.1.1
Variable Selection
5.1.2
Model Selection
5.2
Fixed Effect and Random Effect
5.3
Analysis of Covariance (ANCOVA)
5.4
Logistic Regression
5.5
Fractional Model
5.6
Isotonic regression
6
Categorical Data Analysis
6.1
Partial least squares regression (PLS)
7
Multivariate Analysis
7.1
Multivariate distributions
7.2
General Linear Model
7.3
Multivariate Analysis of Variance (MANOVA)
7.4
Multivariate Analysis of Covariance (MANCOVA)
7.5
Structural Equation Modeling (SEM)
7.6
Statistical distance
7.7
Dimension Reduction Method
7.7.1
Random Projection
7.7.2
Linear Discriminant Analysis (LDA)
7.7.3
SVD (Singular Value Decomposition)
7.7.4
Principal Component Analysis (PCA)
7.7.5
Nonnegative Matrix Factorization (NMF)
7.7.6
t-SNE
7.7.7
Locally Linear Embedding
7.7.8
Independent Component Analysis (ICA)
7.7.9
Autoencoders
7.7.10
Laplacian Eigenmaps
7.7.11
ISOMAP
7.7.12
Uniform Manifold Approximation and Projection (UMAP)
7.7.13
Self-organizing map
7.7.14
Dynamic mode decomposition
7.8
Factor Analysis
7.8.1
Kaiser–Meyer–Olkin test
7.8.2
Questionnaire
7.9
Multidimensional Scaling (MDS)
7.10
Canonical Correlation Analysis (CCA)
7.11
Analysis of Similarities (ANOSIM)
8
Time Series Analysis
8.1
Time Series Decomposition
8.2
ACF and PACF
8.3
White Noise
8.4
Autoregressive (AR)
8.5
Moving Average (MA)
8.6
Kalman Filter and Savitzky–Golay filter
8.7
ARMA, ARIMA, SARIMA, SARFIMA
8.8
Granger causality
8.9
Nonlinear Time Series
8.9.1
Threshold Autoregressive (TAR) Model
8.9.2
GARCH
8.9.3
Smooth Transition Autoregressive (STAR) Model
8.9.4
Non-linear Moving Average (NMA) Model
8.9.5
Polynomial and Exponential Model
8.10
Multivariate Time Series
8.10.1
VAR
8.10.2
Factor Model
8.11
Some Advanced Topics
8.11.1
Lag regression
8.11.2
Mixed-frequency data
Part III: Methodology - Advanced
9
Generalized Linear Models
9.1
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
9.1.1
Rootogram
9.2
Complex Linear Model
9.3
Generalized Estimating Equation (GEE)
9.4
Hierarchical Linear Model
9.4.1
Instrumental variable
9.5
Multilevel Model
10
Spatial Statistics
10.1
Point-referenced Data
10.1.1
Gaussian Process
10.1.2
Exploratory data analysis
10.1.3
Models for spatial dependence
10.1.4
Kriging (Spatial prediction)
10.2
Areal/Lattice Data
10.2.1
Spatial autocorrelation
10.2.2
Conditionally auto-regressive (CAR) and Simultaneously auto-regressive (SAR) models
10.3
Point Pattern Data
10.3.1
Poisson processes
10.3.2
Cox processes
10.3.3
K-functions
10.4
Other Topics
10.4.1
Spatio-temporal models
10.4.2
Frequency domain methods
10.4.3
Deep Kriging
11
Functional Data Analysis
12
Bayesian Analysis
12.1
Laplace Approximation and BIC
13
High Dimensional Data Analysis
13.1
Curse of Dimensionality
Part IV: Methodology - Others
14
Nonparametric Method
14.1
Nonparametric tests
14.2
Quantile Regression
14.3
LOESS
14.4
Isotonic regression
14.5
Convex Regression
14.6
Curve estimation
14.6.1
Kernel
14.7
Shape-Constrained Inference
15
Directional Statistics
15.1
Circular Distribution
15.2
Circular Regression
16
Geometric and Topological Data Analysis
16.1
Compositional data
Part VI: Application - Biostatistics
17
Biostatistical Data Analysis
17.1
p-value correction
17.1.1
Bonferroni
17.1.2
Tukey’s HSD
17.1.3
Fisher
17.1.4
False Discovery Rate (FDR)
17.1.5
Q-value
17.1.6
E-value
17.2
Trend Tests
17.2.1
Cochran-Armitage test
17.2.2
Jonckheere’s trend test
17.3
Propensity score
17.4
PLINK
17.5
Polygenic Risk Score
17.6
RNA-seq Analysis
17.7
Metabolomics Analysis
17.7.1
SMART
17.7.2
Pareto normalization
17.8
Permutational multivariate analysis of variance (PERMANOVA)
17.9
PERMDISP
17.10
Case Study
18
Clinical Trials
18.1
Phase I
18.2
Phase II
18.3
Phase III
18.4
α spending function
19
Survival Analysis
19.1
Unobserved data
19.2
Survival Function and Hazard Function
19.3
Kaplan–Meier Estimator
19.4
Log-rank Test
19.5
Proportional Hazards Model
19.6
Accelerated Failure Time (AFT) Model
19.7
Nelson–Aalen Estimator
19.8
Turnbull-Frydman Estimator
19.9
Restricted Mean Survival Time (RMST)
19.10
Firth’s penalized logistic regression
19.11
Competing Risks
20
Causal Inference
20.1
Prerequisite Knowledge
20.2
Causal diagrams
20.3
Counterfactual model
20.4
Identification and causal assumptions
20.5
Estimation and modeling
20.6
Mediation analysis
20.7
Moderation, Effect Measure Modification, and Interaction
20.8
Time-varying system
Part V: Application - Industrial Statistics
21
Quality Control
21.1
History
21.2
The seven basic tools of quality (7 QC tools)
21.3
ARL
21.4
R chart
21.5
s chart
21.6
X̄ chart
21.7
p chart
21.8
CUSUM
21.9
EWMA
21.10
Sequential probability ratio test
22
Reliability Analysis
23
Design of Experiments
23.1
Latin hypercube
23.2
Sequential design
23.3
Space-filling design
23.4
Active learning (Optimal experimental design)
23.5
Online machine learning
Part VII: Application - Others
24
Operations Research
24.1
Queueing Theory
24.2
Optimization
24.2.1
Linear Programming
24.2.2
Convex Programming
24.2.3
Non-linear Programming
24.2.4
Integer Programming
25
Financial Statistics
26
Social Statistics
26.1
Human Behaviour
26.2
Sport
26.3
Psychometrics
26.3.1
Item Response Theory
Part VIII: Computational Statistics
27
Statistical Learning
27.1
Root finding
27.1.1
Newton’s method (Newton–Raphson algorithm)
27.1.2
Gauss–Newton algorithm
27.1.3
Gradient Descent
27.1.4
Conjugate gradient method
27.1.5
quasi-Newton method
27.1.6
Nelder–Mead method
27.1.7
Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
27.1.8
Davidon–Fletcher–Powell (DFP) formula
27.2
Classification Method
27.2.1
KNN
27.2.2
Naive Bayes
27.2.3
LDA, QDA
27.2.4
Decision Tree and Random Forest
27.2.5
Bagging
27.2.6
Boosting
27.2.7
Neighborhood components analysis (NCA)
27.2.8
Soft independent modelling of class analogies
27.3
Regression Method
27.3.1
SVR
27.4
Clustering Method
27.4.1
K-means, K-medoids
27.4.2
Fuzzy C-means
27.4.3
Mean Shift
27.4.4
Hierarchical Clustering
27.4.5
DBSCAN
27.4.6
Principal Co-ordinates Analysis (PCoA)
27.4.7
Isolation Forest
27.4.8
Self-organizing map (SOM)
27.4.9
Spectral clustering
27.4.10
Quantum clustering
27.5
Model Selection
27.5.1
Information Criteria
28
Statistical Computing
28.1
Generating random variables
28.1.1
Acceptance-Rejection method
28.1.2
Importance Sampling
28.2
Variance reduction
28.3
Gibbs sampling
28.4
Metropolis-Hastings
28.5
Markov chain Monte Carlo (MCMC)
28.6
Statistical Algorithms
28.6.1
Simulated annealing
28.6.2
EM algorithm
28.6.3
Back-fitting algorithm
28.7
Evolutionary Algorithm
28.7.1
Particle Swarm Optimization (PSO)
28.7.2
Genetic Algorithm
28.7.3
Approximate Bayesian computation
29
Advanced Machine Learning
29.1
Double Machine Learning
29.2
Adversarial machine learning (AML)
29.3
Reinforcement Learning
29.4
Curriculum learning
29.5
Rule-based machine learning
29.6
Online machine learning
29.7
Knowledge Distillation
29.8
Automated machine learning (AutoML)
29.9
Machine Unlearning
29.10
Computational Learning Theory
29.10.1
Inductive Bias
29.10.2
Probably Approximately Correct (PAC)
30
Deep Learning
30.1
Basic concept
30.2
DNN
30.3
CNN
30.4
RNN
30.4.1
Long Short-Term Memory (LSTM)
30.4.2
Gated Recurrent Unit (GRU)
30.5
Autoencoders & Variational Autoencoders (VAEs)
30.6
Generative adversarial networks (GAN)
30.7
Transformer Networks
30.8
Graph Neural Networks (GNNs)
30.9
Physics-informed neural networks (PINNs)
30.10
Deep Q-Networks (DQNs)
30.11
Quantum neural network (QNN)
30.12
Some famous models
30.12.1
LeNet, AlexNet, VGG, NiN
30.12.2
GoogLeNet
30.12.3
ResNet
30.12.4
DenseNet
30.12.5
U-Net
30.12.6
YOLO
30.13
Modern NN models
30.13.1
Deep Operator Network
30.13.2
Liquid Neural Network (LNN)
30.13.3
Kolmogorov-Arnold Networks (KAN)
30.13.4
Large Language Model (LLM)
Part IX: Computer Science Skills
31
Data Structure and Algorithm
31.1
Data Structure
31.1.1
Linked list
31.1.2
Stack
31.1.3
Queue
31.1.4
Tree
31.2
Algorithm
31.2.1
Graph and tree traversal algorithms
31.2.2
Dynamic Programming
31.2.3
Mathematical algorithm
32
Information Theory
32.1
Entropy
32.2
Data compression
33
Big Data Analytics Techniques
33.1
Visualization
33.2
Statistical methods for big data
33.3
Hadoop
33.4
Spark
34
Quantum Computing
34.1
Basic Concept
34.2
Quantum Algorithm
34.3
Quantum Machine Learning
Part X: Data Communication
35
Data Processing
35.1
Data Collection
35.2
Data Preprocessing
35.2.1
Data Cleaning
35.2.2
Handling Missing Data
35.2.3
Normalization & Standardization
35.2.4
Feature Engineering
36
Data Visualization
36.1
Why do we need DataVis?
36.2
Visual Analytics
36.3
Sina plot
36.4
Radar chart
36.5
Parallel coordinates
36.6
Streamplots
36.7
Andrews plot
36.8
Spaghetti plot
36.9
Fish plot
36.10
Volcano plot
36.11
Circle Packing Chart
36.12
Chord diagram
36.13
Climate spiral and Warming stripes
36.14
Bland–Altman plot
36.15
Cherry Blossom Front
36.16
Symbolic data analysis (SDA)
36.17
Cartogram
37
Data Mining
37.1
Structured Data
37.2
Semi-Structured and Unstructured Data
37.3
SQL
37.4
NoSQL
37.5
Association rule learning
37.5.1
Apriori Algorithm
37.5.2
ECLAT Algorithm
37.5.3
FP-growth algorithm
37.6
Anomaly detection
38
Data Ethics and Philosophy
38.1
Benford’s Law
38.2
Differential Privacy
38.3
Attack
Part XI: Modern Data Analysis
39
Network Analysis
40
Semantic Analysis
41
Audio Analysis
42
Image and Video Analysis
Part XII: Data Workflow
43
Data Management and Integration
43.1
Database
43.2
Data Compression
43.3
Data Integration
43.4
Data Quality
43.5
DataOps
44
Multimodal Data Analysis
44.1
Meta Analysis
44.2
Federated Learning
44.3
Data Fusion
45
Statistical Consulting
45.1
Garbage in, garbage out
Part XIII: Statistical Theory
46
Statistical Inference
46.1
Frequentist inference
46.1.1
Estimation
46.2
Bayesian inference
47
Probability Theory
47.1
Basics from Measure Theory
47.2
Limits of sets
47.3
Probability Inequalities
47.4
Bertrand Paradox
47.5
Stochastic ordering
47.6
Malliavin Calculus
47.7
Regular conditional probability
47.7.1
Markov kernel
47.8
Martingale
47.8.1
Reverse martingale
48
Algebraic Statistics
48.1
Free Probability Theory
Part XIV: Miscellaneous
49
Statistical Education
49.1
Stories
49.1.1
Buffon’s needle problem
49.1.2
Simpson’s paradox
49.1.3
Berkson’s paradox
49.1.4
Lindley’s paradox
49.1.5
Freedman’s paradox
49.1.6
Texas sharpshooter fallacy
49.1.7
Survivorship bias
49.1.8
All models are wrong
49.1.9
Stein’s phenomenon
49.1.10
German tank problem
49.1.11
Lindy effect
49.1.12
Doomsday argument
50
Advanced programming in R
50.1
Techniques for basic operators
50.2
Special operators
50.2.1
Inner function
50.2.2
Super assignment <<-
50.3
Pipe operator
50.3.1
User-defined pipe operator
50.4
Non-standard Evaluation (NSE)
50.4.1
Tidy evaluation
50.5
Functional programming
50.5.1
Helper function
50.6
Progress bar
50.7
Parallel computing
Appendix
A
Matrix calculus
5.6 Isotonic regression
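As a starting point, here is a minimal base-R sketch of an isotonic fit on simulated data (the data below are invented for illustration); `stats::isoreg()` computes the monotone non-decreasing least-squares fit via the pool-adjacent-violators algorithm.

```r
# Minimal sketch: isotonic regression with base R on simulated data.
set.seed(1)
x <- 1:50
y <- log(x) + rnorm(50, sd = 0.3)   # noisy but roughly increasing signal

fit <- isoreg(x, y)                  # pool-adjacent-violators (PAVA) fit

# Fitted values form a non-decreasing step function of x.
head(cbind(x = fit$x, observed = fit$y, fitted = fit$yf))
plot(fit)                            # observed points with the monotone fit
```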