Mastering Statistics with R
Welcome
Prerequisites
Who am I?
Acknowledgement
Progress of this book
Part I: Basic Probability
1 Introduction to Probability
1.1 What is probability?
1.2 Basic Mathematics
1.2.1 Combinatorics
1.2.2 Set Theory
1.3 History of probability
1.3.1 Experiment, Sample space and Events
1.3.2 Definitions of Probability
1.4 Conditional Probability and Independence
1.4.1 Conditional Probability
1.4.2 Independence
1.5 Bayes’ Theorem
2 Random variables
2.1 Random variables and probability functions
2.1.1 Random variables
2.1.2 Discrete Probability Function
2.1.3 Continuous Probability Function
2.1.4 * Mixed Type Probability Function
2.2 Expected values and Variance
2.2.1 * Approximation of a random variable
2.3 Transformation of random variables
2.3.1 Discrete r.v. transformation
2.3.2 Continuous r.v. transformation
2.4 Families of distributions
2.4.1 Discrete probability distributions
2.4.2 Continuous probability distributions
3 Multivariate random variables
3.1 Joint distributions
3.1.1 Marginal Distribution
3.1.2 Sum of two independent random variables
3.2 Change of variables
3.3 Families of multivariate distributions
3.3.1 Trinomial distribution
3.3.2 Bivariate hypergeometric distribution
3.3.3 Multivariate normal distribution
3.3.4 Wishart distribution
3.3.5 Wilks’ lambda distribution
3.3.6 Hotelling’s T²-distribution
4 System of Moments
5 Limit Theorems
5.1 Some inequalities
5.1.1 Markov inequality
5.1.2 Chebyshev inequality
5.1.3 Jensen inequality
5.2 Law of large numbers
Part II: Basic Statistics
6 Descriptive Statistics
6.1 Frequency distribution
6.2 Measures of statistical characteristics
6.3 Exploratory data analysis (EDA)
6.3.1 Stem-and-leaf plot
6.3.2 Histogram and Bar chart
6.3.3 Pareto chart
6.3.4 Density plot
6.3.5 Box plot and Violin plot
7 Sampling
7.1 Random sampling methods (probability sampling techniques)
7.1.1 Simple random sampling
7.1.2 Systematic sampling
7.1.3 Stratified sampling
7.1.4 Cluster sampling
7.2 Nonrandom sampling methods (non-probability sampling techniques)
7.2.1 Convenience sampling
7.2.2 Snowball sampling
7.2.3 Judgmental sampling
7.2.4 Quota sampling
7.2.5 Consecutive sampling
7.3 Other sampling methods
7.3.1 Latin hypercube sampling
8 Estimation
8.1 Point Estimation
8.1.1 Method of Moments (MoM)
8.1.2 Maximum Likelihood Estimation (MLE)
8.1.3 Uniformly Minimum Variance Unbiased Estimator (UMVUE)
8.2 Interval Estimation
9 Testing Hypotheses
9.1 Null hypothesis vs. alternative hypothesis
9.2 The Neyman-Pearson Lemma
10 Some statistical tests
10.1 Parametric statistical tests
10.1.1 t-test
10.1.2 F-test
10.1.3 χ²-test
10.2 Non-parametric statistical tests
10.2.1 Mann–Whitney U-test (Wilcoxon rank-sum test)
10.2.2 Wilcoxon signed-rank test
10.2.3 Kolmogorov–Smirnov test
11 Analysis of Variance (ANOVA)
11.1 Levene’s test
11.2 Bartlett’s test
11.3 One-way ANOVA
11.4 Two-way ANOVA
11.5 Welch’s ANOVA
11.6 Kruskal–Wallis test
11.7 Friedman test
11.8 Normality Test
12 Correlation Analysis and Linear Regression
12.1 Variable Selection
12.1.1 Forward, Backward selection
12.1.2 Nonnegative garrote method
12.2 Model Selection
12.2.1 Variance-Bias Trade-off
12.2.2 Information Criteria
Part III: Statistical Inference
13 Limiting Distributions
13.1 Convergence in probability
13.2 Convergence in distribution
14 (Generalized) Linear Models
14.1 Ordinary Least Squares (OLS)
14.2 Fixed Effect and Random Effect
14.3 Analysis of Covariance (ANCOVA)
14.4 Logistic Regression
14.5 Fractional Model
14.6 Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
14.7 Hierarchical Linear Model
14.7.1 Instrumental variable
14.8 Multilevel Model
14.9 Quantile Regression
14.10 Complex Linear Model
14.10.1 Rootogram
15 Decision Theory
15.1 Regret
16 Applied Probability Models
16.1 Stochastic Process
16.1.1 Random Walk
16.1.2 Poisson Process
16.1.3 Markov Process
16.1.4 Wiener Process
16.1.5 Lévy Process
16.2 Markov Chain
16.2.1 Semi-Markov Chain
16.2.2 Hidden Markov Models
16.2.3 Mover-Stayer Models
16.3 Ergodic Theory
16.4 Stochastic Calculus
16.4.1 Itô’s Lemma
17 Probability Theory
17.1 Basics from Measure Theory
17.2 Limits of sets
17.3 Probability Inequalities
17.4 Stochastic ordering
17.5 Malliavin Calculus
17.6 Regular conditional probability
17.6.1 Markov kernel
17.7 Martingale
17.7.1 Reverse martingale
18 High Dimensional Data
19 Bayesian Analysis
19.1 Laplace Approximation and BIC
Part IV: Sequential Data
20 Quality Control
20.1 History
20.2 7 tools
20.3 ARL
20.4 R chart
20.5 s chart
20.6 X̄ chart
20.7 p chart
20.8 CUSUM
20.9 EWMA
20.10 Sequential probability ratio test
21 Time Series Analysis
21.1 Time Series Decomposition
21.2 ACF and PACF
21.3 White Noise
21.4 Autoregressive (AR)
21.5 Moving Average (MA)
21.6 Kalman Filter and Savitzky–Golay filter
21.7 ARMA, ARIMA, SARIMA, SARFIMA
21.8 Granger causality
21.9 VAR
21.10 GARCH
21.11 Factor Model
21.12 Some advanced topics
21.12.1 Lag regression
21.12.2 Mixed-frequency data
22 Design of Experiments
22.1 Latin hypercube
22.2 Sequential design
22.3 Space-filling design
22.4 Active learning (Optimal experimental design)
22.5 Online machine learning
Part V: Computational Statistics
23 Statistical Learning
23.1 Root finding
23.1.1 Newton’s method (Newton–Raphson algorithm)
23.1.2 Gauss–Newton algorithm
23.1.3 Gradient Descent
23.1.4 Conjugate gradient method
23.1.5 Nelder–Mead method
23.1.6 Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
23.2 Information Criteria
23.2.1 AIC
23.2.2 BIC
23.3 Decision Tree and Random Forest
23.4 Bagging
23.5 Boosting
23.5.1 Gradient Boosting Decision Tree (GBDT)
23.5.2 XGBoost
23.5.3 LightGBM
23.5.4 CatBoost
23.5.5 RUSBoost
24 Statistical Computing
24.1 Generating random variables
24.1.1 Inverse transform method
24.1.2 Acceptance-rejection method
24.2 Variance reduction
24.3 Markov chain Monte Carlo (MCMC)
24.4 EM algorithm
24.5 Back-fitting algorithm
24.6 Particle Swarm Optimization (PSO)
Part VI: Biostatistics
25 Survival Analysis
25.1 Unobserved data
25.2 Survival Function and Hazard Function
25.3 Kaplan–Meier Estimator
25.4 Log-rank Test
25.5 Proportional Hazards Model
25.6 Accelerated Failure Time (AFT) Model
25.7 Nelson–Aalen Estimator
25.8 Turnbull-Frydman Estimator
25.9 Restricted Mean Survival Time (RMST)
25.10 Firth’s penalized logistic regression
25.11 Competing Risks
26 Biostatistical Data Analysis
26.1 p-value correction
26.1.1 Bonferroni
26.1.2 Tukey’s HSD
26.1.3 Fisher
26.1.4 False Discovery Rate (FDR)
26.1.5 Q-value
26.1.6 E-value
26.2 Trend Tests
26.2.1 Cochran-Armitage test
26.2.2 Jonckheere’s trend test
26.3 Propensity score
26.4 PLINK
26.5 Polygenic Risk Score
26.6 RNA-seq Analysis
26.7 Metabolomics Analysis
26.7.1 SMART
26.7.2 Pareto normalization
26.8 Permutational multivariate analysis of variance (PERMANOVA)
26.9 PERMDISP
27 Causal Inference
27.1 DAG
28 Statistical Designs and Analyses in Clinical Trials
28.1 Phase I
28.2 Phase II
28.3 Phase III
28.4 α-spending function
Part VII: Applications
29 Social science
29.1
30 Psychometrics
30.1 Item Response Theory
31 Industry
31.1 Degradation data
Part VIII: Other Topics
32 Multivariate Analysis
32.1 Multivariate distributions
32.2 General Linear Model
32.3 Multivariate Analysis of Variance (MANOVA)
32.4 Multivariate Analysis of Covariance (MANCOVA)
32.5 Structural Equation Modeling (SEM)
32.6 Dimension Reduction Method
32.6.1 t-SNE
32.6.2 DBSCAN
32.6.3 Locally Linear Embedding
32.6.4 Laplacian Eigenmaps
32.6.5 ISOMAP
32.6.6 Uniform Manifold Approximation and Projection (UMAP)
32.7 Clustering Method
32.7.1 K-means, K-medoids
32.7.2 KNN
32.7.3 Principal Component Analysis (PCA)
32.7.4 Principal Coordinates Analysis (PCoA)
32.7.5 Multidimensional Scaling (MDS)
32.7.6 Self-organizing map (SOM)
32.7.7 Spectral clustering
32.7.8 Quantum clustering
32.7.9 Partial Least Squares Discriminant Analysis (PLS-DA)
32.7.10 Unweighted Pair Group Method with Arithmetic Mean (UPGMA)
32.8 Factor Analysis
32.8.1 Kaiser–Meyer–Olkin test
32.8.2 Questionnaire
32.9 Canonical-correlation Analysis (CCA)
32.10 Analysis of Similarities (ANOSIM)
33 Categorical Data Analysis
34 Consulting in Statistics
35 Spatial Statistics
35.1 Point-referenced Data
35.1.1 Gaussian Process
35.1.2 Exploratory data analysis
35.1.3 Models for spatial dependence
35.1.4 Kriging (Spatial prediction)
35.2 Areal/Lattice Data
35.2.1 Spatial autocorrelation
35.2.2 Conditionally auto-regressive (CAR) and Simultaneously auto-regressive (SAR) models
35.3 Point Pattern Data
35.3.1 Poisson processes
35.3.2 Cox processes
35.3.3 K-functions
35.4 Other Topics
35.4.1 Spatio-temporal models
35.4.2 Frequency domain methods
35.4.3 Deep Kriging
36 Extreme value theory
37 Directional Statistics
37.1 Circular Regression
38 Functional Data Analysis
Part IX: Dealing with Computer Science
39 Data Structure
39.1 Linked list
39.2 Stack
39.3 Queue
39.4 Tree
40 Algorithm
40.1 Graph and tree traversal algorithms
40.1.1 Search
40.1.2 Shortest path
40.1.3 Minimum spanning tree
41 Information Theory
41.1 Entropy
41.2 Data compression
42 Machine Learning
42.1 Double Machine Learning
42.2 Adversarial machine learning (AML)
42.3 Reinforcement Learning
42.4 Computational Learning Theory
42.4.1 Probably Approximately Correct (PAC)
43 Data Visualization and Visual Analytics
43.1 Radar chart
43.2 Parallel coordinates
43.3 Andrews plot
43.4 Fish plot
43.5 Circle Packing Chart
43.6 Chord diagram (information visualization)
43.7 Climate spiral and Warming stripes
43.8 Symbolic data analysis (SDA)
44 Big Data Analytics Techniques and Applications
44.1 Visualization
44.2 Hadoop
44.3 Spark
45 Data Mining
45.1 Online machine learning
46 Image Processing
47 Deep Learning
47.1 Basic concepts
47.2 DNN
47.3 CNN
47.4 RNN
47.4.1 Long Short-Term Memory (LSTM)
47.4.2 Gated Recurrent Unit (GRU)
47.5 Generative adversarial networks (GAN)
47.6 Transformer Networks
47.7 Autoencoders & Variational Autoencoders (VAEs)
47.8 Graph Neural Networks (GNNs)
47.9 Physics-informed neural networks (PINNs)
47.10 Deep Q-Networks (DQNs)
47.11 Quantum neural network (QNN)
47.12 Some famous models
47.12.1 LeNet, AlexNet, VGG, NiN
47.12.2 GoogLeNet
47.12.3 ResNet
47.12.4 DenseNet
47.12.5 YOLO
47.13 Modern NN models
47.13.1 Liquid Neural Network (LNN)
47.13.2 Kolmogorov-Arnold Networks (KAN)
Appendix
A Matrix calculus
B Advanced programming in R
B.1 Techniques for basic operators
B.2 Special operators
B.2.1 Inner function
B.2.2 Super assignment (<<-)
B.3 Pipe operator
B.3.1 User-defined pipe operator
B.4 Non-standard Evaluation (NSE)
B.4.1 Tidy evaluation
B.5 Functional programming
B.5.1 Helper function
B.6 Progress bar
B.7 Parallel computing
8.2 Interval Estimation
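As a minimal sketch of interval estimation (not taken from this book: the simulated data, sample size of 30, and 95% confidence level are illustrative assumptions), the R code below computes a t-based confidence interval for a normal mean by hand and cross-checks it against the built-in t.test().

```r
# Minimal sketch: 95% t-based confidence interval for a mean.
# The simulated sample and confidence level are illustrative assumptions.
set.seed(42)
x <- rnorm(30, mean = 5, sd = 2)   # hypothetical sample, n = 30

n     <- length(x)
xbar  <- mean(x)
s     <- sd(x)
alpha <- 0.05

# Interval: xbar +/- t_{1 - alpha/2, n-1} * s / sqrt(n)
margin <- qt(1 - alpha / 2, df = n - 1) * s / sqrt(n)
c(lower = xbar - margin, upper = xbar + margin)

# Cross-check with R's built-in implementation
t.test(x, conf.level = 0.95)$conf.int
```

The hand-computed bounds should match t.test()'s interval, since both use the same t quantile and standard error.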