Mastering Statistics with R
Welcome
Prerequisites
Who am I?
Acknowledgement
Progress of this book
Part I: Basic Probability
1
Introduction to Probability
1.1
What is probability?
1.2
Basic Mathematics
1.2.1
Combinatorics
1.2.2
Set Theory
1.3
History of probability
1.3.1
Experiment, Sample space and Events
1.3.2
Definitions of Probability
1.4
Conditional Probability and Independence
1.4.1
Conditional Probability
1.4.2
Independence
1.5
Bayes’ Theorem
2
Random variables
2.1
Random variables and probability functions
2.1.1
Random variables
2.1.2
Discrete Probability Function
2.1.3
Continuous Probability Function
2.1.4
*Mixed Type Probability Function
2.2
Expected values and Variance
2.2.1
*Approximation of a random variable
2.3
Transformation of random variables
2.3.1
Discrete r.v. transformation
2.3.2
Continuous r.v. transformation
2.4
Families of distributions
2.4.1
Discrete probability distributions
2.4.2
Continuous probability distributions
3
Multivariate random variables
3.1
Joint distributions
3.1.1
Marginal Distribution
3.1.2
Sum of two independent random variables
3.2
Change of variables
3.3
Families of multivariate distributions
3.3.1
Trinomial distribution
3.3.2
Bivariate hypergeometric distribution
3.3.3
Multivariate normal distribution
3.3.4
Wishart distribution
3.3.5
Wilks’ lambda distribution
3.3.6
Hotelling’s \(T^2\)-distribution
4
System of Moments
5
Limit Theorems
5.1
Some inequalities
5.1.1
Markov inequality
5.1.2
Chebyshev inequality
5.1.3
Jensen inequality
5.2
Law of large numbers
Part II: Basic Statistics
6
Descriptive Statistics
6.1
Frequency distribution
6.2
Measures of statistical characteristics
6.3
Exploratory data analysis (EDA)
6.3.1
Stem-and-leaf plot
6.3.2
Histogram and Bar chart
6.3.3
Pareto chart
6.3.4
Density plot
6.3.5
Box-plot and Violin plot
7
Sampling
7.1
Random sampling methods (probability sampling techniques)
7.1.1
Simple random sampling
7.1.2
Systematic sampling
7.1.3
Stratified sampling
7.1.4
Cluster sampling
7.2
Nonrandom sampling methods (non-probability sampling techniques)
7.2.1
Convenience sampling
7.2.2
Snowball sampling
7.2.3
Judgmental sampling
7.2.4
Quota sampling
7.2.5
Consecutive sampling
7.3
Other sampling methods
7.3.1
Latin hypercube sampling
8
Estimation
8.1
Point Estimation
8.1.1
Method of Moments (MoM)
8.1.2
Maximum Likelihood Estimation (MLE)
8.1.3
Uniformly Minimum Variance Unbiased Estimator (UMVUE)
8.2
Interval Estimation
9
Testing Hypotheses
9.1
Null hypothesis vs. alternative hypothesis
9.2
The Neyman-Pearson Lemma
10
Some statistical tests
10.1
Parametric statistical tests
10.1.1
\(t\)-test
10.1.2
\(F\)-test
10.1.3
\(\chi^2\)-test
10.2
Non-parametric statistical tests
10.2.1
Mann–Whitney \(U\)-test (Wilcoxon rank-sum test)
10.2.2
Wilcoxon signed-rank test
10.2.3
Kolmogorov–Smirnov test
11
Analysis of Variance (ANOVA)
11.1
Levene’s test
11.2
Bartlett’s test
11.3
One-way ANOVA
11.4
Two-way ANOVA
11.5
Welch’s ANOVA
11.6
Kruskal–Wallis test
11.7
Friedman test
11.8
Normality Test
12
Correlation Analysis and Linear Regression
13
Limiting Distributions
13.1
Convergence in probability
13.2
Convergence in distribution
Part III: Statistical Inference
14
(Generalized) Linear Models
14.1
Ordinary Least Squares (OLS)
14.2
Fixed Effect and Random Effect
14.3
Analysis of Covariance (ANCOVA)
14.4
Logistic Regression
14.5
Fractional Model
14.6
Weighted Least Squares (WLS) and Generalized Least Squares (GLS)
14.7
Hierarchical Linear Model
14.8
Multilevel Model
14.9
Quantile Regression
15
Applied Probability Models
15.1
Stochastic Process
15.1.1
Random Walk
15.1.2
Poisson Process
15.1.3
Markov Process
15.1.4
Wiener Process
15.1.5
Lévy Process
15.2
Markov Chain
15.3
Ergodic Theory
15.4
Stochastic Calculus
15.4.1
Ito’s Lemma
16
Probability Theory
16.1
Basics from Measure Theory
16.2
Limits of sets
16.3
Stochastic ordering
16.4
Malliavin Calculus
17
Bayesian Analysis
17.1
Laplace Approximation and BIC
Part IV: Sequential Data
18
Quality Control
18.1
History
18.2
7 tools
18.3
ARL
18.4
\(R\) chart
18.5
\(s\) chart
18.6
\(\bar{X}\) chart
18.7
\(p\) chart
18.8
CUSUM
18.9
EWMA
18.10
Sequential probability ratio test
19
Time Series Analysis
19.1
Time Series Decomposition
19.2
ACF and PACF
19.3
White Noise
19.4
Autoregressive (AR)
19.5
Moving Average (MA)
19.6
Kalman Filter and Savitzky–Golay filter
19.7
ARMA, ARIMA, SARIMA, SARFIMA
19.8
Granger causality
19.9
VAR
19.10
GARCH
19.11
Factor Model
19.12
Some advanced topics
19.12.1
Lag regression
19.12.2
Mixed-frequency data in time series
Part V: Computational Statistics
20
Statistical Learning
20.1
Root finding
20.1.1
Newton’s method (Newton–Raphson algorithm)
20.1.2
Gauss–Newton algorithm
20.1.3
Gradient Descent
20.1.4
Conjugate gradient method
20.1.5
Nelder–Mead method
20.1.6
Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
20.2
Information Criteria
20.2.1
AIC
20.2.2
BIC
20.3
Decision Tree and Random Forest
20.4
Bagging
20.5
Boosting
20.5.1
Gradient Boosting Decision Tree (GBDT)
20.5.2
XGBoost
20.5.3
LightGBM
20.5.4
CatBoost
20.5.5
RUSBoost
21
Statistical Computing
21.1
Generate random variables
21.1.1
Inverse transform method
21.1.2
Acceptance-rejection method
21.2
Variance reduction
21.3
Monte Carlo and Markov chain Monte Carlo (MCMC)
21.4
EM algorithm
21.5
Particle Swarm Optimization (PSO)
Part VI: Biostatistics
22
Survival Analysis
22.1
Survival Function and Hazard Function
22.2
Kaplan–Meier Estimator
22.3
Log-rank Test
22.4
Proportional Hazards Model
22.5
Accelerated Failure Time (AFT) Model
22.6
Nelson–Aalen Estimator
22.7
Restricted Mean Survival Time (RMST)
22.8
Firth’s penalized logistic regression
22.9
Competing Risks
23
Biostatistical Data Analysis
23.1
Trend Tests
23.1.1
Cochran-Armitage test
23.1.2
Jonckheere’s trend test
23.2
Propensity score
23.3
PLINK
23.4
Polygenic Risk Score
23.5
RNA-seq Analysis
23.6
Metabolomics Analysis
23.6.1
SMART
23.6.2
Pareto normalization
23.7
Permutational multivariate analysis of variance (PERMANOVA)
23.8
PERMDISP
24
Causal Inference
24.1
DAG
25
Statistical Designs and Analyses in Clinical Trials
Part VII: Applications
Part VIII: Other Topics
26
Multivariate Analysis
26.1
General Linear Model
26.2
Multivariate Analysis of Variance (MANOVA)
26.3
Multivariate Analysis of Covariance (MANCOVA)
26.4
Structural Equation Modeling (SEM)
26.5
Dimension Reduction Method
26.5.1
t-SNE
26.5.2
DBSCAN
26.5.3
Locally Linear Embedding
26.5.4
Laplacian Eigenmaps
26.5.5
ISOMAP
26.5.6
Uniform Manifold Approximation and Projection (UMAP)
26.6
Clustering Method
26.6.1
K-means, K-medoids
26.6.2
KNN
26.6.3
Principal Component Analysis (PCA)
26.6.4
Principal Co-ordinates Analysis (PCoA)
26.6.5
Multidimensional Scaling (MDS)
26.6.6
Self-organizing map (SOM)
26.6.7
Spectral clustering
26.6.8
Quantum clustering
26.6.9
Partial Least Squares Discriminant Analysis (PLS-DA)
26.6.10
Unweighted Paired-Group Method Using Arithmetic Means (UPGMA)
26.7
Factor Analysis
26.7.1
Kaiser–Meyer–Olkin test
26.7.2
Questionnaire
26.8
Canonical-correlation Analysis (CCA)
26.9
Analysis of Similarities (ANOSIM)
27
Categorical Data Analysis
28
Consulting in Statistics
29
Spatial Statistics
29.1
Point-referenced Data
29.1.1
Gaussian Process
29.1.2
Exploratory data analysis
29.1.3
Models for spatial dependence
29.1.4
Kriging (Spatial prediction)
29.2
Areal/Lattice Data
29.2.1
Spatial autocorrelation
29.2.2
Conditionally auto-regressive (CAR) and Simultaneously auto-regressive (SAR) models
29.3
Point Pattern Data
29.3.1
Poisson processes
29.3.2
Cox processes
29.3.3
K-functions
29.4
Other Topics
29.4.1
Spatio-temporal models
29.4.2
Frequency domain methods
29.4.3
Deep Kriging
30
Directional Statistics
30.1
Circular Regression
31
Functional Data Analysis
Part IX: Deal with Computer Science
32
Information Theory
32.1
Entropy
32.2
Data compression
33
Data Visualization and Visual Analytics
33.1
Parallel coordinates
33.2
Andrews plot
33.3
Chord diagram (information visualization)
33.4
Climate spiral and Warming stripes
34
Big Data Analytics Techniques and Applications
34.1
Visualization
34.2
Hadoop
34.3
Spark
35
Data Mining
35.1
Online machine learning
36
Image Processing
37
Deep Learning
37.1
Basic concept
37.2
DNN
37.3
CNN
37.4
RNN
37.4.1
Long Short-Term Memory (LSTM)
37.4.2
Gated Recurrent Unit (GRU)
37.5
Generative adversarial networks (GAN)
37.6
Transformer Networks
37.7
Autoencoders & Variational Autoencoders (VAEs)
37.8
Graph Neural Networks (GNNs)
37.9
Deep Q-Networks (DQNs)
37.10
Quantum neural network (QNN)
37.11
Some famous models
37.11.1
LeNet, AlexNet, VGG, NiN
37.11.2
GoogLeNet
37.11.3
ResNet
37.11.4
DenseNet
37.11.5
YOLO
37.12
Modern NN models
37.12.1
Liquid Neural Network (LNN)
37.12.2
Kolmogorov-Arnold Networks (KAN)
Appendix
A
Matrix calculus
B
Functions in R
8.1
Point Estimation
8.1.1
Method of Moments (MoM)
8.1.2
Maximum Likelihood Estimation (MLE)
8.1.3
Uniformly Minimum Variance Unbiased Estimator (UMVUE)