Introduction to Computational Statistics

Siegfred Roi L. Codia

First Semester, A.Y. 2025-2026

Definition 0.1 Computational statistics is defined as a collection of techniques that have a strong “focus on the exploitation of computing in the creation of new statistical methodology.” - Wegman (1988)

Efron and Tibshirani (1991) refer to what we call computational statistics as computer-intensive statistical methods. They give the following as examples for these types of techniques:

bootstrap methods,
nonparametric regression,
generalized additive models, and
classification and regression trees.

Gentle (2005) also follows the definition of Wegman (1988) where he states that computational statistics is a discipline that includes a class of statistical methods characterized by computational intensity…

Motivation for Computational Statistics

Complex Dataset
- Modern data is high-dimensional, large-scale, and often messy. Traditional statistical methods, which rely on closed-form solutions, struggle with such data. Computational statistics provides scalable solutions for big data.
Optimization in Statistics
- Many statistical techniques involve optimization (e.g., Maximum Likelihood Estimation, LASSO regression). Computational methods like gradient descent and EM algorithms allow us to estimate parameters efficiently.
Monte Carlo Methods and Simulation
- Many real-world problems cannot be solved analytically, especially in Bayesian inference, finance, and genetics. Computational statistics enables the use of Monte Carlo simulations, which approximate solutions by repeated random sampling.
Resampling Techniques
- Bootstrapping and permutation tests provide non-parametric alternatives when classical statistical assumptions (such as normality) do not hold. These methods rely heavily on computational power.
Machine Learning and AI
- Many machine learning methods, such as decision trees, neural networks, and clustering algorithms, are fundamentally statistical but rely on computational techniques to scale and optimize performance.

We mention the topics that can be considered part of computational statistics to help you understand the difference between these and the more traditional methods of statistics. The following table gives an excellent comparison of the two areas (Wegman, 1988).

Aspect	Traditional Statistics	Computational Statistics
Sample Size	Small to moderate sample size	Large to very large sample size
Observations and Datasets	Independent, identically distributed data sets	Nonhomogeneous datasets
Predictors	One or low dimensional	High Dimensional
Computation	Manually Computational	Computationally Intensive
Tractability	Mathematically Tractable	Numerically Tractable
Assumptions	Strong unverifiable assumptions Relationships (linearity, additivity) Error Structures (normality)	Weak or no assumptions Relationships (nonlinearity) Error structures (distribution free)
Inference	Statistical Inference	Structural Inference
Algorithms	Predominantly closed-form algorithms	Iterative algorithms are possible
Statistical Property	Statistical Optimality Minimum Variance Highest Likelihood	Statistical Robustness Out-of-sample prediction accuracy Robust to outliers

Computational statistics is essential in modern data analysis, enabling statisticians to tackle problems that traditional methods cannot handle efficiently. As data grows in size and complexity, computational techniques continue to expand the scope of statistical applications in science, business, and technology.

Example 1 - An analysis approach with no closed-form Solution

k-means clustering is a method that aims to cluster observations together. The assignment of clusters to a single observation is not a closed-form solution. It is based on a computational algorithm.

Select the number of clusters \(k\).
Pick random locations of the clusters as the centroids.
Select a data point
Find its nearest centroid (using some distance formula).
Repeat 3 and 4 for all data points.
Reassign centroids
Repeat steps 3 to 6 until centroids do not move anymore.

Example 2 - No Theoretical Distribution of Population

Recall from your sampling design class. The variance of the estimator for the population proportion under SRSWOR is given by:

\[ Var(\hat{P}_{SRSWOR})=\frac{PQ}{n}\frac{N-n}{N-1} \]

If the sample size is \(n=100\), the population size is \(N=1,000,000\), and the population proportion is \(P=1-Q=0.5\), what is the theoretical variance of \(\hat{P}\) assuming finite population?

Show answer

P <- 0.5
Q <- 1-0.5
n <- 100
N <- 1000000

var_phat <- ((P*Q)/n)*(N-n)/(N-1)
var_phat

## [1] 0.002499752

\[ Var(\hat{P}_{SRSWOR})= 0.0024998 \]

How did we come up with this?

By knowing theorems and deriving them! Recall hypergeometric distribution.

Computer Experiment

What if we don’t know the theory?

We can get a close approximation of the value using computer experiment and simulations!

How do we craft the computer experiment?

Step 1: Creating the population

Consider a population of size \(N\) with \(M\) individuals having a particular characteristic (successes).

There are \(N=1,000,000\) individuals, 50% of those has our characteristic of interest. Let’s create a vector with 500,000 1s and 500,000 0s.

M <- N*P
pop <- c(rep(1,M), rep(0,N - M))

Step 2: Sampling from the population

Again, this is sampling from a finite population, so we use sampling without replacement of size \(n = 100\).

samp <- sample(pop, size = n, replace = F)

Step 3: Compute for the sample proportion

The sample proportion \(\hat{p}\) is just the mean of the sample vector, since this is a vector of 1s and 0s.

\[ \hat{p}=\frac{1}{n}\sum_{i=1}^n x_i \]

p <- mean(samp)

Step 4: Repeat steps 2 and 3 for a large number of times

We will do this to obtain \(B\) number of estimated proportions.

B <- 100
prop <- c()
for(i in 1:B){
    samp <- sample(pop, size = n, replace = F)
    prop[i] <- mean(samp)
}

prop

##   [1] 0.48 0.49 0.52 0.54 0.47 0.52 0.48 0.65 0.43 0.44 0.45 0.50 0.53 0.56 0.47
##  [16] 0.55 0.57 0.54 0.61 0.54 0.50 0.48 0.47 0.44 0.42 0.51 0.49 0.53 0.56 0.48
##  [31] 0.46 0.54 0.49 0.47 0.44 0.41 0.47 0.47 0.50 0.60 0.47 0.46 0.53 0.48 0.54
##  [46] 0.59 0.53 0.43 0.47 0.50 0.53 0.45 0.48 0.52 0.45 0.55 0.50 0.51 0.50 0.59
##  [61] 0.46 0.59 0.49 0.57 0.48 0.54 0.53 0.43 0.42 0.52 0.52 0.51 0.54 0.43 0.55
##  [76] 0.55 0.54 0.37 0.52 0.49 0.53 0.45 0.54 0.45 0.43 0.42 0.47 0.57 0.53 0.34
##  [91] 0.52 0.47 0.52 0.56 0.51 0.46 0.50 0.56 0.52 0.36

Step 5: Compute for the variance of the estimated proportions

Finally, we can now obtain an approximate for the variance of the sample proportion.

var(prop)

## [1] 0.002923424

Consolidation to a function

We now consolidate everything to a single function.

var_p <- function(n, N, P, B){
    # setting the population 
    M   <- N*P
    pop <- c(rep(1, M), rep(0, N - M))
    # setting empty vector for the p
    prop <- c()
    # sampling B times
    set.seed(1)
    for(b in 1:B){
        samp <- sample(pop, n, replace = T)
        prop[b] <- mean(samp)
    }
    # computation of the variance 
    return(var(prop))
}

Here are some results for different number of iterations \(B=10,100,1000\)

var_p(100, 1000000, 0.5, 10)
var_p(100, 1000000, 0.5, 100)
var_p(100, 1000000, 0.5, 1000)
var_phat

## [1] 0.001622222
## [1] 0.002609889
## [1] 0.002635851
## [1] 0.002499752

Visualization

We can also see how the simulated value gets close the the theoretical value as the number of iterations increase.

In the real world, the population proportion \(P\) is not known, and needs to be estimated using our sample. In Computational Statistics, we have what we call “Bootstrap Resampling” where we create a hypothesized population based on our sample.

The following is an example function that implements Bootstrap to approximate the variance of the proportion estimator \(\hat{P}\).

var_mc <- function(n, N, p, iter){
    prop <- c()
    for(b in 1:iter){
        set.seed(b)
        # generate hypothesized population 
        pop  <- rbinom(N, 1, p)
        
        # sample from the population
        samp <- sample(pop, size = n, replace = T)
        
        # compute for the proportion
        prop[b] <- mean(samp)
    }
    return(var(prop))
}

Why Computational Statistics?

The essence of statistics has not changed ever since.

But as computers become more powerful (e.g., faster, cheaper, and more common), statisticians learned how to harness this increasing power to do statistics.

Of course, mathematics remains as our primary tool and language in studying uncertainty. And will probably remain as such in the many years to come, if not forever.

However, more powerful computing paved the way for us statisticians to be able to advance the way we do statistics in three general areas.

In this course, we will cover core topics in computational statistics with mix of theory and application. By the end of this course, you should be able to design your own computer-based experiments, grounded on statistical theory.

References

This book is being maintained by Siegfred Codia of UP School of Statistics.

Most of the contents in this book are adapted from handouts of Asst. Prof. Xavier Javines Bilon and Asst. Prof. Iana Michelle Garcia from the same school. Other references are included in each section.

STAT 142