Chapter 4 Statistical Learning
In an era where data is abundant, the ability to extract meaningful insights and make accurate predictions is crucial across various fields, including finance, healthcare, business, and science.
Statistical learning refers to a vast set of tools for understanding data. It is a framework for modeling the relationship between variables using statistical and machine learning techniques, and it provides methods for prediction, inference, and pattern recognition based on data.
Statistical learning techniques are widely used in:
Predictive analytics: Forecasting stock prices, disease progression, or customer behavior.
Pattern recognition: Identifying speech, handwriting, or fraudulent transactions.
Optimization and decision-making: Automating processes and improving efficiency in businesses.
Statistical learning is broadly classified into:
Supervised Learning: Learning a function that maps input features to an output based on labeled data. Examples include regression and classification models.
Unsupervised Learning: Discovering hidden patterns in data without labeled outputs. Techniques such as clustering and dimensionality reduction (PCA) fall into this category.
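The contrast between the two paradigms can be sketched with a toy example. The sketch below assumes scikit-learn and NumPy are available; the data are simulated for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Supervised: labeled data (X, y); learn a mapping from features to label.
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 0.5, size=50)
reg = LinearRegression().fit(X, y)
print(reg.coef_[0])  # estimated slope, close to the true value 2.0

# Unsupervised: no labels; discover hidden structure (here, two clusters).
Z = np.vstack([rng.normal(0, 1, size=(25, 2)),
               rng.normal(8, 1, size=(25, 2))])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Z)
print(len(set(km.labels_)))  # number of clusters found
```

Note that the regression model is fit with both features and labels, while KMeans receives only the unlabeled points and assigns cluster memberships on its own.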
In this chapter, we will not discuss the in-depth theory behind existing machine learning algorithms.
Instead, we will introduce them and implement them as alternative procedures to classical approaches in statistical modelling.
Note on terminology
In machine learning, you will frequently encounter the terms features and label. These are central to understanding how models work.
Features are the individual measurable properties or characteristics of the data. In classical statistics, features correspond to what are typically called independent variables, predictors, or explanatory variables.
The label refers to the output we want the model to predict. In statistical terms, this is known as the dependent variable, outcome, or response variable.
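A small example makes the correspondence concrete. The dataset below is hypothetical (invented house sizes, ages, and prices, purely for illustration): the first two columns are the features and the last column is the label.

```python
import numpy as np

# Hypothetical toy data: each row is one house.
# Columns: size_m2, age_years (features), price_k (label).
data = np.array([
    [120.0,  5.0, 300.0],
    [ 80.0, 20.0, 180.0],
    [150.0,  2.0, 390.0],
])

X = data[:, :2]  # features: independent variables / predictors
y = data[:, 2]   # label: dependent variable / response
print(X.shape, y.shape)  # (3, 2) (3,)
```

In the notation used throughout the rest of this chapter, X always denotes the feature matrix (one row per observation, one column per feature) and y the vector of labels.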