Chapter 12 The basics of machine learning

Figure: The machine learning flow. Source: mlr3 manual.
Data: (y_i,\bf{x}_i)_{i=1,\dots,N} where \bf{x}_i\in\mathbb{R}^D.
Want to predict y from x.
Model for prediction: f(\bf{x})=\bf{\theta}^T\bf{x}+\theta_0
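As a minimal sketch of this prediction rule, the NumPy snippet below evaluates the linear model for a few inputs. The parameter values and the inputs are made up for illustration only.

```python
import numpy as np

# Hypothetical parameter values for D = 2 (illustration only)
theta = np.array([0.5, -1.0])   # slope vector
theta0 = 2.0                    # intercept

def f(x, theta, theta0):
    """Linear prediction theta^T x + theta_0; x is one point (D,) or many points (N, D)."""
    return x @ theta + theta0

X_demo = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [2.0, 2.0]])
print(f(X_demo, theta, theta0))  # one prediction per row of X_demo
```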
12.1 Training vs prediction
12.1.1 Training vs testing (predicting) sets
- The entire sample \mathcal{S}=(y_i,\bf{x}_i)_{i=1,\dots,N} is randomly separated into two disjoint subsamples, \mathcal{S}_t and \mathcal{S}_p, so that
\mathcal{S}_t\cup\mathcal{S}_p=\mathcal{S}, \quad \mathcal{S}_t\cap\mathcal{S}_p=\emptyset.
Model training phase: when you are estimating parameters using \mathcal{S}_t.
Model testing phase: when you are evaluating your model's performance using \mathcal{S}_p.
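The random separation can be sketched with a shuffled index vector. This is only an illustration: the sample size, the size of the testing set, and the seed are arbitrary choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)    # seed chosen arbitrarily
N = 10                            # hypothetical sample size
indices = rng.permutation(N)      # random ordering of 0, ..., N-1

n_test = 3                        # hypothetical size of the testing set S_p
test_idx = indices[:n_test]       # observations assigned to S_p
train_idx = indices[n_test:]      # observations assigned to S_t

# The two index sets are disjoint and together cover the whole sample.
print(sorted(train_idx.tolist()), sorted(test_idx.tolist()))
```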
12.2 How to train a model
Choose the parameters that optimize some measure of quality:
- Empirical risk minimization: for example, minimizing the sum of squared errors (SSE)
- Bayesian inference
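For the linear model, empirical risk minimization with the SSE is ordinary least squares. The sketch below simulates data from made-up "true" parameters and recovers them with `np.linalg.lstsq`; the data-generating values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 200, 2
X = rng.normal(size=(N, D))
theta_true, theta0_true = np.array([1.5, -0.7]), 0.3   # made-up "truth"
y = X @ theta_true + theta0_true + 0.1 * rng.normal(size=N)

# Append a column of ones so the intercept is estimated together with the slopes.
X1 = np.column_stack([X, np.ones(N)])

# Least squares: the parameter vector that minimizes the SSE.
params, *_ = np.linalg.lstsq(X1, y, rcond=None)
theta_hat, theta0_hat = params[:D], params[D]

sse = np.sum((y - X1 @ params) ** 2)
print(theta_hat, theta0_hat, sse)
```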
12.3 Training while avoiding over/under-fitting
- Training without properly reining in the model leads to a trained model that performs well on \mathcal{S}_t but poorly on \mathcal{S}_p.
Example: you want to predict heads or tails from a coin toss. You collect the sample (H,T,H,H,H) and randomly separate it into \mathcal{S}_t=(H,H,H) and \mathcal{S}_p=(H,T).
What is your model?
What is your measure of quality?
What is your parameter estimate?
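One possible way to answer these questions, under assumptions not stated in the text: take a Bernoulli model with parameter p = P(H), estimate p by maximum likelihood on \mathcal{S}_t, and measure quality by prediction accuracy. The sketch shows the fitted model doing perfectly on the training set and worse on the testing set.

```python
train = ["H", "H", "H"]   # S_t
test = ["H", "T"]         # S_p

# Maximum-likelihood estimate of p = P(H): the share of heads in S_t.
p_hat = train.count("H") / len(train)

def predict(p):
    """Predict the more likely outcome under the fitted model."""
    return "H" if p >= 0.5 else "T"

def accuracy(sample, p):
    """Share of outcomes in the sample that the fitted model predicts correctly."""
    guess = predict(p)
    return sum(outcome == guess for outcome in sample) / len(sample)

print(p_hat)                    # 1.0
print(accuracy(train, p_hat))   # 1.0
print(accuracy(test, p_hat))    # 0.5
```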
12.3.1 Reining in the model while training
- Regularization: add a penalty term (called a regularizer), such as \left\Vert \bf{\theta}\right\Vert^2, to the loss function, weighted by a tuning parameter \lambda. For example,
\min_{\bf{\theta}}\ \sum_{(y_i,\bf{x}_i)\in\mathcal{S}_t}(y_i-f(\bf{x}_i|\bf{\theta}))^2/N + \lambda\left\Vert \bf{\theta}\right\Vert^2,
or, in vector form,
\min_{\bf{\theta}}\ \left\Vert \bf{y}-f(\bf{x}|\bf{\theta})\right\Vert^2/N + \lambda\left\Vert \bf{\theta}\right\Vert^2,
where \lambda>0 also has to be chosen during the training phase.
- Adding a prior
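When f is linear without an intercept, the penalized least-squares problem above has a closed-form solution: setting the gradient to zero gives (X^T X + N\lambda I)\bf{\theta} = X^T\bf{y}. The sketch below assumes that special case and uses simulated data to show the norm of the estimate shrinking as \lambda grows. The helper name `ridge_fit` is hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize ||y - X theta||^2 / N + lam * ||theta||^2 (linear model, no intercept)."""
    N, D = X.shape
    return np.linalg.solve(X.T @ X + N * lam * np.eye(D), X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + 0.5 * rng.normal(size=50)   # made-up coefficients

for lam in [0.0, 0.1, 1.0, 10.0]:
    theta = ridge_fit(X, y, lam)
    # A larger lam pulls the estimate toward zero.
    print(lam, np.round(theta, 3), round(float(np.linalg.norm(theta)), 3))
```

With lam = 0 the penalty vanishes and the estimate coincides with ordinary least squares.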
12.3.2 Cross-validation to choose the tuning parameter
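A minimal sketch of K-fold cross-validation for a tuning parameter, assuming scikit-learn is available: the training sample is cut into K folds, each candidate value is fitted on K-1 folds and scored on the held-out fold, and the value with the lowest average validation error is selected. The simulated data, the grid of candidates, and K = 5 are arbitrary choices. scikit-learn's `Ridge` calls the tuning parameter `alpha`, and its objective is not divided by N, so `alpha` is not on the same scale as \lambda above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -1.0]) + rng.normal(size=60)   # made-up coefficients

lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]          # candidate values (arbitrary grid)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

cv_mse = {}
for lam in lambdas:
    fold_errors = []
    for fit_idx, val_idx in kfold.split(X):
        model = Ridge(alpha=lam).fit(X[fit_idx], y[fit_idx])           # fit on K-1 folds
        y_val_pred = model.predict(X[val_idx])                          # predict the held-out fold
        fold_errors.append(mean_squared_error(y[val_idx], y_val_pred))
    cv_mse[lam] = float(np.mean(fold_errors))    # average validation error for this candidate

best_lambda = min(cv_mse, key=cv_mse.get)
print(cv_mse)
print("selected:", best_lambda)
```

`RidgeCV` and `LassoCV` wrap the same idea in a single estimator.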
12.4 Python example
Install the basic Python machine learning package, scikit-learn (imported as sklearn):
- cross-validation:
- Regularization of Linear Regression Model:
- Ridge regression
\min_{w} || X w - y||_2^2 + \alpha ||w||_2^2
linear_model.Ridge(alpha=0.5) # create an instance
linear_model.RidgeCV() # variant that chooses alpha by cross-validation
- Lasso regression
\min_{w} { \frac{1}{2n_{\text{samples}}} ||X w - y||_2 ^ 2 + \alpha ||w||_1}
linear_model.Lasso() # instance initiation
- Stochastic Gradient Descent:
- SGDRegressor:
Stochastic gradient descent algorithm with a batch size of 1.
SGDRegressor(
    loss="squared_error",  # OLS loss function (named "squared_loss" in scikit-learn versions before 1.0)
    penalty="l2",          # squared norm of parameters
    alpha=0.0001
)
Other parameters include max_iter, the upper limit on the number of passes through the entire data set (that is, epochs).
Sample size (1000) / Batch size (250) = Iterations (4)
One epoch is one complete pass through the sample, which here requires four iterations.
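The epoch arithmetic can be checked with a small helper. The function name is hypothetical, and rounding up covers the case where the batch size does not divide the sample size evenly.

```python
import math

def iterations_per_epoch(sample_size, batch_size):
    """Number of mini-batch updates needed to pass through the sample once."""
    return math.ceil(sample_size / batch_size)

print(iterations_per_epoch(1000, 250))   # 4, as in the example above
print(iterations_per_epoch(1000, 1))     # 1000: a batch size of 1
print(iterations_per_epoch(1000, 300))   # 4: the last batch holds only 100 points
```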
## choose your model class
from sklearn.linear_model import LinearRegression, Ridge, RidgeCV, Lasso, LassoCV, SGDRegressor
## choose your metrics
from sklearn.metrics import mean_squared_error, r2_score
## training-test split
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt
rng=np.random.default_rng(seed=2019) # initiate a random generator with seed 2019, for replication purpose
x = rng.normal(size=30)
y = 0 +0.1* x + 0.33*rng.normal(size=x.shape)
X = x.reshape(-1, 1) # scikit-learn expects a 2-D feature matrix (n_samples, n_features)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
12.4.1 Model setup
# Linear regression
# Ridge regression
Ridge2=RidgeCV(alphas=np.linspace(0.1,3,10), cv=5) # 5-fold cv
# Lasso regression
Lasso2=LassoCV(alphas=np.linspace(0.1,3,10), cv=5) # 5-fold cv
# SGD regression
12.4.2 Training
## LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
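n_batches = 3
(n, m) = X_train.shape
split_index = np.arange(n) # row indices of the training set
rng.shuffle(split_index) # shuffle
for slice_index in np.array_split(split_index, n_batches):
    # assumed loop body: one mini-batch update of SGD2, taken to be an SGDRegressor instance
    SGD2.partial_fit(X_train[slice_index], y_train[slice_index])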
12.4.3 Predict
# Linear regression
# Ridge regression
y_predictRidge2=Ridge2.predict(X_test) # 5-fold cv
# Lasso regression
y_predictLasso2=Lasso2.predict(X_test) # 5-fold cv
# SGD regression
12.4.4 Accuracy
# Ridge regression
mseRidge2=mean_squared_error(y_test,y_predictRidge2) # 5-fold cv
# Lasso regression
mseLasso2=mean_squared_error(y_test,y_predictLasso2) # 5-fold cv
# SGD regression
{"mseLinear1": mseLinear1,
# Ridge regression
"mseRidge1": mseRidge1,
"mseRidge2": mseRidge2,
# Lasso regression
"mseLasso1": mseLasso1,
"mseLasso2": mseLasso2,
# SGD regression
"mseSGD1": mseSGD1,
"mseSGD2": mseSGD2}
## {'mseLinear1': 0.23913965444458862, 'mseRidge1': 0.22923260522059308, 'mseRidge2': 0.20688285341153212, 'mseLasso1': 0.164949079025128, 'mseLasso2': 0.17400205980219802, 'mseSGD1': 0.18901352025558144, 'mseSGD2': 0.15951178277143552}
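The steps of this example can be collected into one self-contained sketch. Only the data simulation, the split, and the two cross-validated models are taken from the code above. The settings for the plain linear model and for `SGDRegressor` (including `random_state=0`) are assumptions, so the printed numbers need not match the output reported above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV, SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=2019)
x = rng.normal(size=30)
y = 0 + 0.1 * x + 0.33 * rng.normal(size=x.shape)
X = x.reshape(-1, 1)                      # 2-D feature matrix with one column
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Model settings: the two CV models follow the text, the other two are assumed.
models = {
    "Linear": LinearRegression(),
    "RidgeCV": RidgeCV(alphas=np.linspace(0.1, 3, 10), cv=5),
    "LassoCV": LassoCV(alphas=np.linspace(0.1, 3, 10), cv=5),
    "SGD": SGDRegressor(penalty="l2", alpha=0.0001, random_state=0),
}

mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)                       # training phase on S_t
    y_pred = model.predict(X_test)                    # prediction on S_p
    mse[name] = mean_squared_error(y_test, y_pred)    # test-set mean squared error

print(mse)
```

With only 30 observations, the ranking of the models by test-set MSE can change from one random split to another.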