8  Least Squares and Applications

The Least Squares method is a foundational statistical technique for modeling the relationship between variables and predicting outcomes. It chooses the model parameters that minimize the sum of squared differences between the observed data points and the values predicted by the model, which gives the best fit to a given dataset in the squared-error sense. The approach is widely applied in data analysis, engineering, economics, and machine learning.

This document explores the Least Squares method, focusing on its application in linear regression. Practical examples in Python are provided to demonstrate how to implement this method and interpret results effectively.

8.1 Least Squares Method

The Least Squares Method is a statistical technique for finding the best-fitting line through a set of data points. In simple linear regression, the best-fitting line is the one that minimizes the sum of squared differences between the observed data points and the values predicted by the model.
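As a small illustration of this criterion, the sketch below computes the sum of squared differences for two candidate lines. The four data points and the helper name `sum_squared_residuals` are invented for illustration and do not come from the text.

```python
import numpy as np

# Toy data, invented for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

def sum_squared_residuals(intercept, slope, x, y):
    """Return the sum of squared differences between y and intercept + slope * x."""
    predicted = intercept + slope * x
    return float(np.sum((y - predicted) ** 2))

# Two candidate lines: the one with the smaller value fits the data better
rss_a = sum_squared_residuals(0.0, 2.0, x, y)  # candidate line y = 2x
rss_b = sum_squared_residuals(1.0, 1.0, x, y)  # candidate line y = 1 + x
print(rss_a, rss_b)
```

Least squares goes one step further than comparing candidates: it finds the intercept and slope for which this quantity is as small as possible.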

8.2 Linear Regression Model and Matrix Equation

In simple linear regression, we aim to find the line that best fits the data. Suppose we have a dataset of $n$ data points $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$. The linear regression model is:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \dots, n$$

where:

  • $y_i$ is the observed value,
  • $x_i$ is the predictor (independent variable),
  • $\beta_0$ is the intercept,
  • $\beta_1$ is the slope of the line,
  • $\epsilon_i$ is the residual (error term) for each data point.

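We can write this equation for all data points at once in matrix form:

$$Y = X\beta + \epsilon, \qquad Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix},\quad X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix},\quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix},\quad \epsilon = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix}$$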

8.3 Finding the Coefficients β Using Least Squares

In linear regression, the primary objective is to find the best-fitting line that represents the relationship between the independent variable (X) and the dependent variable (Y). To measure how well the line fits the data, we use the concept of the Residual Sum of Squares (RSS).

Residuals ($\epsilon$) are the differences between the actual values ($Y$) and the values predicted by the model ($X\beta$):

$$\epsilon = Y - X\beta$$

The RSS is the sum of the squared residuals over all data points:

$$\mathrm{RSS} = \sum_{i=1}^{n} \epsilon_i^2 = \epsilon^T \epsilon$$

Substituting $\epsilon = Y - X\beta$, the RSS becomes:

$$\mathrm{RSS} = (Y - X\beta)^T (Y - X\beta)$$

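Expanding this quadratic form:

$$\mathrm{RSS} = Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta$$

where:

  • $Y^T Y$ is a scalar, the dot product of $Y$ with itself.
  • $-2\beta^T X^T Y$ is the cross term linking the predictors and the response; the two cross terms $\beta^T X^T Y$ and $Y^T X \beta$ are equal scalars, so they combine into one.
  • $\beta^T X^T X \beta$ is the quadratic term in the coefficients $\beta$.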
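The expansion can be checked numerically. The sketch below is a minimal check, assuming a small made-up design matrix, a random response, and an arbitrary coefficient vector; none of these values come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, used only for repeatability

# Hypothetical problem: 6 observations, an intercept column plus one predictor
X = np.column_stack((np.ones(6), np.arange(1, 7)))
Y = rng.normal(size=6)
beta = np.array([0.5, -1.2])  # arbitrary coefficient vector

# Direct form: (Y - X beta)^T (Y - X beta)
residuals = Y - X @ beta
rss_direct = residuals @ residuals

# Expanded form: Y^T Y - 2 beta^T X^T Y + beta^T X^T X beta
rss_expanded = Y @ Y - 2 * beta @ X.T @ Y + beta @ X.T @ X @ beta

print(np.isclose(rss_direct, rss_expanded))
```

Both forms agree up to floating-point rounding for any choice of `beta`, which is what the algebra predicts.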

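To minimize RSS, differentiate with respect to $\beta$:

$$\frac{\partial\,\mathrm{RSS}}{\partial \beta} = \frac{\partial}{\partial \beta}\left(Y^T Y - 2\beta^T X^T Y + \beta^T X^T X \beta\right)$$

The derivatives of the individual terms are:

  • $\frac{\partial}{\partial \beta}(Y^T Y) = 0$, as $Y^T Y$ is independent of $\beta$.
  • $\frac{\partial}{\partial \beta}(-2\beta^T X^T Y) = -2X^T Y$.
  • $\frac{\partial}{\partial \beta}(\beta^T X^T X \beta) = 2X^T X \beta$, because $X^T X$ is symmetric.

Combining these: $\frac{\partial\,\mathrm{RSS}}{\partial \beta} = -2X^T Y + 2X^T X \beta$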
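The gradient expression $-2X^T Y + 2X^T X \beta$ can be compared against a finite-difference approximation. This is only a sanity-check sketch; the design matrix, the response, and the evaluation point are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed

X = np.column_stack((np.ones(8), np.arange(1, 9)))  # hypothetical design matrix
Y = rng.normal(size=8)                               # hypothetical response
beta = np.array([0.3, 0.7])                          # arbitrary evaluation point

def rss(b):
    """Residual sum of squares for coefficient vector b."""
    r = Y - X @ b
    return r @ r

# Analytic gradient from the derivation: -2 X^T Y + 2 X^T X beta
grad_analytic = -2 * X.T @ Y + 2 * X.T @ X @ beta

# Central finite differences, one coordinate direction at a time
h = 1e-6
grad_numeric = np.array([
    (rss(beta + h * e) - rss(beta - h * e)) / (2 * h)
    for e in np.eye(len(beta))
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))
```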

To find the value of $\beta$ that minimizes RSS, set the derivative to zero:

$$-2X^T Y + 2X^T X \beta = 0$$

$$X^T X \beta = X^T Y$$

8.4 Solving the Normal Equation

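To find $\beta$, we solve the normal equation $X^T X \beta = X^T Y$. When $X^T X$ is invertible, the solution is:

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

This gives the values of the coefficients $\beta_0$ and $\beta_1$ (or of additional coefficients in a more complex model). The solution involves matrix operations such as matrix multiplication and matrix inversion.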
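In numerical work, the normal equation is usually solved as a linear system rather than by forming the inverse explicitly. The sketch below shows one way to do this with `np.linalg.solve`, cross-checked against `np.linalg.lstsq`; the five data points are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical data, invented for illustration
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
X = np.column_stack((np.ones_like(x), x))  # intercept column plus predictor

# Solve (X^T X) beta = X^T y as a linear system, without forming the inverse
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check with NumPy's least squares routine, which works on X directly
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_solve)
print(np.allclose(beta_solve, beta_lstsq))
```

Both routes give the same coefficients here. Working on `X` directly, as `lstsq` does, tends to be the more robust choice when the columns of `X` are nearly collinear.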

8.5 Linear Regression Example

8.5.1 Data

We have the following data:

x y
1 2.197622
2 5.849113
3 16.793542
4 11.352542
5 13.646439
6 23.575325
7 19.304581
8 12.674694
9 17.565736
10 20.771690

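The data can be generated in Python with the following code:

import numpy as np
import pandas as pd

# Set seed for reproducibility (the value 123 is an assumption; the original value is not shown)
np.random.seed(123)

# Create the data: y = 2x + 3 plus Gaussian noise with standard deviation 5
x = np.arange(1, 11)
y = 2 * x + 3 + np.random.normal(0, 5, 10)

# Create a DataFrame
data = pd.DataFrame({'x': x, 'y': y})

# Display the data
print(data)

The equivalent R code, which produced the values listed above, is:

# Set seed for reproducibility
set.seed(123)

# Create the data
x <- 1:10
y <- 2 * x + 3 + rnorm(10, mean = 0, sd = 5)
data <- data.frame(x, y)

# Display the data
print(data)

NumPy and R use different random number generators, so the Python version yields different noise values. To follow the remaining steps in Python with the numbers shown here, enter the tabulated values of y directly.

Output (from R):
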
    x         y
1   1  2.197622
2   2  5.849113
3   3 16.793542
4   4 11.352542
5   5 13.646439
6   6 23.575325
7   7 19.304581
8   8 12.674694
9   9 17.565736
10 10 20.771690

8.5.2 Linear Regression Equation

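The linear regression model for this data can be written as:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \dots, 10$$

where:

  • $y_i$ is the observed value,
  • $x_i$ is the input data,
  • $\beta_0$ is the intercept,
  • $\beta_1$ is the slope, and
  • $\epsilon_i$ is the error.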

8.5.3 Matrix X and Vector y

Create the matrix $X$, whose first column is all ones (for the intercept) and whose second column holds the values $x_i$, and the vector $y$ containing the values $y_i$.

import numpy as np

# Assuming data is already defined as a pandas DataFrame
X = np.column_stack((np.ones(len(data)), data['x']))  # Add a column of ones for the intercept
y = data['y'].values  # Convert the 'y' column to a numpy array

# Display X and y
print("X:\n", X)
print("y:\n", y)

The equivalent R code is:

# Matrix X and vector y
X <- cbind(1, data$x)  # Add a column of ones for the intercept
y <- data$y

# Display X and y
X
y

Output (from R):

      [,1] [,2]
 [1,]    1    1
 [2,]    1    2
 [3,]    1    3
 [4,]    1    4
 [5,]    1    5
 [6,]    1    6
 [7,]    1    7
 [8,]    1    8
 [9,]    1    9
[10,]    1   10
 [1]  2.197622  5.849113 16.793542 11.352542 13.646439 23.575325 19.304581
 [8] 12.674694 17.565736 20.771690

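8.5.4 Compute $X^T X$

Next, we compute $X^T X$:

$$X^T X = \begin{bmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{bmatrix} = \begin{bmatrix} 10 & 55 \\ 55 & 385 \end{bmatrix}$$

import numpy as np

# Assuming X is already defined as a numpy array
X_t_X = np.dot(X.T, X)

# Display the result
print(X_t_X)

The equivalent R code is:

# Compute X'X
X_t_X <- t(X) %*% X
X_t_X

Output (from R):
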
     [,1] [,2]
[1,]   10   55
[2,]   55  385
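For a design matrix with an intercept column and one predictor, the entries of $X^T X$ are the number of observations, the sum of the $x_i$, and the sum of the $x_i^2$. The sketch below checks this for the predictor values 1 to 10 used in this example; the variable name `by_sums` is introduced here only for illustration.

```python
import numpy as np

x = np.arange(1, 11)                       # predictor values 1..10 from the example
X = np.column_stack((np.ones(len(x)), x))  # design matrix with intercept column

X_t_X = X.T @ X

# The same matrix assembled from simple sums: [[n, sum x], [sum x, sum x^2]]
by_sums = np.array([[len(x), x.sum()],
                    [x.sum(), (x ** 2).sum()]])

print(X_t_X)
print(np.array_equal(X_t_X, by_sums))
```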

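8.5.5 Compute $X^T y$

Now, we compute $X^T y$:

$$X^T y = \begin{bmatrix} \sum y_i \\ \sum x_i y_i \end{bmatrix} = \begin{bmatrix} 143.7313 \\ 921.7089 \end{bmatrix}$$

# Compute X'y
X_t_y = np.dot(X.T, y)

# Display the result
print(X_t_y)

The equivalent R code is:

# Compute X'y
X_t_y <- t(X) %*% y
X_t_y

Output (from R):
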
[1,] 143.7313
[2,] 921.7089

8.5.6 Compute the Inverse of $X^T X$

To compute $(X^T X)^{-1}$, we use the matrix inverse function:

# Compute the inverse of X'X
inv_X_t_X = np.linalg.inv(X_t_X)

# Display the result
print(inv_X_t_X)

The equivalent R code is:

# Compute the inverse of X'X
inv_X_t_X <- solve(X_t_X)
inv_X_t_X

Output (from R):

            [,1]        [,2]
[1,]  0.46666667 -0.06666667
[2,] -0.06666667  0.01212121
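For a 2×2 matrix the inverse has a simple closed form: swap the diagonal entries, negate the off-diagonal entries, and divide by the determinant. The sketch below applies this to the values of $X^T X$ from this example and compares the result with `np.linalg.inv`; the variable names are illustrative.

```python
import numpy as np

# X^T X from the example above
X_t_X = np.array([[10.0, 55.0],
                  [55.0, 385.0]])

a, b = X_t_X[0]
c, d = X_t_X[1]
det = a * d - b * c  # 10 * 385 - 55 * 55 = 825

# Closed-form inverse of a 2x2 matrix
inv_closed_form = np.array([[d, -b],
                            [-c, a]]) / det

print(inv_closed_form)
print(np.allclose(inv_closed_form, np.linalg.inv(X_t_X)))
```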

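8.6 Compute the Vector β

Now we can compute the vector $\beta$:

$$\hat{\beta} = (X^T X)^{-1} X^T y$$

# Compute the beta vector
beta = np.dot(inv_X_t_X, X_t_y)

# Display the result
print(beta)

The equivalent R code is:

# Compute the beta vector
beta <- inv_X_t_X %*% X_t_y
beta

Output (from R):
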
[1,] 5.627337
[2,] 1.590144
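The coefficients can be cross-checked in Python by entering the tabulated data directly and comparing the normal-equation solution with `np.polyfit`. This is a verification sketch rather than part of the original walkthrough; the variable names are illustrative.

```python
import numpy as np

# Data copied from the table in Section 8.5.1
x = np.arange(1, 11)
y = np.array([2.197622, 5.849113, 16.793542, 11.352542, 13.646439,
              23.575325, 19.304581, 12.674694, 17.565736, 20.771690])

X = np.column_stack((np.ones(len(x)), x))

# Normal-equation solution, solved as a linear system
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Independent cross-check: np.polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, deg=1)

print(beta_normal)       # order: intercept, slope
print(intercept, slope)
```

Both approaches should agree with the values 5.627337 (intercept) and 1.590144 (slope) shown in the output above, up to rounding.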

8.7 Linear Regression Equation

Thus, the estimated regression coefficients are:

$$\hat{\beta}_0 = 5.627337, \qquad \hat{\beta}_1 = 1.590144$$

Therefore, the final regression equation is:

$$\hat{y} = 5.627337 + 1.590144\,x$$
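As a quick check of the fitted line, with intercept 5.627337 and slope 1.590144 taken from the output above, the sketch below computes fitted values, residuals, and the RSS for the tabulated data, and makes one prediction. The input x = 11 is an arbitrary value chosen for illustration.

```python
import numpy as np

# Data copied from the table in Section 8.5.1
x = np.arange(1, 11)
y = np.array([2.197622, 5.849113, 16.793542, 11.352542, 13.646439,
              23.575325, 19.304581, 12.674694, 17.565736, 20.771690])

beta0, beta1 = 5.627337, 1.590144  # estimates reported above

y_hat = beta0 + beta1 * x          # fitted values
residuals = y - y_hat
rss = float(np.sum(residuals ** 2))

# At the least squares solution the residuals are orthogonal to the columns of X,
# so X^T residuals is zero up to the rounding of the reported coefficients
X = np.column_stack((np.ones(len(x)), x))
orthogonality = X.T @ residuals

# Prediction for a new input (x = 11 is an arbitrary illustrative value)
prediction = beta0 + beta1 * 11

print("RSS:", rss)
print("X^T residuals:", orthogonality)
print("prediction at x = 11:", prediction)
```

The near-zero value of `X.T @ residuals` is the normal equation restated: $X^T (y - X\hat{\beta}) = 0$.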


8.8 Applications of Least Squares

8.8.1 Data Analysis

Predict relationships between variables (e.g., sales vs. advertising spend).

8.8.2 Physics and Engineering

Fit theoretical models to experimental data.
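As one hedged illustration, the sketch below fits a constant-acceleration model $s(t) = s_0 + v_0 t + \tfrac{1}{2} a t^2$ to synthetic measurements. The "true" parameter values, the noise level, and the seed are all made up for this example. The model is quadratic in $t$ but linear in the coefficients, so the same least squares machinery applies with a three-column design matrix.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# Synthetic "experiment": made-up true values plus Gaussian measurement noise
t = np.linspace(0.0, 2.0, 21)
s0_true, v0_true, a_true = 1.0, 3.0, -9.8
s = s0_true + v0_true * t + 0.5 * a_true * t**2 + rng.normal(0.0, 0.05, t.size)

# Design matrix with columns 1, t, t^2
X = np.column_stack((np.ones_like(t), t, t**2))
coef, *_ = np.linalg.lstsq(X, s, rcond=None)

# The t^2 coefficient estimates a / 2, so double it to recover the acceleration
s0_hat, v0_hat, a_hat = coef[0], coef[1], 2.0 * coef[2]
print(s0_hat, v0_hat, a_hat)
```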

8.8.3 Economics and Logistics

Optimize cost and demand models.

8.8.4 Image Processing

Reduce noise in images by fitting pixel values.