5 Time Series Models
This section summarizes the fundamental models used in time series forecasting, from classical statistical methods to modern deep learning architectures.
5.1 Classical Statistical Models
5.1.1 Naïve Forecast
The Naïve Forecast (Random Walk) is the simplest baseline model in time series forecasting. It assumes that the best prediction for the next period is simply the most recent observed value.
Formula:
\[ \hat{y}_{t+1} = y_t \]
In other words, the forecast “walks” forward by repeating the last value, making it equivalent to forecasting from a random walk with no drift. As an example, consider the naïve forecast for the next 5 periods:
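A minimal R sketch, using the forecast and plotly packages on a simulated toy series (the data and settings here are illustrative only, not part of a specific dataset):
library(forecast)
library(plotly)
# --- Sample Time Series (simulated toy data) ---
set.seed(123)
y <- ts(cumsum(rnorm(30, 0.3, 1)))
# --- Naive Forecast: repeat the last observed value for the next 5 periods ---
fc <- naive(y, h = 5)
t_obs <- 1:length(y)
t_future <- (length(y) + 1):(length(y) + length(fc$mean))
# --- Plotly Visualization ---
plot_ly() %>%
  add_lines(x = t_obs, y = as.numeric(y), name = "Observed") %>%
  add_lines(x = t_future, y = as.numeric(fc$mean),
            name = "Naïve Forecast", line = list(dash = "dash")) %>%
  layout(title = "Naïve Forecast (Random Walk)",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Value"))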
Interpretation
The Observed line shows the true historical values of the time series. The Naïve Forecast appears as a flat horizontal line. This happens because the naïve method simply repeats the last observed value. Mathematically, if the final observed value is \(y_T\), then all future predictions are:
\[\hat{y}_{T+1}=\hat{y}_{T+2}=\cdots=\hat{y}_{T+h}=y_T\]
This means that the forecast does not change over the horizon; it stays constant at the last known value. The naïve method is widely used as a benchmark in forecasting, because any more advanced model should outperform this simple baseline.
5.1.2 Simple Moving Average
The Simple Moving Average (SMA) is one of the fundamental smoothing methods in time series analysis [1]. It reduces short-term fluctuations and highlights longer-term patterns by averaging the most recent \(k\) observations.
Formula
For a window size \(k\), the SMA is defined as:
\[ \hat{y}_t = \frac{1}{k} \sum_{i=0}^{k-1} y_{t-i} \]
This means the smoothed value at time \(t\) is the average of the current and the previous \(k-1\) observations. Below is an example of the SMA with window size \(k = 3\) applied to a sample time series.
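A minimal R sketch of a trailing SMA with \(k = 3\), using base R’s stats::filter on a simulated toy series (the data are illustrative only):
library(plotly)
# --- Sample Time Series (simulated toy data) ---
set.seed(123)
y <- ts(cumsum(rnorm(30, 0.3, 1)))
# --- Trailing SMA with window size k = 3 ---
k <- 3
sma_vals <- stats::filter(y, rep(1 / k, k), sides = 1)  # average of the current and previous k-1 values
t_obs <- 1:length(y)
# --- Plotly Visualization ---
plot_ly() %>%
  add_lines(x = t_obs, y = as.numeric(y), name = "Observed") %>%
  add_lines(x = t_obs, y = as.numeric(sma_vals), name = "SMA (Smoothed)") %>%
  layout(title = "Simple Moving Average (k = 3)",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Value"))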
Interpretation:
The Observed line shows the original time series with its natural variability. The SMA (Smoothed) line reduces noise by averaging the most recent \(k\) observations. This creates a smoother representation of the underlying trend. When forecasting, the SMA method typically uses the last computed SMA value as the prediction for all future periods:
\[\hat{y}_{T+1}=\hat{y}_{T+2}=\cdots=\hat{y}_{T+h}=\hat{y}_T\]
This produces a flat forecast, similar to the naïve method, but based on the smoothed series rather than the last raw observation. The SMA method is widely used as an introductory smoothing technique and serves as a baseline for more advanced exponential smoothing models.
5.1.3 Exponential Smoothing
Exponential smoothing methods are a family of forecasting techniques that apply exponentially decreasing weights to past observations. They are widely used because of their simplicity, interpretability, and strong forecasting performance. According to [2], there are three main types of exponential smoothing models, each designed for different time series patterns:
| Method | Description | Suitable For |
|---|---|---|
| Single Exponential Smoothing (SES) | Applies smoothing to the level only. | Data without trend and without seasonality |
| Holt’s Linear Trend Method | Extends SES by adding a smoothed trend component. | Data with trend, but without seasonality |
| Holt–Winters Method | Extends Holt’s method by adding a seasonal component (additive or multiplicative). | Data with trend and with seasonality |
Each model builds upon the previous one by incorporating more structure from the time series, making exponential smoothing a flexible and progressive forecasting framework.
Single Exponential Smoothing (SES)
Single Exponential Smoothing (SES) is designed for time series without trend and without seasonality. It provides a smoothed estimate by applying exponentially decreasing weights to past observations.
Formula
\[\hat{y}_{t+1}= \alpha y_t + (1-\alpha)\hat{y}_t\]
where:
- \(\alpha\) is the smoothing parameter, \(0 < \alpha \le 1\)
- \(y_t\) is the actual value at time \(t\)
- \(\hat{y}_t\) is the previous smoothed estimate
A higher value of \(\alpha\) gives more weight to recent observations, making the model more responsive to sudden changes. A lower value of \(\alpha\) produces a smoother curve, giving more weight to older data.
library(forecast)
library(plotly)
# --- Sample Time Series ---
set.seed(123)
y <- ts(cumsum(rnorm(30, 0.3, 1))) # example series
# --- SES Model ---
alpha_value <- 0.4 # you can change alpha (0 < alpha <= 1)
ses_model <- ses(y, alpha = alpha_value, h = 6) # SES with forecast horizon h=6
# Extract fitted values and forecasts
fitted_vals <- ses_model$fitted
forecast_vals <- ses_model$mean
t <- 1:length(y)
t_future <- (length(y)+1):(length(y) + length(forecast_vals))
# Build data frame for plot
df <- data.frame(
time = c(t, t_future),
value = c(y, as.numeric(forecast_vals)),
type = c(rep("Observed", length(y)), rep("SES Forecast", length(forecast_vals)))
)
df_fitted <- data.frame(
time = t,
value = as.numeric(fitted_vals),
type = "SES (Smoothed)"
)
# --- Plotly Visualization ---
plot_ly() %>%
add_lines(data = df[df$type == "Observed", ],
x = ~time, y = ~value,
name = "Observed") %>%
add_lines(data = df_fitted,
x = ~time, y = ~value,
name = "SES (Smoothed)") %>%
add_lines(data = df[df$type == "SES Forecast", ],
x = ~time, y = ~value,
name = "SES Forecast",
line = list(dash = "dash")) %>%
layout(
title = "Single Exponential Smoothing (SES)",
xaxis = list(title = "Time"),
yaxis = list(title = "Value")
  )
Interpretation:
The Observed line represents the original time series values. The SES (Smoothed) line applies exponential smoothing, where recent observations receive a higher weight controlled by the smoothing parameter \(\alpha\). A larger value of \(\alpha\) (closer to 1) makes the smoothed line react more quickly to changes in the data, while a smaller value of \(\alpha\) (closer to 0) produces a smoother curve that responds more gradually.
Because SES does not model trend or seasonality, the forecast is constant over the horizon:
\[\hat{y}_{T+1}=\hat{y}_{T+2}=\cdots=\hat{y}_{T+h}=\hat{y}_T \]
SES is therefore suitable only for series with no trend and no seasonality, and it serves as the foundation for more advanced exponential smoothing techniques such as Holt (for trend) and Holt–Winters (for seasonality).
Holt’s Linear Trend Method
Holt’s Linear Trend Method extends Single Exponential Smoothing (SES) by adding a trend component, making it suitable for time series that exhibit a consistent upward or downward trend. This method smooths both the level and the trend, allowing the forecast to grow or decline over time.
Model Equations
The method consists of three components:
- Level equation: updates the smoothed estimate of the current value
- Trend equation: updates the smoothed estimate of the trend
- Forecast equation: projects the level and trend forward
\[ \begin{aligned} l_t &= \alpha y_t + (1-\alpha)(l_{t-1}+b_{t-1}) \\ b_t &= \beta (l_t - l_{t-1}) + (1-\beta)b_{t-1} \\ \hat{y}_{t+h} &= l_t + h b_t \end{aligned} \]
where:
- \(\alpha\) = level smoothing parameter, \(0 < \alpha \le 1\)
- \(\beta\) = trend smoothing parameter, \(0 < \beta \le 1\)
- \(l_t\) = estimated level at time \(t\)
- \(b_t\) = estimated trend at time \(t\)
- \(\hat{y}_{t+h}\) = forecast \(h\) periods ahead
Holt’s method allows forecasts to follow a straight-line trajectory, capturing the underlying linear trend in the data.
library(forecast)
library(plotly)
# --- Sample Time Series ---
set.seed(123)
y <- ts(cumsum(rnorm(30, 0.8, 1))) # trending series
# --- Holt Model ---
holt_model <- holt(y, h = 6, alpha = 0.5, beta = 0.3)
# Extract components
fitted_vals <- holt_model$fitted
forecast_vals <- holt_model$mean
t <- 1:length(y)
t_future <- (length(y)+1):(length(y) + length(forecast_vals))
# Data for plotting
df <- data.frame(
time = c(t, t_future),
value = c(y, as.numeric(forecast_vals)),
type = c(rep("Observed", length(y)), rep("Holt Forecast", length(forecast_vals)))
)
df_fitted <- data.frame(
time = t,
value = as.numeric(fitted_vals),
type = "Holt (Smoothed)"
)
# --- Plotly Visualization ---
plot_ly() %>%
add_lines(data = df[df$type == "Observed", ],
x = ~time, y = ~value,
name = "Observed") %>%
add_lines(data = df_fitted,
x = ~time, y = ~value,
name = "Holt (Smoothed)") %>%
add_lines(data = df[df$type == "Holt Forecast", ],
x = ~time, y = ~value,
name = "Holt Forecast",
line = list(dash = "dash")) %>%
layout(
title = "Holt’s Linear Trend Method",
xaxis = list(title = "Time"),
yaxis = list(title = "Value")
  )
Interpretation:
The Observed line shows the original time series, which exhibits a visible upward trend. Holt’s method captures this behavior by smoothing two components:
- Level (\(l_t\)) — the underlying value of the series
- Trend (\(b_t\)) — the rate of increase or decrease
The Holt (Smoothed) line represents the result of applying exponential smoothing to both the level and the trend. Compared with SES, Holt’s method does not produce a flat forecast, because it explicitly models and updates the trend. Holt’s method is appropriate for time series with a consistent linear trend but without seasonality. It forms the foundation for the Holt–Winters method, which adds a seasonal component for more complex patterns.
Holt–Winters Method
The Holt–Winters method is used when the time series exhibits both trend and seasonality. There are two types: additive and multiplicative. Here, we use the additive version, which is suitable for seasonal patterns with roughly constant amplitude.
Model Equations (Additive)
\[ \begin{aligned} l_t &= \alpha (y_t - s_{t-m}) + (1-\alpha)(l_{t-1}+b_{t-1}) \\ b_t &= \beta (l_t - l_{t-1}) + (1-\beta)b_{t-1} \\ s_t &= \gamma (y_t - l_t) + (1-\gamma)s_{t-m} \\ \hat{y}_{t+h} &= l_t + h b_t + s_{t-m+h} \end{aligned} \]
where:
- \(l_t\) = level
- \(b_t\) = trend
- \(s_t\) = seasonal component
- \(m\) = seasonal period (e.g., \(m = 12\) for monthly data)
- \(\alpha, \beta, \gamma\) = smoothing parameters
library(plotly)
library(forecast)
# Example seasonal + trend data
set.seed(123)
t <- 1:60
season <- rep(c(10, 12, 15, 14), 15) # seasonal pattern (period = 4)
trend <- 0.5 * t # upward trend
noise <- rnorm(60, 0, 2)
y <- season + trend + noise
ts_data <- ts(y, frequency = 4)
# Holt-Winters Additive Model
model_hw <- HoltWinters(ts_data, seasonal = "additive")
# Forecast 8 steps ahead
fc <- forecast(model_hw, h = 8)
# Build plotly visualization
p <- plot_ly() %>%
add_lines(x = time(ts_data), y = ts_data,
name = "Observed") %>%
add_lines(x = time(model_hw$fitted), y = model_hw$fitted[,1],
name = "HW Fitted") %>%
add_lines(x = time(fc$mean), y = fc$mean,
name = "Forecast", line = list(dash = "dash")) %>%
layout(title = "Holt–Winters Additive Method",
xaxis = list(title = "Time"),
yaxis = list(title = "Value"))
p
Interpretation:
In the Holt–Winters method, the smoothing parameters control how much weight is given to recent observations versus past values. They range between 0 and 1:
Level Smoothing (\(\alpha\))
- Determines how much the current observation \(y_t\) influences the estimated level \(l_t\).
- High \(\alpha\) (close to 1) → the level reacts quickly to recent changes.
- Low \(\alpha\) (close to 0) → the level changes slowly, giving more weight to historical values.
\[ l_t = \alpha (y_t - s_{t-m}) + (1-\alpha)(l_{t-1}+b_{t-1}) \]
Trend Smoothing (\(\beta\))
- Controls the update of the trend component \(b_t\).
- High \(\beta\) → trend responds quickly to changes in slope.
- Low \(\beta\) → trend is more stable and less sensitive to short-term fluctuations.
\[ b_t = \beta (l_t - l_{t-1}) + (1-\beta)b_{t-1} \]
Seasonal Smoothing (\(\gamma\))
- Determines how fast the seasonal component \(s_t\) adapts to new seasonal patterns.
- High \(\gamma\) → seasonal estimates adjust quickly to changes.
- Low \(\gamma\) → seasonal pattern changes slowly over time.
\[ s_t = \gamma (y_t - l_t) + (1-\gamma)s_{t-m} \]
5.1.4 ARIMA
AR(\(p\)): Autoregressive
The Autoregressive (AR) model predicts a time series based on its own past values. It is widely used in time series forecasting and forms the foundation for ARMA/ARIMA models.
Model Formula
\[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t \]
where:
- \(y_t\) = value of the series at time \(t\)
- \(c\) = constant term (intercept)
- \(\phi_1, \phi_2, \dots, \phi_p\) = autoregressive coefficients
- \(p\) = order of the AR model (number of lagged terms)
- \(\epsilon_t\) = white noise error term, \(\epsilon_t \sim N(0, \sigma^2)\)
Key Points:
- The model assumes that the current value \(y_t\) is a linear combination of its past \(p\) values plus random noise.
- The coefficients \(\phi_i\) determine how much influence each lagged term has.
- The order \(p\) can be selected using information criteria (AIC, BIC) or autocorrelation plots.
library(forecast)
library(plotly)
# --- Simulate AR(2) Process ---
set.seed(123)
ar_coeff <- c(0.6, -0.3) # phi1=0.6, phi2=-0.3
y <- arima.sim(n = 50, list(ar = ar_coeff), sd = 1)
# Fit AR model
fit <- arima(y, order = c(2,0,0)) # AR(2)
# Forecast 10 steps ahead
fc <- forecast(fit, h = 10)
# Time index
t <- 1:length(y)
t_future <- (length(y)+1):(length(y)+length(fc$mean))
# Build data frame
df <- data.frame(
time = c(t, t_future),
value = c(y, as.numeric(fc$mean)),
type = c(rep("Observed", length(y)), rep("AR(2) Forecast", length(fc$mean)))
)
# Plotly visualization
plot_ly() %>%
add_lines(data = df[df$type=="Observed",], x = ~time, y = ~value,
name = "Observed") %>%
add_lines(data = df[df$type=="AR(2) Forecast",], x = ~time, y = ~value,
name = "Forecast", line = list(dash = "dash")) %>%
layout(title = "Autoregressive AR(2) Model",
xaxis = list(title = "Time"),
yaxis = list(title = "Value"))
Interpretation:
- If \(\phi_i\) is positive, the series tends to continue in the same direction as lag \(i\).
- If \(\phi_i\) is negative, the series tends to move in the opposite direction of lag \(i\).
- AR models are stationary if the roots of the characteristic equation lie outside the unit circle.
MA(\(q\)): Moving Average
The Moving Average (MA) model expresses the current value of a time series as a linear combination of past error terms.
It is commonly used in time series modeling, often as part of ARMA or ARIMA models.
Model Formula
\[ y_t = c + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t \]
where:
- \(y_t\) = value of the series at time \(t\)
- \(c\) = constant term (intercept)
- \(\theta_1, \theta_2, \dots, \theta_q\) = moving average coefficients
- \(q\) = order of the MA model (number of past error terms included)
- \(\epsilon_t\) = white noise error term, \(\epsilon_t \sim N(0, \sigma^2)\)
Key Points:
- Unlike AR models, MA models do not use past values of \(y_t\), but instead use past forecast errors.
- Each coefficient \(\theta_i\) measures the influence of the lagged errors on the current value.
- The order \(q\) can be determined by examining the autocorrelation function (ACF):
- The ACF of an MA(\(q\)) process cuts off after lag \(q\).
library(forecast)
library(plotly)
# --- Simulate MA(2) Process ---
set.seed(123)
ma_coeff <- c(0.7, -0.4) # theta1=0.7, theta2=-0.4
y <- arima.sim(n = 50, list(ma = ma_coeff), sd = 1)
# Fit MA model
fit <- arima(y, order = c(0,0,2)) # MA(2)
# Forecast 10 steps ahead
fc <- forecast(fit, h = 10)
# Time index
t <- 1:length(y)
t_future <- (length(y)+1):(length(y)+length(fc$mean))
# Build data frame
df <- data.frame(
time = c(t, t_future),
value = c(y, as.numeric(fc$mean)),
type = c(rep("Observed", length(y)), rep("MA(2) Forecast", length(fc$mean)))
)
# Plotly visualization
plot_ly() %>%
add_lines(data = df[df$type=="Observed",], x = ~time, y = ~value,
name = "Observed") %>%
add_lines(data = df[df$type=="MA(2) Forecast",], x = ~time, y = ~value,
name = "Forecast", line = list(dash = "dash")) %>%
layout(title = "Moving Average MA(2) Model",
xaxis = list(title = "Time"),
yaxis = list(title = "Value"))
Interpretation:
- The current value \(y_t\) is affected by the most recent \(q\) random shocks \(\epsilon_{t-1}, \dots, \epsilon_{t-q}\).
- Positive \(\theta_i\) → the lagged shock pushes \(y_t\) in the same direction.
- Negative \(\theta_i\) → the lagged shock pushes \(y_t\) in the opposite direction.
- MA models of finite order are stationary by construction, whereas AR models are stationary only if their coefficients satisfy the stationarity condition (roots outside the unit circle).
- MA is often combined with AR to form ARMA(\(p,q\)) for better capturing serial dependence in data.
ARMA(\(p,q\)): Autoregressive Moving Average
The ARMA(\(p,q\)) model combines both autoregressive (AR) and moving average (MA) components, capturing the effects of both past values and past errors on the current observation.
Model Formula
\[ y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t \]
where:
- \(y_t\) = value at time \(t\)
- \(c\) = constant term (intercept)
- \(\phi_1, \dots, \phi_p\) = AR coefficients
- \(\theta_1, \dots, \theta_q\) = MA coefficients
- \(p\) = order of the AR part
- \(q\) = order of the MA part
- \(\epsilon_t\) = white noise error term, \(\epsilon_t \sim N(0, \sigma^2)\)
Key Points:
- The AR component captures the influence of past values on \(y_t\).
- The MA component captures the influence of past shocks (errors) on \(y_t\).
- The orders \(p\) and \(q\) can be selected using information criteria (AIC, BIC) or by examining the ACF and PACF plots.
- ARMA models assume stationarity, so non-stationary series may require differencing (ARIMA).
library(forecast)
library(plotly)
# --- Simulate ARMA(2,2) Process ---
set.seed(123)
ar_coeff <- c(0.6, -0.3) # AR(2)
ma_coeff <- c(0.5, -0.4) # MA(2)
y <- arima.sim(n = 50, list(ar = ar_coeff, ma = ma_coeff), sd = 1)
# Fit ARMA model (ARMA(2,2))
fit <- arima(y, order = c(2,0,2))
# Forecast 10 steps ahead
fc <- forecast(fit, h = 10)
# Time index
t <- 1:length(y)
t_future <- (length(y)+1):(length(y)+length(fc$mean))
# Build data frame
df <- data.frame(
time = c(t, t_future),
value = c(y, as.numeric(fc$mean)),
type = c(rep("Observed", length(y)), rep("ARMA(2,2) Forecast", length(fc$mean)))
)
# Build fitted values data frame
df_fitted <- data.frame(
time = t,
value = as.numeric(fitted(fit)),
type = "ARMA(2,2) Fitted"
)
# Plotly visualization
plot_ly() %>%
add_lines(data = df[df$type=="Observed",], x = ~time, y = ~value,
name = "Observed") %>%
add_lines(data = df_fitted, x = ~time, y = ~value,
name = "Fitted", line = list(color = 'green')) %>%
add_lines(data = df[df$type=="ARMA(2,2) Forecast",], x = ~time, y = ~value,
name = "Forecast", line = list(dash = "dash", color = 'red')) %>%
layout(title = "ARMA(2,2) Model",
xaxis = list(title = "Time"),
yaxis = list(title = "Value"))
Interpretation:
- ARMA(\(p,q\)) captures both momentum and shocks:
  - The AR part models persistence (trend or autocorrelation).
  - The MA part models short-term noise effects.
- Forecasting with ARMA(\(p,q\)):
  - Future values are predicted using a combination of past observations and past errors.
  - Provides a more flexible model than AR or MA alone, suitable for stationary time series with autocorrelation and short-term shocks.
- Advantages:
  - Can model a wide variety of stationary time series patterns.
  - Forms the basis for ARIMA when differencing is added for non-stationary series.
ARIMA(\(p,d,q\)): Autoregressive Integrated Moving Average
The ARIMA(\(p,d,q\)) model extends ARMA models to non-stationary time series by incorporating differencing. It is one of the most widely used models for time series forecasting.
Model Formula:
After differencing the series \(d\) times, the result follows an ARMA(\(p,q\)) process:
\[ \nabla^d y_t = (1 - B)^d y_t \sim \text{ARMA}(p,q) \]
where:
- \(y_t\) = original time series at time \(t\)
- \(B\) = backshift operator, \(B y_t = y_{t-1}\)
- \(\nabla^d\) = differencing operator applied \(d\) times, \(\nabla^d y_t = (1-B)^d y_t\)
- \(p\) = order of the AR part
- \(d\) = order of differencing required to make the series stationary
- \(q\) = order of the MA part
Key Points:
Differencing (\(d\)): Removes trend (non-stationarity in the mean) so that the series becomes stationary; seasonality is handled separately by seasonal differencing (see SARIMA).
Once differenced, the series can be modeled with an ARMA(\(p,q\)) structure.
ARIMA is denoted as ARIMA(\(p,d,q\)):
- \(p\): number of autoregressive terms
- \(d\): number of differences
- \(q\): number of moving average terms
library(forecast)
library(plotly)
# --- Simulate non-stationary series with trend ---
set.seed(123)
t <- 1:60
trend <- 0.5 * t
noise <- rnorm(60, 0, 1)
y <- trend + noise
ts_data <- ts(y)
# Fit ARIMA model (auto.arima to select p,d,q)
fit <- auto.arima(ts_data)
# Forecast 10 steps ahead
fc <- forecast(fit, h = 10)
# Time index
t_obs <- 1:length(ts_data)
t_future <- (length(ts_data)+1):(length(ts_data)+length(fc$mean))
# Data frames
df <- data.frame(
time = c(t_obs, t_future),
value = c(ts_data, as.numeric(fc$mean)),
type = c(rep("Observed", length(ts_data)), rep("Forecast", length(fc$mean)))
)
df_fitted <- data.frame(
time = t_obs,
value = as.numeric(fitted(fit)),
type = "Fitted"
)
# Plotly visualization
plot_ly() %>%
add_lines(data = df[df$type=="Observed",], x = ~time, y = ~value,
name = "Observed") %>%
add_lines(data = df_fitted, x = ~time, y = ~value,
name = "Fitted", line = list(color = 'green')) %>%
add_lines(data = df[df$type=="Forecast",], x = ~time, y = ~value,
name = "Forecast", line = list(dash = "dash", color = 'red')) %>%
layout(title = "ARIMA Forecast",
xaxis = list(title = "Time"),
yaxis = list(title = "Value"))
Interpretation:
- Non-stationary series (with trend or evolving mean) cannot be directly modeled with ARMA.
- Differencing transforms the series into a stationary series.
- ARIMA forecasts combine:
  - Past values (\(p\)),
  - Past errors (\(q\)),
  - Differencing to handle non-stationarity (\(d\)).
- ARIMA forms the foundation for seasonal models (SARIMA) when seasonality is included.
5.1.5 SARIMA / SARIMAX
SARIMA and SARIMAX are extensions of ARIMA that allow modeling of seasonal patterns and external regressors.
SARIMA (Seasonal ARIMA)
SARIMA incorporates seasonality using seasonal autoregressive (P), seasonal differencing (D), and seasonal moving average (Q) components, along with the seasonal period \(s\).
Model notation:
\[ SARIMA(p,d,q)(P,D,Q)_s \]
- \(p,d,q\) = non-seasonal AR, differencing, MA orders
- \(P,D,Q\) = seasonal AR, seasonal differencing, seasonal MA orders
- \(s\) = length of the seasonal cycle (e.g., 12 for monthly data)
Equation (conceptual):
After non-seasonal and seasonal differencing, the series follows a multiplicative seasonal ARMA process:
\[ \nabla^d \nabla_s^D y_t \sim \text{ARMA}(p,q) \times \text{ARMA}(P,Q)_s \]
where \(\nabla_s^D y_t = (1-B^s)^D y_t\) is seasonal differencing.
SARIMAX (Seasonal ARIMA with eXogenous variables)
SARIMAX extends SARIMA by adding external regressors \(X_t\):
\[ y_t = SARIMA(p,d,q)(P,D,Q)_s + \beta X_t + \epsilon_t \]
- \(X_t\) = external variables that may affect the time series
- \(\beta\) = coefficients for regressors
Key Points:
- SARIMA is suitable for seasonal time series, e.g., monthly sales with yearly seasonality.
- SARIMAX allows external influences, e.g., promotions, holidays, temperature.
- Seasonal parameters \((P,D,Q,s)\) capture repeating patterns in data.
- Model selection can use AIC/BIC, ACF/PACF plots, and seasonal diagnostics.
5.1.6 SARIMA and SARIMAX: Example in R
library(forecast)
library(plotly)
# --- Simulate seasonal time series ---
set.seed(123)
t <- 1:60
seasonal <- rep(c(10, 12, 15, 14, 11), 12)[1:60] # seasonal pattern (period s=5)
trend <- 0.3 * t
noise <- rnorm(60, 0, 1)
y <- trend + seasonal + noise
ts_data <- ts(y, frequency = 5)
# --- External regressor for SARIMAX ---
x <- rnorm(60, 5, 1) # exogenous variable
# Fit SARIMA model
fit_sarima <- auto.arima(ts_data, seasonal = TRUE)
# Fit SARIMAX model
fit_sarimax <- auto.arima(ts_data, xreg = x, seasonal = TRUE)
# Forecast 10 steps ahead
x_future <- rnorm(10, 5, 1) # future exogenous variable for SARIMAX
fc_sarima <- forecast(fit_sarima, h = 10)
fc_sarimax <- forecast(fit_sarimax, xreg = x_future, h = 10)
# Time index
t_obs <- 1:length(ts_data)
t_future <- (length(ts_data)+1):(length(ts_data)+10)
# Build data frames for plotting
df_sarima <- data.frame(
time = c(t_obs, t_future),
value = c(ts_data, as.numeric(fc_sarima$mean)),
type = c(rep("Observed", length(ts_data)), rep("SARIMA Forecast", 10))
)
df_sarimax <- data.frame(
time = c(t_obs, t_future),
value = c(ts_data, as.numeric(fc_sarimax$mean)),
type = c(rep("Observed", length(ts_data)), rep("SARIMAX Forecast", 10))
)
df_fitted_sarima <- data.frame(
time = t_obs,
value = as.numeric(fitted(fit_sarima)),
type = "SARIMA Fitted"
)
df_fitted_sarimax <- data.frame(
time = t_obs,
value = as.numeric(fitted(fit_sarimax)),
type = "SARIMAX Fitted"
)
# Plotly visualization
plot_ly() %>%
add_lines(data = df_sarima[df_sarima$type=="Observed",], x = ~time, y = ~value,
name = "Observed") %>%
add_lines(data = df_fitted_sarima, x = ~time, y = ~value,
name = "SARIMA Fitted", line = list(color = 'green')) %>%
add_lines(data = df_sarima[df_sarima$type=="SARIMA Forecast",], x = ~time, y = ~value,
name = "SARIMA Forecast", line = list(dash = "dash", color = 'red')) %>%
add_lines(data = df_fitted_sarimax, x = ~time, y = ~value,
name = "SARIMAX Fitted", line = list(color = 'blue')) %>%
add_lines(data = df_sarimax[df_sarimax$type=="SARIMAX Forecast",], x = ~time, y = ~value,
name = "SARIMAX Forecast", line = list(dash = "dash", color = 'purple')) %>%
layout(title = "SARIMA vs SARIMAX Forecast",
xaxis = list(title = "Time"),
yaxis = list(title = "Value"))
Interpretation:
- Seasonality
  - Seasonal AR (\(P\)) captures dependence on previous seasonal cycles.
  - Seasonal MA (\(Q\)) captures seasonal shocks.
  - Seasonal differencing (\(D\)) removes repeating seasonal trends.
- Exogenous variables (SARIMAX only)
  - The model accounts for additional predictors outside the time series itself.
  - Useful when external events significantly impact the series.
- Forecasting
  - Combines non-seasonal ARMA dynamics, seasonal patterns, and external regressors (if any).
  - Produces more accurate predictions for seasonal and externally influenced time series.
5.2 Machine Learning Models
5.2.1 Linear Regression
Linear Regression for Time Series [3] uses time-dependent features such as the following (a minimal sketch appears after this list):
- \(y_{t-1}, y_{t-2}, \ldots\)
- moving averages
- sine/cosine seasonal features
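A minimal sketch of this idea, using a simulated monthly-style series and a hand-built feature table; the feature choices and data below are illustrative, not a prescribed recipe:
# --- Linear regression with lagged and seasonal features (illustrative) ---
set.seed(123)
n <- 120
t <- 1:n
y <- 10 + 0.05 * t + 3 * sin(2 * pi * t / 12) + rnorm(n, 0, 1)
# Feature table: one lag plus sine/cosine terms for a 12-period season
df <- data.frame(
  y      = y[2:n],
  y_lag1 = y[1:(n - 1)],
  sin12  = sin(2 * pi * (2:n) / 12),
  cos12  = cos(2 * pi * (2:n) / 12)
)
fit_lm <- lm(y ~ y_lag1 + sin12 + cos12, data = df)
# One-step-ahead forecast from the most recent observation
predict(fit_lm, data.frame(y_lag1 = y[n],
                           sin12 = sin(2 * pi * (n + 1) / 12),
                           cos12 = cos(2 * pi * (n + 1) / 12)))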
5.2.2 Non-Linear Regression
Random Forest, XGBoost, and LightGBM [4] are tree-based models capable of capturing nonlinear patterns in lagged features; a minimal Random Forest sketch is shown below.
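A minimal sketch on the same kind of lag-feature table, assuming the randomForest package (XGBoost and LightGBM would be used analogously through their own R interfaces); the data and settings are illustrative only:
library(randomForest)
# --- Random Forest on lagged features (illustrative) ---
set.seed(123)
n <- 120
t <- 1:n
y <- 10 + 0.05 * t + 3 * sin(2 * pi * t / 12) + rnorm(n, 0, 1)
# Predict y_t from its three previous values
df <- data.frame(
  y      = y[4:n],
  y_lag1 = y[3:(n - 1)],
  y_lag2 = y[2:(n - 2)],
  y_lag3 = y[1:(n - 3)]
)
fit_rf <- randomForest(y ~ ., data = df, ntree = 300)
# One-step-ahead forecast from the most recent three observations
predict(fit_rf, data.frame(y_lag1 = y[n], y_lag2 = y[n - 1], y_lag3 = y[n - 2]))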
5.3 Deep Learning Models
5.3.1 Recurrent Neural Networks
Recurrent Neural Networks (RNNs) [5] are designed to capture sequential dependencies.
5.3.2 Long Short-Term Memory
Long Short-Term Memory (LSTM) networks [6] are effective for long-range temporal dependencies; a minimal sketch follows.
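A minimal sketch using the keras R interface, assuming a TensorFlow backend is installed; the series is simulated, the input is a sliding window of the last 10 observations, and the layer sizes and training settings are illustrative only:
library(keras)
# --- Toy series, scaled for training stability ---
set.seed(123)
y <- as.numeric(scale(cumsum(rnorm(200))))
win <- 10
# Build (samples, timesteps, features) input windows and one-step-ahead targets
X <- t(sapply(1:(length(y) - win), function(i) y[i:(i + win - 1)]))
X <- array(X, dim = c(nrow(X), win, 1))
target <- y[(win + 1):length(y)]
# --- Simple LSTM: one recurrent layer followed by a linear output ---
model <- keras_model_sequential() %>%
  layer_lstm(units = 32, input_shape = c(win, 1)) %>%
  layer_dense(units = 1)
model %>% compile(loss = "mse", optimizer = "adam")
model %>% fit(X, target, epochs = 20, batch_size = 16, verbose = 0)
# One-step-ahead forecast from the most recent window
predict(model, array(y[(length(y) - win + 1):length(y)], dim = c(1, win, 1)))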
5.3.3 Gated Recurrent Unit
The Gated Recurrent Unit (GRU) [7] is a simplified alternative to the LSTM with similar performance.
5.3.4 CNN for Time Series
CNNs for time series [8] use convolutional filters to detect local temporal patterns.
5.3.5 Encoder–Decoder
Encoder–Decoder (Seq2Seq) models [9] are designed for multi-step or multi-horizon forecasting.
5.3.6 Transformer Architectures
Examples of Transformer architectures [10] include Informer, Autoformer, and FEDformer.
Strengths:
- Long-range dependency modeling
- Strong performance on large datasets
5.4 Model Selection Summary
| Data Pattern | Suitable Models |
|---|---|
| No strong pattern | Naïve, SMA |
| Trend | Holt, ARIMA |
| Trend + Seasonality | Holt–Winters, SARIMA |
| External regressors | ARIMAX, SARIMAX |
| Nonlinear structure | Random Forest, XGBoost |
| Complex dependencies | LSTM, GRU, CNN, Encoder–Decoder |
| Long-range forecasting | Transformers |