5  Time Series Models

This section summarizes the fundamental models used in time series forecasting, from classical statistical methods to modern deep learning architectures.

5.1 Classical Statistical Models

5.1.1 Naïve Forecast

The Naïve Forecast (Random Walk) is the simplest baseline model in time series forecasting. It assumes that the best prediction for the next period is simply the most recent observed value.

Formula:

\[ \hat{y}_{t+1} = y_t \]

In other words, the forecast “walks” forward by repeating the last value, making it equivalent to a random walk with no drift. Consider the naïve forecast for the next 5 periods:
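
A minimal sketch of such an example in R, using the naive() function from the forecast package on a simulated series (the series and plotting choices are illustrative):

library(forecast)
library(plotly)

# --- Sample Time Series (illustrative) ---
set.seed(123)
y <- ts(cumsum(rnorm(30, 0.3, 1)))

# --- Naive forecast for the next 5 periods ---
fc <- naive(y, h = 5)

t <- 1:length(y)
t_future <- (length(y) + 1):(length(y) + length(fc$mean))

df <- data.frame(
  time  = c(t, t_future),
  value = c(y, as.numeric(fc$mean)),
  type  = c(rep("Observed", length(y)), rep("Naive Forecast", length(fc$mean)))
)

plot_ly() %>%
  add_lines(data = df[df$type == "Observed", ],
            x = ~time, y = ~value,
            name = "Observed") %>%
  add_lines(data = df[df$type == "Naive Forecast", ],
            x = ~time, y = ~value,
            name = "Naïve Forecast",
            line = list(dash = "dash")) %>%
  layout(
    title = "Naïve Forecast (Random Walk)",
    xaxis = list(title = "Time"),
    yaxis = list(title = "Value")
  )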

Interpretation:

The Observed line shows the true historical values of the time series. The Naïve Forecast appears as a flat horizontal line. This happens because the naïve method simply repeats the last observed value. Mathematically, if the final observed value is \(y_T\), then all future predictions are:

\[\hat{y}_{T+1}=\hat{y}_{T+2}=\cdots=\hat{y}_{T+h}=y_T\]

This means the forecast does not change over the horizon; it stays constant at the last known value. The naïve method is widely used as a benchmark in forecasting because any more advanced model should outperform this simple baseline.

5.1.2 Simple Moving Average

The Simple Moving Average (SMA) is one of the fundamental smoothing methods in time series analysis [1]. It reduces short-term fluctuations and highlights longer-term patterns by averaging the most recent \(k\) observations.

Formula

For a window size \(k\), the SMA is defined as:

\[ \hat{y}_t = \frac{1}{k} \sum_{i=0}^{k-1} y_{t-i} \]

This means the forecast at time \(t\) is the average of the most recent \(k\) values. Below is an example of the SMA with window size \(k = 3\) applied to a sample time series.
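
One way such an example could be written in R, using the base stats::filter() function to compute a trailing average (the simulated series is illustrative):

library(plotly)

# --- Sample Time Series (illustrative) ---
set.seed(123)
y <- cumsum(rnorm(30, 0.3, 1))

# --- Simple Moving Average with window k = 3 ---
k <- 3
sma <- stats::filter(y, rep(1 / k, k), sides = 1)  # trailing average of the last k values

t <- 1:length(y)

plot_ly() %>%
  add_lines(x = t, y = y, name = "Observed") %>%
  add_lines(x = t, y = as.numeric(sma), name = "SMA (Smoothed)") %>%
  layout(
    title = "Simple Moving Average (k = 3)",
    xaxis = list(title = "Time"),
    yaxis = list(title = "Value")
  )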

Interpretation:

The Observed line shows the original time series with its natural variability. The SMA (Smoothed) line reduces noise by averaging the most recent \(k\) observations. This creates a smoother representation of the underlying trend. When forecasting, the SMA method typically uses the last computed SMA value as the prediction for all future periods:

\[\hat{y}_{T+1}=\hat{y}_{T+2}=\cdots=\hat{y}_{T+h}=\hat{y}_T\]

This produces a flat forecast, similar to the naïve method, but based on the smoothed series rather than the last raw observation. The SMA method is widely used as an introductory smoothing technique and serves as a baseline for more advanced exponential smoothing models.

5.1.3 Exponential Smoothing

Exponential smoothing methods are a family of forecasting techniques that apply exponentially decreasing weights to past observations. They are widely used because of their simplicity, interpretability, and strong forecasting performance. According to [2], there are three main types of exponential smoothing models, each designed for different time series patterns:

  • Single Exponential Smoothing (SES): applies smoothing to the level only. Suitable for data without trend and without seasonality.
  • Holt’s Linear Trend Method: extends SES by adding a smoothed trend component. Suitable for data with trend, but without seasonality.
  • Holt–Winters Method: extends Holt’s method by adding a seasonal component (additive or multiplicative). Suitable for data with both trend and seasonality.

Each model builds upon the previous one by incorporating more structure from the time series, making exponential smoothing a flexible and progressive forecasting framework.

Single Exponential Smoothing (SES)

Single Exponential Smoothing (SES) is designed for time series without trend and without seasonality. It provides a smoothed estimate by applying exponentially decreasing weights to past observations.

Formula

\[\hat{y}_{t+1}= \alpha y_t + (1-\alpha)\hat{y}_t\]

where:

  • \(\alpha\) is the smoothing parameter, \(0 < \alpha \le 1\)
  • \(y_t\) is the actual value at time \(t\)
  • \(\hat{y}_t\) is the previous smoothed estimate

A higher value of \(\alpha\) gives more weight to recent observations, making the model more responsive to sudden changes. A lower value of \(\alpha\) produces a smoother curve, giving more weight to older data.

library(forecast)
library(plotly)

# --- Sample Time Series ---
set.seed(123)
y <- ts(cumsum(rnorm(30, 0.3, 1)))   # example series

# --- SES Model ---
alpha_value <- 0.4                       # you can change alpha (0 < alpha <= 1)
ses_model <- ses(y, alpha = alpha_value, h = 6)  # SES with forecast horizon h=6

# Extract fitted values and forecasts
fitted_vals <- ses_model$fitted
forecast_vals <- ses_model$mean
t <- 1:length(y)
t_future <- (length(y)+1):(length(y) + length(forecast_vals))

# Build data frame for plot
df <- data.frame(
  time  = c(t, t_future),
  value = c(y, as.numeric(forecast_vals)),
  type  = c(rep("Observed", length(y)), rep("SES Forecast", length(forecast_vals)))
)

df_fitted <- data.frame(
  time = t,
  value = as.numeric(fitted_vals),
  type = "SES (Smoothed)"
)

# --- Plotly Visualization ---
plot_ly() %>%
  add_lines(data = df[df$type == "Observed", ],
            x = ~time, y = ~value,
            name = "Observed") %>%
  add_lines(data = df_fitted,
            x = ~time, y = ~value,
            name = "SES (Smoothed)") %>%
  add_lines(data = df[df$type == "SES Forecast", ],
            x = ~time, y = ~value,
            name = "SES Forecast",
            line = list(dash = "dash")) %>%
  layout(
    title = "Single Exponential Smoothing (SES)",
    xaxis = list(title = "Time"),
    yaxis = list(title = "Value")
  )

Interpretation:

The Observed line represents the original time series values. The SES (Smoothed) line applies exponential smoothing, where recent observations receive a higher weight controlled by the smoothing parameter \(\alpha\). A larger value of \(\alpha\) (closer to 1) makes the smoothed line react more quickly to changes in the data, while a smaller value of \(\alpha\) (closer to 0) produces a smoother curve that responds more gradually.

Because SES does not model trend or seasonality, the forecast is constant over the horizon:

\[\hat{y}_{T+1}=\hat{y}_{T+2}=\cdots=\hat{y}_{T+h}=\hat{y}_T \]

SES is therefore suitable only for series with no trend and no seasonality, and it serves as the foundation for more advanced exponential smoothing techniques such as Holt (for trend) and Holt–Winters (for seasonality).

Holt’s Linear Trend Method

Holt’s Linear Trend Method extends Single Exponential Smoothing (SES) by adding a trend component, making it suitable for time series that exhibit a consistent upward or downward trend. This method smooths both the level and the trend, allowing the forecast to grow or decline over time.

Model Equations

The method consists of three components:

  • Level equation: updates the smoothed estimate of the current value
  • Trend equation: updates the smoothed estimate of the trend
  • Forecast equation: projects the level and trend forward

\[ \begin{aligned} l_t &= \alpha y_t + (1-\alpha)(l_{t-1}+b_{t-1}) \\ b_t &= \beta (l_t - l_{t-1}) + (1-\beta)b_{t-1} \\ \hat{y}_{t+h} &= l_t + h b_t \end{aligned} \]

where:

  • \(\alpha\) = level smoothing parameter, \(0 < \alpha \le 1\)
  • \(\beta\) = trend smoothing parameter, \(0 < \beta \le 1\)
  • \(l_t\) = estimated level at time \(t\)
  • \(b_t\) = estimated trend at time \(t\)
  • \(\hat{y}_{t+h}\) = forecast \(h\) periods ahead

Holt’s method allows forecasts to follow a straight-line trajectory, capturing the underlying linear trend in the data.

library(forecast)
library(plotly)

# --- Sample Time Series ---
set.seed(123)
y <- ts(cumsum(rnorm(30, 0.8, 1)))   # trending series

# --- Holt Model ---
holt_model <- holt(y, h = 6, alpha = 0.5, beta = 0.3)

# Extract components
fitted_vals <- holt_model$fitted
forecast_vals <- holt_model$mean

t <- 1:length(y)
t_future <- (length(y)+1):(length(y) + length(forecast_vals))

# Data for plotting
df <- data.frame(
  time  = c(t, t_future),
  value = c(y, as.numeric(forecast_vals)),
  type  = c(rep("Observed", length(y)), rep("Holt Forecast", length(forecast_vals)))
)

df_fitted <- data.frame(
  time = t,
  value = as.numeric(fitted_vals),
  type = "Holt (Smoothed)"
)

# --- Plotly Visualization ---
plot_ly() %>%
  add_lines(data = df[df$type == "Observed", ],
            x = ~time, y = ~value,
            name = "Observed") %>%
  add_lines(data = df_fitted,
            x = ~time, y = ~value,
            name = "Holt (Smoothed)") %>%
  add_lines(data = df[df$type == "Holt Forecast", ],
            x = ~time, y = ~value,
            name = "Holt Forecast",
            line = list(dash = "dash")) %>%
  layout(
    title = "Holt’s Linear Trend Method",
    xaxis = list(title = "Time"),
    yaxis = list(title = "Value")
  )

Interpretation:

The Observed line shows the original time series, which exhibits a visible upward trend. Holt’s method captures this behavior by smoothing two components:

  1. Level (\(l_t\)) — the underlying value of the series
  2. Trend (\(b_t\)) — the rate of increase or decrease

The Holt (Smoothed) line represents the result of applying exponential smoothing to both the level and the trend. Compared with SES, Holt’s method does not produce a flat forecast, because it explicitly models and updates the trend. Holt’s method is appropriate for time series with a consistent linear trend, but without seasonality. It forms the foundation for the Holt–Winters method, which adds a seasonal component for more complex patterns.

Holt–Winters Method

The Holt–Winters method is used when the time series exhibits both trend and seasonality. There are two types: additive and multiplicative. Here, we use the additive version, which is suitable for seasonal patterns with roughly constant amplitude.

Model Equations (Additive)

\[ \begin{aligned} l_t &= \alpha (y_t - s_{t-m}) + (1-\alpha)(l_{t-1}+b_{t-1}) \\ b_t &= \beta (l_t - l_{t-1}) + (1-\beta)b_{t-1} \\ s_t &= \gamma (y_t - l_t) + (1-\gamma)s_{t-m} \\ \hat{y}_{t+h} &= l_t + h b_t + s_{t-m+h} \end{aligned} \]

where:

  • \(l_t\) = level
  • \(b_t\) = trend
  • \(s_t\) = seasonal component
  • \(m\) = seasonal period (e.g., \(m = 12\) for monthly data)
  • \(\alpha, \beta, \gamma\) = smoothing parameters
library(plotly)
library(forecast)

# Example seasonal + trend data
set.seed(123)
t <- 1:60
season <- rep(c(10, 12, 15, 14), 15)    # seasonal pattern (period = 4)
trend  <- 0.5 * t                        # upward trend
noise  <- rnorm(60, 0, 2)
y <- season + trend + noise
ts_data <- ts(y, frequency = 4)

# Holt-Winters Additive Model
model_hw <- HoltWinters(ts_data, seasonal = "additive")

# Forecast 8 steps ahead
fc <- forecast(model_hw, h = 8)

# Build plotly visualization
p <- plot_ly() %>%
  add_lines(x = time(ts_data), y = ts_data,
            name = "Observed") %>%
  add_lines(x = time(model_hw$fitted), y = model_hw$fitted[,1],
            name = "HW Fitted") %>%
  add_lines(x = time(fc$mean), y = fc$mean,
            name = "Forecast", line = list(dash = "dash")) %>%
  layout(title = "Holt–Winters Additive Method",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Value"))

p

Interpretation:

In the Holt–Winters method, the smoothing parameters control how much weight is given to recent observations versus past values. They range between 0 and 1:

  1. Level Smoothing (\(\alpha\))

    • Determines how much the current observation \(y_t\) influences the estimated level \(l_t\).
    • High \(\alpha\) (close to 1) → the level reacts quickly to recent changes.
    • Low \(\alpha\) (close to 0) → the level changes slowly, giving more weight to historical values.

    \[ l_t = \alpha (y_t - s_{t-m}) + (1-\alpha)(l_{t-1}+b_{t-1}) \]

  2. Trend Smoothing (\(\beta\))

    • Controls the update of the trend component \(b_t\).
    • High \(\beta\) → trend responds quickly to changes in slope.
    • Low \(\beta\) → trend is more stable and less sensitive to short-term fluctuations.

    \[ b_t = \beta (l_t - l_{t-1}) + (1-\beta)b_{t-1} \]

  3. Seasonal Smoothing (\(\gamma\))

    • Determines how fast the seasonal component \(s_t\) adapts to new seasonal patterns.
    • High \(\gamma\) → seasonal estimates adjust quickly to changes.
    • Low \(\gamma\) → seasonal pattern changes slowly over time.

    \[ s_t = \gamma (y_t - l_t) + (1-\gamma)s_{t-m} \]

5.1.4 ARIMA

AR(\(p\)): Autoregressive

The Autoregressive (AR) model predicts a time series based on its own past values. It is widely used in time series forecasting and forms the foundation for ARMA/ARIMA models.

Model Formula

\[ y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \epsilon_t \]

where:

  • \(y_t\) = value of the series at time \(t\)
  • \(c\) = constant term (intercept)
  • \(\phi_1, \phi_2, \dots, \phi_p\) = autoregressive coefficients
  • \(p\) = order of the AR model (number of lagged terms)
  • \(\epsilon_t\) = white noise error term, \(\epsilon_t \sim N(0, \sigma^2)\)

Key Points:

  • The model assumes that the current value \(y_t\) is a linear combination of its past \(p\) values plus random noise.
  • The coefficients \(\phi_i\) determine how much influence each lagged term has.
  • The order \(p\) can be selected using information criteria (AIC, BIC) or autocorrelation plots.
library(forecast)
library(plotly)

# --- Simulate AR(2) Process ---
set.seed(123)
ar_coeff <- c(0.6, -0.3)   # phi1=0.6, phi2=-0.3
y <- arima.sim(n = 50, list(ar = ar_coeff), sd = 1)

# Fit AR model
fit <- arima(y, order = c(2,0,0))  # AR(2)

# Forecast 10 steps ahead
fc <- forecast(fit, h = 10)

# Time index
t <- 1:length(y)
t_future <- (length(y)+1):(length(y)+length(fc$mean))

# Build data frame
df <- data.frame(
  time = c(t, t_future),
  value = c(y, as.numeric(fc$mean)),
  type = c(rep("Observed", length(y)), rep("AR(2) Forecast", length(fc$mean)))
)

# Plotly visualization
plot_ly() %>%
  add_lines(data = df[df$type=="Observed",], x = ~time, y = ~value,
            name = "Observed") %>%
  add_lines(data = df[df$type=="AR(2) Forecast",], x = ~time, y = ~value,
            name = "Forecast", line = list(dash = "dash")) %>%
  layout(title = "Autoregressive AR(2) Model",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Value"))

Interpretation:

  • If \(\phi_i\) is positive, the series tends to continue in the same direction as lag \(i\).
  • If \(\phi_i\) is negative, the series tends to move in the opposite direction of lag \(i\).
  • AR models are stationary if the roots of the characteristic equation lie outside the unit circle (a quick numerical check is sketched below).
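
For instance, the stationarity of the simulated AR(2) above (with \(\phi_1 = 0.6\), \(\phi_2 = -0.3\)) can be checked numerically with polyroot(); this is a small illustrative check, not part of the original example:

# Roots of the characteristic polynomial 1 - phi1*z - phi2*z^2 with phi1 = 0.6, phi2 = -0.3
Mod(polyroot(c(1, -0.6, 0.3)))   # both moduli are about 1.83 > 1, so the AR(2) process is stationary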

MA(\(q\)): Moving Average

The Moving Average (MA) model expresses the current value of a time series as a linear combination of past error terms.
It is commonly used in time series modeling, often as part of ARMA or ARIMA models.

Model Formula

\[ y_t = c + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \cdots + \theta_q \epsilon_{t-q} + \epsilon_t \]

where:

  • \(y_t\) = value of the series at time \(t\)
  • \(c\) = constant term (intercept)
  • \(\theta_1, \theta_2, \dots, \theta_q\) = moving average coefficients
  • \(q\) = order of the MA model (number of past error terms included)
  • \(\epsilon_t\) = white noise error term, \(\epsilon_t \sim N(0, \sigma^2)\)

Key Points:

  • Unlike AR models, MA models do not use past values of \(y_t\), but instead use past forecast errors.
  • Each coefficient \(\theta_i\) measures the influence of the lagged errors on the current value.
  • The order \(q\) can be determined by examining the autocorrelation function (ACF):
    • The ACF of an MA(\(q\)) process cuts off after lag \(q\).
library(forecast)
library(plotly)

# --- Simulate MA(2) Process ---
set.seed(123)
ma_coeff <- c(0.7, -0.4)  # theta1=0.7, theta2=-0.4
y <- arima.sim(n = 50, list(ma = ma_coeff), sd = 1)

# Fit MA model
fit <- arima(y, order = c(0,0,2))  # MA(2)

# Forecast 10 steps ahead
fc <- forecast(fit, h = 10)

# Time index
t <- 1:length(y)
t_future <- (length(y)+1):(length(y)+length(fc$mean))

# Build data frame
df <- data.frame(
  time = c(t, t_future),
  value = c(y, as.numeric(fc$mean)),
  type = c(rep("Observed", length(y)), rep("MA(2) Forecast", length(fc$mean)))
)

# Plotly visualization
plot_ly() %>%
  add_lines(data = df[df$type=="Observed",], x = ~time, y = ~value,
            name = "Observed") %>%
  add_lines(data = df[df$type=="MA(2) Forecast",], x = ~time, y = ~value,
            name = "Forecast", line = list(dash = "dash")) %>%
  layout(title = "Moving Average MA(2) Model",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Value"))

Interpretation:

  1. The current value \(y_t\) is affected by the most recent \(q\) random shocks \(\epsilon_{t-1}, \dots, \epsilon_{t-q}\).
  2. Positive \(\theta_i\) → the lagged shock pushes \(y_t\) in the same direction.
  3. Negative \(\theta_i\) → the lagged shock pushes \(y_t\) in the opposite direction.
  4. Finite-order MA models are stationary by construction, so no stationarity restrictions on the coefficients are required (unlike AR models).
  5. MA is often combined with AR to form ARMA(\(p,q\)) for better capturing serial dependence in data.

ARMA(\(p,q\)): Autoregressive Moving Average

The ARMA(\(p,q\)) model combines both autoregressive (AR) and moving average (MA) components, capturing the effects of both past values and past errors on the current observation.

Model Formula

\[ y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t \]

where:

  • \(y_t\) = value at time \(t\)
  • \(c\) = constant term (intercept)
  • \(\phi_1, \dots, \phi_p\) = AR coefficients
  • \(\theta_1, \dots, \theta_q\) = MA coefficients
  • \(p\) = order of the AR part
  • \(q\) = order of the MA part
  • \(\epsilon_t\) = white noise error term, \(\epsilon_t \sim N(0, \sigma^2)\)

Key Points:

  • The AR component captures the influence of past values on \(y_t\).
  • The MA component captures the influence of past shocks (errors) on \(y_t\).
  • The orders \(p\) and \(q\) can be selected using information criteria (AIC, BIC) or by examining the ACF and PACF plots.
  • ARMA models assume stationarity, so non-stationary series may require differencing (ARIMA).
library(forecast)
library(plotly)

# --- Simulate ARMA(2,2) Process ---
set.seed(123)
ar_coeff <- c(0.6, -0.3)   # AR(2)
ma_coeff <- c(0.5, -0.4)   # MA(2)
y <- arima.sim(n = 50, list(ar = ar_coeff, ma = ma_coeff), sd = 1)

# Fit ARMA model (ARMA(2,2))
fit <- arima(y, order = c(2,0,2))

# Forecast 10 steps ahead
fc <- forecast(fit, h = 10)

# Time index
t <- 1:length(y)
t_future <- (length(y)+1):(length(y)+length(fc$mean))

# Build data frame
df <- data.frame(
  time = c(t, t_future),
  value = c(y, as.numeric(fc$mean)),
  type = c(rep("Observed", length(y)), rep("ARMA(2,2) Forecast", length(fc$mean)))
)

# Build fitted values data frame
df_fitted <- data.frame(
  time = t,
  value = as.numeric(fitted(fit)),
  type = "ARMA(2,2) Fitted"
)

# Plotly visualization
plot_ly() %>%
  add_lines(data = df[df$type=="Observed",], x = ~time, y = ~value,
            name = "Observed") %>%
  add_lines(data = df_fitted, x = ~time, y = ~value,
            name = "Fitted", line = list(color = 'green')) %>%
  add_lines(data = df[df$type=="ARMA(2,2) Forecast",], x = ~time, y = ~value,
            name = "Forecast", line = list(dash = "dash", color = 'red')) %>%
  layout(title = "ARMA(2,2) Model",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Value"))

Interpretation:

  1. ARMA(\(p,q\)) captures both momentum and shocks:
    • The AR part models persistence (autocorrelation with past values)
    • The MA part models short-term effects of random shocks
  2. Forecasting with ARMA(\(p,q\)):
    • Future values are predicted using a combination of past observations and past errors.
    • Provides a more flexible model than AR or MA alone, suitable for stationary time series with autocorrelation and short-term shocks.
  3. Advantages:
    • Can model a wide variety of stationary time series patterns.
    • Forms the basis for ARIMA when differencing is added for non-stationary series.

ARIMA(\(p,d,q\)): Autoregressive Integrated Moving Average

The ARIMA(\(p,d,q\)) model extends ARMA models to non-stationary time series by incorporating differencing. It is one of the most widely used models for time series forecasting.

Model Formula:

\[ \nabla^d y_t = (1 - B)^d y_t \sim \text{ARMA}(p,q) \]

where:

  • \(y_t\) = original time series at time \(t\)
  • \(B\) = backshift operator, \(B y_t = y_{t-1}\)
  • \(\nabla^d\) = differencing operator applied \(d\) times, \(\nabla^d y_t = (1-B)^d y_t\)
  • \(p\) = order of the AR part
  • \(d\) = order of differencing required to make the series stationary
  • \(q\) = order of the MA part

Key Points:

  • Differencing (\(d\)): removes trend in the level of the series to achieve stationarity (seasonality is handled by seasonal differencing in SARIMA).

  • Once differenced, the series can be modeled with an ARMA(\(p,q\)) structure.

  • ARIMA is denoted as ARIMA(\(p,d,q\)):

    • \(p\): number of autoregressive terms
    • \(d\): number of differences
    • \(q\): number of moving average terms
library(forecast)
library(plotly)

# --- Simulate non-stationary series with trend ---
set.seed(123)
t <- 1:60
trend <- 0.5 * t
noise <- rnorm(60, 0, 1)
y <- trend + noise
ts_data <- ts(y)

# Fit ARIMA model (auto.arima selects p, d, q automatically)
fit <- auto.arima(ts_data)

# Forecast 10 steps ahead
fc <- forecast(fit, h = 10)

# Time index
t_obs <- 1:length(ts_data)
t_future <- (length(ts_data)+1):(length(ts_data)+length(fc$mean))

# Data frames
df <- data.frame(
  time = c(t_obs, t_future),
  value = c(ts_data, as.numeric(fc$mean)),
  type = c(rep("Observed", length(ts_data)), rep("Forecast", length(fc$mean)))
)

df_fitted <- data.frame(
  time = t_obs,
  value = as.numeric(fitted(fit)),
  type = "Fitted"
)

# Plotly visualization
plot_ly() %>%
  add_lines(data = df[df$type=="Observed",], x = ~time, y = ~value,
            name = "Observed") %>%
  add_lines(data = df_fitted, x = ~time, y = ~value,
            name = "Fitted", line = list(color = 'green')) %>%
  add_lines(data = df[df$type=="Forecast",], x = ~time, y = ~value,
            name = "Forecast", line = list(dash = "dash", color = 'red')) %>%
  layout(title = "ARIMA Forecast",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Value"))

Interpretation:

  1. Non-stationary series (with trend or evolving mean) cannot be directly modeled with ARMA.
  2. Differencing transforms the series into a stationary series.
  3. ARIMA forecasts combine:
    • Past values (\(p\)),
    • Past errors (\(q\)),
    • Differencing to handle non-stationarity (\(d\)).
  4. ARIMA forms the foundation for seasonal models (SARIMA) when seasonality is included.

5.1.5 SARIMA / SARIMAX

SARIMA and SARIMAX are extensions of ARIMA that allow modeling of seasonal patterns and external regressors.

SARIMA (Seasonal ARIMA)

SARIMA incorporates seasonality using seasonal autoregressive (P), seasonal differencing (D), and seasonal moving average (Q) components, along with the seasonal period \(s\).

Model notation:

\[ SARIMA(p,d,q)(P,D,Q)_s \]

  • \(p,d,q\) = non-seasonal AR, differencing, MA orders
  • \(P,D,Q\) = seasonal AR, seasonal differencing, seasonal MA orders
  • \(s\) = length of the seasonal cycle (e.g., 12 for monthly data)

Equation (conceptual):

\[ \nabla^d \nabla_s^D y_t = ARMA(p,q) + ARMA(P,Q)_s \]

where \(\nabla_s^D y_t = (1-B^s)^D y_t\) is seasonal differencing.
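
In R, seasonal differencing corresponds to diff() with the lag argument set to the seasonal period; a one-line illustration, assuming a series y with seasonal period \(s = 12\):

# First-order seasonal difference (D = 1) with seasonal period s = 12
y_seasonal_diff <- diff(y, lag = 12)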

SARIMAX (Seasonal ARIMA with eXogenous variables)

SARIMAX extends SARIMA by adding external regressors \(X_t\):

\[ y_t = SARIMA(p,d,q)(P,D,Q)_s + \beta X_t + \epsilon_t \]

  • \(X_t\) = external variables that may affect the time series
  • \(\beta\) = coefficients for regressors

Key Points:

  • SARIMA is suitable for seasonal time series, e.g., monthly sales with yearly seasonality.
  • SARIMAX allows external influences, e.g., promotions, holidays, temperature.
  • Seasonal parameters \((P,D,Q,s)\) capture repeating patterns in data.
  • Model selection can use AIC/BIC, ACF/PACF plots, and seasonal diagnostics.

5.1.6 SARIMA and SARIMAX in R

library(forecast)
library(plotly)

# --- Simulate seasonal time series ---
set.seed(123)
t <- 1:60
seasonal <- rep(c(10, 12, 15, 14, 11), 12)[1:60]  # seasonal pattern (period s=5)
trend <- 0.3 * t
noise <- rnorm(60, 0, 1)
y <- trend + seasonal + noise
ts_data <- ts(y, frequency = 5)

# --- External regressor for SARIMAX ---
x <- rnorm(60, 5, 1)  # exogenous variable

# Fit SARIMA model
fit_sarima <- auto.arima(ts_data, seasonal = TRUE)

# Fit SARIMAX model
fit_sarimax <- auto.arima(ts_data, xreg = x, seasonal = TRUE)

# Forecast 10 steps ahead
x_future <- rnorm(10, 5, 1)  # future exogenous variable for SARIMAX
fc_sarima <- forecast(fit_sarima, h = 10)
fc_sarimax <- forecast(fit_sarimax, xreg = x_future, h = 10)

# Time index
t_obs <- 1:length(ts_data)
t_future <- (length(ts_data)+1):(length(ts_data)+10)

# Build data frames for plotting
df_sarima <- data.frame(
  time = c(t_obs, t_future),
  value = c(ts_data, as.numeric(fc_sarima$mean)),
  type = c(rep("Observed", length(ts_data)), rep("SARIMA Forecast", 10))
)

df_sarimax <- data.frame(
  time = c(t_obs, t_future),
  value = c(ts_data, as.numeric(fc_sarimax$mean)),
  type = c(rep("Observed", length(ts_data)), rep("SARIMAX Forecast", 10))
)

df_fitted_sarima <- data.frame(
  time = t_obs,
  value = as.numeric(fitted(fit_sarima)),
  type = "SARIMA Fitted"
)

df_fitted_sarimax <- data.frame(
  time = t_obs,
  value = as.numeric(fitted(fit_sarimax)),
  type = "SARIMAX Fitted"
)

# Plotly visualization
plot_ly() %>%
  add_lines(data = df_sarima[df_sarima$type=="Observed",], x = ~time, y = ~value,
            name = "Observed") %>%
  add_lines(data = df_fitted_sarima, x = ~time, y = ~value,
            name = "SARIMA Fitted", line = list(color = 'green')) %>%
  add_lines(data = df_sarima[df_sarima$type=="SARIMA Forecast",], x = ~time, y = ~value,
            name = "SARIMA Forecast", line = list(dash = "dash", color = 'red')) %>%
  add_lines(data = df_fitted_sarimax, x = ~time, y = ~value,
            name = "SARIMAX Fitted", line = list(color = 'blue')) %>%
  add_lines(data = df_sarimax[df_sarimax$type=="SARIMAX Forecast",], x = ~time, y = ~value,
            name = "SARIMAX Forecast", line = list(dash = "dash", color = 'purple')) %>%
  layout(title = "SARIMA vs SARIMAX Forecast",
         xaxis = list(title = "Time"),
         yaxis = list(title = "Value"))

Interpretation:

  1. Seasonality
    • Seasonal AR (\(P\)) captures dependence on previous seasonal cycles.
    • Seasonal MA (\(Q\)) captures seasonal shocks.
    • Seasonal differencing (\(D\)) removes repeating seasonal trends.
  2. Exogenous Variables (SARIMAX only)
    • The model accounts for additional predictors outside the time series itself.
    • Useful when external events significantly impact the series.
  3. Forecasting
    • Combines non-seasonal ARMA dynamics, seasonal patterns, and external regressors (if any).
    • Produces more accurate predictions for seasonal series and for series influenced by external factors.

5.2 Machine Learning Models

5.2.1 Linear Regression

Linear Regression for Time Series [3] uses time-dependent features such as the following; a minimal sketch is given after the list:

  • \(y_{t-1}, y_{t-2}, \ldots\)
  • moving averages
  • sine/cosine seasonal features
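
A minimal sketch of this idea in R, fitting lm() on hand-built lag and trend features of a simulated series (the feature set, series, and variable names are illustrative; seasonal sine/cosine terms could be added in the same way):

# --- Simulated series (illustrative) ---
set.seed(123)
y <- cumsum(rnorm(60, 0.3, 1))
n <- length(y)

# --- Build lagged and trend features ---
df <- data.frame(
  y    = y[3:n],         # target y_t
  lag1 = y[2:(n - 1)],   # y_{t-1}
  lag2 = y[1:(n - 2)],   # y_{t-2}
  t    = 3:n             # linear time index
)

# --- Fit linear regression on the time-dependent features ---
fit_lm <- lm(y ~ lag1 + lag2 + t, data = df)

# --- One-step-ahead forecast from the most recent observations ---
predict(fit_lm, newdata = data.frame(lag1 = y[n], lag2 = y[n - 1], t = n + 1))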

5.2.2 Non-Linear Regression

Random Forest, XGBoost, and LightGBM [4] are tree-based models capable of capturing nonlinear patterns, typically trained on lagged features of the series.
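
As an illustration, a minimal sketch using the randomForest package on the same kind of lag features (the series and hyperparameters are illustrative; XGBoost or LightGBM could be substituted with their own APIs):

library(randomForest)

# --- Simulated series and lag features (illustrative) ---
set.seed(123)
y <- cumsum(rnorm(100, 0.3, 1))
n <- length(y)

df <- data.frame(
  y    = y[3:n],
  lag1 = y[2:(n - 1)],
  lag2 = y[1:(n - 2)]
)

# --- Fit a random forest on the lagged features ---
fit_rf <- randomForest(y ~ lag1 + lag2, data = df, ntree = 200)

# --- One-step-ahead forecast from the most recent observations ---
predict(fit_rf, newdata = data.frame(lag1 = y[n], lag2 = y[n - 1]))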

5.3 Deep Learning Models

5.3.1 Recurrent Neural Networks

Recurrent Neural Networks (RNNs) [5] are designed to capture sequential dependencies.

5.3.2 Long Short-Term Memory

Long Short-Term Memory (LSTM) networks [6] are effective at capturing long-range temporal dependencies.
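
A rough sketch only, assuming the keras R package with a working TensorFlow backend (the window length, layer sizes, and training settings are illustrative, not a recommended configuration):

library(keras)

# --- Simulated series reshaped into (samples, timesteps, features) windows ---
set.seed(123)
y <- cumsum(rnorm(200, 0.3, 1))
n_steps <- 10
X <- t(sapply(1:(length(y) - n_steps), function(i) y[i:(i + n_steps - 1)]))
Y <- y[(n_steps + 1):length(y)]
X <- array(X, dim = c(nrow(X), n_steps, 1))

# --- Small LSTM followed by a dense output layer ---
model <- keras_model_sequential() %>%
  layer_lstm(units = 32, input_shape = c(n_steps, 1)) %>%
  layer_dense(units = 1)

model %>% compile(loss = "mse", optimizer = "adam")
model %>% fit(X, Y, epochs = 20, batch_size = 16, verbose = 0)

# --- One-step-ahead forecast from the last observed window ---
last_window <- array(y[(length(y) - n_steps + 1):length(y)], dim = c(1, n_steps, 1))
predict(model, last_window)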

5.3.3 Gated Recurrent Unit

The Gated Recurrent Unit (GRU) [7] is a simplified alternative to the LSTM with similar performance.

5.3.4 CNN for Time Series

Convolutional Neural Networks (CNNs) for time series [8] use convolutional filters to detect local temporal patterns.

5.3.5 Encoder–Decoder

Encoder–Decoder (Seq2Seq) models [9] are designed for multi-step or multi-horizon forecasting.

5.3.6 Transformer Architectures

Transformer architectures [10] have been adapted to long-horizon time series forecasting; examples include Informer, Autoformer, and FEDformer.

Strengths:

  • Long-range dependency modeling
  • Strong performance on large datasets

5.4 Model Selection Summary

The following summary maps common data patterns to suitable models:

  • No strong pattern: Naïve, SMA
  • Trend: Holt, ARIMA
  • Trend + Seasonality: Holt–Winters, SARIMA
  • External regressors: ARIMAX, SARIMAX
  • Nonlinear structure: Random Forest, XGBoost
  • Complex dependencies: LSTM, GRU, CNN, Encoder–Decoder
  • Long-range forecasting: Transformers
[1] Box, G. E. P., Jenkins, G. M., Reinsel, G. C., and Ljung, G. M., Time series analysis: Forecasting and control, John Wiley & Sons, 2015.
[2] Hyndman, R. J. and Athanasopoulos, G., Forecasting: Principles and practice, OTexts, 2008.
[3] James, G., Witten, D., Hastie, T., and Tibshirani, R., An introduction to statistical learning, Springer, 2021.
[4] Friedman, J. H., Greedy function approximation: A gradient boosting machine, Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.
[5] Rumelhart, D. E., Hinton, G. E., and Williams, R. J., Learning representations by back-propagating errors, Nature, vol. 323, pp. 533–536, 1986.
[6] Hochreiter, S. and Schmidhuber, J., Long short-term memory, Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[7] Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y., Learning phrase representations using RNN encoder–decoder for statistical machine translation, arXiv preprint arXiv:1406.1078, 2014.
[8] LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P., Convolutional networks for images, speech, and time-series, in The handbook of brain theory and neural networks, MIT Press, 1995.
[9] Sutskever, I., Vinyals, O., and Le, Q. V., Sequence to sequence learning with neural networks, Advances in Neural Information Processing Systems (NeurIPS), 2014.
[10] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I., Attention is all you need, Advances in Neural Information Processing Systems (NeurIPS), 2017.