16.3 Deep Learning for Portfolio Design
In the context of portfolio design, deep learning can be used in a variety of ways. Recall that the two main components in portfolio design are data modeling and portfolio optimization. This is depicted in Figure 1.3 (Chapter 1) and reproduced herein for convenience in Figure 16.11.
In light of the block diagram in Figure 16.11, one could envision the usage of DL in at least three ways:
- using DL only in the modeling or time series forecasting component, while keeping the traditional portfolio optimization part;
- using DL only in the portfolio optimization component, while keeping the traditional data modeling part; and
- using DL for both components, which is called end-to-end modeling.
We will not consider further the option of using DL only for the optimization part, since that is a well-understood component that does not seem to require DL (in fact, this book has explored a wide variety of different portfolio formulations with efficient algorithms). Thus, we will focus on employing DL either in the forecast component or in the end-to-end system.
Regarding the input data to the DL system, one can use raw time series data, such as price data (e.g., open, high, low, close) and volume, as well as other sources of data derived from technical analysis, fundamental analysis, macroeconomic data, financial statements, news, social media feeds, and investor sentiment analysis. Also, depending on the time horizon, a wide range of options for the frequency of the data may be available, varying from high-frequency data and intraday price movements to daily, weekly, or even monthly stock prices.
16.3.1 Challenges
Before we explore the possibilities of DL for portfolio design, it is important to highlight the main challenges faced in this particular area. As already explained, deep neural networks have demonstrated outstanding performance in many domain-specific areas, such as image recognition, natural language processing, board and video games, biomedical applications, self-driving cars, and so on. The million-dollar question is whether this revolution will extend to financial systems.
Since the 2010s, the financial industry and academia have been exploring the potential of DL in various applications, such as financial time series forecasting, algorithmic trading, risk assessment, fraud detection, portfolio management, asset pricing, derivatives markets, cryptocurrency and blockchain studies, financial sentiment analysis, behavioral finance, and financial text mining. The number of research works, as well as of open-source software libraries, keeps increasing every year at an accelerating pace. However, we are just in the initial years of this new era and it is too early to say whether the success that DL has enjoyed in non-financial applications will actually extend to financial systems and, particularly, to portfolio design.
Apart from very specific financial applications that have already enjoyed some success, such as sentiment analysis of news, credit default detection, or satellite image analysis for stock level estimation or crop production, we now focus on the potential of deep neural networks specifically for financial time series modeling and portfolio design. Among the many possible challenges that set these problems apart from other successful applications, the following are definitely worth mentioning:
- Data scarcity: Compared to other areas, such as natural language processing (e.g., GPT-3 was trained on a massive dataset of over 570 GB of text data), financial time series are in general extremely scarce (except for high-frequency data). For example, two years of daily stock prices amount to just 504 observations.
- Low signal-to-noise ratio: The signal in financial data is extremely weak and largely submerged in noise. For example, an exploratory data analysis on asset returns corrected for the volatility envelope reveals a time series with little temporal structure (see Figures 2.23–2.24 in Chapter 2). This is very different from other applications: for example, an image of a cat typically has a high signal and very small noise (this is not to say that recognizing a cat is easy, but at least the signal-to-noise ratio is large).
- Data nonstationarity: Financial time series are clearly nonstationary (see Chapter 2) with a statistical distribution that changes over time (e.g., bull markets, bear markets, sideways markets). This is in sharp contrast with most other applications where DL has succeeded, in which the distribution remains constant: a cat stays the same, be it yesterday, today, or tomorrow.
- Data adaptive feedback loop: Data from financial markets is directly influenced by human and machine decisions based on previous data. As a consequence, there exists a unique feedback loop mechanism that cannot be ignored. In particular, once a pattern is discovered and a trading strategy is designed to exploit it, this pattern tends to disappear in future data. Again, this is extremely different from other applications; for example, a cat remains a cat regardless of whether one can detect it in an image.
- Lack of prior human evidence: In most areas where DL has been successful, there was obvious prior evidence of human performance that showed that the problem was solvable. For example, humans can easily recognize a cat, translate a sentence from English to Spanish, or drive a car. However, in finance there is no human who can effectively forecast the future performance of companies or trade a portfolio. Simply recall (see Chapter 13) the illustrative and clarifying statement (Malkiel, 1973): “a blindfolded chimpanzee throwing darts at the stock listings can select a portfolio that performs as well as those managed by the experts.”
At the risk of oversimplifying, we could make a simple analogy of the problem of financial time series forecasting or portfolio design to that of identifying an octopus in an image, as opposed to the iconic example of identifying a cat. This is exemplified in Figure 16.12. Indeed, this analogy seems to fit the previous list of challenges, namely:
- Data scarcity: Arguably there are more images of cats than octopi in the human library of photos.
- Low signal-to-noise ratio: Think of an octopus that has camouflaged to look exactly like the background (the octopus creates this noise to blend in) as opposed to a domestic cat that stands out.
- Data nonstationarity: Think again of an octopus that changes its camouflage over time to match the background (a cat’s appearance is the same today as it was yesterday).71
- Data adaptive feedback loop: Think once more of an octopus that quickly adapts its camouflage as it is being chased by a predator (a cat is a cat).72
- Lack of prior human evidence: Humans are good at spotting domestic cats, but the same cannot be said about octopi.
We can finally summarize the previous analogy73 by saying that “financial data ain’t cats, but octopi.”
16.3.2 Standard Time Series Forecasting
By far the most common way to employ DL in portfolio design is to use it in the time series modeling or forecasting component. This area has been intensively explored since 2015, as described in Sezer et al. (2020). LSTM, by its very nature, exploits the temporal characteristics of a time series signal thanks to its inherent memory. Thus, LSTM and its variations initially dominated the financial time series forecasting domain (Fischer and Krauss, 2018). Nevertheless, more recently transformers have been shown to deal with long-term memory more efficiently.
The block diagram in Figure 16.13 illustrates the general process of time series forecasting. Following the supervised learning paradigm in Figure 16.2, the input consists of a lookback of the past \(k\) time series values \(\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right)\), the desired output or label is the next value of the time series \(\bm{x}_t\), and the output produced (i.e., the forecast) is denoted by \(\bmu_t\). With this, we can define some error measure between \(\bmu_t\) and \(\bm{x}_t\) to drive the learning process of the deep learning network. Note that the forecast horizon could be chosen further into the future instead of being just the next time index \(t\).
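As a minimal sketch of how the lookback inputs and labels described above could be assembled, the following Python snippet slices a multivariate series into pairs \(\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right)\) and \(\bm{x}_t\). The helper name `make_lookback_windows`, the synthetic Gaussian returns, and the choice \(k=20\) are illustrative assumptions, not part of the original text.

```python
import numpy as np

def make_lookback_windows(x, k):
    """Split a (T, N) series into inputs of k past values and next-step labels.

    Hypothetical helper for illustration only.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]              # treat a univariate series as N = 1
    T = x.shape[0]
    if not 0 < k < T:
        raise ValueError("k must satisfy 0 < k < T")
    # inputs[i] holds (x_{t-k}, ..., x_{t-1}) for t = k + i
    inputs = np.stack([x[t - k:t] for t in range(k, T)])
    labels = x[k:]                  # labels[i] is x_t for t = k + i
    return inputs, labels

# Synthetic data (assumption): two years of daily returns for 3 assets
rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_normal((504, 3))

inputs, labels = make_lookback_windows(returns, k=20)
print(inputs.shape, labels.shape)   # (484, 20, 3) (484, 3)
```

Note that 504 daily observations yield only 484 training pairs with a lookback of 20, which makes the data scarcity discussed earlier quite tangible.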
The error that drives the learning process can be measured in a variety of ways. In a regression setting, the forecast value is a number or vector of values. We can then define the error vector \(\bm{e}_t = \bmu_t - \bm{x}_t\) and then compute quantities such as the mean square error (MSE), mean absolute error (MAE), median absolute deviation (MAD), mean absolute percentage error (MAPE), and so on. In a classification setting, the forecast is the trend, for example up/down, and typical measures of error are the accuracy (i.e., correct predictions over total predictions), error rate (i.e., wrong predictions over total predictions), cross-entropy, and so on. See Goodfellow et al. (2016) for details.
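To make these error measures concrete, the following sketch computes a few of them on a toy example. The function names and the four hand-picked forecast and target values are assumptions chosen purely for illustration; the trend is taken here to be the sign of the return.

```python
import numpy as np

def regression_errors(forecast, target):
    """MSE, MAE, and MAPE of the error vector e = forecast - target."""
    e = forecast - target
    return {
        "mse": float(np.mean(e ** 2)),
        "mae": float(np.mean(np.abs(e))),
        # MAPE is undefined when any target value is zero
        "mape": float(np.mean(np.abs(e / target))),
    }

def trend_accuracy(forecast, target):
    """Fraction of time steps where the forecast sign matches the realized sign."""
    return float(np.mean(np.sign(forecast) == np.sign(target)))

# Toy values (assumption): realized returns and their forecasts
target = np.array([0.02, -0.01, 0.04, -0.02])
forecast = np.array([0.01, 0.01, 0.02, -0.04])

errors = regression_errors(forecast, target)
accuracy = trend_accuracy(forecast, target)
error_rate = 1.0 - accuracy
print(errors, accuracy, error_rate)
```

In this toy case the forecast gets the direction right three times out of four, even though its percentage error is large, which hints at why the choice of error measure matters.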
Mathematically, the DL network implements the function \(\bm{f}_{\bm{\theta}}\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right)\), with parameters \(\bm{\theta}\), to produce the estimate of \(\bm{x}_t\) as \(\bmu_t = \bm{f}_{\bm{\theta}}\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right)\). The mathematical formulation of a standard time series forecast can be written as the optimization problem \[ \begin{array}{ll} \underset{\bm{\theta}}{\textm{minimize}} & \E\left[\mathcal{L}\left(\bm{f}_{\bm{\theta}}\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right), \bm{x}_t\right)\right], \end{array} \] where \(\mathcal{L}(\cdot, \cdot)\) denotes the loss function or prediction error function to be minimized (e.g., the MSE or cross-entropy).
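The optimization problem above can be illustrated with a deliberately simple stand-in for the DL network. In the sketch below, \(\bm{f}_{\bm{\theta}}\) is assumed to be a linear map of the lookback window (not a deep network), the expectation is replaced by a sample average, the loss is the MSE, and the data is a synthetic autoregressive series; all of these are assumptions made only to keep the example short.

```python
import numpy as np

# Synthetic univariate series (assumption): x_t = 0.5 x_{t-1} + noise
rng = np.random.default_rng(1)
T, k = 2000, 2
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * x[t - 1] + rng.standard_normal()

# Lookback inputs (x_{t-k}, ..., x_{t-1}) and labels x_t
X = np.stack([x[t - k:t] for t in range(k, T)])   # shape (T - k, k)
y = x[k:]

def mse_loss(theta):
    """Sample average of the squared error, approximating E[L(f_theta(.), x_t)]."""
    return float(np.mean((X @ theta - y) ** 2))

# Plain gradient descent on theta (a linear stand-in for the network weights)
theta = np.zeros(k)
initial_loss = mse_loss(theta)
step = 0.1
for _ in range(500):
    grad = 2.0 * X.T @ (X @ theta - y) / len(y)
    theta -= step * grad
final_loss = mse_loss(theta)

print(theta, initial_loss, final_loss)
```

The last entry of `theta` multiplies \(x_{t-1}\) and should land near the coefficient 0.5 used to generate the data. A deep network would replace the linear map and an automatic differentiation library would supply the gradient, but the structure of the problem is the same.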
It is important to point out that this architecture focuses on the time series modeling only, while totally ignoring the subsequent portfolio optimization component, which can also be taken into account as described next.
16.3.3 Portfolio-Based Time Series Forecasting
The previous standard time series model totally ignores the subsequent portfolio optimization component. As a consequence, the performance measure has to be defined in terms of an error that depends on the forecast \(\bmu_t\) and the label \(\bm{x}_t\). However, it is unclear which error definition best suits the subsequent portfolio optimization step, so the choice is largely heuristic.
Alternatively, a more holistic approach is to take into account the portfolio optimization component to measure the overall performance in a meaningful way, so that we do not need to rely on a rather arbitrary error definition.
The block diagram in Figure 16.14 illustrates this process of time series forecasting taking into account the subsequent portfolio optimization block in the training procedure (Bengio, 1997). Following the reinforcement learning paradigm in Figure 16.3, instead of measuring an arbitrary error based on \(\bmu_t\) and \(\bm{x}_t\) to drive the learning process, the output \(\bmu_t\) is fed into the subsequent portfolio optimization block to produce the portfolio \(\bm{w}_t\), from which a meaningful measure of performance can be evaluated, such as the Sharpe ratio.
Mathematically, the DL network implements the function \(\bm{f}_{\bm{\theta}}\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right)\), with parameters \(\bm{\theta}\), to produce the estimate of \(\bm{x}_t\) as \(\bmu_t = \bm{f}_{\bm{\theta}}\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right)\) (possibly also the corresponding covariance matrix \(\bSigma_t\)), from which the portfolio \(\bm{w}_t\) will be designed by minimizing some objective function \(f_0(\cdot)\) (following any of the portfolio formulation designs covered in this book). The mathematical formulation of portfolio-based time series forecasting can be written as the optimization problem \[ \begin{array}{ll} \underset{\bm{\theta}}{\textm{minimize}} & \E\left[\xi\left(\bm{w}_t, \bm{x}_t\right)\right]\\ \textm{subject to} & \bm{w}_t = \underset{\bm{w}}{\textm{arg min}} \; f_0\left(\bm{w}; \bmu_t = \bm{f}_{\bm{\theta}}\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right)\right), \end{array} \] where \(\xi(\cdot, \cdot)\) denotes the error function to be minimized that measures the overall system performance (e.g., the negative of the Sharpe ratio). Note that in this approach, the parameters of the DL network \(\bm{\theta}\) are optimized to directly minimize the error measure \(\xi\) of the overall system, instead of a simple forecasting error, such as the MSE or the cross-entropy. In principle, one may use \(f_0 = \xi\), that is, use the same criterion to design the portfolio as used to measure the overall performance; however, there may be reasons to actually use a different criterion.
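The following sketch illustrates how the overall measure \(\xi\) could be evaluated once a forecast is passed through a portfolio block. Several simplifying assumptions are made: the forecast is a plain moving average of the lookback window (a placeholder for \(\bm{f}_{\bm{\theta}}\)), the portfolio block is the closed form \(\bSigma_t^{-1}\bmu_t\) rescaled to unit gross exposure, and the returns are synthetic. The names `portfolio_block` and `negative_sharpe` are hypothetical.

```python
import numpy as np

def negative_sharpe(portfolio_returns):
    """xi: negative of the (per-period, unannualized) Sharpe ratio."""
    r = np.asarray(portfolio_returns, dtype=float)
    return -float(np.mean(r) / np.std(r, ddof=1))

def portfolio_block(mu, Sigma):
    """Closed-form stand-in for arg min f_0: w proportional to Sigma^{-1} mu.

    The rescaling to unit gross exposure is an assumption for illustration.
    """
    w = np.linalg.solve(Sigma, mu)
    return w / np.sum(np.abs(w))

# Synthetic returns (assumption): T days, N assets, lookback k
rng = np.random.default_rng(2)
T, N, k = 300, 4, 20
x = 0.0005 + 0.01 * rng.standard_normal((T, N))

realized = []
for t in range(k, T):
    window = x[t - k:t]
    mu_t = window.mean(axis=0)               # placeholder forecast of x_t
    Sigma_t = np.cov(window, rowvar=False)   # sample covariance of the window
    w_t = portfolio_block(mu_t, Sigma_t)
    realized.append(w_t @ x[t])              # realized portfolio return at t

xi = negative_sharpe(realized)
print(xi)
```

In the formulation above, it is this scalar, rather than a forecasting error, that would be minimized with respect to the network parameters.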
The difficulty of this architecture is in the learning process. To be more specific, the backpropagation learning algorithm requires the computation of the partial derivatives of the output of each block with respect to its input (to be used in the chain rule for differentiation). If the portfolio optimization block has a closed-form expression, for example \(\bm{w}_t = \bSigma_t^{-1}\bmu_t\), then the partial derivatives are trivially computed. However, if this block is defined in terms of the solution to an optimization problem, then it becomes trickier since one has to be able to compute the partial derivatives of the solution via the Karush–Kuhn–Tucker optimality conditions of the optimization problem (see Section A.6.4 in Appendix A). Fortunately, recent developments have made this possible and are available in open-source libraries (Amos and Kolter, 2017).
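For the closed-form case \(\bm{w}_t = \bSigma_t^{-1}\bmu_t\), the partial derivatives needed by the chain rule are simply \(\partial \bm{w}_t / \partial \bmu_t = \bSigma_t^{-1}\). The sketch below checks this numerically with central finite differences. The covariance matrix, the forecast, the realized returns, and the simple measure \(\xi(\bm{w}, \bm{x}) = -\bm{w}^\T\bm{x}\) are all assumed values chosen for illustration.

```python
import numpy as np

# Assumed toy values for illustration
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
mu = np.array([0.02, 0.03])
x_next = np.array([0.01, -0.02])    # realized returns (the label)

def portfolio_block(mu):
    """Closed-form portfolio w = Sigma^{-1} mu."""
    return np.linalg.solve(Sigma, mu)

def xi(w):
    """Simple illustrative performance measure: negative portfolio return."""
    return -float(w @ x_next)

# Chain rule: grad_mu xi = (dw/dmu)^T grad_w xi = Sigma^{-1} grad_w xi,
# using the symmetry of Sigma
grad_w = -x_next
grad_mu_analytic = np.linalg.solve(Sigma, grad_w)

# Central finite differences as an independent check
eps = 1e-6
grad_mu_numeric = np.array([
    (xi(portfolio_block(mu + eps * e)) - xi(portfolio_block(mu - eps * e))) / (2 * eps)
    for e in np.eye(len(mu))
])

print(grad_mu_analytic, grad_mu_numeric)
```

When the portfolio block is instead the solution of a constrained optimization problem, no such explicit Jacobian is available and the derivatives have to be obtained by differentiating the optimality conditions, as noted above.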
It is important to note that in this architecture, the time series forecast not only produces the forecast vector \(\bmu_t\) but also a measure of the uncertainty of the forecast in the form of the covariance matrix \(\bSigma_t\). This is necessary since the subsequent portfolio optimization component may need both \(\bmu_t\) and \(\bSigma_t\).
16.3.4 End-to-End Portfolio Design
The DL portfolio-based time series forecasting architecture in Figure 16.14 is an improvement over the standard time series forecasting architecture in Figure 16.13, because it takes into account the subsequent portfolio optimization block and measures the overall performance using a meaningful performance measure.
However, since DL has proven to be such a powerful universal function approximator in many other areas, we can also consider a bolder architecture commonly termed end-to-end design, where the whole process is modeled by a single DL component as illustrated in Figure 16.15.
Mathematically, the end-to-end DL network implements the function \(\bm{f}_{\bm{\theta}}\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right)\), with parameters \(\bm{\theta}\), to directly produce the portfolio \(\bm{w}_t\) (without going through an intermediate forecasting block). The formulation of this end-to-end DL portfolio can be written as \[ \begin{array}{ll} \underset{\bm{\theta}}{\textm{minimize}} & \E\left[\xi\left(\bm{w}_t, \bm{x}_t\right)\right]\\ \textm{subject to} & \bm{w}_t = \bm{f}_{\bm{\theta}}\left(\bm{x}_{t-k}, \dots,\bm{x}_{t-1}\right), \end{array} \] where \(\xi(\cdot, \cdot)\) denotes the error function to be minimized that measures the overall system performance (e.g., the negative of the Sharpe ratio).
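As a rough sketch of the end-to-end formulation, the snippet below maps a lookback window directly to portfolio weights and evaluates the negative Sharpe ratio of the resulting strategy. The network is assumed to be a single linear layer followed by a softmax, which yields long-only weights that sum to one; this architecture, the function names, and the synthetic data are illustrative assumptions, and no training is performed.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: nonnegative outputs summing to one."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def end_to_end_weights(theta, window):
    """f_theta: lookback window of shape (k, N) -> portfolio weights of shape (N,)."""
    N = window.shape[1]
    W = theta.reshape(N, -1)          # one linear layer, shape (N, k * N)
    return softmax(W @ window.ravel())

def objective(theta, x, k):
    """Sample version of E[xi(w_t, x_t)] with xi the negative Sharpe ratio."""
    r = np.array([end_to_end_weights(theta, x[t - k:t]) @ x[t]
                  for t in range(k, len(x))])
    return -float(r.mean() / r.std(ddof=1))

# Synthetic returns (assumption)
rng = np.random.default_rng(3)
T, N, k = 300, 4, 20
x = 0.0005 + 0.01 * rng.standard_normal((T, N))

theta = np.zeros(N * k * N)           # all-zero parameters give equal weights
w = end_to_end_weights(theta, x[:k])
value = objective(theta, x, k)
print(w, value)
```

With all parameters at zero the softmax returns the equally weighted portfolio, which gives a natural starting point. Training would then adjust `theta` to reduce `objective`, typically with an automatic differentiation library.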
In principle, end-to-end architectures can offer superior performance by optimizing the overall objective function directly. However, they require substantial amounts of training data due to their deep structure and large number of learnable parameters. In financial applications, where data availability is often limited, this data-hungry nature can make end-to-end designs impractical.
High-frequency trading (HFT) presents a notable exception with its abundance of data. However, HFT strategies must account for market impact – where trade execution significantly affects market conditions. Reinforcement learning is particularly well-suited for this challenge, as it naturally incorporates the feedback loop between trading actions and market responses. A comprehensive overview of research efforts on reinforcement learning methods for quantitative trading is given in S. Sun et al. (2023).
References
71. Cats, like all living creatures, do evolve, but they do so on an evolutionary time scale of, say, millions of years. So, for practical purposes we can assume them fixed.
72. Cats, like most animals, have evolved camouflage to avoid predators, but cannot adapt it to the changing environment in real time, unlike other species like octopus, squid, and chameleon.
73. The cat vs. octopus comparison is just an analogy for illustration purposes. This is not to say that DL cannot literally be trained to spot an octopus.