5.5 Learning Time-Varying Graphs
All the aforementioned graph learning frameworks are designed for static graphs. However, many practical network-based systems are intrinsically dynamic, and static graphs inherently neglect the time variation of the data (Kolaczyk, 2009, sec. 8.6). For instance, financial systems are evidently dynamic and subject to various market regimes, such as bull markets, bear markets, economic crises, and bubbles.
Learning time-varying or dynamic graphs is more cumbersome than learning static graphs: not only does the formulation become more complicated, but the number of variables also increases significantly. As a consequence, few works can be found in the literature.
A naive approach would be to divide the available observations into \(T\) chunks, each with \(n_t\) samples, and then learn a separate graph independently for each chunk: \(\bm{L}_t\) for \(t=1,\dots,T\). The drawback of this approach, apart from the fact that fewer observations are available to learn each graph, is that the learned graphs may lack time consistency; that is, they may change abruptly rather than smoothly over time.
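As a concrete illustration, the naive chunked approach can be sketched numerically. The following minimal sketch learns each \(\bm{L}_t\) independently by minimizing \(\textm{Tr}(\bm{L}_t\bm{S}_t) - \textm{log gdet}(\bm{L}_t)\) with a crude projected gradient method (the function names, the first-order solver, and the use of the Moore–Penrose pseudo-inverse as the gradient of \(\textm{log gdet}\) are our illustrative choices; dedicated solvers should be preferred in practice):

```python
import numpy as np

def project_to_laplacian(M):
    """Project a symmetric matrix onto the set of combinatorial graph
    Laplacians: symmetric, nonpositive off-diagonals, zero row sums."""
    A = (M + M.T) / 2.0
    W = np.maximum(-A, 0.0)           # recover nonnegative edge weights
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def learn_laplacian(S, n_iter=300, lr=0.01):
    """Minimize Tr(L S) - log gdet(L) over Laplacians by projected
    gradient descent; pinv(L) plays the role of the inverse in the
    gradient of log gdet."""
    p = S.shape[0]
    L = np.eye(p) - np.ones((p, p)) / p   # complete-graph Laplacian as a start
    for _ in range(n_iter):
        grad = S - np.linalg.pinv(L)
        L = project_to_laplacian(L - lr * grad)
    return L

# Naive approach: split the observations into T chunks, learn independently.
rng = np.random.default_rng(0)
X = rng.standard_normal((600, 5))         # 600 observations of a 5-node system
T = 3
Laplacians = [learn_laplacian(np.cov(chunk, rowvar=False))
              for chunk in np.array_split(X, T)]
```

Note that nothing ties consecutive Laplacians together here: each chunk is fitted in isolation, which is precisely the source of the time-inconsistency drawback discussed above.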
In order to learn time-varying graphs that preserve the time consistency between consecutive graphs, a regularization term of the form \(d(\bm{L}_{t-1}, \bm{L}_t)\) can be employed (Kalofolias et al., 2017); for example, the Frobenius norm, \(d(\bm{L}_{t-1}, \bm{L}_t) = \|\bm{L}_{t-1} - \bm{L}_t\|_\textm{F}^2\), or the \(\ell_1\)-norm, \(d(\bm{L}_{t-1}, \bm{L}_t) = \|\bm{L}_{t-1} - \bm{L}_t\|_1\).
The following graph learning formulation follows from the GMRF framework in (5.6) or (5.7) while enforcing time consistency (Cardoso and Palomar, 2020): \[ \begin{array}{ll} \underset{\bm{L}_{1}, \dots, \bm{L}_{T}}{\textm{minimize}} & \sum_{t=1}^{T} n_t\times\left[\textm{Tr}(\bm{L}_t\bm{S}_t) - \textm{log gdet}(\bm{L}_t)\right] + \delta \sum_{t=2}^{T}d(\bm{L}_{t-1}, \bm{L}_t),\\ \textm{subject to} & \big\{\bm{L}_t \succeq \bm{0}, \quad \bm{L}_t\bm{1} = \bm{0}, \quad (\bm{L}_t)_{ij} = (\bm{L}_t)_{ji} \leq 0, \quad \forall ~ i \neq j\big\}_{t=1}^{T}, \end{array} \]
where \(\bm{S}_t\) denotes the sample correlation matrix of chunk \(t\) and the hyper-parameter \(\delta\) controls the level of time consistency: for \(\delta=0\) the problem reduces to the naive approach without time consistency, whereas for \(\delta\rightarrow\infty\) it tends to the static graph solution with all the \(\bm{L}_t\) equal.
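The joint formulation above can also be sketched with the same crude projected-gradient machinery, now with the Frobenius-norm consistency term \(d(\bm{L}_{t-1},\bm{L}_t)=\|\bm{L}_{t-1}-\bm{L}_t\|_\textm{F}^2\) coupling consecutive graphs (a minimal sketch under our own solver choices, not the algorithm of the cited reference; the pseudo-inverse again stands in for the gradient of \(\textm{log gdet}\)):

```python
import numpy as np

def project_to_laplacian(M):
    """Project a symmetric matrix onto the set of combinatorial graph
    Laplacians: symmetric, nonpositive off-diagonals, zero row sums."""
    A = (M + M.T) / 2.0
    W = np.maximum(-A, 0.0)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def learn_time_varying(S_list, n_list, delta, n_iter=300, lr=1e-4):
    """Projected gradient descent on
    sum_t n_t [Tr(L_t S_t) - log gdet(L_t)] + delta sum_t ||L_{t-1} - L_t||_F^2,
    with pinv(L_t) used as the gradient of log gdet.  The step size lr
    is problem-dependent and chosen here only for the toy example."""
    p = S_list[0].shape[0]
    T = len(S_list)
    Ls = [np.eye(p) - np.ones((p, p)) / p for _ in range(T)]
    for _ in range(n_iter):
        grads = []
        for t in range(T):
            g = n_list[t] * (S_list[t] - np.linalg.pinv(Ls[t]))
            if t > 0:
                g += 2.0 * delta * (Ls[t] - Ls[t - 1])   # pull toward the past
            if t < T - 1:
                g += 2.0 * delta * (Ls[t] - Ls[t + 1])   # pull toward the future
            grads.append(g)
        Ls = [project_to_laplacian(L - lr * g) for L, g in zip(Ls, grads)]
    return Ls

rng = np.random.default_rng(0)
chunks = np.array_split(rng.standard_normal((300, 5)), 3)
S_list = [np.cov(c, rowvar=False) for c in chunks]
n_list = [len(c) for c in chunks]
Ls = learn_time_varying(S_list, n_list, delta=10.0)
```

The consistency term simply adds the attraction forces \(2\delta(\bm{L}_t - \bm{L}_{t\pm1})\) to each gradient, which is what discourages abrupt changes between consecutive graphs.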
The solution to this problem is a dynamic graph conditioned on all the \(T\) chunks of observations (i.e., with look-ahead bias): \[\hat{\bm{L}}_{t \mid T} \qquad t=1,\dots,T.\] Alternatively, using a rolling window approach, one can instead obtain a causal estimate without look-ahead bias: \[\hat{\bm{L}}_{t \mid t} \qquad t=1,\dots,T.\]
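The causal rolling-window variant can be sketched along the same lines: each \(\hat{\bm{L}}_{t\mid t}\) is fitted using only chunk \(t\), warm-started from the previous estimate and tied to it through the penalty \(\delta\|\bm{L} - \hat{\bm{L}}_{t-1\mid t-1}\|_\textm{F}^2\) (again an illustrative sketch with our own function names and a crude projected-gradient inner solver):

```python
import numpy as np

def project_to_laplacian(M):
    """Project onto combinatorial graph Laplacians (symmetric,
    nonpositive off-diagonals, zero row sums)."""
    A = (M + M.T) / 2.0
    W = np.maximum(-A, 0.0)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def rolling_laplacians(S_list, n_list, delta, n_iter=200, lr=1e-4):
    """Causal estimates L_{t|t}: step t sees only chunk t plus the
    previous estimate, which enters via delta ||L - L_prev||_F^2."""
    p = S_list[0].shape[0]
    L_prev, estimates = None, []
    for S, n in zip(S_list, n_list):
        L = L_prev if L_prev is not None else np.eye(p) - np.ones((p, p)) / p
        for _ in range(n_iter):
            g = n * (S - np.linalg.pinv(L))      # pinv as gradient of log gdet
            if L_prev is not None:
                g += 2.0 * delta * (L - L_prev)  # consistency with the past only
            L = project_to_laplacian(L - lr * g)
        estimates.append(L)
        L_prev = L
    return estimates

rng = np.random.default_rng(1)
chunks = np.array_split(rng.standard_normal((300, 5)), 3)
causal_Ls = rolling_laplacians([np.cov(c, rowvar=False) for c in chunks],
                               [len(c) for c in chunks], delta=10.0)
```

Unlike the joint formulation, each estimate here depends only on past and present data, so the procedure is free of look-ahead bias and can be run online as new chunks arrive.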