\( \newcommand{\bm}[1]{\boldsymbol{#1}} \newcommand{\textm}[1]{\textsf{#1}} \def\T{{\mkern-2mu\raise-1mu\mathsf{T}}} \newcommand{\R}{\mathbb{R}} % real numbers \newcommand{\E}{{\rm I\kern-.2em E}} \newcommand{\w}{\bm{w}} % bold w \newcommand{\bmu}{\bm{\mu}} % bold mu \newcommand{\bSigma}{\bm{\Sigma}} % bold mu \newcommand{\bigO}{O} %\mathcal{O} \renewcommand{\d}[1]{\operatorname{d}\!{#1}} \)

5.1 Graphs

Graphs have become a fundamental and ubiquitous mathematical tool for modeling data in a variety of application domains and for understanding the structure of large networks that generate data (Kolaczyk, 2009; Lauritzen, 1996). By abstracting the data as graphs, we can better capture the geometry of data and visualize high-dimensional data. Some iconic examples of graphs, illustrated in Figure 5.1, include the following:

  • Social media graphs: These model the behavioral similarity or influence between individuals, with the data consisting of online activities such as tagging, liking, purchasing, etc.

  • Brain activity graphs: These represent the correlation among sensors on the brain, with the data being the measured brain activity known as fMRI (functional MRI).

  • Financial stock graphs: These capture the interdependencies among financial companies in stock markets, with the data consisting of measured economic quantities such as stock prices, volumes, etc.

  • Financial currency graphs: These summarize the interdependencies among currencies in foreign exchange markets, with the data comprising measured economic quantities like spot prices, volumes, etc.

  • Financial cryptocurrency graphs: Simnilarly to currency graphs, these model the interdependencies among cryptocurrencies in crypto markets.

Examples of graphs in different applications.

Figure 5.1: Examples of graphs in different applications.

5.1.1 Terminology

The basic elements of a graph (see Figure 5.2) are

  • nodes: corresponding to the entities or variables; and
  • edges: encoding the relationships between entities or variables.
Illustration of a graph with nodes and edges.

Figure 5.2: Illustration of a graph with nodes and edges.

A graph is a simple mathematical structure described by \(\mathcal{G}=(\mathcal{V},\mathcal{E},\bm{W})\), where the set \(\mathcal{V}=\{1,2,3,\ldots,p\}\) contains the indices of the nodes, the set of pairs \(\mathcal{E}=\{(1,2),{(1,3)},\ldots,{(i,j)},\ldots\}\) contains the edges between any pair of nodes \((i,j)\), and the weight matrix \(\bm{W}\) encodes the strength of the relationships.

5.1.2 Graph matrices

There are several matrices that are key in characterizing graphs as described next.

  • The adjacency matrix \(\bm{W}\) is the most direct way to fully characterize a graph, where each element \(W_{ij}\) contains the strength of the connectivity, \(w_{ij}\), between node \(i\) and node \(j\): \[ [\bm{W}]_{ij}=\begin{cases} w_{ij} & \textm{if}\;(i,j)\in\mathcal{E}\\ 0 & \textm{if}\;(i,j)\notin\mathcal{E}\\ 0 & \textm{if}\;i=j. \end{cases} \] We tacitly assume \(W_{ij}\ge0\) and \(W_{ii} = 0\) (no self-loops). If \(W_{ij}=W_{ji}\) (symmetric matrix \(\bm{W}\)), then the graph is called undirected.

  • The connectivity matrix \(\bm{C}\) is a particular case of the adjacency matrix containing \(0-1\) elements: \[ [\bm{C}]_{ij}=\begin{cases} 1 & \textm{if}\;(i,j)\in\mathcal{E}\\ 0 & \textm{if}\;(i,j)\notin\mathcal{E}\\ 0 & \textm{if}\;i=j. \end{cases} \] It describes the connectivity pattern of the adjacency matrix. In fact, for some graphs the adjacency matrix is already binary (with no measure of strength) and coincides with the connectivity matrix.

  • The degree matrix \(\bm{D}\) is defined as the diagonal matrix containing the degrees of the nodes \(\bm{d} = (d_1,\dots,d_p)\) along the diagonal, where the degree of a node \(i\), \(d_i\), represents its overall connectivity strength to all the nodes, i.e., \(d_i = \sum_{j} W_{ij}\). That is, \[\bm{D}=\textm{Diag}(\bm{W}\bm{1}).\]

  • The Laplacian matrix of a graph is defined as \[\bm{L} = \bm{D} - \bm{W}.\] It is also referred to as combinatorial Laplacian matrix to distinguish it from other variations in the definition of \(\bm{L}.\)

The Laplacian matrix may appear unusual at first; however, it possesses numerous valuable properties that make it a fundamental matrix in graph analysis. It satisfies the following mathematical properties:

  • it is symmetric and positive semidefinite: \(\bm{L}\succeq\bm{0}\);
  • it has a zero eigenvalue with corresponding eigenvalue the all-one vector \(\bm{1}\): \(\bm{L}\bm{1} = \bm{0}\);
  • the degree vector is found along its diagonal: \(\textm{diag}(\bm{L})=\bm{d}\);
  • the number of zero eigenvalues corresponds to the number of connected components of the graph (i.e., clusters);
  • it defines a measure for the smoothness or variance of graph signals.

To further elaborate on the smoothness property of the Laplacian matrix, let \(\bm{x}=(x_1,\dots,x_p)\) denote a graph signal (i.e., one observation of the signal on all the nodes of the graph). Ignoring the graph, one way to measure the variance of this signal \(\bm{x}\) is with the quantity \(\sum_{i,j}(x_i - x_j)^2.\) Now, to take into account the graph information in this computation, it makes sense to compute the variance between signals \(x_i\) and \(x_j\) only if these two nodes are connected, for example, by weighting the variance between each pair of nodes with the connectivity strength as \(\sum_{i,j}W_{ij}(x_i - x_j)^2.\) This finally leads us to the connection between the Laplacian matrix and a measure of smoothness or variance of the graph signal as \[\begin{equation} \bm{x}^\T\bm{L}\bm{x} = \frac{1}{2}\sum_{i,j}W_{ij}(x_i - x_j)^2. \tag{5.1} \end{equation}\]

Proof. \[\bm{x}^\T\bm{L}\bm{x} = \bm{x}^\T\bm{D}\bm{x} - \bm{x}^\T\bm{W}\bm{x} = \sum_{i}d_{i}x_i^2 - \sum_{i,j}w_{ij}x_ix_j = \sum_{i,j}w_{ij}x_i^2 - \sum_{i,j}w_{ij}x_ix_j.\]


Example 5.1 (Graph matrices for a toy example) Figure 5.3 shows a simple toy undirected graph \(\mathcal{G}=(\mathcal{V},\mathcal{E},\bm{W})\) with four nodes characterized by \(\mathcal{V}=\{1,2,3,4\}\), \(\mathcal{E}=\{(1,2),(2,1),(1,3),(3,1),(2,3),(3,2),(2,4),(4,2)\}\), and weights \(w_{12}=w_{21}=2\), \(w_{13}=w_{31}=2\), \(w_{23}=w_{32}=3\), \(w_{24}=w_{42}=1\).

Toy graph.

Figure 5.3: Toy graph.

The connectivity, adjacency, and Laplacian graph matrices are \[ \bm{C}=\begin{bmatrix} 0 & 1 & 1 & 0\\ 1 & 0 & 1 & 1\\ 1 & 1 & 0 & 0\\ 0 & 1 & 0 & 0 \end{bmatrix},\; \bm{W}=\begin{bmatrix} 0 & 2 & 2 & 0\\ 2 & 0 & 3 & 1\\ 2 & 3 & 0 & 0\\ 0 & 1 & 0 & 0 \end{bmatrix},\; \bm{L}=\begin{bmatrix} 4 & -2 & -2 & 0\\ -2 & 6 & -3 & -1\\ -2 & -3 & 5 & 0\\ 0 & -1 & 0 & 1 \end{bmatrix}. \]

References

Kolaczyk, E. D. (2009). Statistical analysis of network data: Methods and models. New York: Springer-Verlag.
Lauritzen, S. (1996). Graphical models. Oxford: Oxford University Press.