A Methodology

In this section, we present the GraphR model and priors in Section A.1, give a brief introduction to the variational Bayes inference method under the mean-field assumption in Section A.2, and provide detailed derivations of the update equations and the evidence lower bound (ELBO) for our GraphR model in Section A.3 and Section A.4, respectively. Section A.5 summarizes the resulting algorithm, and Section A.6 provides a comprehensive overview of GraphR and competing methods.

For notational purposes, we consider \(\mathbf{Y} = (Y_1,...,Y_p) \in \mathbb{R}^ {N \times p} \sim \mathcal{N}(\mu,\Omega^{-1}(\mathbf{X}))\), where the precision matrix \(\Omega(\mathbf{X}) = [\omega_{ij}(\mathbf{X})]_{p \times p}\) is a function of intrinsic factors \(\mathbf{X} = (X_1,...,X_q) \in \mathbb{R}^ {N \times q}\), and denote by \(\boldsymbol{\theta}\) the parameters to be estimated. Here \(N\), \(p\) and \(q\) denote the sample size, the number of features and the number of covariates, respectively.

A.1 Model and priors

The GraphR model is expressed as follows: \[\begin{equation} \begin{split} & Y_i = \sum_{j \neq i}^p \gamma_{ij}(\mathbf{X}) \odot Y_j + \epsilon_i, \hspace{0.5cm} \epsilon_i \sim N(0,\frac{1}{\omega_{ii}} I_N), \\ & \gamma_{ij}(\mathbf{X}) = -\frac{\omega_{ij}(\mathbf{X})}{\omega_{ii}} = -\frac{1}{\omega_{ii}} \mathbf{X} \boldsymbol{\beta_{ij}} = -\frac{1}{\omega_{ii}} \left(\sum_{l=1}^q \beta_{ijl}X_l \right) \end{split} \tag{A.1} \end{equation}\] where \(\odot\) represents the element-wise multiplication between vectors.

Priors on the GraphR model are given below: \[\begin{align} \begin{split} & \beta_{ijl} = b_{ijl} s_{ijl}, \\ & b_{ijl} \mid \tau_{il} \sim N (0,\tau_{il}^{-1}), \\ & s_{ijl} \mid \pi_{ijl} \sim \text{Ber}(\pi_{ijl}), \\ & \tau_{il} \sim \text{Gamma}(a_\tau,b_\tau) \\ & \pi_{ijl} \sim \text{Beta}(a_\pi,b_\pi) \\ & \omega_{ii} \propto 1. \\ \end{split} \tag{A.2} \end{align}\]

The parameters to be estimated are \(\boldsymbol{\theta} = \{ \boldsymbol{b,s,\omega,\pi,\tau}\}\).
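To make the regression-based reparameterization in (A.1)–(A.2) concrete, the following minimal Python sketch (all dimensions, values and array names are illustrative assumptions, not part of the GraphR implementation) forms the sample-specific weights \(\gamma_{ij}(\mathbf{X}) = -\omega_{ii}^{-1}\sum_{l=1}^q \beta_{ijl}X_l\) from a sparse coefficient array:

```python
import numpy as np

# Illustrative sizes and values only (assumptions for this sketch).
rng = np.random.default_rng(0)
N, p, q = 50, 5, 3                     # samples, features, covariates
X = rng.normal(size=(N, q))            # covariate matrix (N x q)

# beta[i, j, l] plays the role of beta_{ijl}; a Bernoulli mask mimics the
# spike-and-slab prior in (A.2), setting most effects exactly to zero.
beta = rng.normal(size=(p, p, q)) * (rng.random((p, p, q)) < 0.2)
beta[np.arange(p), np.arange(p), :] = 0.0   # entries with j = i are not used
omega_diag = np.ones(p)                     # omega_{ii}

# gamma[n, i, j] = -(1 / omega_{ii}) * sum_l beta_{ijl} * X_{nl}
gamma = -np.einsum("nl,ijl->nij", X, beta) / omega_diag[None, :, None]
```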

A.2 Mean field variational Bayes

The variational Bayes method aims to obtain the optimal approximation of the true posterior distribution from a class of tractable distributions \(Q\), called the variational family, by minimizing the Kullback-Leibler (KL) divergence between the approximation \(q_{\text{vb}}(\boldsymbol{\theta})\) and the true posterior distribution \(p(\boldsymbol{\theta|\mathbf{Y,X}})\) (Attias et al. 2000). A common choice of \(Q\), known as the mean-field approximation, assumes that \(q_{\text{vb}}(\boldsymbol{\theta})\) can be expressed as \(\prod_{k=1}^K q^k_{\text{vb}}(\boldsymbol{\theta_k})\) for some partition of \(\boldsymbol{\theta}\). One can write \(q_{\text{vb}}(\boldsymbol{\theta})\) as

\[\begin{align}\label{eq:KL_div} & q_{\text{vb}}^{*}(\boldsymbol{\theta}) \in \text{arg} \underset{q_{\text{vb}}(\boldsymbol{\theta}) \in \mathbb{Q}}{\text{min}} \ KL(q_{\text{vb}}(\boldsymbol{\theta})\|p(\boldsymbol{\theta |Y,X})) \text{, where} \nonumber \\ & KL(q_{\text{vb}}(\boldsymbol{\theta})\|p(\boldsymbol{\theta|Y,X})) = \int q_{\text{vb}}(\boldsymbol{\theta}) \text{ log} \left (\frac{q_{\text{vb}}(\boldsymbol{\theta})}{p(\boldsymbol{\theta|Y,X})}\right)d\boldsymbol{\theta} \nonumber \\ & = - \int q_{\text{vb}}(\boldsymbol{\theta}) \text{ log} \left (\frac{p(\boldsymbol{\theta|Y,X})}{q_{\text{vb}}(\boldsymbol{\theta})}\right)d\boldsymbol{\theta} = - \int q_{\text{vb}}(\boldsymbol{\theta}) \text{ log} \left (\frac{p(\boldsymbol{\theta,Y,X})}{q_{\text{vb}}(\boldsymbol{\theta})}\right)d\boldsymbol{\theta} + \text{ log} \left[ p(\boldsymbol{Y,X}) \right]. \end{align}\]

We denote \(L[q_{\text{vb}}(\boldsymbol{\theta})] = \int q_{\text{vb}}(\boldsymbol{\theta}) \text{log} \left ( p(\boldsymbol{\theta,Y,X}) / q_{\text{vb}}(\boldsymbol{\theta}) \right)d\boldsymbol{\theta}\), which is a lower bound on the log evidence \(\text{log} \left[ p(\boldsymbol{Y,X}) \right]\), and write \(KL(q_{\text{vb}}(\boldsymbol{\theta})\|p(\boldsymbol{\theta|Y,X})) = -L[q_{\text{vb}}(\boldsymbol{\theta})] + \text{ log} \left[ p(\boldsymbol{Y,X}) \right]\). Minimizing the KL divergence is therefore equivalent to maximizing \(L[q_{\text{vb}}(\boldsymbol{\theta})]\), since \(\text{ log} \left[ p(\boldsymbol{Y,X}) \right]\) does not depend on \(\boldsymbol{\theta}\). We can further show that \[\begin{align}\label{eq:lowerbound} L[q_{\text{vb}}(\boldsymbol{\theta})] &= \int q_{\text{vb}}(\boldsymbol{\theta}) \left[ \text{ log }p(\boldsymbol{\theta,Y,X}) - \text{ log } q_{\text{vb}}(\boldsymbol{\theta}) \right] d\boldsymbol{\theta} \nonumber \\ & = \int q^{k}_{\text{vb}}(\boldsymbol{\theta_k}) \int \left[ \left(\text{ log }p(\boldsymbol{\theta,Y,X}) - \text{ log }q_{\text{vb}}(\boldsymbol{\theta_k}) \right) \right] \prod_{i \neq k} q_{\text{vb}}(\boldsymbol{\theta_i}) d\boldsymbol{\theta_{-k}}d\boldsymbol{\theta_k} \nonumber \\ & \text{ } -\int \sum_{i \neq k} \text{ log }q_{\text{vb}}(\boldsymbol{\theta_i}) \prod_{i \neq k} q_{\text{vb}}(\boldsymbol{\theta_i}) \int q_{\text{vb}}(\boldsymbol{\theta_k}) d\boldsymbol{\theta_{-k}}d\boldsymbol{\theta_k} \nonumber \\ &= \int q^{k}_{\text{vb}}(\boldsymbol{\theta_k}) \left[\mathbb{E}_{-k} (\text{ log }p(\boldsymbol{\theta,Y,X})) - \text{ log } q_{\text{vb}}(\boldsymbol{\theta_k})\right] d\boldsymbol{\theta_k} - \text{const} \nonumber \\ &=-KL(q^{k}_{\text{vb}}(\boldsymbol{\theta_k}) \| \text{ exp } \left[ \mathbb{E}_{-k} (\text{ log }p(\boldsymbol{\theta,Y,X})) \right]). \end{align}\]

Therefore, we have \(q^{k}_{\text{vb}}(\boldsymbol{\theta_k}) \propto \text{ exp } \left[ \mathbb{E}_{-k} (\text{ log }p(\boldsymbol{\theta,Y,X})) \right]\).
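In practice, these fixed-point conditions are solved by coordinate ascent: each factor \(q^{k}_{\text{vb}}(\boldsymbol{\theta_k})\) is updated in turn with the remaining factors held fixed, and \(L[q_{\text{vb}}(\boldsymbol{\theta})]\) is monitored until convergence. A generic sketch in Python (function and argument names are assumptions for illustration only):

```python
import numpy as np

def cavi(update_fns, params, elbo_fn, max_iter=200, tol=1e-6):
    """Generic coordinate-ascent variational Bayes loop (sketch).

    update_fns: mapping from a parameter block name to a function returning the
    updated variational parameters of that block, i.e. the closed-form solution
    of q_k proportional to exp(E_{-k} log p(theta, Y, X)).
    """
    elbo_old = -np.inf
    for _ in range(max_iter):
        for name, update in update_fns.items():
            params[name] = update(params)   # one closed-form block update
        elbo = elbo_fn(params)              # L[q_vb(theta)]; non-decreasing over sweeps
        if elbo - elbo_old < tol:
            break
        elbo_old = elbo
    return params
```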

A.3 Update equations

The likelihood is expressed as \[\begin{align} p(\boldsymbol{\theta,Y,X}) &\propto \prod_{i=1}^p \left\{ \left| \frac{1}{\omega_{ii}}I_n \right|^{-\frac{1}{2}} \text{exp} \left[ -\frac{\omega_{ii}}{2} \left\| Y_i - \sum_{j \neq i} \gamma_{ij}(X) \odot Y_j \right \|^2 \right] \right\} \times \nonumber \\ & \prod_{i=1}^p \prod_{j \neq i}^p \prod_{l=1}^q \left\{\left(\frac{1}{\tau_{il}}\right)^{-\frac{1}{2}} \text{exp} \left[-\frac{\tau_{il}}{2}(b_{ijl})^2 \right] \left(\pi_{ijl}\right)^{s_{ijl}} \left(1-\pi_{ijl}\right)^{1-s_{ijl}} \right\} \times \nonumber \\ &\prod_{i=1}^p \prod_{l=1}^q \left\{ \left(\tau_{il} \right) ^ {a_\tau-1} \text{exp}\left[ -b_\tau \tau_{il} \right] \right\} \times \nonumber \\ &\prod_{i=1}^p \prod_{j \neq i}^p \prod_{l=1}^q\left\{ \left(\pi_{ijl}\right)^{a_\pi-1} \left(1-\pi_{ijl}\right)^{b_\pi-1} \right\} \end{align}\]

\[\begin{align} \text{log} p(\boldsymbol{\theta,Y,X}) = &\text{Const}+ \sum_{i=1}^p \left\{ \frac{n}{2} \text{log}(\omega_{ii}) - \frac{\omega_{ii}}{2} \left\| Y_i + \frac{1}{\omega_{ii}} \sum_{j \neq i}^p \sum_{l=1}^q b_{ijl} s_{ijl} X_l \odot Y_j \right \|^2 \right\} \nonumber \\ &+ \sum_{i=1}^p \sum_{l=1}^q \left\{ \left( \frac{p-1}{2} + a_\tau-1 \right) \text{log}(\tau_{il}) - \left( b_\tau + \frac{1}{2}\sum_{j \neq i}^p \left( b_{ijl} \right)^2 \right) \tau_{il} \right\} \nonumber \\ &+ \sum_{i=1}^p \sum_{j \neq i}^p \sum_{l=1}^q \left\{ \left( s_{ijl} + a_\pi -1 \right)\text{log}(\pi_{ijl}) + \left( b_\pi-s_{ijl} \right) \text{log}(1-\pi_{ijl}) \right\}. \end{align}\]

To account for the dependence between \(\boldsymbol{b}\) and \(\boldsymbol{s}\) (Titsias and Lázaro-Gredilla 2011), we keep them in a joint factor and adopt the following mean-field factorization: \[q_{\text{vb}}(\boldsymbol{b,s,\omega,\pi,\tau}) = q_{\text{vb}}(\boldsymbol{b,s})q_{\text{vb}}(\boldsymbol{\omega})q_{\text{vb}}(\boldsymbol{\pi})q_{\text{vb}}(\boldsymbol{\tau}).\]

We can obtain the update equation for each parameter as \(q^{k}_{\text{vb}}(\boldsymbol{\theta_k}) \propto \text{ exp } \left[ \mathbb{E}_{-k} (\text{ log }p(\boldsymbol{\theta,Y,X})) \right]\).

a. Update of \(\tau_{il}\): \[\begin{align} \text{log} \ q_{\text{vb}}(\tau_{il}) &= \mathbb{E}_{-\tau_{il}} (l) \nonumber\\ &= C + \left[ \frac{p-1}{2}+a_\tau-1 \right] \text{log} \tau_{il} - \left[ b_\tau + \frac{1}{2}\sum_{j \neq i}^p \mathbb{E}_{-\tau_{il}} \left( b_{ijl}^2 \right) \right] \tau_{il}. \end{align}\]

\[q_{\text{vb}}(\tau_{il}) \sim \text{Gamma} \left(a_\tau + \frac{p-1}{2}, b_\tau + \frac{1}{2}\sum_{j \neq i}^p \mathbb{E}_{-\tau_{il}}\left( b_{ijl}^2 \right) \right)\]
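A direct translation of this update into code (a sketch; array names and shapes are assumptions), together with the Gamma moments \(\mathbb{E}\tau_{il}\) and \(\mathbb{E}\,\text{log}\,\tau_{il}\) needed by the other updates and by the ELBO:

```python
import numpy as np
from scipy.special import digamma

def update_tau(e_b2_sum, a_tau, b_tau, p):
    """Variational update of tau_{il} (sketch).

    e_b2_sum: array of sum_{j != i} E[b_{ijl}^2], one entry per (i, l).
    Returns the Gamma(shape, rate) parameters and the moments used elsewhere.
    """
    shape = a_tau + (p - 1) / 2.0
    rate = b_tau + 0.5 * e_b2_sum
    e_tau = shape / rate                          # E[tau_{il}]
    e_log_tau = digamma(shape) - np.log(rate)     # E[log tau_{il}]
    return shape, rate, e_tau, e_log_tau
```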


b. Update of \(\pi_{ijl}\): \[\begin{align} \text{log} \ q_{\text{vb}}(\pi_{ijl}) &= \mathbb{E}_{-\pi_{ijl}} (l) \nonumber \\ &= C+ \left[ \mathbb{E}_{-\pi_{ijl}}(s_{ijl})+a_\pi-1 \right] \text{log}(\pi_{ijl}) + \left[ b_\pi - \mathbb{E}_{-\pi_{ijl}}(s_{ijl}) \right] \text{log}(1-\pi_{ijl}). \end{align}\]

\[q_{\text{vb}}(\pi_{ijl}) \sim \text{Beta} \left(\mathbb{E}_{-\pi_{ijl}}(s_{ijl})+a_\pi, b_\pi - \mathbb{E}_{-\pi_{ijl}}(s_{ijl}) +1 \right)\]
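The corresponding Beta expectations, in particular \(\mathbb{E}\,\text{logit}(\pi_{ijl})\) used later in the update of \(s_{ijl}\), follow from standard digamma identities. A sketch (argument names are assumptions):

```python
from scipy.special import digamma

def update_pi(e_s, a_pi, b_pi):
    """Variational update of pi_{ijl} (sketch); e_s = E[s_{ijl}] under q."""
    a = e_s + a_pi                 # first Beta parameter
    b = b_pi - e_s + 1.0           # second Beta parameter
    e_log_pi = digamma(a) - digamma(a + b)        # E[log pi_{ijl}]
    e_log_1m_pi = digamma(b) - digamma(a + b)     # E[log(1 - pi_{ijl})]
    e_logit_pi = e_log_pi - e_log_1m_pi           # E[logit(pi_{ijl})]
    return a, b, e_logit_pi, e_log_pi, e_log_1m_pi
```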


c. Update \(\omega_{ii}\): \[\begin{align} \text{log} \ q_{\text{vb}}(\omega_{ii}) &= \mathbb{E}_{-\omega_{ii}}(l) \nonumber \\ &= C+ \frac{n}{2} \text{log}(\omega_{ii}) -\frac{\|Y_i\|^2}{2} \omega_{ii} - \frac{\mathbb{E}_{-\omega_{ii}}\|\sum_{j \neq i}^p \sum_{l=1}^q b_{ijl}s_{ijl} X_l \odot Y_j\|^2}{2} \left( \frac{1}{\omega_{ii}} \right). \end{align}\]

\[q_{\text{vb}}(\omega_{ii}) \sim \text{GIG} \left(\frac{n+2}{2},\|Y_i\|^2, \mathbb{E}_{-\omega_{ii}}\left\|\sum_{j \neq i}^p \sum_{l=1}^q b_{ijl}s_{ijl} X_l \odot Y_j\right\|^2 \right)\] Here \(\text{GIG}(\nu, a, b)\) denotes the generalized inverse Gaussian distribution with density proportional to \(x^{\nu-1}\text{exp}\left[-(a x + b/x)/2\right]\).
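The moments \(\mathbb{E}\,\omega_{ii}\) and \(\mathbb{E}\,\omega_{ii}^{-1}\) required by the other updates involve modified Bessel functions of the second kind. A sketch under the parameterization above (the function name is an assumption):

```python
import numpy as np
from scipy.special import kv  # modified Bessel function of the second kind

def gig_moments(nu, a, b):
    """E[x] and E[1/x] for GIG(nu, a, b) with density
    proportional to x^(nu-1) exp(-(a*x + b/x)/2)  (sketch).

    For omega_{ii}: nu = (n + 2)/2, a = ||Y_i||^2, and
    b = E|| sum_{j != i} sum_l b_{ijl} s_{ijl} X_l * Y_j ||^2.
    """
    z = np.sqrt(a * b)
    ratio = kv(nu + 1.0, z) / kv(nu, z)
    e_x = np.sqrt(b / a) * ratio                     # E[omega_{ii}]
    e_inv_x = np.sqrt(a / b) * ratio - 2.0 * nu / b  # E[1 / omega_{ii}]
    return e_x, e_inv_x
```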

d. Update \(\beta_{ijl} = b_{ijl}s_{ijl}\):

\(\text{We denote } M_{-(m,n)}^{-k} = \sum_{j \neq m }^p \sum_{l=1 }^q b_{mjl} s_{mjl} X_l \odot Y_j - b_{mnk} s_{mnk}X_k \odot Y_n\) and \[\begin{align} \text{log} \ q_{\text{vb}}(b_{ijl}|s_{ijl}) &= \mathbb{E}_{-b_{ijl}|s_{ijl}}(l) \nonumber \\ &= C-\frac{1}{2} \left[ \mathbb{E}_{-b_{ijl}|s_{ijl}}(\tau_{il}) + \mathbb{E}_{-b_{ijl}|s_{ijl}} \left( \frac{1}{\omega_{ii}} \right) s_{ijl} \|X_l \odot Y_j \|^2 \right] \left(b_{ijl} \right)^2 \nonumber \\ &-\left[Y_i+\mathbb{E}_{-b_{ijl}|s_{ijl}} \left( \frac{1}{\omega_{ii}} \right)\mathbb{E}_{-b_{ijl}|s_{ijl}}M_{-(i,j)}^{-l} \right]^T \left[X_l \odot Y_j \right] s_{ijl} b_{ijl}. \end{align}\]

\[q_{\text{vb}}(b_{ijl}|s_{ijl}) \sim N \left(\mu(s_{ijl}),\sigma^2(s_{ijl})\right)\] \[\sigma^2(s_{ijl}) = \left[ \mathbb{E}_{-b_{ijl}|s_{ijl}}(\frac{1}{\omega_{ii}})\|X_l \odot Y_j\|^2 s_{ijl} + \mathbb{E}_{-b_{ijl}|s_{ijl}}(\tau_{il}) \right]^{-1}\]

\[\mu(s_{ijl}) = - \sigma^2(s_{ijl}) \left\{ \left[Y_i + \mathbb{E}_{-b_{ijl}|s_{ijl}} \left(\frac{1}{\omega_{ii}} M_{-(i,j)}^{-l} \right) \right]^T [X_l \odot Y_j] s_{ijl} \right\}\]


The marginal density of \(q_{\text{vb}}(s_{ijl})\) is obtained by integrating the joint density of \(q_{\text{vb}}(b_{ijl},s_{ijl})\) as \[\begin{align*} \begin{split} & q_{\text{vb}}(s_{ijl}) = \int \text{exp} \left\{ \text{log} \ q_{\text{vb}}(b_{ijl},s_{ijl}) \right\}db_{ijl} \\ & = \text{exp} \left\{ s_{ijl} \ \mathbb{E}_{-s_{ijl}} \text{logit}(\pi_{ijl})\right\}\ \int N_{b_{ijl}} \left(\mu(s_{ijl}),\sigma^2(s_{ijl})\right) \sigma(s_{ijl}) \text{exp } \left( \frac{\mu^2(s_{ijl})}{2\sigma^2(s_{ijl})}\right) db_{ijl} \\ & = \sigma(s_{ijl}) \text{exp} \left\{ s_{ijl} \ \mathbb{E}_{-s_{ijl}} \text{logit}(\pi_{ijl}) + \left( \frac{\mu^2(s_{ijl})}{2\sigma^2(s_{ijl})}\right) \right\}. \\ & \text{log} [q_{\text{vb}}(s_{ijl})] = C + \text{log}(\sigma(s_{ijl})) + s_{ijl} \ \mathbb{E}_{-s_{ijl}} \text{logit}(\pi_{ijl}) + \frac{\mu^2(s_{ijl})}{2\sigma^2(s_{ijl})} \\ & \text{log} [q_{\text{vb}}(s_{ijl}=0)] = C - \frac{1}{2} \text{log} \mathbb{E}_{-s_{ijl}}\tau_{il} \\ & \text{log} [q_{\text{vb}}(s_{ijl}=1)] = C +\mathbb{E}_{-s_{ijl}} \text{logit}(\pi_{ijl}) - \frac{1}{2} \text{log} \left[\mathbb{E}_{-s_{ijl}}(\frac{1}{\omega_{ii}})\|X_l \odot Y_j\|^2 + \mathbb{E}_{-s_{ijl}}(\tau_{il}) \right] \\ & + \frac{1}{2}\left[\mathbb{E}_{-s_{ijl}}(\frac{1}{\omega_{ii}})\|X_l \odot Y_j\|^2 + \mathbb{E}_{-s_{ijl}}(\tau_{il}) \right]^{-1}\left[(X_l \odot Y_j)^T (Y_i + \mathbb{E}_{-s_{ijl}} \left(\frac{1}{\omega_{ii}}\right) \mathbb{E}_{-s_{ijl}} M_{-(i,j)}^{-l}) \right]^2 \end{split} \end{align*}\]

\[\begin{equation*} \begin{split} & s_{ijl} \sim \text{Ber}(\psi_{ijl}) \\ & \text{log} \ q_{\text{vb}}(s_{ijl}) = C + s_{ijl}\,\text{logit}(\psi_{ijl}) \\ & \text{logit}(\psi_{ijl}) = \mathbb{E}_{-s_{ijl}} \text{logit}(\pi_{ijl}) - \frac{1}{2}\text{log}\left[ \mathbb{E}_{-s_{ijl}}(\frac{1}{\omega_{ii}})\|X_l \odot Y_j\|^2 + \mathbb{E}_{-s_{ijl}}(\tau_{il}) \right] + \frac{1}{2}\text{log} \ \mathbb{E}_{-s_{ijl}} \tau_{il} \\ &+ \frac{1}{2}\left[ \mathbb{E}_{-s_{ijl}}(\frac{1}{\omega_{ii}})\|X_l \odot Y_j\|^2 + \mathbb{E}_{-s_{ijl}}(\tau_{il}) \right]^{-1}\left[ (X_l \odot Y_j)^T (Y_i + \mathbb{E}_{-s_{ijl}}\left[\frac{1}{\omega_{ii}}\right] \mathbb{E}_{-s_{ijl}} M_{-(i,j)}^{-l}) \right]^2 \\ \end{split} \end{equation*}\]
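Putting step (d) together, one coordinate update of \((b_{ijl}, s_{ijl})\) can be sketched as follows; every argument is an expectation assumed to be pre-computed under the current \(q_{\text{vb}}\), and all names are illustrative:

```python
import numpy as np

def update_b_s(y_i, xl_yj, e_inv_omega, e_tau, e_logit_pi, e_M):
    """Spike-and-slab update of (b_{ijl}, s_{ijl}) (sketch).

    y_i        : response vector Y_i                     (length n)
    xl_yj      : element-wise product X_l * Y_j          (length n)
    e_inv_omega: E[1 / omega_{ii}]
    e_tau      : E[tau_{il}]
    e_logit_pi : E[logit(pi_{ijl})]
    e_M        : E[M_{-(i,j)}^{-l}], the fit excluding this coordinate (length n)
    """
    prec = e_inv_omega * np.dot(xl_yj, xl_yj) + e_tau       # 1 / sigma^2(s = 1)
    sigma2 = 1.0 / prec                                     # sigma^2(s = 1)
    mu = -sigma2 * np.dot(y_i + e_inv_omega * e_M, xl_yj)   # mu(s = 1)
    logit_psi = (e_logit_pi
                 - 0.5 * np.log(prec)
                 + 0.5 * np.log(e_tau)
                 + 0.5 * mu ** 2 / sigma2)
    psi = 1.0 / (1.0 + np.exp(-logit_psi))                  # E[s_{ijl}]
    return mu, sigma2, psi
```

The resulting moments, e.g. \(\mathbb{E}\beta_{ijl} = \psi_{ijl}\,\mu(1)\) and \(\mathbb{E}b_{ijl}^2 = \psi_{ijl}\{\mu^2(1)+\sigma^2(1)\} + (1-\psi_{ijl})\,\sigma^2(0)\) with \(\sigma^2(0) = 1/\mathbb{E}\tau_{il}\), then feed the updates of \(\tau_{il}\) and \(\omega_{ii}\) above.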

A.4 Evidence lower bound (ELBO)

As shown previously, the evidence lower bound is defined as:

\[\begin{equation*} \begin{split} L[q_{\text{vb}}(\boldsymbol{\theta})] &= \int q_{\text{vb}}(\boldsymbol{\theta}) \text{log} \left ( p(\boldsymbol{\theta,Y,X}) / q_{\text{vb}}(\boldsymbol{\theta}) \right)d\boldsymbol{\theta} \\ & = \mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} \text{ log}(p(\boldsymbol{\theta,Y,X})) - \mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} [\text{log} \ q_{\text{vb}}(\boldsymbol{\theta})] \\ & = \mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} \text{ log}(p(\boldsymbol{\theta,Y,X})) - \sum_{i=1}^p \mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} [\text{log} \ q_{\text{vb}}(\omega_{ii})] - \sum_{i=1}^p \sum_{l=1}^q \mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} [\text{log} \ q_{\text{vb}}(\tau_{il})] \\ & - \sum_{i=1}^p \sum_{j \neq i}^p \sum_{l=1}^q \left\{ \mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} [\text{log} \ q_{\text{vb}}(b_{ijl},s_{ijl})] + \mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} [\text{log} \ q_{\text{vb}}(\pi_{ijl})] \right\}. \end{split} \end{equation*}\] Notably, \(-\mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} [\text{log} \ q_{\text{vb}}(\omega_{ii})] , -\mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} [\text{log} \ q_{\text{vb}}(\tau_{il})], -\mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} [\text{log} \ q_{\text{vb}}(b_{ijl},s_{ijl})], -\mathbb{E}_{q_{\text{vb}}(\boldsymbol{\theta})} [\text{log} \ q_{\text{vb}}(\pi_{ijl})]\) are the entropies of the GIG, Gamma, Normal, Bernoulli and Beta distributions, respectively, all of which have closed forms. We denote entropy by \(H(\cdot)\), and all expectations below are taken w.r.t. \(q_{\text{vb}}(\boldsymbol{\theta})\).

a. Derivation of \(\mathbb{E} \text{ log}(p(\boldsymbol{\theta,Y,X}))\):

\[\begin{align*} \begin{split} \mathbb{E} \text{ log}(p(\boldsymbol{\theta,Y,X})) = & \sum_{i=1}^p \biggl \{ -\frac{n + (p-1)q}{2} \text{log} 2\pi + q \left[ a_\tau \text{log}b_\tau - \text{log} \Gamma(a_\tau) \right] \\ & + (p-1)q \left[ \text{log} \Gamma(a_\pi + b_\pi) -\text{log} \Gamma(a_\pi) - \text{log} \Gamma(b_\pi) \right] \\ & + \frac{n}{2} \mathbb{E}(\text{log}\omega_{ii}) -\frac{\|Y_i\|^2}{2} \mathbb{E}\omega_{ii} -\frac{1}{2}\mathbb{E}(\omega_{ii}^{-1}) \mathbb{E} \left\| \sum_{j \neq i}^p \sum_{l=1}^q \beta_{ijl} X_l \odot Y_j \right \|^2 \\ & - Y_i^T \left(\sum_{j \neq i}^p \sum_{l=1}^q \mathbb{E} \beta_{ijl} X_l \odot Y_j \right) \\ & + \sum_{l=1}^q \left[ \left( \frac{p-1}{2} + a_\tau-1 \right) \mathbb{E} \left(\text{log} \tau_{il} \right) - \left( b_\tau + \frac{\sum_{j \neq i}^p \mathbb{E}b_{ijl}^2}{2} \right) \mathbb{E} \tau_{il} \right] \\ & + \sum_{j \neq i}^p \sum_{l=1}^q \left[ \left( \mathbb{E}s_{ijl}+a_\pi - 1 \right) \mathbb{E} (\text{log} \pi_{ijl}) + (b_\pi - \mathbb{E}s_{ijl}) \mathbb{E} \left(\text{log} (1-\pi_{ijl}) \right) \right] \biggl \} \end{split} \end{align*}\]

b. Derivation of \(H(\tau_{il})\) \[\begin{align*} \begin{split} H(\tau_{il}) = & - \left[a_\tau+\frac{p-1}{2}\right] \text{log}\left[b_\tau + \frac{1}{2}\sum_{j \neq i} ^p\mathbb{E}b_{ijl}^2\right] + \text{log } \Gamma(a_\tau+\frac{p-1}{2}) \\ & - \left[a_\tau+\frac{p-1}{2} -1 \right] \mathbb{E}(\text{log }\tau_{il}) + \left[b_\tau + \frac{1}{2}\sum_{j \neq i} ^p\mathbb{E}b_{ijl}^2\right] \mathbb{E}\tau_{il} \end{split} \end{align*}\]

c. Derivation of \(H(\pi_{ijl})\) \[\begin{align*} \begin{split} H(\pi_{ijl}) = & - \text{log }\Gamma(a_\pi+b_\pi+1) + \text{log } \Gamma(\mathbb{E}s_{ijl} + a_\pi) + \text{log } \Gamma(b_\pi + 1 - \mathbb{E}s_{ijl}) \\ & - (\mathbb{E}s_{ijl} + a_\pi -1) \mathbb{E} \text{log } \pi_{ijl} - (b_\pi - \mathbb{E}s_{ijl}) \mathbb{E} \text{log } (1-\pi_{ijl}) \end{split} \end{align*}\]

d. Derivation of \(H(\omega_{ii})\)

Denote \(a_{\omega_{ii}} = \|Y_i\|^2\) and \(b_{\omega_{ii}} = \mathbb{E}\left\|\sum_{j \neq i}^p \sum_{l=1}^q \beta_{ijl} X_l \odot Y_j\right\|^2\).

\[H(\omega_{ii}) = -\frac{n+2}{4} \text{log}\frac{a_{\omega_{ii}}}{b_{\omega_{ii}}} + \text{log}\left(2K_{(n+2)/2}\left(\sqrt{a_{\omega_{ii}} b_{\omega_{ii}}}\right)\right) -\frac{n}{2}\mathbb{E}(\text{log}\omega_{ii})+ \frac{a_{\omega_{ii}}}{2}\mathbb{E}\omega_{ii} + \frac{b_{\omega_{ii}}}{2}\mathbb{E}\omega_{ii}^{-1}\] Here \(K_{\nu}(\cdot)\) denotes the modified Bessel function of the second kind of order \(\nu\).
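The remaining expectation \(\mathbb{E}(\text{log}\,\omega_{ii})\) can be written as \(\frac{1}{2}\text{log}(b_{\omega_{ii}}/a_{\omega_{ii}}) + \partial_\nu \text{log}\, K_\nu(\sqrt{a_{\omega_{ii}} b_{\omega_{ii}}})\big|_{\nu=(n+2)/2}\), where the derivative with respect to the order has no simple closed form; a numerical sketch (function name assumed):

```python
import numpy as np
from scipy.special import kv

def gig_e_log(nu, a, b, eps=1e-6):
    """E[log x] for GIG(nu, a, b), via a central finite difference of
    log K_nu with respect to the order nu (sketch)."""
    z = np.sqrt(a * b)
    dlogK_dnu = (np.log(kv(nu + eps, z)) - np.log(kv(nu - eps, z))) / (2.0 * eps)
    return 0.5 * np.log(b / a) + dlogK_dnu
```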

e. Derivation of \(H(b_{ijl}, s_{ijl})\) \[H(b_{ijl}, s_{ijl}) = H(b_{ijl}|s_{ijl}) + H(s_{ijl})\] where \[H(b_{ijl}|s_{ijl}) = \frac{1}{2} \text{log} \left[2\pi\sigma^2(s_{ijl})\right] + \frac{1}{2}\]

\[H(s_{ijl}) = -\psi_{ijl} \text{ log}(\psi_{ijl}) - (1-\psi_{ijl}) \text{ log}(1-\psi_{ijl})\]

Combining all the previous entropy derivations, we have the following expression for the evidence lower bound (ELBO): \[\begin{align*} \begin{split} L[q_{\text{vb}}(\boldsymbol{\theta})] = & -\frac{np + p(p-1)q}{2} \text{log} 2\pi \\ & + p q \left[ a_\tau \text{log}b_\tau - \text{log} \Gamma(a_\tau) + \text{log} \Gamma(a_\tau + \frac{p-1}{2}) \right] \\ & + p(p-1)q \left[ \text{log} \Gamma(a_\pi + b_\pi) -\text{log} \Gamma(a_\pi) - \text{log} \Gamma(b_\pi) -\text{log} \Gamma(a_\pi+b_\pi+1) \right] \\ & - \sum_{i=1}^p \left\{ Y_i^T \left( \sum_{j \neq i}^p \sum_{l=1}^q \mathbb{E} \beta_{ijl} X_l \odot Y_j \right) + \frac{n+2}{4} \text{log}\frac{a_{\omega_{ii}}}{b_{\omega_{ii}}} - \text{log}\left(2K_{(n+2)/2}\left(\sqrt{a_{\omega_{ii}} b_{\omega_{ii}}}\right)\right) \right\} \\ &- \left[a_\tau+\frac{p-1}{2}\right] \sum_{i=1}^p \sum_{l=1}^q \left\{ \text{log}\left[b_\tau + \frac{1}{2}\sum_{j \neq i} ^p\mathbb{E}b_{ijl}^2\right] \right \} \\ &+ \sum_{i=1}^p \sum_{j \neq i}^p \sum_{l=1}^q \biggl \{ \text{log } \Gamma(\mathbb{E}s_{ijl} + a_\pi) + \text{log } \Gamma(b_\pi + 1 - \mathbb{E}s_{ijl}) \\ &+ \frac{1}{2} \text{log} \left[2\pi\sigma^2(s_{ijl})\right] + \frac{1}{2} -\psi_{ijl} \text{ log}(\psi_{ijl}) - (1-\psi_{ijl}) \text{ log}(1-\psi_{ijl}) \biggl \}. \end{split} \end{align*}\]

A.5 Algorithm
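The overall estimation procedure combines the coordinate updates of Section A.3 with the ELBO of Section A.4; a sketch of the resulting mean-field variational Bayes algorithm (initialization and stopping rule are illustrative choices) is:

a. Initialize the variational parameters of \(q_{\text{vb}}(\boldsymbol{b,s})\), \(q_{\text{vb}}(\boldsymbol{\omega})\), \(q_{\text{vb}}(\boldsymbol{\pi})\) and \(q_{\text{vb}}(\boldsymbol{\tau})\).

b. For each feature \(i = 1,\dots,p\), cycle through the closed-form updates of Section A.3: \(q_{\text{vb}}(\tau_{il})\), \(q_{\text{vb}}(\pi_{ijl})\), \(q_{\text{vb}}(\omega_{ii})\) and \(q_{\text{vb}}(b_{ijl},s_{ijl})\) for all \(j \neq i\) and \(l = 1,\dots,q\).

c. Evaluate the ELBO \(L[q_{\text{vb}}(\boldsymbol{\theta})]\) of Section A.4 and stop when its increase falls below a pre-specified tolerance; otherwise return to step b.

Since the likelihood and priors factorize over features \(i\), the node-wise updates in step b can be carried out independently across \(i\).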

A.6 Overview of competing methods

Here we only consider methods that allow for multiple graphical models or individual-level covariate-dependent graphs. The GraphR model (A.1) estimates covariate-dependent graphs based on a Gaussian likelihood, incorporating discrete and/or continuous covariates that encode the heterogeneity of samples. A mean-field variational Bayes (MFVB) algorithm is implemented in GraphR to achieve computational efficiency. FGL, GGL (Danaher, Wang, and Witten 2014) and LASICH (Saegusa and Shojaie 2016) can only be used in group-specific settings. The Gaussian graphical regression model proposed by Zhang and Li (2022) applies penalized likelihood for inference and accommodates both multiple graphical models and individual covariate-dependent graphs. However, these methods fail to provide uncertainty quantification and probabilistic reasoning. Peterson, Stingo, and Vannucci (2015) propose a Bayesian model for multiple graphical models and implement an MCMC-based method. Wang et al. (2021) and Ni, Stingo, and Baladandayuthapani (2019) also develop methods to estimate conditional dependencies as a function of individual-level or group-level covariates in directed or undirected graphs. However, both methods rely on MCMC-based algorithms and thus fail to scale to high-dimensional data sets. Table A.1 below summarizes the comparison among these methods.

| Method | Frequentist or Bayesian | Multi-category graphs | Continuously-varying graphs | Scalability | Uncertainty quantification |
|---|---|---|---|---|---|
| GraphR | Bayesian | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| FGL, GGL (Danaher, Wang, and Witten 2014) | Frequentist | \(\checkmark\) | x | \(\checkmark\) | x |
| LASICH (Saegusa and Shojaie 2016) | Frequentist | \(\checkmark\) | x | \(\checkmark\) | x |
| Zhang and Li (2022) | Frequentist | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) | x |
| Peterson, Stingo, and Vannucci (2015) | Bayesian | \(\checkmark\) | x | x | \(\checkmark\) |
| Ni, Stingo, and Baladandayuthapani (2019) | Bayesian | \(\checkmark\) | \(\checkmark\) | x | \(\checkmark\) |
| Wang et al. (2021) | Bayesian | \(\checkmark\) | \(\checkmark\) | x | \(\checkmark\) |

Table A.1: Methodological comparison among different methods.

References

Attias, Hagai, John Platt, Alex Acero, and Li Deng. 2000. “Speech Denoising and Dereverberation Using Probabilistic Models.” Advances in Neural Information Processing Systems 13.
Danaher, Patrick, Pei Wang, and Daniela M Witten. 2014. “The Joint Graphical Lasso for Inverse Covariance Estimation Across Multiple Classes.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76 (2): 373–97.
Ni, Yang, Francesco C Stingo, and Veerabhadran Baladandayuthapani. 2019. “Bayesian Graphical Regression.” Journal of the American Statistical Association 114 (525): 184–97.
Peterson, Christine, Francesco C Stingo, and Marina Vannucci. 2015. “Bayesian Inference of Multiple Gaussian Graphical Models.” Journal of the American Statistical Association 110 (509): 159–74.
Saegusa, Takumi, and Ali Shojaie. 2016. “Joint Estimation of Precision Matrices in Heterogeneous Populations.” Electronic Journal of Statistics 10 (1): 1341.
Titsias, Michalis, and Miguel Lázaro-Gredilla. 2011. “Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning.” Advances in Neural Information Processing Systems 24.
Wang, Zeya, Veerabhadran Baladandayuthapani, Ahmed O Kaseb, Hesham M Amin, Manal M Hassan, Wenyi Wang, and Jeffrey S Morris. 2021. “Bayesian Edge Regression in Undirected Graphical Models to Characterize Interpatient Heterogeneity in Cancer.” Journal of the American Statistical Association, no. just-accepted: 1–31.
Zhang, Jingfei, and Yi Li. 2022. “High-Dimensional Gaussian Graphical Regression Models with Covariates.” Journal of the American Statistical Association, 1–13.