Chapter 19 Full Bayesian Inference of NHPP (Lecture on 03/09/2021)

Consider an NHPP with intensity function \lambda(t) on (0,T) and observed event times t_1,\cdots,t_n. The likelihood can then be written as L\propto\exp(-\int_0^T\lambda(t)dt)\prod_{i=1}^n\lambda(t_i).
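
As a quick illustration of this likelihood (not from the lecture), the following Python sketch evaluates the log-likelihood for a hypothetical intensity \lambda(t)=2+\sin(t) and made-up event times, using numerical quadrature for the integral.

```python
import numpy as np
from scipy.integrate import quad

T = 10.0

def lam(t):
    return 2.0 + np.sin(t)                         # hypothetical intensity, for illustration only

t_events = np.array([0.7, 2.3, 4.1, 5.5, 8.9])     # made-up event times in (0, T)

# log L = -int_0^T lambda(t) dt + sum_i log lambda(t_i)
integral, _ = quad(lam, 0.0, T)
loglik = -integral + np.sum(np.log(lam(t_events)))
```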

To use the density estimation technique, we reparameterize \lambda(t) as (\gamma,f(t)), where \gamma=\int_0^T\lambda(t)dt and f is a density on (0,T), so that \lambda(t)=\gamma f(t). For \gamma we use the prior p(\gamma)\propto\frac{1}{\gamma}, and we model f as the kernel mixture f(t;\boldsymbol{\theta})=\sum_{l=1}^SK(t;\boldsymbol{\theta}_l)\omega_l, where the weights satisfy \sum_{l=1}^S\omega_l=1 and follow a (truncated) stick-breaking construction, that is, \omega_1=Z_1,\omega_2=Z_2(1-Z_1),\cdots,\omega_S=(1-Z_1)\cdots(1-Z_{S-1}), where Z_l|\alpha\stackrel{i.i.d.}{\sim}Beta(1,\alpha), for l=1,\cdots,S-1. With \boldsymbol{\theta}=(\mu,\tau), we choose the mixing kernel to be a Beta density rescaled to (0,T), \begin{equation} K(t;\mu,\tau)=\frac{t^{\mu\tau T^{-1}-1}(T-t)^{\tau(1-\mu T^{-1})-1}}{Beta(\mu\tau T^{-1},\tau(1-\mu T^{-1}))T^{\tau-1}} \tag{19.2} \end{equation} K(t;\cdot,\cdot) has support (0,T), with \mu\in(0,T) (the kernel mean) and \tau>0.
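
As a small illustration (not part of the lecture), here is a minimal Python sketch that evaluates the kernel in (19.2) by exploiting the fact that it is a Beta(\mu\tau/T,\tau(1-\mu/T)) density rescaled from (0,1) to (0,T); the function name and the test values are my own.

```python
import numpy as np
from scipy.stats import beta

def rescaled_beta_kernel(t, mu, tau, T):
    """Evaluate K(t; mu, tau) in (19.2): a Beta(mu*tau/T, tau*(1 - mu/T)) density rescaled to (0, T)."""
    a = mu * tau / T                 # first Beta shape parameter
    b = tau * (1.0 - mu / T)         # second Beta shape parameter
    return beta.pdf(np.asarray(t) / T, a, b) / T   # change of variables t = T * x

# quick sanity check: the kernel integrates to ~1 over (0, T)
T = 10.0
grid = np.linspace(0.0, T, 10_001)[1:-1]
print(np.mean(rescaled_beta_kernel(grid, mu=4.0, tau=20.0, T=T)) * T)
```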

The parameters on which we need to do inference are (\mu_1,\tau_1),\cdots,(\mu_S,\tau_S),Z_1,\cdots,Z_{S-1},\alpha,\beta and \gamma. The priors we use for these parameters are \begin{equation} \begin{split} &\tau_1,\cdots,\tau_S\stackrel{i.i.d.}{\sim} IG(\alpha_{\tau},\beta),\quad \beta\sim \text{Exp}(d)\\ &d,\alpha_{\tau}\equiv\text{constant (fixed)}\\ &\mu_1,\cdots,\mu_S\stackrel{i.i.d.}{\sim}Unif(0,T)\\ &p(\gamma)\propto\frac{1}{\gamma}\\ &Z_1,\cdots,Z_{S-1}|\alpha\stackrel{i.i.d.}{\sim} Beta(1,\alpha) \end{split} \tag{19.3} \end{equation}

Denote \mathbf{Z}=(Z_1,\cdots,Z_{S-1})^T,\boldsymbol{\tau}=(\tau_1,\cdots,\tau_S)^T and \boldsymbol{\mu}=(\mu_1,\cdots,\mu_S)^T, then the full posterior is \begin{equation} \begin{split} p(\mathbf{Z},\boldsymbol{\tau},\boldsymbol{\mu},&\alpha,\beta,\gamma)\propto\exp(-\gamma)\gamma^n\prod_{i=1}^{n}\{\sum_{l=1}^SK(t_i;\mu_l,\tau_l)\omega_l\}p(\gamma)\\ &\times\prod_{l=1}^{S-1}Beta(Z_l|1,\alpha)p(\alpha)\prod_{l=1}^S[IG(\tau_l|\alpha_{\tau},\beta)\times Unif(\mu_l|0,T)]p(\beta) \end{split} \tag{19.4} \end{equation}

The mixture likelihood framework is not directly amenable to efficient MCMC. However, we can use the data augmentation technique to make MCMC efficient.

Define \zeta_i=l if the ith observation comes from the lth mixture component, l=1,\cdots,S, i=1,\cdots,n. Thus, f(t_i|\zeta_i=l)=K(t_i;\mu_l,\tau_l) and p(\zeta_i=l)=\omega_l, and marginalizing out \zeta_i gives f(t_i)=\sum_{l=1}^Sf(t_i|\zeta_i=l)p(\zeta_i=l)=\sum_{l=1}^SK(t_i;\mu_l,\tau_l)\omega_l, which is just the original model.

Therefore, the full posterior can be written as \begin{equation} \begin{split} p(\mathbf{Z},\boldsymbol{\tau},\boldsymbol{\mu},&\alpha,\beta,\gamma,\boldsymbol{\zeta})\propto\exp(-\gamma)\gamma^n\prod_{i=1}^{n}\prod_{l=1}^S[K(t_i;\mu_l,\tau_l)\omega_l]^{I(\zeta_i=l)}p(\gamma)\\ &\times\prod_{l=1}^{S-1}Beta(Z_l|1,\alpha)p(\alpha)\prod_{l=1}^S[IG(\tau_l|\alpha_{\tau},\beta)\times Unif(\mu_l|0,T)]p(\beta) \end{split} \tag{19.5} \end{equation}

Now we can use a Gibbs sampler to update these parameters, cycling through the full conditionals below.

Firstly, we have \begin{equation} \begin{split} p(\gamma|\cdots)&\propto \exp(-\gamma)\gamma^np(\gamma)\\ &\propto \exp(-\gamma)\gamma^{n-1}\quad (\text{since } p(\gamma)\propto\tfrac{1}{\gamma}) \end{split} \tag{19.6} \end{equation} Thus, \gamma|\cdots\sim Gamma(n,1).
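
In code, this step is a single Gamma draw; a minimal sketch with illustrative variable names:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5                                # number of observed event times
gamma_draw = rng.gamma(n, 1.0)       # gamma | ... ~ Gamma(n, 1): shape n, scale (= 1/rate) 1
```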

Next, to update \beta, notice that \begin{equation} \begin{split} p(\beta|\cdots)&\propto\prod_{l=1}^SIG(\tau_l|\alpha_{\tau},\beta)p(\beta)\\ &\propto \exp(-\beta\sum_{l=1}^{S}\frac{1}{\tau_l})\beta^{S\alpha_{\tau}}\exp(-\beta d)\\ &\propto \exp(-\beta(d+\sum_{l=1}^{S}\frac{1}{\tau_l}))\beta^{S\alpha_{\tau}} \end{split} \tag{19.7} \end{equation} Therefore, \beta|\cdots\sim Gamma(S\alpha_{\tau}+1,d+\sum_{l=1}^S\frac{1}{\tau_l}), since each IG(\tau_l|\alpha_{\tau},\beta) term contributes a factor \beta^{\alpha_{\tau}}.
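
A sketch of this draw (variable names and values are illustrative); note that NumPy parameterizes the Gamma by shape and scale, so the rate d+\sum_l 1/\tau_l enters as its reciprocal.

```python
import numpy as np

rng = np.random.default_rng(0)

# current state and fixed hyperparameters (illustrative values)
S, alpha_tau, d = 10, 2.0, 1.0
tau = rng.gamma(2.0, 1.0, size=S)              # stands in for the current tau_1, ..., tau_S

shape = S * alpha_tau + 1.0                    # the product over l contributes beta^(S * alpha_tau)
rate = d + np.sum(1.0 / tau)
beta_draw = rng.gamma(shape, 1.0 / rate)       # NumPy's Gamma uses scale = 1 / rate
```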

Then, \alpha is updated using a griddy Gibbs sampler. Consider possible values of \alpha on a fine grid \{\alpha_1^*,\cdots,\alpha_m^*\} and draw \alpha from the discrete distribution with probability of drawing \alpha_j^* equal to \begin{equation} P(\alpha=\alpha_j^*|\cdots)=\frac{\prod_{l=1}^{S-1}Beta(Z_l|1,\alpha_j^*)}{\sum_{k=1}^m\prod_{l=1}^{S-1}Beta(Z_l|1,\alpha_k^*)} \tag{19.8} \end{equation} Typically, each \alpha_j^*\in(0,1), but you can create a more extensive search by widening the range of the grid. In conclusion, \alpha|\cdots follows a discrete distribution on \{\alpha_1^*,\cdots,\alpha_m^*\} with probabilities given by (19.8).
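
A minimal sketch of the griddy update (the grid range and the illustrative values of Z_1,\cdots,Z_{S-1} are my own choices); it works on the log scale to avoid underflow when S is large.

```python
import numpy as np

rng = np.random.default_rng(1)

Z = rng.beta(1.0, 2.0, size=9)                 # illustrative values of Z_1, ..., Z_{S-1}
alpha_grid = np.linspace(0.05, 5.0, 100)       # grid of candidate alpha values

# log prod_l Beta(Z_l | 1, alpha) = (S - 1) log(alpha) + (alpha - 1) sum_l log(1 - Z_l)
logw = len(Z) * np.log(alpha_grid) + (alpha_grid - 1.0) * np.sum(np.log(1.0 - Z))
probs = np.exp(logw - logw.max())              # stabilize before exponentiating
probs /= probs.sum()

alpha_draw = rng.choice(alpha_grid, p=probs)   # draw from the discrete distribution (19.8)
```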

We then update \mu_l and \tau_l. Since \begin{equation} \begin{split} p(\mu_l,\tau_l|\cdots)&\propto\prod_{i=1}^n\left[\frac{t_i^{\mu_l\tau_l T^{-1}-1}(T-t_i)^{\tau_l(1-\mu_l T^{-1})-1}}{Beta(\mu_l\tau_l T^{-1},\tau_l(1-\mu_l T^{-1}))T^{\tau_l-1}}\right]^{I(\zeta_i=l)}\\ &\quad\times IG(\tau_l|\alpha_{\tau},\beta)\,Unif(\mu_l|0,T) \end{split} \tag{19.9} \end{equation} the full conditionals are not available in closed form. One can update (\mu_l,\tau_l) with a Metropolis-Hastings step, or use alternatives such as the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo (HMC), or slice sampling.
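
Below is a minimal sketch of one of the options above, a random-walk Metropolis-Hastings update of (\mu_l,\tau_l), assuming the rescaled-Beta representation of the kernel; the step sizes, function names, and example values are mine. Proposals that land outside (0,T)\times(0,\infty) are rejected automatically because the log target returns -\infty there.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(2)

def log_kernel(t, mu, tau, T):
    """log K(t; mu, tau) from (19.2), via the rescaled-Beta representation."""
    a, b = mu * tau / T, tau * (1.0 - mu / T)
    return beta_dist.logpdf(np.asarray(t) / T, a, b) - np.log(T)

def log_target(mu, tau, t_l, T, alpha_tau, beta_hyp):
    """Log full conditional (19.9) of (mu_l, tau_l), up to an additive constant.

    t_l holds the event times currently allocated to component l (zeta_i = l).
    """
    if not (0.0 < mu < T) or tau <= 0.0:
        return -np.inf
    loglik = np.sum(log_kernel(t_l, mu, tau, T))                  # kernel terms
    logprior = -(alpha_tau + 1.0) * np.log(tau) - beta_hyp / tau  # IG(alpha_tau, beta), up to a constant
    return loglik + logprior                                      # Unif(0, T) prior on mu is flat

def mh_step(mu, tau, t_l, T, alpha_tau, beta_hyp, step_mu=0.5, step_tau=2.0):
    """One random-walk Metropolis-Hastings update of (mu_l, tau_l)."""
    mu_prop = mu + step_mu * rng.normal()        # symmetric proposals, so no Hastings correction
    tau_prop = tau + step_tau * rng.normal()
    log_acc = (log_target(mu_prop, tau_prop, t_l, T, alpha_tau, beta_hyp)
               - log_target(mu, tau, t_l, T, alpha_tau, beta_hyp))
    if np.log(rng.uniform()) < log_acc:
        return mu_prop, tau_prop
    return mu, tau

# example usage with made-up values (beta_current is the latest draw of beta)
T, alpha_tau, beta_current = 10.0, 2.0, 1.0
t_l = np.array([3.2, 3.9, 4.4])                  # times currently allocated to component l
mu_l, tau_l = mh_step(4.0, 20.0, t_l, T, alpha_tau, beta_current)
```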

Next, to update Z_l, we have \begin{equation} \begin{split} p(Z_l|\cdots)&\propto\prod_{i=1}^n[\prod_{j\geq l}\omega_j^{I(\zeta_i=j)}](1-Z_l)^{\alpha-1}\\ &\propto\prod_{i=1}^n[\prod_{j\geq l}\{Z_j(1-Z_1)\cdots(1-Z_{j-1})\}^{I(\zeta_i=j)}](1-Z_l)^{\alpha-1} \end{split} \tag{19.10} \end{equation} From this product, collecting all factors of Z_l and 1-Z_l, and writing n_1=\sum_{i=1}^nI(\zeta_i=l) and n_2=\sum_{i=1}^nI(\zeta_i>l), we have \begin{equation} p(Z_l|\cdots)\propto Z_l^{n_1}(1-Z_l)^{n_2+\alpha-1} \tag{19.11} \end{equation} Therefore, Z_l|\cdots\sim Beta(n_1+1,n_2+\alpha).
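
A sketch of this update for all of Z_1,\cdots,Z_{S-1} in one sweep (the allocations \zeta_i used here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

S, alpha = 10, 1.0
zeta = rng.integers(1, S + 1, size=50)      # illustrative allocations zeta_1, ..., zeta_n

Z = np.empty(S - 1)
for l in range(1, S):                       # update Z_1, ..., Z_{S-1}
    n1 = np.sum(zeta == l)                  # points allocated to component l
    n2 = np.sum(zeta > l)                   # points allocated to components after l
    Z[l - 1] = rng.beta(n1 + 1.0, n2 + alpha)
```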

Finally, to update \zeta_i, we have \begin{equation} p(\zeta_i=l|\cdots)\propto K(t_i;\mu_l,\tau_l)\omega_l,\quad l=1,\cdots,S \tag{19.12} \end{equation} Therefore, \zeta_i|\cdots follows a discrete distribution taking values 1,\cdots,S with probabilities \frac{K(t_i;\mu_1,\tau_1)\omega_1}{\sum_{l=1}^SK(t_i;\mu_l,\tau_l)\omega_l},\cdots,\frac{K(t_i;\mu_S,\tau_S)\omega_S}{\sum_{l=1}^SK(t_i;\mu_l,\tau_l)\omega_l}, respectively.
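
A sketch of this update, reusing the rescaled_beta_kernel function from the earlier kernel sketch; the function name and arguments are mine.

```python
import numpy as np

rng = np.random.default_rng(4)

def update_allocations(t_events, mu, tau, w, T):
    """Draw each zeta_i from the discrete full conditional (19.12).

    mu, tau, w are length-S arrays of component parameters and weights;
    rescaled_beta_kernel is the function from the earlier kernel sketch.
    Returns allocations in {1, ..., S}.
    """
    zeta = np.empty(len(t_events), dtype=int)
    for i, t in enumerate(t_events):
        p = w * rescaled_beta_kernel(t, mu, tau, T)   # unnormalized probabilities over components
        zeta[i] = rng.choice(len(w), p=p / p.sum()) + 1
    return zeta
```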

Marked Poisson Process

In an NHPP, the data are given as t_1,\cdots,t_n, the points at which events occurred. In a marked Poisson process, we also observe y_1,\cdots,y_n, the marks at the points where events occurred. To model data of this kind, we can still apply the idea of the NHPP; the only difference is to change the intensity function to \lambda(s,y)=\phi(s)h(y|s), where \phi(s) is the intensity of a non-homogeneous Poisson process and h(\cdot|s) is assumed to be a density. To do the inference, we define \begin{equation} \begin{split} \gamma&=\int\int\lambda(s,y)dsdy\\ &=\int\int \phi(s)h(y|s)dyds\\ &=\int\phi(s)\left(\int h(y|s)dy\right)ds\\ &=\int \phi(s)ds \end{split} \tag{19.13} \end{equation}

Since \gamma=\int \phi(s)ds, we can write \lambda(s,y)=\gamma f(s)h(y|s) with f(s)=\phi(s)/\gamma. Then estimating \lambda(\cdot,\cdot) is equivalent to estimating \gamma, f and h, where f and h are density functions.
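
As a small illustration of this factorization (a toy example of my own, not from the lecture), one can simulate a marked Poisson process on (0,T) by drawing the number of events, then locations from f, then marks from h(\cdot|s):

```python
import numpy as np

rng = np.random.default_rng(5)

T, gamma_total = 10.0, 30.0                    # observation window and total mass gamma (made up)

n = rng.poisson(gamma_total)                   # number of events: N ~ Poisson(gamma)
s = T * rng.beta(2.0, 5.0, size=n)             # locations drawn from a density f on (0, T) (toy choice)
y = rng.normal(loc=np.sin(s), scale=0.3)       # marks drawn from h(y | s), here Normal(sin(s), 0.3^2)
```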

Log-Gaussian Cox Process

Assume \log \lambda(s)\sim GP(\mu,C_{\nu}(\cdot,\cdot)) and estimate from there. We will not have closed-form expressions for anything. The likelihood is \begin{equation} L\propto \exp\left(-\int_0^T\lambda(t)dt\right)\prod_{i=1}^n\lambda(t_i) \tag{19.14} \end{equation} and we have to evaluate the integral numerically, e.g. \begin{equation} \int_0^T\lambda(t)dt\approx\frac{T}{S}\sum_{s=1}^S\lambda(t_s^*) \tag{19.15} \end{equation} where t_1^*,\cdots,t_S^* are (evenly spaced) knot points chosen in the domain.
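
A minimal sketch of how the approximation in (19.15) plugs into the log-likelihood, with a made-up function standing in for a GP realization of \log\lambda; the event times and knot choice are illustrative.

```python
import numpy as np

T = 10.0
t_events = np.array([0.7, 2.3, 4.1, 5.5, 8.9])     # made-up event times

S_knots = 200
t_star = np.linspace(0.0, T, S_knots)              # knot points t_1^*, ..., t_S^* in the domain

def log_lam(t):
    return 0.5 + 0.3 * np.sin(t)                   # stands in for a GP realization of log lambda(t)

# int_0^T lambda(t) dt  ~=  (T / S) * sum_s lambda(t_s^*)   as in (19.15)
integral_approx = (T / S_knots) * np.sum(np.exp(log_lam(t_star)))
loglik = -integral_approx + np.sum(log_lam(t_events))
```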