Chapter 19 Full Bayesian Inference of NHPP (Lecture on 03/09/2021)

Consider an NHPP with intensity function \lambda(t) on (0,T) and observed event times t_1,\cdots,t_n. The likelihood can then be written as L\propto\exp(-\int_0^T\lambda(t)dt)\prod_{i=1}^n\lambda(t_i).
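
As a quick illustration of this likelihood (not from the lecture), the following Python sketch evaluates the log-likelihood for a hypothetical intensity \lambda(t)=2+\sin(t) and made-up event times, using numerical quadrature for the integral.

```python
import numpy as np
from scipy.integrate import quad

T = 10.0

def lam(t):
    return 2.0 + np.sin(t)                         # hypothetical intensity, for illustration only

t_events = np.array([0.7, 2.3, 4.1, 5.5, 8.9])     # made-up event times in (0, T)

# log L = -int_0^T lambda(t) dt + sum_i log lambda(t_i)
integral, _ = quad(lam, 0.0, T)
loglik = -integral + np.sum(np.log(lam(t_events)))
```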

To use the density estimation technique, we reparameterize \lambda(t) as (\gamma,f(t)), where \gamma=\int_0^T\lambda(t)dt and f is a density on (0,T), so that \lambda(t)=\gamma f(t). For \gamma we use the prior p(\gamma)\propto\frac{1}{\gamma}, and we model f as the kernel mixture f(t;\boldsymbol{\theta})=\sum_{l=1}^SK(t;\boldsymbol{\theta}_l)\omega_l, where the weights satisfy \sum_{l=1}^S\omega_l=1 and follow a (truncated) stick-breaking construction, that is, \omega_1=Z_1,\omega_2=Z_2(1-Z_1),\cdots,\omega_S=(1-Z_1)\cdots(1-Z_{S-1}), where Z_l|\alpha\stackrel{i.i.d.}{\sim}Beta(1,\alpha), for l=1,\cdots,S-1. With \boldsymbol{\theta}=(\mu,\tau), we choose the mixing kernel to be a Beta density rescaled to (0,T), \begin{equation} K(t;\mu,\tau)=\frac{t^{\mu\tau T^{-1}-1}(T-t)^{\tau(1-\mu T^{-1})-1}}{Beta(\mu\tau T^{-1},\tau(1-\mu T^{-1}))T^{\tau-1}} \tag{19.2} \end{equation} K(t;\cdot,\cdot) has support (0,T), with \mu\in(0,T) (the kernel mean) and \tau>0.
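
As a small illustration (not part of the lecture), here is a minimal Python sketch that evaluates the kernel in (19.2) by exploiting the fact that it is a Beta(\mu\tau/T,\tau(1-\mu/T)) density rescaled from (0,1) to (0,T); the function name and the test values are my own.

```python
import numpy as np
from scipy.stats import beta

def rescaled_beta_kernel(t, mu, tau, T):
    """Evaluate K(t; mu, tau) in (19.2): a Beta(mu*tau/T, tau*(1 - mu/T)) density rescaled to (0, T)."""
    a = mu * tau / T                 # first Beta shape parameter
    b = tau * (1.0 - mu / T)         # second Beta shape parameter
    return beta.pdf(np.asarray(t) / T, a, b) / T   # change of variables t = T * x

# quick sanity check: the kernel integrates to ~1 over (0, T)
T = 10.0
grid = np.linspace(0.0, T, 10_001)[1:-1]
print(np.mean(rescaled_beta_kernel(grid, mu=4.0, tau=20.0, T=T)) * T)
```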

The parameters on which we need to do inference are (\mu_1,\tau_1),\cdots,(\mu_S,\tau_S),Z_1,\cdots,Z_{S-1},\alpha,\beta and \gamma. The priors we use for these parameters are \begin{equation} \begin{split} &\tau_1,\cdots,\tau_S\stackrel{i.i.d.}{\sim} IG(\alpha_{\tau},\beta),\quad \beta\sim \text{Exp}(d)\\ &d,\alpha_{\tau}\equiv\text{constant (fixed)}\\ &\mu_1,\cdots,\mu_S\stackrel{i.i.d.}{\sim}Unif(0,T)\\ &p(\gamma)\propto\frac{1}{\gamma}\\ &Z_1,\cdots,Z_{S-1}|\alpha\stackrel{i.i.d.}{\sim} Beta(1,\alpha) \end{split} \tag{19.3} \end{equation}

Denote \mathbf{Z}=(Z_1,\cdots,Z_{S-1})^T,\boldsymbol{\tau}=(\tau_1,\cdots,\tau_S)^T and \boldsymbol{\mu}=(\mu_1,\cdots,\mu_S)^T, then the full posterior is \begin{equation} \begin{split} p(\mathbf{Z},\boldsymbol{\tau},\boldsymbol{\mu},&\alpha,\beta,\gamma)\propto\exp(-\gamma)\gamma^n\prod_{i=1}^{n}\{\sum_{l=1}^SK(t_i;\mu_l,\tau_l)\omega_l\}p(\gamma)\\ &\times\prod_{l=1}^{S-1}Beta(Z_l|1,\alpha)p(\alpha)\prod_{l=1}^S[IG(\tau_l|\alpha_{\tau},\beta)\times Unif(\mu_l|0,T)]p(\beta) \end{split} \tag{19.4} \end{equation}

The mixture likelihood framework is not directly amenable to efficient MCMC. However, we can use the data augmentation technique to make MCMC efficient.

Define \zeta_i=l if the ith observation comes from the lth mixture component, l=1,\cdots,S, i=1,\cdots,n. Thus, f(t_i|\zeta_i=l)=K(t_i;\mu_l,\tau_l) and p(\zeta_i=l)=\omega_l, and marginalizing out \zeta_i gives f(t_i)=\sum_{l=1}^Sf(t_i|\zeta_i=l)p(\zeta_i=l)=\sum_{l=1}^SK(t_i;\mu_l,\tau_l)\omega_l, which is just the original model.

Therefore, the full posterior can be written as \begin{equation} \begin{split} p(\mathbf{Z},\boldsymbol{\tau},\boldsymbol{\mu},&\alpha,\beta,\gamma,\boldsymbol{\zeta})\propto\exp(-\gamma)\gamma^n\prod_{i=1}^{n}\prod_{l=1}^S[K(t_i;\mu_l,\tau_l)\omega_l]^{I(\zeta_i=l)}p(\gamma)\\ &\times\prod_{l=1}^{S-1}Beta(Z_l|1,\alpha)p(\alpha)\prod_{l=1}^S[IG(\tau_l|\alpha_{\tau},\beta)\times Unif(\mu_l|0,T)]p(\beta) \end{split} \tag{19.5} \end{equation}

Now we can use a Gibbs sampler to update these parameters, cycling through the full conditionals below.

Firstly, we have \begin{equation} \begin{split} p(\gamma|\cdots)&\propto \exp(-\gamma)\gamma^np(\gamma)\\ &\propto \exp(-\gamma)\gamma^{n-1}\quad (\text{since } p(\gamma)\propto\tfrac{1}{\gamma}) \end{split} \tag{19.6} \end{equation} Thus, \gamma|\cdots\sim Gamma(n,1).
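
In code, this step is a single Gamma draw; a minimal sketch with illustrative variable names:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5                                # number of observed event times
gamma_draw = rng.gamma(n, 1.0)       # gamma | ... ~ Gamma(n, 1): shape n, scale (= 1/rate) 1
```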

Next, to update \beta, notice that \begin{equation} \begin{split} p(\beta|\cdots)&\propto\prod_{l=1}^SIG(\tau_l|\alpha_{\tau},\beta)p(\beta)\\ &\propto \exp(-\beta\sum_{l=1}^{S}\frac{1}{\tau_l})\beta^{S\alpha_{\tau}}\exp(-\beta d)\\ &\propto \exp(-\beta(d+\sum_{l=1}^{S}\frac{1}{\tau_l}))\beta^{S\alpha_{\tau}} \end{split} \tag{19.7} \end{equation} Therefore, \beta|\cdots\sim Gamma(S\alpha_{\tau}+1,d+\sum_{l=1}^S\frac{1}{\tau_l}), since each IG(\tau_l|\alpha_{\tau},\beta) term contributes a factor \beta^{\alpha_{\tau}}.
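
A sketch of this draw (variable names and values are illustrative); note that NumPy parameterizes the Gamma by shape and scale, so the rate d+\sum_l 1/\tau_l enters as its reciprocal.

```python
import numpy as np

rng = np.random.default_rng(0)

# current state and fixed hyperparameters (illustrative values)
S, alpha_tau, d = 10, 2.0, 1.0
tau = rng.gamma(2.0, 1.0, size=S)              # stands in for the current tau_1, ..., tau_S

shape = S * alpha_tau + 1.0                    # the product over l contributes beta^(S * alpha_tau)
rate = d + np.sum(1.0 / tau)
beta_draw = rng.gamma(shape, 1.0 / rate)       # NumPy's Gamma uses scale = 1 / rate
```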

Then, \alpha is updated using a griddy Gibbs sampler. Consider possible values of \alpha on a fine grid \{\alpha_1^*,\cdots,\alpha_m^*\} and draw \alpha from the discrete distribution with probability of drawing \alpha_j^* equal to \begin{equation} P(\alpha=\alpha_j^*|\cdots)=\frac{\prod_{l=1}^{S-1}Beta(Z_l|1,\alpha_j^*)}{\sum_{k=1}^m\prod_{l=1}^{S-1}Beta(Z_l|1,\alpha_k^*)} \tag{19.8} \end{equation} Typically, each \alpha_j^*\in(0,1), but you can create a more extensive search by widening the range of the grid. In conclusion, \alpha|\cdots follows a discrete distribution on \{\alpha_1^*,\cdots,\alpha_m^*\} with probabilities given by (19.8).
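
A minimal sketch of the griddy update (the grid range and the illustrative values of Z_1,\cdots,Z_{S-1} are my own choices); it works on the log scale to avoid underflow when S is large.

```python
import numpy as np

rng = np.random.default_rng(1)

Z = rng.beta(1.0, 2.0, size=9)                 # illustrative values of Z_1, ..., Z_{S-1}
alpha_grid = np.linspace(0.05, 5.0, 100)       # grid of candidate alpha values

# log prod_l Beta(Z_l | 1, alpha) = (S - 1) log(alpha) + (alpha - 1) sum_l log(1 - Z_l)
logw = len(Z) * np.log(alpha_grid) + (alpha_grid - 1.0) * np.sum(np.log(1.0 - Z))
probs = np.exp(logw - logw.max())              # stabilize before exponentiating
probs /= probs.sum()

alpha_draw = rng.choice(alpha_grid, p=probs)   # draw from the discrete distribution (19.8)
```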

We then update \mu_l and \tau_l. Since \begin{equation} \begin{split} p(\mu_l,\tau_l|\cdots)&\propto\prod_{i=1}^n\left[\frac{t_i^{\mu_l\tau_l T^{-1}-1}(T-t_i)^{\tau_l(1-\mu_l T^{-1})-1}}{Beta(\mu_l\tau_l T^{-1},\tau_l(1-\mu_l T^{-1}))T^{\tau_l-1}}\right]^{I(\zeta_i=l)}\\ &\quad\times IG(\tau_l|\alpha_{\tau},\beta)\,Unif(\mu_l|0,T) \end{split} \tag{19.9} \end{equation} the full conditionals are not available in closed form. One can update (\mu_l,\tau_l) with a Metropolis-Hastings step, or use alternatives such as the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo (HMC), or slice sampling.
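
Below is a minimal sketch of one of the options above, a random-walk Metropolis-Hastings update of (\mu_l,\tau_l), assuming the rescaled-Beta representation of the kernel; the step sizes, function names, and example values are mine. Proposals that land outside (0,T)\times(0,\infty) are rejected automatically because the log target returns -\infty there.

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(2)

def log_kernel(t, mu, tau, T):
    """log K(t; mu, tau) from (19.2), via the rescaled-Beta representation."""
    a, b = mu * tau / T, tau * (1.0 - mu / T)
    return beta_dist.logpdf(np.asarray(t) / T, a, b) - np.log(T)

def log_target(mu, tau, t_l, T, alpha_tau, beta_hyp):
    """Log full conditional (19.9) of (mu_l, tau_l), up to an additive constant.

    t_l holds the event times currently allocated to component l (zeta_i = l).
    """
    if not (0.0 < mu < T) or tau <= 0.0:
        return -np.inf
    loglik = np.sum(log_kernel(t_l, mu, tau, T))                  # kernel terms
    logprior = -(alpha_tau + 1.0) * np.log(tau) - beta_hyp / tau  # IG(alpha_tau, beta), up to a constant
    return loglik + logprior                                      # Unif(0, T) prior on mu is flat

def mh_step(mu, tau, t_l, T, alpha_tau, beta_hyp, step_mu=0.5, step_tau=2.0):
    """One random-walk Metropolis-Hastings update of (mu_l, tau_l)."""
    mu_prop = mu + step_mu * rng.normal()        # symmetric proposals, so no Hastings correction
    tau_prop = tau + step_tau * rng.normal()
    log_acc = (log_target(mu_prop, tau_prop, t_l, T, alpha_tau, beta_hyp)
               - log_target(mu, tau, t_l, T, alpha_tau, beta_hyp))
    if np.log(rng.uniform()) < log_acc:
        return mu_prop, tau_prop
    return mu, tau

# example usage with made-up values (beta_current is the latest draw of beta)
T, alpha_tau, beta_current = 10.0, 2.0, 1.0
t_l = np.array([3.2, 3.9, 4.4])                  # times currently allocated to component l
mu_l, tau_l = mh_step(4.0, 20.0, t_l, T, alpha_tau, beta_current)
```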

Next, to update Z_l, we have \begin{equation} \begin{split} p(Z_l|\cdots)&\propto\prod_{i=1}^n[\prod_{j\geq l}\omega_j^{I(\zeta_i=j)}](1-Z_l)^{\alpha-1}\\ &\propto\prod_{i=1}^n[\prod_{j\geq l}\{Z_j(1-Z_1)\cdots(1-Z_{j-1})\}^{I(\zeta_i=j)}](1-Z_l)^{\alpha-1} \end{split} \tag{19.10} \end{equation} From this product, collecting all factors of Z_l and 1-Z_l, and writing n_1=\sum_{i=1}^nI(\zeta_i=l) and n_2=\sum_{i=1}^nI(\zeta_i>l), we have \begin{equation} p(Z_l|\cdots)\propto Z_l^{n_1}(1-Z_l)^{n_2+\alpha-1} \tag{19.11} \end{equation} Therefore, Z_l|\cdots\sim Beta(n_1+1,n_2+\alpha).
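
A sketch of this update for all of Z_1,\cdots,Z_{S-1} in one sweep (the allocations \zeta_i used here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

S, alpha = 10, 1.0
zeta = rng.integers(1, S + 1, size=50)      # illustrative allocations zeta_1, ..., zeta_n

Z = np.empty(S - 1)
for l in range(1, S):                       # update Z_1, ..., Z_{S-1}
    n1 = np.sum(zeta == l)                  # points allocated to component l
    n2 = np.sum(zeta > l)                   # points allocated to components after l
    Z[l - 1] = rng.beta(n1 + 1.0, n2 + alpha)
```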

Finally, to update \zeta_i, we have \begin{equation} p(\zeta_i=l|\cdots)\propto K(t_i;\mu_l,\tau_l)\omega_l,\quad l=1,\cdots,S \tag{19.12} \end{equation} Therefore, \zeta_i|\cdots follows a discrete distribution taking values 1,\cdots,S with probabilities \frac{K(t_i;\mu_1,\tau_1)\omega_1}{\sum_{l=1}^SK(t_i;\mu_l,\tau_l)\omega_l},\cdots,\frac{K(t_i;\mu_S,\tau_S)\omega_S}{\sum_{l=1}^SK(t_i;\mu_l,\tau_l)\omega_l}, respectively.
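
A sketch of this update, reusing the rescaled_beta_kernel function from the earlier kernel sketch; the function name and arguments are mine.

```python
import numpy as np

rng = np.random.default_rng(4)

def update_allocations(t_events, mu, tau, w, T):
    """Draw each zeta_i from the discrete full conditional (19.12).

    mu, tau, w are length-S arrays of component parameters and weights;
    rescaled_beta_kernel is the function from the earlier kernel sketch.
    Returns allocations in {1, ..., S}.
    """
    zeta = np.empty(len(t_events), dtype=int)
    for i, t in enumerate(t_events):
        p = w * rescaled_beta_kernel(t, mu, tau, T)   # unnormalized probabilities over components
        zeta[i] = rng.choice(len(w), p=p / p.sum()) + 1
    return zeta
```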

Marked Poisson Process

In an NHPP, the data are given as t_1,\cdots,t_n, the points at which events occurred. In a marked Poisson process, we also observe y_1,\cdots,y_n, the marks at the points where events occurred. To model data of this kind, we can still apply the idea of the NHPP; the only difference is to change the intensity function to \lambda(s,y)=\phi(s)h(y|s), where \phi(s) is the intensity of a non-homogeneous Poisson process and h(\cdot|s) is assumed to be a density. To do the inference, we define \begin{equation} \begin{split} \gamma&=\int\int\lambda(s,y)dsdy\\ &=\int\int \phi(s)h(y|s)dyds\\ &=\int\phi(s)\left(\int h(y|s)dy\right)ds\\ &=\int \phi(s)ds \end{split} \tag{19.13} \end{equation}

Since \gamma=\int \phi(s)ds, we can write \lambda(s,y)=\gamma f(s)h(y|s) with f(s)=\phi(s)/\gamma. Then estimating \lambda(\cdot,\cdot) is equivalent to estimating \gamma, f and h, where f and h are density functions.
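
As a small illustration of this factorization (a toy example of my own, not from the lecture), one can simulate a marked Poisson process on (0,T) by drawing the number of events, then locations from f, then marks from h(\cdot|s):

```python
import numpy as np

rng = np.random.default_rng(5)

T, gamma_total = 10.0, 30.0                    # observation window and total mass gamma (made up)

n = rng.poisson(gamma_total)                   # number of events: N ~ Poisson(gamma)
s = T * rng.beta(2.0, 5.0, size=n)             # locations drawn from a density f on (0, T) (toy choice)
y = rng.normal(loc=np.sin(s), scale=0.3)       # marks drawn from h(y | s), here Normal(sin(s), 0.3^2)
```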

Log-Gaussian Cox Process

Assume \log \lambda(s)\sim GP(\mu,C_{\nu}(\cdot,\cdot)) and estimate from there. We will not have closed-form expressions for anything. The likelihood is \begin{equation} L\propto \exp\left(-\int_0^T\lambda(t)dt\right)\prod_{i=1}^n\lambda(t_i) \tag{19.14} \end{equation} and we have to evaluate the integral numerically, e.g. \begin{equation} \int_0^T\lambda(t)dt\approx\frac{T}{S}\sum_{s=1}^S\lambda(t_s^*) \tag{19.15} \end{equation} where t_1^*,\cdots,t_S^* are (evenly spaced) knot points chosen in the domain.
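
A minimal sketch of how the approximation in (19.15) plugs into the log-likelihood, with a made-up function standing in for a GP realization of \log\lambda; the event times and knot choice are illustrative.

```python
import numpy as np

T = 10.0
t_events = np.array([0.7, 2.3, 4.1, 5.5, 8.9])     # made-up event times

S_knots = 200
t_star = np.linspace(0.0, T, S_knots)              # knot points t_1^*, ..., t_S^* in the domain

def log_lam(t):
    return 0.5 + 0.3 * np.sin(t)                   # stands in for a GP realization of log lambda(t)

# int_0^T lambda(t) dt  ~=  (T / S) * sum_s lambda(t_s^*)   as in (19.15)
integral_approx = (T / S_knots) * np.sum(np.exp(log_lam(t_star)))
loglik = -integral_approx + np.sum(log_lam(t_events))
```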