4.1 Past FYE Problems
Exercise 4.1 (By Bruno, FYE 2020) A probit regression entails associating a set of covariates to a binary response, using a link function based on the cumulative distribution of the standard normal random variable, denoted by Φ(⋅). For i.i.d. samples y1,⋯,yn of a Bernoulli random variable with probability of success pi, let p_i=\Phi(\mathbf{x}^{\prime}_i\boldsymbol{\beta}). Here, \mathbf{x}_i is a k-dimensional set of covariates for the ith observation, and \boldsymbol{\beta} is a k-dimensional vector of coefficients. Thus, the likelihood for \boldsymbol{\beta} can be obtained from the distribution \begin{equation} f(y_i|\boldsymbol{\beta},\mathbf{x}_i)=\Phi(\mathbf{x}^{\prime}_i\boldsymbol{\beta})^{y_i}(1-\Phi(\mathbf{x}^{\prime}_i\boldsymbol{\beta}))^{1-y_i} \tag{4.1} \end{equation}
(i). (30%) Introduce the latent variables z_1,\cdots,z_n, z_i\stackrel{ind.}{\sim}N(\mathbf{x}^{\prime}_i\boldsymbol{\beta},1), such that \begin{equation} y_i=\left\{\begin{aligned} & 1 & \quad z_i>0\\ & 0 & \quad z_i\leq0 \end{aligned} \right. \tag{4.2} \end{equation} to augment the model. Show that the augmented model is equivalent to the original model.
(ii). (35%) Assuming that p(\boldsymbol{\beta})\propto 1, obtain and identify the full conditionals of each z_i and \boldsymbol{\beta}.
- (iii). (35%) Suppose that the distribution of z_i is changed to t_{\alpha}(\mathbf{x}^{\prime}_i\boldsymbol{\beta},1). Use the representation \begin{equation} t_{\alpha}(\mu,\sigma^2)=\int_{0}^{\infty}N(\omega|\mu,\sigma^2/\lambda)Ga(\lambda|\alpha/2,\alpha/2)d\lambda \tag{4.3} \end{equation} where Ga(\lambda|a,b) is a gamma distribution with shape parameter a and rate parameter b such that E(\lambda)=a/b, to introduce an appropriate second set of latent variables. Obtain the full conditional distributions in this case.
Proof. (i). Under the original model, we have \begin{equation} \begin{split} &f(y_i=1|\boldsymbol{\beta},\mathbf{x}_i)=\Phi(\mathbf{x}^{\prime}_i\boldsymbol{\beta})\\ &f(y_i=0|\boldsymbol{\beta},\mathbf{x}_i)=1-\Phi(\mathbf{x}^{\prime}_i\boldsymbol{\beta})\\ \end{split} \tag{4.4} \end{equation}
Now under the augmented model, \begin{equation} f(y_i=1,z_i|\boldsymbol{\beta},\mathbf{x}_i)=f(y_i=1|z_i)f(z_i|\boldsymbol{\beta},\mathbf{x}_i)=\mathbf{1}_{z_i>0}N(z_i|\mathbf{x}^{\prime}_i\boldsymbol{\beta},1) \tag{4.5} \end{equation} Integrating out z_i, we have \begin{equation} f(y_i=1|\boldsymbol{\beta},\mathbf{x}_i)=\int_{z_i>0}N(z_i|\mathbf{x}^{\prime}_i\boldsymbol{\beta},1)dz_i=\Phi(\mathbf{x}^{\prime}_i\boldsymbol{\beta}) \tag{4.6} \end{equation} Similarly, we can obtain f(y_i=0|\boldsymbol{\beta},\mathbf{x}_i)=1-\Phi(\mathbf{x}^{\prime}_i\boldsymbol{\beta}). Since both models yield (4.4) for i=1,\cdots,n and the observations are independent, the two models are equivalent.
(ii). The posterior distribution is \begin{equation} f(\boldsymbol{\beta},\mathbf{z}|data)\propto f(y|z)f(z|\mathbf{x},\boldsymbol{\beta})f(\boldsymbol{\beta})\propto\prod_{i=1}^n (\exp(-\frac{(z_i-\mathbf{x}_i^T\boldsymbol{\beta})^2}{2})\mathbf{1}_{z_i>0})^{y_i}(\exp(-\frac{(z_i-\mathbf{x}_i^T\boldsymbol{\beta})^2}{2})\mathbf{1}_{z_i\leq 0})^{1-y_i} \tag{4.7} \end{equation}
Thus, we get the full conditionals for z_i as \begin{equation} f(z_i|\boldsymbol{\beta},\mathbf{x}_i,y_i=1)\propto\exp(-\frac{(z_i-\mathbf{x}_i^T\boldsymbol{\beta})^2}{2})\mathbf{1}_{z_i>0} \tag{4.8} \end{equation} which is a truncated normal distribution, with mean \mathbf{x}_i^T\boldsymbol{\beta}, variance 1 and truncated at z_i>0.
Similarly, f(z_i|\boldsymbol{\beta},\mathbf{x}_i,y_i=0) is also a truncated normal distribution, with mean \mathbf{x}_i^T\boldsymbol{\beta}, variance 1 and truncated at z_i\leq 0.
As for \boldsymbol{\beta}, \begin{equation} f(\boldsymbol{\beta}|\cdots)\propto\exp(-\frac{\sum_{i=1}^n(z_i-\mathbf{x}_i^T\boldsymbol{\beta})^2}{2}) \tag{4.9} \end{equation} Thus, the full conditional distribution of \boldsymbol{\beta} is N((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Z},(\mathbf{X}^T\mathbf{X})^{-1}).
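These two full conditionals are all that is needed for a two-block Gibbs sampler (the Albert–Chib data-augmentation scheme for probit regression). A minimal Python sketch, assuming numpy/scipy and illustrative data arrays y (binary) and X; none of the names below come from the exam:

```python
import numpy as np
from scipy.stats import truncnorm

def probit_gibbs(y, X, n_iter=2000, seed=0):
    """Gibbs sampler for the probit model with flat prior on beta.

    z_i | beta ~ N(x_i'beta, 1), truncated to (0, inf) if y_i = 1 and to
    (-inf, 0] if y_i = 0;  beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1}).
    """
    rng = np.random.default_rng(seed)
    y, X = np.asarray(y), np.asarray(X, dtype=float)
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    chol = np.linalg.cholesky(XtX_inv)
    beta = np.zeros(k)
    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        mu = X @ beta
        # truncnorm takes bounds standardized by (loc, scale); here scale = 1
        a = np.where(y == 1, -mu, -np.inf)
        b = np.where(y == 1, np.inf, -mu)
        z = truncnorm.rvs(a, b, loc=mu, scale=1.0, random_state=rng)
        # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1})
        beta = XtX_inv @ (X.T @ z) + chol @ rng.standard_normal(k)
        draws[t] = beta
    return draws
```

Each sweep draws all the z_i jointly (they are conditionally independent given \boldsymbol{\beta}) and then \boldsymbol{\beta} from its multivariate normal full conditional.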
(iii). The model can be written hierarchically as: \begin{equation} \begin{split} &y_i=\left\{\begin{aligned} & 1 & \quad z_i>0\\ & 0 & \quad z_i\leq0 \end{aligned} \right.\\ &z_i|\lambda_i\stackrel{ind.}{\sim} N(\mathbf{x}_i^T\boldsymbol{\beta},\frac{1}{\lambda_i})\\ &\lambda_i\stackrel{i.i.d.}{\sim} Gamma(\frac{\alpha}{2},\frac{\alpha}{2}),\quad i=1,\cdots,n \end{split} \tag{4.10} \end{equation}
By the representation (4.3), integrating \lambda_i out of (4.10) recovers z_i\stackrel{ind.}{\sim}t_{\alpha}(\mathbf{x}_i^T\boldsymbol{\beta},1), so this hierarchical form is equivalent to the model in part (iii). Thus, the posterior distribution is
\begin{equation}
f(\mathbf{z},\boldsymbol{\beta},\boldsymbol{\lambda}|data)\propto \prod_{i=1}^n
\lambda_i^{\frac{1}{2}}\exp(-\frac{\lambda_i(z_i-\mathbf{x}_i^T\boldsymbol{\beta})^2}{2})\left(\mathbf{1}_{z_i>0}\right)^{y_i}\left(\mathbf{1}_{z_i\leq 0}\right)^{1-y_i}\lambda_i^{\frac{\alpha}{2}-1}e^{-\frac{\alpha}{2}\lambda_i}
\tag{4.11}
\end{equation}
Therefore, we have the full conditionals. Firstly, f(z_i|\cdots)\propto \exp(-\frac{\lambda_i(z_i-\mathbf{x}_i^T\boldsymbol{\beta})^2}{2})\mathbf{1}_{z_i>0} if y_i=1. It is a truncated normal distribution with mean \mathbf{x}_i^T\boldsymbol{\beta}, variance \frac{1}{\lambda_i} and truncated at z_i>0. If y_i=0, similarly, the full conditional of z_i is a truncated normal distribution with mean \mathbf{x}_i^T\boldsymbol{\beta}, variance \frac{1}{\lambda_i} and truncated at z_i\leq 0.
Secondly, for \boldsymbol{\beta}, we have \begin{equation} f(\boldsymbol{\beta}|\cdots)\propto\exp(-\frac{\sum_{i=1}^n\lambda_i(z_i-\mathbf{x}_i^T\boldsymbol{\beta})^2}{2}) \tag{4.12} \end{equation} Thus, the full conditional of \boldsymbol{\beta} is N((\mathbf{X}^T\boldsymbol{\Lambda}\mathbf{X})^{-1}\mathbf{X}^T\boldsymbol{\Lambda}\mathbf{Z},(\mathbf{X}^T\boldsymbol{\Lambda}\mathbf{X})^{-1}), where \boldsymbol{\Lambda}=diag(\lambda_1,\cdots,\lambda_n).
Finally, for each \lambda_i, \begin{equation} f(\lambda_i|\cdots)\propto\lambda_i^{\frac{\alpha+1}{2}-1}e^{-(\frac{\alpha}{2}+\frac{(z_i-\mathbf{x}_i^T\boldsymbol{\beta})^2}{2})\lambda_i} \tag{4.13} \end{equation} so the full conditional of \lambda_i is Gamma(\frac{\alpha+1}{2},\frac{\alpha}{2}+\frac{(z_i-\mathbf{x}_i^T\boldsymbol{\beta})^2}{2}), independently for i=1,\cdots,n.
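Relative to the probit sampler sketched earlier, part (iii) only changes the truncated-normal scales and adds a Gamma update for each \lambda_i. A hedged sketch of a single Gibbs sweep under these full conditionals (function and variable names are illustrative, not from the exam):

```python
import numpy as np
from scipy.stats import truncnorm

def t_link_sweep(y, X, beta, lam, alpha, rng):
    """One Gibbs sweep for the t-link model with per-observation mixing
    variables lambda_i (Lambda = diag(lambda_1, ..., lambda_n))."""
    k = X.shape[1]
    mu = X @ beta
    sd = 1.0 / np.sqrt(lam)
    # z_i | ... ~ N(x_i'beta, 1/lambda_i), truncated by the sign constraint
    a = np.where(y == 1, (0.0 - mu) / sd, -np.inf)
    b = np.where(y == 1, np.inf, (0.0 - mu) / sd)
    z = truncnorm.rvs(a, b, loc=mu, scale=sd, random_state=rng)
    # beta | ... ~ N((X'Lambda X)^{-1} X'Lambda z, (X'Lambda X)^{-1})
    XtL = X.T * lam                       # X' Lambda
    V = np.linalg.inv(XtL @ X)
    beta = V @ (XtL @ z) + np.linalg.cholesky(V) @ rng.standard_normal(k)
    # lambda_i | ... ~ Gamma((alpha+1)/2, rate = alpha/2 + (z_i - x_i'beta)^2/2)
    rate = alpha / 2.0 + (z - X @ beta) ** 2 / 2.0
    lam = rng.gamma((alpha + 1.0) / 2.0, 1.0 / rate)   # numpy uses scale = 1/rate
    return beta, lam, z
```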
Exercise 4.2 (By Bruno, FYE 2018) To obtain a robust linear regression model, we consider an error distribution given by a double exponential, instead of a normal. As for the Student’s t-distribution case, it is possible to use the fact that the double exponential is a scale mixture of normals to obtain an efficient Gibbs sampler for exploring the posterior distribution of the model parameters. More specifically, consider the model \begin{equation} \mathbf{Y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon};\quad\mathbf{Y},\boldsymbol{\epsilon}\in\mathbb{R}^n,\quad \mathbf{X}\in\mathbb{R}^{n\times p},\quad \boldsymbol{\beta}\in\mathbb{R}^p \tag{4.14} \end{equation} where \begin{equation} p(\boldsymbol{\epsilon}|\sigma^2)=\prod_{i=1}^n\frac{1}{2\sqrt{\sigma^2}}e^{-|\epsilon_i|/\sqrt{\sigma^2}} \tag{4.15} \end{equation} Consider a non-informative prior p(\boldsymbol{\beta},\sigma^2)\propto 1/\sigma^2.

(20%) Use the fact that \begin{equation} \frac{1}{2\sqrt{\sigma^2}}e^{-|\epsilon_i|/\sqrt{\sigma^2}}=\int_0^{\infty}(\frac{1}{\sqrt{2\pi}\sqrt{s^2}}e^{-\epsilon_i^2/(2s^2)})(\frac{1}{2\sigma^2}e^{-s^2/(2\sigma^2)})ds^2 \tag{4.16} \end{equation} to write the likelihood as a scale mixture of normals.
(35%) Introduce latent variables in order to write the model as a hierarchical normal linear model.
- (45%) Obtain and identify the posterior full conditional distributions for all the parameters of the hierarchical model.
Exercise 4.3 (By Raquel, FYE 2015) Count data with an overabundance of zeros can be modeled through the zero-inflated Poisson model (ZIP). The p.m.f. of the zero-inflated Poisson distribution, ZIP(\lambda,\pi), is given by \begin{equation} f(y|\lambda,\pi)=\pi I(y=0)+(1-\pi)f_0(y|\lambda),\quad y=0,1,2,\cdots, \tag{4.17} \end{equation} where 0\leq\pi\leq 1. I(\cdot) is the indicator function, and f_0(y|\lambda)=\lambda^ye^{-\lambda}/y!, with \lambda>0.
(10%) Let Y\sim ZIP(\lambda,\pi). Find Pr(Y=0|\lambda,\pi), and Pr(Y=y|\lambda,\pi) for y\neq 0.
(90%) Given \lambda and \pi, let Y_1,\cdots,Y_n be i.i.d. ZIP(\lambda,\pi), with priors \lambda\sim Gamma(a,b) and \pi\sim Beta(c,d). Assume this model is used to describe the number of cigarettes smoked by the respondents of a health survey. More specifically, suppose that out of 10 respondents, the number of cigarettes smoked by each respondent are: \begin{equation} y_1=0,\,y_2=5,\,y_3=0,\,y_4=0,\,y_5=10,\,y_6=0,\,y_7=0,\,y_8=0,\,y_9=3,\,y_{10}=0. \end{equation}
(a). (15%) Write down p(\lambda,\pi|y_1,\cdots,y_{10}) up to a proportionality constant.
(b). (15%) Find p(\lambda|\pi,y_1,\cdots,y_{10}) and p(\pi|\lambda,y_1,\cdots,y_{10}) (up to a proportionality constant in each case). Are these distributions available in closed form?
- (c). (60%) Introduce auxiliary variables Z_i such that Y_i=0 if Z_i=1, and Y_i\sim Poisson(\lambda) if Z_i=0, with Z_i\sim Bernoulli(\pi). Derive an MCMC algorithm to sample from p(\lambda,\pi,Z_1,\cdots,Z_{10}|y_1,\cdots,y_{10}). Provide the details of the steps in your algorithm, i.e. for Gibbs steps, provide the full conditional distributions, for Metropolis-Hastings steps, provide the proposal distributions and the acceptance probabilities.
Proof. 1. \begin{equation} Pr(Y=y|\lambda,\pi)=\left\{\begin{aligned} &\pi+(1-\pi)e^{-\lambda} & \quad y=0\\ & (1-\pi)\frac{e^{-\lambda}\lambda^y}{y!} & \quad y>0 \end{aligned} \right. \tag{4.18} \end{equation}
- (a). \begin{equation} \begin{split} p(\lambda,\pi|y_1,\cdots,y_{10})&\propto [\pi+(1-\pi)e^{-\lambda}]^7[(1-\pi)e^{-\lambda}]^3\lambda^{18}\lambda^{a-1}e^{-b\lambda}\pi^{c-1}(1-\pi)^{d-1}\\ &\propto [\pi+(1-\pi)e^{-\lambda}]^7(1-\pi)^{3+d-1}\pi^{c-1}e^{-\lambda(3+b)}\lambda^{18+a-1} \end{split} \tag{4.19} \end{equation}
(b).
\begin{equation}
p(\lambda|\pi,y_1,\cdots,y_{10})\propto [\pi+(1-\pi)e^{-\lambda}]^7e^{-\lambda(3+b)}\lambda^{18+a-1}
\tag{4.20}
\end{equation}
and
\begin{equation}
p(\pi|\lambda,y_1,\cdots,y_{10})\propto [\pi+(1-\pi)e^{-\lambda}]^7(1-\pi)^{3+d-1}\pi^{c-1}
\tag{4.21}
\end{equation}
These are not available in closed form.
(c).
\begin{equation}
\begin{split}
p(\lambda,\pi,Z_1,\cdots,Z_{10}|y_1,\cdots,y_{10})&\propto \prod_{i=1}^{10}[\pi I(y_i=0)]^{Z_i}[(1-\pi)Pr(Y_i=y_i|Z_i=0)]^{1-Z_i}\\
&\times\lambda^{a-1}e^{-b\lambda}\pi^{c-1}(1-\pi)^{d-1}\\
&\propto \prod_{i=1}^{10}[\pi I(y_i=0)]^{Z_i}[(1-\pi)\frac{\lambda^{y_i}}{y_i!}e^{-\lambda}]^{1-Z_i}\lambda^{a-1}e^{-b\lambda}\pi^{c-1}(1-\pi)^{d-1}
\end{split}
\tag{4.22}
\end{equation}
Then, the full conditionals are as follows:
For i=1,\cdots,10, Pr(Z_i|\lambda,\pi,y_1,\cdots,y_{10}). If y_i=0, \begin{equation} Pr(Z_i=1|\lambda,\pi,y_1,\cdots,y_{10})=\frac{\pi}{\pi+(1-\pi)e^{-\lambda}} \tag{4.23} \end{equation} and \begin{equation} Pr(Z_i=0|\lambda,\pi,y_1,\cdots,y_{10})=\frac{(1-\pi)e^{-\lambda}}{\pi+(1-\pi)e^{-\lambda}} \tag{4.24} \end{equation} If y_i>0, the indicator I(y_i=0) forces Pr(Z_i=0|\lambda,\pi,y_1,\cdots,y_{10})=1.
(\lambda|\pi,Z_1,\cdots,Z_{10},y_1,\cdots,y_{10}) is drawn from a Gamma distribution, i.e. \begin{equation} \begin{split} p(\lambda|\pi,Z_1,\cdots,Z_{10},y_1,\cdots,y_{10})&\propto \prod_{i,Z_i=0}e^{-\lambda}\lambda^{y_i}e^{-b\lambda}\lambda^{a-1}\\ &\propto e^{-\lambda(n_0+b)}\lambda^{\sum_{i,Z_i=0}y_i+a-1} \end{split} \tag{4.25} \end{equation} which corresponds to a Gamma(\sum_{i,Z_i=0}y_i+a,n_0+b), with n_0 the number of Z_i=0.
- (\pi|\lambda,Z_1,\cdots,Z_{10},y_1,\cdots,y_{10}) is drawn from a Beta distribution, i.e. \begin{equation} p(\pi|\lambda,Z_1,\cdots,Z_{10},y_1,\cdots,y_{10})\propto \pi^{\sum_{i=1}^{10}Z_i+c-1}(1-\pi)^{10-\sum_{i=1}^{10}Z_i+d-1} \tag{4.26} \end{equation} which corresponds to a Beta(\sum_{i=1}^{10}Z_i+c,10-\sum_{i=1}^{10}Z_i+d).
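Putting the three full conditionals together gives a simple Gibbs sampler. A short Python sketch using the ten survey counts above; the hyperparameter values passed at the bottom are placeholders, not part of the problem:

```python
import numpy as np

def zip_gibbs(y, a, b, c, d, n_iter=5000, seed=0):
    """Gibbs sampler for the zero-inflated Poisson model.

    Z_i | ... ~ Bernoulli( pi / (pi + (1-pi) e^{-lam}) ) if y_i = 0, else Z_i = 0
    lam | ... ~ Gamma(sum_{i: Z_i=0} y_i + a, n0 + b),  n0 = #{i: Z_i = 0}
    pi  | ... ~ Beta(sum_i Z_i + c, n - sum_i Z_i + d)
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    n = len(y)
    lam, pi = 1.0, 0.5
    out = []
    for _ in range(n_iter):
        # latent indicators: only the zero counts can come from the point mass
        p1 = np.where(y == 0, pi / (pi + (1 - pi) * np.exp(-lam)), 0.0)
        Z = rng.binomial(1, p1)
        n0 = np.sum(Z == 0)
        lam = rng.gamma(y[Z == 0].sum() + a, 1.0 / (n0 + b))   # scale = 1/rate
        pi = rng.beta(Z.sum() + c, n - Z.sum() + d)
        out.append((lam, pi))
    return np.array(out)

# the survey counts from the exercise; unit hyperparameters are illustrative
y = [0, 5, 0, 0, 10, 0, 0, 0, 3, 0]
draws = zip_gibbs(y, a=1.0, b=1.0, c=1.0, d=1.0)
```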
Exercise 4.4 (By Raquel, FYE 2015 Retake) Consider the following model for i=1,\cdots,n: \begin{equation} \begin{split} & Y_i=\left\{\begin{aligned} & Z_i & Z_i<200 \\ & 200 & Z_i\geq 200 \end{aligned}\right.\\ & Z_i|X_i=\alpha+\beta X_i +\epsilon_i, \quad \epsilon_i\sim N(0,\sigma^2) \end{split} \tag{4.27} \end{equation} with n fixed, X_i known for all i, and the \epsilon_i, i=1,\cdots,n, independent. In addition, assume the prior: p(\alpha,\beta,\sigma^2)\propto1/\sigma^2.
(35%) Write down p(\Theta|X,Y) up to a proportionality constant, where \Theta=[\alpha,\beta,\sigma^2], X=[X_1,\cdots,X_n] and Y=[Y_1,\cdots,Y_n].
- (65%) Now augment your parameter space by considering \Theta^*=[\Theta,Z_1,\cdots,Z_n]. Develop a Gibbs sampling scheme to obtain draws from the joint posterior distribution of (\Theta^*|X,Y).
Proof. 1. The posterior density is given by \begin{equation} \begin{split} p(\Theta|X,Y)&\propto\frac{1}{\sigma^2}\prod_{i,Y_i=200}\Phi(\frac{\alpha+\beta X_i-200}{\sigma})\prod_{i,Y_i\neq 200}\frac{1}{\sigma}\exp(-\frac{(Y_i-\alpha-\beta X_i)^2}{2\sigma^2})\\ &\propto(\sigma^2)^{-(n_1/2+1)}\exp(-\frac{\sum_{i,Y_i\neq 200}(Y_i-\alpha-\beta X_i)^2}{2\sigma^2})\prod_{i,Y_i=200}\Phi(\frac{\alpha+\beta X_i-200}{\sigma}) \end{split} \tag{4.28} \end{equation} where n_1 denotes the number of observations with Y_i\neq 200, and \Phi(\cdot) is the c.d.f. of a standard normal distribution.
- Gibbs sampling scheme:
Sampling (\alpha,\beta) from N((\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{Z},\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}) with \begin{equation} \mathbf{X}=\begin{pmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad \mathbf{Z}=\begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix} \tag{4.29} \end{equation}
Sampling \sigma^2 from IG(n/2,\sum_{i=1}^n(Z_i-\alpha-\beta X_i)^2/2).
- For i=1,\cdots,n, sampling Z_i by \begin{equation} (Z_i|\alpha,\beta,X,Y,\sigma^2)=\left\{\begin{aligned} & Y_i & \quad Y_i\neq 200 \\ & TN(\alpha+\beta X_i,\sigma^2|200,\infty) & \quad Y_i=200 \end{aligned}\right. \tag{4.30} \end{equation}
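A compact Python sketch of this three-block sampler (the data x, y and the initial values are assumed, not given in the exam; X below is the n\times 2 design matrix with an intercept column):

```python
import numpy as np
from scipy.stats import truncnorm, invgamma

def censored_gibbs(x, y, n_iter=2000, seed=0, cut=200.0):
    """Gibbs sampler for Y_i = min(Z_i, 200), Z_i = alpha + beta X_i + eps_i,
    with prior p(alpha, beta, sigma^2) proportional to 1/sigma^2."""
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    cens = (y >= cut)                       # censored cases: Y_i = 200
    z = y.copy()
    sig2 = 1.0
    draws = []
    for _ in range(n_iter):
        # (alpha, beta) | z, sigma^2 ~ N((X'X)^{-1}X'z, sigma^2 (X'X)^{-1})
        coef = rng.multivariate_normal(XtX_inv @ (X.T @ z), sig2 * XtX_inv)
        # sigma^2 | ... ~ IG(n/2, sum_i (z_i - alpha - beta x_i)^2 / 2)
        resid = z - X @ coef
        sig2 = invgamma.rvs(n / 2.0, scale=resid @ resid / 2.0, random_state=rng)
        # z_i: equal to y_i if uncensored, else truncated normal on [200, inf)
        mu_c = (X @ coef)[cens]
        sd = np.sqrt(sig2)
        z[cens] = truncnorm.rvs((cut - mu_c) / sd, np.inf, loc=mu_c, scale=sd,
                                random_state=rng)
        draws.append(np.r_[coef, sig2])
    return np.array(draws)
```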
Exercise 4.5 (By Bruno, FYE 2014) Consider the linear model \begin{equation} \mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon},\quad \boldsymbol{\epsilon}\sim N_n(0,(1/\phi)\mathbf{I}) \tag{4.31} \end{equation} where \mathbf{y}\in\mathbb{R}^n, \mathbf{X} is a full rank matrix of dimensions n\times p, \boldsymbol{\beta}\in\mathbb{R}^p, \mathbf{I} is a n\times n identity matrix and \phi>0. Consider the prior p(\boldsymbol{\beta}|\phi)=\prod_ip(\beta_i|\phi) where \begin{equation} p(\beta_i|\phi)=(1+\frac{\phi\beta_i^2}{\nu\lambda})^{-(\nu+1)/2}\frac{\phi}{\lambda} \tag{4.32} \end{equation} where \nu and \lambda are known, and assume that p(\phi)\propto 1/\phi.
(30%) Write p(\beta_i|\phi) as a scale mixture of normals.
(20%) Use the above representation to introduce latent variables that facilitate sampling the posterior distribution of all parameters using Gibbs sampling. Write the resulting model.
- (50%) Obtain the full conditionals for all model parameters.
Some information that may be useful:
Assuming the matrices and vector below are of appropriate dimensions (and also that matrix A+B is invertible), we have \begin{equation} \begin{split} (x-&a)^TA(x-a)+(x-b)^TB(x-b)\\ &=(x-c)^T(A+B)(x-c)+(a-b)^TA(A+B)^{-1}B(a-b) \end{split} \tag{4.33} \end{equation} where c=(A+B)^{-1}(Aa+Bb).
- \begin{equation} \int_{0}^{\infty}\frac{e^{-\beta/x}}{x^{\alpha+1}}\frac{\beta^{\alpha}}{\Gamma(\alpha)}dx=1 \tag{4.34} \end{equation}
Proof. 1. \begin{equation} p(\beta_i|\phi)=(1+\frac{\phi\beta_i^2}{\nu\lambda})^{-(\nu+1)/2}\frac{\phi}{\lambda}\propto\int_0^{\infty}\frac{\exp\{-\frac{1}{\eta_i}(\frac{\phi\beta_i^2}{2\lambda}+\frac{\nu}{2})\}}{\eta_i^{(\nu+1)/2+1}}d\eta_i \tag{4.35} \end{equation} where the proportionality follows from the identity (4.34) with shape (\nu+1)/2 and rate \frac{\phi\beta_i^2}{2\lambda}+\frac{\nu}{2}. Thus, \begin{equation} p(\beta_i|\phi)=\int_0^{\infty}N(\beta_i|0,\lambda\eta_i/\phi)IG(\eta_i|\nu/2,\nu/2)d\eta_i \tag{4.36} \end{equation}
Introducing the latent variables \eta_1,\cdots,\eta_p, the model can be written hierarchically as \begin{equation} \begin{split} &\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon},\quad \boldsymbol{\epsilon}\sim N_n(0,(1/\phi)\mathbf{I})\\ &p(\beta_i|\phi,\eta_i)=N(\beta_i|0,\lambda\eta_i/\phi),\quad i=1,\cdots,p\\ &p(\eta_i)=IG(\eta_i|\nu/2,\nu/2),\quad i=1,\cdots,p \end{split} \end{equation} and p(\phi)\propto 1/\phi.
- \begin{equation} p(\boldsymbol{\beta}|\cdots)=N(\mathbf{A}^{-1}\mathbf{X}^T\mathbf{y},\frac{1}{\phi}\mathbf{A}^{-1}),\quad \mathbf{A}=\mathbf{X}^T\mathbf{X}+\frac{1}{\lambda}\mathbf{D}_{\eta}^{-1},\quad \mathbf{D}_{\eta}=diag(\eta_1,\cdots,\eta_p) \tag{4.37} \end{equation} \begin{equation} \begin{split} &p(\eta_i|\cdots)=IG((\nu+1)/2,(\nu+\phi\beta_i^2/\lambda)/2),\quad i=1,\cdots,p\\ &p(\phi|\cdots)=Gamma((n+p)/2,(||\mathbf{y}-\mathbf{X}\boldsymbol{\beta}||^2+\boldsymbol{\beta}^T\mathbf{D}_{\eta}^{-1}\boldsymbol{\beta}/\lambda)/2) \end{split} \tag{4.38} \end{equation}
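A sketch of the resulting Gibbs sampler under these full conditionals (\nu and \lambda are the known hyperparameters; the function and variable names are illustrative):

```python
import numpy as np
from scipy.stats import invgamma

def t_prior_gibbs(X, y, nu, lam, n_iter=2000, seed=0):
    """Gibbs sampler for y = X beta + eps, eps ~ N(0, (1/phi) I), with
    beta_i | phi, eta_i ~ N(0, lam*eta_i/phi), eta_i ~ IG(nu/2, nu/2),
    and p(phi) proportional to 1/phi."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    eta = np.ones(p)
    phi = 1.0
    draws = []
    for _ in range(n_iter):
        # beta | ... ~ N(A^{-1} X'y, (1/phi) A^{-1}),  A = X'X + (1/lam) D_eta^{-1}
        A_inv = np.linalg.inv(X.T @ X + np.diag(1.0 / (lam * eta)))
        beta = rng.multivariate_normal(A_inv @ (X.T @ y), A_inv / phi)
        # eta_i | ... ~ IG((nu+1)/2, (nu + phi beta_i^2 / lam) / 2)
        eta = invgamma.rvs((nu + 1) / 2.0,
                           scale=(nu + phi * beta**2 / lam) / 2.0,
                           random_state=rng)
        # phi | ... ~ Gamma((n+p)/2, rate = (||y - X beta||^2 + beta'D^{-1}beta/lam)/2)
        resid = y - X @ beta
        rate = (resid @ resid + beta @ (beta / eta) / lam) / 2.0
        phi = rng.gamma((n + p) / 2.0, 1.0 / rate)
        draws.append(np.r_[beta, phi])
    return np.array(draws)
```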
Exercise 4.6 (By Bruno, FYE 2014 Retake) Consider a set of observations x_1,\cdots,x_n,x_{n+1}^*,\cdots,x_{n+k}^*. Ignore for now the asterisks, and assume that they are independently distributed according to Exp(\lambda_i), for i=1,\cdots,n+k, where \lambda_i^{-1} is the mean of the exponential distribution for the ith observation. Consider the prior \begin{equation} p(\lambda_1,\cdots,\lambda_{n+k}|\beta)=\prod_{i=1}^{n+k}p(\lambda_i|\beta)=\prod_{i=1}^{n+k}Gamma(\lambda_i|\alpha,\beta) \tag{4.39} \end{equation} where \alpha is a known constant, and p(\beta)=Gamma(\beta|a,b), where a and b are both known. Here, Gamma(x|A,B) denotes the density of a Gamma distribution with shape parameter A>0 and rate parameter B>0, that is, \{B^Ax^{A-1}\exp(-Bx)\}/\Gamma(A), for x>0. (Recall that the exponential density arises as the special case with A=1.)
(20%) Find the unconditional prior density p(\lambda_1,\cdots, \lambda_{n+k}). Are the \lambda_{i} independent?
(20%) Find the marginal posterior distribution of \beta given the sample.
Next, suppose that the observations with an asterisk are right censored. That is, the true value is missing, but we know that it is larger than the value recorded with an asterisk.
(15%) Use the missing value approach to obtain the likelihood that accounts for the censoring.
(15%) Find the marginal posterior distribution of \beta in this case. Is there any difference with respect to the case where the censoring is ignored?
- (30%) Introduce latent variables to obtain a Gibbs sampler that explores the posterior distribution of all model parameters. Write, explicitly, all the full conditionals, and identify them.
Proof. 1. \begin{equation} \begin{split} p(\lambda_1,\cdots,\lambda_{n+k})&\propto\int_0^{\infty}\prod_i\lambda_i^{\alpha-1}e^{-\lambda_i\beta}\beta^{(n+k)\alpha+a-1}e^{-b\beta}d\beta\\ &=\prod_{i}\lambda_i^{\alpha-1}\int_0^{\infty}\beta^{(n+k)\alpha+a-1}e^{-\beta((n+k)\bar{\lambda}+b)}d\beta\\ &\propto\prod_{i}\lambda_i^{\alpha-1}((n+k)\bar{\lambda}+b)^{-((n+k)\alpha+a)} \end{split} \tag{4.40} \end{equation} There is no independence, due to the \bar{\lambda} term. But there is exchangeability, as the ordering of the \lambda_i does not affect the value of the density.
The joint posterior distribution of (\lambda_1,\cdots,\lambda_{n+k},\beta) is proportional to \begin{equation} \prod_{i}\lambda_ie^{-\lambda_ix_i}\lambda_i^{\alpha-1}e^{-\lambda_i\beta}\beta^{(n+k)\alpha+a-1}e^{-b\beta} \tag{4.41} \end{equation} Integrating out all \lambda_i we have
\begin{equation} \begin{split} p(\beta|\mathbf{x},\mathbf{x}^*)&\propto\beta^{(n+k)\alpha+a-1}e^{-b\beta}\prod_{i}\int_0^{\infty}\lambda_i^{\alpha+1-1}e^{-\lambda_i(x_i+\beta)}d\lambda_i\\ &\propto\beta^{(n+k)\alpha+a-1}e^{-b\beta}\prod_{i=1}^{n+k}(x_i+\beta)^{-(\alpha+1)} \end{split} \tag{4.42} \end{equation}

We can assume that the censored observations correspond to missing values \xi_j, j=1,\cdots,k, so that the full likelihood of the x_i and \xi_j is given as \begin{equation} \prod_{i=1}^n\lambda_ie^{-\lambda_ix_i}\prod_{j=1}^k\lambda_{n+j}e^{-\lambda_{n+j}\xi_j} \tag{4.43} \end{equation} The integrated likelihood is given as \begin{equation} \begin{split} p(\mathbf{x},\mathbf{x}^*|\lambda_1,\cdots,\lambda_{n+k})&\propto\prod_{i=1}^n\lambda_ie^{-\lambda_ix_i}\prod_{j=1}^k\int_{x_{n+j}^*}^{\infty}\lambda_{n+j}e^{-\lambda_{n+j}\xi_j}d\xi_j\\ &=\prod_{i=1}^n\lambda_ie^{-\lambda_ix_i}\prod_{j=1}^ke^{-\lambda_{n+j}x^*_{n+j}} \end{split} \tag{4.44} \end{equation}
Using the integrated likelihood in the previous question, we obtain the joint posterior for all parameters as
\begin{equation} \begin{split} p(\lambda_1,&\cdots,\lambda_{n+k},\beta|\mathbf{x},\mathbf{x}^*)\propto\beta^{(n+k)\alpha+a-1}e^{-b\beta}\prod_{i=1}^n\lambda_ie^{-\lambda_ix_i}\lambda_i^{\alpha-1}e^{-\lambda_i\beta}\\ &\times \prod_{j=1}^k\lambda^{\alpha-1}_{n+j}e^{-\lambda_{n+j}\beta}e^{-\lambda_{n+j}x^*_{n+j}}\\ &=\beta^{(n+k)\alpha+a-1}e^{-b\beta}\prod_{i=1}^n\lambda_i^{\alpha+1-1}e^{-\lambda_i(x_i+\beta)}\prod_{j=1}^k\lambda^{\alpha-1}_{n+j}e^{-\lambda_{n+j}(x^*_{n+j}+\beta)} \end{split} \tag{4.45} \end{equation} Integrating out all the \lambda_i, we have \begin{equation} p(\beta|\mathbf{x},\mathbf{x}^*)\propto\beta^{(n+k)\alpha+a-1}e^{-b\beta}\prod_{i=1}^n(x_i+\beta)^{-(\alpha+1)}\prod_{j=1}^k(x_{n+j}^*+\beta)^{-\alpha} \tag{4.46} \end{equation} Compared to the posterior that we obtained by ignoring the censoring, the last k terms in the product have exponent -\alpha instead of -(\alpha+1).

The latent variables are already introduced in part 3. The full conditionals are:
Firstly, for \lambda_i, i=1,\cdots,n+k, \begin{equation} p(\lambda_i|\cdots)\propto\lambda_i^{\alpha+1-1}e^{-\lambda_i(z_i+\beta)} \tag{4.47} \end{equation} where z_i=x_i if i\leq n, and z_{n+j}=\xi_j for j=1,\cdots,k. This corresponds to a Gamma(\lambda_i|\alpha+1,z_i+\beta).
Then, for \beta, \begin{equation} p(\beta|\cdots)\propto\beta^{(n+k)\alpha+a-1}e^{-\beta(b+(n+k)\bar{\lambda})} \tag{4.48} \end{equation} This corresponds to a Gamma(\beta|(n+k)\alpha+a,b+(n+k)\bar{\lambda}).
- Finally, \begin{equation} p(\xi_j|\cdots)\propto\lambda_{n+j}e^{-\lambda_{n+j}\xi_j} \tag{4.49} \end{equation} for \xi_j\geq x_{n+j}^*, j=1,\cdots,k. This corresponds to an exponential distribution with parameter \lambda_{n+j}, truncated to [x_{n+j}^*,\infty).
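A short sketch of the full sampler, cycling through the three sets of full conditionals above (x holds the n exact observations and x_star the k censoring thresholds; \alpha, a, b are the known constants):

```python
import numpy as np

def censored_exp_gibbs(x, x_star, alpha, a, b, n_iter=2000, seed=0):
    """Gibbs sampler for the hierarchical exponential model with right censoring.

    lambda_i | ... ~ Gamma(alpha + 1, z_i + beta)   (z_i = x_i or latent xi_j)
    beta     | ... ~ Gamma((n+k) alpha + a, b + sum_i lambda_i)
    xi_j     | ... ~ shifted exponential: x*_{n+j} + Exp(lambda_{n+j})
    """
    rng = np.random.default_rng(seed)
    x, x_star = np.asarray(x, float), np.asarray(x_star, float)
    n, k = len(x), len(x_star)
    beta = 1.0
    xi = x_star + 1.0                      # initialize latent censored values
    draws = []
    for _ in range(n_iter):
        z = np.concatenate([x, xi])
        lam = rng.gamma(alpha + 1.0, 1.0 / (z + beta))          # scale = 1/rate
        beta = rng.gamma((n + k) * alpha + a, 1.0 / (b + lam.sum()))
        # exponential truncated to [x*_{n+j}, inf) equals x*_{n+j} + Exp(lambda_{n+j})
        xi = x_star + rng.exponential(1.0 / lam[n:])
        draws.append(beta)
    return np.array(draws)
```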
Exercise 4.7 Consider the linear model \begin{equation} \mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon},\quad \boldsymbol{\epsilon}\sim N_n(0,(1/\phi)\mathbf{I}) \tag{4.50} \end{equation} where \mathbf{y}\in\mathbb{R}^n, \mathbf{X} is a full rank matrix of dimensions n\times p, \boldsymbol{\beta}\in\mathbb{R}^p, \mathbf{I} is a n\times n identity matrix and \phi>0. Consider the prior p(\boldsymbol{\beta}|\boldsymbol{\gamma},\phi)=\prod_ip(\beta_i|\gamma_i,\phi) where \begin{equation} p(\beta_i|\gamma_i,\phi)=(1-\gamma_i)N(\beta_i|0,1/\phi)+\gamma_iN(\beta_i|0,c_i/\phi) \tag{4.51} \end{equation} Here c_i>0 is a known constant, and \gamma_i is a binary variable taking value 1 with probability \omega_i and 0 with probability 1-\omega_i. Thus, \begin{equation} p(\boldsymbol{\gamma})=\prod_{i=1}^p\omega_i^{\gamma_i}(1-\omega_i)^{1-\gamma_i},\quad \gamma_i=0,1,\quad \omega_i\in(0,1) \tag{4.52} \end{equation} To complete the model, we assume that p(\phi)=Gamma(\phi|a_{\phi},b_{\phi}), and \begin{equation} p(\boldsymbol{\omega})=\prod_{i=1}^pBeta(\omega_i|a_{\omega},b_{\omega}) \tag{4.53} \end{equation} Write the full conditional distributions needed to explore the posterior distribution of the parameters in the model, using a Gibbs sampler. Hint: \begin{equation} \begin{split} (x-a)^TA(x-a)+&(x-b)^TB(x-b)\\ &=(x-c)^T(A+B)(x-c)+(a-b)^TA(A+B)^{-1}B(a-b) \end{split} \tag{4.54} \end{equation} where c=(A+B)^{-1}(Aa+Bb).
Exercise 4.8 (By Bruno, FYE 2012) Consider the linear model
\begin{equation}
\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\boldsymbol{\epsilon},\quad \boldsymbol{\epsilon}\sim N_n(0,(1/\phi)\mathbf{V}_{\psi})
\tag{4.55}
\end{equation}
where \mathbf{y}\in\mathbb{R}^n, \mathbf{X} is a full rank matrix of dimensions n\times p, \boldsymbol{\beta}\in\mathbb{R}^p, and \mathbf{V}_{\psi} is an n\times n matrix that is a known function of \psi.
(35%) Assume that \psi is known. Find the sufficient statistics for \boldsymbol{\beta} and \phi.
(45%) Consider the prior \begin{equation} p(\boldsymbol{\beta},\phi|\psi)\propto\frac{1}{\phi}N_p(\boldsymbol{\beta}|\mathbf{0},(g/\phi)(\mathbf{X}^T\mathbf{V}_{\psi}^{-1}\mathbf{X})^{-1}) \tag{4.56} \end{equation}
Write the posterior distribution of \boldsymbol{\beta} and \phi as \begin{equation} p(\boldsymbol{\beta},\phi|\mathbf{y},\mathbf{X},\psi)=p(\boldsymbol{\beta}|\phi,\mathbf{y},\mathbf{X},\psi)p(\phi|\mathbf{y},\mathbf{X},\psi) \tag{4.57} \end{equation} and specify the densities for each one of the two factors in this expression as a function of the sufficient statistics.

- (20%) Suppose \psi is unknown. Obtain the posterior p(\psi|\mathbf{y},\mathbf{X}). Hint: \begin{equation} \begin{split} (x-a)^TA(x-a)+&(x-b)^TB(x-b)\\ &=(x-c)^T(A+B)(x-c)+(a-b)^TA(A+B)^{-1}B(a-b) \end{split} \tag{4.58} \end{equation} where c=(A+B)^{-1}(Aa+Bb).
Exercise 4.9 (By Bruno, FYE 2012 Retake) Ten water samples are taken at a given site in the Monterey Bay and temperature y_1 and salinity y_2 are recorded. As a result we have the sample \mathbf{y}_i=(y_{1,i},y_{2,i}), for i=1,\cdots,10. Assume that all \mathbf{y}_i are independent and that their distribution corresponds to a bivariate normal with mean \boldsymbol{\mu}\in\mathbb{R}^2 and covariance matrix V\in\mathbb{R}^{2\times 2}.
(35%) Write the likelihood function for \boldsymbol{\mu} and V and find the sufficient statistics.
(65%) Consider the prior p(\boldsymbol{\mu},V)\propto p(V), where p(V) is an inverse Wishart distribution with parameters \nu and S_0. Thus, the prior is proportional to \begin{equation} |V|^{-(\nu+k+1)/2}\exp\{-\frac{1}{2}tr(S_0V^{-1})\} \tag{4.59} \end{equation} where k is the dimension of V, in this case k=2.
Some information that may be useful:
Recall that the trace of a product of matrices is invariant under cyclic permutations; for instance, for three matrices A\in\mathbb{R}^{n\times m},B\in\mathbb{R}^{m\times k} and C\in\mathbb{R}^{k\times n}, we have tr(ABC)=tr(BCA)=tr(CAB).
- Consider a bivariate normal vector \begin{equation} \begin{pmatrix} x\\ y \end{pmatrix}\sim N_2(\begin{pmatrix} \mu_x\\ \mu_y \end{pmatrix},\begin{pmatrix} v_x & v_{xy}\\ v_{yx} & v_y \end{pmatrix}) \tag{4.60} \end{equation} then \begin{equation} p(x|y)=N_1(\mu_x+\frac{v_{xy}}{v_y}(y-\mu_y),v_x-\frac{v_{xy}^2}{v_y}) \tag{4.61} \end{equation}
Exercise 4.10 (By Draper, FYE 2010) Consider the following random-effects hierarchical model, which is useful in meta-analysis and other applications: \begin{equation} \begin{split} &(\mu,\sigma^2)\sim p(\mu,\sigma^2)\\ &(\gamma_i|\mu,\sigma^2)\stackrel{i.i.d.}{\sim} N(\mu,\sigma^2)\\ &(y_i|\gamma_i)\stackrel{ind.}{\sim} N(\gamma_i,\tau_i^2) \end{split} \tag{4.62} \end{equation} for i=1,\cdots,I, in which the \tau_i^2 are assumed known.
(40%) With \theta=(\mu,\sigma^2) and an appropriate choice for latent data z, specify the two distributions p(\theta|z,y) and p(z|\theta,y) needed to carry out an EM algorithm to find the posterior mode of \theta given y=(y_1,\cdots,y_I), making an appropriate conditionally conjugate choice for the prior distribution on \theta, and use this to specify the E and M steps of your EM algorithm.
- (60%) Specify a Gibbs sampler to make random draws from the augmented posterior distribution p(\mu,\sigma^2,z|y), by providing details on the full-conditional distributions p(\mu|\sigma^2,z,y),p(\sigma^2|\mu,z,y) and p(z|\mu,\sigma^2,y).
Exercise 4.11 (By Thanos, FYE 2010 (Actually a GLM Problem)) Consider a Bernoulli regression model for a vector of covariates \mathbf{x} and a binary response y, with observed values \mathbf{x}_i and y_i, i=1,\cdots,n, respectively. Specifically, \begin{equation} y_i|\boldsymbol{\beta}\stackrel{ind.}{\sim} \{\Phi(\mathbf{x}_i^T\boldsymbol{\beta})\}^{y_i}\{1-\Phi(\mathbf{x}_i^T\boldsymbol{\beta})\}^{1-y_i},\quad i=1,\cdots,n \tag{4.63} \end{equation} where \Phi(\cdot) denotes the standard normal c.d.f., and \boldsymbol{\beta} is the vector of regression coefficients, which is assigned a flat prior. Assume that the design matrix of covariates \mathbf{X} is of full rank.
Moreover, assume that the binary response y arises by discretizing an underlying (real-valued) continuous response z. Specifically, we observe y_i=1 if z_i>0 or y_i=0 if z_i\leq 0, where z_i is the latent (unobserved) continuous response corresponding to observed response y_i, for i=1,\cdots,n. Consider the following augmented regression model \begin{equation} \begin{split} &y_i|z_i\stackrel{ind.}{\sim} \{1(z_i>0)1(y_i=1)+1(z_i\leq 0)1(y_i=0)\},\quad i=1,\cdots,n\\ &z_i|\boldsymbol{\beta}\stackrel{ind.}{\sim} N(\mathbf{x}_i^T\boldsymbol{\beta},1),\quad i=1,\cdots,n \end{split} \tag{4.64} \end{equation} where 1(\upsilon\in A) denotes the indicator function with value 1 when \upsilon\in A. Hence, note that the first stage of model (4.64) denotes the (degenerate) distribution for y_i conditional on the sign of z_i as described above.
Exercise 4.12 (By Raquel, FYE 2009) 1. (40%) Consider the following model \begin{equation} \begin{split} &y_{i,j}|\theta_i\stackrel{ind.}{\sim} N(\theta_i,\upsilon),\quad i=1,\cdots,I, j=1,\cdots,n_i\\ &\theta_i\stackrel{i.i.d.}{\sim} N(\mu,\sigma^2),\quad i=1,\cdots,I \end{split} \tag{4.65} \end{equation} where \upsilon,\mu and \sigma^2 are assumed known.
Write the model as a linear regression model and find p(\theta_1,\cdots,\theta_I|\mathbf{y}), where \mathbf{y}=\{y_{i,j},i=1,\cdots,I, j=1,\cdots,n_i\}. What is the mode of this distribution?
- (60%) Consider again the model from part 1, but now assume that \mu is unknown, with \begin{equation} \begin{split} &\mu|\phi\sim N(m,\phi)\\ &\phi\sim Inv-\chi^2(\nu) \end{split} \tag{4.66} \end{equation} where m and \nu are fixed (and \upsilon and \sigma^2 are still assumed known).
Proof. 1. Linear model representation: \begin{equation} \begin{split} &\mathbf{y}=\mathbf{X}\boldsymbol{\theta}+\boldsymbol{\epsilon},\quad \boldsymbol{\epsilon}\sim N(\mathbf{0},\Sigma_y)\\ &\boldsymbol{\theta}=\mathbf{1}\mu+\boldsymbol{\xi},\quad \boldsymbol{\xi}\sim N(\mathbf{0},\Sigma_{\theta}) \end{split} \tag{4.67} \end{equation} with \Sigma_y=\upsilon\mathbf{I}_N, \Sigma_{\theta}=\sigma^2\mathbf{I}_I and N=\sum_in_i.
The posterior distribution is multivariate normal. The mode could be found by maximizing the log-posterior, by writing the model above as a linear regression model with a non-informative prior and using some results from class, or by using properties of the multivariate normal distribution. The first and second options appear below.
(a). The log-posterior is given by \begin{equation} \log(p(\theta_1,\cdots,\theta_I|\mathbf{y}))=K-\frac{\sum_{i=1}^I\sum_{j=1}^{n_i}(y_{i,j}-\theta_i)^2}{2\upsilon}-\frac{\sum_{i=1}^I(\theta_i-\mu)^2}{2\sigma^2} \tag{4.67} \end{equation} Now, \begin{equation} \frac{\partial\log(p(\boldsymbol{\theta}|\mathbf{y}))}{\partial\theta_i}=\frac{\sum_{j=1}^{n_i}(y_{i,j}-\theta_i)}{\upsilon}-\frac{\theta_i-\mu}{\sigma^2} \tag{4.68} \end{equation} and so, \begin{equation} \hat{\theta}_i=\frac{\mu/\sigma^2+\sum_{j=1}^{n_i}y_{i,j}/\upsilon}{1/\sigma^2+n_i/\upsilon} \tag{4.69} \end{equation}
(b). Write the model as \mathbf{y}_*=\mathbf{X}_*\boldsymbol{\theta}+\boldsymbol{\epsilon}_*, with \boldsymbol{\epsilon}_*\sim N(\mathbf{0},\Sigma_*), p(\boldsymbol{\theta})\propto 1 and
\begin{equation}
\mathbf{y}_*=\begin{pmatrix} \mathbf{y} \\ \mu \\ \vdots \\ \mu \end{pmatrix}, \quad \mathbf{X}_*=\begin{pmatrix} \mathbf{X} \\ \mathbf{I}_I \end{pmatrix},\quad \Sigma_*=blockdiag(\Sigma_y,\Sigma_{\theta})
\tag{4.70}
\end{equation}
Then, the posterior distribution is p(\boldsymbol{\theta}|\mathbf{y})=N(\hat{\boldsymbol{\theta}},\mathbf{V}_{\theta}), with
\begin{equation}
\hat{\boldsymbol{\theta}}=(\mathbf{X}_*^T\Sigma_*^{-1}\mathbf{X}_*)^{-1}\mathbf{X}_*^T\Sigma_*^{-1}\mathbf{y}_*
\tag{4.71}
\end{equation}
and \mathbf{V}_{\theta}=(\mathbf{X}_*^T\Sigma_*^{-1}\mathbf{X}_*)^{-1}, and so, the posterior mode is \hat{\boldsymbol{\theta}}.
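As a numerical sanity check of the stacked-regression route, the GLS formula above can be compared against the closed-form mode \hat{\theta}_i; the data and hyperparameter values below are made up purely for illustration:

```python
import numpy as np

def posterior_mode_theta(y_groups, mu, sigma2, upsilon):
    """Posterior mode of (theta_1, ..., theta_I) via the augmented regression
    y_* = X_* theta + eps_*, eps_* ~ N(0, Sigma_*), flat prior on theta."""
    I = len(y_groups)
    n = np.array([len(g) for g in y_groups])
    y = np.concatenate(y_groups)
    # design: group-indicator rows for y, then an identity block for the
    # pseudo-observations mu (one per group)
    X = np.vstack([np.repeat(np.eye(I), n, axis=0), np.eye(I)])
    y_star = np.concatenate([y, np.full(I, mu)])
    w = np.concatenate([np.full(len(y), 1.0 / upsilon), np.full(I, 1.0 / sigma2)])
    # GLS estimate (X_*' Sigma_*^{-1} X_*)^{-1} X_*' Sigma_*^{-1} y_*
    XtW = X.T * w
    theta_hat = np.linalg.solve(XtW @ X, XtW @ y_star)
    # closed form: (mu/sigma2 + sum_j y_ij/upsilon) / (1/sigma2 + n_i/upsilon)
    closed = ((mu / sigma2 + np.array([g.sum() for g in y_groups]) / upsilon)
              / (1.0 / sigma2 + n / upsilon))
    return theta_hat, closed

# two made-up groups; both routes should return the same vector
y_groups = [np.array([1.2, 0.8, 1.5]), np.array([2.1, 1.9])]
print(posterior_mode_theta(y_groups, mu=1.0, sigma2=1.0, upsilon=0.5))
```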
- Gibbs sampling:
(a). p(\theta_i|\boldsymbol{\theta}_{(-i)},\mu,\phi,\mathbf{y})=N(\theta_i|\hat{\theta}_i,V_{\theta_i}), with \begin{equation} \begin{split} &V_{\theta_i}=(\frac{1}{\sigma^2}+\frac{n_i}{\upsilon})^{-1}\\ &\hat{\theta}_i=V_{\theta_i}(\frac{\mu}{\sigma^2}+\frac{\sum_{j=1}^{n_i}y_{i,j}}{\upsilon}) \end{split} \tag{4.72} \end{equation}
(b). p(\mu|\boldsymbol{\theta},\phi,\mathbf{y})=N(\mu|m_{\mu},V_{\mu}) with \begin{equation} \begin{split} &V_{\mu}=(\frac{I}{\sigma^2}+\frac{1}{\phi})^{-1}\\ &m_{\mu}=V_{\mu}(\frac{\sum_{i=1}^I\theta_i}{\sigma^2}+\frac{m}{\phi}) \end{split} \tag{4.73} \end{equation}
- (c). p(\phi|\mu,\boldsymbol{\theta},\mathbf{y})\propto\phi^{-(\nu+1)/2-1}\exp(-\frac{(\mu-m)^2+1}{2\phi}), a scaled inverse-\chi^2 distribution with \nu+1 degrees of freedom and scale ((\mu-m)^2+1)/(\nu+1), i.e. Inv-\chi^2(\phi|\nu+1,\frac{(\mu-m)^2+1}{\nu+1}).
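These three full conditionals translate directly into a Gibbs sampler; a sketch, with m, \nu and the known variances passed as arguments (all names are illustrative):

```python
import numpy as np
from scipy.stats import invgamma

def hier_gibbs(y_groups, m, nu, sigma2, upsilon, n_iter=2000, seed=0):
    """Gibbs sampler for (theta_1, ..., theta_I, mu, phi) in the hierarchical
    normal model with known data variance upsilon and known sigma^2."""
    rng = np.random.default_rng(seed)
    I = len(y_groups)
    n = np.array([len(g) for g in y_groups])
    ysum = np.array([g.sum() for g in y_groups])
    mu, phi = 0.0, 1.0
    draws = []
    for _ in range(n_iter):
        # theta_i | ... ~ N(V_i (mu/sigma2 + sum_j y_ij/upsilon), V_i)
        V = 1.0 / (1.0 / sigma2 + n / upsilon)
        theta = rng.normal(V * (mu / sigma2 + ysum / upsilon), np.sqrt(V))
        # mu | ... ~ N(V_mu (sum_i theta_i/sigma2 + m/phi), V_mu)
        V_mu = 1.0 / (I / sigma2 + 1.0 / phi)
        mu = rng.normal(V_mu * (theta.sum() / sigma2 + m / phi), np.sqrt(V_mu))
        # phi | ... : scaled inverse-chi^2 with nu+1 d.f., equivalently
        # IG((nu+1)/2, ((mu-m)^2 + 1)/2)
        phi = invgamma.rvs((nu + 1) / 2.0, scale=((mu - m) ** 2 + 1.0) / 2.0,
                           random_state=rng)
        draws.append(np.r_[theta, mu, phi])
    return np.array(draws)
```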
Exercise 4.13 (By Thanos, FYE 2009 (Actually a GLM Problem)) 1. (20%) Consider the Poisson distribution with mean \mu, denoted by Poisson(\cdot;\mu). Write the probability mass function of the distribution in the exponential dispersion family form, and provide the expressions for the natural parameter and the variance function.
(40%) Let y_i be realizations of independent random variables Y_i with Poisson(\cdot;\mu) distributions, for i=1,\cdots,n (where E(Y_i)=\mu_i). Consider the Poisson generalized linear model (glm) based on link function \log(\mu_i)=\mathbf{x}_i^T\boldsymbol{\beta} where \mathbf{x}_i is the covariate vector for response y_i and \boldsymbol{\beta} is the vector of regression coefficients. Obtain the deviance for the comparison of the Poisson glm above with the full model that includes a different \mu_i for each y_i.
- (40%) Consider a Bayesian formulation for the special case of the Poisson glm from part 2 based on a single covariate with values x_i. That is, \begin{equation} y_i|\beta_1,\beta_2\stackrel{ind.}{\sim}Poisson(y_i;\mu_i=\exp(\beta_1+\beta_2x_i)),\quad i=1,\cdots,n \tag{4.74} \end{equation} with N(a_j,b_j^2) priors for \beta_j, j=1,2. Derive a Metropolis-Hastings algorithm to sample from p(\beta_1,\beta_2|data), the posterior distribution of (\beta_1,\beta_2), where data is \{(y_i,x_i):i=1,\cdots,n\}.
Proof. 1. For the Poisson p.m.f. with mean \mu, we have \begin{equation} f(y;\mu)=\mu^y\exp(-\mu)(y!)^{-1}=\exp\{y\log(\mu)-\mu-\log(y!)\},\quad y\in\{0,1,2,\cdots\} \tag{4.75} \end{equation} which is of the EDF form with natural parameter \theta=\log(\mu), b(\theta)=\exp(\theta), and dispersion parameter \phi=1. The variance function is given by b^{\prime\prime}(\theta)=\exp(\theta)=\mu.
- Under the full model, the ML estimates are \tilde{\mu}_i=y_i and thus, \tilde{\theta}_i=\log(y_i). Under the Poisson glm with the logarithmic link function, we have \hat{\theta}_i=\log(\hat{\mu}_i)=\mathbf{x}_i^T\hat{\boldsymbol{\beta}}, where \hat{\boldsymbol{\beta}} is the vector of ML estimates for \boldsymbol{\beta}.
The deviance is given, in general, by D=-2[\ell(\hat{\boldsymbol{\theta}};data)-\ell(\tilde{\boldsymbol{\theta}};data)], where \hat{\boldsymbol{\theta}}=\{\hat{\theta}_i:i=1,\cdots,n\}, \tilde{\boldsymbol{\theta}}=\{\tilde{\theta}_i:i=1,\cdots,n\}, and \ell(\boldsymbol{\theta};data) is the log-likelihood function (expressed as a function of the natural parameters). Using the Poisson likelihood form and the expressions for \tilde{\boldsymbol{\theta}} and \hat{\boldsymbol{\theta}} above, we obtain \begin{equation} \begin{split} D&=2\sum_{i=1}^n[y_i\log(\frac{y_i}{\hat{\mu}_i})-(y_i-\hat{\mu}_i)]\\ &=2\sum_{i=1}^n[(y_i\log(y_i)-y_i)+(\exp(\mathbf{x}_i^T\hat{\boldsymbol{\beta}})-y_i(\mathbf{x}_i^T\hat{\boldsymbol{\beta}}))] \end{split} \tag{4.76} \end{equation}
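For instance, given fitted means \hat{\mu}_i=\exp(\mathbf{x}_i^T\hat{\boldsymbol{\beta}}), the deviance can be evaluated directly; a small helper, using the convention 0\log 0=0 for zero counts:

```python
import numpy as np

def poisson_deviance(y, mu_hat):
    """D = 2 * sum[ y_i log(y_i / mu_hat_i) - (y_i - mu_hat_i) ], with 0*log(0) = 0."""
    y, mu_hat = np.asarray(y, float), np.asarray(mu_hat, float)
    term = y * np.log(np.where(y > 0, y, 1.0) / mu_hat)   # safe at y_i = 0
    return 2.0 * np.sum(term - (y - mu_hat))

# toy check with made-up counts and fitted means
print(poisson_deviance([2, 0, 5], [1.5, 0.3, 4.0]))
```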
- We have \begin{equation} \begin{split} p(\beta_1,\beta_2|data)&\propto \exp[-\frac{(\beta_1-a_1)^2}{2b_1^2}-\frac{(\beta_2-a_2)^2}{2b_2^2}\\ &+\sum_{i=1}^n(y_i(\beta_1+\beta_2x_i)-\exp(\beta_1+\beta_2x_i))]\\ &=h(\beta_1,\beta_2) \end{split} \tag{4.77} \end{equation}
A random walk Metropolis-Hastings algorithm, with a normal proposal distribution, would proceed as follows:
Let (\beta_1^{(t)},\beta_2^{(t)}) be the current state of the chain.
Draw (\beta_1^*,\beta_2^*) from a normal distribution with mean vector (\beta_1^{(t)},\beta_2^{(t)}) and covariance matrix D, which can be specified through the inverse of the expected Fisher information matrix estimated at the MLEs for (\beta_1,\beta_2).
Set (\beta_1^{(t+1)},\beta_2^{(t+1)})=(\beta_1^*,\beta_2^*) with probability \min\{1,h(\beta_1^*,\beta_2^*)/h(\beta_1^{(t)},\beta_2^{(t)})\}, and (\beta_1^{(t+1)},\beta_2^{(t+1)})=(\beta_1^{(t)},\beta_2^{(t)}) otherwise.
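A sketch of this random-walk Metropolis-Hastings sampler, working with \log h for numerical stability; here the proposal covariance D is simply passed in as a tuning input rather than computed from the Fisher information as suggested above:

```python
import numpy as np

def poisson_mh(y, x, a, b_sd, D, n_iter=5000, seed=0):
    """Random-walk Metropolis for p(beta_1, beta_2 | data) in the Poisson glm
    with N(a_j, b_j^2) priors; accepts/rejects on log h(beta_1, beta_2)."""
    rng = np.random.default_rng(seed)
    y, x = np.asarray(y, float), np.asarray(x, float)
    a, b_sd = np.asarray(a, float), np.asarray(b_sd, float)

    def log_h(beta):
        eta = beta[0] + beta[1] * x
        return (np.sum(y * eta - np.exp(eta))
                - np.sum((beta - a) ** 2 / (2.0 * b_sd ** 2)))

    beta = a.copy()                       # start at the prior means
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        prop = rng.multivariate_normal(beta, D)
        # accept with probability min{1, h(prop)/h(beta)}
        if np.log(rng.uniform()) < log_h(prop) - log_h(beta):
            beta = prop
        draws[t] = beta
    return draws
```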
The full Bayesian model that includes the distribution component for (y_0,x_0) is given by \begin{equation} p(y_0,\beta_1,\beta_2|x_0,data)=Poisson(y_0;\mu_0=\exp(\beta_1+\beta_2x_0))p(\beta_1,\beta_2|data) \tag{4.78} \end{equation} and, therefore, the posterior predictive distribution of a new response y_0, associated with specified covariate value x_0, is given by \begin{equation} p(y_0|x_0,data)=\int\int Poisson(y_0;\mu_0=\exp(\beta_1+\beta_2x_0))p(\beta_1,\beta_2|data)d\beta_1d\beta_2 \tag{4.79} \end{equation}