9.2 Logit model
We can use the framework of Section 9.1 to perform inference in models with longitudinal/panel data and binary response variables. In particular, let y_{it} \sim B(\pi_{it}), where \text{logit}(\pi_{it}) = \log\left(\frac{\pi_{it}}{1-\pi_{it}}\right) \equiv y_{it}^*, such that y_{it}^* \sim N(\boldsymbol{x}_{it}^{\top} \boldsymbol{\beta} + \boldsymbol{w}_{it}^{\top} \boldsymbol{b}_i, \sigma^2). Thus, we can augment the model with the latent variable y_{it}^* and perform inference using a Metropolis-within-Gibbs sampling algorithm based on the posterior conditional distributions from the previous section.
We can implement a Gibbs sampling algorithm to draw from the posterior conditional distributions of \boldsymbol{\beta}, \sigma^2, \boldsymbol{b}_i, and \boldsymbol{D} using the equations in Section 9.1, conditional on \boldsymbol{y}_i^*. Then, we can use a random walk Metropolis-Hastings algorithm to sample y_{it}^*, where the proposal distribution is Gaussian with mean y_{it}^* and variance v^2, that is, y_{it}^{*c} = y_{it}^* + \epsilon_{it}, where \epsilon_{it} \sim N(0, v^2), and v is a tuning parameter chosen to achieve a good acceptance rate.
Finally, for making predictions, we should take into account that \mathbb{E}[\pi_{it}] = \frac{1}{1 + \exp\left\{-(\boldsymbol{x}_{it}^{\top} \boldsymbol{\beta} + \boldsymbol{w}_{it}^{\top} \boldsymbol{b}_i)\big/\sqrt{1 + \left(\frac{16\sqrt{3}}{15\pi}\right)^2 \sigma^2}\right\}} (Diggle et al. 2002); that is, the linear predictor is attenuated by the factor \sqrt{1 + (16\sqrt{3}/(15\pi))^2\sigma^2} before applying the inverse logit transformation.
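A minimal R sketch of this correction follows; the function marginal_prob and its argument names are ours for illustration, assuming a single posterior draw of \boldsymbol{\beta}, \boldsymbol{b}_i, and \sigma^2 is available.
# Approximate marginal probability that y_it = 1 (Diggle et al. 2002):
# the linear predictor is attenuated by sqrt(1 + c^2*sigma^2), c = 16*sqrt(3)/(15*pi)
# x_it, w_it: covariate vectors; beta, b_i: coefficient draws; sig2: sigma^2 draw
marginal_prob <- function(x_it, w_it, beta, b_i, sig2) {
  c2 <- (16 * sqrt(3) / (15 * pi))^2
  eta <- sum(x_it * beta) + sum(w_it * b_i)
  1 / (1 + exp(-eta / sqrt(1 + c2 * sig2)))
}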
The posterior distribution of this model is \begin{align*} \pi(\boldsymbol{\beta},\sigma^2, \boldsymbol{b}_i, \boldsymbol{D}, \boldsymbol{y}^*\mid \boldsymbol{y}, \boldsymbol{X}, \boldsymbol{W})&\propto \prod_{i=1}^N \prod_{t=1}^{T_i}\left\{\pi_{it}^{y_{it}}(1-\pi_{it})^{1-y_{it}}\right.\\ &\left.\quad\times (\sigma^2)^{-1/2}\exp\left\{-\frac{1}{2\sigma^2}(y_{it}^{*}-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)^2\right\}\right\}\\ &\quad\times \exp\left\{-\frac{1}{2}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)^{\top}\boldsymbol{B}_0^{-1}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)\right\}\\ &\quad\times |\boldsymbol{D}|^{-N/2}\exp\left\{-\frac{1}{2}\sum_{i=1}^N \boldsymbol{b}_i^{\top}\boldsymbol{D}^{-1}\boldsymbol{b}_i\right\}\\ &\quad\times (\sigma^2)^{-\alpha_0-1}\exp\left\{-\frac{\delta_0}{\sigma^2}\right\}\\ &\quad\times |\boldsymbol{D}|^{-(d_0+K_2+1)/2}\exp\left\{-\frac{1}{2}\text{tr}(d_0\boldsymbol{D}_0\boldsymbol{D}^{-1})\right\}. \end{align*}
To initialize the sampler, we can draw y_{it}^* from a normal distribution with mean \boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}+\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i and variance \sigma^2, and set \pi_{it}=\frac{1}{1+\exp(-y_{it}^*)}. At each iteration, we propose y_{it}^{*c}=y_{it}^{*}+\epsilon_{it}, compute \pi_{it}^c=\frac{1}{1+\exp(-y_{it}^{*c})}, and accept the proposal with probability \begin{align*} \alpha=\min\left(1,\frac{(\pi_{it}^{c})^{y_{it}}(1-\pi_{it}^c)^{1-y_{it}}\exp\left\{-\frac{1}{2\sigma^2}(y_{it}^{*c}-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)^2\right\}}{\pi_{it}^{y_{it}}(1-\pi_{it})^{1-y_{it}}\exp\left\{-\frac{1}{2\sigma^2}(y_{it}^{*}-\boldsymbol{x}_{it}^{\top}\boldsymbol{\beta}-\boldsymbol{w}_{it}^{\top}\boldsymbol{b}_i)^2\right\}}\right), \end{align*} since the random walk proposal is symmetric.
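A minimal R sketch of this update follows, assuming the current values of \boldsymbol{\beta}, \boldsymbol{b}_i, and \sigma^2 are given; the function update_ystar and its argument names are ours for illustration.
# One random walk Metropolis-Hastings update of the latent y*_it
# y_it: binary response; ystar: current latent value;
# mu_it: x_it'beta + w_it'b_i; sig2: sigma^2; v: tuning parameter
update_ystar <- function(y_it, ystar, mu_it, sig2, v) {
  ystar_c <- ystar + rnorm(1, mean = 0, sd = v)  # symmetric proposal
  logkernel <- function(ys) {
    p <- 1 / (1 + exp(-ys))                      # pi_it = inverse logit of y*_it
    y_it * log(p) + (1 - y_it) * log(1 - p) - (ys - mu_it)^2 / (2 * sig2)
  }
  # Accept with probability min(1, exp(log posterior ratio))
  if (log(runif(1)) < logkernel(ystar_c) - logkernel(ystar)) ystar_c else ystar
}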
Example: Doctor visits in Germany
We use the dataset 9VisitDoc.csv provided by Winkelmann (2004). We analyze the determinants of a binary variable (DocVis), which equals 1 if an individual visited a physician in the last three months and 0 otherwise. The dataset contains 32,837 observations on 9,197 individuals in an unbalanced longitudinal/panel structure over the years 1995–1999, taken from the German Socioeconomic Panel.
The specification is given by \begin{align*} \text{logit}(\pi_{it}) &= \beta_1 + \beta_2 \text{Age} + \beta_3 \text{Male} + \beta_4 \text{Sport} + \beta_5 \text{LogInc} \\ &\quad + \beta_6 \text{GoodHealth} + \beta_7 \text{BadHealth} + b_{i1} + b_{i2} \text{Sozh}, \end{align*} where \pi_{it} = p(\text{DocVis}_{it} = 1) and \boldsymbol{b}_i = (b_{i1}, b_{i2})^{\top}.
This specification controls for age, a gender indicator (with 1 representing male), whether the individual practices any sport (with 1 for sport), the logarithm of monthly gross income, and self-perception of health status, where “good” and “bad” are compared to a baseline of “regular”. Additionally, we assume that unobserved heterogeneity is linked to whether the individual receives welfare payments (with Sozh equal to 1 for receiving welfare).
We set 10,000 MCMC iterations, plus 1,000 burn-in, and a thinning parameter equal to 10. In addition, \boldsymbol{\beta}_0 = \boldsymbol{0}_7, \boldsymbol{B}_0 = \boldsymbol{I}_7, \alpha_0 = \delta_0 = 0.001, d_0 = 5, and \boldsymbol{D}_0 = \boldsymbol{I}_2.
The following algorithm shows how to perform inference for the hierarchical longitudinal logit model using our GUI.
Algorithm: Hierarchical Longitudinal Logit Model
Select Hierarchical Longitudinal Model on the top panel
Select Logit model using the left radio button
Upload the dataset, indicating first whether the file has a header and the kind of separator used in the csv file (comma, semicolon, or tab). Then, use the Browse button under the Choose File legend
Select MCMC iterations, burn-in, and thinning parameters using the Range sliders
Write down the formula of the fixed effects equation in the Main Equation: Fixed Effects box. This formula must be written using the syntax of the formula command of R software. This equation includes the intercept by default, so do not include it in the equation
Write down the formula of the random effects equation in the Main Equation: Random Effects box without writing the dependent variable, that is, starting the equation with the tilde (~) symbol. This formula must be written using the syntax of the formula command of R software. This equation includes the intercept by default, so do not include it in the equation. If there are just random intercepts, do not write anything in this box
Write down the name of the grouping variable, that is, the variable that indicates the cross-sectional units
Set the hyperparameters of the fixed effects: mean vector, covariance matrix, shape, and scale parameters. This step is not necessary as by default our GUI uses non-informative priors
Set the hyperparameters of the random effects: degrees of freedom and scale matrix of the inverse Wishart distribution. This step is not necessary as by default our GUI uses non-informative priors
Click the Go! button
Analyze results
Download posterior chains and diagnostic plots using the Download Posterior Chains and Download Posterior Graphs buttons
We show in the following code how to perform inference for this example using the command MCMChlogit from the MCMCpack package. We fix the variance of the over-dispersion parameter (\sigma^2) by setting FixOD = 1 in this example. Our GUI does not fix this value, that is, it sets FixOD = 0, which is the default value in the command MCMChlogit. We ask the reader to replicate this example using our GUI in Exercise 4. The command MCMChlogit uses an adaptive algorithm to tune v, targeting an acceptance rate of 0.44.
rm(list = ls())
set.seed(12345)
# Load the doctor visits dataset from the BSTApp repository
Data <- read.csv("https://raw.githubusercontent.com/besmarter/BSTApp/refs/heads/master/DataApp/9VisitDoc.csv", sep = ",", header = TRUE, quote = "")
attach(Data)
K1 <- 7; K2 <- 2; N <- 9197  # Number of fixed effects, random effects, and individuals
b0 <- rep(0, K1); B0 <- diag(K1)  # Prior mean and covariance of the fixed effects
r0 <- 5; R0 <- diag(K2)  # Degrees of freedom and scale matrix of the inverse Wishart prior
a0 <- 0.001; d0 <- 0.001  # Shape and scale parameters of the inverse gamma prior
# Maximum likelihood estimates to initialize the fixed effects
RegLogit <- glm(DocVis ~ Age + Male + Sport + LogInc + GoodHealth + BadHealth, family = binomial(link = "logit"))
SumLogit <- summary(RegLogit)
Beta0 <- SumLogit[["coefficients"]][,1]
mcmc <- 10000; burnin <- 1000; thin <- 10  # MCMC settings
# Hierarchical longitudinal logit model using MCMChlogit
Resultshlogit <- MCMCpack::MCMChlogit(fixed = DocVis ~ Age + Male + Sport + LogInc + GoodHealth + BadHealth, random = ~Sozh, group = "id", data = Data, burnin = burnin, mcmc = mcmc, thin = thin, mubeta = b0, Vbeta = B0, r = r0, R = R0, nu = a0, delta = d0, beta.start = Beta0, FixOD = 1)
##
## Running the Gibbs sampler. It may be long, keep cool :)
##
## **********:10.0%, mean accept. rate=0.445
## **********:20.0%, mean accept. rate=0.445
## **********:30.0%, mean accept. rate=0.446
## **********:40.0%, mean accept. rate=0.446
## **********:50.0%, mean accept. rate=0.445
## **********:60.0%, mean accept. rate=0.446
## **********:70.0%, mean accept. rate=0.446
## **********:80.0%, mean accept. rate=0.445
## **********:90.0%, mean accept. rate=0.446
## **********:100.0%, mean accept. rate=0.445
# Columns 1 to K1 of the chain store the fixed effects, the next K2*N the random effects,
# and the next K2^2 the elements of the covariance matrix D of the random effects
Betas <- Resultshlogit[["mcmc"]][,1:K1]
# Diagonal elements of D: variances of the random intercept and the Sozh slope
Sigma2RanEff <- Resultshlogit[["mcmc"]][,c(K1+K2*N+1, K1+K2*N+K2^2)]
summary(Betas)
##
## Iterations = 1001:10991
## Thinning interval = 10
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## beta.(Intercept) -0.410007 0.351422 1.111e-02 0.018150
## beta.Age 0.009393 0.002241 7.088e-05 0.000122
## beta.Male -1.098865 0.048329 1.528e-03 0.002393
## beta.Sport 0.315638 0.046056 1.456e-03 0.002079
## beta.LogInc 0.268030 0.048295 1.527e-03 0.002604
## beta.GoodHealth -1.071630 0.047380 1.498e-03 0.002606
## beta.BadHealth 1.375003 0.079114 2.502e-03 0.005421
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## beta.(Intercept) -1.108562 -0.642896 -0.416944 -0.16633 0.28044
## beta.Age 0.005106 0.007813 0.009507 0.01091 0.01374
## beta.Male -1.191406 -1.132530 -1.098185 -1.06565 -1.00809
## beta.Sport 0.225693 0.284683 0.315987 0.34871 0.40168
## beta.LogInc 0.178283 0.235760 0.266105 0.29965 0.36735
## beta.GoodHealth -1.164836 -1.104623 -1.070151 -1.04015 -0.98393
## beta.BadHealth 1.223310 1.324250 1.371644 1.42690 1.53306
summary(Sigma2RanEff)
##
## Iterations = 1001:10991
## Thinning interval = 10
## Number of chains = 1
## Sample size per chain = 1000
##
## 1. Empirical mean and standard deviation for each variable,
## plus standard error of the mean:
##
## Mean SD Naive SE Time-series SE
## VCV.(Intercept).(Intercept) 2.2409 0.09263 0.002929 0.009415
## VCV.Sozh.Sozh 0.7015 0.26915 0.008511 0.095633
##
## 2. Quantiles for each variable:
##
## 2.5% 25% 50% 75% 97.5%
## VCV.(Intercept).(Intercept) 2.0749 2.1709 2.238 2.303 2.422
## VCV.Sozh.Sozh 0.3536 0.4875 0.626 0.906 1.271
The results suggest that age, practicing sports, income, and a bad perception of health status increase the probability of visiting a physician; the posterior estimates have 95% symmetric credible intervals equal to (0.005, 0.014), (0.23, 0.40), (0.18, 0.37), and (1.22, 1.53), respectively. Men have a lower probability of visiting a physician, with a 95% credible interval of (-1.19, -1.01), as do individuals who have a good perception of their health status, with a 95% credible interval of (-1.16, -0.98). The 95% credible interval of the variance of the unobserved heterogeneity associated with the welfare program is (0.35, 1.27).
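These intervals can be recovered directly from the stored chains, for instance:
# 95% symmetric credible intervals from the posterior chains
apply(Betas, 2, quantile, probs = c(0.025, 0.975))
apply(Sigma2RanEff, 2, quantile, probs = c(0.025, 0.975))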
References
Diggle, P. J., Heagerty, P., Liang, K.-Y., and Zeger, S. L. (2002). Analysis of Longitudinal Data, 2nd ed. Oxford University Press.
Winkelmann, R. (2004). Health care reform and the number of doctor visits: An econometric analysis. Journal of Applied Econometrics, 19(4), 455–472. See http://qed.econ.queensu.ca/jae/2004-v19.4/winkelmann/ for details.