10.6 Exercises

The Gaussian linear model specifies $\mathbf{y} = \alpha\boldsymbol{i}_N + \boldsymbol{X}_m\boldsymbol{\beta}_m + \boldsymbol{\mu}_m$ such that $\boldsymbol{\mu}_m \sim N(\boldsymbol{0}, \sigma^2\boldsymbol{I}_N)$ , and $\boldsymbol{X}_m$ does not include a column of ones. Assuming that $\pi(\sigma^2) \propto 1/{\sigma^2}$ , $\pi(\alpha) \propto 1$ , and $\boldsymbol{\beta}_m | \sigma^2 \sim N(\boldsymbol{0}_{k_m}, \sigma^2 (g_m\boldsymbol{X}_m^{\top}\boldsymbol{X}_m)^{-1})$ :
- Show that the posterior conditional distribution of $\boldsymbol{\beta}_m$ is $N(\boldsymbol{\beta}_{mn}, \sigma^2\boldsymbol{B}_{mn})$ , where $\boldsymbol{\beta}_{mn} = \boldsymbol{B}_{mn}\boldsymbol{X}_m^{\top}\mathbf{y}$ and $\boldsymbol{B}_{mn} = ((1+g_m)\boldsymbol{X}_m^{\top}\boldsymbol{X}_m)^{-1}$ .
- Show that the marginal likelihood associated with model $\mathcal{M}_m$ is proportional to
  
  $p(\mathbf{y} | \mathcal{M}_m) \propto \left(\frac{g_m}{1+g_m}\right)^{k_m/2} \left[(\mathbf{y} - \bar{y}\boldsymbol{i}_N)^{\top}(\mathbf{y} - \bar{y}\boldsymbol{i}_N) - \frac{1}{1+g_m}(\mathbf{y}^{\top}\boldsymbol{P}_{X_m}\mathbf{y})\right]^{-(N-1)/2},$ where all parameters are indexed to model $\mathcal{M}_m$ , $\boldsymbol{P}_{X_m} = \boldsymbol{X}_m(\boldsymbol{X}_m^{\top}\boldsymbol{X}_m)^{-1}\boldsymbol{X}_m^{\top}$ is the projection matrix onto the space spanned by the columns of $\boldsymbol{X}_m$ , and $\bar{y}$ is the sample mean of $\mathbf{y}$ .
  
  Hint: Take into account that $\boldsymbol{i}_N^{\top}\boldsymbol{X}_m = \boldsymbol{0}_{k_m}$ due to all columns being centered with respect to their means.
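
For the first bullet, one possible starting point is to collect the terms in $\boldsymbol{\beta}_m$ from the likelihood and the prior and then complete the square. The outline below treats $\alpha$ and $\sigma^2$ as given and uses $\boldsymbol{X}_m^{\top}\boldsymbol{i}_N = \boldsymbol{0}_{k_m}$ from the hint. It is a sketch of the key algebraic step, not a full solution.

$$
\begin{aligned}
&(\mathbf{y} - \alpha\boldsymbol{i}_N - \boldsymbol{X}_m\boldsymbol{\beta}_m)^{\top}(\mathbf{y} - \alpha\boldsymbol{i}_N - \boldsymbol{X}_m\boldsymbol{\beta}_m) + g_m\boldsymbol{\beta}_m^{\top}\boldsymbol{X}_m^{\top}\boldsymbol{X}_m\boldsymbol{\beta}_m \\
&\quad = (\mathbf{y} - \alpha\boldsymbol{i}_N)^{\top}(\mathbf{y} - \alpha\boldsymbol{i}_N) - 2\boldsymbol{\beta}_m^{\top}\boldsymbol{X}_m^{\top}\mathbf{y} + (1+g_m)\boldsymbol{\beta}_m^{\top}\boldsymbol{X}_m^{\top}\boldsymbol{X}_m\boldsymbol{\beta}_m \\
&\quad = (\mathbf{y} - \alpha\boldsymbol{i}_N)^{\top}(\mathbf{y} - \alpha\boldsymbol{i}_N) + (\boldsymbol{\beta}_m - \boldsymbol{\beta}_{mn})^{\top}\boldsymbol{B}_{mn}^{-1}(\boldsymbol{\beta}_m - \boldsymbol{\beta}_{mn}) - \frac{1}{1+g_m}\mathbf{y}^{\top}\boldsymbol{P}_{X_m}\mathbf{y}
\end{aligned}
$$

After dividing by $2\sigma^2$ , the middle term is the kernel of $N(\boldsymbol{\beta}_{mn}, \sigma^2\boldsymbol{B}_{mn})$ , and the last term is the one that reappears inside the brackets of the marginal likelihood in the second bullet.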
Determinants of export diversification I
Jetter and Ramírez Hassan (2015) use BMA to study the determinants of export diversification. Use the dataset 10ExportDiversificationHHI.csv to perform BMA using the BIC approximation and MC3 to check if these two approaches agree.
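
As a rough illustration of the BIC approximation, posterior model probabilities can be approximated by normalizing $\exp(-\text{BIC}_m/2)$ over the candidate models, assuming equal prior model probabilities. The Python sketch below uses made-up BIC values, and the function name `pmp_from_bic` is hypothetical; it does not use the export diversification dataset.

```python
import math

def pmp_from_bic(bics):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior model probabilities, so PMP_m is proportional
    to exp(-BIC_m / 2). Subtracting the minimum BIC avoids underflow.
    """
    best = min(bics)
    weights = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]

# Made-up BIC values for three candidate models (illustration only).
bics = [1012.4, 1010.1, 1019.8]
probs = pmp_from_bic(bics)
for b, p in zip(bics, probs):
    print(f"BIC = {b:8.1f}  approx. PMP = {p:.3f}")
```

The model with the smallest BIC receives the largest approximate probability, which gives a quick reference point when comparing against MC3 output.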
Simulation exercise of the Markov Chain Monte Carlo model composition
Program an algorithm to perform MC3 where the final $S$ models are unique. Use the simulation setting of Section 10.2 increasing the number of regressors to 40, which implies approximately $1.1 \times 10^{12}$ models.
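
One possible design for keeping the visited models unique is to store them as keys of a dictionary. The Python sketch below is only an outline under stated assumptions: the proposal flips one inclusion indicator at random, the prior over models is uniform, and `toy_log_ml` is a hypothetical stand-in for a real log marginal likelihood. The names `mc3_unique` and `toy_log_ml` are not from the text.

```python
import math
import random

def mc3_unique(log_ml, n_regressors, n_iter, n_keep, seed=0):
    """MC3 sketch that stores each visited model exactly once.

    log_ml: callable mapping a tuple of 0/1 inclusion indicators to a
            log marginal likelihood (supplied by the user).
    Returns the n_keep distinct visited models with the highest log_ml.
    """
    rng = random.Random(seed)
    current = tuple(rng.randint(0, 1) for _ in range(n_regressors))
    visited = {current: log_ml(current)}  # dict keys are unique models
    for _ in range(n_iter):
        # Symmetric proposal: flip one inclusion indicator at random.
        j = rng.randrange(n_regressors)
        candidate = current[:j] + (1 - current[j],) + current[j + 1:]
        if candidate not in visited:
            visited[candidate] = log_ml(candidate)
        # Metropolis-Hastings acceptance under a uniform model prior.
        log_ratio = visited[candidate] - visited[current]
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = candidate
    ranked = sorted(visited.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n_keep]

def toy_log_ml(model):
    # Hypothetical score: the first three regressors are "relevant",
    # every other included regressor is penalized.
    return 5.0 * sum(model[:3]) - 1.0 * sum(model[3:])

top = mc3_unique(toy_log_ml, n_regressors=40, n_iter=5000, n_keep=10)
print(len(top), len({model for model, _ in top}))
```

In the exercise, `toy_log_ml` would be replaced by the marginal likelihood of the Gaussian linear model, and the chain would typically need many more iterations to explore a space of $2^{40}$ models.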
Simulation exercise of IV BMA
Use the simulation setting with endogeneity in Section 10.2 to perform BMA based on the BIC approximation and MC3.
Determinants of export diversification II
Use the datasets 11ExportDiversificationHHI.csv and 12ExportDiversificationHHIInstr.csv to perform IV BMA assuming that the log of per capita gross domestic product (avglgdpcap) is endogenous. See Jetter and Ramírez Hassan (2015) for details.
Show that the link function in the case of the Bernoulli distribution is $\log(\theta / (1 - \theta))$ .
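
A possible route is to write the Bernoulli probability mass function in exponential family form and read off the natural parameter. The lines below are a sketch of that step, using $\eta$ (a symbol not in the text) for the natural parameter.

$$
\begin{aligned}
p(y | \theta) &= \theta^{y}(1-\theta)^{1-y} \\
&= \exp\left\{ y \log\left(\frac{\theta}{1-\theta}\right) + \log(1-\theta) \right\}, \quad y \in \{0, 1\}.
\end{aligned}
$$

Since $\mathbb{E}[y] = \theta$ , the map from the mean to the natural parameter is $\eta = \log(\theta / (1 - \theta))$ , and inverting it gives $\theta = \exp(\eta)/(1 + \exp(\eta))$ .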
Ramírez-Hassan (2020) and Ramírez-Hassan and Carvajal-Rendón (2021) perform variable selection using the file 13InternetMed.csv. In this dataset, the dependent variable is an indicator of Internet adoption (internet) for 5,000 households in Medellín (Colombia) during 2006–2014. This dataset contains 18 potential determinants, implying 262,144 ( $2^{18}$ ) potential models. Perform BMA using the logit link function with this dataset.
Serna Rodríguez, Ramírez Hassan, and Coad (2019) use 14ValueFootballPlayers.csv to analyze the market value of soccer players in Europe’s top leagues. There are 26 potential determinants of the market value of 335 soccer players. Use this dataset to perform BMA using the gamma distribution, setting default values for Occam’s window.
Use the dataset 15Fertile2.csv from Wooldridge (2012) to perform BMA using the Poisson model with the log link. The dataset contains 1,781 women from Botswana in 1988. The dependent variable is the number of children ever born (ceb), modeled as a function of 19 potential determinants.
Perform BMA in the logit model using MC3 and the BIC approximation using the simulation setting of Section 10.3.
Use 19ExchangeRateCOPUSD.csv to perform dynamic BMA using four state-space models explaining annual variations in the COP to USD exchange rate:

Interest rate parity

$\Delta e_t = \beta_{1t}^{IRP} + \beta_{2t}^{IRP} (i_{t-1}^{Col}-i_{t-1}^{USA})+\mu_{t}^{IRP}$
Purchasing power parity

$\Delta e_t = \beta_{1t}^{PPP} + \beta_{2t}^{PPP} (\pi_{t-1}^{Col}-\pi_{t-1}^{USA})+\mu_{t}^{PPP}$
Taylor rule

$\Delta e_t = \beta_{1t}^{Taylor} + \beta_{2t}^{Taylor} (\pi_{t-1}^{Col}-\pi_{t-1}^{USA})+\beta_{3t}^{Taylor} (g_{t-1}^{Col}-g_{t-1}^{USA})+\mu_{t}^{Taylor}$
Money supply

$\Delta e_t = \beta_{1t}^{Money} + \beta_{2t}^{Money} (g_{t-1}^{Col}-g_{t-1}^{USA})+\beta_{3t}^{Money} (m_{t-1}^{Col}-m_{t-1}^{USA})+\mu_{t}^{Money}$

where varTRM ( $\Delta e_t$ ) represents the annual variation rate of the exchange rate from COP to USD, TES_COL10 ( $i_{t}^{Col}$ ) and TES_USA10 ( $i_{t}^{USA}$ ) denote the annual yields on 10-year Colombian and U.S. public debt, inflation_COL ( $\pi_{t}^{Col}$ ) and inflation_USA ( $\pi_{t}^{USA}$ ) are the annual inflation rates for Colombia and the U.S., varISE_COL ( $g_{t}^{Col}$ ) and varISE_USA ( $g_{t}^{USA}$ ) represent the annual variations of economic activity indices, and varCOL_M3 ( $m_{t}^{Col}$ ) and varUSA_M3 ( $m_{t}^{USA}$ ) are the annual variations of the money supply. In addition, $\mu_{t}^{\cdot}$ is the stochastic error. The dataset includes monthly variations from January 2006 to November 2023.

Perform Bayesian model averaging using these models, calculate posterior model probabilities, and plot the posterior mean and credible interval of $\beta_{2t}^{Money}$ .

Perform a simulation of the dynamic logistic model, where there are 7 ( $2^3 - 1$ , excluding the model without regressors) competing models originating from 3 regressors: $x_{kt} \sim N(0.5, 0.8^2)$ , $k = 2, 3, 4$ . Set $\beta_1 = 0.5$ , let $\beta_{2t}$ be a sequence from 1 to 2 in steps of $1/T$ , let $\beta_{3t} = \begin{cases} -1, & 1 \leq t \leq 0.5T \\ 0, & 0.5T < t \leq T \end{cases}$ , and set $\beta_4 = 1.2$ . Then, $\boldsymbol{x}_t^{\top} \boldsymbol{\beta}_t = \beta_1 + \beta_{2t} x_{2t} + \beta_{3t} x_{3t} + \beta_4 x_{4t}$ , where $P[Y_t = 1 | \boldsymbol{x}_t, \boldsymbol{\beta}_t] = \frac{\exp(\boldsymbol{x}_t^{\top} \boldsymbol{\beta}_t)}{1 + \exp(\boldsymbol{x}_t^{\top} \boldsymbol{\beta}_t)}, \quad t = 1, 2, \dots, T,$ with $T = 1100$ . Use the function logistic.dma from the dma package to obtain the posterior model probabilities, first setting the forgetting parameter of the models to 0.99, and then to 0.95. Compare the results.
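
The data-generating process described above can be sketched as follows. This Python snippet only simulates the data; the estimation step with logistic.dma is done in R and is not shown. The function name `simulate_dynamic_logit`, the seed, and the exact grid used for $\beta_{2t}$ are assumptions made for illustration.

```python
import math
import random

def simulate_dynamic_logit(T=1100, seed=10101):
    """Simulate y_t, the three regressors, and the coefficient paths."""
    rng = random.Random(seed)
    beta1, beta4 = 0.5, 1.2
    data = []
    for t in range(1, T + 1):
        # Assumption: beta_2t = 1 + (t - 1)/T, the first T points of the
        # grid from 1 to 2 with step 1/T.
        beta2 = 1.0 + (t - 1) / T
        # Structural break halfway through the sample.
        beta3 = -1.0 if t <= 0.5 * T else 0.0
        x2, x3, x4 = (rng.gauss(0.5, 0.8) for _ in range(3))
        eta = beta1 + beta2 * x2 + beta3 * x3 + beta4 * x4
        prob = 1.0 / (1.0 + math.exp(-eta))  # logistic link
        y = 1 if rng.random() < prob else 0
        data.append({"t": t, "y": y, "x2": x2, "x3": x3, "x4": x4,
                     "beta2": beta2, "beta3": beta3, "prob": prob})
    return data

data = simulate_dynamic_logit()
print(len(data), sum(row["y"] for row in data) / len(data))
```

Because $\beta_{3t}$ drops to zero in the second half of the sample, models that include $x_{3t}$ should lose posterior weight there, and a smaller forgetting parameter should make that shift faster.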
Show that

$\mathbb{E}\left[\frac{q(\boldsymbol{\theta}_m)}{\pi(\boldsymbol{\theta}_m | \mathcal{M}_m)p(\mathbf{y} | \boldsymbol{\theta}_m, \mathcal{M}_m)} \bigg\rvert \mathbf{y}, \mathcal{M}_m\right] = \frac{1}{p(\mathbf{y} | \mathcal{M}_m)},$ where the expected value is taken with respect to the posterior distribution of $\boldsymbol{\theta}_m$ given model $\mathcal{M}_m$ , and $q(\boldsymbol{\theta}_m)$ is a proposal density whose support is $\boldsymbol{\Theta}$ .
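
A numerical check can make the identity more concrete. The Python sketch below uses a toy conjugate model chosen for illustration (not from the text): $y_i \sim N(\theta, 1)$ with prior $\theta \sim N(0, 1)$, for which both the posterior and the marginal likelihood are available in closed form. The proposal $q$ is taken to be a normal density centered at the posterior mean with half the posterior variance, which is an assumption made so that the ratio has finite variance.

```python
import math
import random

def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

rng = random.Random(2024)
n = 20
y = [rng.gauss(1.0, 1.0) for _ in range(n)]  # toy data, true theta = 1
sum_y = sum(y)
sum_y2 = sum(v * v for v in y)

# Closed-form posterior and log marginal likelihood for theta ~ N(0, 1).
post_var = 1.0 / (n + 1)
post_mean = sum_y / (n + 1)
log_ml_exact = (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(n + 1)
                - 0.5 * (sum_y2 - sum_y ** 2 / (n + 1)))

def log_lik(theta):
    # Gaussian log likelihood written with sufficient statistics.
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * (sum_y2 - 2 * theta * sum_y + n * theta ** 2))

S = 20000
log_terms = []
for _ in range(S):
    theta = rng.gauss(post_mean, math.sqrt(post_var))  # posterior draw
    log_q = log_normal_pdf(theta, post_mean, 0.5 * post_var)
    log_prior = log_normal_pdf(theta, 0.0, 1.0)
    log_terms.append(log_q - log_prior - log_lik(theta))

# Average the ratios on the log scale (log-sum-exp), then invert.
m = max(log_terms)
log_inv_ml = m + math.log(sum(math.exp(v - m) for v in log_terms) / S)
log_ml_est = -log_inv_ml
print(f"exact log p(y): {log_ml_exact:.4f}  estimate: {log_ml_est:.4f}")
```

The posterior average of the ratio estimates $1/p(\mathbf{y})$, so its negative logarithm should be close to the exact log marginal likelihood up to Monte Carlo error.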

References

Jetter, M., and A. Ramírez Hassan. 2015. “Want Export Diversification? Educate the Kids First.” Economic Inquiry 53 (4): 1765–82.

Ramírez-Hassan, Andrés. 2020. “Dynamic Variable Selection in Dynamic Logistic Regression: An Application to Internet Subscription.” Empirical Economics 59 (2): 909–32.

Ramírez-Hassan, Andrés, and Daniela A. Carvajal-Rendón. 2021. “Specification Uncertainty in Modeling Internet Adoption: A Developing City Case Analysis.” Utilities Policy 70: 101218.

Serna Rodríguez, M., A. Ramírez Hassan, and A. Coad. 2019. “Uncovering Value Drivers of High Performance Soccer Players.” Journal of Sports Economics 20 (6): 819–49.

Wooldridge, Jeffrey M. 2012. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: South-Western, Cengage Learning.