1.3 Bayesian reports: Decision theory under uncertainty

The Bayesian framework allows reporting the full posterior distribution. However, some situations demand reporting a specific value of the posterior distribution (point estimate), an informative interval (set), point or interval predictions, and/or selecting a specific model. Decision theory offers an elegant framework to decide which posterior values are optimal to report (J. O. Berger 2013).

The point of departure is a loss function, a non-negative real-valued function whose arguments are the unknown state of nature ($\Theta$) and the set of actions to be taken ($\mathcal{A}$), that is, $L(\theta,a):\Theta\times\mathcal{A}\rightarrow\mathbb{R}^{+}$.

This function is a mathematical expression of the loss incurred from making mistakes, in particular, from selecting action $a\in\mathcal{A}$ when $\theta\in\Theta$ is the true state of nature. In our case, the unknown state of nature can be population parameters, functions of them, future or unknown realizations of the data generating process, models, etc.

From a Bayesian perspective, we should choose the action ($\delta(y)$) that minimizes the posterior expected loss, that is, the posterior risk function $E[L(\theta,a)\mid y]$,

$$\delta(y)=\underset{a\in\mathcal{A}}{\mathrm{argmin}}\ E[L(\theta,a)\mid y],$$

where $E[L(\theta,a)\mid y]=\int_{\Theta}L(\theta,a)\pi(\theta\mid y)d\theta$.¹
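As a quick numerical illustration (not part of the original derivation), the posterior expected loss can be approximated by Monte Carlo using draws from $\pi(\theta\mid y)$ and then minimized over a grid of candidate actions. The Normal posterior and all variable names below are hypothetical choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws theta_s ~ pi(theta | y); a Normal(1, 0.5^2)
# posterior is assumed purely for illustration.
theta_draws = rng.normal(loc=1.0, scale=0.5, size=100_000)

def posterior_risk(action, loss, draws):
    """Monte Carlo estimate of the posterior expected loss E[L(theta, a) | y]."""
    return loss(draws, action).mean()

# Squared error loss: the optimal action should be close to the posterior mean.
squared_loss = lambda theta, a: (theta - a) ** 2

actions = np.linspace(-1.0, 3.0, 401)  # grid of candidate actions
risks = [posterior_risk(a, squared_loss, theta_draws) for a in actions]
print("Bayes action:  ", actions[np.argmin(risks)])  # approximately 1
print("posterior mean:", theta_draws.mean())         # approximately 1
```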

Obviously, different loss functions imply different optimal decisions. We illustrate this assuming $\theta\in\mathbb{R}$.

  • $L(\theta,a)=[\theta-a]^{2}$, then

$$E[\theta\mid y]=\underset{a\in\mathcal{A}}{\mathrm{argmin}}\int_{\Theta}[\theta-a]^{2}\pi(\theta\mid y)d\theta.$$

Using the first-order condition with respect to $a$, and interchanging differentiation and integration, we get that the posterior mean is the optimal Bayesian action, that is, $\delta(y)=E[\theta\mid y]$.

  • $L(\theta,a)=w(\theta)[\theta-a]^{2}$, where $w(\theta)>0$ is a weighting function. Following the same steps as in the previous result, we have $\delta(y)=\frac{E[w(\theta)\times\theta\mid y]}{E[w(\theta)\mid y]}$; that is, the optimal Bayesian action is a weighted average driven by $w(\theta)$.

  • $L(\theta,a)=|\theta-a|$, then we have to find $\delta(y)=\underset{a\in\mathcal{A}}{\mathrm{argmin}}\int_{\Theta}|\theta-a|\pi(\theta\mid y)d\theta$. This requires $\int_{-\infty}^{a}\pi(\theta\mid y)d\theta=1/2$; that is, $\delta(y)$ is the posterior median (exercise).

  • Given the loss function,

$$L(\theta,a)=\begin{cases}K_{0}(\theta-a), & \theta-a\geq 0\\ K_{1}(a-\theta), & \theta-a<0,\end{cases}$$

then,

$$E[L(\theta,a)\mid y]=\int_{-\infty}^{a}K_{1}(a-\theta)\pi(\theta\mid y)d\theta+\int_{a}^{\infty}K_{0}(\theta-a)\pi(\theta\mid y)d\theta.$$

Differentiating with respect to $a$ and setting the result equal to zero,

$$K_{1}\int_{-\infty}^{a}\pi(\theta\mid y)d\theta-K_{0}\int_{a}^{\infty}\pi(\theta\mid y)d\theta=0,$$

then $\int_{-\infty}^{a}\pi(\theta\mid y)d\theta=\frac{K_{0}}{K_{0}+K_{1}}$; that is, any $K_{0}/(K_{0}+K_{1})$-quantile of $\pi(\theta\mid y)$ is an optimal Bayesian estimate of $\theta$ (see the numerical sketch after this list).
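The following sketch (our own illustration, using an arbitrary Gamma posterior approximated by draws) verifies the three results above: the posterior mean, the posterior median, and the $K_{0}/(K_{0}+K_{1})$-quantile are the optimal point estimates under the quadratic, absolute, and asymmetric losses, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative skewed posterior (Gamma draws), so mean, median and quantiles differ.
theta_draws = rng.gamma(shape=2.0, scale=1.0, size=100_000)

# Quadratic loss  L = (theta - a)^2  -> posterior mean
print("posterior mean:  ", theta_draws.mean())

# Absolute loss   L = |theta - a|    -> posterior median
print("posterior median:", np.median(theta_draws))

# Asymmetric loss with weights K0 (theta >= a) and K1 (theta < a)
# -> K0 / (K0 + K1) posterior quantile
K0, K1 = 3.0, 1.0
print("0.75-quantile:   ", np.quantile(theta_draws, K0 / (K0 + K1)))
```

Minimizing the Monte Carlo posterior risk over a grid, as in the earlier sketch, recovers these same values.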

We can also use decision theory under uncertainty in hypothesis testing. In particular, when testing $H_{0}:\theta\in\Theta_{0}$ versus $H_{1}:\theta\in\Theta_{1}$, where $\Theta=\Theta_{0}\cup\Theta_{1}$ and $\emptyset=\Theta_{0}\cap\Theta_{1}$, there are two actions of interest, $a_{0}$ and $a_{1}$, where $a_{j}$ denotes not rejecting $H_{j}$, $j\in\{0,1\}$. Consider the loss function

$$L(\theta,a_{j})=\begin{cases}0, & \theta\in\Theta_{j}\\ K_{j}, & \theta\in\Theta_{i},\ i\neq j.\end{cases}$$

The posterior expected loss associated with $a_{j}$ is $K_{j}P(\Theta_{i}\mid y)$, $i\neq j$. Therefore, the optimal Bayesian decision is the one with the smallest posterior expected loss; that is, the null hypothesis is rejected ($a_{1}$ is not rejected) when $K_{0}P(\Theta_{1}\mid y)>K_{1}P(\Theta_{0}\mid y)$. Given our framework ($\Theta=\Theta_{0}\cup\Theta_{1}$, $\emptyset=\Theta_{0}\cap\Theta_{1}$), we have $P(\Theta_{0}\mid y)=1-P(\Theta_{1}\mid y)$, and as a consequence $H_{0}$ is rejected when $P(\Theta_{1}\mid y)>\frac{K_{1}}{K_{1}+K_{0}}$; that is, the rejection region of the Bayesian test is $R=\left\{y:P(\Theta_{1}\mid y)>\frac{K_{1}}{K_{1}+K_{0}}\right\}$.
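As an illustration (with a hypothetical posterior and an arbitrary partition of $\Theta$), the rejection rule can be evaluated directly from posterior draws:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical posterior draws for theta.
theta_draws = rng.normal(loc=0.3, scale=0.5, size=100_000)

# Illustrative partition: H0: theta <= 0 versus H1: theta > 0.
prob_H1 = (theta_draws > 0).mean()   # Monte Carlo estimate of P(Theta_1 | y)

# K0: loss of not rejecting H0 when H1 holds; K1: loss of not rejecting H1 when H0 holds.
K0, K1 = 1.0, 2.0
reject_H0 = prob_H1 > K1 / (K1 + K0)
print(f"P(Theta_1 | y) = {prob_H1:.3f}, reject H0: {reject_H0}")
```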

Decision theory also helps to construct interval (region) estimates. Let $\Theta_{C}(y)\subset\Theta$ be a credible set for $\theta$, and $L(\theta,\Theta_{C}(y))=1-\mathbb{1}\{\theta\in\Theta_{C}(y)\}$, where

$$\mathbb{1}\{\theta\in\Theta_{C}(y)\}=\begin{cases}1, & \theta\in\Theta_{C}(y)\\ 0, & \theta\notin\Theta_{C}(y).\end{cases}$$

Then,

$$L(\theta,\Theta_{C}(y))=\begin{cases}0, & \theta\in\Theta_{C}(y)\\ 1, & \theta\notin\Theta_{C}(y).\end{cases}$$

Then, the risk function is $1-P(\theta\in\Theta_{C}(y)\mid y)$.

Given a measure of credibility ($\alpha(y)$) that defines the level of trust that $\theta\in\Theta_{C}(y)$, we can measure the accuracy of the report by $L(\theta,\alpha(y))=(\mathbb{1}\{\theta\in\Theta_{C}(y)\}-\alpha(y))^{2}$. This loss function can be used to suggest a choice of the report $\alpha(y)$. The optimal Bayesian action is $\alpha(y)=P(\theta\in\Theta_{C}(y)\mid y)$, which can be calculated from the posterior distribution, that is, $P(\theta\in\Theta_{C}(y)\mid y)=\int_{\Theta_{C}(y)}\pi(\theta\mid y)d\theta$. This is a measure of the belief that $\theta\in\Theta_{C}(y)$ given the prior beliefs and sample information.
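To make the optimality of this report explicit (a standard argument, not spelled out in the text above): since the indicator equals its own square,

$$E[L(\theta,\alpha(y))\mid y]=P(\theta\in\Theta_{C}(y)\mid y)-2\alpha(y)P(\theta\in\Theta_{C}(y)\mid y)+\alpha(y)^{2},$$

and setting the derivative with respect to $\alpha(y)$ equal to zero yields $\alpha(y)=P(\theta\in\Theta_{C}(y)\mid y)$.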

The set $\Theta_{C}(y)\subset\Theta$ is a $100(1-\alpha)\%$ credible set with respect to $\pi(\theta\mid y)$ if $P(\theta\in\Theta_{C}(y)\mid y)=\int_{\Theta_{C}(y)}\pi(\theta\mid y)d\theta=1-\alpha$.

The $100(1-\alpha)\%$ highest posterior density (HPD) set for $\theta$ is a $100(1-\alpha)\%$ credible set for $\theta$ with the property that it occupies a smaller space than any other $100(1-\alpha)\%$ credible set for $\theta$. That is, $C(y)=\{\theta:\pi(\theta\mid y)\geq k(\alpha)\}$, where $k(\alpha)$ is the largest number such that $\int_{\{\theta:\pi(\theta\mid y)\geq k(\alpha)\}}\pi(\theta\mid y)d\theta=1-\alpha$. HPD sets can be collections of disjoint intervals when working with multimodal posterior densities. In addition, they have the limitation of not necessarily being invariant under transformations.
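A common Monte Carlo approximation of an HPD interval takes the shortest interval containing $100(1-\alpha)\%$ of the posterior draws. The sketch below is our own illustration under the assumption of a unimodal posterior; it does not handle the multimodal, disjoint case mentioned above.

```python
import numpy as np

def hpd_interval(draws, cred_mass=0.95):
    """Shortest interval containing `cred_mass` of the posterior draws.
    A valid HPD approximation only for unimodal posteriors."""
    sorted_draws = np.sort(draws)
    n = len(sorted_draws)
    n_in = int(np.floor(cred_mass * n))
    widths = sorted_draws[n_in:] - sorted_draws[: n - n_in]
    j = int(np.argmin(widths))
    return sorted_draws[j], sorted_draws[j + n_in]

rng = np.random.default_rng(0)
theta_draws = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # illustrative posterior
print(hpd_interval(theta_draws, cred_mass=0.95))
```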

Finally, decision theory can be used to perform prediction (point, set, or probabilistic). Suppose that one has a loss function $L(Y_{0},a)$ involving the prediction of $Y_{0}$. Then, $L(\theta,a)=E_{Y_{0}\mid\theta}[L(Y_{0},a)]=\int_{\mathcal{Y}_{0}}L(y_{0},a)g(y_{0}\mid\theta)dy_{0}$, where $g(y_{0}\mid\theta)$ is the density function of $Y_{0}$.

Predictive exercises can be based on the predictive density $\pi(Y_{0}\mid y)$. The predictive density can then be used to obtain a point prediction given a loss function $L(Y_{0},y_{0})$, where $y_{0}$ is a point prediction for $Y_{0}$. We seek the $y_{0}$ that minimizes the mathematical expectation of the loss function, taken with respect to the predictive density.
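For instance, when draws from the predictive density are available (here built by compositional sampling from a hypothetical Normal model with known variance), the optimal point predictions under quadratic and absolute loss are the predictive mean and the predictive median, respectively:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws for the mean of a Normal DGP with known sigma = 1.
theta_draws = rng.normal(loc=1.0, scale=0.3, size=100_000)

# Compositional sampling from the predictive density pi(Y0 | y):
# draw theta from the posterior, then Y0 | theta from g(y0 | theta).
y0_draws = rng.normal(loc=theta_draws, scale=1.0)

print("quadratic loss -> predictive mean:  ", y0_draws.mean())
print("absolute loss  -> predictive median:", np.median(y0_draws))
```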

Another approach is to use scoring rules to assess the quality of predictive (probabilistic) forecasts. A scoring rule assigns a numerical score based on the predictive distribution and on the event that realizes (Gneiting and Raftery 2007). We can then use decision theory to define the most relevant scoring rule for the problem at hand, such that a high ordinate is assigned to the realized value (calibration). In addition, it is possible to add a reward for accuracy in specific parts of the support of the density function (sharpness) (Diks, Panchenko, and Dijk 2011).
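As a sketch of how such scores can be computed in practice, given draws from the predictive density and one realized value (the kernel density estimate and the permutation-based CRPS estimator are our own illustrative choices):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical draws from the predictive density pi(Y0 | y) and a realized value.
y0_draws = rng.normal(loc=1.0, scale=1.05, size=10_000)
y0_obs = 0.4

# Logarithmic score: log predictive ordinate at the realization (density via a KDE).
log_score = np.log(gaussian_kde(y0_draws)(y0_obs)[0])

# CRPS estimated from draws: E|Y0 - y0_obs| - 0.5 * E|Y0 - Y0'|.
crps = (np.abs(y0_draws - y0_obs).mean()
        - 0.5 * np.abs(y0_draws - rng.permutation(y0_draws)).mean())

print(f"log score: {log_score:.3f}, CRPS: {crps:.3f}")
```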


  1. Chernozhukov and Hong (2003) propose Laplace-type estimators (LTE) based on the quasi-posterior $p(\theta)=\frac{\exp\{L_{n}(\theta)\}\pi(\theta)}{\int_{\Theta}\exp\{L_{n}(\theta)\}\pi(\theta)d\theta}$, where $L_{n}(\theta)$ is not necessarily a log-likelihood function. The LTE minimizes the quasi-posterior risk.