Chapter 1 Probability and Inference

Bayes’ Rule: If $A$ and $B$ are events in event space $\mathcal{F}$, then Bayes’ rule states that
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}.$$
Let

  • y be the data we will collect from an experiment,
  • K be everything we know for certain about the world (aside from y), and
  • θ be anything we don’t know for certain.

A Bayesian statistician is an individual who makes decisions based on the probability distribution of those things we don’t know conditional on what we know, i.e. $p(\theta \mid y, K)$.
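As a quick numerical illustration of Bayes’ rule above, here is a minimal sketch in Python; the disease prevalence, sensitivity, and false-positive rate are made-up numbers used only for illustration.

```python
# Bayes' rule for a hypothetical diagnostic test:
# A = "has disease", B = "test is positive" (all numbers are illustrative).
p_A = 0.01           # P(A): prior probability of disease
p_B_given_A = 0.95   # P(B | A): sensitivity
p_B_given_Ac = 0.05  # P(B | A^c): false-positive rate

# Law of total probability for the denominator P(B)
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Bayes' rule: P(A | B) = P(B | A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # approximately 0.161
```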

  • Parameter estimation: $p(\theta \mid y, M)$, where $M$ is the model with parameter vector $\theta$
  • Hypothesis testing (model selection): $p(M_j \mid y)$, where the $M_j$ are competing models
  • Prediction: $p(\tilde{y} \mid y, M)$
Parameter Estimation Example: exponential model

Let $Y \mid \theta \sim Exp(\theta)$, so the likelihood is $p(y \mid \theta) = \theta\, e^{-\theta y}$. Let’s assume a prior $\theta \sim Ga(a, b)$ with density $p(\theta) = \frac{b^a}{\Gamma(a)}\theta^{a-1}e^{-b\theta}$. Then the prior predictive distribution is
$$p(y) = \int p(y \mid \theta)\,p(\theta)\,d\theta = \frac{b^a}{\Gamma(a)}\,\frac{\Gamma(a+1)}{(b+y)^{a+1}}.$$
The posterior is
$$p(\theta \mid y) = \frac{p(y \mid \theta)\,p(\theta)}{p(y)} = \frac{(b+y)^{a+1}}{\Gamma(a+1)}\,\theta^{(a+1)-1}e^{-(b+y)\theta},$$
thus $\theta \mid y \sim Ga(a+1, b+y)$.
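A minimal sketch of this conjugate update in Python, checking the closed-form prior predictive against numerical integration; the values of $a$, $b$, and $y$ are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad
from scipy.special import gamma as gamma_fn

a, b = 2.0, 3.0  # Ga(a, b) prior (shape a, rate b); arbitrary values
y = 1.5          # a single observed data point

# Prior predictive: p(y) = b^a / Gamma(a) * Gamma(a+1) / (b+y)^(a+1)
p_y = b**a / gamma_fn(a) * gamma_fn(a + 1) / (b + y)**(a + 1)

# Check against numerical integration of p(y|theta) p(theta) d theta
prior = stats.gamma(a, scale=1 / b)
p_y_numeric, _ = quad(lambda th: th * np.exp(-th * y) * prior.pdf(th), 0, np.inf)
print(p_y, p_y_numeric)  # the two values should agree

# Posterior: theta | y ~ Ga(a+1, b+y)
posterior = stats.gamma(a + 1, scale=1 / (b + y))
print(posterior.mean())  # (a+1)/(b+y)
```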

If $p(y) < \infty$, we can use $p(\theta \mid y) \propto p(y \mid \theta)\,p(\theta)$ to find the posterior. In the example, $\theta^{a}e^{-(b+y)\theta}$ is the kernel of a $Ga(a+1, b+y)$ distribution.
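To make the kernel argument concrete, one can normalize the unnormalized posterior numerically on a grid and compare it to the $Ga(a+1, b+y)$ density; a minimal sketch, reusing the same illustrative values of $a$, $b$, and $y$ as above.

```python
import numpy as np
from scipy import stats

a, b, y = 2.0, 3.0, 1.5                # same illustrative values as before
theta = np.linspace(0.0, 10.0, 10_001)  # grid over theta

# Unnormalized posterior: the kernel theta^a * exp(-(b+y) * theta)
kernel = theta**a * np.exp(-(b + y) * theta)

# Normalize numerically (trapezoidal rule) and compare to the exact Ga(a+1, b+y) pdf
post_grid = kernel / np.trapz(kernel, theta)
post_exact = stats.gamma(a + 1, scale=1 / (b + y)).pdf(theta)
print(np.max(np.abs(post_grid - post_exact)))  # close to 0
```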

Bayesian learning: $p(\theta) \to p(\theta \mid y_1) \to p(\theta \mid y_1, y_2) \to \cdots$
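A sketch of this sequential updating for the exponential-gamma model, checking that updating one observation at a time gives the same posterior as updating with all the data at once; the hyperparameters and observations are illustrative.

```python
a, b = 2.0, 3.0        # Ga(a, b) prior; illustrative values
ys = [1.5, 0.7, 2.2]   # y1, y2, y3: illustrative observations

# Sequential updating: each posterior becomes the prior for the next observation
a_seq, b_seq = a, b
for y in ys:
    a_seq, b_seq = a_seq + 1, b_seq + y

# Batch updating with all data at once: Ga(a + n, b + sum(y))
a_batch, b_batch = a + len(ys), b + sum(ys)
print((a_seq, b_seq) == (a_batch, b_batch))  # True
```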

Model selection

Formally, to select a model, we use $p(M_j \mid y) \propto p(y \mid M_j)\,p(M_j)$. Thus, a Bayesian approach provides a natural way to learn about models, i.e. $p(M_j) \to p(M_j \mid y)$.
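A minimal sketch of this idea for two candidate models that differ only in their Gamma priors on the exponential rate; the marginal likelihood $p(y \mid M_j) = \frac{b^a}{\Gamma(a)}\frac{\Gamma(a+n)}{(b+\sum y_i)^{a+n}}$ follows from the same integral as the prior predictive above, and the data, priors, and prior model probabilities are all illustrative.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_exp_gamma(y, a, b):
    """log p(y | M) for iid Exp(theta) data with a Ga(a, b) prior on theta:
    p(y | M) = b^a / Gamma(a) * Gamma(a+n) / (b + sum(y))^(a+n)."""
    y = np.asarray(y)
    n, s = y.size, y.sum()
    return a * np.log(b) - gammaln(a) + gammaln(a + n) - (a + n) * np.log(b + s)

y = np.array([1.5, 0.7, 2.2, 1.1])            # illustrative data
models = {"M1: Ga(1, 1) prior": (1.0, 1.0),   # two hypothetical candidate priors
          "M2: Ga(5, 1) prior": (5.0, 1.0)}
prior_M = 0.5                                  # equal prior model probabilities

log_m = np.array([log_marginal_exp_gamma(y, *ab) for ab in models.values()])
# p(M_j | y) proportional to p(y | M_j) p(M_j); normalize over the two models
w = prior_M * np.exp(log_m)
post_M = w / w.sum()
print(dict(zip(models.keys(), post_M)))
```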

Prediction

$$p(\tilde{y} \mid y) = \int p(\tilde{y}, \theta \mid y)\,d\theta = \int p(\tilde{y} \mid \theta)\,p(\theta \mid y)\,d\theta.$$
From the previous example, let $y_i \stackrel{iid}{\sim} Exp(\theta)$ for $i = 1, \dots, n$ and $\theta \sim Ga(a, b)$, so that the posterior is $\theta \mid y \sim Ga(a+n, b+n\bar{y})$. Then
$$
\begin{aligned}
p(\tilde{y} \mid y) &= \int p(\tilde{y} \mid \theta)\,p(\theta \mid y)\,d\theta
= \int \theta e^{-\theta\tilde{y}}\,\frac{(b+n\bar{y})^{a+n}}{\Gamma(a+n)}\,\theta^{a+n-1}e^{-\theta(b+n\bar{y})}\,d\theta \\
&= \frac{(b+n\bar{y})^{a+n}}{\Gamma(a+n)}\int \theta^{(a+n+1)-1}e^{-\theta(b+n\bar{y}+\tilde{y})}\,d\theta
= \frac{(b+n\bar{y})^{a+n}}{\Gamma(a+n)}\,\frac{\Gamma(a+n+1)}{(b+n\bar{y}+\tilde{y})^{a+n+1}} \\
&= \frac{(a+n)(b+n\bar{y})^{a+n}}{(\tilde{y}+b+n\bar{y})^{a+n+1}},
\end{aligned}
$$
which is the Lomax distribution for $\tilde{y}$ with parameters $a+n$ and $b+n\bar{y}$.
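A sketch checking this closed-form predictive density against Monte Carlo simulation (draw $\theta$ from the posterior, then $\tilde{y} \sim Exp(\theta)$); the data and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 3.0                       # Ga(a, b) prior; illustrative values
y = np.array([1.5, 0.7, 2.2, 1.1])    # illustrative data
n, ybar = y.size, y.mean()

# Closed form: p(ytilde | y) = (a+n)(b+n*ybar)^(a+n) / (ytilde + b + n*ybar)^(a+n+1)
def predictive_pdf(yt):
    return (a + n) * (b + n * ybar)**(a + n) / (yt + b + n * ybar)**(a + n + 1)

# Monte Carlo: theta ~ Ga(a+n, b+n*ybar), then ytilde ~ Exp(theta)
theta = rng.gamma(a + n, 1 / (b + n * ybar), size=200_000)
ytilde = rng.exponential(1 / theta)

# Compare the Monte Carlo mean with the Lomax mean (b+n*ybar)/(a+n-1)
print(ytilde.mean(), (b + n * ybar) / (a + n - 1))
```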

Probability: A subjective probability describes an individual’s personal judgement about how likely a particular event is to occur.

Rational individuals can differ about the probability of an event by having different knowledge, i.e. $P(E \mid K_1) \neq P(E \mid K_2)$. But given enough data, we might have $P(E \mid K_1, y) \approx P(E \mid K_2, y)$.
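A sketch of this convergence for the exponential-gamma model: two analysts with different Gamma priors, standing in for different background knowledge $K_1$ and $K_2$, end up with nearly the same posterior mean once enough data arrive; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.exponential(scale=1 / 2.0, size=1_000)  # illustrative data with true rate 2
n, s = y.size, y.sum()

# Two different Ga(a, b) priors standing in for knowledge K1 and K2
for a, b in [(1.0, 1.0), (20.0, 2.0)]:
    post_mean = (a + n) / (b + s)  # posterior mean of theta under Ga(a+n, b+s)
    print(f"prior Ga({a}, {b}): posterior mean = {post_mean:.3f}")
# Both posterior means are close to the true rate 2 despite the different priors.
```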