3.2 Conditional distributions

Most interesting problems involve two or more random variables defined on the same probability space. In these situations, we can consider how the variables vary together, or jointly, and study their relationships. The joint distribution of random variables \(X\) and \(Y\) (defined on the same probability space) is a probability distribution on \((x, y)\) pairs, and describes how the values of \(X\) and \(Y\) vary together or jointly.

We can also study the conditional distribution of one random variable given another.

3.2.1 Discrete RVs: Conditional pmf

Definition 2.17 Let \(X\) and \(Y\) be two discrete random variables defined on a probability space with probability measure \(\textrm{P}\). For any fixed \(x\) with \(\textrm{P}(X=x)>0\), the conditional probability mass function (pmf) of \(Y\) given \(X=x\) is a function \(p_{Y|X}:\mathbb{R}\mapsto [0, 1]\) defined by \(p_{Y|X}(y|x)=\textrm{P}(Y=y|X=x)\). \[\begin{align*} p_{Y|X}(y|x) = \textrm{P}(Y=y|X=x) & = \frac{\textrm{P}(X=x,Y=y)}{\textrm{P}(X=x)} = \frac{p_{X,Y}(x,y)}{p_X(x)}& & \text{a function of $y$ for fixed $x$} \end{align*}\]
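The definition can be sketched numerically. The joint pmf below is an illustrative assumption (not from the text); the conditional pmf is simply the joint pmf divided by the marginal, viewed as a function of \(y\) for fixed \(x\).

```python
# A hypothetical joint pmf for two binary random variables X and Y,
# stored as {(x, y): probability}. (Illustrative values, not from the text.)
joint_pmf = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

def marginal_x(x):
    # p_X(x) = sum over y of p_{X,Y}(x, y)
    return sum(p for (xx, _), p in joint_pmf.items() if xx == x)

def conditional_pmf_y_given_x(y, x):
    # p_{Y|X}(y|x) = p_{X,Y}(x, y) / p_X(x), defined when p_X(x) > 0
    return joint_pmf.get((x, y), 0.0) / marginal_x(x)

print(conditional_pmf_y_given_x(0, 1))  # 0.3 / 0.7 ≈ 0.4286
```

As a sanity check, for any fixed \(x\) the conditional pmf sums to 1 over the possible \(y\) values, since the division by \(p_X(x)\) renormalizes the slice of the joint pmf.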

3.2.2 Continuous RVs: Conditional pdf

Definition 2.18 Let \(X\) and \(Y\) be two continuous random variables with joint pdf \(f_{X,Y}\) and marginal pdfs \(f_X, f_Y\). For any fixed \(x\) with \(f_X(x)>0\), the conditional probability density function (pdf) of \(Y\) given \(X=x\) is a function \(f_{Y|X}:\mathbb{R}\mapsto [0, \infty)\) defined by \[\begin{align*} f_{Y|X}(y|x) &= \frac{f_{X,Y}(x,y)}{f_X(x)}& & \text{a function of $y$ for fixed $x$} \end{align*}\]

The conditional pdf of \(Y\) given \(X=x\) can be thought of as the “slice” of the joint pdf of \((X, Y)\) corresponding to \(X=x\), renormalized so that the density heights for different \(y\) values make the total area under the slice equal to 1.

Multiplication rule for pdfs of continuous RVs \[ f_{X,Y}(x,y) = f_{Y|X}(y|x)f_X(x) \]
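As a quick worked illustration of the multiplication rule (the model here is an assumption, not from the text): suppose \(X\) is uniform on \((0,1)\), so \(f_X(x)=1\) for \(0<x<1\), and given \(X=x\), \(Y\) is uniform on \((0,x)\), so \(f_{Y|X}(y|x)=1/x\) for \(0<y<x\). Then \[ f_{X,Y}(x,y) = f_{Y|X}(y|x)\,f_X(x) = \frac{1}{x}\cdot 1 = \frac{1}{x}, \qquad 0<y<x<1. \] Notice that the possible values of \(Y\) depend on the value of \(x\) being conditioned on.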

Be careful when conditioning with continuous random variables.

  • Remember to specify the possible values! And note that conditioning can change the possible values.
  • Remember, for continuous \(X\), \(\textrm{P}(X=x)=0\). When we condition on \(\{X=x\}\) we are really conditioning on \(\{|X-x|<\epsilon\}\) and seeing what happens in the idealized limit when \(\epsilon\to0\).
  • When simulating, never condition on \(\{X=x\}\); rather, condition on \(\{|X-x|<\epsilon\}\) where \(\epsilon\) represents some suitable degree of precision (e.g. \(\epsilon=0.005\) if rounding to two decimal places).
  • Remember pdfs do not return probabilities directly; \(f_{Y|X}(y|x)\) is not a probability of anything.
  • But \(f_{Y|X}(y|x)\) is related to the probability that \(Y\) is “close to” \(y\) given that \(X\) is “close to” \(x\): \[ \textrm{P}(y-\epsilon/2<Y < y+\epsilon/2\; |\; x-\epsilon/2<X < x+\epsilon/2) \approx \epsilon f_{Y|X}(y|x) \]
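The simulation advice above can be sketched as follows: simulate many \((X, Y)\) pairs and keep only those repetitions where \(X\) is within \(\epsilon\) of the target value. The model below (\(X\) uniform on \((0,1)\) and \(Y = X + U\) with \(U\) an independent uniform) is an illustrative assumption, not from the text.

```python
import numpy as np

# Conditioning by simulation: never condition on {X = x} exactly;
# instead keep repetitions with |X - x| < eps.
rng = np.random.default_rng(42)
n = 1_000_000
X = rng.uniform(0, 1, n)
Y = X + rng.uniform(0, 1, n)   # illustrative model: Y = X + U, U ~ Uniform(0, 1)

x, eps = 0.5, 0.005            # condition on |X - 0.5| < 0.005
keep = np.abs(X - x) < eps
y_given_x = Y[keep]

# Given X = 0.5, Y is approximately Uniform(0.5, 1.5), so the
# simulated conditional mean should be close to 1
print(len(y_given_x), y_given_x.mean())
```

Shrinking \(\epsilon\) sharpens the approximation of the idealized limit but leaves fewer repetitions to work with, so in practice \(\epsilon\) is chosen to match a suitable degree of precision.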

  1. We mostly focus on the case of two random variables, but analogous definitions and concepts apply for more than two (though the notation can get a bit messier).