4.5 Distributions of transformations of random variables

Recall that a function of a random variable is also a random variable. If X is a random variable, then Y=g(X) is also a random variable and so it has a probability distribution. Unless g represents a linear rescaling, a transformation will change the shape of the distribution. So the question is: what is the distribution of g(X)? We’ll focus on transformations of continuous random variables, in which case the key to answering the question is to work with cdfs.

Example 4.22 Recall the example in Section 3.8.2. Let U be a random variable with a Uniform(0, 1) distribution, and let X = -\log(1-U). We approximated the distribution of X via simulation, and we saw some example calculations in Example 3.15 justifying why the distribution looks like it does. Now we will derive the pdf of X.

  1. Identify the possible values of X. (We have done this already, but this should always be your first step.)
  2. Let F_X denote the cdf of X. Find F_X(1).
  3. Find F_X(2).
  4. Find the cdf F_X(x).
  5. Find the pdf f_X(x).
  6. Why should we not be surprised that X = -\log(1-U) has cdf F_X(x) = 1 - e^{-x}? Hint: what is the function u \mapsto -\log(1-u) in this case?
Solution. to Example 4.22

Show/hide solution

  1. As always, first determine the range of possible values. When u = 0, -\log(1-u) = 0, and as u approaches 1, -\log(1-u) approaches \infty; see the picture of the function below. So X takes values in [0, \infty).
  2. F_X(1) = \textrm{P}(X \le 1), a probability statement involving X. Since we know the distribution of U, we express the event \{X \le 1\} as an equivalent event involving U.
    \{X \le 1\} = \{-\log(1-U) \le 1\} = \{U \le 1 - e^{-1}\} The above follows since -\log(1-u) \le 1 if and only if u \le 1 - e^{-1}; see Figure 4.9 below. Therefore F_X(1) = \textrm{P}(X \le 1) = \textrm{P}(-\log(1-U) \le 1) = \textrm{P}(U \le 1 - e^{-1}) Now since U has a Uniform(0, 1) distribution, \textrm{P}(U \le u) = u for 0 < u < 1. The value 1 - e^{-1} \approx 0.632 is just a number between 0 and 1, so \textrm{P}(U \le 1 - e^{-1}) = 1 - e^{-1} \approx 0.632. Therefore F_X(1) = 1 - e^{-1} \approx 0.632. (This is represented in Figure 3.11 by the area of the region from 0 to 1, 63.2%.)
  3. Similar to the previous part, F_X(2) = \textrm{P}(X \le 2) = \textrm{P}(-\log(1-U) \le 2) = \textrm{P}(U \le 1 - e^{-2}) = 1 - e^{-2} \approx 0.865
  4. As suggested in the paragraph before the example, the key to finding the pdf is to work with cdfs. We basically repeat the calculation in the previous steps, but for a generic x instead of 1 or 2. Consider 0 \le x < \infty; we wish to find the cdf evaluated at x. F_X(x) = \textrm{P}(X \le x) = \textrm{P}(-\log(1-U) \le x) = \textrm{P}(U \le 1 - e^{-x}) = 1 - e^{-x} The above follows since, for 0 < x < \infty, -\log(1-u) \le x if and only if u \le 1 - e^{-x}; see Figure 4.9 below (illustrated for x = 1). Now since U has a Uniform(0, 1) distribution, \textrm{P}(U \le u) = u for 0 < u < 1. For a fixed 0 < x < \infty, the value 1 - e^{-x} is just a number between 0 and 1, so \textrm{P}(U \le 1 - e^{-x}) = 1 - e^{-x}. Therefore F_X(x) = 1 - e^{-x}, \quad 0 < x < \infty.
  5. Differentiate the cdf with respect to x to find the pdf. f_X(x) = F_X'(x) = \frac{d}{dx}\left(1 - e^{-x}\right) = e^{-x}, \quad 0 < x < \infty Thus we see that X has the pdf in Example 4.14.
  6. The function Q_X(u) = -\log(1-u) is the quantile function (inverse cdf) corresponding to the cdf F_X(x) = 1 - e^{-x}. Therefore, since U has a Uniform(0, 1) distribution, the random variable Q_X(U) will have cdf F_X by universality of the Uniform.
A plot of the function \(u\mapsto -\log(1-u)\). The dotted lines illustrate that \(-\log(1-u)\le 1\) if and only if \(u\le 1-e^{-1}\approx 0.632\).

Figure 4.9: A plot of the function u \mapsto -\log(1-u). The dotted lines illustrate that -\log(1-u) \le 1 if and only if u \le 1 - e^{-1} \approx 0.632.
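Since we now have an explicit cdf, the derivation can be sanity-checked by simulation. Below is a minimal numpy sketch (separate from the book's Symbulate code; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed

# Simulate X = -log(1 - U) for U ~ Uniform(0, 1)
u = rng.uniform(0, 1, size=100_000)
x = -np.log(1 - u)

# The empirical cdf should match F_X(t) = 1 - e^{-t}
for t in [1, 2, 3]:
    print(t, np.mean(x <= t), 1 - np.exp(-t))
```

The empirical proportions agree with 1 - e^{-t} to within simulation error.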

If X is a continuous random variable whose distribution is known, the cdf method can be used to find the pdf of Y = g(X):

  • Determine the possible values of Y. Let y represent a generic possible value of Y.
  • The cdf of Y is F_Y(y) = \textrm{P}(Y \le y) = \textrm{P}(g(X) \le y).
  • Rearrange \{g(X) \le y\} to get an event involving X. Warning: it is not always \{X \le g^{-1}(y)\}. Sketching a picture of the function g helps.
  • Obtain an expression for the cdf of Y which involves F_X and some transformation of the value y.
  • Differentiate the expression for F_Y(y) with respect to y, and use what is known about F_X' = f_X, to obtain the pdf of Y. You will typically need to apply the chain rule when differentiating.

You will need to use information about X at some point in the last step above. You can either:

  • Plug in the cdf of X and then differentiate with respect to y.
  • Differentiate with respect to y and then plug in the pdf of X.

Either way gets you to the correct answer, but depending on the problem one way might be easier than the other. We’ll illustrate both methods in the next example.
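As a sketch of the recipe in code (using numpy, with a transformation chosen only for illustration): let X have a Uniform(0, 1) distribution and Y = X^2. Since X is nonnegative here, the event \{X^2 \le y\} rearranges to \{X \le \sqrt{y}\}, so F_Y(y) = F_X(\sqrt{y}) = \sqrt{y}, and differentiating the cdf, done numerically below to mimic the last step, recovers the pdf.

```python
import numpy as np

# cdf method for Y = X^2 with X ~ Uniform(0, 1):
# {X^2 <= y} = {X <= sqrt(y)} because X >= 0, so F_Y(y) = F_X(sqrt(y)) = sqrt(y)
def F_Y(y):
    return np.sqrt(y)

# differentiate the cdf (numerically, via a central difference) to get the pdf
y0, h = 0.25, 1e-6
pdf_at_y0 = (F_Y(y0 + h) - F_Y(y0 - h)) / (2 * h)
print(pdf_at_y0)  # close to 1 / (2 * sqrt(0.25)) = 1.0
```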

Example 4.23 Let X be a random variable with a Uniform(-1, 1) distribution and let Y = X^2.

  1. Sketch the pdf of Y.
  2. Run a simulation to approximate the pdf of Y.
  3. Find F_Y(0.49).
  4. Use the cdf method to find the pdf of Y. Is the pdf consistent with your simulation results?
Solution. to Example 4.23

Show/hide solution

  1. First the possible values: since -1 < X < 1 we have 0 < Y < 1. The idea of the sketch is that squaring a number less than 1 in absolute value returns a smaller number, so the transformation "pushes values towards 0", making the density higher near 0. Consider the intervals [0, 0.1] and [0.9, 1] on the original scale; both intervals have probability 0.05 under the Uniform(-1, 1) distribution. On the squared scale, these intervals correspond to [0, 0.01] and [0.81, 1] respectively. So the 0.05 probability is "squished" into [0, 0.01], resulting in a greater height, while it is "spread out" over [0.81, 1], resulting in a smaller height. Remember: probability is represented by area.

  2. See the simulation results below. We see that the density is highest near 0 and lowest near 1.

  3. Since X can take negative values, we have to be careful; see Figure 4.10 below. \{Y \le 0.49\} = \{X^2 \le 0.49\} = \{-0.7 \le X \le 0.7\} Therefore, since X has a Uniform(-1, 1) distribution, F_Y(0.49) = \textrm{P}(Y \le 0.49) = \textrm{P}(-0.7 \le X \le 0.7) = \frac{1.4}{2} = 0.7

  4. Fix 0 < y < 1. We now do the same calculation as in the previous part in terms of a generic y, but it often helps to think of y as a particular number first. F_Y(y) = \textrm{P}(Y \le y) = \textrm{P}(X^2 \le y) = \textrm{P}(-\sqrt{y} \le X \le \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}) Note that the event of interest is not just \{X \le \sqrt{y}\}; see Figure 4.10 below. From here we can either

    use the cdf of X and then differentiate, or differentiate and then use the pdf of X. We’ll illustrate both.

    1. Using the Uniform(-1, 1) cdf: the interval [-\sqrt{y}, \sqrt{y}] has length 2\sqrt{y}, and the total length of [-1, 1] is 2, so we have F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y}) = \frac{2\sqrt{y}}{2} = \sqrt{y} Now differentiate with respect to the argument y to obtain f_Y(y) = \frac{1}{2\sqrt{y}}, \quad 0 < y < 1.
    2. Differentiate both sides of F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y}) with respect to y. Differentiating the cdf F_Y yields its pdf f_Y, and differentiating the cdf F_X yields its pdf f_X. But don't forget to use the chain rule when differentiating F_X(\pm\sqrt{y}). F_Y(y) = F_X(\sqrt{y}) - F_X(-\sqrt{y}) \frac{d}{dy} F_Y(y) = \frac{d}{dy}\left(F_X(\sqrt{y}) - F_X(-\sqrt{y})\right) f_Y(y) = f_X(\sqrt{y})\frac{1}{2\sqrt{y}} - f_X(-\sqrt{y})\left(-\frac{1}{2\sqrt{y}}\right) = \frac{1}{2\sqrt{y}}\left(f_X(\sqrt{y}) + f_X(-\sqrt{y})\right) Since X has a Uniform(-1, 1) distribution, its pdf is f_X(x) = 1/2, -1 < x < 1. But for 0 < y < 1, \sqrt{y} and -\sqrt{y} are just numbers in [-1, 1], so f_X(\sqrt{y}) = 1/2 and f_X(-\sqrt{y}) = 1/2. Therefore, f_Y(y) = \frac{1}{2\sqrt{y}}, \quad 0 < y < 1. The histogram of simulated values seems consistent with this shape. (The density blows up at 0.)
A plot of the function \(x\mapsto x^2\) for \(-1<x<1\). The dotted lines illustrate that \(x^2\le 0.49\) if and only if \(-\sqrt{0.49}\le x\le \sqrt{0.49}\).

Figure 4.10: A plot of the function x \mapsto x^2 for -1 < x < 1. The dotted lines illustrate that x^2 \le 0.49 if and only if -\sqrt{0.49} \le x \le \sqrt{0.49}.
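The value F_Y(0.49) = 0.7 from part 3 is easy to corroborate by simulating X directly; here is a small numpy sketch (independent of the Symbulate code below; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed

# X ~ Uniform(-1, 1), Y = X^2
x = rng.uniform(-1, 1, size=100_000)
y = x ** 2

# check F_Y(0.49) = 0.7 and, more generally, F_Y(y) = sqrt(y)
print(np.mean(y <= 0.49))  # should be near 0.7
print(np.mean(y <= 0.25))  # should be near sqrt(0.25) = 0.5
```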


from symbulate import *
import matplotlib.pyplot as plt
import numpy as np

X = RV(Uniform(-1, 1))
Y = X ** 2

Y.sim(10000).plot()

# overlay the derived density f_Y(y) = 1 / (2 * sqrt(y))
y = np.linspace(0.001, 1, 1000)
plt.plot(y, 0.5 / np.sqrt(y), 'k-')
plt.ylim(0, 10)

plt.show()

4.5.1 Transformations of multiple random variables

Cumulative distribution functions can also be used to derive the joint pdf of multiple random variables. If F_{X,Y} is the joint cdf of X and Y then the joint pdf of X and Y is

f_{X,Y}(x, y) = \frac{\partial^2}{\partial x \partial y} F_{X,Y}(x, y)

Remember: when taking a partial derivative with respect to one variable, treat the other variables like constants.
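As a sketch of the mixed-partial relationship (plain Python, with a simple case not from the text): if X and Y were independent Uniform(0, 1) random variables, the joint cdf on the unit square would be F_{X,Y}(x, y) = xy, and taking the mixed second partial, done below via finite differences, recovers the constant joint pdf of 1.

```python
# joint cdf of two independent Uniform(0, 1) rvs on the unit square
def F(x, y):
    return x * y

# mixed second partial d^2 F / (dx dy) via a central finite difference
x0, y0, h = 0.3, 0.6, 1e-4
f = (F(x0 + h, y0 + h) - F(x0 + h, y0 - h)
     - F(x0 - h, y0 + h) + F(x0 - h, y0 - h)) / (4 * h * h)
print(f)  # close to 1, the joint pdf of the uniform distribution on the square
```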

Example 4.24 Recall the example in Section 3.8.3. Let P be the probability space corresponding to two spins of the Uniform(1, 4) spinner, and let X be the sum of the two spins, and Y the larger spin (or the common value if a tie).

  1. Let F_{X,Y} denote the joint cdf of X and Y. Find F_{X,Y}(3.5, 2).
  2. Find the joint cdf F_{X,Y}.
  3. Find the joint pdf f_{X,Y}.

Solution. to Example 4.24

Show/hide solution

  1. F_{X,Y}(3.5, 2) = \textrm{P}(X \le 3.5, Y \le 2). The (U_1, U_2) pairs take values in the square [1, 4] \times [1, 4]. Figure 4.11 illustrates the event \{X \le 3.5, Y \le 2\}. The shaded region has area (1)(1) - (1/2)(0.5)(0.5) = 0.875. Since (U_1, U_2) pairs are uniformly distributed over the square region with area 9, \textrm{P}(X \le 3.5, Y \le 2) = 0.875/9 \approx 0.0972.
  2. F_{X,Y}(x, y) = \textrm{P}(X \le x, Y \le y). We repeat the calculation from the previous part with a generic (x, y). Let (x, y) be a possible value of (X, Y); that is, 2 < x < 8, 1 < y < 4, and y + 1 < x < 2y. The event \{X \le x, Y \le y\} will have a shape like the one in Figure 4.11, with area (y - 1)^2 - (1/2)(2y - x)^2. Since (U_1, U_2) pairs are uniformly distributed over the square region with area 9, F_{X,Y}(x, y) = (1/9)\left((y - 1)^2 - (1/2)(2y - x)^2\right), \quad 2 < x < 8,\ 1 < y < 4,\ y + 1 < x < 2y.
  3. Differentiate the cdf with respect to both x and y. F_{X,Y}(x, y) = (1/9)\left((y - 1)^2 - (1/2)(2y - x)^2\right) \frac{\partial}{\partial x} F_{X,Y}(x, y) = \frac{\partial}{\partial x}\, (1/9)\left((y - 1)^2 - (1/2)(2y - x)^2\right) = (1/9)(2y - x) \frac{\partial^2}{\partial x \partial y} F_{X,Y}(x, y) = \frac{\partial}{\partial y}\, (1/9)(2y - x) = 2/9 Therefore f_{X,Y}(x, y) = \begin{cases} 2/9, & 2 < x < 8,\ 1 < y < 4,\ x/2 < y < x - 1,\\ 0, & \text{otherwise} \end{cases}
The event \(\{X \le 3.5, Y \le 2\}\) for \(X=U_1+U_2\), the sum, and \(Y=\max(U_1, U_2)\), the max, of two spins \(U_1, U_2\) of a Uniform(1, 4) spinner.

Figure 4.11: The event \{X \le 3.5, Y \le 2\} for X = U_1 + U_2, the sum, and Y = \max(U_1, U_2), the max, of two spins U_1, U_2 of a Uniform(1, 4) spinner.
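The value F_{X,Y}(3.5, 2) = 0.875/9 \approx 0.0972 from part 1 can be corroborated by simulating the two spins directly; a numpy sketch (seed and sample size arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 1_000_000

# two independent spins of the Uniform(1, 4) spinner
u1 = rng.uniform(1, 4, size=n)
u2 = rng.uniform(1, 4, size=n)

x = u1 + u2             # X, the sum of the two spins
y = np.maximum(u1, u2)  # Y, the larger of the two spins

# P(X <= 3.5, Y <= 2) should be near 0.875 / 9
print(np.mean((x <= 3.5) & (y <= 2)))
```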

Example 4.25 Continuing Example 2.40, let R be the random variable representing Regina's arrival time in [0, 1], and let Y represent Cady's arrival time. The random variable T = \min(R, Y) represents the time in (0, 1) at which the first person arrives. The random variable W = |R - Y| represents the amount of time the first person to arrive waits for the second person to arrive.

  1. Let F_W be the cdf of W. Find F_{W}(0.25).
  2. Find the cdf F_W.
  3. Find the pdf of W. What does this tell you about the distribution of waiting times?
  4. Let F_T be the cdf of T. Find F_{T}(0.25).
  5. Find the cdf F_T.
  6. Find the pdf of T. What does this tell you about the time of the first arrival?
  7. Are T and W the same random variable?
  8. Do T and W have the same distribution?

Solution. to Example 4.25

Show/hide solution

  1. F_W(0.25) = \textrm{P}(W \le 0.25). We computed this in Example 2.40; F_W(0.25) = 1 - (1-0.25)^2. See the plot on the left in Figure 4.12 below.
  2. We repeat the calculation in the previous part for a generic w in (0, 1). F_W(w) = 1 - (1 - w)^2, \quad 0 < w <1
  3. Differentiate the cdf with respect to w. f_W(w) = 2(1 - w), \quad 0 < w <1 Waiting time has highest density for short waiting times and lowest density for long waiting times.
  4. F_T(0.25) = \textrm{P}(T \le 0.25). See the plot on the right in Figure 4.12 below. \textrm{P}(T \le 0.25) = 1 - (1-0.25)^2.
  5. We repeat the calculation in the previous part for a generic t in (0, 1). F_T(t) = 1 - (1 - t)^2, \quad 0 < t <1
  6. Differentiate the cdf with respect to t. f_T(t) = 2(1 - t), \quad 0 < t < 1 Time of first arrival has highest density near 0 and lowest density near 1.
  7. No, T and W are not the same random variable. For example, if they both arrive at time 0.5, then T is 0.5 but W is 0.
  8. Yes, they do have the same distribution. They have the same pdf (and cdf).
Illustration of the events \(\{W \le 0.25\}\) (left) and \(\{T \le 0.25\}\) (right) in Example 4.25. The square represents the sample space \(\Omega=[0,1]\times[0,1]\).

Figure 4.12: Illustration of the events \{W \le 0.25\} (left) and \{T \le 0.25\} (right) in Example 4.25. The square represents the sample space \Omega=[0,1]\times[0,1].


from symbulate import *
import matplotlib.pyplot as plt

R, Y = RV(Uniform(0, 1) ** 2)

W = abs(R - Y)

T = (R & Y).apply(min)

W.sim(10000).plot()
T.sim(10000).plot()
plt.show()
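The conclusions of parts 7 and 8, that T and W are different random variables with the same distribution, can also be checked in plain numpy (a sketch independent of the Symbulate code above; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
n = 100_000

# independent Uniform(0, 1) arrival times for Regina and Cady
r = rng.uniform(0, 1, size=n)
y = rng.uniform(0, 1, size=n)

w = np.abs(r - y)     # W, the waiting time
t = np.minimum(r, y)  # T, the time of the first arrival

# not the same random variable: W and T disagree on almost every outcome
print(np.mean(np.isclose(w, t)))

# but the same distribution: both empirical cdfs match 1 - (1 - v)^2
for v in [0.25, 0.5, 0.75]:
    print(v, np.mean(w <= v), np.mean(t <= v), 1 - (1 - v) ** 2)
```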