16 Cumulative Distribution Functions
Example 16.1 Maggie and Seamus are babies who have just turned one. At their one-year visits to their pediatrician:
- Maggie is 76cm tall and in the 75th percentile of height for girls.
- Seamus is 72cm tall and in the 10th percentile of height for boys.
Explain what these percentiles mean.
- Roughly, the value \(x\) is the \(p\)th percentile of the distribution of a random variable \(X\) if \(p\) percent of values of the variable are less than or equal to \(x\): \(\text{P}(X\le x) = p/100\).
- The cumulative distribution function (cdf) of a random variable fills in the blank for any given \(x\): \(x\) is the (blank) percentile. That is, for an input \(x\), the cdf outputs \(\text{P}(X\le x)\).
- The cumulative distribution function (cdf) of a random variable \(X\) (defined on a probability space with probability measure \(\text{P}\)) is the function \(F_X: \mathbb{R}\to[0,1]\) defined by \(F_X(x) = \text{P}(X\le x)\). A cdf is defined for all real numbers \(x\) regardless of whether \(x\) is a possible value of \(X\).
Example 16.2 According to data on students who took the SAT in 2018-2019, 1400 was the 94th percentile of SAT scores, while 1000 was the 40th percentile. Let \(X\) be the SAT score of a randomly selected student (from this cohort), and let \(F_X\) be the cdf of \(X\). Evaluate the cdf for each of the following. For the purposes of this exercise, interpret these quantities in terms of actual SAT scores, which take values in 400, 410, 420, \(\ldots\), 1590, 1600.
- \(F_X(1400)\)
- \(F_X(1405)\)
- \(F_X(1000)\)
- \(F_X(1003.7)\)
- \(F_X(-3.1)\)
- \(F_X(390)\)
- \(F_X(399.5)\)
- \(F_X(1600)\)
- \(F_X(1610)\)
- \(F_X(2307.4)\)
- \(F_X(1400)-F_X(1000)\)
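The evaluations in Example 16.2 can be sketched in Python. This is only an illustration under stated assumptions: the function name `sat_cdf` and the constants are hypothetical, "1400 was the 94th percentile" is read as \(F_X(1400) = 0.94\), and the function deliberately refuses to answer for scores whose percentile the example does not give.

```python
import math

# Only two cdf values are stated in the example; all other scores are unknown.
KNOWN_CDF = {1000: 0.40, 1400: 0.94}
MIN_SCORE, MAX_SCORE, STEP = 400, 1600, 10  # possible scores: 400, 410, ..., 1600


def sat_cdf(x):
    """Return P(X <= x) when the example gives enough information to do so."""
    if x < MIN_SCORE:
        return 0.0  # no score is below 400
    if x >= MAX_SCORE:
        return 1.0  # every score is at most 1600
    # {X <= x} is the same event as {X <= largest possible score not exceeding x}.
    score = MIN_SCORE + STEP * math.floor((x - MIN_SCORE) / STEP)
    if score not in KNOWN_CDF:
        raise ValueError(f"the example does not give the percentile of {score}")
    return KNOWN_CDF[score]


for x in [1400, 1405, 1000, 1003.7, -3.1, 390, 399.5, 1600, 1610, 2307.4]:
    print(x, sat_cdf(x))

# P(1000 < X <= 1400), rounded to hide floating-point noise
print(round(sat_cdf(1400) - sat_cdf(1000), 2))
```

Inputs such as 1405 or 1003.7 are not possible scores, but the cdf is still defined there and simply repeats the value at the nearest possible score below.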
Example 16.3 Recall Example 15.1, where the series system resistance \(X\) (ohms) has pdf \[ f_X(x) = \begin{cases} \frac{1}{33}\left(1-\frac{1}{33}|x - 330|\right), & 297 < x < 363\\ 0, & \text{otherwise.} \end{cases} \]
Find the cdf of \(X\) and sketch a plot of it. (Hint: consider \(297<x<330\) and \(330<x<363\) separately.)
Evaluate and interpret \(F_X(340)\).
Evaluate and interpret \(1 - F_X(340)\).
Evaluate and interpret \(F_X(340) - F_X(330)\).
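One way to check the answers in Example 16.3 is to code the piecewise cdf obtained by integrating the pdf on each side of 330. The sketch below is not part of the original example; the names `resistance_pdf` and `resistance_cdf` are hypothetical, and the closed forms in the comments come from integrating the stated pdf.

```python
LOW, PEAK, HIGH = 297.0, 330.0, 363.0
HALF_WIDTH = PEAK - LOW  # 33 ohms


def resistance_pdf(x):
    """pdf from Example 16.3: a triangle on (297, 363) peaking at 330."""
    if LOW < x < HIGH:
        return (1 / HALF_WIDTH) * (1 - abs(x - PEAK) / HALF_WIDTH)
    return 0.0


def resistance_cdf(x):
    """cdf obtained by integrating the pdf separately on each side of 330."""
    if x <= LOW:
        return 0.0
    if x >= HIGH:
        return 1.0
    if x <= PEAK:
        # integral of (u - 297) / 33^2 from 297 to x
        return (x - LOW) ** 2 / (2 * HALF_WIDTH**2)
    # 1 minus the area of the right tail, a triangle with base (363 - x)
    return 1.0 - (HIGH - x) ** 2 / (2 * HALF_WIDTH**2)


print(resistance_cdf(340))                        # P(X <= 340), about 0.757
print(1 - resistance_cdf(340))                    # P(X > 340), about 0.243
print(resistance_cdf(340) - resistance_cdf(330))  # P(330 < X <= 340), about 0.257
```

Because \(X\) is continuous, the last quantity is also \(\text{P}(330 \le X \le 340)\).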
Example 16.4 Recall Example 15.3 where the waiting time, measured continuously in hours, from now until the next earthquake (of any magnitude) occurs in southern CA is a continuous random variable \(X\) with an Exponential distribution with rate parameter 2. The pdf of \(X\) is
\[ f_X(x) = 2 e^{-2x}, \; x \ge0. \]
Find the cdf of \(X\), and sketch a plot of it.
Evaluate and interpret \(F_X(0.25)\).
Evaluate and interpret \(1 - F_X(0.25)\).
Evaluate and interpret \(F_X(1) - F_X(0.5)\).
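For Example 16.4, integrating the pdf gives \(F_X(x) = 1 - e^{-2x}\) for \(x \ge 0\) and \(F_X(x) = 0\) for \(x < 0\). A minimal Python sketch of the three evaluations follows; the function name `wait_cdf` is hypothetical.

```python
import math

RATE = 2.0  # rate parameter from Example 16.4 (per hour)


def wait_cdf(x):
    """cdf of the Exponential(2) waiting time: 1 - exp(-2x) for x >= 0, else 0."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0


print(wait_cdf(0.25))                # P(X <= 0.25), about 0.393
print(1 - wait_cdf(0.25))            # P(X > 0.25), about 0.607
print(wait_cdf(1) - wait_cdf(0.5))   # P(0.5 < X <= 1), about 0.233
```

In words, the first value is the probability that the next earthquake occurs within the next 15 minutes, and the second is the probability that the wait exceeds 15 minutes.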
Example 16.5 Database queries to the Cal Poly data warehouse occur randomly throughout the day. During regular business hours, queries arrive at rate 0.8 per second on average, so that the average number of queries that arrive during any \(t\)-second time interval is \(0.8t\). Suppose that the number of queries that arrive during any \(t\)-second time interval follows a marginal Poisson distribution with mean \(0.8t\).
We are interested in the distribution of \(T\), the time (seconds) until the next query arrives.
Interpret the event \(\{T>2\}\). How can you express this as an equivalent event involving the number of queries?
Compute \(\text{P}(T > 2)\).
Compute \(\text{P}(T > t)\) as a function of \(t>0\).
Find the cdf of \(T\).
Find the pdf of \(T\).
What is the name of the distribution of \(T\)? What are its mean and SD?
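The key step in Example 16.5 is that \(\{T > t\}\) occurs exactly when no queries arrive in the first \(t\) seconds, so \(\text{P}(T>t)\) is the Poisson probability of a count of 0. The sketch below follows that chain of reasoning numerically; the function names are hypothetical, and the pdf formula in `time_pdf` is the derivative of the cdf worked out by hand.

```python
import math

RATE = 0.8  # queries per second, from Example 16.5


def poisson_pmf(k, mean):
    """P(N = k) for a Poisson random variable N with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def time_survival(t):
    """P(T > t): no queries arrive during the first t seconds."""
    return poisson_pmf(0, RATE * t)


def time_cdf(t):
    """P(T <= t) = 1 - P(T > t) for t > 0, and 0 otherwise."""
    return 1.0 - time_survival(t) if t > 0 else 0.0


def time_pdf(t):
    """Derivative of the cdf: 0.8 * exp(-0.8 t) for t > 0."""
    return RATE * math.exp(-RATE * t) if t > 0 else 0.0


print(time_survival(2))  # P(T > 2) = exp(-1.6), about 0.202
print(1 / RATE)          # mean of an Exponential(0.8) distribution: 1.25 seconds
```

The resulting cdf, \(1 - e^{-0.8t}\), is that of an Exponential distribution with rate 0.8, whose mean and SD both equal \(1/0.8 = 1.25\) seconds.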
Example 16.6 Let \(X\) be the number of heads in 3 flips of a fair coin. \(X\) takes values 0, 1, 2, 3, with respective probability 1/8, 3/8, 3/8, 1/8.
Find the cdf of \(X\) and sketch a plot of it.
Let \(Y\) be the number of tails in 3 flips of a fair coin. Find the cdf of \(Y\).
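The cdf in Example 16.6 can be built directly from the pmf by adding up the probabilities of all values at or below the input. The following is a minimal sketch using exact fractions; the names `HEADS_PMF`, `TAILS_PMF`, and `discrete_cdf` are hypothetical, and the pmf of \(Y\) uses the fact that \(Y = 3 - X\).

```python
from fractions import Fraction

# pmf of X, the number of heads in 3 flips of a fair coin
HEADS_PMF = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
# Y = 3 - X, so Y = 3 - k has the probability that X = k
TAILS_PMF = {3 - k: p for k, p in HEADS_PMF.items()}


def discrete_cdf(x, pmf):
    """P(X <= x): sum the pmf over all possible values that are <= x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


for x in [-1, 0, 0.5, 1, 2, 2.9, 3, 4]:
    print(x, discrete_cdf(x, HEADS_PMF), discrete_cdf(x, TAILS_PMF))
```

The output is flat between possible values and jumps at 0, 1, 2, and 3. The two columns agree at every input, so \(X\) and \(Y\) have the same cdf even though they are different random variables.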
- A cdf is defined for all values of \(x\), regardless of whether \(x\) is a possible value of the random variable.
- A cdf is a non-decreasing function: if \(x_1 \le x_2\) then \(F_X(x_1)\le F_X(x_2)\).
- A cdf approaches 0 as the input approaches \(-\infty\): \(\lim_{x\to-\infty}F_X(x) = 0\).
- A cdf approaches 1 as the input approaches \(\infty\): \(\lim_{x\to\infty}F_X(x) = 1\).
- The cdf of a discrete random variable is a step function.
- The steps occur at the possible values of the random variable.
- The height of the jump at a particular value equals the probability of that value, given by the pmf.
- The cdf of a continuous random variable is a continuous function.
- The cdf of a continuous random variable is obtained by integrating the pdf: \(F_X(x) = \int_{-\infty}^{x} f_X(u)\,du\).
- Consequently, the pdf of a continuous random variable is obtained by differentiating the cdf \[ F_X' = f_X \qquad \text{if $X$ is continuous} \]
- For any random variable \(X\) with cdf \(F_X\) \[
F_X(b) - F_X(a) = \text{P}(a<X \le b)
\]
- Whether the inequalities in the above event are strict (\(<\)) or not (\(\le\)) matters for discrete random variables, but not for continuous random variables.
- Random variables \(X\) and \(Y\) have the same distribution if their cdfs are the same, that is, if \(F_X(u) = F_Y(u)\) for all \(u\).
- That is, two random variables have the same distribution if all the percentiles are the same.
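The role of strict versus non-strict inequalities can be illustrated with a discrete random variable, here the number of heads in 3 flips of a fair coin. This is a sketch only; the helper names `heads_cdf` and `prob` are hypothetical.

```python
from fractions import Fraction

# pmf of the number of heads in 3 flips of a fair coin
PMF = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def heads_cdf(x):
    """P(X <= x), summing the pmf over possible values that are <= x."""
    return sum((p for value, p in PMF.items() if value <= x), Fraction(0))


def prob(event):
    """Probability that X satisfies the given condition (a function of the value)."""
    return sum((p for value, p in PMF.items() if event(value)), Fraction(0))


a, b = 1, 2
print(heads_cdf(b) - heads_cdf(a))   # 3/8
print(prob(lambda v: a < v <= b))    # 3/8, matches F(b) - F(a)
print(prob(lambda v: a <= v <= b))   # 3/4, differs because P(X = 1) > 0
```

For a continuous random variable, \(\text{P}(X = a) = 0\), so the two interval probabilities would coincide.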