9 Continuous Random Variables: Probability Density Functions
- Simulated values of a continuous random variable are usually plotted in a histogram, which groups the observed values into “bins” and plots densities or frequencies for each bin (R: `hist`).
- In a histogram, areas of bars represent relative frequencies; the axis which represents the height of the bars is called “density” (R: `hist` with `freq = FALSE`). See the sketch after this list.
- The distribution of a continuous random variable can be described by a probability density function (pdf), for which areas under the density curve determine probabilities.
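For instance, here is a minimal sketch (not from the text) of a density-scale histogram of simulated values; the variable name `sims` is hypothetical:

```r
# A minimal sketch: plot simulated values in a density-scale histogram,
# so that areas of bars represent relative frequencies
sims = runif(10000, 0, 60)  # 10,000 hypothetical simulated values
hist(sims, freq = FALSE,    # freq = FALSE puts density, not frequency, on the vertical axis
     xlab = "value", main = "Density histogram of simulated values")
```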
9.1 Uniform Distributions
Example 9.1

In the meeting problem, assume that Regina’s arrival time $X$, measured in minutes after noon, has a Uniform distribution on the interval $[0, 60]$.

```r
N_rep = 10000

x = runif(N_rep, 0, 60)
```
```r
library(kableExtra)  # provides kbl() and kable_styling()

data.frame(1:N_rep, x) |>
  head() |>
  kbl(col.names = c("Repetition", "X")) |>
  kable_styling(fixed_thead = TRUE)
```
| Repetition | X |
|---|---|
| 1 | 1.062914 |
| 2 | 17.729239 |
| 3 | 21.610468 |
| 4 | 27.190663 |
| 5 | 36.913450 |
| 6 | 19.604730 |
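Though not shown in the original example, the interval probabilities asked about below can also be approximated directly from the simulated values, in the style of later sections (a sketch; the results can then be checked against the pdf answers):

```r
# Sketch: approximate interval probabilities as proportions of simulated values
sum(x < 15) / N_rep             # P(X < 15): Regina arrives before 12:15
sum(x > 45) / N_rep             # P(X > 45): Regina arrives after 12:45
sum(x >= 15 & x <= 16) / N_rep  # P(15 <= X <= 16): between 12:15:00 and 12:16:00
```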
- Sketch a plot of the pdf of $X$.
- Use the pdf to find the probability that Regina arrives before 12:15.
- Use the pdf to find the probability that Regina arrives after 12:45.
- Use the pdf to find the probability that Regina arrives between 12:15:00 and 12:16:00.
- Use the pdf to find the probability that Regina arrives between 12:15:00 and 12:15:01.
- Compute $P(X = 15)$, the probability that Regina arrives at the exact time 12:15:00 (with infinite precision).
- The probability that a continuous random variable equals any particular value is 0. That is, if $X$ is continuous then $P(X = x) = 0$ for all $x$.
- Even though any specific value of a continuous random variable has probability 0, intervals still can have positive probability.
- In practical applications involving continuous random variables, “equal to” really means “close to”, and “close to” probabilities correspond to intervals, which can have positive probability.
- A continuous random variable $X$ has a Uniform distribution with parameters $a$ and $b$, with $a < b$, if its probability density function satisfies
$$f_X(x) = \frac{1}{b - a}, \qquad a \le x \le b$$
- If $X$ has a Uniform($a$, $b$) distribution then probability is determined by interval length: for $a \le c \le d \le b$,
$$P(c \le X \le d) = \frac{d - c}{b - a}$$
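As a quick numerical check of this formula (a sketch, using the Uniform(0, 60) distribution from Example 9.1; `punif` is base R’s Uniform cdf):

```r
# Sketch: for Uniform(a, b), interval probability is length / (b - a)
a = 0
b = 60
punif(15, a, b)     # built-in cdf: P(X <= 15)
(15 - a) / (b - a)  # = 0.25, by the formula above

# Probability depends only on the width of the interval
punif(16, a, b) - punif(15, a, b)  # P(15 <= X <= 16) = 1/60
punif(46, a, b) - punif(45, a, b)  # same width, same probability
```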
9.2 Probability Density Functions
- The continuous analog of a probability mass function (pmf) is a probability density function (pdf).
- However, while pmfs and pdfs play analogous roles, they are different in one fundamental way; namely, a pmf outputs probabilities directly, while a pdf does not.
- A pdf of a continuous random variable must be integrated to find probabilities of related events.
- The probability density function (pdf) (a.k.a. density) of a continuous RV $X$ is the function $f_X$ which satisfies
$$P(a \le X \le b) = \int_a^b f_X(x)\, dx, \qquad \text{for all } a \le b$$
- For a continuous random variable $X$ with pdf $f_X$, the probability that $X$ takes a value in the interval $[a, b]$ is the area under the pdf over the region $[a, b]$.
- The axioms of probability imply that a valid pdf must satisfy (see the sketch below)
$$f_X(x) \ge 0 \text{ for all } x, \qquad \int_{-\infty}^{\infty} f_X(x)\, dx = 1$$
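A sketch of these properties in R, using numerical integration with `integrate` and the Uniform(0, 60) density from Example 9.1 (my choice of example):

```r
# Sketch: probabilities as areas under a pdf, via numerical integration
f = function(x) dunif(x, 0, 60)  # pdf of a Uniform(0, 60) random variable

integrate(f, 0, 60)$value  # a valid pdf must have total area 1
integrate(f, 0, 15)$value  # P(0 <= X <= 15) is the area under f over [0, 15]
```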
Example 9.2 Continuing Example 9.1, we now assume Regina’s arrival time $X$ (minutes after noon) has pdf
$$f_X(x) = cx, \qquad 0 \le x \le 60,$$
where $c$ is a constant.
- Sketch a plot of the pdf. What does this say about Regina’s arrival time?
- Find the value of $c$ and specify the pdf of $X$.
- Find the probability that Regina arrives before 12:15.
- Find the probability that Regina arrives after 12:45. How does this compare to the previous part? What does that say about Regina’s arrival time?
- Find $P(X = 15)$, the probability that Regina arrives at the exact time 12:15 (with infinite precision).
- Find $P(X = 45)$, the probability that Regina arrives at the exact time 12:45 (with infinite precision).
- Find the probability that Regina arrives between 12:15 and 12:16.
- Find the probability that Regina arrives between 12:45 and 12:46. How does this compare to the probability for 12:15 to 12:16? What does that say about Regina’s arrival time?
- The probability that a continuous random variable equals any particular value is 0. That is, if $X$ is continuous then $P(X = x) = 0$ for all $x$.
- For continuous random variables, it doesn’t really make sense to talk about the probability that the random variable is equal to a particular value. However, we can consider the probability that a random variable is close to a particular value.
- The density $f_X(x)$ at value $x$ is not a probability.
- Rather, the density $f_X(x)$ at value $x$ is related to the probability that the RV $X$ takes a value “close to $x$” in the following sense:
$$P\left(x - \frac{\epsilon}{2} \le X \le x + \frac{\epsilon}{2}\right) \approx f_X(x)\,\epsilon, \qquad \text{for small } \epsilon$$
- The quantity $\epsilon$ is a small number that represents the desired degree of precision. For example, rounding to two decimal places corresponds to $\epsilon = 0.01$. (See the sketch after this list.)
- What’s important about a pdf is relative heights. For example, if $f_X(x_2) = 2f_X(x_1)$ then $X$ is roughly “twice as likely to be near $x_2$ than to be near $x_1$” in the above sense.
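A numerical sketch of this approximation, using the Exponential(2) pdf from the next section as a concrete example:

```r
# Sketch: P(X within eps/2 of x0) is approximately f(x0) * eps for small eps
f = function(x) 2 * exp(-2 * x)  # an example pdf (the Exponential(2) pdf of Section 9.3)
x0 = 0.5
eps = 0.01                       # degree of precision

integrate(f, x0 - eps / 2, x0 + eps / 2)$value  # exact probability of being "close to" x0
f(x0) * eps                                     # density-times-epsilon approximation
```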
9.3 Exponential Distributions
Example 9.3 Suppose that we model the waiting time, measured continuously in hours, from now until the next earthquake (of any magnitude) occurs in southern CA as a continuous random variable $X$ with pdf
$$f_X(x) = 2e^{-2x}, \qquad x > 0.$$
This is the pdf of the “Exponential(2)” distribution.
- Sketch the pdf of $X$. What does this tell you about waiting times?
- Without doing any integration, approximate the probability that $X$ rounded to the nearest minute is 0.5 hours.
- Without doing any integration, determine how much more likely $X$ rounded to the nearest minute is to be 0.5 than to be 1.0.
- Compute and interpret $P(X \le 3)$.
- Compute and interpret $P(X > 0.25)$.
- A continuous random variable $X$ has an Exponential distribution with rate parameter $\lambda > 0$ if its pdf is
$$f_X(x) = \lambda e^{-\lambda x}, \qquad x > 0$$
- If $X$ has an Exponential($\lambda$) distribution then
$$P(X \le x) = 1 - e^{-\lambda x}, \qquad x \ge 0$$
- In R (sketched below):
  - `rexp(N_rep, rate)` to simulate values
  - `dexp(x, rate)` to compute the probability density function $f_X(x)$
  - `pexp(x, rate)` to compute the cumulative distribution function $P(X \le x)$
  - `qexp(p, rate)` to compute the quantile function, which returns the value $x$ for which $P(X \le x) = p$
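A brief sketch of these four functions for the Exponential(2) distribution of Example 9.3:

```r
# Sketch: the four R functions for the Exponential(2) distribution
rexp(3, rate = 2)     # three simulated waiting times
dexp(0.5, rate = 2)   # pdf height f(0.5) = 2 * exp(-1)
pexp(0.5, rate = 2)   # cdf: P(X <= 0.5) = 1 - exp(-1)
qexp(0.75, rate = 2)  # quantile: the value x with P(X <= x) = 0.75
```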
- Exponential distributions are often used to model the waiting time in a random process until some event occurs.
  - $\lambda$ is the average rate at which events occur over time (e.g., 2 per hour)
  - $1/\lambda$ is the mean time between events (e.g., 1/2 hour)
```r
N_rep = 10000

x = rexp(N_rep, rate = 2)
```

```r
head(x) |>
  kbl()
```
| x |
|---|
| 0.4848519 |
| 0.0190915 |
| 1.3624062 |
| 0.0464219 |
| 0.4694797 |
| 0.8382753 |
```r
# simulation-based approximation of P(X <= 3)
sum(x <= 3) / N_rep
```

```
[1] 0.9975
```

```r
# exact value from the cdf
pexp(3, 2)
```

```
[1] 0.9975212
```

```r
# simulation-based approximation of P(X > 0.25)
sum(x > 0.25) / N_rep
```

```
[1] 0.6151
```

```r
# exact value: P(X > 0.25) = 1 - P(X <= 0.25)
1 - pexp(0.25, 2)
```

```
[1] 0.6065307
```
9.4 Expected Values
- The expected value (a.k.a. expectation, a.k.a. mean) of a random variable $X$ is a number denoted $E(X)$ representing the probability-weighted average value of $X$.
- The expected value of a continuous random variable $X$ with pdf $f_X$ is defined as
$$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\, dx$$
- Replace the generic bounds $(-\infty, \infty)$ with the possible values of the random variable.
- Note well that $E(X)$ represents a single number.
- The expected value is the “balance point” (center of gravity) of a distribution.
- The expected value of a random variable $X$ is defined by the probability-weighted average according to the underlying probability measure. But the expected value can also be interpreted as the long-run average value, and so can be approximated via simulation (see the sketch after this list).
- Read the symbol $E(\cdot)$ as:
  - Simulate lots of values of what’s inside $(\cdot)$.
  - Compute the average. This is a “usual” average; just sum all the simulated values and divide by the number of simulated values.
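A sketch of the two interpretations side by side, using the Exponential(2) distribution from Example 9.3 (the variable `x_sim` is mine, to avoid overwriting `x` above):

```r
# Sketch: E(X) as a probability-weighted average (the defining integral)...
integrate(function(x) x * dexp(x, 2), 0, Inf)$value  # should be 0.5

# ...and as a long run average of simulated values
x_sim = rexp(10000, 2)
mean(x_sim)  # close to 0.5
```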
Example 9.4

Continuing Example 9.3, where $X$, the waiting time (in hours) until the next earthquake, has an Exponential(2) distribution.

- Use simulation to approximate the long run average value of $X$.
- Set up the integral to compute $E(X)$.
- Interpret $E(X)$.
- Compute $E(X)$.
- Compute $P(X \le E(X))$. (Is it equal to 50%?)
```r
# long run average of X: sum of simulated values / number of values
sum(x) / N_rep
```

```
[1] 0.5070349
```

```r
# same thing with mean()
mean(x)
```

```
[1] 0.5070349
```

```r
# simulation-based approximation of P(X <= E(X))
sum(x <= mean(x)) / N_rep
```

```
[1] 0.6304
```

```r
# exact value: P(X <= 0.5), since E(X) = 0.5
pexp(0.5, 2)
```

```
[1] 0.6321206
```
- If $X$ has an Exponential($\lambda$) distribution then
$$E(X) = \frac{1}{\lambda}$$
- For example, if events happen at rate $\lambda = 2$ per hour, then the average waiting time between events is $1/2$ hour.
9.5 Law of the Unconscious Statistician
- The “law of the unconscious statistician” (LOTUS) says that the expected value of a transformed random variable can be found without finding the distribution of the transformed random variable, simply by applying the probability weights of the original random variable to the transformed values.
- LOTUS says we don’t have to first find the distribution of $g(X)$ to find $E(g(X))$; rather, we simply apply the transformation $g$ to each possible value $x$ of $X$ and then apply the corresponding weight for $x$ to $g(x)$. For a continuous random variable $X$ with pdf $f_X$,
$$E(g(X)) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx$$
- Whether in the short run or the long run, in general
$$\text{Average of } g(X) \neq g(\text{Average of } X)$$
- In terms of expected values, in general (see the sketch below)
$$E(g(X)) \neq g(E(X))$$
The left side represents first transforming the values and then averaging the transformed values. The right side represents first averaging the values and then plugging the average (a single number) into the transformation formula.
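A sketch of both points, with $g(x) = x^2$ and the Exponential(2) distribution (my choice of example, anticipating Example 9.5 below):

```r
# Sketch: LOTUS with g(x) = x^2 for the Exponential(2) distribution
g = function(x) x ^ 2
EgX = integrate(function(x) g(x) * dexp(x, 2), 0, Inf)$value
EgX    # E(g(X)) = E(X^2) = 0.5

EX = integrate(function(x) x * dexp(x, 2), 0, Inf)$value
g(EX)  # g(E(X)) = 0.25: not the same as E(g(X))
```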
Example 9.5

Continuing Example 9.3, where $X$, the waiting time (in hours) until the next earthquake, has an Exponential(2) distribution.

- Use simulation to approximate the variance of $X$.
- Set up the integral to compute $E(X^2)$.
- Compute $\text{Var}(X)$. (The previous part shows $E(X^2) = 0.5$.)
- Compute $SD(X)$.
```r
data.frame(x,
           x - mean(x),
           (x - mean(x)) ^ 2) |>
  head() |>
  kbl(col.names = c("Value", "Deviation from mean", "Squared deviation")) |>
  kable_styling(fixed_thead = TRUE)
```
| Value | Deviation from mean | Squared deviation |
|---|---|---|
| 0.4848519 | -0.0221830 | 0.0004921 |
| 0.0190915 | -0.4879434 | 0.2380888 |
| 1.3624062 | 0.8553713 | 0.7316601 |
| 0.0464219 | -0.4606131 | 0.2121644 |
| 0.4694797 | -0.0375553 | 0.0014104 |
| 0.8382753 | 0.3312403 | 0.1097202 |
```r
# approximate E(X)
mean(x)
```

```
[1] 0.5070349
```

```r
# variance: average squared deviation from the mean
mean((x - mean(x)) ^ 2)
```

```
[1] 0.2489758
```

```r
# built-in var() (uses N_rep - 1 in the denominator, hence the tiny difference)
var(x)
```

```
[1] 0.2490007
```

```r
# standard deviation: square root of variance
sqrt(var(x))
```

```
[1] 0.4989997
```

```r
# built-in sd()
sd(x)
```

```
[1] 0.4989997
```

```r
# approximate E(X^2), needed for the variance shortcut formula below
mean(x ^ 2)
```

```
[1] 0.5060602
```
- The variance of a random variable $X$ is
$$\text{Var}(X) = E\left[(X - E(X))^2\right]$$
- The standard deviation of a random variable is
$$SD(X) = \sqrt{\text{Var}(X)}$$
- Variance is the long run average squared deviation from the mean.
- Standard deviation measures, roughly, the long run average distance from the mean. The measurement units of the standard deviation are the same as those of the random variable itself.
- The definition $\text{Var}(X) = E[(X - E(X))^2]$ represents the concept of variance. However, variance is usually computed using the following equivalent but slightly simpler formula:
$$\text{Var}(X) = E(X^2) - (E(X))^2$$
- That is, variance is the expected value of the square of $X$ minus the square of the expected value of $X$.
- Variance has many nice theoretical properties. Whenever you need to compute a standard deviation, first find the variance and then take the square root at the end.
- If $X$ has an Exponential($\lambda$) distribution then
$$\text{Var}(X) = \frac{1}{\lambda^2}, \qquad SD(X) = \frac{1}{\lambda}$$
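As a numerical check of these formulas for $\lambda = 2$ (a sketch, using the shortcut formula above):

```r
# Sketch: check Var(X) = 1 / lambda^2 and SD(X) = 1 / lambda for lambda = 2
EX = integrate(function(x) x * dexp(x, 2), 0, Inf)$value       # E(X) = 0.5
EX2 = integrate(function(x) x ^ 2 * dexp(x, 2), 0, Inf)$value  # E(X^2) = 0.5
EX2 - EX ^ 2        # variance via the shortcut formula: 0.25 = 1 / 2^2
sqrt(EX2 - EX ^ 2)  # standard deviation: 0.5 = 1 / 2
```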