## 2.1 Random variables and distributions

What is a **random variable**? It’s actually a *function* – one whose value we don’t know beforehand, because it depends on the outcome of a random process.

No, you don’t have to know any measure theory right now. You can do that in grad school :) The important thing is that a random variable “maps” a set of outcomes (which might not be numbers) to a set of numerical values.

Here’s the fancy *Measure Theory Definition*: A random variable \(X\colon \Omega \to E\) is a measurable function from the sample space \(\Omega\), the set of all possible outcomes, to a measurable space \(E\). \(E\) is often \(\mathbb{R}\), the set of real numbers; it doesn’t have to be, but the math is a lot easier if it is, so for our purposes we’ll mostly stick to that.

That means we’re either using quantitative variables, or mapping categorical ones to the real numbers, as we do with 0-or-1 (binary) indicator variables.
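To make the "function" idea concrete, here is a minimal sketch of an indicator variable. The coin flip, the name `indicator_heads`, and the fixed seed are all made up for illustration; they are not part of the definition above.

```python
import random

# Sample space: outcomes of a (hypothetical) coin flip, which are not numbers.
OMEGA = ["heads", "tails"]

def indicator_heads(outcome: str) -> int:
    """Random variable X: maps an outcome in OMEGA to 1 (heads) or 0 (tails)."""
    return 1 if outcome == "heads" else 0

# The randomness lives in which outcome occurs; X itself is a fixed rule.
rng = random.Random(0)  # seeded only so the run is repeatable
outcome = rng.choice(OMEGA)
x = indicator_heads(outcome)
print(outcome, "->", x)
```

Notice that nothing about `indicator_heads` is random: all the uncertainty is in which element of `OMEGA` gets picked.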

How can we talk about an RV’s **distribution**?

We can’t know in advance what value \(X\) will take, but we can talk about the *probability* that \(X\) will take on a given value.

### 2.1.1 Discrete RVs

Example: roll a die. Then let \(X\) be the random variable equal to the number of dots on the face of the die that you see. This is *discrete* (and finite too); it must take on one of a finite set of values, and can’t take on a value *in between* them – you can’t see 3.42 dots on the die.

The **probability mass function (pmf)** of \(X\) at a value \(x\) is

\[p(x)=P(X=x)\]

and don’t forget,

\[\sum_{x}p(x) = 1.\]

So in this case \(p(1) = p(2) = \dots = p(6) = 1/6\) and \(p(x)\) for all other \(x\) values is \(0\).
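If it helps to see the pmf as an object you can poke at, here is a small sketch of the fair-die pmf. The names `PMF` and `p` are just illustrative choices, and exact fractions are used so the sum comes out to exactly 1.

```python
from fractions import Fraction

# pmf of a fair six-sided die: each face 1..6 has probability 1/6.
PMF = {face: Fraction(1, 6) for face in range(1, 7)}

def p(x):
    """Return P(X = x); any value that is not a face of the die has probability 0."""
    return PMF.get(x, Fraction(0))

total = sum(PMF.values())
print(p(3))      # 1/6
print(p(3.42))   # 0
print(total)     # 1
```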

Note that we use capital \(X\) to refer to the random variable itself (a function) and lower-case \(x\) to talk about a specific value that \(X\) could have.

This one is a *roll model*. You’re welcome.

The pmf gives us all the possible values and their probabilities, which describes the behavior of \(X\) as completely as possible. We refer to this as the *distribution* of \(X\), or sometimes a *probability model*. This is a theoretical model, not an empirical one derived from observations like the ones we were talking about earlier.
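To see the difference between the theoretical model and an empirical distribution, you can simulate a bunch of rolls and compare. This is only a sketch: the number of rolls and the seed are arbitrary choices, and the simulated die is a pseudo-random stand-in for a real one.

```python
import random
from collections import Counter

rng = random.Random(42)   # fixed seed so the run is repeatable
n_rolls = 60_000          # arbitrary; more rolls get closer to the model
rolls = [rng.randint(1, 6) for _ in range(n_rolls)]

# Empirical distribution: the relative frequency of each face in the sample.
counts = Counter(rolls)
empirical = {face: counts[face] / n_rolls for face in range(1, 7)}

for face in range(1, 7):
    # Second column is observed; third is the theoretical pmf value 1/6.
    print(face, round(empirical[face], 4), round(1 / 6, 4))
```

The observed frequencies wobble around 1/6 but generally won’t equal it exactly; the pmf is the model, and the frequencies are data.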

### 2.1.2 Continuous RVs

Okay but what if you’re, say, measuring the *weight* of the die? What values can this RV take on? The set of possible values can still be bounded, but it sure isn’t finite.

What is the probability that it takes on *any given one* of these values?

Here’s the problem: \(P(X=x)=0\) for all \(x\). There is 0 probability that your die weighs *exaaaaactly* 10 grams, or any other value. On the bright side, the probability that \(X\) takes a value in a given *range* is not necessarily zero. We’re going to have to go to calculus here.

For continuous RVs, instead of a pmf we have a **probability density function (pdf)** \(f_X\). Its value at a single point isn’t a probability; we get probabilities by integrating it over *intervals*:

\[P(a \le X \le b) = \int_{a}^b{f_X(t) dt } = F_X(b) - F_X(a)\]

where \(F_X(a)\) is defined as \(P(X\le a)\); it’s called the **cumulative distribution function (CDF)**. (The CDF is defined the same way for discrete RVs; you just add up the pmf instead of integrating.) Dust off the ol’ calculus notes to see that \(F_X(a) = \int_{- \infty}^a {f_X(t)dt}\).
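Here is a quick numerical check that integrating the pdf and subtracting CDF values give the same answer. Big assumption, not from the text: the die’s weight is modeled as normal with mean 10 g and standard deviation 0.1 g, purely so there is a concrete \(f_X\) to work with. The helper `integrate_pdf` is a hypothetical name for a plain trapezoid-rule approximation.

```python
from statistics import NormalDist

# ASSUMPTION (illustration only): die weight in grams ~ Normal(mean=10, sd=0.1).
weight = NormalDist(mu=10.0, sigma=0.1)

def integrate_pdf(dist, a, b, steps=10_000):
    """Approximate the integral of dist.pdf over [a, b] with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * h)
    return total * h

a, b = 9.9, 10.1
by_integral = integrate_pdf(weight, a, b)   # integral of f_X from a to b
by_cdf = weight.cdf(b) - weight.cdf(a)      # F_X(b) - F_X(a)
print(round(by_integral, 4), round(by_cdf, 4))
```

Under this assumed model both numbers come out near 0.68, the familiar "within one standard deviation" probability.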

It can be helpful to draw a picture of this for yourself!

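Note how the probability notation works with the integral notation. Since \(P(X=a)=0\) for a continuous RV, counting the endpoint \(a\) twice costs nothing: \(P(X\le a)+P(a \le X \le b)=P(X \le b)\), so \(P(a \le X \le b) = P(X \le b) - P(X\le a) = F_X(b) - F_X(a)\). The integral subtraction matches up too: \(\int_{-\infty}^b f_X(t)\,dt - \int_{-\infty}^a f_X(t)\,dt = \int_a^b f_X(t)\,dt\).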
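The identity above quietly leans on \(P(X=a)=0\), which is a continuous-RV thing. The sketch below pokes at that from both sides: a continuous model where narrow intervals have probability shrinking toward 0, and the fair die, where the endpoint carries probability 1/6. The normal model for the weight (mean 10 g, sd 0.1 g) is again an assumption for illustration, and `die_cdf` is a hypothetical helper name.

```python
from fractions import Fraction
from statistics import NormalDist

# Continuous case. ASSUMPTION (illustration only): weight ~ Normal(10 g, 0.1 g).
weight = NormalDist(mu=10.0, sigma=0.1)
# P(10 - eps <= X <= 10 + eps) for narrower and narrower intervals around 10 g.
narrow = [weight.cdf(10 + eps) - weight.cdf(10 - eps) for eps in (0.1, 0.01, 0.001)]
print(narrow)  # each entry is smaller than the last, heading toward 0

# Discrete case: fair die, where P(X = a) = 1/6 is not zero.
def die_cdf(x):
    """F_X(x) = P(X <= x) for a fair six-sided die."""
    return Fraction(sum(1 for face in range(1, 7) if face <= x), 6)

a, b = 2, 4
closed = Fraction(sum(1 for face in range(1, 7) if a <= face <= b), 6)  # P(a <= X <= b)
diff = die_cdf(b) - die_cdf(a)                                          # P(a < X <= b)
print(closed, diff)  # 1/2 1/3
```

So for the die, \(F_X(b) - F_X(a)\) gives \(P(a < X \le b)\), and the two answers differ by exactly \(P(X = a) = 1/6\). For a continuous RV that gap is zero, which is why the formula can be sloppy about \(\le\) versus \(<\).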