1.14 Moments
We might like to describe a random variable’s behavior without specifying the entire distribution. Moments are numerical summaries of a random variable’s behavior: each one captures a single feature, like where the distribution is centered or how spread out it is.
1.14.1 First moment: Expected value (mean) – a measure of center
For discrete RVs: \[E(X) = \sum_{all~x} xp(x)\]
This is a weighted “average” of all the values \(X\) can take on, weighted by the probability. It balances the pmf at the center.
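If you like seeing formulas as code, here’s a minimal sketch using a hypothetical pmf (the number of heads in two fair coin flips):

```python
# Hypothetical pmf: X = number of heads in two fair coin flips,
# so p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.
values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]

# E(X) = sum over all x of x * p(x): a probability-weighted average
mean = sum(x * p for x, p in zip(values, probs))
print(mean)  # 1.0 -- the pmf balances here
```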
Some texts use curly braces { } with the expectation and variance operators. Some use square brackets, some use parentheses, and some don’t use anything at all for expectations. Life is a challenge sometimes.
We use the letter \(\mu\), sometimes with the subscript of the random variable, \(\mu_X\), to denote the mean.
For any function of \(X\), say \(g(X)\), the expected value of the function of \(X\) is the weighted average of the function values:
\[E(g(X)) = \sum_{all~x} g(x)p(x)\] Note that the probabilities don’t change – only the values by which we multiply them.
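Reusing the hypothetical coin-flip pmf from above, here’s \(E(g(X))\) with \(g(x) = x^2\); notice the probabilities are untouched:

```python
values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]

def g(x):
    return x ** 2  # any function of x would do here

# Same probabilities as before; only the values being averaged change
e_g = sum(g(x) * p for x, p in zip(values, probs))
print(e_g)  # 1.5, i.e., E(X^2) for this pmf
```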
Continuous RVs: \[ E(X) = \int_{-\infty}^{\infty} {xf(x)dx} \]
See the parallel? \(P(X=x)=0\) for all \(x\), and there are infinitely many \(x\), so we have an infinite sum of infinitesimally small things…and you know that means it’s time for calculus.
The rule for expected values of functions of \(X\) is parallel too: \[E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x)dx \] Notice that \(f(x)\), which is about the probability of each outcome, stays the same – only the values of \(g(x)\) change.
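To see the continuous version in action, here’s a numerical-integration sketch. It assumes an Exponential distribution with rate 2, chosen because the true answer is known (\(E(X) = 1/2\)):

```python
import numpy as np
from scipy.integrate import quad

# Assumed density: X ~ Exponential(rate = 2), f(x) = 2 exp(-2x) for x >= 0
rate = 2.0
f = lambda x: rate * np.exp(-rate * x)

# E(X) = integral of x * f(x) dx; theory says 1/rate = 0.5
mean, _ = quad(lambda x: x * f(x), 0, np.inf)

# E(g(X)) with g(x) = x^2: same f(x), different values multiplied in
e_x2, _ = quad(lambda x: x ** 2 * f(x), 0, np.inf)

print(mean, e_x2)  # approximately 0.5 and 0.5
```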
Properties of expected values: For constants \(a\) and \(b\) and random variable \(X\),
\[ E(aX) = aE(X)\] \[E(X+b) = E(X) + b\]
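A quick check of both properties on the hypothetical coin-flip pmf (with \(a = 3\) and \(b = 10\) chosen arbitrarily):

```python
values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]
a, b = 3, 10

mean = sum(x * p for x, p in zip(values, probs))           # E(X) = 1.0
mean_ax = sum(a * x * p for x, p in zip(values, probs))    # E(aX)
mean_xb = sum((x + b) * p for x, p in zip(values, probs))  # E(X + b)

print(mean_ax == a * mean)  # True: 3.0 == 3 * 1.0
print(mean_xb == mean + b)  # True: 11.0 == 1.0 + 10
```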
1.14.2 Second moment: Variance – a measure of spread
Let’s think about how spread out a distribution is. For our purposes it doesn’t matter where on the number line it’s centered. What we want to know is how far the values tend to be from the center of that distribution. Let’s square those deviations so they’re all positive (also for some math reasons) and take an average – that is, the expected value. Then:
\[Var(X) = E[(X-\mu)^2] = E(X^2) - [E(X)]^2 = E(X^2) - \mu^2\]
Fun algebra magic trick: where did the cross-terms go?
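In case you want to check your answer: expand the square and remember that \(\mu = E(X)\) is a constant, so it can be pulled out of expectations:

\[E[(X-\mu)^2] = E(X^2 - 2\mu X + \mu^2) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2\]

The cross-term \(-2\mu E(X)\) becomes \(-2\mu^2\), and half of it cancels with the \(+\mu^2\).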
This tells us the “average” squared distance of \(X\) from its mean. We use the symbol \(Var(X) = \sigma_X^2\) or more generally \(\sigma^2\). It’s a measure of how spread out the distribution of \(X\) is. But it’s in squared units, so typically we talk about its square root, called the standard deviation: \(\sigma_X = \sqrt{Var(X)}\).
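You can check both versions of the variance formula (and the standard deviation) on the hypothetical coin-flip pmf:

```python
import math

values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]

mean = sum(x * p for x, p in zip(values, probs))  # E(X) = 1.0

# Definition: E[(X - mu)^2]
var_def = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# Shortcut: E(X^2) - mu^2
e_x2 = sum(x ** 2 * p for x, p in zip(values, probs))
var_short = e_x2 - mean ** 2

print(var_def, var_short)  # 0.5 and 0.5
print(math.sqrt(var_def))  # standard deviation, about 0.707
```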
You may also see the notation \(\sigma^2\{X\}\) to denote \(Var(X)\).
For a continuous RV, \[Var(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)dx\]
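Continuing the Exponential(2) sketch from the expected-value section, where the true variance is known to be 0.25:

```python
import numpy as np
from scipy.integrate import quad

rate = 2.0
f = lambda x: rate * np.exp(-rate * x)  # assumed Exponential(2) density
mu = 1.0 / rate                         # E(X) for this distribution

# Var(X) = integral of (x - mu)^2 * f(x) dx
var, _ = quad(lambda x: (x - mu) ** 2 * f(x), 0, np.inf)
print(var)  # approximately 0.25
```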
Properties of variances: For constants \(a\) and \(b\) and random variable \(X\),
\[Var(X + b) = Var(X)\] \[Var(aX) = a^2Var(X)\]
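And a quick check of both properties on the hypothetical coin-flip pmf (again with \(a = 3\) and \(b = 10\)):

```python
values = [0, 1, 2]
probs = [0.25, 0.50, 0.25]
a, b = 3, 10

def var(vals):
    m = sum(v * p for v, p in zip(vals, probs))
    return sum((v - m) ** 2 * p for v, p in zip(vals, probs))

print(var(values))                   # 0.5
print(var([x + b for x in values]))  # 0.5 -- shifting doesn't change spread
print(var([a * x for x in values]))  # 4.5 == 3**2 * 0.5
```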
Are there more moments? Oh my yes! See how the first moment involves \(E(X)\) and the second uses \(E(X^2)\)? You can just keep going: the \(k\)th moment uses \(E(X^k)\). But we mostly won’t in this class.