7 Properties of random variables

Recall that a random variable is the assignment of a numerical outcome to a random process. A random variable can be discrete or continuous, depending on the values that it takes. Examples of random variables are:

  • The number of heads in three coin flips.

  • Time until the next earthquake.

  • Revenue made in a flight from an airline’s baggage fees.

  • The height of a randomly selected person. What makes this random is the sampling. If we took another sample of one person, we would’ve gotten another height.

  • The sample mean \(\overline{X}\) of a quantitative variable. Again, what makes this random is the sampling. Another sample would yield another \(\overline{X}.\)

  • The slope \(\hat{b}\) of a least squares line. Once more, what makes this random is the sampling. Another sample would yield another \(\hat{b}.\)

In section 6.1, we looked at functions that describe the distribution of a random variable: the PMF, PDF, and CDF. However, it is also important to be able to describe random variables with numerical summaries; for example, with measures of central tendency, spread, and association between two random variables. In this chapter, we focus on such measures and how to use them.

7.1 Expected value

Definition 7.1 (Expected value of a discrete random variable) Let \(X\) be a discrete random variable that takes values \(x_1, \dots , x_m\) with PMF \(f(x) = P(X = x)\). Then the expected value or mean of \(X\) is the sum of each possible value multiplied by its corresponding probability: \[E(X) = x_1 P(X = x_1) + \cdots + x_m P(X = x_m) = \sum_{k=1}^m x_k f(x_k).\]

Example 7.1 Let \(X\) be the number of heads in one coin flip. Notice that \(X\) is a Bernoulli random variable (see section 6.2 for definition). The expected value of \(X\) is: \[E(X) = 0\times \frac{1}{2} + 1\times\frac{1}{2} = \frac{1}{2}.\] In this case, the expected value of \(X\) is not a value that \(X\) can actually take; it is simply the weighted average of its possible values (with equal weights this time).
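
The short Python sketch below (our own addition, not part of the example; the variable names are ours) computes this weighted average directly from the PMF and compares it with the average of many simulated coin flips.

```python
# Sketch: the weighted-average computation of E(X) for the coin flip in
# example 7.1, checked against a simulation.
import numpy as np

values = np.array([0, 1])          # possible values of X (tails, heads)
probs = np.array([0.5, 0.5])       # PMF f(x) = P(X = x)

expected_value = np.sum(values * probs)   # sum of x_k * f(x_k)
print(expected_value)                     # 0.5

rng = np.random.default_rng(seed=1)
flips = rng.integers(0, 2, size=100_000)  # 0 = tails, 1 = heads
print(flips.mean())                       # close to 0.5
```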

Definition 7.2 (Expected value of a continuous random variable) Let \(X\) be a continuous random variable with PDF \(f(x).\) The expected value (or mean) of \(X\) is: \[E(X) = \int_{-\infty}^{\infty} x f(x)dx.\]


Notation: The Greek letter \(\mu\) is also used in place of the notation \(E(X),\) for both discrete and continuous random variables.

Remark: We can think of the expected value as a weighted average of a random variable (with more weight given to values that have higher probabilities).

Example 7.2 Consider the random variable \(T\) from example 6.1 (Old Faithful). The PDF for \(T\) is the function:

\[ f(x) = \left\{ \begin{array}{l} \frac{1}{91}, \quad \mbox{if } 0\leq x \leq 91,\\ 0, \quad \mbox{otherwise.} \end{array} \right. \]

Therefore, the expected value of \(T\) is: \[ E(T) = \int_{-\infty}^{\infty}xf(x)dx = \int_0^{91}x\times \frac{1}{91} dx = \left.\frac{1}{91}\frac{1}{2}x^2 \right|_0^{91} = 45.5 \mbox{ minutes.} \]
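
As a rough numerical check (a sketch we add here, not part of the example), the Python code below approximates the integral with a Riemann sum and also averages simulated draws from the uniform distribution on \([0, 91]\); both values should be close to 45.5.

```python
# Sketch: two numerical checks of E(T) = 45.5 for the uniform PDF on [0, 91].
import numpy as np

# 1. Approximate the integral of x * f(x) with a Riemann sum.
x = np.linspace(0, 91, 1_000_000)
fx = np.full_like(x, 1 / 91)              # the PDF f(x) = 1/91 on [0, 91]
dx = x[1] - x[0]
print(np.sum(x * fx) * dx)                # approximately 45.5

# 2. Simulate many draws of T and average them.
rng = np.random.default_rng(seed=1)
t = rng.uniform(0, 91, size=100_000)
print(t.mean())                           # close to 45.5
```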

Theorem 7.1 (Properties of expected value) Let \(X\) be a random variable (discrete or continuous). Then the following properties hold:

  1. \(E(aX+b) = aE(X) + b\), for any real numbers \(a\) and \(b\).
  2. \(E(g(X)) = \sum_k g(x_k)f(x_k)\), if \(X\) is discrete and takes values \(x_1, x_2, \dots.\)
    \(E(g(X)) = \int_{-\infty}^{\infty} g(x) f(x) dx\), if \(X\) is continuous.
  3. \(E(X+Y) = E(X) + E(Y)\), which implies that for any sequence of random variables \(X_1, X_2, \dots,\) \(E(\sum X_i) = \sum E(X_i).\) That is, “the expected value of the sum is the sum of the expected values.”

Proof. Here we prove the first two properties for the discrete case. Let \(X\) be a discrete random variable that takes values \(x_1, x_2, \dots.\) Then:

  1. \(E(aX+b) = \sum_k (ax_k+b) P(aX+b = ax_k+b) = \sum_k (ax_k+b) P(X = x_k)\)
    \(= a\sum_k x_k f(x_k) + b\sum_k f(x_k) = aE(X) + b.\)
    Here it was used that \(\sum_k f(x_k) = 1\) (see section 6.1 for properties of PMFs.)

  2. The random variable \(g(X)\) takes values \(g(x_1), g(x_2), \dots ,\) with probabilities \(f(x_1), f(x_2), \dots,\) respectively. Therefore, \(E(g(X)) = \sum_k g(x_k)f(x_k).\)
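
The following Python sketch (our own illustration, using a made-up PMF) checks properties 1 and 2 of theorem 7.1 numerically for a small discrete random variable.

```python
# Sketch: verify properties 1 and 2 of theorem 7.1 for a small discrete PMF.
import numpy as np

values = np.array([1, 2, 3])
probs = np.array([0.2, 0.5, 0.3])    # a made-up PMF that sums to 1

EX = np.sum(values * probs)          # E(X) = 2.1

# Property 1: E(aX + b) = a E(X) + b
a, b = 4.0, -1.0
print(np.sum((a * values + b) * probs), a * EX + b)   # both print 7.4

# Property 2: E(g(X)) = sum of g(x_k) f(x_k), here with g(x) = x^2
print(np.sum(values**2 * probs))                      # E(X^2) = 4.9
```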

Example 7.3 Let \(Y\) be the number of heads in 100 coin flips. Notice that \(Y\) is a binomial random variable (see section 6.2 for definition.) Any binomial random variable can be written as a sum of independent Bernoulli random variables, one for each trial, so we can write \[Y=\sum_{i=1}^{100} X_i,\] where each \(X_i\) is a Bernoulli random variable. Then we can use property 3 from theorem 7.1 and the calculation done in example 7.1 to find \(E(Y):\)

\[E(Y) = E\left(\sum_{i=1}^{100} X_i\right) = \sum_{i=1}^{100} E(X_i) = \sum_{i=1}^{100} \frac{1}{2} = 100\times\frac{1}{2} = 50.\]
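
A quick simulation (a sketch we add here; the seed and sample size are arbitrary) confirms this: the average number of heads over many repetitions of 100 fair flips is close to 50.

```python
# Sketch: simulate the number of heads in 100 fair coin flips many times and
# compare the average count with the value 50 computed in example 7.3.
import numpy as np

rng = np.random.default_rng(seed=1)
heads = rng.binomial(n=100, p=0.5, size=100_000)   # Y ~ Binom(100, 1/2)
print(heads.mean())                                # close to 50
```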


In general, the expected value of any binomial random variable can be found in a similar way, as stated in theorem 7.2 below.

Theorem 7.2 (Expected value of binomial) The expected value of a binomial random variable \(X \sim Binom(n,p)\) is \(E(X) = np.\)

Proof. Write \(X\) as a sum of \(n\) independent Bernoulli random variables: \(X = \sum_{i=1}^n X_i,\) where \(P(X_i = 1) = p\) (the probability of “success” is \(p\).) Then, \(E(X_i) = 0\times (1-p) + 1\times p = p.\) This gives: \[ E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = \sum_{i=1}^{n} p = n p. \]

Theorem 7.3 (Expected value of normal) The expected value of a normal random variable \(X \sim N(\mu,\sigma)\) is \(E(X) = \mu.\)

Proof. We first show that for a standard normal random variable \(Z\), \(E(Z)=0\). The PDF of \(Z\) is \[ f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \] and therefore the expected value of \(Z\) is \[ E(Z) = \int_{-\infty}^{\infty}x f(x)dx = \int_{-\infty}^{\infty}x \frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx = 0. \] The result above follows because the integrand \(x \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\) is an odd¹ function of \(x\). Now recall the fact mentioned in section 6.3 that for any normal random variable \(X \sim N(\mu,\sigma)\), the transformation \(Z = \frac{X-\mu}{\sigma}\) is a standard normal, that is, \(Z \sim N(0,1).\) This implies that \(X\) can be written as \(X = \mu + \sigma Z\), and therefore, by the properties of expected value, \[ E(X) = E(\mu + \sigma Z) = \mu + \sigma E(Z) = \mu + \sigma\times 0 = \mu. \]
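
The Python sketch below (ours, with arbitrary values of \(\mu\) and \(\sigma\)) checks this result by simulation: samples drawn directly from \(N(\mu, \sigma)\) and samples constructed as \(\mu + \sigma Z\) both average close to \(\mu\).

```python
# Sketch: check E(X) = mu for a normal variable by simulation, and confirm
# that mu + sigma * Z (with Z standard normal) has the same mean.
import numpy as np

mu, sigma = 10.0, 3.0
rng = np.random.default_rng(seed=1)

x = rng.normal(loc=mu, scale=sigma, size=200_000)   # X ~ N(mu, sigma)
z = rng.normal(loc=0.0, scale=1.0, size=200_000)    # Z ~ N(0, 1)

print(x.mean())                   # close to 10
print((mu + sigma * z).mean())    # also close to 10
```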


It should now be clear why the parameter \(\mu\) is called the mean of the normal distribution.

We end this section with an important property of expected values for independent random variables. The concept of independence for random variables follows from the concept of independence of processes introduced in section 5.2: Two random variables are independent if knowing the outcome of one provides no useful information about the outcome of the other.

Theorem 7.4 If two random variables \(X\) and \(Y\) are independent, then \(E(XY) = E(X)E(Y).\)

Proof. We provide a proof for the discrete case. Let \(X\) take values \(x_1, x_2, \dots,\) and let \(Y\) take values \(y_1, y_2, \dots .\) Then \[\begin{eqnarray} E(XY) &=& \sum_{i,j}x_i y_j P(X=x_i \cap Y =y_j) = \sum_i \sum_j x_i y_j P(X=x_i)P(Y =y_j)\\ &=& \sum_i x_i P(X=x_i)\sum_j y_j P(Y =y_j) = E(X) E(Y). \end{eqnarray}\] The second equality uses the independence of \(X\) and \(Y\): \(P(X=x_i \cap Y=y_j) = P(X=x_i)P(Y=y_j).\)
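
As an informal check (our own sketch; the particular distributions chosen are arbitrary), simulating two independently generated variables gives \(E(XY)\) close to \(E(X)E(Y)\):

```python
# Sketch: for two independently generated variables, the simulated value of
# E(XY) should be close to E(X) E(Y), as theorem 7.4 states.
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.integers(1, 7, size=200_000)     # a die roll, E(X) = 3.5
y = rng.normal(2.0, 1.0, size=200_000)   # an independent normal, E(Y) = 2

print(np.mean(x * y))                    # approximately 3.5 * 2 = 7
print(x.mean() * y.mean())               # also approximately 7
```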

7.2 Variance

The expected value of a random variable measures its central tendency. Now we look at a measure of dispersion of a random variable.

Definition 7.3 The variance of a random variable \(X\) (discrete or continuous) is defined as \[ Var(X) = E\left((X-E(X))^2\right). \] The standard deviation of a random variable is \(SD(X) = \sqrt{Var(X)}.\)


The following result provides an alternative way to calculate the variance of a random variable. The formula from definition 7.3 tends to be more helpful when proving results about variances, while the formula below is usually more convenient for computing them, since it involves fewer operations.

Theorem 7.5 The variance of a random variable \(X\) can also be written as \(Var(X) = E(X^2) - E(X)^2.\)

Proof. Using theorem 7.1, \[\begin{eqnarray} Var(X) &=& E\left((X-E(X))^2\right) = E\left(X^2 - 2XE(X) + E(X)^2\right)\\ &=& E(X^2) - 2E(X)E(X) + E(X)^2 = E(X^2) - E(X)^2. \end{eqnarray}\]

Theorem 7.6 \(Var(aX+b) = a^2 Var(X),\) for any constants \(a\) and \(b.\)

Proof. Using properties of expected values from theorem 7.1, \[\begin{eqnarray} Var(aX+b) &=& E\left(\left(aX+b - E(aX+b)\right)^2\right) = E\left(\left(aX+b - (aE(X)+b)\right)^2\right)\\ &=& E\left(\left(aX+b - aE(X)-b\right)^2\right) = E\left(\left(a(X-E(X))\right)^2\right)\\ &=& E\left(a^2(X-E(X))^2\right) = a^2E\left((X-E(X))^2\right) = a^2 Var(X). \end{eqnarray}\]

Example 7.4 Let \(X\) be the number of heads in one coin flip. We want to compute \(Var(X)\). We have already found the expected value of \(X\) in example 7.1, which was \(E(X) = \frac{1}{2}.\) To find \(Var(X)\), we use the formula from theorem 7.5, which requires us to find \(E(X^2).\) In what follows, we use property 2 of theorem 7.1 with \(g(X)=X^2.\) \[ E(X^2) = 0^2\times\frac{1}{2} + 1^2\times \frac{1}{2} = \frac{1}{2}. \] This gives \[ Var(X) = E(X^2) - E(X)^2 = \frac{1}{2} - \left(\frac{1}{2}\right)^2 = \frac{1}{4}. \]
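
The same calculation can be scripted directly from the PMF; the sketch below (added here, not part of the example) reproduces \(Var(X) = 1/4\) using the formula of theorem 7.5.

```python
# Sketch: the variance calculation of example 7.4 done directly from the PMF,
# using Var(X) = E(X^2) - E(X)^2 from theorem 7.5.
import numpy as np

values = np.array([0, 1])
probs = np.array([0.5, 0.5])

EX = np.sum(values * probs)           # E(X) = 0.5
EX2 = np.sum(values**2 * probs)       # E(X^2), property 2 of theorem 7.1
print(EX2 - EX**2)                    # 0.25, matching Var(X) = 1/4
```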


The example above calculates the variance of a Bernoulli random variable, which in general is given by the following theorem:

Theorem 7.7 (Variance of Bernoulli) The variance of a Bernoulli random variable \(X\) with probability of success \(p\) is \(Var(X) = p(1-p).\)

Proof. \(Var(X) = E(X^2) - E(X)^2 = 0^2\times (1-p) + 1^2\times p - p^2 = p-p^2 = p(1-p).\)


An immediate consequence of theorem 7.7 is the variance of a binomial random variable:

Theorem 7.8 (Variance of binomial) The variance of a binomial random variable \(Y\sim Binom(n,p)\) is \(Var(Y)=np(1-p).\)

Proof. Write \(Y\) as a sum of \(n\) independent Bernoulli random variables with probability of success \(p\), that is, \(Y = X_1 + X_2 + \cdots + X_n.\) Then \[ Var(Y) = Var\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n Var(X_i) = \sum_{i=1}^n p(1-p) = np(1-p). \] Here it was used that \(Var(\sum X_i) = \sum Var(X_i)\) because the \(X_i\)s are independent (see theorem 7.10 below).
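
A short simulation (our sketch; the choice \(n = 100\), \(p = 0.5\) is arbitrary) illustrates the result: the sample variance of many binomial draws is close to \(np(1-p)\).

```python
# Sketch: simulated check of Var(Y) = n p (1 - p) for a binomial variable.
import numpy as np

n, p = 100, 0.5
rng = np.random.default_rng(seed=1)
y = rng.binomial(n=n, p=p, size=200_000)

print(y.var())          # close to n * p * (1 - p)
print(n * p * (1 - p))  # 25.0
```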

Theorem 7.9 (Variance of normal) The variance of a normal random variable \(X \sim N(\mu,\sigma)\) is \(Var(X) = \sigma^2.\)

Proof. We use the fact that for a standard normal random variable \(Z\), \(Var(Z) = 1.\)² Then, writing \(X = \mu + \sigma Z\), and using theorem 7.6, it follows that: \[ Var(X) = Var(\mu + \sigma Z) = \sigma^2 Var(Z) = \sigma^2\times 1 = \sigma^2. \]

7.3 Covariance and correlation

This section focuses on measures that summarize the relationship between two random variables.

Definition 7.4 (covariance) The covariance between two random variables \(X\) and \(Y\) is the measure \[ Cov(X,Y) = E\left((X-E(X))(Y-E(Y))\right). \]


Alternatively, we can write the covariance with a formula that is easier for computation (are you able to prove this formula?): \[ Cov(X,Y) = E(XY) - E(X)E(Y). \]

An immediate consequence of this formula is that if \(X\) and \(Y\) are independent, then \(Cov(X,Y) = 0.\) (Is it clear why?)

We saw in theorem 7.1 that \(E(X+Y) = E(X) + E(Y)\) for any random variables \(X\) and \(Y\). Is this also the case for variance? What is \(Var(X+Y)\)? It turns out that \(Var(X+Y) = Var(X) + Var(Y)\) holds when \(X\) and \(Y\) are independent (more precisely, whenever their covariance is zero), whereas for expected value no such condition was needed. This is a consequence of the following result:

Theorem 7.10 For two random variables \(X\) and \(Y\), \[Var(X+Y) = Var(X) + 2 Cov(X,Y) + Var(Y).\]

Proof. Using the definition of covariance (definition 7.4), it follows that \[\begin{eqnarray} Var(X+Y) &=& E\left((X+Y - E(X+Y))^2\right)\\ &=& E\left((X+Y - (E(X)+E(Y)))^2\right)\\ &=& E\left((X-E(X) + Y-E(Y))^2\right)\\ &=& E\left((X-E(X))^2 + 2(X-E(X))(Y-E(Y)) + (Y-E(Y))^2\right)\\ &=& E\left((X-E(X))^2\right) + 2E\left((X-E(X))(Y-E(Y))\right) + E\left((Y-E(Y))^2\right)\\ &=& Var(X) + 2Cov(X,Y) + Var(Y). \end{eqnarray}\]


Therefore, if \(Cov(X,Y) = 0\), which is the case when \(X\) and \(Y\) are independent, then \(Var(X+Y) = Var(X) + Var(Y)\).
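
The identity of theorem 7.10 can also be verified numerically. In the sketch below (our illustration; the dependence structure is made up), \(Y\) is built partly from \(X\), so the covariance term is clearly nonzero, and the sample versions of the two sides of the identity agree.

```python
# Sketch: numerical check of Var(X + Y) = Var(X) + 2 Cov(X, Y) + Var(Y)
# for two deliberately dependent variables.
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.normal(0.0, 1.0, size=200_000)
y = 0.5 * x + rng.normal(0.0, 1.0, size=200_000)   # y depends on x

lhs = np.var(x + y, ddof=1)
rhs = np.var(x, ddof=1) + 2 * np.cov(x, y)[0, 1] + np.var(y, ddof=1)
print(lhs, rhs)   # the two numbers agree (up to floating-point error)
```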

The following theorem has several properties of covariance that will be used to prove results in statistical inference.

Theorem 7.11 (Properties of covariance) The following are properties of covariance. They follow from the results presented so far about covariance. Here \(X\), \(Y\), \(X_i\), and \(Y_j\) are random variables, and \(a\) and \(b\) are constants.

  1. \(Cov(X,Y) = Cov(Y,X)\)
  2. \(Cov(X,X) = Var(X)\)
  3. \(Cov(aX,Y) = aCov(X,Y)\)
  4. \(Cov(X,bY) = bCov(X,Y)\)
  5. \(Cov(\sum_i X_i,\sum_j Y_j) = \sum_i\sum_j Cov(X_i, Y_j).\)

Finally, the following definition gives the correlation coefficient between two random variables.

Definition 7.5 (Correlation between random variables) The correlation coefficient between two random variables \(X\) and \(Y\) is defined as \[ \rho(X,Y) = \frac{Cov(X,Y)}{SD(X)SD(Y)}. \]

As with Pearson’s sample correlation coefficient, the correlation between two random variables is a number between -1 and 1, that is, \(-1\leq \rho(X,Y) \leq 1.\) This result requires a proof, which we omit here.

The following example illustrates calculations of expected values, variance, covariance, and correlation for discrete random variables.

Example 7.5 Let \(D_1\) and \(D_2\) be the numbers obtained when rolling two dice and let \(S = D_1 + 2D_2.\) Find the expected values and variances of \(D_1\), \(D_2\), and \(S\), as well as \(Cov(D_1,S)\) and \(\rho(D_1,S).\)

  1. Expected values:
    \(E(D_1) = 1\times\frac{1}{6} + 2\times\frac{1}{6} + 3\times\frac{1}{6} + 4\times\frac{1}{6} + 5\times\frac{1}{6} + 6\times\frac{1}{6} = 3.5.\)
    \(E(D_2) = E(D_1) = 3.5.\)
    \(E(S) = E(D_1 + 2D_2) = E(D_1) + 2E(D_2) = 3.5 + 2\times 3.5 = 10.5.\)

  2. Variances:
    To find the variance of \(D_1\), we need \(E(D_1^2)\), which is:
    \(E(D_1^2) = 1^2\times\frac{1}{6} + 2^2\times\frac{1}{6} + 3^2\times\frac{1}{6} + 4^2\times\frac{1}{6} + 5^2\times\frac{1}{6} + 6^2\times\frac{1}{6} = 15.167.\)
    \(Var(D_1) = E(D_1^2) - E(D_1)^2 = 15.167 - 3.5^2 = 2.917.\)
    \(Var(D_2) = Var(D_1) = 2.917.\)
    \(Var(S) = Var(D_1 + 2D_2) = Var(D_1) + 2Cov(D_1,2D_2) + Var(2D_2)\)
    \(= Var(D_1) + 4Cov(D_1,D_2) + 2^2Var(D_2) = 2.917 + 4\times 0 + 4\times 2.917 = 14.583.\)

  3. Covariance and correlation:
    \(Cov(D_1,S) = Cov(D_1, D_1 + 2D_2) = Cov(D_1, D_1) + Cov(D_1, 2D_2)\)
    \(= Var(D_1) + 2Cov(D_1,D_2) = 2.917 + 2\times 0 = 2.917.\)
    \(\rho(D_1,S) = \frac{Cov(D_1,S)}{SD(D_1)SD(S)} = \frac{2.917}{\sqrt{2.917}\sqrt{14.583}} = 0.447.\)
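
To double-check the arithmetic in example 7.5, the Python sketch below (our addition) simulates a large number of rolls of the two dice and estimates the same quantities.

```python
# Sketch: simulation of example 7.5, checking E(S) = 10.5, Var(S) = 14.583,
# Cov(D1, S) = 2.917, and rho(D1, S) = 0.447.
import numpy as np

rng = np.random.default_rng(seed=1)
d1 = rng.integers(1, 7, size=500_000)    # first die
d2 = rng.integers(1, 7, size=500_000)    # second die, independent of the first
s = d1 + 2 * d2

print(s.mean())                  # close to 10.5
print(s.var(ddof=1))             # close to 175/12 = 14.583
print(np.cov(d1, s)[0, 1])       # close to 35/12 = 2.917
print(np.corrcoef(d1, s)[0, 1])  # close to 1/sqrt(5) = 0.447
```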


  1. A function \(f\) is odd if \(f(-x) = -f(x)\) for all \(x\).↩︎

  2. This follows from the following calculation: \(Var(Z) = E(Z^2) - E(Z)^2 = E(Z^2) - 0^2 = \int_{-\infty}^{\infty} x^2 \frac{1}{\sqrt{2\pi}}e^{-x^2/2}dx = 1.\) The integral can be solved using integration by parts with \(u=x\) and \(dv = xe^{-x^2/2}dx.\)↩︎