## 2.9 Variance and covariance

### 2.9.1 Variance and standard deviation

**Definition 2.15** The **variance** of a random variable \(X\) is \[\begin{equation*} \textrm{Var}(X) = \textrm{E}\left[\left(X-\textrm{E}(X)\right)^2\right] \end{equation*}\]

The **standard deviation** of a random variable is \[\begin{equation*} \textrm{SD}(X) = \sqrt{\textrm{Var}(X)} \end{equation*}\]

Variance is usually computed using the following equivalent formula.

\[ \textrm{Var}(X) = \textrm{E}\left(X^2\right) - \left(\textrm{E}\left(X\right)\right)^2 \]
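As a quick numerical check (an illustration, not part of the text), the definitional and shortcut formulas can be compared for a fair six-sided die:

```python
# Compare the two variance formulas for a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

ev = sum(x * p for x, p in zip(values, probs))                    # E(X) = 3.5
var_def = sum((x - ev) ** 2 * p for x, p in zip(values, probs))   # E[(X - E(X))^2]
ev_x2 = sum(x ** 2 * p for x, p in zip(values, probs))            # E(X^2)
var_shortcut = ev_x2 - ev ** 2                                    # E(X^2) - (E(X))^2

print(var_def, var_shortcut)  # both equal 35/12 ≈ 2.9167
```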

If \(X\) is a RV and \(a, b\) are non-random constants then
\[\begin{aligned}
\textrm{Var}(aX + b) & = a^2\textrm{Var}(X)\\
\textrm{SD}(aX + b) & = |a|\textrm{SD}(X)\end{aligned}\]
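These linearity properties can be verified directly for a fair die; the constants \(a=-2\), \(b=7\) below are arbitrary choices for illustration:

```python
# Check Var(aX + b) = a^2 Var(X) and SD(aX + b) = |a| SD(X)
# for a fair six-sided die, with arbitrary constants a = -2, b = 7.
import math

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
a, b = -2, 7

def variance(vals, ps):
    ev = sum(v * p for v, p in zip(vals, ps))
    return sum((v - ev) ** 2 * p for v, p in zip(vals, ps))

var_x = variance(values, probs)
var_lin = variance([a * v + b for v in values], probs)

print(var_lin, a ** 2 * var_x)                        # equal
print(math.sqrt(var_lin), abs(a) * math.sqrt(var_x))  # equal
```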
If \(X\) is a RV with expected value \(\textrm{E}(X)\) and standard
deviation \(\textrm{SD}(X)\), then the **standardized RV** is
\[Z = \frac{X - \textrm{E}(X)}{\textrm{SD}(X)}\] which has expected
\(\textrm{E}(Z)=0\) and \(\textrm{SD}(Z)=1\).

For each outcome \(\omega\), the standardized value \(Z(\omega)\) measures how far the realized value \(X(\omega)\) is from the expected value, relative to the overall degree of variability of the RV.

\(X\) is measured in measurement units, and the standardized RV \(Z\) is measured in standardized units: “standard deviations away from the mean”.
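A quick check (an illustration, not from the text) that standardizing a fair die yields \(\textrm{E}(Z)=0\) and \(\textrm{SD}(Z)=1\):

```python
# Standardize a fair die: Z = (X - E(X)) / SD(X) has E(Z) = 0 and SD(Z) = 1.
import math

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

ev = sum(v * p for v, p in zip(values, probs))
sd = math.sqrt(sum((v - ev) ** 2 * p for v, p in zip(values, probs)))

z_values = [(v - ev) / sd for v in values]  # "standard deviations from the mean"
ev_z = sum(z * p for z, p in zip(z_values, probs))
sd_z = math.sqrt(sum((z - ev_z) ** 2 * p for z, p in zip(z_values, probs)))
print(ev_z, sd_z)  # 0 and 1 (up to rounding)
```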

Chebyshev’s inequality puts bounds on the probability that a RV takes a value far from its expected value.

For any RV \(X\) and any constant \(a>0\) \[\textrm{P}\left(|X-\textrm{E}(X)|\ge a\right)\le \frac{\textrm{Var}(X)}{a^2}\]

The statement of Chebyshev’s inequality above bounds the probability
that an RV is \(a\) *measurement units* away from its mean. An alternative
expression of Chebyshev’s inequality bounds the probability that an RV
is \(a\) *standard deviations* away from its mean.
\[\textrm{P}\left(\frac{|X-\textrm{E}(X)|}{\textrm{SD}(X)}\ge a\right)\le \frac{1}{a^2}\]
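The bound can be checked exactly for a fair die (an illustration, not from the text); for each value of \(a\), the probability of being at least \(a\) standard deviations from the mean never exceeds \(1/a^2\):

```python
# Verify the standardized form of Chebyshev's inequality for a fair die.
import math

values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
ev = sum(v * p for v, p in zip(values, probs))
sd = math.sqrt(sum((v - ev) ** 2 * p for v, p in zip(values, probs)))

for a in [1.0, 1.5, 2.0]:
    # exact probability of being at least a SDs away from the mean
    prob = sum(pr for v, pr in zip(values, probs) if abs(v - ev) / sd >= a)
    print(a, prob, 1 / a ** 2)  # prob <= 1/a^2 in every case
```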

### 2.9.2 Covariance and correlation

Quantities like expected value and variance summarize characteristics of the marginal distribution of a single RV.

When there are two RVs their joint distribution is of interest. Covariance summarizes in a single number a characteristic of the joint distribution of two RVs, namely, the degree to which they “vary together”.

The **covariance** between two RVs \(X\) and \(Y\) is
\[\textrm{Cov}(X,Y) = \textrm{E}\left[\left(X-\textrm{E}[X]\right)\left(Y-\textrm{E}[Y]\right)\right]\]

It is usually easier to compute covariance using the following short-cut
formula
\[\textrm{Cov}(X,Y) = \textrm{E}(XY) - \textrm{E}(X)\textrm{E}(Y)\] That
is, covariance is the expected value of the product minus the product of
expected values.
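As a worked example (not from the text), take \(X\) to be the first of two independent fair dice and \(Y\) the total of both; the shortcut formula then gives \(\textrm{Cov}(X,Y)=\textrm{Var}(X)\):

```python
# Covariance via the shortcut formula:
# X = first die, Y = total of two independent fair dice.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
p = 1 / 36  # each ordered pair is equally likely

e_x = sum(i * p for i, j in outcomes)            # E(X) = 3.5
e_y = sum((i + j) * p for i, j in outcomes)      # E(Y) = 7.0
e_xy = sum(i * (i + j) * p for i, j in outcomes) # E(XY)
cov = e_xy - e_x * e_y
print(cov)  # 35/12 ≈ 2.9167, which equals Var(X)
```

The answer equals \(\textrm{Var}(X)\) because \(\textrm{Cov}(X, X+W)=\textrm{Var}(X)\) when \(W\) is independent of \(X\).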

Variance is a measure of the degree of variability of a distribution, and this measure has many nice theoretical properties. However, standard deviation is often a better practical measure of variability.

Analogously, covariance is a measure of joint association between two
RVs that has many nice theoretical properties, but the *correlation
coefficient* is often a more practical measure.

The **correlation coefficient** between \(X\) and \(Y\), often denoted
\(\rho_{X,Y}\), is
\[\textrm{Corr}(X,Y) = \frac{\textrm{Cov}(X,Y)}{{\textrm{SD}(X)}{\textrm{SD}(Y)}}\]

It can be shown that \[\textrm{Corr}(X,Y) = \textrm{Cov}\left(\frac{X-\textrm{E}(X)}{\textrm{SD}(X)},\frac{Y-\textrm{E}(Y)}{\textrm{SD}(Y)}\right).\] That is, the correlation coefficient for two RVs is the covariance between the corresponding standardized RVs.
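Continuing the two-dice example (an illustration, not from the text), the correlation computed from the definition agrees with the covariance of the standardized RVs:

```python
# Corr(X, Y) equals Cov(Z_X, Z_Y) for the standardized RVs.
# X = first die, Y = total of two independent fair dice.
import math
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
p = 1 / 36
xs = [i for i, j in outcomes]
ys = [i + j for i, j in outcomes]

def ev(vals):
    return sum(v * p for v in vals)

mx, my = ev(xs), ev(ys)
sx = math.sqrt(ev([(x - mx) ** 2 for x in xs]))
sy = math.sqrt(ev([(y - my) ** 2 for y in ys]))

corr = (ev([x * y for x, y in zip(xs, ys)]) - mx * my) / (sx * sy)

# covariance of the standardized RVs (both have mean 0, so Cov = E(Z_X Z_Y))
zx = [(x - mx) / sx for x in xs]
zy = [(y - my) / sy for y in ys]
cov_z = ev([a * b for a, b in zip(zx, zy)])
print(corr, cov_z)  # equal; both are 1/sqrt(2) ≈ 0.7071
```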

A correlation coefficient is a standardized measure of the strength of
the *linear* association between two RVs.

For any RVs, \(-1\le \textrm{Corr}(X,Y)\le 1\).

\(\textrm{Corr}(X,Y) = 1\) if and only if \(Y=aX+b\) for some constants \(a>0\) and \(b\).

\(\textrm{Corr}(X,Y) = -1\) if and only if \(Y=aX+b\) for some constants \(a<0\) and \(b\).

The sign of the correlation coefficient indicates the overall direction of the association.

\(\textrm{Corr}(X,Y)>0\) if above average values of \(X\) tend to be associated with above average values of \(Y\) (positive association).

\(\textrm{Corr}(X,Y)<0\) if above average values of \(X\) tend to be associated with below average values of \(Y\) (negative association).

A correlation coefficient has no units.

A correlation coefficient is not affected by linear changes of scale.

### 2.9.3 Variance of sums

\[\begin{align*} \textrm{Var}(X + Y) & = \textrm{Var}(X) + \textrm{Var}(Y) + 2\textrm{Cov}(X, Y)\\ \textrm{Var}(X - Y) & = \textrm{Var}(X) + \textrm{Var}(Y) - 2\textrm{Cov}(X, Y) \end{align*}\]
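A numerical check of the first identity (an illustration, not from the text), again with \(X\) the first die and \(Y\) the total of two independent fair dice, so that \(X\) and \(Y\) are dependent and the covariance term matters:

```python
# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) for dependent X and Y:
# X = first die, Y = total of two independent fair dice.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
p = 1 / 36
xs = [i for i, j in outcomes]
ys = [i + j for i, j in outcomes]

def ev(vals):
    return sum(v * p for v in vals)

def var(vals):
    m = ev(vals)
    return sum((v - m) ** 2 * p for v in vals)

cov = ev([x * y for x, y in zip(xs, ys)]) - ev(xs) * ev(ys)
lhs = var([x + y for x, y in zip(xs, ys)])
rhs = var(xs) + var(ys) + 2 * cov
print(lhs, rhs)  # equal
```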

### 2.9.4 Other moments

Expected values and variance are summary characteristics of a distribution. While these are typically the two most important characteristics, there are other summary characteristics like “skewness” or “kurtosis”. Higher order characteristics of a distribution are defined via “moments”.

The \(k\)**th moment** of \(X\) is \(\textrm{E}(X^k)\).

The \(k\)th moment of a RV exists as long as \(\textrm{E}(|X|^k)<\infty\).

Whether or not a certain moment exists depends on how quickly the tails of the distribution go to 0. (The tails of a distribution refer to the probabilities of values large in magnitude.)

If the \(k\)th moment of a distribution exists, then the \(j\)th moment exists for all \(j<k\).

The third moment is related to “skewness”, and the fourth moment is related to “kurtosis”.
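As a sketch of that relationship (an illustration, not from the text): skewness is commonly defined as the third moment of the standardized RV, \(\textrm{E}(Z^3)\), and kurtosis as the fourth, \(\textrm{E}(Z^4)\). For a symmetric distribution such as a fair die, the skewness is 0:

```python
# Skewness = E(Z^3) and kurtosis = E(Z^4) for a fair six-sided die.
import math

values = list(range(1, 7))
probs = [1 / 6] * 6

m = sum(v * p for v, p in zip(values, probs))
sd = math.sqrt(sum((v - m) ** 2 * p for v, p in zip(values, probs)))

skewness = sum(((v - m) / sd) ** 3 * p for v, p in zip(values, probs))
kurtosis = sum(((v - m) / sd) ** 4 * p for v, p in zip(values, probs))
print(skewness)  # 0, since the distribution is symmetric about its mean
```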

Expected value, variance, and moments are summary characteristics of a distribution, so if two random variables have the same distribution then they have all the same summary characteristics. Conversely, if the random variables yield the same value for every summary characteristic you pick, it seems reasonable that they must share the same distribution. Unfortunately, comparing only the (polynomial) moments is not enough.

**Theorem 2.2** Random variables \(X\) and \(Y\) have the same distribution if and only if \(\textrm{E}[g(X)]=\textrm{E}[g(Y)]\) for all functions \(g\) (for which the expected values are defined).