## 2.3 General Math

Chebyshev’s Inequality Let X be a random variable with mean $$\mu$$ and standard deviation $$\sigma$$. Then for any positive number k:

$P(|X-\mu| < k\sigma) \ge 1 - \frac{1}{k^2}$

Chebyshev’s Inequality does not require that X be normally distributed

Maclaurin series expansion for

$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + ...$

Geometric series:

$s_n=\sum_{k=1}^{n}ar^{n-1}=\frac{a(1-r^n)}{1-r}$

if |r| < 1

$s=\sum_{k=1}^{\infty}ar^{n-1}=\frac{a}{1-r}$

### 2.3.1 Law of large numbers

Let $$X_1,X_2,...$$ be an infinite sequence of independent and identically distributed (i.i.d) Then, the sample average is

$\bar{X}_n =\frac{1}{n} (X_1 + ... + X_n)$

converges to the expected value ($$\bar{X}_n \rightarrow \mu$$) as $$n \rightarrow \infty$$

$Var(X_i) = Var(\frac{1}{n}(X_1 + ... + X_n)) = \frac{1}{n^2}Var(X_1 + ... + X_n)= \frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}$

The difference between Weak Law and Strong Law regards the mode of convergence

#### 2.3.1.1 Weak Law

The sample average converges in probability towards the expected value

$\bar{X}_n \rightarrow^{p} \mu$

when $$n \rightarrow \infty$$

$\lim_{n\to \infty}P(|\bar{X}_n - \mu| > \epsilon) = 0$

The sample mean from a iid random sample ($$\{ x_i \}_{i=1}^n$$) from any population with a finite mean and finite variance $$\sigma^2$$ is ca consistent estimation for the population mean $$\mu$$

$plim(\bar{x})=plim(n^{-1}\sum_{i=1}^{n}x_i) =\mu$

#### 2.3.1.2 Strong Law

The sample average converges almost surely to the expected value

$\bar{X}_n \rightarrow^{a.s} \mu$

when $$n \rightarrow \infty$$

Equivalently,

$P(\lim_{n\to \infty}\bar{X}_n =\mu) =1$

### 2.3.2 Law of Iterated Expectation

Let X, Y be random variables. Then,

$E(X) = E(E(X|Y))$

means that the expected value of X can be calculated from the probability distribution of X|Y and Y

### 2.3.3 Convergence

#### 2.3.3.1 Convergence in Probability

• $$n \rightarrow \infty$$, an estimator (random variable) that is close to the true value.
• The random variable $$\theta_n$$ converges in probability to a constant c if

$\lim_{n\to \infty}P(|\theta_n - c| \ge \epsilon) = 0$

for any positive $$\epsilon$$

Notation

$plim(\theta_n)=c$

Equivalently,

$\theta_n \rightarrow^p c$

Properties of Convergence in Probability

• Slutsky’s Theorem: for a continuous function g(.), if $$plim(\theta_n)= \theta$$ then $$plim(g(\theta_n)) = g(\theta)$$

• if $$\gamma_n \rightarrow^p \gamma$$ then

• $$plim(\theta_n + \gamma_n)=\theta + \gamma$$ + $$plim(\theta_n \gamma_n) = \theta \gamma$$ + $$plim(\theta_n/\gamma_n) = \theta/\gamma$$ if $$\gamma \neq 0$$
• Also hold for random vectors/ matrices

#### 2.3.3.2 Convergence in Distribution

• As $$n \rightarrow \infty$$, the distribution of a random variable may converge towards another (“fixed”) distribution.
• The random variable $$X_n$$ with CDF $$F_n(x)$$ converges in distribution to a random variable X with CDF $$F(X)$$ if

$\lim_{n\to \infty}|F_n(x) - F(x)| = 0$

at all points of continuity of $$F(X)$$

Notation F(x) is the limiting distribution of $$X_n$$ or $$X_n \rightarrow^d X$$

• E(X) is the limiting mean (asymptotic mean)
• Var(X) is the limiting variance (asymptotic variance)

Note

$E(X) \neq \lim_{n\to \infty}E(X_n) \\ Avar(X_n) \neq \lim_{n\to \infty}Var(X_n)$

Properties of Convergence in Distribution

• Continuous Mapping Theorem: for a continuous function g(.), if $$X_n \to^{d} g(X)$$ then $$g(X_n) \to^{d} g(X)$$

• if $$Y_n\to^{d} c$$, then

• $$X_n + Y_n \to^{d} X + c$$

• $$Y_nX_n \to^{d} cX$$

• $$X_nY_n \to^{d} X/c$$ if $$c \neq 0$$

• also hold for random vectors/matrices

#### 2.3.3.3 Summary

Properties of Convergence

Probability Distribution
Slutsky’s Theorem: for a continuous function g(.), if $$plim(\theta_n)= \theta$$ then $$plim(g(\theta_n)) = g(\theta)$$ Continuous Mapping Theorem: for a continuous function g(.), if $$X_n \to^{d} g(X)$$ then $$g(X_n) \to^{d} g(X)$$
if $$\gamma_n \rightarrow^p \gamma$$ then if $$Y_n\to^{d} c$$, then
$$plim(\theta_n + \gamma_n)=\theta + \gamma$$ $$X_n + Y_n \to^{d} X + c$$
$$plim(\theta_n \gamma_n) = \theta \gamma$$ $$Y_nX_n \to^{d} cX$$
$$plim(\theta_n/\gamma_n) = \theta/\gamma$$ if $$\gamma \neq 0$$ $$X_nY_n \to^{d} X/c$$ if $$c \neq 0$$

Convergence in Probability is stronger than Convergence in Distribution. However, Convergence in Distribution does not guarantee Convergence in Probability

### 2.3.4 Sufficient Statistics

Likelihood

• describes the extent to which the sample provides support for any particular parameter value.
• Higher support corresponds to a higher value for the likelihood
• The exact value of any likelihood is meaningless,
• The relative value, (i.e., comparing two values of $$\theta$$), is informative.

$L(\theta_0; y) = P(Y = y | \theta = \theta_0) = f_Y(y;\theta_0)$

Likelihood Ratio

$\frac{L(\theta_0;y)}{L(\theta_1;y)}$

Likelihood Function

For a given sample, you can create likelihoods for all possible values of $$\theta$$, which is called likelihood function

$L(\theta) = L(\theta; y) = f_Y(y;\theta)$

In a sample of size n, the likelihood function takes the form of a product

$L(\theta) = \prod_{i=1}^{n}f_i (y_i;\theta)$

Equivalently, the log likelihood function

$l(\theta) = \sum_{i=1}^{n} logf_i(y_i;\theta)$

Sufficient statistics

• A statistic, T(y), is any quantity that can be calculated purely from a sample (independent of $$\theta$$)
• A statistic is sufficient if it conveys all the available information about the parameter.

$L(\theta; y) = c(y)L^*(\theta;T(y))$

Nuisance parameters If we are interested in a parameter (e.g., mean). Other parameters requiring estimation (e.g., standard deviation) are nuisance parameters. We can replace nuisance parameters in likelihood function with their estimates to create a **profile likelihood*.

### 2.3.5 Parameter transformations

log-odds transformation

$Log odds = g(\theta)= ln[\frac{\theta}{1-\theta}]$

log transformation