## 2.3 General Math

### 2.3.1 Number Sets

Notation Denotes Examples
$$\emptyset$$ Empty set No members
$$\mathbb{N}$$ Natural numbers $$\{1, 2, ...\}$$
$$\mathbb{Z}$$ Integers $$\{ ..., -1, 0, 1, ...\}$$
$$\mathbb{Q}$$ Rational numbers including fractions
$$\mathbb{R}$$ Real numbers
$$\mathbb{C}$$ Complex numbers

### 2.3.2 Summation Notation and Series

Chebyshev’s Inequality Let $$X$$ be a random variable with mean $$\mu$$ and standard deviation $$\sigma$$. Then for any positive number $$k$$:

$P(|X-\mu| < k\sigma) \ge 1 - \frac{1}{k^2}$

Chebyshev’s Inequality does not require that $$X$$ be normally distributed

Geometric sum

$\sum_{k=0}^{n-1} ar^k = a\frac{1-r^n}{1-r}$

where $$r \neq 1$$

Geometric series

$\sum_{k=0}^\infty ar^k = \frac{a}{1-r}$

where $$|r| <1$$

Binomial theorem

$(x + y)^n = \sum_{k=0}^n \binom{n}{k} x^{n-k} y^k$

where $$n \ge 0$$

Binomial series

$\sum_k \binom{\alpha}{k} x^k = (1 +x)^\alpha$

$$|x| < 1$$ if $$\alpha \neq n \ge 0$$

Telescoping sum

When terms of a sum cancel each other out, leaving one term (i.e., it collapses like a telescope), we call it a telescoping sum

$\sum_{a \le k < b} \Delta F(k) = F(b) - F(a)$

where $$a \le b$$ and $$a, b \in \mathbb{Z}$$

Vandermonde convolution

$\sum_k \binom{r}{k} \binom{s}{n-k} = \binom{r+s}{n}$

$$n \in \mathbb{Z}$$

Exponential series

$\sum_{k=0}^\infty \frac{x^k}{k!} = e^x$

where $$x \in \mathbb{C}$$

Taylor series

$\sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x-a)^k = f(x)$

where $$|x-a| < R =$$ radius of convergence

when $$a = 0$$, we have

Maclaurin series expansion for

$e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + ...$

Euler’s summation formula

$\sum_{a \le k < b} f(k) = \int_a^b f(x) dx + \sum_{k=1}^m\frac{B_k}{k!} f^{(k-1)}(x) |_a^b \\+ (-1)^{m+1} \int^b_a \frac{B_m (x-|x|)}{m!} f^{(m)}(x)dx$ where $$a,b, c \in \mathbb{Z}$$ and $$a \le b, m \ge 1$$

when $$m = 1$$, we have trapezoidal rule

$\sum_{a \le k < b} f(k) \approx \int_a^b f(x) dx - \frac{1}{2} (f(b) - f(a))$

### 2.3.3 Taylor Expansion

A differentiable function, $$G(x)$$ can be written as an infinite sum of its derivatives.

More specifically, an infinitely differentiable $$G(x)$$ evaluated at $$a$$ is

$G(x) = G(a) + \frac{G'(a)}{1!} (x-a) + \frac{G''(a)}{2!}(x-a) + \frac{G'''(a)}{3!}(x-a)^3 + \dots$

### 2.3.4 Law of large numbers

Let $$X_1,X_2,...$$ be an infinite sequence of independent and identically distributed (i.i.d)

Then, the sample average is

$\bar{X}_n =\frac{1}{n} (X_1 + ... + X_n)$

converges to the expected value ($$\bar{X}_n \rightarrow \mu$$) as $$n \rightarrow \infty$$

$Var(X_i) = Var(\frac{1}{n}(X_1 + ... + X_n)) = \frac{1}{n^2}Var(X_1 + ... + X_n)= \frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}$

Note: The connection between the LLN and the normal distribution lies in the Central Limit Theorem. The CLT states that, regardless of the original distribution of a dataset, the distribution of the sample means will tend to follow a normal distribution as the sample size becomes larger.

The difference between Weak Law and Strong Law regards the mode of convergence

#### 2.3.4.1 Weak Law

The sample average converges in probability towards the expected value

$\bar{X}_n \rightarrow^{p} \mu$

when $$n \rightarrow \infty$$

$\lim_{n\to \infty}P(|\bar{X}_n - \mu| > \epsilon) = 0$

The sample mean of an iid random sample ($$\{ x_i \}_{i=1}^n$$) from any population with a finite mean and finite variance $$\sigma^2$$ iis a consistent estimator of the population mean $$\mu$$

$plim(\bar{x})=plim(n^{-1}\sum_{i=1}^{n}x_i) =\mu$

#### 2.3.4.2 Strong Law

The sample average converges almost surely to the expected value

$\bar{X}_n \rightarrow^{a.s} \mu$

when $$n \rightarrow \infty$$

Equivalently,

$P(\lim_{n\to \infty}\bar{X}_n =\mu) =1$

### 2.3.5 Law of Iterated Expectation

Let $$X, Y$$ be random variables. Then,

$E(X) = E(E(X|Y))$

means that the expected value of X can be calculated from the probability distribution of $$X|Y$$ and $$Y$$

### 2.3.6 Convergence

#### 2.3.6.1 Convergence in Probability

• $$n \rightarrow \infty$$, an estimator (random variable) that is close to the true value.
• The random variable $$\theta_n$$ converges in probability to a constant $$c$$ if

$\lim_{n\to \infty}P(|\theta_n - c| \ge \epsilon) = 0$

for any positive $$\epsilon$$

Notation

$plim(\theta_n)=c$

Equivalently,

$\theta_n \rightarrow^p c$

Properties of Convergence in Probability

• Slutsky’s Theorem: for a continuous function g(.), if $$plim(\theta_n)= \theta$$ then $$plim(g(\theta_n)) = g(\theta)$$

• if $$\gamma_n \rightarrow^p \gamma$$ then

• $$plim(\theta_n + \gamma_n)=\theta + \gamma$$ + $$plim(\theta_n \gamma_n) = \theta \gamma$$ + $$plim(\theta_n/\gamma_n) = \theta/\gamma$$ if $$\gamma \neq 0$$
• Also hold for random vectors/ matrices

#### 2.3.6.2 Convergence in Distribution

• As $$n \rightarrow \infty$$, the distribution of a random variable may converge towards another (“fixed”) distribution.
• The random variable $$X_n$$ with CDF $$F_n(x)$$ converges in distribution to a random variable $$X$$ with CDF $$F(X)$$ if

$\lim_{n\to \infty}|F_n(x) - F(x)| = 0$

at all points of continuity of $$F(X)$$

Notation $$F(x)$$ is the limiting distribution of $$X_n$$ or $$X_n \rightarrow^d X$$

• $$E(X)$$ is the limiting mean (asymptotic mean)
• $$Var(X)$$ is the limiting variance (asymptotic variance)

Note

\begin{aligned} E(X) &\neq \lim_{n\to \infty}E(X_n) \\ Avar(X_n) &\neq \lim_{n\to \infty}Var(X_n) \end{aligned}

Properties of Convergence in Distribution

• Continuous Mapping Theorem: for a continuous function g(.), if $$X_n \to^{d} g(X)$$ then $$g(X_n) \to^{d} g(X)$$

• If $$Y_n\to^{d} c$$, then

• $$X_n + Y_n \to^{d} X + c$$

• $$Y_nX_n \to^{d} cX$$

• $$X_nY_n \to^{d} X/c$$ if $$c \neq 0$$

• also hold for random vectors/matrices

#### 2.3.6.3 Summary

Properties of Convergence

Probability Distribution
Slutsky’s Theorem: for a continuous function g(.), if $$plim(\theta_n)= \theta$$ then $$plim(g(\theta_n)) = g(\theta)$$ Continuous Mapping Theorem: for a continuous function g(.), if $$X_n \to^{d} g(X)$$ then $$g(X_n) \to^{d} g(X)$$
if $$\gamma_n \rightarrow^p \gamma$$ then if $$Y_n\to^{d} c$$, then
$$plim(\theta_n + \gamma_n)=\theta + \gamma$$ $$X_n + Y_n \to^{d} X + c$$
$$plim(\theta_n \gamma_n) = \theta \gamma$$ $$Y_nX_n \to^{d} cX$$
$$plim(\theta_n/\gamma_n) = \theta/\gamma$$ if $$\gamma \neq 0$$ $$X_nY_n \to^{d} X/c$$ if $$c \neq 0$$

Convergence in Probability is stronger than Convergence in Distribution.

Hence, Convergence in Distribution does not guarantee Convergence in Probability

### 2.3.7 Sufficient Statistics

Likelihood

• describes the extent to which the sample provides support for any particular parameter value.
• Higher support corresponds to a higher value for the likelihood
• The exact value of any likelihood is meaningless,
• The relative value, (i.e., comparing two values of $$\theta$$), is informative.

$L(\theta_0; y) = P(Y = y | \theta = \theta_0) = f_Y(y;\theta_0)$

Likelihood Ratio

$\frac{L(\theta_0;y)}{L(\theta_1;y)}$

Likelihood Function

For a given sample, you can create likelihoods for all possible values of $$\theta$$, which is called likelihood function

$L(\theta) = L(\theta; y) = f_Y(y;\theta)$

In a sample of size n, the likelihood function takes the form of a product

$L(\theta) = \prod_{i=1}^{n}f_i (y_i;\theta)$

Equivalently, the log likelihood function

$l(\theta) = \sum_{i=1}^{n} logf_i(y_i;\theta)$

Sufficient statistics

• A statistic, $$T(y)$$, is any quantity that can be calculated purely from a sample (independent of $$\theta$$)
• A statistic is sufficient if it conveys all the available information about the parameter.

$L(\theta; y) = c(y)L^*(\theta;T(y))$

Nuisance parameters If we are interested in a parameter (e.g., mean). Other parameters requiring estimation (e.g., standard deviation) are nuisance parameters. We can replace nuisance parameters in likelihood function with their estimates to create a profile likelihood.

### 2.3.8 Parameter transformations

log-odds transformation

$Log odds = g(\theta)= ln[\frac{\theta}{1-\theta}]$