2.3 General Math
2.3.1 Number Sets
Notation | Denotes | Examples |
---|---|---|
\(\emptyset\) | Empty set | No members |
\(\mathbb{N}\) | Natural numbers | \(\{1, 2, ...\}\) |
\(\mathbb{Z}\) | Integers | \(\{ ..., -1, 0, 1, ...\}\) |
\(\mathbb{Q}\) | Rational numbers | ratios of integers, e.g., \(1/2, -3/4\) |
\(\mathbb{R}\) | Real numbers | rationals and irrationals, e.g., \(\sqrt{2}, \pi\) |
\(\mathbb{C}\) | Complex numbers | \(a + bi\) with \(a, b \in \mathbb{R}\) and \(i^2 = -1\) |
2.3.2 Summation Notation and Series
Chebyshev’s Inequality Let \(X\) be a random variable with mean \(\mu\) and standard deviation \(\sigma\). Then for any positive number \(k\):
\[ P(|X-\mu| < k\sigma) \ge 1 - \frac{1}{k^2} \]
Chebyshev’s Inequality does not require that \(X\) be normally distributed
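As a quick numerical illustration (a minimal sketch using numpy; the exponential distribution, sample size, and values of \(k\) are arbitrary choices), the bound holds even for a markedly skewed distribution:

```python
import numpy as np

# Empirical check of Chebyshev's inequality on a non-normal distribution.
# The Exponential(2) distribution and the sample size are arbitrary choices.
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1_000_000)  # mean = 2, sd = 2

mu, sigma = x.mean(), x.std()
for k in (1.5, 2, 3):
    p_within = np.mean(np.abs(x - mu) < k * sigma)
    bound = 1 - 1 / k**2
    print(f"k={k}: P(|X-mu| < k*sigma) = {p_within:.4f} >= {bound:.4f}")
```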
Geometric sum
\[ \sum_{k=0}^{n-1} ar^k = a\frac{1-r^n}{1-r} \]
where \(r \neq 1\)
Geometric series
\[ \sum_{k=0}^\infty ar^k = \frac{a}{1-r} \]
where \(|r| <1\)
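Both identities are easy to check numerically (a small sketch; the values of \(a\), \(r\), and \(n\) below are arbitrary):

```python
# Check the geometric sum and the geometric series (a, r, n chosen arbitrarily).
a, r, n = 3.0, 0.5, 10

finite_sum = sum(a * r**k for k in range(n))
print(finite_sum, a * (1 - r**n) / (1 - r))   # should agree

# Partial sums approach a / (1 - r) when |r| < 1.
partial_sum = sum(a * r**k for k in range(1000))
print(partial_sum, a / (1 - r))               # should agree to machine precision
```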
Binomial theorem
\[ (x + y)^n = \sum_{k=0}^n \binom{n}{k} x^{n-k} y^k \]
where \(n \ge 0\)
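A quick check with Python's math.comb (the values of \(x\), \(y\), and \(n\) are arbitrary):

```python
from math import comb

# Verify the binomial theorem for one arbitrary choice of x, y, n.
x, y, n = 1.7, -0.4, 8
lhs = (x + y)**n
rhs = sum(comb(n, k) * x**(n - k) * y**k for k in range(n + 1))
print(lhs, rhs)  # equal up to floating-point error
```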
Binomial series
\[ \sum_k \binom{\alpha}{k} x^k = (1 +x)^\alpha \]
where \(|x| < 1\), unless \(\alpha\) is a nonnegative integer (in which case the sum has finitely many nonzero terms)
Telescoping sum
When the terms of a sum cancel each other out, leaving only the boundary terms (i.e., it collapses like a telescope), we call it a telescoping sum
\[ \sum_{a \le k < b} \Delta F(k) = F(b) - F(a) \]
where \(\Delta F(k) = F(k+1) - F(k)\), \(a \le b\), and \(a, b \in \mathbb{Z}\)
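For instance, with \(F(k) = k^2\) (an arbitrary choice), \(\Delta F(k) = 2k + 1\) and the sum collapses as claimed:

```python
# Telescoping-sum check with F(k) = k^2, so that dF(k) = F(k+1) - F(k) = 2k + 1.
F = lambda k: k**2
a, b = 3, 12
lhs = sum(F(k + 1) - F(k) for k in range(a, b))
print(lhs, F(b) - F(a))  # both equal 135
```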
Vandermonde convolution
\[ \sum_k \binom{r}{k} \binom{s}{n-k} = \binom{r+s}{n} \]
\(n \in \mathbb{Z}\)
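A numerical spot-check restricted to nonnegative integers (the values of \(r\), \(s\), and \(n\) are arbitrary; math.comb returns 0 when \(k > r\), so summing over \(k = 0, \dots, n\) suffices):

```python
from math import comb

# Spot-check Vandermonde's convolution for nonnegative integers r, s, n.
r, s, n = 6, 9, 7
lhs = sum(comb(r, k) * comb(s, n - k) for k in range(n + 1))
print(lhs, comb(r + s, n))  # both equal C(15, 7) = 6435
```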
Exponential series
\[ \sum_{k=0}^\infty \frac{x^k}{k!} = e^x \]
where \(x \in \mathbb{C}\)
Taylor series
\[ \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x-a)^k = f(x) \]
where \(|x-a| < R =\) radius of convergence
when \(a = 0\), we have the Maclaurin series. For example, the Maclaurin expansion of \(e^z\) is
\[ e^z = 1 + z + \frac{z^2}{2!} + \frac{z^3}{3!} + ... \]
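A truncated version of this expansion already matches the built-in exponential closely (a minimal sketch; the point \(z\) and the truncation order are arbitrary choices):

```python
from math import exp, factorial

# Truncated Maclaurin series for e^z at a real point z.
z, terms = 1.3, 20
approx = sum(z**k / factorial(k) for k in range(terms))
print(approx, exp(z))  # agree to floating-point precision
```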
Euler’s summation formula
\[\sum_{a \le k < b} f(k) = \int_a^b f(x)\, dx + \sum_{k=1}^m \frac{B_k}{k!} f^{(k-1)}(x) \Big|_a^b + (-1)^{m+1} \int_a^b \frac{B_m(x - \lfloor x \rfloor)}{m!} f^{(m)}(x)\, dx\] where \(a, b, m \in \mathbb{Z}\), \(a \le b\), \(m \ge 1\), the \(B_k\) are the Bernoulli numbers, and \(B_m(\cdot)\) is the Bernoulli polynomial
when \(m = 1\), we have the trapezoidal rule
\[ \sum_{a \le k < b} f(k) \approx \int_a^b f(x) dx - \frac{1}{2} (f(b) - f(a)) \]
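To see the \(m = 1\) approximation at work (a small sketch; the test function \(f(x) = \ln x\) and the limits are arbitrary choices):

```python
from math import log

# Compare sum_{a <= k < b} f(k) with the m = 1 Euler summation approximation
# for f(x) = ln(x); the antiderivative of ln(x) is x*ln(x) - x.
a, b = 1, 100
f = log
exact = sum(f(k) for k in range(a, b))

integral = (b * log(b) - b) - (a * log(a) - a)
approx = integral - 0.5 * (f(b) - f(a))
print(exact, approx)  # close; the gap comes from the higher-order terms
```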
2.3.3 Taylor Expansion
An infinitely differentiable function \(G(x)\) can be written as an infinite sum built from its derivatives.
More specifically, expanding \(G(x)\) about a point \(a\) gives
\[ G(x) = G(a) + \frac{G'(a)}{1!} (x-a) + \frac{G''(a)}{2!}(x-a)^2 + \frac{G'''(a)}{3!}(x-a)^3 + \dots \]
2.3.4 Law of large numbers
Let \(X_1, X_2, \dots\) be an infinite sequence of independent and identically distributed (i.i.d.) random variables with expected value \(\mu\) and variance \(\sigma^2\).
Then the sample average
\[ \bar{X}_n =\frac{1}{n} (X_1 + ... + X_n) \]
converges to the expected value (\(\bar{X}_n \rightarrow \mu\)) as \(n \rightarrow \infty\)
Moreover, the variance of the sample average shrinks with \(n\):
\[ Var(\bar{X}_n) = Var\left(\frac{1}{n}(X_1 + ... + X_n)\right) = \frac{1}{n^2}Var(X_1 + ... + X_n)= \frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n} \]
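A small simulation (a sketch using numpy; the Uniform(0, 1) population, the number of replications, and the sample sizes are arbitrary choices) confirms that the variance of the sample mean behaves like \(\sigma^2/n\):

```python
import numpy as np

# Check that Var(sample mean) is approximately sigma^2 / n.
rng = np.random.default_rng(1)
sigma2 = 1 / 12                      # variance of Uniform(0, 1)

for n in (10, 100, 1000):
    means = rng.uniform(size=(20_000, n)).mean(axis=1)   # 20,000 replications
    print(n, means.var(), sigma2 / n)                     # the two columns agree
```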
Note: The connection between the LLN and the normal distribution lies in the Central Limit Theorem. The CLT states that, regardless of the original distribution of a dataset, the distribution of the sample means will tend to follow a normal distribution as the sample size becomes larger.
The difference between Weak Law and Strong Law regards the mode of convergence
2.3.4.1 Weak Law
The sample average converges in probability towards the expected value
\[ \bar{X}_n \rightarrow^{p} \mu \]
when \(n \rightarrow \infty\)
\[ \lim_{n\to \infty}P(|\bar{X}_n - \mu| > \epsilon) = 0 \]
The sample mean of an i.i.d. random sample (\(\{ x_i \}_{i=1}^n\)) from any population with a finite mean \(\mu\) and finite variance \(\sigma^2\) is a consistent estimator of the population mean \(\mu\)
\[ plim(\bar{x})=plim(n^{-1}\sum_{i=1}^{n}x_i) =\mu \]
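A simulation sketch of this convergence in probability (numpy; the Exponential(1) population, \(\epsilon = 0.1\), and the sample sizes are arbitrary choices):

```python
import numpy as np

# P(|Xbar_n - mu| > eps) shrinks toward 0 as n grows (Weak Law of Large Numbers).
rng = np.random.default_rng(2)
mu, eps = 1.0, 0.1

for n in (10, 100, 1000, 10_000):
    xbar = rng.exponential(scale=mu, size=(1_000, n)).mean(axis=1)
    print(n, np.mean(np.abs(xbar - mu) > eps))   # empirical probability, tends to 0
```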
2.3.5 Law of Iterated Expectation
Let \(X, Y\) be random variables. Then,
\[ E(X) = E(E(X|Y)) \]
This means that the expected value of \(X\) can be calculated from the conditional distribution of \(X\) given \(Y\) and the distribution of \(Y\)
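A simulation sketch (numpy; the Bernoulli/normal setup and all constants are arbitrary choices) illustrates the identity:

```python
import numpy as np

# Check E(X) = E(E(X | Y)) where Y ~ Bernoulli(0.3) and X | Y=y ~ Normal(2 + 3y, 1).
rng = np.random.default_rng(3)
p = 0.3
y = rng.binomial(1, p, size=1_000_000)
x = rng.normal(loc=2 + 3 * y, scale=1.0)

lhs = x.mean()                        # simulated E(X)
rhs = p * (2 + 3) + (1 - p) * 2       # E(E(X|Y)) computed analytically
print(lhs, rhs)                       # both approximately 2.9
```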
2.3.6 Convergence
2.3.6.1 Convergence in Probability
- As \(n \rightarrow \infty\), an estimator (a random variable) gets arbitrarily close to the value it estimates.
- The random variable \(\theta_n\) converges in probability to a constant \(c\) if
\[ \lim_{n\to \infty}P(|\theta_n - c| \ge \epsilon) = 0 \]
for any positive \(\epsilon\)
Notation
\[ plim(\theta_n)=c \]
Equivalently,
\[ \theta_n \rightarrow^p c \]
Properties of Convergence in Probability
Slutsky’s Theorem: for a continuous function g(.), if \(plim(\theta_n)= \theta\) then \(plim(g(\theta_n)) = g(\theta)\)
if \(\gamma_n \rightarrow^p \gamma\) then
- \(plim(\theta_n + \gamma_n)=\theta + \gamma\)
- \(plim(\theta_n \gamma_n) = \theta \gamma\)
- \(plim(\theta_n/\gamma_n) = \theta/\gamma\) if \(\gamma \neq 0\)
These results also hold for random vectors and matrices.
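A small sketch of Slutsky's theorem for plims (numpy; \(\theta_n\) is the sample mean of Uniform(0, 1) data and \(g = \exp\), both arbitrary choices):

```python
import numpy as np

# If plim(theta_n) = theta, then plim(g(theta_n)) = g(theta) for continuous g.
rng = np.random.default_rng(4)
theta = 0.5                       # true mean of Uniform(0, 1)

for n in (10, 1_000, 100_000):
    theta_n = rng.uniform(size=n).mean()
    print(n, np.exp(theta_n), np.exp(theta))   # g(theta_n) approaches g(theta)
```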
2.3.6.2 Convergence in Distribution
- As \(n \rightarrow \infty\), the distribution of a random variable may converge towards another (“fixed”) distribution.
- The random variable \(X_n\) with CDF \(F_n(x)\) converges in distribution to a random variable \(X\) with CDF \(F(x)\) if
\[ \lim_{n\to \infty}|F_n(x) - F(x)| = 0 \]
at all points of continuity of \(F(x)\)
Notation: \(F(x)\) is the limiting distribution of \(X_n\), written \(X_n \rightarrow^d X\)
- \(E(X)\) is the limiting mean (asymptotic mean)
- \(Var(X)\) is the limiting variance (asymptotic variance)
Note
\[ \begin{aligned} E(X) &\neq \lim_{n\to \infty}E(X_n) \\ Avar(X_n) &\neq \lim_{n\to \infty}Var(X_n) \end{aligned} \]
Properties of Convergence in Distribution
Continuous Mapping Theorem: for a continuous function g(.), if \(X_n \to^{d} X\) then \(g(X_n) \to^{d} g(X)\)
If \(Y_n\to^{d} c\), then
- \(X_n + Y_n \to^{d} X + c\)
- \(X_n Y_n \to^{d} cX\)
- \(X_n / Y_n \to^{d} X/c\) if \(c \neq 0\)
These results also hold for random vectors and matrices.
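A standard illustration of convergence in distribution is the Central Limit Theorem; the sketch below (numpy; the Exponential(1) population, \(n\), and the evaluation points are arbitrary choices) compares the empirical CDF of standardized sample means with the standard normal CDF:

```python
import numpy as np
from math import erf, sqrt

# Standardized sample means of Exponential(1) data converge in distribution to N(0, 1).
rng = np.random.default_rng(5)
n, reps = 200, 50_000
xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
z = (xbar - 1.0) * sqrt(n)          # Exponential(1): mean 1, sd 1

std_normal_cdf = lambda t: 0.5 * (1 + erf(t / sqrt(2)))
for t in (-1.0, 0.0, 1.5):
    print(t, np.mean(z <= t), std_normal_cdf(t))   # F_n(t) close to Phi(t)
```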
2.3.6.3 Summary
Properties of Convergence
Probability | Distribution |
---|---|
Slutsky’s Theorem: for a continuous function g(.), if \(plim(\theta_n)= \theta\) then \(plim(g(\theta_n)) = g(\theta)\) | Continuous Mapping Theorem: for a continuous function g(.), if \(X_n \to^{d} X\) then \(g(X_n) \to^{d} g(X)\) |
if \(\gamma_n \rightarrow^p \gamma\), then | if \(Y_n\to^{d} c\), then |
\(plim(\theta_n + \gamma_n)=\theta + \gamma\) | \(X_n + Y_n \to^{d} X + c\) |
\(plim(\theta_n \gamma_n) = \theta \gamma\) | \(X_n Y_n \to^{d} cX\) |
\(plim(\theta_n/\gamma_n) = \theta/\gamma\) if \(\gamma \neq 0\) | \(X_n / Y_n \to^{d} X/c\) if \(c \neq 0\) |
Convergence in Probability is stronger than Convergence in Distribution.
Hence, Convergence in Distribution does not guarantee Convergence in Probability.
2.3.7 Sufficient Statistics
Likelihood
- describes the extent to which the sample provides support for any particular parameter value.
- Higher support corresponds to a higher value for the likelihood
- The exact value of any likelihood is meaningless; only the relative value (i.e., comparing likelihoods at two values of \(\theta\)) is informative.
\[ L(\theta_0; y) = P(Y = y | \theta = \theta_0) = f_Y(y;\theta_0) \]
Likelihood Ratio
\[ \frac{L(\theta_0;y)}{L(\theta_1;y)} \]
Likelihood Function
For a given sample, you can evaluate the likelihood at every possible value of \(\theta\); the resulting function of \(\theta\) is called the likelihood function
\[ L(\theta) = L(\theta; y) = f_Y(y;\theta) \]
For an independent sample of size \(n\), the likelihood function takes the form of a product
\[ L(\theta) = \prod_{i=1}^{n}f_i (y_i;\theta) \]
Equivalently, the log-likelihood function is
\[ l(\theta) = \sum_{i=1}^{n} \log f_i(y_i;\theta) \]
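As a concrete sketch (numpy; the Normal(\(\theta\), 1) model, sample size, and grid are arbitrary choices), the log-likelihood can be evaluated over a grid of \(\theta\) values; only the comparison across values matters, and the maximizer is the MLE:

```python
import numpy as np

# Log-likelihood of an i.i.d. Normal(theta, 1) sample over a grid of theta values.
rng = np.random.default_rng(6)
y = rng.normal(loc=2.0, scale=1.0, size=100)     # data generated with theta = 2

def log_lik(theta, y):
    return np.sum(-0.5 * np.log(2 * np.pi) - 0.5 * (y - theta) ** 2)

grid = np.linspace(0, 4, 401)
ll = np.array([log_lik(t, y) for t in grid])
print(grid[ll.argmax()], y.mean())   # grid maximizer is approximately the sample mean
```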
Sufficient statistics
- A statistic, \(T(y)\), is any quantity that can be calculated purely from a sample (independent of \(\theta\))
- A statistic is sufficient if it conveys all the available information about the parameter.
Formally, \(T(y)\) is sufficient if the likelihood can be factored as
\[ L(\theta; y) = c(y)L^*(\theta;T(y)) \]
where \(c(y)\) does not depend on \(\theta\).
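For example (a minimal sketch with i.i.d. Bernoulli(\(\theta\)) data, where \(T(y) = \sum_i y_i\) is sufficient), two different samples with the same \(T(y)\) produce identical likelihood functions:

```python
# For i.i.d. Bernoulli(theta) data, L(theta; y) = theta^T * (1 - theta)^(n - T),
# which depends on the sample y only through T(y) = sum(y).
def lik(theta, y):
    t, n = sum(y), len(y)
    return theta**t * (1 - theta)**(n - t)

y1 = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]   # two different samples ...
y2 = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]   # ... with the same T(y) = 3
for theta in (0.2, 0.5, 0.8):
    print(theta, lik(theta, y1), lik(theta, y2))   # identical likelihoods
```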
Nuisance parameters If we are interested in one parameter (e.g., the mean), other parameters requiring estimation (e.g., the standard deviation) are nuisance parameters. We can replace the nuisance parameters in the likelihood function with their estimates to create a profile likelihood.
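A sketch of a profile likelihood (numpy; a Normal model where \(\mu\) is of interest and \(\sigma^2\) is the nuisance parameter; all constants are arbitrary choices): for each candidate \(\mu\), the nuisance parameter is replaced by its conditional MLE \(\hat{\sigma}^2(\mu) = n^{-1}\sum_i (y_i - \mu)^2\).

```python
import numpy as np

# Profile log-likelihood for the mean of a Normal sample.
rng = np.random.default_rng(7)
y = rng.normal(loc=5.0, scale=2.0, size=200)
n = len(y)

def profile_loglik(mu, y):
    sigma2_hat = np.mean((y - mu) ** 2)                    # MLE of sigma^2 given mu
    return -0.5 * n * (np.log(2 * np.pi * sigma2_hat) + 1)

grid = np.linspace(4, 6, 201)
pl = np.array([profile_loglik(m, y) for m in grid])
print(grid[pl.argmax()], y.mean())   # profile maximizer is approximately the sample mean
```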