Chapter 18: statistics

18.1 Hung Hung

https://www.youtube.com/playlist?list=PLTpF-A8hKVUOqfNyA6mOD6lo2cc6clZZP

https://www.youtube.com/watch?v=3S2r4XBzKts

population

census vs. statistics

random variable

\[ X \]

a sample has randomness

probability function

\[ \mathrm{P}_{{\scriptscriptstyle X}}\left(E\right)\in\left[0,1\right] \]

event = subset of sample space

\[ E \]

input: an event; output: a probability in \([0,1]\)

\[ \mathrm{P}_{{\scriptscriptstyle X}}:\left\{ E_{{\scriptscriptstyle i}}\right\} _{{\scriptscriptstyle i\in I}}\rightarrow\left[0,1\right] \]

target of interest

  • probability function

but events are hard to list or enumerate

\[ X:\left\{ \omega_{{\scriptscriptstyle i}}\right\} _{{\scriptscriptstyle i}\in I}\rightarrow\mathbb{R} \]

CDF = cumulative distribution function

\[ F_{{\scriptscriptstyle X}}\left(x\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(\left(-\infty,x\right]\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(X\le x\right) \]

a real function is much easier to operate on: differentiation (or differencing) is available

target of interest

  • CDF = cumulative distribution function
  • probability function

target of interest

\[ \begin{array}{ccccccc} X & \sim & F_{{\scriptscriptstyle X}}\left(x\right) & \overset{\text{FToC}}{\longleftrightarrow} & f_{{\scriptscriptstyle X}}\left(x\right) & \leftrightarrow & \mathrm{P}_{{\scriptscriptstyle X}}\\ & \text{inversion formula}\ : & \updownarrow & \forall\xi\approx0\left[M_{{\scriptscriptstyle X}}\left(\xi\right)\in\mathbb{R}\right]\ \wedge & \uparrow\downarrow & \searrow\nwarrow & \looparrowleft\wedge\ \mathrm{supp}\left(f_{{\scriptscriptstyle X}}\right)\text{ is bounded}\\ & & \varphi_{{\scriptscriptstyle X}}\left(\xi\right) & \leftrightarrows & M_{{\scriptscriptstyle X}}\left(\xi\right) & \rightarrow & \left\{ \mu_{{\scriptscriptstyle n}}\middle|n\in\mathbb{N}\right\} \end{array} \]

In population,

\[ X\sim F_{{\scriptscriptstyle X}}\left(x\right) \]

by sampling,

\[ X_{{\scriptscriptstyle 1}},\cdots,X_{{\scriptscriptstyle i}},\cdots,X_{{\scriptscriptstyle n}},\quad X_{{\scriptscriptstyle i}}\sim F_{{\scriptscriptstyle X}}\left(x\right) \]

or

\[ X_{{\scriptscriptstyle 1}},\cdots,X_{{\scriptscriptstyle i}},\cdots,X_{{\scriptscriptstyle n}},\quad X_{{\scriptscriptstyle i}}\overset{\text{i.i.d.}}{\sim}F_{{\scriptscriptstyle X}}\left(x\right) \]

\(\text{i.i.d.}\) = independent and identically distributed

and inference back

parametrically

\[ \widehat{X}\sim\widehat{F}_{{\scriptscriptstyle X}}\left(x\right)=\widehat{F}_{{\scriptscriptstyle X}}\left(x|\theta\right) \]

or nonparametrically

\[ \hat{X}\sim\hat{F}_{{\scriptscriptstyle X}}\left(x\right) \]

inference is a function of the samples (a random function, i.e. a statistic) used to estimate unknown parameters

\[ \widehat{\Theta}=T\left(X_{{\scriptscriptstyle 1}},\cdots,X_{{\scriptscriptstyle n}}\right)=T\left(\cdots,X_{{\scriptscriptstyle i}},\cdots\right),\quad\widehat{\Theta}\leftarrow\left(X_{{\scriptscriptstyle 1}},\cdots,X_{{\scriptscriptstyle i}},\cdots,X_{{\scriptscriptstyle n}}\right) \]

corresponding CDF of the inference or estimation function of the sampled random variables

\[ T\left(X_{{\scriptscriptstyle 1}},\cdots,X_{{\scriptscriptstyle n}}\right)=T\sim F_{{\scriptscriptstyle T}}\left(t\right) \]

wish to be unbiased and consistent

\[ \begin{cases} \mathrm{E}\left(\widehat{\Theta}\right)=\theta\Leftrightarrow\mathrm{E}\left(\widehat{\Theta}\right)-\theta=0 & \text{unbiasedness}\\ \lim\limits _{n\rightarrow\infty}\mathrm{V}\left(\widehat{\Theta}\right)=0 & \text{consistency} \end{cases} \]

unbiasedness is usually harder to achieve than consistency, so consistency is usually considered first.
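
A minimal simulation sketch of these two properties (Python with NumPy assumed; \(\mu\), \(\sigma\), and the replication counts are made up): averaged over replications the sample mean stays at \(\mu\) (unbiasedness), while its variance shrinks like \(\sigma^{2}/n\) (consistency).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 2.0          # hypothetical population parameters

for n in (10, 100, 1000, 10000):
    # 2000 replications of the sample mean of n i.i.d. N(mu, sigma^2) draws
    means = rng.normal(mu, sigma, size=(2000, n)).mean(axis=1)
    # unbiasedness: E(mean) stays at mu; consistency: V(mean) = sigma^2/n -> 0
    print(n, means.mean().round(3), means.var().round(4))
```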

modeling or parameterizing with unknown parameter \(\theta\)

\[ F_{{\scriptscriptstyle X}}\left(x\right)\overset{M}{=}F_{{\scriptscriptstyle X}}\left(x|\theta\right)=F_{{\scriptscriptstyle X}}\left(x;\theta\right) \]

parameterization reduces the unknowns from infinitely many (the whole function) to finitely many parameters

e.g. for normally distributed data

\[ f_{{\scriptscriptstyle X}}\left(x\right)\overset{M}{=}f_{{\scriptscriptstyle X}}\left(x|\theta\right)=\dfrac{\mathrm{e}^{{\scriptscriptstyle \frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}}}{\sqrt{2\pi\sigma^{2}}}=f_{{\scriptscriptstyle X}}\left(x|\mu,\sigma^{2}\right)=f_{{\scriptscriptstyle X}}\left(x|\mu,\sigma\right) \]

the price of parameterization is the risk of guessing the wrong model.

For non-negative data, instead of the normal distribution, consider distributions skewed to the right (e.g. gamma or log-normal)

topics

  1. \(\mathrm{P}_{{\scriptscriptstyle X}}\) probability theory
  2. \(f_{{\scriptscriptstyle X}}\left(x\right)\overset{M}{=}f_{{\scriptscriptstyle X}}\left(x|\theta\right)=f_{{\scriptscriptstyle X}}\left(x;\theta\right)\) various univariable distribution
  3. \(f_{\boldsymbol{{\scriptscriptstyle X}}}\left(\boldsymbol{x}\right)\overset{M}{=}f_{\boldsymbol{{\scriptscriptstyle X}}}\left(\boldsymbol{x}|\boldsymbol{\theta}\right)=f_{\boldsymbol{{\scriptscriptstyle X}}}\left(\boldsymbol{x};\boldsymbol{\theta}\right)\) multivariable distribution
  4. \(T\left(X_{{\scriptscriptstyle 1}},\cdots,X_{{\scriptscriptstyle n}}\right)\) inference
    • point estimation \[ \widehat{\mu}=\begin{cases} \overline{X} & \rightarrow\mu\\ \mathrm{median}\left(X_{{\scriptscriptstyle 1}},\cdots,X_{{\scriptscriptstyle n}}\right) & \rightarrow\mu\\ \vdots \end{cases} \]
    • interval estimation = hypothesis testing \[ \begin{cases} H_{{\scriptscriptstyle 0}}:\theta=\theta_{{\scriptscriptstyle 0}}\\ H_{{\scriptscriptstyle 1}}:\theta\ne\theta_{{\scriptscriptstyle 0}} \end{cases}\leftarrow T\in\left\{ 0,1\right\} \]
  5. how to find \(T\)
  6. behavior of random function \(T\left(X_{{\scriptscriptstyle 1}},\cdots,X_{{\scriptscriptstyle n}}\right)=T\sim F_{{\scriptscriptstyle T}}\left(t\right)\)
    • statistical properties of \(T\)
    • asymptotic properties (see the sketch after this list) \[ n\rightarrow\infty\begin{cases} \text{CLT}=\text{central limit theorem}\\ \text{LLN}=\text{law of large numbers} \end{cases} \]
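
A quick numerical illustration of both limit theorems (a sketch, not from the source; Python/NumPy, with Exponential(1) chosen arbitrarily so that \(\mu=\sigma=1\)):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = sigma = 1.0                  # Exponential(1): mean 1, standard deviation 1

for n in (10, 100, 1000):
    xbar = rng.exponential(1.0, size=(5000, n)).mean(axis=1)
    z = np.sqrt(n) * (xbar - mu) / sigma        # standardized sample mean
    print(n,
          xbar.mean().round(3),                 # LLN: -> mu = 1
          (np.abs(z) < 1.96).mean().round(3))   # CLT: -> ~0.95
```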

18.1.1 probability theory

https://www.youtube.com/watch?v=HBmTDtMBr3c

Definition 18.1 sample space: The set \(S\) of all possible outcomes of an experiment is called the sample space

\[ S=\left\{ \omega_{{\scriptscriptstyle i}}\right\} _{{\scriptscriptstyle i}\in I} \]

Definition 18.2 event: An event \(E\) is any collection of possible outcomes of an experiment, i.e. any subset of \(S\)

\[ E\subseteq S \]

set operations

commutativity, associativity, distributivity

De Morgan law

pairwise disjoint = mutually exclusive

partition

18.1.1.1 probability function

probability function axioms = probability function definition

Kolmogorov axioms of probability [6, p.72]

Definition 18.3 probability function: Given a sample space \(S\) and its events \(E\subseteq S\), a probability function is a function \(\mathrm{P}\) satisfying

\[ \begin{cases} \mathrm{P}\left(S\right)=1\\ \forall E\subseteq S\left(\mathrm{P}\left(E\right)\ge0\right)\\ E_{{\scriptscriptstyle 1}},\cdots,E_{{\scriptscriptstyle i}},\cdots\text{ are pairwise disjoint}\Rightarrow\mathrm{P}\left(\bigcup\limits _{i\in I}E_{{\scriptscriptstyle i}}\right)=\sum\limits _{i\in I}\mathrm{P}\left(E_{{\scriptscriptstyle i}}\right) \end{cases} \]

tossing a die

theorems

\[ \mathrm{P}\left(\emptyset\right)=0 \]

\[ \mathrm{P}\left(E\right)\le1 \]

\[ \mathrm{P}\left(E^{\mathrm{C}}\right)=\mathrm{P}\left(\overline{E}\right)=1-\mathrm{P}\left(E\right) \]

\[ \mathrm{P}\left(E_{{\scriptscriptstyle 2}}\cap\overline{E}_{{\scriptscriptstyle 1}}\right)=\mathrm{P}\left(E_{{\scriptscriptstyle 2}}\right)-\mathrm{P}\left(E_{{\scriptscriptstyle 2}}\cap E_{{\scriptscriptstyle 1}}\right) \]

\[ E_{{\scriptscriptstyle 1}}\subseteq E_{{\scriptscriptstyle 2}}\Rightarrow\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\right)\le\mathrm{P}\left(E_{{\scriptscriptstyle 2}}\right) \]

addition rule [6, p.75] and extended addition rule [6, p.76]

inclusion-exclusion principle = sieve principle

\[ \mathrm{P}\left(E_{{\scriptscriptstyle 1}}\cup E_{{\scriptscriptstyle 2}}\right)=\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\right)+\mathrm{P}\left(E_{{\scriptscriptstyle 2}}\right)-\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}\right) \]

\[ \mathrm{P}\left(E_{{\scriptscriptstyle 1}}\cup E_{{\scriptscriptstyle 2}}\cup E_{{\scriptscriptstyle 3}}\right)=\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\right)+\mathrm{P}\left(E_{{\scriptscriptstyle 2}}\right)+\mathrm{P}\left(E_{{\scriptscriptstyle 3}}\right)-\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}\right)-\mathrm{P}\left(E_{{\scriptscriptstyle 2}}\cap E_{{\scriptscriptstyle 3}}\right)-\mathrm{P}\left(E_{{\scriptscriptstyle 3}}\cap E_{{\scriptscriptstyle 1}}\right)+\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}\cap E_{{\scriptscriptstyle 3}}\right) \]

\[ \mathrm{P}\left(\bigcup\limits _{i=1}^{n}E_{{\scriptscriptstyle i}}\right)=\sum\limits _{k=1}^{n}\left(\left(-1\right)^{k-1}\sum\limits _{1\le i_{1}<\cdots<i_{k}\le n}\mathrm{P}\left(\bigcap\limits _{i\in\left\{ i_{1},\dots,i_{k}\right\} }E_{{\scriptscriptstyle i}}\right)\right) \]
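
The general formula can be brute-force checked on a toy finite sample space (illustrative Python; the uniform sample space and the events are made up):

```python
from itertools import combinations

S = set(range(10))                    # uniform probability: P(E) = |E| / |S|
P = lambda E: len(E) / len(S)
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {5, 6, 7}, {1, 7, 8}]   # made-up events

lhs = P(set().union(*events))
rhs = sum((-1) ** (k - 1)
          * sum(P(set.intersection(*c)) for c in combinations(events, k))
          for k in range(1, len(events) + 1))
print(lhs, rhs)   # both 0.9 (up to float rounding)
```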

symmetric difference [6, p.75]

union probability is upper-bounded by the sum of the individual probabilities

\[ \mathrm{P}\left(E_{{\scriptscriptstyle 1}}\cup E_{{\scriptscriptstyle 2}}\right)=\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\right)+\mathrm{P}\left(E_{{\scriptscriptstyle 2}}\right)-\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}\right)\le\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\right)+\mathrm{P}\left(E_{{\scriptscriptstyle 2}}\right) \]

\[ E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}=\emptyset\Leftrightarrow\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\cup E_{{\scriptscriptstyle 2}}\right)=\mathrm{P}\left(E_{{\scriptscriptstyle 1}}\right)+\mathrm{P}\left(E_{{\scriptscriptstyle 2}}\right) \]

Boole inequality

\[ \mathrm{P}\left(\bigcup\limits _{i\in I}E_{{\scriptscriptstyle i}}\right)\le\sum\limits _{i\in I}\mathrm{P}\left(E_{{\scriptscriptstyle i}}\right) \]

\[ \mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}}\middle|H_{{\scriptscriptstyle 0}}\right)=\mathrm{P}\left(\text{reject }H_{{\scriptscriptstyle 0}}\middle|H_{{\scriptscriptstyle 0}}\text{ is true}\right)=\alpha=\text{type I error} \]

multiple hypothesis testing

How to control the family-wise error rate?

Ideally,

FWER = family-wise error rate

\[ \begin{aligned} \alpha= & \mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle 1}}}\cup\cdots\cup\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle m}}}\middle|H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle 1}}\cap\cdots\cap H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle m}}\right)=\mathrm{P}\left(\text{reject any }H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}\middle|\text{all }H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\text{ are true}\right)\\ = & \mathrm{P}\left(\bigcup\limits _{i=1}^{m}\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)=1-\mathrm{P}\left(\overbrace{\bigcup\limits _{i=1}^{m}\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)\\ = & 1-\mathrm{P}\left(\bigcap\limits _{i=1}^{m}\underbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)=1-\mathrm{P}\left(\text{not to reject any }H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)\\ & \overset{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\text{ mutually independent}}{=}1-\prod\limits _{i=1}^{m}\mathrm{P}\left(\underbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)=1-\prod\limits _{i=1}^{m}\left(1-\mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)\right)\\ & \overset{\forall i,j\left[\mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)=\alpha_{{\scriptscriptstyle 0}}\right]}{=}1-\prod\limits _{i=1}^{m}\left(1-\alpha_{{\scriptscriptstyle 0}}\right)=1-\left(1-\alpha_{{\scriptscriptstyle 0}}\right)^{m}\\ \alpha= & 1-\left(1-\alpha_{{\scriptscriptstyle 0}}\right)^{m}\\ \alpha_{{\scriptscriptstyle 0}}= & 1-\left(1-\alpha\right)^{\frac{1}{m}}=1-\sqrt[m]{1-\alpha}\\ \Downarrow\\ \text{set } & \mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)=\alpha_{{\scriptscriptstyle 0}}=1-\sqrt[m]{1-\alpha} \end{aligned} \]

But the condition that the \(H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\) are mutually independent is too strong.

Practically,

\[ \begin{aligned} \alpha= & \mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle 1}}}\cup\cdots\cup\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle m}}}\middle|H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle 1}}\cap\cdots\cap H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle m}}\right)=\mathrm{P}\left(\text{reject any }H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}\middle|\text{all }H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\text{ are true}\right)\\ = & \mathrm{P}\left(\bigcup\limits _{i=1}^{m}\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)\overset{\mathrm{P}\left(\bigcup\limits _{i\in I}E_{{\scriptscriptstyle i}}\right)\le\sum\limits _{i\in I}\mathrm{P}\left(E_{{\scriptscriptstyle i}}\right)}{\underset{\text{Boole inequality}}{\le}}\sum\limits _{i=1}^{m}\mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)\underset{\Uparrow}{=}\sum\limits _{i=1}^{m}\alpha_{{\scriptscriptstyle 0}}=m\alpha_{{\scriptscriptstyle 0}}\underset{\Downarrow}{=}\alpha\\ \text{let }\forall i,j & \left[\mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)=\alpha_{{\scriptscriptstyle 0}}\right]\Rightarrow\sum\limits _{i=1}^{m}\mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)=\sum\limits _{i=1}^{m}\alpha_{{\scriptscriptstyle 0}}=m\alpha_{{\scriptscriptstyle 0}}\Rightarrow\alpha_{{\scriptscriptstyle 0}}=\dfrac{\alpha}{m}\\ \Downarrow\\ \text{set } & \mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)=\alpha_{{\scriptscriptstyle 0}}=\dfrac{\alpha}{m} \end{aligned} \]

Bonferroni correction

\[ \mathrm{P}\left(\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)=\dfrac{\alpha}{m}\Rightarrow\mathrm{P}\left(\bigcup\limits _{i=1}^{m}\overbrace{H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle i}}}\middle|\bigcap\limits _{j=1}^{m}H_{{\scriptscriptstyle 0}}^{{\scriptscriptstyle j}}\right)\le\alpha \]

Bonferroni inequality [6, p.77]

Bonferroni inequality and Boole inequality are equivalent inequalities
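
A small numeric comparison (Python sketch; \(\alpha=0.05\) and \(m=10\) are hypothetical) of the two per-test levels derived above: the Šidák level \(1-\left(1-\alpha\right)^{1/m}\) from the independence calculation, and the Bonferroni level \(\alpha/m\) from Boole's inequality.

```python
alpha, m = 0.05, 10
sidak = 1 - (1 - alpha) ** (1 / m)   # exact if the m tests are independent
bonferroni = alpha / m               # valid in general (Boole inequality)
print(round(sidak, 6), bonferroni)   # 0.005116 0.005 -- Bonferroni is smaller,
                                     # i.e. slightly more conservative
```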

birthday problem [6, p.78]

18.1.1.2 conditional probability

18.1.2 univariable distribution

\[ \begin{cases} \mathrm{P}_{{\scriptscriptstyle X}}\left(X\in E\right) & \forall E\subseteq S\\ \mathrm{P}_{{\scriptscriptstyle X}}\left(X\le x\right) & \forall x\in\mathbb{R} \end{cases} \]

\[ \begin{cases} \mathrm{P}_{{\scriptscriptstyle X}}\left(X\in E\right) & \forall E\subseteq S\\ \mathrm{P}_{{\scriptscriptstyle X}}\left(X\le x\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(\left(-\infty,x\right]\right)=F_{{\scriptscriptstyle X}}\left(x\right) & \forall x\in\mathbb{R} \end{cases} \]

\[ \begin{aligned} & \mathrm{P}_{{\scriptscriptstyle X}}\left(X\le x\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(\left(-\infty,x\right]\right)\\ \leftrightarrow & \mathrm{P}_{{\scriptscriptstyle X}}\left(X<x\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(\left(-\infty,x\right)\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(\bigcup\limits _{\epsilon>0}\left(-\infty,x-\epsilon\right]\right)=\lim_{\epsilon\rightarrow0^{+}}\mathrm{P}_{{\scriptscriptstyle X}}\left(\left(-\infty,x-\epsilon\right]\right) \end{aligned} \]

18.1.2.1 cumulative distribution function

CDF = cumulative distribution function

\[ F_{{\scriptscriptstyle X}}\left(x\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(\left(-\infty,x\right]\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(X\le x\right) \] \[ X\sim\mathrm{P}_{{\scriptscriptstyle X}}\leftrightarrow F_{{\scriptscriptstyle X}}\left(x\right) \]

\[ X\sim F_{{\scriptscriptstyle X}}\left(x\right)\leftrightarrow\mathrm{P}_{{\scriptscriptstyle X}} \]


Definition 18.4 CDF = cumulative distribution function: A cumulative distribution function is a function \(F:\mathbb{R} \rightarrow \left[0,1\right]\) satisfying

\[ F_{{\scriptscriptstyle X}}\left(x\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(\left(-\infty,x\right]\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(X\le x\right) \]

Theorem 18.1 CDF = cumulative distribution function: \(F\left(x\right)\) is a cumulative distribution function iff

\[ \begin{cases} \begin{cases} \lim\limits _{x\rightarrow-\infty}F\left(x\right)=0 & \lim\limits _{x\rightarrow+\infty}F\left(x\right)=1\end{cases} & \left(01\right)\left[0,1\right]\\ \forall x_{{\scriptscriptstyle 1}}<x_{{\scriptscriptstyle 2}}\left[F\left(x_{{\scriptscriptstyle 1}}\right)\le F\left(x_{{\scriptscriptstyle 2}}\right)\right] & \left(nd\right)\text{non-decreasing}\\ \lim\limits _{x\rightarrow x_{{\scriptscriptstyle 0}}^{+}}F\left(x\right)=F\left(x_{{\scriptscriptstyle 0}}\right) & \left(rc\right)\text{right-continuous} \end{cases} \]


Definition 18.5 RV = r.v. = random variable

\[ \begin{cases} X\text{ is a continuous RV} & \lim\limits _{x\rightarrow x_{{\scriptscriptstyle 0}}}F_{{\scriptscriptstyle X}}\left(x\right)=F_{{\scriptscriptstyle X}}\left(x_{{\scriptscriptstyle 0}}\right)\\ X\text{ is a discrete RV} & F_{{\scriptscriptstyle X}}\text{ is a step function of }x \end{cases} \]

[6, p.103]

Definition 18.6 RV = r.v. = random variable

[6, p.104]

Definition 18.7 range of r.v. = range of RV = the range of a random variable

\[ \begin{aligned} \mathcal{R}_{{\scriptscriptstyle X}}=&\left\{ x\middle|\begin{cases} \omega\in S\\ x=X\left(\omega\right) \end{cases}\right\} \\=&\left\{ x\middle|\exists\omega\in S\left[x=X\left(\omega\right)\right]\right\} \\=&X\left(S\right)=X\left(\Omega\right) \end{aligned} \]

18.1.2.2 probability density function

\[ \begin{cases} \mathrm{P}_{{\scriptscriptstyle X}}\left(X\le x\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(\left(-\infty,x\right]\right)=F_{{\scriptscriptstyle X}}\left(x\right)\\ \mathrm{P}_{{\scriptscriptstyle X}}\left(X=x\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(x\right)=? \end{cases} \]

Definition 18.8 PDF = probability density function

PMF = probability mass function

\[ \begin{cases} f_{{\scriptscriptstyle X}}\left(x\right)=\dfrac{\mathrm{d}}{\mathrm{d}x}F_{{\scriptscriptstyle X}}\left(x\right) & X\text{ continuous RV}\\ f_{{\scriptscriptstyle X}}\left(x\right)=F_{{\scriptscriptstyle X}}\left(x\right)-F_{{\scriptscriptstyle X}}\left(x^{-}\right) & X\text{ discrete RV} \end{cases} \]

\[ \begin{cases} f_{{\scriptscriptstyle X}}\left(x\right)=\text{derivative of }F_{{\scriptscriptstyle X}}\left(x\right) & X\text{ continuous}\\ f_{{\scriptscriptstyle X}}\left(x\right)=\text{difference of }F_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \]

\[ \begin{cases} f_{{\scriptscriptstyle X}}\left(x\right)=\dfrac{\mathrm{d}}{\mathrm{d}x}F_{{\scriptscriptstyle X}}\left(x\right) & \Leftrightarrow F_{{\scriptscriptstyle X}}\left(x\right)=\intop\limits _{-\infty}^{x}f_{{\scriptscriptstyle X}}\left(t\right)\mathrm{d}t\\ f_{{\scriptscriptstyle X}}\left(x\right)=F_{{\scriptscriptstyle X}}\left(x\right)-F_{{\scriptscriptstyle X}}\left(x^{-}\right) & \Leftrightarrow F_{{\scriptscriptstyle X}}\left(x\right)=\sum\limits _{t\le x}f_{{\scriptscriptstyle X}}\left(t\right) \end{cases} \]

\[ \begin{cases} X\sim\mathrm{P}_{{\scriptscriptstyle X}}\leftrightarrow F_{{\scriptscriptstyle X}}\left(x\right)\leftrightarrow f_{{\scriptscriptstyle X}}\left(x\right) & \text{ e.g. probability theory}\\ X\sim F_{{\scriptscriptstyle X}}\left(x\right)\leftrightarrow\mathrm{P}_{{\scriptscriptstyle X}} & \Rightarrow F_{{\scriptscriptstyle X}}\left(x\right)\overset{M}{=}F_{{\scriptscriptstyle X}}\left(x|\theta\right)\text{ e.g. survival analysis}\\ X\sim f_{{\scriptscriptstyle X}}\left(x\right)\leftrightarrow F_{{\scriptscriptstyle X}}\left(x\right)\leftrightarrow\mathrm{P}_{{\scriptscriptstyle X}} & \Rightarrow f_{{\scriptscriptstyle X}}\left(x\right)\overset{M}{=}f_{{\scriptscriptstyle X}}\left(x|\theta\right)\text{ e.g. general statistics} \end{cases} \]
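
A numeric sketch of the continuous case \(f_{X}=\frac{\mathrm{d}}{\mathrm{d}x}F_{X}\) (Python, with SciPy assumed available): a central difference of the standard normal CDF recovers its PDF.

```python
import numpy as np
from scipy.stats import norm

x = np.linspace(-3, 3, 7)
h = 1e-5
f_numeric = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)  # dF/dx by central difference
print(np.max(np.abs(f_numeric - norm.pdf(x))))             # ~1e-11: numerically zero
```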


Theorem 18.2 PDF = probability density function or PMF = probability mass function: \(f\left(x\right)\) is a probability density function or probability mass function iff

\[ \begin{cases} \forall x\in\mathbb{R}\left[f\left(x\right)\ge0\right]\\ \begin{cases} \intop\limits _{-\infty}^{+\infty}f\left(x\right)\mathrm{d}x=1 & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}f\left(x\right)=1 & X\text{ discrete} \end{cases} \end{cases} \]


\[ \forall E\subseteq S\left[\mathrm{P}_{{\scriptscriptstyle X}}\left(X\in E\right)=\begin{cases} \int_{x\in E}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in E}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases}\right] \]


https://www.youtube.com/watch?v=KIXBlj-3M2k

\[ \begin{aligned} \mathrm{P}_{{\scriptscriptstyle X}}\left(X=x\right)= & \lim_{\epsilon\rightarrow0}\mathrm{P}_{{\scriptscriptstyle X}}\left(\left[x-\epsilon,x+\epsilon\right]\right)\\ = & \lim_{\epsilon\rightarrow0}\mathrm{P}_{{\scriptscriptstyle X}}\left(x-\epsilon\le X\le x+\epsilon\right)\\ = & \lim_{\epsilon\rightarrow0}\left[F_{{\scriptscriptstyle X}}\left(x+\epsilon\right)-F_{{\scriptscriptstyle X}}\left(x-\epsilon\right)\right]\\ = & \begin{cases} F_{{\scriptscriptstyle X}}\left(x\right)-F_{{\scriptscriptstyle X}}\left(x\right)=0 & X\text{ continuous}\\ F_{{\scriptscriptstyle X}}\left(x\right)-F_{{\scriptscriptstyle X}}\left(x^{-}\right)=f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \end{aligned} \]

\[ X\sim F_{{\scriptscriptstyle X}}\left(x\right)\leftrightarrow\mathrm{P}_{{\scriptscriptstyle X}} \]

\[ Y=g\left(X\right) \]

\[ \begin{cases} Y\sim F_{{\scriptscriptstyle Y}}\left(y\right)\leftrightarrow f_{{\scriptscriptstyle Y}}\left(y\right) & \Rightarrow F_{{\scriptscriptstyle Y}}\left(y\right)\overset{M}{=}F_{{\scriptscriptstyle Y}}\left(y|\theta\right)\\ Y\sim f_{{\scriptscriptstyle Y}}\left(y\right)\leftrightarrow F_{{\scriptscriptstyle Y}}\left(y\right)\leftrightarrow\mathrm{P}_{{\scriptscriptstyle Y}} & \Rightarrow f_{{\scriptscriptstyle Y}}\left(y\right)\overset{M}{=}f_{{\scriptscriptstyle Y}}\left(y|\theta\right) \end{cases} \]

18.1.2.3 range vs. support

Definition 18.9 range of r.v. = range of RV = the range of a random variable

\[ \begin{aligned} \mathcal{R}_{{\scriptscriptstyle X}}=&\left\{ x\middle|\begin{cases} \omega\in S\\ x=X\left(\omega\right) \end{cases}\right\} \\=&\left\{ x\middle|\exists\omega\in S\left[x=X\left(\omega\right)\right]\right\} \\=&X\left(S\right)=X\left(\Omega\right) \end{aligned} \]

Definition 18.10 support

\[ \mathrm{supp}\left(f\right)=\left\{ x\middle|\begin{cases} f:D\rightarrow\mathcal{R}\\ x\in D\\ f\left(x\right)\ne0 \end{cases}\right\} \]

Definition 18.11 support of r.v. = support of RV = the support of a random variable

\[ \mathrm{supp}\left(f_{{\scriptscriptstyle X}}\right)=\left\{ x\middle|\begin{cases} x\in X\left(\Omega\right)\\ f_{{\scriptscriptstyle X}}\left(x\right)\ne0 \end{cases}\right\} \overset{f_{{\scriptscriptstyle X}}\left(x\right)\ge0}{=}\left\{ x\middle|\begin{cases} x\in X\left(\Omega\right)\\ f_{{\scriptscriptstyle X}}\left(x\right)>0 \end{cases}\right\} \]

18.1.2.4 continuous monotone transformation

Theorem 18.3 Random variable \(Y\) is a monotone transformation of the random variable \(X\), i.e. \(\begin{cases} X\sim F_{{\scriptscriptstyle X}}\left(x\right)\leftrightarrow f_{{\scriptscriptstyle X}}\left(x\right)\\ Y=g\left(X\right)\begin{cases} \forall x_{{\scriptscriptstyle 1}}<x_{{\scriptscriptstyle 2}}\left[g\left(x_{{\scriptscriptstyle 1}}\right)<g\left(x_{{\scriptscriptstyle 2}}\right)\right]\\ \forall x_{{\scriptscriptstyle 1}}<x_{{\scriptscriptstyle 2}}\left[g\left(x_{{\scriptscriptstyle 1}}\right)>g\left(x_{{\scriptscriptstyle 2}}\right)\right] \end{cases}\Rightarrow & \exists g^{-1}:Y\rightarrow X \end{cases}\), then

\[ f_{{\scriptscriptstyle Y}}\left(y\right)=f_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)\left|\dfrac{\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y}\right| \]

Proof:

\[ \begin{aligned} F_{{\scriptscriptstyle Y}}\left(y\right)= & \mathrm{P}_{{\scriptscriptstyle Y}}\left(Y\le y\right)\\ = & \mathrm{P}\left(g\left(X\right)\le y\right)\begin{cases} \forall x_{{\scriptscriptstyle 1}}<x_{{\scriptscriptstyle 2}}\left[g\left(x_{{\scriptscriptstyle 1}}\right)<g\left(x_{{\scriptscriptstyle 2}}\right)\right] & \Leftrightarrow\forall g\left(x_{{\scriptscriptstyle 1}}\right)<g\left(x_{{\scriptscriptstyle 2}}\right)\left[x_{{\scriptscriptstyle 1}}<x_{{\scriptscriptstyle 2}}\right]\\ \forall x_{{\scriptscriptstyle 1}}<x_{{\scriptscriptstyle 2}}\left[g\left(x_{{\scriptscriptstyle 1}}\right)>g\left(x_{{\scriptscriptstyle 2}}\right)\right] & \Leftrightarrow\forall g\left(x_{{\scriptscriptstyle 1}}\right)>g\left(x_{{\scriptscriptstyle 2}}\right)\left[x_{{\scriptscriptstyle 1}}<x_{{\scriptscriptstyle 2}}\right] \end{cases}\\ = & \begin{cases} \mathrm{P}_{{\scriptscriptstyle X}}\left(X\le g^{-1}\left(y\right)=x\right) & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right]\\ \mathrm{P}_{{\scriptscriptstyle X}}\left(X\ge g^{-1}\left(y\right)=x\right) & \forall y_{{\scriptscriptstyle 1}}>y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right] \end{cases}\\ = & \begin{cases} \mathrm{P}_{{\scriptscriptstyle X}}\left(X\le g^{-1}\left(y\right)=x\right) & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right]\\ \mathrm{P}_{{\scriptscriptstyle X}}\left(X\ge g^{-1}\left(y\right)=x\right) & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)>g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right] \end{cases}\\ = & \begin{cases} F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right) & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right]\\ 1-F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right) & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)>g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right] \end{cases}\\ F_{{\scriptscriptstyle Y}}\left(y\right)= & \begin{cases} F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right) & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right]\\ 1-F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right) & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)>g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right] \end{cases} \end{aligned} \]

\[ \begin{aligned} f_{{\scriptscriptstyle Y}}\left(y\right)=\dfrac{\mathrm{d}}{\mathrm{d}y}F_{{\scriptscriptstyle Y}}\left(y\right)= & \begin{cases} \dfrac{\mathrm{d}}{\mathrm{d}y}F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right) & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right]\\ \dfrac{\mathrm{d}}{\mathrm{d}y}\left[1-F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)\right] & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)>g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right] \end{cases}\\ = & \begin{cases} \dfrac{\mathrm{d}F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)}{\mathrm{d}g^{-1}\left(y\right)}\dfrac{\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y} & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right]\\ \dfrac{-\mathrm{d}F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)}{\mathrm{d}g^{-1}\left(y\right)}\dfrac{\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y} & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)>g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right] \end{cases}\\ = & \begin{cases} \dfrac{\mathrm{d}F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)}{\mathrm{d}g^{-1}\left(y\right)}\dfrac{\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y} & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right]\\ \dfrac{\mathrm{d}F_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)}{\mathrm{d}g^{-1}\left(y\right)}\dfrac{-\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y} & \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(y_{{\scriptscriptstyle 1}}\right)>g^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right] \end{cases}\\ = & \begin{cases} f_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)\dfrac{\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y} & \begin{cases} \dfrac{\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y}\ge0\\ f_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)\ge0 \end{cases}\Rightarrow f_{{\scriptscriptstyle Y}}\left(y\right)\ge0\\ f_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)\dfrac{-\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y} & \begin{cases} \dfrac{-\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y}\ge0\\ f_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)\ge0 \end{cases}\Rightarrow f_{{\scriptscriptstyle Y}}\left(y\right)\ge0 \end{cases}\\ f_{{\scriptscriptstyle Y}}\left(y\right)= & \begin{cases} f_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)\dfrac{\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y} & \dfrac{\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y}\ge0\\ f_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)\dfrac{-\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y} & \dfrac{-\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y}\ge0 \end{cases}\\ f_{{\scriptscriptstyle Y}}\left(y\right)= & f_{{\scriptscriptstyle X}}\left(g^{-1}\left(y\right)\right)\left|\dfrac{\mathrm{d}g^{-1}\left(y\right)}{\mathrm{d}y}\right| \end{aligned} \]

\[ \tag*{$\Box$} \]
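
A Monte Carlo sketch of Theorem 18.3 (Python/NumPy/SciPy; the map \(g\left(x\right)=\mathrm{e}^{x}\) and the grid points are chosen for illustration): with \(X\sim N\left(0,1\right)\) and increasing \(g\), \(g^{-1}\left(y\right)=\log y\) and \(f_{Y}\left(y\right)=f_{X}\left(\log y\right)/y\), the lognormal density.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
y = np.array([0.5, 1.0, 2.0])                    # illustrative grid

# theorem: f_Y(y) = f_X(g^{-1}(y)) |dg^{-1}/dy| = norm.pdf(log y) / y
f_theorem = norm.pdf(np.log(y)) / y

# empirical density of Y = exp(X) from a narrow histogram bin around each y
Y = np.exp(rng.standard_normal(2_000_000))
eps = 0.01
f_empirical = np.array([((v - eps < Y) & (Y < v + eps)).mean() / (2 * eps)
                        for v in y])
print(f_theorem.round(3))     # [0.628 0.399 0.157]
print(f_empirical.round(3))   # closely matches
```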

split \(g\left(X\right)\) into monotone pieces

For example, \(\begin{cases} g\left(x\right)=x^{2}\\ Y=g\left(X\right) \end{cases}\Rightarrow Y=g\left(X\right)=X^{2}\),

\[ \begin{cases} Y=g\left(X\right)=X^{2}\\ X\in\left(-\infty,+\infty\right) \end{cases} \]

\[ \begin{aligned} & Y=g\left(X\right)=X^{2}\\ \Rightarrow & Y=\begin{cases} X^{2}=g\left(X\right) & X\ge0\Leftrightarrow X\in\left[0,+\infty\right)\Rightarrow\forall X_{{\scriptscriptstyle 1}}<X_{{\scriptscriptstyle 2}}\left[X_{{\scriptscriptstyle 1}}^{2}<X_{{\scriptscriptstyle 2}}^{2}\right]\\ X^{2}=g\left(X\right) & X<0\Leftrightarrow X\in\left(-\infty,0\right)\Rightarrow\forall X_{{\scriptscriptstyle 1}}<X_{{\scriptscriptstyle 2}}\left[X_{{\scriptscriptstyle 1}}^{2}>X_{{\scriptscriptstyle 2}}^{2}\right] \end{cases}\\ \Rightarrow & X=\begin{cases} \sqrt{Y}=g^{-1}\left(Y\right) & X\ge0\Rightarrow\forall X_{{\scriptscriptstyle 1}}^{2}<X_{{\scriptscriptstyle 2}}^{2}\left[X_{{\scriptscriptstyle 1}}<X_{{\scriptscriptstyle 2}}\right]\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[X_{{\scriptscriptstyle 1}}<X_{{\scriptscriptstyle 2}}\right]\\ -\sqrt{Y}=g^{-1}\left(Y\right) & X<0\Rightarrow\forall X_{{\scriptscriptstyle 1}}^{2}<X_{{\scriptscriptstyle 2}}^{2}\left[X_{{\scriptscriptstyle 1}}>X_{{\scriptscriptstyle 2}}\right]\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[X_{{\scriptscriptstyle 1}}>X_{{\scriptscriptstyle 2}}\right] \end{cases}\\ \Rightarrow & X=\begin{cases} \sqrt{Y}=g^{-1}\left(Y\right) & Y\in\left[0,\infty\right)\Rightarrow X\ge0\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(Y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(Y_{{\scriptscriptstyle 2}}\right)\right]\\ -\sqrt{Y}=g^{-1}\left(Y\right) & Y\in\left[0,\infty\right)\Rightarrow X<0\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(Y_{{\scriptscriptstyle 1}}\right)>g^{-1}\left(Y_{{\scriptscriptstyle 2}}\right)\right] \end{cases} \end{aligned} \]

\[ \begin{aligned} F_{{\scriptscriptstyle Y}}\left(y\right)= & \mathrm{P}_{{\scriptscriptstyle Y}}\left(Y\le y\right)=\mathrm{P}\left(X^{2}\le y\right)\\ = & \mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left(\left\{ X<0\right\} \cup\left\{ X\ge0\right\} \right)\right)\\ = & \mathrm{P}\left(\left(\left\{ X^{2}\le y\right\} \cap\left\{ X<0\right\} \right)\cup\left(\left\{ X^{2}\le y\right\} \cap\left\{ X\ge0\right\} \right)\right)\\ = & \mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left\{ X<0\right\} \right)+\mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left\{ X\ge0\right\} \right)\\ & -\mathrm{P}\left(\left(\left\{ X^{2}\le y\right\} \cap\left\{ X<0\right\} \right)\cap\left(\left\{ X^{2}\le y\right\} \cap\left\{ X\ge0\right\} \right)\right)\\ = & \mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left\{ X<0\right\} \right)+\mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left\{ X\ge0\right\} \right)-\mathrm{P}\left(\emptyset\right)\\ = & \mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left\{ X<0\right\} \right)+\mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left\{ X\ge0\right\} \right)-0\\ = & \mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left\{ X<0\right\} \right)+\mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left\{ X\ge0\right\} \right)\\ = & \mathrm{P}\left(\left\{ -X\le\sqrt{y}\right\} \cap\left\{ X<0\right\} \right)+\mathrm{P}\left(\left\{ X\le\sqrt{y}\right\} \cap\left\{ X\ge0\right\} \right)\\ = & \mathrm{P}\left(\left\{ X\ge-\sqrt{y}\right\} \cap\left\{ X<0\right\} \right)+\mathrm{P}\left(\left\{ X\le\sqrt{y}\right\} \cap\left\{ X\ge0\right\} \right)\\ = & \mathrm{P}_{{\scriptscriptstyle X}}\left(-\sqrt{y}\le X<0\right)+\mathrm{P}_{{\scriptscriptstyle X}}\left(0\le X\le\sqrt{y}\right)\\ = & \left[F_{{\scriptscriptstyle X}}\left(0\right)-F_{{\scriptscriptstyle X}}\left(-\sqrt{y}\right)\right]+\left[F_{{\scriptscriptstyle X}}\left(\sqrt{y}\right)-F_{{\scriptscriptstyle X}}\left(0\right)\right]\\ = & F_{{\scriptscriptstyle X}}\left(\sqrt{y}\right)-F_{{\scriptscriptstyle X}}\left(-\sqrt{y}\right) \end{aligned} \]

\[ \tag*{$\Box$} \]

Another example, \(\begin{cases} Y=g\left(X\right)=X^{2}\\ X\in\left[-1,\infty\right) \end{cases}\),

\[ \begin{aligned} & Y=g\left(X\right)=X^{2}\\ \Rightarrow & Y=\begin{cases} X^{2}=g\left(X\right) & X\ge0\Leftrightarrow X\in\left[0,+\infty\right)\Rightarrow\forall X_{{\scriptscriptstyle 1}}<X_{{\scriptscriptstyle 2}}\left[X_{{\scriptscriptstyle 1}}^{2}<X_{{\scriptscriptstyle 2}}^{2}\right]\\ X^{2}=g\left(X\right) & -1\le X<0\Leftrightarrow X\in\left[-1,0\right)\Rightarrow\forall X_{{\scriptscriptstyle 1}}<X_{{\scriptscriptstyle 2}}\left[X_{{\scriptscriptstyle 1}}^{2}>X_{{\scriptscriptstyle 2}}^{2}\right] \end{cases}\\ \Rightarrow & X=\begin{cases} \sqrt{Y}=g^{-1}\left(Y\right) & X\in\left[0,\infty\right)\Rightarrow\forall X_{{\scriptscriptstyle 1}}^{2}<X_{{\scriptscriptstyle 2}}^{2}\left[X_{{\scriptscriptstyle 1}}<X_{{\scriptscriptstyle 2}}\right]\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[X_{{\scriptscriptstyle 1}}<X_{{\scriptscriptstyle 2}}\right]\\ -\sqrt{Y}=g^{-1}\left(Y\right) & X\in\left[-1,0\right)\Rightarrow\forall X_{{\scriptscriptstyle 1}}^{2}<X_{{\scriptscriptstyle 2}}^{2}\left[X_{{\scriptscriptstyle 1}}>X_{{\scriptscriptstyle 2}}\right]\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[X_{{\scriptscriptstyle 1}}>X_{{\scriptscriptstyle 2}}\right] \end{cases}\\ \Rightarrow & X=\begin{cases} \sqrt{Y}=g^{-1}\left(Y\right) & Y\in\left[0,\infty\right)\Rightarrow X\in\left[0,\infty\right)\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(Y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(Y_{{\scriptscriptstyle 2}}\right)\right]\\ -\sqrt{Y}=g^{-1}\left(Y\right) & Y\in\left(0,1\right]\Rightarrow X\in\left[-1,0\right)\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(Y_{{\scriptscriptstyle 1}}\right)>g^{-1}\left(Y_{{\scriptscriptstyle 2}}\right)\right] \end{cases}\\ \Rightarrow & X=\begin{cases} \sqrt{Y}=g^{-1}\left(Y\right) & Y\in\left(1,\infty\right)\Rightarrow X\in\left(1,\infty\right)\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(Y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(Y_{{\scriptscriptstyle 2}}\right)\right]\\ \sqrt{Y}=g^{-1}\left(Y\right) & Y\in\left[0,1\right]\Rightarrow X\in\left[0,1\right]\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(Y_{{\scriptscriptstyle 1}}\right)<g^{-1}\left(Y_{{\scriptscriptstyle 2}}\right)\right]\\ -\sqrt{Y}=g^{-1}\left(Y\right) & Y\in\left(0,1\right]\Rightarrow X\in\left[-1,0\right)\Rightarrow\forall Y_{{\scriptscriptstyle 1}}<Y_{{\scriptscriptstyle 2}}\left[g^{-1}\left(Y_{{\scriptscriptstyle 1}}\right)>g^{-1}\left(Y_{{\scriptscriptstyle 2}}\right)\right] \end{cases} \end{aligned} \]

\[ \begin{aligned} F_{{\scriptscriptstyle Y}}\left(y\right)= & \mathrm{P}_{{\scriptscriptstyle Y}}\left(Y\le y\right)=\mathrm{P}\left(X^{2}\le y\right),\quad\begin{cases} Y=g\left(X\right)=X^{2}\\ X\in\left[-1,\infty\right) \end{cases}\\ = & \mathrm{P}\left(\left\{ X^{2}\le y\right\} \cap\left(\left\{ X<0\right\} \cup\left\{ X\ge0\right\} \right)\right)=\cdots\text{ as for }X\in\left(-\infty,+\infty\right)\\ = & \mathrm{P}\left(\left\{ X\ge-\sqrt{y}\right\} \cap\left\{ X<0\right\} \right)+\mathrm{P}\left(\left\{ X\le\sqrt{y}\right\} \cap\left\{ X\ge0\right\} \right)\\ = & \mathrm{P}_{{\scriptscriptstyle X}}\left(\max\left\{ -1,-\sqrt{y}\right\} \le X<0\right)+\mathrm{P}_{{\scriptscriptstyle X}}\left(0\le X\le\sqrt{y}\right)\\ = & \begin{cases} \left[F_{{\scriptscriptstyle X}}\left(0\right)-F_{{\scriptscriptstyle X}}\left(-1\right)\right]+\left[F_{{\scriptscriptstyle X}}\left(\sqrt{y}\right)-F_{{\scriptscriptstyle X}}\left(0\right)\right] & y>1\Rightarrow-\sqrt{y}<-1\\ \left[F_{{\scriptscriptstyle X}}\left(0\right)-F_{{\scriptscriptstyle X}}\left(-\sqrt{y}\right)\right]+\left[F_{{\scriptscriptstyle X}}\left(\sqrt{y}\right)-F_{{\scriptscriptstyle X}}\left(0\right)\right] & 0\le y\le1 \end{cases}\\ = & \begin{cases} F_{{\scriptscriptstyle X}}\left(\sqrt{y}\right)-F_{{\scriptscriptstyle X}}\left(-1\right)\overset{F_{{\scriptscriptstyle X}}\left(-1\right)=0}{=}F_{{\scriptscriptstyle X}}\left(\sqrt{y}\right) & y>1\\ F_{{\scriptscriptstyle X}}\left(\sqrt{y}\right)-F_{{\scriptscriptstyle X}}\left(-\sqrt{y}\right) & 0\le y\le1 \end{cases} \end{aligned} \]

\[ \tag*{$\Box$} \]
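
A Monte Carlo check of this case analysis (Python sketch; the shifted exponential \(X=E-1\), \(E\sim\text{Exponential}\left(1\right)\), is a made-up distribution supported on \(\left[-1,\infty\right)\)):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(1.0, 1_000_000) - 1.0        # made-up X supported on [-1, inf)
F_X = lambda x: np.where(x >= -1, 1 - np.exp(-(x + 1)), 0.0)   # its CDF
Y = X ** 2

for y in (0.25, 0.81, 4.0):
    closed_form = F_X(np.sqrt(y)) - F_X(-np.sqrt(y))   # the case formula above
    print(y, closed_form.round(4), (Y <= y).mean().round(4))
```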

18.1.2.5 discrete monotone transformation

\[ \begin{cases} Y=g\left(X\right)=X^{2}\\ X\text{ discrete}\Rightarrow & Y\text{ discrete} \end{cases} \]

\[ \begin{aligned} f_{{\scriptscriptstyle Y}}\left(y\right)= & \mathrm{P}_{{\scriptscriptstyle Y}}\left(Y=y\right)\\ = & \mathrm{P}\left(X^{2}=y\right)\\ = & \mathrm{P}\left(\left\{ X=\sqrt{y}\right\} \cup\left\{ X=-\sqrt{y}\right\} \right)\\ = & \mathrm{P}\left(\left\{ X=\sqrt{y}\right\} \right)+\mathrm{P}\left(\left\{ X=-\sqrt{y}\right\} \right)-\mathrm{P}\left(\left\{ X=\sqrt{y}\right\} \cap\left\{ X=-\sqrt{y}\right\} \right)\\ = & \mathrm{P}_{{\scriptscriptstyle X}}\left(X=\sqrt{y}\right)+\mathrm{P}_{{\scriptscriptstyle X}}\left(X=-\sqrt{y}\right)-\mathrm{P}\left(\emptyset\right)\\ = & \mathrm{P}_{{\scriptscriptstyle X}}\left(X=\sqrt{y}\right)+\mathrm{P}_{{\scriptscriptstyle X}}\left(X=-\sqrt{y}\right)-0\\ = & \mathrm{P}_{{\scriptscriptstyle X}}\left(X=\sqrt{y}\right)+\mathrm{P}_{{\scriptscriptstyle X}}\left(X=-\sqrt{y}\right)\\ = & f_{{\scriptscriptstyle X}}\left(\sqrt{y}\right)+f_{{\scriptscriptstyle X}}\left(-\sqrt{y}\right) \end{aligned} \]

\[ \tag*{$\Box$} \]

Theorem 18.4 discrete monotone transformation

\[ \begin{array}{c} \begin{cases} Y=g\left(X\right)\\ X\text{ discrete}\Rightarrow & Y\text{ discrete} \end{cases}\\ \Downarrow\\ f_{{\scriptscriptstyle Y}}\left(y\right)=\sum\limits _{\left\{ x\middle|g\left(x\right)=y\right\} }f_{{\scriptscriptstyle X}}\left(x\right)=\sum\limits _{\left\{ x\middle|x=g^{-1}\left(y\right)\right\} }f_{{\scriptscriptstyle X}}\left(x\right) \end{array} \]

Proof:

\[ \begin{aligned} f_{{\scriptscriptstyle Y}}\left(y\right)= & \mathrm{P}_{{\scriptscriptstyle Y}}\left(Y=y\right)\\ = & \mathrm{P}\left(g\left(X\right)=y\right)=\sum_{t\in\left\{ x\middle|g\left(x\right)=y\right\} }f_{{\scriptscriptstyle X}}\left(t\right)=\sum_{x\in\left\{ x\middle|g\left(x\right)=y\right\} }f_{{\scriptscriptstyle X}}\left(x\right)=\sum\limits _{\left\{ x\middle|g\left(x\right)=y\right\} }f_{{\scriptscriptstyle X}}\left(x\right)\\ = & \mathrm{P}_{{\scriptscriptstyle X}}\left(X=g^{-1}\left(y\right)\right)=\sum_{t\in\left\{ x\middle|x=g^{-1}\left(y\right)\right\} }f_{{\scriptscriptstyle X}}\left(t\right)=\sum_{x\in\left\{ x\middle|x=g^{-1}\left(y\right)\right\} }f_{{\scriptscriptstyle X}}\left(x\right)=\sum\limits _{\left\{ x\middle|x=g^{-1}\left(y\right)\right\} }f_{{\scriptscriptstyle X}}\left(x\right) \end{aligned} \]


\[ \tag*{$\Box$} \]
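
Theorem 18.4 computed directly (Python sketch, with a made-up PMF on \(\left\{-2,-1,0,1,2\right\}\) and \(g\left(x\right)=x^{2}\)): the PMF of \(Y\) sums \(f_{X}\) over every \(x\) with \(g\left(x\right)=y\).

```python
from collections import defaultdict

f_X = {-2: 0.1, -1: 0.2, 0: 0.1, 1: 0.25, 2: 0.35}   # made-up PMF
g = lambda x: x ** 2

f_Y = defaultdict(float)
for x, p in f_X.items():
    f_Y[g(x)] += p            # sum f_X(x) over {x | g(x) = y}
print(dict(f_Y))              # ~= {4: 0.45, 1: 0.45, 0: 0.1}
```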

Theorem 18.5 probability integral transformation

\[ \begin{array}{c} \begin{cases} \begin{cases} X\text{ continuous} & \left(c\right)\\ X\sim F_{{\scriptscriptstyle X}}\left(x\right) & \left(d\right) \end{cases}\\ Y=F_{{\scriptscriptstyle X}}\left(X\right) & \left(t\right) \end{cases}\\ \Downarrow\\ F_{{\scriptscriptstyle Y}}\left(y\right)=y,\forall y\in\left[0,1\right]\\ \Updownarrow\text{def.}\\ Y\sim U=U\left(y\right)\Leftrightarrow Y\sim U\left(y\right)\Leftrightarrow Y\text{ is uniformly distributed on }\left[0,1\right] \end{array} \]

Proof:

\[ \begin{aligned} F_{{\scriptscriptstyle Y}}\left(y\right)= & \mathrm{P}_{{\scriptscriptstyle Y}}\left(Y\le y\right)\overset{\left(t\right)}{=}\mathrm{P}\left(F_{{\scriptscriptstyle X}}\left(X\right)\le y\right),\forall x_{{\scriptscriptstyle 1}}<x_{{\scriptscriptstyle 2}}\left[F_{{\scriptscriptstyle X}}\left(x_{{\scriptscriptstyle 1}}\right)<F_{{\scriptscriptstyle X}}\left(x_{{\scriptscriptstyle 2}}\right)\right]\Rightarrow\begin{cases} \exists F_{{\scriptscriptstyle X}}^{-1}:Y\rightarrow X\\ \forall y_{{\scriptscriptstyle 1}}<y_{{\scriptscriptstyle 2}}\left[F_{{\scriptscriptstyle X}}^{-1}\left(y_{{\scriptscriptstyle 1}}\right)<F_{{\scriptscriptstyle X}}^{-1}\left(y_{{\scriptscriptstyle 2}}\right)\right] \end{cases}\\ = & \mathrm{P}_{{\scriptscriptstyle X}}\left(X\le F_{{\scriptscriptstyle X}}^{-1}\left(y\right)=x\right)=\mathrm{P}_{{\scriptscriptstyle X}}\left(X\le x\right),x=F_{{\scriptscriptstyle X}}^{-1}\left(y\right)\\ = & \mathrm{P}_{{\scriptscriptstyle X}}\left(X\le x\right)=F_{{\scriptscriptstyle X}}\left(x\right)\overset{x=F_{{\scriptscriptstyle X}}^{-1}\left(y\right)}{=}F_{{\scriptscriptstyle X}}\left(F_{{\scriptscriptstyle X}}^{-1}\left(y\right)\right)=y\\ F_{{\scriptscriptstyle Y}}\left(y\right)= & y \end{aligned} \]

\[ \tag*{$\Box$} \]

Note:

According to Theorem 18.5,

\[ \begin{aligned} & U=F_{{\scriptscriptstyle X}}\left(X\right)\overset{\text{Thm. 18.5}}{\sim}U\left(u\right)\text{ on }\left[0,1\right]\\ \Rightarrow & X=F_{{\scriptscriptstyle X}}^{-1}\left(U\right)\wedge X\sim F_{{\scriptscriptstyle X}}\left(x\right)\Rightarrow F_{{\scriptscriptstyle X}}^{-1}\left(U\right)=X\sim F_{{\scriptscriptstyle X}}\left(x\right)\Rightarrow F_{{\scriptscriptstyle X}}^{-1}\left(U\right)\sim F_{{\scriptscriptstyle X}}\left(x\right)\\ \Rightarrow & X=F_{{\scriptscriptstyle X}}^{-1}\left(U\right)\sim F_{{\scriptscriptstyle X}}\left(x\right)\\ \text{i.e. } & \text{substituting uniform random variables into the inverse of }F_{{\scriptscriptstyle X}}\text{,}\\ & \text{we get random variables that follow }F_{{\scriptscriptstyle X}}\left(x\right) \end{aligned} \]
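
A sketch of this recipe, i.e. inverse transform sampling (Python/NumPy; Exponential(\(\lambda\)) with \(\lambda=2\) chosen because its inverse CDF has the closed form \(-\log\left(1-u\right)/\lambda\)):

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 2.0
U = rng.random(1_000_000)            # U ~ Uniform[0, 1]
X = -np.log(1 - U) / lam             # X = F_X^{-1}(U) ~ Exponential(lam)
print(X.mean().round(3))             # -> 1/lam = 0.5
print((X <= 1.0).mean().round(3))    # -> F_X(1) = 1 - exp(-2) ~ 0.865
```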

18.1.2.6 expected value

\[ \mathrm{E}\left(g\left(X\right)\right)=\mathrm{E}\left[g\left(X\right)\right]=\mathrm{E}g\left(X\right)=\mathbb{E}\left[g\left(X\right)\right]=\mathbb{E}g\left(X\right) \]

Definition 18.12 expected value: The expected value of a random variable \(g\left(X\right)\) is

\[ \mathrm{E}\left(g\left(X\right)\right)=\mathrm{E}\left[g\left(X\right)\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}g\left(x\right)f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}g\left(x\right)f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \]

6 p.126

Definition 18.13 expected value or expectation function: The expected value of a random variable \(X\) is

\[ \mathrm{E}\left(X\right)=\mathrm{E}\left[X\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}xf_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}xf_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \]

Theorem 18.6 the rule of the lazy statistician

the law of the unconscious statistician = the LOTUS

\[ \mathrm{E}\left(g\left(X\right)\right)=\mathrm{E}\left[g\left(X\right)\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}g\left(x\right)f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}g\left(x\right)f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \]

Proof: [7, p.162] (for p.119)

Discrete case:

\[ \begin{aligned} \text{to be proved} \end{aligned} \]

Continuous case:

\[ \begin{aligned} \text{to be proved} \end{aligned} \]

\[ \tag*{$\Box$} \]


By linearity of \(\intop\) and \(\sum\), expected values have the following properties or theorems (a numeric sketch follows the list),

  • \(\mathrm{E}\left[a_{{\scriptscriptstyle 1}}g_{{\scriptscriptstyle 1}}\left(X_{{\scriptscriptstyle 1}}\right)+a_{{\scriptscriptstyle 2}}g_{{\scriptscriptstyle 2}}\left(X_{{\scriptscriptstyle 2}}\right)+c\right]=a_{{\scriptscriptstyle 1}}\mathrm{E}\left[g_{{\scriptscriptstyle 1}}\left(X_{{\scriptscriptstyle 1}}\right)\right]+a_{{\scriptscriptstyle 2}}\mathrm{E}\left[g_{{\scriptscriptstyle 2}}\left(X_{{\scriptscriptstyle 2}}\right)\right]+c\)
  • \(\forall x\in\mathbb{R}\left[g\left(x\right)\ge0\right]\Rightarrow\mathrm{E}\left[g\left(X\right)\right]\ge0\)
  • \(\forall x\in\mathbb{R}\left[g_{{\scriptscriptstyle 1}}\left(x\right)\ge g_{{\scriptscriptstyle 2}}\left(x\right)\right]\Rightarrow\mathrm{E}\left[g_{{\scriptscriptstyle 1}}\left(X\right)\right]\ge\mathrm{E}\left[g_{{\scriptscriptstyle 2}}\left(X\right)\right]\)
  • \(\forall x\in\mathbb{R}\left[a\le g\left(x\right)\le b\right]\Rightarrow a\le\mathrm{E}\left[g\left(X\right)\right]\le b\)
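
A numeric sketch of the LOTUS and of linearity (Python; the PMF and \(g\) are made up): \(\mathrm{E}\left[g\left(X\right)\right]=\sum g\left(x\right)f_{X}\left(x\right)\) agrees with the sample mean of \(g\) over simulated draws.

```python
import numpy as np

rng = np.random.default_rng(5)
xs = np.array([0, 1, 2, 3])
pmf = np.array([0.1, 0.2, 0.3, 0.4])      # made-up PMF of X
g = lambda x: x ** 2

lotus = np.sum(g(xs) * pmf)               # E[g(X)] directly from f_X
draws = rng.choice(xs, size=1_000_000, p=pmf)
print(lotus, g(draws).mean().round(3))    # 5.0 vs ~5.0
# linearity: E[2 g(X) + 3] = 2 E[g(X)] + 3
print((2 * g(draws) + 3).mean().round(3), 2 * lotus + 3)
```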

Theorem 18.7 \(\mathrm{E}\left[X\right]\) minimizes the squared-error loss \(\mathrm{E}\left[\left(X-b\right)^{2}\right]\) over \(b\), i.e.

\[ \mathrm{E}\left[X\right]=\underset{b}{\arg\min}\thinspace\mathrm{E}\left[\left(X-b\right)^{2}\right] \]

Proof:

\[ \begin{aligned} \mathrm{E}\left[\left(X-b\right)^{2}\right]= & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]+\mathrm{E}\left[X\right]-b\right)^{2}\right]\\ = & \mathrm{E}\left[\left\{ \left(X-\mathrm{E}\left[X\right]\right)+\left(\mathrm{E}\left[X\right]-b\right)\right\} ^{2}\right]\\ = & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}+2\left(X-\mathrm{E}\left[X\right]\right)\left(\mathrm{E}\left[X\right]-b\right)+\left(\mathrm{E}\left[X\right]-b\right)^{2}\right]\\ = & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]+2\left(\mathrm{E}\left[X\right]-b\right)\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)\right]+\mathrm{E}\left[\left(\mathrm{E}\left[X\right]-b\right)^{2}\right]\\ = & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]+2\left(\mathrm{E}\left[X\right]-b\right)\mathrm{E}\left[X-\mathrm{E}\left[X\right]\right]+\left(\mathrm{E}\left[X\right]-b\right)^{2}\\ = & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]+2\left(\mathrm{E}\left[X\right]-b\right)\left(\mathrm{E}\left[X\right]-\mathrm{E}\left[X\right]\right)+\left(\mathrm{E}\left[X\right]-b\right)^{2}\\ = & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]+2\left(\mathrm{E}\left[X\right]-b\right)0+\left(\mathrm{E}\left[X\right]-b\right)^{2}\\ = & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]+0+\left(\mathrm{E}\left[X\right]-b\right)^{2}\\ = & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]+\left(\mathrm{E}\left[X\right]-b\right)^{2}\overset{\left(\mathrm{E}\left[X\right]-b\right)^{2}\ge0}{\ge}\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]\\ \mathrm{E}\left[\left(X-b\right)^{2}\right]\ge & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]\\ \Downarrow\\ \mathrm{E}\left[\left(X-b\right)^{2}\right]= & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]\text{ holds if }\left(\mathrm{E}\left[X\right]-b\right)^{2}=0\Rightarrow b=\mathrm{E}\left[X\right]\Rightarrow\mathrm{E}\left[X\right]=\underset{b}{\arg\min}\thinspace\mathrm{E}\left[\left(X-b\right)^{2}\right] \end{aligned} \]

\[ \tag*{$\Box$} \]

Note:

When \(b=\mathrm{E}\left[X\right]\), \(\mathrm{E}\left[\left(X-b\right)^{2}\right]\) attains its minimum \(\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]=\mathrm{V}\left[X\right]=\mathrm{V}\left(X\right)\), i.e. the definition of variance appears.
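
A grid-search sketch of Theorem 18.7 (Python; the Exponential(2) population and the grid are arbitrary): the empirical squared loss over \(b\) is minimized at the sample mean.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.exponential(2.0, 100_000)                          # any population works
b_grid = np.linspace(0.0, 5.0, 501)
loss = np.array([np.mean((X - b) ** 2) for b in b_grid])   # empirical E[(X-b)^2]
print(b_grid[loss.argmin()].round(2), X.mean().round(2))   # argmin ~ sample mean ~ 2.0
```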


Theorem 18.8 \(\mathrm{median}\left[X\right]\) minimizes \(\mathrm{E}\left[\left|X-b\right|\right]\) over \(b\), i.e.

\[ \mathrm{median}\left[X\right]=\underset{b}{\arg\min}\thinspace\mathrm{E}\left[\left|X-b\right|\right] \]

Proof:

\[ \begin{aligned} \text{to be proved} \end{aligned} \]

\[ \tag*{$\Box$} \]

Note:

When \(b=\mathrm{median}\left[X\right]\), \(\mathrm{E}\left[\left|X-b\right|\right]\) attains its minimum \(\mathrm{E}\left[\left|X-\mathrm{median}\left[X\right]\right|\right]\), i.e. the definition of the mean absolute deviation about the median, used in robust statistics, appears.
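
The same grid-search sketch for Theorem 18.8, now with absolute loss: the minimizer lands at the sample median, which differs visibly from the mean for this right-skewed choice of population.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.exponential(2.0, 100_000)                          # right-skewed: mean != median
b_grid = np.linspace(0.0, 5.0, 501)
loss = np.array([np.mean(np.abs(X - b)) for b in b_grid])  # empirical E[|X-b|]
print(b_grid[loss.argmin()].round(2))                      # ~ median = 2 ln 2 ~ 1.39
print(np.median(X).round(2), X.mean().round(2))            # ~1.39 vs ~2.0
```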


Definition 18.14 indicator function

\[ \begin{aligned} 1\left(E\right)=1\left(x\in E\right)=1\left(\left\{ x\in E\right\} \right)=1\left(\left\{ x\middle|x\in E\right\} \right)= & \begin{cases} 1 & E\\ 0 & \overline{E} \end{cases}=\begin{cases} 1 & \text{if }E\\ 0 & \text{if }\overline{E}=E^{\mathrm{C}} \end{cases}\\ = & \begin{cases} 1 & \text{if event }E\text{ occurs}\\ 0 & \text{if event }E\text{ does not occur} \end{cases} \end{aligned} \]

Note:

Theorem 18.9 probability as expected value

\[ \mathrm{P}_{{\scriptscriptstyle X}}\left(E\right)=\mathrm{P}\left(X\in E\right)=\mathrm{E}\left[1\left(X\in E\right)\right] \]

Proof:

\[ \begin{aligned} \mathrm{P}_{{\scriptscriptstyle X}}\left(E\right)=\mathrm{P}\left(X\in E\right)= & \int_{x\in E}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x=\int_{E}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x\\ = & \int1\left(x\in E\right)f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x\\ = & \int g\left(x\right)f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x,g\left(x\right)=1\left(x\in E\right)\\ = & \mathrm{E}\left[g\left(X\right)\right],g\left(X\right)=1\left(X\in E\right)\\ = & \mathrm{E}\left[1\left(X\in E\right)\right]\\ \mathrm{P}_{{\scriptscriptstyle X}}\left(E\right)=\mathrm{P}\left(X\in E\right)= & \mathrm{E}\left[1\left(X\in E\right)\right] \end{aligned} \]

\[ \tag*{$\Box$} \]
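
A Monte Carlo sketch of Theorem 18.9 (Python/SciPy; \(X\sim N\left(0,1\right)\) and \(E=\left[-1,1\right]\) are chosen for illustration): the sample mean of the indicator estimates \(\mathrm{P}\left(X\in E\right)\).

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
X = rng.standard_normal(1_000_000)
indicator = np.abs(X) <= 1.0                        # 1(X in E), E = [-1, 1]
print(indicator.mean().round(4))                    # E[1(X in E)] ~ 0.6827
print((norm.cdf(1.0) - norm.cdf(-1.0)).round(4))    # exact P(X in E) = 0.6827
```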


Iverson bracket https://en.wikipedia.org/wiki/Iverson_bracket

\[ \begin{cases} v\left(p\left(x\right)\right)=\mathrm{T} & \Leftrightarrow\left[p\left(x\right)\right]=1\\ v\left(p\left(x\right)\right)=\mathrm{F} & \Leftrightarrow\left[p\left(x\right)\right]=0 \end{cases} \]

\[ \left[p\left(x\right)\right]=\begin{cases} 1 & v\left(p\left(x\right)\right)=\mathrm{T}\\ 0 & v\left(\neg p\left(x\right)\right)=\mathrm{T} \end{cases}=\begin{cases} 1 & p\left(x\right)\\ 0 & \neg p\left(x\right) \end{cases} \]

negation = NOT

\[ \left[\neg p\right]=1-\left[p\right] \]

in set theory or domain of events,

\[ 1\left(\overline{E}\right)=1-1\left(E\right) \]

conjunction = AND

\[ \left[p\wedge q\right]=\left[p\right]\left[q\right] \]

in set theory or domain of events,

\[ 1\left(E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}\right)=1\left(E_{{\scriptscriptstyle 1}}\right)1\left(E_{{\scriptscriptstyle 2}}\right) \]

disjunction = OR

\[ \left[p\vee q\right]=\left[p\right]+\left[q\right]-\left[p\right]\left[q\right]=\left[p\right]+\left[q\right]-\left[p\wedge q\right] \]

Proof:

in set theory or domain of events,

\[ \begin{aligned} 1\left(E_{{\scriptscriptstyle 1}}\cup E_{{\scriptscriptstyle 2}}\right)\overset{\text{De Morgan}}{=} & 1\left(\overline{\overline{E}_{{\scriptscriptstyle 1}}\cap\overline{E}_{{\scriptscriptstyle 2}}}\right)\\ = & 1-1\left(\overline{E}_{{\scriptscriptstyle 1}}\cap\overline{E}_{{\scriptscriptstyle 2}}\right)=1-1\left(\overline{E}_{{\scriptscriptstyle 1}}\right)1\left(\overline{E}_{{\scriptscriptstyle 2}}\right)\\ = & 1-\left[1-1\left(E_{{\scriptscriptstyle 1}}\right)\right]\left[1-1\left(E_{{\scriptscriptstyle 2}}\right)\right]\\ = & 1-\left[1-1\left(E_{{\scriptscriptstyle 1}}\right)-1\left(E_{{\scriptscriptstyle 2}}\right)+1\left(E_{{\scriptscriptstyle 1}}\right)1\left(E_{{\scriptscriptstyle 2}}\right)\right]\\ = & 1\left(E_{{\scriptscriptstyle 1}}\right)+1\left(E_{{\scriptscriptstyle 2}}\right)-1\left(E_{{\scriptscriptstyle 1}}\right)1\left(E_{{\scriptscriptstyle 2}}\right)\\ = & 1\left(E_{{\scriptscriptstyle 1}}\right)+1\left(E_{{\scriptscriptstyle 2}}\right)-1\left(E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}\right) \end{aligned} \]

\[ 1\left(E_{{\scriptscriptstyle 1}}\cup E_{{\scriptscriptstyle 2}}\right)=1\left(E_{{\scriptscriptstyle 1}}\right)+1\left(E_{{\scriptscriptstyle 2}}\right)-1\left(E_{{\scriptscriptstyle 1}}\right)1\left(E_{{\scriptscriptstyle 2}}\right)=1\left(E_{{\scriptscriptstyle 1}}\right)+1\left(E_{{\scriptscriptstyle 2}}\right)-1\left(E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}\right) \]

\[ \tag*{$\Box$} \]

implication = conditional

\[ \begin{aligned} \left[p\rightarrow q\right]= & \left[\neg p\vee q\right]\\ = & \left[\neg p\right]+\left[q\right]-\left[\neg p\right]\left[q\right]\\ = & 1-\left[p\right]+\left[q\right]-\left(1-\left[p\right]\right)\left[q\right]\\ = & 1-\left[p\right]+\left[p\right]\left[q\right] \end{aligned} \]

exclusive disjunction = XOR

\[ \begin{aligned} \left[p\veebar q\right]=\left[p\oplus q\right]= & \left|\left[p\right]-\left[q\right]\right|=\left(\left[p\right]-\left[q\right]\right)^{2}\\ = & \left[p\right]\left(1-\left[q\right]\right)+\left(1-\left[p\right]\right)\left[q\right] \end{aligned} \]

biconditional = XNOR

\[ \left[p\leftrightarrow q\right]=\left[p\odot q\right]=\left[\neg\left(p\oplus q\right)\right]=\left[\neg\left(p\veebar q\right)\right]=\left(\left[p\right]+\left(1-\left[q\right]\right)\right)\left(\left(1-\left[p\right]\right)+\left[q\right]\right) \]
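A brute-force check of the Iverson-bracket identities above, sketched in Python (hypothetical snippet, not from the source), enumerating all truth values:

```python
from itertools import product

# p, q range over {0, 1}, i.e. over the values of [p] and [q]
for p, q in product((0, 1), repeat=2):
    assert 1 - p == int(not p)                           # [not p] = 1 - [p]
    assert p * q == int(p and q)                         # [p and q] = [p][q]
    assert p + q - p * q == int(p or q)                  # [p or q]
    assert 1 - p + p * q == int((not p) or q)            # [p -> q]
    assert (p - q) ** 2 == int(p != q)                   # [p xor q] = ([p]-[q])^2
    assert (p + (1 - q)) * ((1 - p) + q) == int(p == q)  # [p <-> q]
print("all Iverson identities hold on {0,1}^2")
```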

Kronecker delta

\[ \delta_{ij}=\left[i=j\right] \]

single-argument notation

\[ \delta_{i}=\delta_{i0}=\left[i=0\right]=\begin{cases} 1 & i=0\\ 0 & i\neq0 \end{cases} \]

sign function

\[ \mathrm{sgn}\left(x\right)=\begin{cases} 1 & x>0\\ 0 & x=0\\ -1 & x<0 \end{cases}=\left[x>0\right]-\left[x<0\right] \]

absolute value function

\[ \begin{aligned} \left|x\right|= & \begin{cases} x & x\ge0\\ -x & x<0 \end{cases}=\begin{cases} x & x>0\\ -x & x\le0 \end{cases}=\begin{cases} x & x>0\\ 0 & x=0\\ -x & x<0 \end{cases}\\ = & \begin{cases} x\cdot1 & x>0\\ x\cdot0 & x=0\\ x\cdot\left(-1\right) & x<0 \end{cases}=\begin{cases} x\cdot\mathrm{sgn}\left(x\right) & x>0\\ x\cdot\mathrm{sgn}\left(x\right) & x=0\\ x\cdot\mathrm{sgn}\left(x\right) & x<0 \end{cases}\\ = & x\cdot\mathrm{sgn}\left(x\right)=x\left(\left[x>0\right]-\left[x<0\right]\right)=x\left[x>0\right]-x\left[x<0\right] \end{aligned} \]

binary min and max function

\[ \max\left(x,y\right)=x\left[x>y\right]+y\left[x\le y\right] \]

\[ \min\left(x,y\right)=x\left[x\le y\right]+y\left[x>y\right] \]

binary max function

\[ \max\left(x,y\right)=\dfrac{x+y+\left|x-y\right|}{2} \]

floor and ceiling functions

floor function

\[ \begin{aligned} \left\lfloor x\right\rfloor = & n,\,n\le x<n+1\\ = & \sum_{n\in\mathbb{Z}}n\left[n\le x<n+1\right] \end{aligned} \]

ceiling function

\[ \begin{aligned} \left\lceil x\right\rceil = & n,\,n-1<x\le n\\ = & \sum_{n\in\mathbb{Z}}n\left[n-1<x\le n\right] \end{aligned} \]

Heaviside step function

\[ \mathrm{H}\left(x\right)=\begin{cases} 1 & x>0\\ 0 & x\le0 \end{cases}=\left[x>0\right]=1_{\left(0,\infty\right)}\left(x\right) \]

or, conveniently, define the “unit step function”

\[ u\left(x\right)=\begin{cases} 1 & x\ge0\\ 0 & x<0 \end{cases}=\left[x\ge0\right]=1_{\left[0,\infty\right)}\left(x\right) \]

ramp function = rectified linear unit activation function = ReLU

\[ \mathrm{ReLU}\left(x\right)=\begin{cases} x & x\ge0\\ 0 & x<0 \end{cases}=x\left[x\ge0\right] \]
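The Iverson-bracket forms of these piecewise functions translate directly into vectorized code; a small sketch (hypothetical, assuming `numpy`):

```python
import numpy as np

def iv(cond):
    """Iverson bracket: boolean condition -> 0.0/1.0."""
    return np.asarray(cond, dtype=float)

x = np.linspace(-3.0, 3.0, 13)
y = np.linspace(3.0, -3.0, 13)

assert np.allclose(iv(x > 0) - iv(x < 0), np.sign(x))                 # sgn(x)
assert np.allclose(x * iv(x > 0) - x * iv(x < 0), np.abs(x))          # |x|
assert np.allclose(x * iv(x > y) + y * iv(x <= y), np.maximum(x, y))  # max(x, y)
assert np.allclose((x + y + np.abs(x - y)) / 2, np.maximum(x, y))     # max via |x - y|
assert np.allclose(x * iv(x >= 0), np.maximum(x, 0))                  # ReLU(x) = x[x>=0]
```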

indicator function

\[ A\subseteq X\Rightarrow\begin{cases} 1_{A}:X\rightarrow\left\{ 0,1\right\} & \Leftrightarrow x\in X\overset{1_{A}}{\rightarrow}\left\{ 0,1\right\} \\ 1_{A}\left(x\right)=\begin{cases} 1 & x\in A\\ 0 & x\notin A \end{cases} & =\left[x\in A\right]=\begin{cases} 1 & v\left(x\in A\right)=\mathrm{T}\\ 0 & v\left(\neg\left(x\in A\right)\right)=\mathrm{T} \end{cases} \end{cases} \]

\(A,B\subseteq\Omega\),

\[ A=B\Leftrightarrow1_{A}=1_{B} \]

\[ A=\Omega\Leftrightarrow1_{A}\left(x\right)=1 \]

\[ A=\emptyset\Leftrightarrow1_{A}\left(x\right)=0 \]

Theorem 18.10 subset indicator order

\[ A\subset B\Rightarrow1_{A}\left(x\right)\le1_{B}\left(x\right) \]

Proof:

\[ \begin{aligned} A\subset B\Rightarrow & \forall x\left(x\in A\Rightarrow x\in B\right)\Leftrightarrow\forall x\left(1_{A}\left(x\right)=1\Rightarrow1_{B}\left(x\right)=1\right)\\ \Leftrightarrow & \forall x\left(\neg1_{A}\left(x\right)=1\vee1_{B}\left(x\right)=1\right)\\ \Leftrightarrow & \forall x\neg\left(1_{A}\left(x\right)=1\wedge\neg1_{B}\left(x\right)=1\right)\\ \Rightarrow & \neg\exists x\left(1_{A}\left(x\right)=1\wedge1_{B}\left(x\right)=0\right)\\ \Rightarrow & \neg\exists x\left(1_{B}\left(x\right)=0<1=1_{A}\left(x\right)\right)\\ \Rightarrow & \neg\exists x\left(1_{B}\left(x\right)<1_{A}\left(x\right)\right)\\ \Rightarrow & \forall x\left(1_{B}\left(x\right)\ge1_{A}\left(x\right)\right) \end{aligned} \]

\[ \tag*{$\Box$} \]


in set theory or domain of events,

\[ 1\left(E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}\right)=1\left(E_{{\scriptscriptstyle 1}}\right)1\left(E_{{\scriptscriptstyle 2}}\right) \]

\[ 1\left(\overline{E}\right)=1-1\left(E\right) \]

\[ 1\left(E_{{\scriptscriptstyle 1}}\cup E_{{\scriptscriptstyle 2}}\right)=1\left(E_{{\scriptscriptstyle 1}}\right)+1\left(E_{{\scriptscriptstyle 2}}\right)-1\left(E_{{\scriptscriptstyle 1}}\right)1\left(E_{{\scriptscriptstyle 2}}\right)=1\left(E_{{\scriptscriptstyle 1}}\right)+1\left(E_{{\scriptscriptstyle 2}}\right)-1\left(E_{{\scriptscriptstyle 1}}\cap E_{{\scriptscriptstyle 2}}\right) \]


expectation in many perspectives

\[ Y=g\left(X\right) \]

\[ \intop\limits _{-\infty}^{+\infty}y\thinspace f_{{\scriptscriptstyle Y}}\left(y\right)\mathrm{d}y=\mathrm{E}\left[Y\right]=\mathrm{E}\left[g\left(X\right)\right]=\intop\limits _{-\infty}^{+\infty}g\left(x\right)f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x \]

\[ \mathrm{E}_{{\scriptscriptstyle Y}}\left[Y\right]=\intop\limits _{-\infty}^{+\infty}y\thinspace f_{{\scriptscriptstyle Y}}\left(y\right)\mathrm{d}y=\mathrm{E}\left[Y\right]=\mathrm{E}\left[g\left(X\right)\right]=\intop\limits _{-\infty}^{+\infty}g\left(x\right)f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x=\mathrm{E}_{{\scriptscriptstyle X}}\left[g\left(X\right)\right] \]

\[ \mathrm{E}_{{\scriptscriptstyle Y}}\left[Y\right]=\mathrm{E}_{{\scriptscriptstyle X}}\left[g\left(X\right)\right] \]


\[ \begin{aligned} & \mathrm{E}\left[a_{{\scriptscriptstyle 1}}g_{{\scriptscriptstyle 1}}\left(X_{{\scriptscriptstyle 1}}\right)+a_{{\scriptscriptstyle 2}}g_{{\scriptscriptstyle 2}}\left(X_{{\scriptscriptstyle 2}}\right)+c\right],\begin{cases} Y_{{\scriptscriptstyle 1}}=g_{{\scriptscriptstyle 1}}\left(X_{{\scriptscriptstyle 1}}\right)\\ Y_{{\scriptscriptstyle 2}}=g_{{\scriptscriptstyle 2}}\left(X_{{\scriptscriptstyle 2}}\right) \end{cases}\\ = & \mathrm{E}\left[a_{{\scriptscriptstyle 1}}Y_{{\scriptscriptstyle 1}}+a_{{\scriptscriptstyle 2}}Y_{{\scriptscriptstyle 2}}+c\right] \end{aligned} \]

\[ \begin{aligned} & \mathrm{E}\left[a_{{\scriptscriptstyle 1}}g_{{\scriptscriptstyle 1}}\left(X_{{\scriptscriptstyle 1}}\right)+a_{{\scriptscriptstyle 2}}g_{{\scriptscriptstyle 2}}\left(X_{{\scriptscriptstyle 2}}\right)+c\right]=a_{{\scriptscriptstyle 1}}\mathrm{E}\left[g_{{\scriptscriptstyle 1}}\left(X_{{\scriptscriptstyle 1}}\right)\right]+a_{{\scriptscriptstyle 2}}\mathrm{E}\left[g_{{\scriptscriptstyle 2}}\left(X_{{\scriptscriptstyle 2}}\right)\right]+c\\ = & \mathrm{E}\left[a_{{\scriptscriptstyle 1}}Y_{{\scriptscriptstyle 1}}+a_{{\scriptscriptstyle 2}}Y_{{\scriptscriptstyle 2}}+c\right]=a_{{\scriptscriptstyle 1}}\mathrm{E}\left[Y_{{\scriptscriptstyle 1}}\right]+a_{{\scriptscriptstyle 2}}\mathrm{E}\left[Y_{{\scriptscriptstyle 2}}\right]+c \end{aligned} \]

\[ a_{{\scriptscriptstyle 1}}\mathrm{E}\left[g_{{\scriptscriptstyle 1}}\left(X_{{\scriptscriptstyle 1}}\right)\right]+a_{{\scriptscriptstyle 2}}\mathrm{E}\left[g_{{\scriptscriptstyle 2}}\left(X_{{\scriptscriptstyle 2}}\right)\right]+c=a_{{\scriptscriptstyle 1}}\mathrm{E}_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left[g_{{\scriptscriptstyle 1}}\left(X_{{\scriptscriptstyle 1}}\right)\right]+a_{{\scriptscriptstyle 2}}\mathrm{E}_{{\scriptscriptstyle X_{{\scriptscriptstyle 2}}}}\left[g_{{\scriptscriptstyle 2}}\left(X_{{\scriptscriptstyle 2}}\right)\right]+c \]

\[ a_{{\scriptscriptstyle 1}}\mathrm{E}\left[Y_{{\scriptscriptstyle 1}}\right]+a_{{\scriptscriptstyle 2}}\mathrm{E}\left[Y_{{\scriptscriptstyle 2}}\right]+c=a_{{\scriptscriptstyle 1}}\mathrm{E}_{{\scriptscriptstyle Y_{{\scriptscriptstyle 1}}}}\left[Y_{{\scriptscriptstyle 1}}\right]+a_{{\scriptscriptstyle 2}}\mathrm{E}_{{\scriptscriptstyle Y_{{\scriptscriptstyle 2}}}}\left[Y_{{\scriptscriptstyle 2}}\right]+c \]

\[ a_{{\scriptscriptstyle 1}}\mathrm{E}_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left[g_{{\scriptscriptstyle 1}}\left(X_{{\scriptscriptstyle 1}}\right)\right]+a_{{\scriptscriptstyle 2}}\mathrm{E}_{{\scriptscriptstyle X_{{\scriptscriptstyle 2}}}}\left[g_{{\scriptscriptstyle 2}}\left(X_{{\scriptscriptstyle 2}}\right)\right]+c=a_{{\scriptscriptstyle 1}}\mathrm{E}_{{\scriptscriptstyle Y_{{\scriptscriptstyle 1}}}}\left[Y_{{\scriptscriptstyle 1}}\right]+a_{{\scriptscriptstyle 2}}\mathrm{E}_{{\scriptscriptstyle Y_{{\scriptscriptstyle 2}}}}\left[Y_{{\scriptscriptstyle 2}}\right]+c \]
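A Monte Carlo sketch of this linearity (hypothetical snippet, assuming `numpy`; `g1`, `g2`, and the constants are illustrative): the two sides are the same arithmetic on the samples, so the sample means agree exactly, and both estimate the same expectation.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.exponential(scale=2.0, size=200_000)  # X1
x2 = rng.normal(loc=1.0, size=200_000)         # X2
a1, a2, c = 3.0, -2.0, 5.0
g1, g2 = np.sqrt, np.square                    # g1(X1), g2(X2)

lhs = np.mean(a1 * g1(x1) + a2 * g2(x2) + c)
rhs = a1 * np.mean(g1(x1)) + a2 * np.mean(g2(x2)) + c
print(lhs, rhs)                                # identical up to float rounding
```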

18.1.2.7 moment

Definition 18.15 \(n^\text{th}\) moment: For each integer \(n\), the \(n^\text{th}\) moment of \(X\) is \(\mathrm{E}\left[X^{n}\right]\).

The \(n^\text{th}\) central moment of \(X\) is \(\mu_{{\scriptscriptstyle n}}=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{n}\right]\).

\[ \mathrm{E}\left[X^{n}\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}x^{n}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}x^{n}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \]

\[ \mu=\mathrm{E}\left[X^{1}\right]=\mathrm{E}\left[X\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}x^{1}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}x^{1}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases}=\begin{cases} \intop\limits _{-\infty}^{+\infty}x\thinspace f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}x\thinspace f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \]

\[ \begin{aligned} \mu_{{\scriptscriptstyle n}}=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{n}\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)^{n}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)^{n}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \end{aligned} \]


\(1^\text{st}\) moment of \(X\) = mean

\[ \mu=\mathrm{E}\left[X^{1}\right]=\mathrm{E}\left[X\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}x^{1}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}x^{1}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases}=\begin{cases} \intop\limits _{-\infty}^{+\infty}x\thinspace f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}x\thinspace f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \]

\(1^\text{st}\) central moment of \(X\) = \(0\)

\[ \begin{aligned} \mu_{{\scriptscriptstyle 1}}=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{1}\right]= & \begin{cases} \intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)^{1}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)^{1}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases}=\begin{cases} \intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases}\\ = & \begin{cases} \intop\limits _{-\infty}^{+\infty}x\thinspace f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x-\intop\limits _{-\infty}^{+\infty}\mu\thinspace f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}x\thinspace f_{{\scriptscriptstyle X}}\left(x\right)-\sum\limits _{x\in X\left(\Omega\right)}\mu\thinspace f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases}\\ = & \begin{cases} \mathrm{E}\left[X\right]-\mu\intop\limits _{-\infty}^{+\infty}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \mathrm{E}\left[X\right]-\mu\sum\limits _{x\in X\left(\Omega\right)}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases}=\begin{cases} \mathrm{E}\left[X\right]-\mu\cdot1 & X\text{ continuous}\\ \mathrm{E}\left[X\right]-\mu\cdot1 & X\text{ discrete} \end{cases}\\ = & \begin{cases} \mathrm{E}\left[X\right]-\mu & X\text{ continuous}\\ \mathrm{E}\left[X\right]-\mu & X\text{ discrete} \end{cases}=\begin{cases} \mathrm{E}\left[X\right]-\mathrm{E}\left[X\right] & X\text{ continuous}\\ \mathrm{E}\left[X\right]-\mathrm{E}\left[X\right] & X\text{ discrete} \end{cases}\\ = & \begin{cases} 0 & X\text{ continuous}\\ 0 & X\text{ discrete} \end{cases}=0\\ \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)\right]= & 0\\ \mathrm{E}\left[X-\mathrm{E}\left[X\right]\right]= & 0 \end{aligned} \]

\[ \forall X\left(\mathrm{E}\left[X-\mathrm{E}\left[X\right]\right]=0\right) \]

For the normal distribution, and indeed for any distribution,

\[ \begin{array}{c} X\sim\mathrm{n}\left(0,1\right)=\mathcal{N}\left(0,1^{2}\right)\\ \Downarrow\\ \mathrm{E}\left[X-\mathrm{E}\left[X\right]\right]=0 \end{array} \]


\(2^\text{nd}\) central moment of \(X\) = variance

\[ \begin{aligned} \mu_{{\scriptscriptstyle 2}}=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]= & \begin{cases} \intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)^{2}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)^{2}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases}\\ = & \mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]=\mathrm{V}\left[X\right]=\mathrm{V}\left(X\right) \end{aligned} \]

For normal distribution,

\[ \begin{array}{c} X\sim\mathrm{n}\left(0,1\right)=\mathcal{N}\left(0,1^{2}\right)=\mathcal{N}\left(\mu=0,\sigma^{2}=\mathrm{V}\left[X\right]=1^{2}\right)\\ \Downarrow\\ \mathrm{V}\left[X\right]=\mathrm{V}\left(X\right)=1 \end{array} \]

variance properties

\[ \begin{aligned} \mathrm{V}\left[aX+b\right]=a^{2}\mathrm{V}\left[X\right] \end{aligned} \]

Proof:

\[ \begin{aligned} \mathrm{V}\left[aX+b\right]= & \mathrm{E}\left[\left(aX+b-\mathrm{E}\left[aX+b\right]\right)^{2}\right]\\ = & \mathrm{E}\left[\left(aX+b-a\mathrm{E}\left[X\right]-b\right)^{2}\right]\\ = & \mathrm{E}\left[a^{2}\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]=a^{2}\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]\\ = & a^{2}\mathrm{V}\left[X\right] \end{aligned} \]

\[ \tag*{$\Box$} \]


\(3^\text{rd}\) central moment of \(X\)

\[ \mu_{{\scriptscriptstyle 3}}=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{3}\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)^{3}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)^{3}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \]

skewness

偏度

\[ \begin{aligned} \mathrm{skewness}\left[X\right]= & \dfrac{\mu_{{\scriptscriptstyle 3}}}{\mu_{{\scriptscriptstyle 2}}^{\frac{3}{2}}}=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{3}\right]}{\left(\mathrm{V}\left[X\right]\right)^{\frac{3}{2}}}=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{3}\right]}{\left(\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]\right)^{\frac{3}{2}}}\\ = & \begin{cases} \dfrac{\intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)^{3}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x}{\left(\intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)^{2}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x\right)^{\frac{3}{2}}} & X\text{ continuous}\\ \dfrac{\sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)^{3}f_{{\scriptscriptstyle X}}\left(x\right)}{\left(\sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)^{2}f_{{\scriptscriptstyle X}}\left(x\right)\right)^{\frac{3}{2}}} & X\text{ discrete} \end{cases} \end{aligned} \]

For normal distribution,

\[ \begin{array}{c} X\sim\mathrm{n}\left(0,1\right)=\mathcal{N}\left(0,1^{2}\right)=\mathcal{N}\left(\mu=0,\sigma^{2}=\mathrm{V}\left[X\right]=1^{2}\right)\\ \Downarrow\\ \mathrm{skewness}\left[X\right]=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{3}\right]}{\left(\mathrm{V}\left[X\right]\right)^{\frac{3}{2}}}=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{3}\right]}{1^{\frac{3}{2}}}=0 \end{array} \]

Proof:

\[ \begin{aligned} \text{to be proved} \end{aligned} \]

\[ \tag*{$\Box$} \]


\(4^\text{th}\) central moment of \(X\)

\[ \mu_{{\scriptscriptstyle 4}}=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{4}\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)^{4}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x & X\text{ continuous}\\ \sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)^{4}f_{{\scriptscriptstyle X}}\left(x\right) & X\text{ discrete} \end{cases} \]

kurtosis

峰度

\[ \begin{aligned} \mathrm{kurtosis}\left[X\right]= & \dfrac{\mu_{{\scriptscriptstyle 4}}}{\mu_{{\scriptscriptstyle 2}}^{2}}=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{4}\right]}{\left(\mathrm{V}\left[X\right]\right)^{2}}=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{4}\right]}{\left(\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]\right)^{2}}\\ = & \begin{cases} \dfrac{\intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)^{4}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x}{\left(\intop\limits _{-\infty}^{+\infty}\left(x-\mu\right)^{2}f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x\right)^{2}} & X\text{ continuous}\\ \dfrac{\sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)^{4}f_{{\scriptscriptstyle X}}\left(x\right)}{\left(\sum\limits _{x\in X\left(\Omega\right)}\left(x-\mu\right)^{2}f_{{\scriptscriptstyle X}}\left(x\right)\right)^{2}} & X\text{ discrete} \end{cases} \end{aligned} \]

For normal distribution,

\[ \begin{array}{c} X\sim\mathrm{n}\left(0,1\right)=\mathcal{N}\left(0,1^{2}\right)=\mathcal{N}\left(\mu=0,\sigma^{2}=\mathrm{V}\left[X\right]=1^{2}\right)\\ \Downarrow\\ \mathrm{kurtosis}\left[X\right]=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{4}\right]}{\left(\mathrm{V}\left[X\right]\right)^{2}}=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{4}\right]}{1^{2}}=3 \end{array} \]

Proof:

\[ \begin{aligned} \text{to be proved} \end{aligned} \]

\[ \tag*{$\Box$} \]


For normal distribution,

\[ \begin{array}{c} X\sim\mathrm{n}\left(0,1\right)=\mathcal{N}\left(0,1^{2}\right)=\mathcal{N}\left(\mu=0,\sigma^{2}=\mathrm{V}\left[X\right]=1^{2}\right)\\ \Downarrow\\ \end{array} \]

\[ \begin{cases} \mu=\mathrm{E}\left[X\right] & =0\\ \mu_{{\scriptscriptstyle 1}}=\mathrm{E}\left[X-\mathrm{E}\left[X\right]\right] & =0\\ \mathrm{variance}\left[X\right]=\mathrm{V}\left[X\right]=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right] & =1\\ \mathrm{skewness}\left[X\right]=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{3}\right]}{\left(\mathrm{V}\left[X\right]\right)^{\frac{3}{2}}} & =0\\ \mathrm{kurtosis}\left[X\right]=\dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{4}\right]}{\left(\mathrm{V}\left[X\right]\right)^{2}} & =3 \end{cases} \]

\[ \begin{aligned} \mu= & \mathrm{E}\left[X\right]=0\\ \mu_{{\scriptscriptstyle 1}}= & \mathrm{E}\left[X-\mathrm{E}\left[X\right]\right]=0\\ \mathrm{variance}\left[X\right]= & \mathrm{V}\left[X\right]=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{2}\right]=1\\ \mathrm{skewness}\left[X\right]= & \dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{3}\right]}{\left(\mathrm{V}\left[X\right]\right)^{\frac{3}{2}}}=0\\ \mathrm{kurtosis}\left[X\right]= & \dfrac{\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{4}\right]}{\left(\mathrm{V}\left[X\right]\right)^{2}}=3 \end{aligned} \]
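These five values can be read off with `scipy.stats` (a minimal sketch, not from the source); note that scipy reports *excess* kurtosis, so the normal's kurtosis \(3\) appears as \(3-3=0\):

```python
from scipy.stats import norm

mean, var, skew, kurt = norm(loc=0, scale=1).stats(moments="mvsk")
print(mean, var, skew, kurt)   # 0.0 1.0 0.0 0.0  (kurt is excess kurtosis = 3 - 3)
```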


\[ X\sim F_{{\scriptscriptstyle X}}\left(x\right)\leftrightarrow f_{{\scriptscriptstyle X}}\left(x\right)\rightarrow\left\{ \mu_{{\scriptscriptstyle n}}\middle|n\in\mathbb{N}\right\} =\left\{ \mu_{{\scriptscriptstyle n}}\middle|\begin{cases} n\in\mathbb{N}\\ \mu_{{\scriptscriptstyle n}}=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{n}\right] \end{cases}\right\} \]


18.1.2.7.1 moment generating function

Definition 18.16 MGF = moment generating function: The moment generating function of \(X\) is \(M\left(\xi\right)=M_{{\scriptscriptstyle X}}\left(\xi\right)=\mathrm{E}\left[\mathrm{e}^{\xi X}\right]\), provided that the expectation exists (is finite) for \(\xi\approx0\).

\[ M\left(t\right)=M_{{\scriptscriptstyle X}}\left(t\right)=\mathrm{E}\left[\mathrm{e}^{tX}\right] \]

\[ M\left(\xi\right)=M_{{\scriptscriptstyle X}}\left(\xi\right)=\mathrm{E}\left[\mathrm{e}^{\xi X}\right] \]

Theorem 18.11 the moment generating function (MGF) generates moments

\[ M_{{\scriptscriptstyle X}}^{\left(n\right)}\left(0\right)=\mathrm{E}\left[X^{n}\right] \]

where

\[ M_{{\scriptscriptstyle X}}^{\left(n\right)}\left(\xi\right)=\dfrac{\mathrm{d}^{n}}{\mathrm{d}\xi^{n}}M_{{\scriptscriptstyle X}}\left(\xi\right) \]

Proof:

Exchanging differentiation and expectation (justified when \(M_{{\scriptscriptstyle X}}\left(\xi\right)\) exists for \(\xi\approx0\)),

\[ \begin{aligned} M_{{\scriptscriptstyle X}}^{\left(n\right)}\left(\xi\right)= & \dfrac{\mathrm{d}^{n}}{\mathrm{d}\xi^{n}}M_{{\scriptscriptstyle X}}\left(\xi\right)=\dfrac{\mathrm{d}^{n}}{\mathrm{d}\xi^{n}}\mathrm{E}\left[\mathrm{e}^{\xi X}\right]=\mathrm{E}\left[\dfrac{\mathrm{d}^{n}}{\mathrm{d}\xi^{n}}\mathrm{e}^{\xi X}\right]=\mathrm{E}\left[X^{n}\mathrm{e}^{\xi X}\right]\\ \Rightarrow M_{{\scriptscriptstyle X}}^{\left(n\right)}\left(0\right)= & \mathrm{E}\left[X^{n}\mathrm{e}^{0}\right]=\mathrm{E}\left[X^{n}\right] \end{aligned} \]

\[ \tag*{$\Box$} \]
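A symbolic sketch of the theorem (hypothetical snippet, assuming `sympy`), using the normal MGF \(M_{{\scriptscriptstyle X}}\left(\xi\right)=\mathrm{e}^{\mu\xi+\frac{\sigma^{2}}{2}\xi^{2}}\) from 18.1.2.8.2.6: the \(n^\text{th}\) derivative at \(\xi=0\) returns \(\mathrm{E}\left[X^{n}\right]\).

```python
import sympy as sp

xi, mu, sigma = sp.symbols("xi mu sigma", real=True)
M = sp.exp(mu * xi + sigma**2 * xi**2 / 2)   # MGF of N(mu, sigma^2)

for n in (1, 2, 3):
    print(n, sp.expand(sp.diff(M, xi, n).subs(xi, 0)))
# 1  mu
# 2  mu**2 + sigma**2         = E[X^2]
# 3  mu**3 + 3*mu*sigma**2    = E[X^3]
```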

\[ X\sim F_{{\scriptscriptstyle X}}\left(x\right)\leftrightarrow f_{{\scriptscriptstyle X}}\left(x\right)\rightarrow\left\{ \mu_{{\scriptscriptstyle n}}\middle|n\in\mathbb{N}\right\} =\left\{ \mu_{{\scriptscriptstyle n}}\middle|\begin{cases} n\in\mathbb{N}\\ \mu_{{\scriptscriptstyle n}}=\mathrm{E}\left[\left(X-\mathrm{E}\left[X\right]\right)^{n}\right] \end{cases}\right\} \]

\[ \begin{array}{ccccccc} X & \sim & F_{{\scriptscriptstyle X}}\left(x\right) & \leftrightarrow & f_{{\scriptscriptstyle X}}\left(x\right) & \rightarrow & \left\{ \mu_{{\scriptscriptstyle n}}\middle|n\in\mathbb{N}\right\} \\ & & & & \downarrow & \nearrow\\ & & & & M_{{\scriptscriptstyle X}}\left(\xi\right) \end{array} \]

\[ \begin{array}{ccccccc} X & \sim & F_{{\scriptscriptstyle X}}\left(x\right) & \leftrightarrow & f_{{\scriptscriptstyle X}}\left(x\right)\\ & & & & \downarrow & \searrow\\ & & & & M_{{\scriptscriptstyle X}}\left(\xi\right) & \rightarrow & \left\{ \mu_{{\scriptscriptstyle n}}\middle|n\in\mathbb{N}\right\} \end{array} \]

Theorem 18.12 If \(X\) and \(Y\) have bounded support, then \(\forall u\left[F_{{\scriptscriptstyle X}}\left(u\right)=F_{{\scriptscriptstyle Y}}\left(u\right)\right]\) iff \(\forall n\in\mathbb{N}\left(\mathrm{E}\left[X^{n}\right]=\mathrm{E}\left[Y^{n}\right]\right)\).

\[ \forall u\left[F_{{\scriptscriptstyle X}}\left(u\right)=F_{{\scriptscriptstyle Y}}\left(u\right)\right]\Rightarrow\forall n\in\mathbb{N}\left(\mathrm{E}\left[X^{n}\right]=\mathrm{E}\left[Y^{n}\right]\right) \]

\[ \begin{cases} \forall n\in\mathbb{N}\left(\mathrm{E}\left[X^{n}\right]=\mathrm{E}\left[Y^{n}\right]\right)\\ \begin{cases} \mathrm{supp}\left(f_{{\scriptscriptstyle X}}\right)\text{ is bounded}\\ \mathrm{supp}\left(f_{{\scriptscriptstyle Y}}\right)\text{ is bounded} \end{cases} \end{cases}\Rightarrow \forall u\left[F_{{\scriptscriptstyle X}}\left(u\right)=F_{{\scriptscriptstyle Y}}\left(u\right)\right] \]

Proof:

\[ \begin{aligned} \text{to be proved} \end{aligned} \]

\[ \tag*{$\Box$} \]

Theorem 18.13 If \(M_{{\scriptscriptstyle X}}\left(t\right)\) and \(M_{{\scriptscriptstyle Y}}\left(t\right)\) exist, then \(\forall u\left[F_{{\scriptscriptstyle X}}\left(u\right)=F_{{\scriptscriptstyle Y}}\left(u\right)\right]\) iff \(\forall t\approx0\left[M_{{\scriptscriptstyle X}}\left(t\right)=M_{{\scriptscriptstyle Y}}\left(t\right)\right]\).

\[ \forall u\left[F_{{\scriptscriptstyle X}}\left(u\right)=F_{{\scriptscriptstyle Y}}\left(u\right)\right]\Rightarrow\forall t\approx0\left[M_{{\scriptscriptstyle X}}\left(t\right)=M_{{\scriptscriptstyle Y}}\left(t\right)\right] \]

\[ \begin{cases} \forall t\approx0\left[M_{{\scriptscriptstyle X}}\left(t\right)=M_{{\scriptscriptstyle Y}}\left(t\right)\right]\\ \begin{cases} \exists M_{{\scriptscriptstyle X}}\left(t\right)\in\mathbb{R}\\ \exists M_{{\scriptscriptstyle Y}}\left(t\right)\in\mathbb{R} \end{cases} \end{cases}\Rightarrow\forall u\left[F_{{\scriptscriptstyle X}}\left(u\right)=F_{{\scriptscriptstyle Y}}\left(u\right)\right] \]

Proof:

\[ \begin{aligned} \text{to be proved} \end{aligned} \]

\[ \tag*{$\Box$} \]

\[ \begin{array}{ccccccc} X & \sim & F_{{\scriptscriptstyle X}}\left(x\right) & \leftrightarrow & f_{{\scriptscriptstyle X}}\left(x\right)\\ & & & \looparrowright & \uparrow\downarrow & \searrow\nwarrow & \looparrowleft\wedge\ \mathrm{supp}\left(f_{{\scriptscriptstyle X}}\right)\text{ is bounded}\\ & & \forall\xi\approx0\left[M_{{\scriptscriptstyle X}}\left(\xi\right)\in\mathbb{R}\right] & \wedge & M_{{\scriptscriptstyle X}}\left(\xi\right) & \rightarrow & \left\{ \mu_{{\scriptscriptstyle n}}\middle|n\in\mathbb{N}\right\} \end{array} \]

18.1.2.7.2 characteristic function

Definition 18.17 CF = characteristic function: The characteristic function of \(X\) is \(\varphi\left(\xi\right)=\varphi_{{\scriptscriptstyle X}}\left(\xi\right)=\mathrm{E}\left[\mathrm{e}^{\mathrm{i}\xi X}\right]\), which always exists.

\[ \varphi\left(t\right)=\varphi_{{\scriptscriptstyle X}}\left(t\right)=\mathrm{E}\left[\mathrm{e}^{\mathrm{i}tX}\right] \]

\[ \varphi\left(\xi\right)=\varphi_{{\scriptscriptstyle X}}\left(\xi\right)=\mathrm{E}\left[\mathrm{e}^{\mathrm{i}\xi X}\right] \]

Note:

  1. \(\varphi\left(\xi\right)=\varphi_{{\scriptscriptstyle X}}\left(\xi\right)=\mathrm{E}\left[\mathrm{e}^{\mathrm{i}\xi X}\right]\) always exists.

\[ \forall X\left(\varphi\left(\xi\right)=\varphi_{{\scriptscriptstyle X}}\left(\xi\right)=\mathrm{E}\left[\mathrm{e}^{\mathrm{i}\xi X}\right]\in\mathbb{C}\right) \]

  2. moment generating function to characteristic function

\[ M\left(\xi\right)=M_{{\scriptscriptstyle X}}\left(\xi\right)=\mathrm{E}\left[\mathrm{e}^{\xi X}\right]\in\mathbb{R}\Rightarrow M_{{\scriptscriptstyle X}}\left(\mathrm{i}\xi\right)=\varphi_{{\scriptscriptstyle X}}\left(\xi\right) \]

  3. inversion theorem or inversion formula

For \(a<b\),

\[ \begin{aligned} & \lim_{T\rightarrow\infty}\dfrac{1}{2\pi}\intop_{-T}^{+T}\dfrac{\mathrm{e}^{-\mathrm{i}ta}-\mathrm{e}^{-\mathrm{i}tb}}{\mathrm{i}t}\varphi_{{\scriptscriptstyle X}}\left(t\right)\mathrm{d}t\\ = & \mathrm{P}\left(a<X<b\right)+\dfrac{1}{2}\left[\mathrm{P}\left(X=a\right)+\mathrm{P}\left(X=b\right)\right] \end{aligned} \]

i.e. \(\varphi_{{\scriptscriptstyle X}}\left(\xi\right)\) determines \(F_{{\scriptscriptstyle X}}\left(x\right)\).

\[ \begin{array}{ccccccc} X & \sim & F_{{\scriptscriptstyle X}}\left(x\right) & \leftrightarrow & f_{{\scriptscriptstyle X}}\left(x\right)\\ & & \updownarrow & \forall\xi\approx0\left[M_{{\scriptscriptstyle X}}\left(\xi\right)\in\mathbb{R}\right]\ \wedge & \uparrow\downarrow & \searrow\nwarrow & \looparrowleft\wedge\ \mathrm{supp}\left(f_{{\scriptscriptstyle X}}\right)\text{ is bounded}\\ & & \varphi_{{\scriptscriptstyle X}}\left(\xi\right) & \leftrightarrows & M_{{\scriptscriptstyle X}}\left(\xi\right) & \rightarrow & \left\{ \mu_{{\scriptscriptstyle n}}\middle|n\in\mathbb{N}\right\} \end{array} \]

\[ \begin{array}{ccccccc} X & \sim & F_{{\scriptscriptstyle X}}\left(x\right) & \overset{\text{FToC}}{\longleftrightarrow} & f_{{\scriptscriptstyle X}}\left(x\right) & \leftrightarrow & \mathrm{P}_{{\scriptscriptstyle X}}\\ & \text{inversion formula}\ : & \updownarrow & \forall\xi\approx0\left[M_{{\scriptscriptstyle X}}\left(\xi\right)\in\mathbb{R}\right]\ \wedge & \uparrow\downarrow & \searrow\nwarrow & \looparrowleft\wedge\ \mathrm{supp}\left(f_{{\scriptscriptstyle X}}\right)\text{ is bounded}\\ & & \varphi_{{\scriptscriptstyle X}}\left(\xi\right) & \leftrightarrows & M_{{\scriptscriptstyle X}}\left(\xi\right) & \rightarrow & \left\{ \mu_{{\scriptscriptstyle n}}\middle|n\in\mathbb{N}\right\} \end{array} \]


https://www.youtube.com/watch?v=fSbs6im6wqY

MGF theorems

\[ \ \]

https://www.youtube.com/watch?v=fSbs6im6wqY&t=524s

Theorem 18.14 If

\[ \ \]

CLT

18.1.2.8 common families of distributions

mean and variance of discrete probability distributions[46.5]

18.1.2.8.1 discrete distribution
18.1.2.8.1.1 discrete uniform distribution

\[ \begin{array}{c} X\sim\mathcal{DU}\left(1,N\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|N\right)=\dfrac{1}{N}\\ x\in X\left(\Omega\right)=\left\{ 1,2,\cdots,N\right\} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \dfrac{N+1}{2}\\ \mathrm{V}\left[X\right]= & \dfrac{\left(N+1\right)\left(N-1\right)}{12} \end{cases} \end{array} \]

\[ \begin{aligned} \mathrm{E}\left[X\right]= & \sum_{x\in X\left(\Omega\right)}x\thinspace f_{{\scriptscriptstyle X}}\left(x|N\right)=\sum_{x=1}^{N}x\dfrac{1}{N}\\ = & \dfrac{1}{N}\sum_{x=1}^{N}x=\dfrac{1}{N}\dfrac{N\left(N+1\right)}{2}=\dfrac{N+1}{2}\\ \mathrm{E}\left[X\right]= & \dfrac{N+1}{2} \end{aligned} \]

\[ \begin{aligned} \mathrm{V}\left[X\right]= & \sum_{x\in X\left(\Omega\right)}\left(x-\mathrm{E}\left[X\right]\right)^{2}f_{{\scriptscriptstyle X}}\left(x|N\right)=\sum_{x=1}^{N}\left(x-\dfrac{N+1}{2}\right)^{2}\dfrac{1}{N}\\ = & \dfrac{1}{N}\sum_{x=1}^{N}\left[x^{2}-\left(N+1\right)x+\left(\dfrac{N+1}{2}\right)^{2}\right]\\ = & \dfrac{1}{N}\left[\sum_{x=1}^{N}x^{2}-\left(N+1\right)\sum_{x=1}^{N}x+N\left(\dfrac{N+1}{2}\right)^{2}\right]\\ = & \dfrac{1}{N}\left[\dfrac{N\left(N+1\right)\left(2N+1\right)}{6}-\left(N+1\right)\dfrac{N\left(N+1\right)}{2}+N\left(\dfrac{N+1}{2}\right)^{2}\right]\\ = & \left(N+1\right)\left[\dfrac{2N+1}{6}-\dfrac{N+1}{2}+\dfrac{N+1}{4}\right]=\left(N+1\right)\dfrac{4N+2-\left(3N+3\right)}{12}\\ = & \dfrac{\left(N+1\right)\left(N-1\right)}{12}\\ \mathrm{V}\left[X\right]= & \dfrac{\left(N+1\right)\left(N-1\right)}{12} \end{aligned} \]
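A quick numerical check of both formulas (hypothetical snippet, assuming `numpy`; `np.var` with its default `ddof=0` is exactly \(\sum_{x}\left(x-\mathrm{E}\left[X\right]\right)^{2}/N\)):

```python
import numpy as np

N = 10
x = np.arange(1, N + 1)                   # support {1, ..., N}, each with mass 1/N
print(x.mean(), (N + 1) / 2)              # 5.5   5.5
print(x.var(), (N + 1) * (N - 1) / 12)    # 8.25  8.25
```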


\[ \begin{array}{c} X\sim\mathcal{DU}\left(a,b\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|N\right)=\dfrac{1}{N}=\dfrac{1}{b-a+1} & N=b-a+1\\ x\in X\left(\Omega\right)=\left\{ a,a+1,\cdots,b\right\} \end{cases}\\ \Updownarrow\\ F_{{\scriptscriptstyle X}}\left(x|a,b\right)=\dfrac{\left\lfloor x\right\rfloor -a+1}{b-a+1} \end{array} \]

18.1.2.8.1.2 hypergeometric distribution

\[ \begin{array}{c} X\sim\mathcal{HG}\left(N,M,K\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|N,M,K\right)=\dfrac{\dbinom{N-M}{K-x}\dbinom{M}{x}}{\dbinom{N}{K}}\\ x\in X\left(\Omega\right)=\left\{ \max\left\{ 0,K-\left(N-M\right)\right\} ,\cdots,\min\left\{ K,M\right\} \right\} \end{cases}\\ \Downarrow K\ll N,M\\ x\in X\left(\Omega\right)=\left\{ 0,1,\cdots,K\right\} \\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \dfrac{KM}{N}\\ \mathrm{V}\left[X\right]= & \dfrac{KM}{N}\dfrac{\left(N-M\right)\left(N-K\right)}{N\left(N-1\right)} \end{cases} \end{array} \]

18.1.2.8.1.3 Bernoulli distribution

\[ \begin{array}{c} X\sim\mathcal{B}\left(p\right),p=\mathrm{P}\left(X=1\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|p\right)=\left(1-p\right)^{1-x}p^{x}\\ x\in X\left(\Omega\right)=\left\{ 0,1\right\} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & p\\ \mathrm{V}\left[X\right]= & p\left(1-p\right)\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \left(1-p\right)+p\mathrm{e}^{\xi} \end{cases} \end{array} \]

18.1.2.8.1.4 binomial distribution

independent and identical Bernoulli trials

\[ \begin{array}{c} X\sim\mathrm{b}\left(n,p\right),p=\mathrm{P}\left(X=1\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|n,p\right)=\dbinom{n}{x}\left(1-p\right)^{n-x}p^{x}\\ x\in X\left(\Omega\right)=\left\{ 0,1,\cdots,n\right\} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & np\\ \mathrm{V}\left[X\right]= & np\left(1-p\right)\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \left[\left(1-p\right)+p\mathrm{e}^{\xi}\right]^{n} \end{cases} \end{array} \]

reparameterization technique

\[ \begin{aligned} M_{{\scriptscriptstyle X}}\left(\xi\right)= & \mathrm{E}\left[\mathrm{e}^{\xi X}\right]=\sum_{x\in X\left(\Omega\right)}\mathrm{e}^{\xi x}f_{{\scriptscriptstyle X}}\left(x|n,p\right)\\ = & \sum_{x=0}^{n}\mathrm{e}^{\xi x}\dbinom{n}{x}\left(1-p\right)^{n-x}p^{x}=\sum_{x=0}^{n}\dbinom{n}{x}\left(1-p\right)^{n-x}\left(p\mathrm{e}^{\xi}\right)^{x}\\ = & \sum_{x=0}^{n}\dbinom{n}{x}\left[\dfrac{1-p}{\left(1-p\right)+p\mathrm{e}^{\xi}}\right]^{n-x}\left[\dfrac{p\mathrm{e}^{\xi}}{\left(1-p\right)+p\mathrm{e}^{\xi}}\right]^{x}\left[\left(1-p\right)+p\mathrm{e}^{\xi}\right]^{n-x}\left[\left(1-p\right)+p\mathrm{e}^{\xi}\right]^{x}\\ = & \left[\left(1-p\right)+p\mathrm{e}^{\xi}\right]^{n}\sum_{x=0}^{n}\dbinom{n}{x}\left[p^{*}\right]^{n-x}\left[1-p^{*}\right]^{x},p^{*}=\dfrac{1-p}{\left(1-p\right)+p\mathrm{e}^{\xi}}\\ = & \left[\left(1-p\right)+p\mathrm{e}^{\xi}\right]^{n}\sum_{x=0}^{n}f_{{\scriptscriptstyle X}}\left(x|n,1-p^{*}\right),X\sim\mathrm{b}\left(n,1-p^{*}\right)\\ = & \left[\left(1-p\right)+p\mathrm{e}^{\xi}\right]^{n}\cdot1=\left[\left(1-p\right)+p\mathrm{e}^{\xi}\right]^{n}\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \left[\left(1-p\right)+p\mathrm{e}^{\xi}\right]^{n} \end{aligned} \]
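A numerical check of the closed form against the defining sum (hypothetical snippet, assuming `numpy`/`scipy`; `n`, `p`, `xi` illustrative):

```python
import numpy as np
from scipy.stats import binom

n, p, xi = 12, 0.3, 0.7
x = np.arange(0, n + 1)
direct = np.sum(np.exp(xi * x) * binom.pmf(x, n, p))  # E[e^{xi X}] by summation
closed = ((1 - p) + p * np.exp(xi)) ** n              # the reparameterized closed form
print(direct, closed)                                 # agree to floating point
```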

18.1.2.8.1.5 Poisson distribution

count = number of events

the first unbounded discrete distribution we encounter

\[ \begin{array}{c} X\sim\mathcal{P}\left(\lambda\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|\lambda\right)=\dfrac{\mathrm{e}^{-\lambda}\lambda^{x}}{x!}\\ x\in X\left(\Omega\right)=\left\{ 0,1,\cdots\right\} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \lambda\\ \mathrm{V}\left[X\right]= & \lambda=\mathrm{E}\left[X\right]\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \exp\left[\lambda\left(\mathrm{e}^{\xi}-1\right)\right]=\mathrm{e}^{\lambda\left(\mathrm{e}^{\xi}-1\right)} \end{cases} \end{array} \]

the Poisson postulates

\[ \begin{array}{c} \begin{cases} N_{{\scriptscriptstyle t}} & \text{a r.v. denoting the number of events in }\left[0,t\right]\\ N_{{\scriptscriptstyle 0}}=N_{{\scriptscriptstyle t=0}}=0 & \text{reset the count at the initial point}\\ \forall s<t\left[N_{{\scriptscriptstyle s}}\perp N_{{\scriptscriptstyle t}}-N_{{\scriptscriptstyle s}}\right] & \text{disjoint intervals independent}\\ N_{{\scriptscriptstyle t+s}}-N_{{\scriptscriptstyle t}}\overset{\mathrm{d}}{=}N_{{\scriptscriptstyle s}} & \text{depends on length instead of initial point}\\ \lim\limits _{t\rightarrow0}\dfrac{\mathrm{P}\left(N_{{\scriptscriptstyle t}}=1\right)}{t}=\lambda & \Rightarrow\forall t\approx0\left[\mathrm{P}\left(N_{{\scriptscriptstyle t}}=1\right)\approx\lambda t\right]\\ \lim\limits _{t\rightarrow0}\dfrac{\mathrm{P}\left(N_{{\scriptscriptstyle t}}>1\right)}{t}=0 & \text{no coincidence for small }t \end{cases}\\ \text{solve the differential equations }\Downarrow\text{ with probability axioms}\\ f_{{\scriptscriptstyle X}}\left(x|\lambda t\right)=\mathrm{P}\left(N_{{\scriptscriptstyle t}}=x\right)=\dfrac{\mathrm{e}^{-\lambda t}\left(\lambda t\right)^{x}}{x!} \end{array} \]
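A simulation sketch of the postulates (hypothetical, assuming `numpy`/`scipy`): chop \(\left[0,t\right]\) into many tiny slots so each slot holds one event with probability \(\approx\lambda t/\text{slots}\) and essentially never two; the total count then behaves like \(\mathcal{P}\left(\lambda t\right)\).

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(2)
lam, t, slots, reps = 3.0, 2.0, 10_000, 100_000
counts = rng.binomial(slots, lam * t / slots, size=reps)  # N_t built from tiny Bernoulli slots

print(counts.mean(), lam * t)                         # ~ 6.0 = lambda * t
print((counts == 4).mean(), poisson.pmf(4, lam * t))  # empirical vs Poisson pmf, ~ 0.134
```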

18.1.2.8.1.6 negative binomial distribution

also an unbounded discrete distribution

Count the number of independent and identical Bernoulli trials until a fixed number \(r\) of successes.

\[ \begin{array}{c} X\sim\mathcal{NB}\left(r,p\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|r,p\right)=\dbinom{x-1}{r-1}\left(1-p\right)^{x-r}p^{r}\\ x\in X\left(\Omega\right)=\left\{ r,r+1,\cdots\right\} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \dfrac{r}{p}\\ \mathrm{V}\left[X\right]= & r\dfrac{1-p}{p^{2}}\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \left[\dfrac{p\mathrm{e}^{\xi}}{1-\left(1-p\right)\mathrm{e}^{\xi}}\right]^{r} \end{cases} \end{array} \]

\[ \begin{array}{c} Y=X-r,X\sim\mathcal{NB}\left(r,p\right)\Rightarrow X=Y+r\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle Y}}\left(y|r,p\right)=\dbinom{r+y-1}{r-1}\left(1-p\right)^{y}p^{r}\\ y\in Y\left(\Omega\right)=\left\{ 0,1,\cdots\right\} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[Y\right]= & r\dfrac{1-p}{p}=r\left(\dfrac{1}{p}-1\right)\\ \mathrm{V}\left[Y\right]= & r\dfrac{1-p}{p^{2}}=r\left(\dfrac{1}{p^{2}}-\dfrac{1}{p}\right)\\ M_{{\scriptscriptstyle Y}}\left(\xi\right)= & \left[\dfrac{p}{1-\left(1-p\right)\mathrm{e}^{\xi}}\right]^{r} \end{cases} \end{array} \]

Note:

\[ \begin{aligned} Y=X-r\Rightarrow f_{{\scriptscriptstyle Y}}\left(y|r,p\right)= & \mathrm{P}\left(Y=y\right)\\ = & \mathrm{P}\left(X-r=y\right)\\ = & \mathrm{P}\left(X=r+y\right) \end{aligned} \]

reparameterization technique

\[ \begin{aligned} M_{{\scriptscriptstyle Y}}\left(\xi\right)= & \mathrm{E}\left[\mathrm{e}^{\xi Y}\right]=\sum_{y\in Y\left(\Omega\right)}\mathrm{e}^{\xi y}f_{{\scriptscriptstyle Y}}\left(y|r,p\right)\\ = & \sum_{y=0}^{\infty}\mathrm{e}^{\xi y}\dbinom{r+y-1}{r-1}\left(1-p\right)^{y}p^{r}\\ = & p^{r}\sum_{y=0}^{\infty}\dbinom{r+y-1}{r-1}\left[\left(1-p\right)\mathrm{e}^{\xi}\right]^{y}\\ = & p^{r}\sum_{y=0}^{\infty}\dbinom{r+y-1}{r-1}\left[1-p^{*}\right]^{y}\left[p^{*}\right]^{r}\dfrac{1}{\left[p^{*}\right]^{r}},1-p^{*}=\left(1-p\right)\mathrm{e}^{\xi}\\ = & \left[\dfrac{p}{p^{*}}\right]^{r}\sum_{y=0}^{\infty}f_{{\scriptscriptstyle Y}}\left(y|r,p^{*}\right),Y\sim\mathcal{NB}\left(r,p^{*}\right),p^{*}=1-\left(1-p\right)\mathrm{e}^{\xi}\\ = & \left[\dfrac{p}{p^{*}}\right]^{r}\cdot1=\left[\dfrac{p}{p^{*}}\right]^{r}=\left[\dfrac{p}{1-\left(1-p\right)\mathrm{e}^{\xi}}\right]^{r} \end{aligned} \]

Note:

The reparameterization requires \(\xi\) sufficiently small such that

\[ 0\le1-p^{*}=\left(1-p\right)\mathrm{e}^{\xi}<1\Leftrightarrow\xi<-\ln\left(1-p\right) \]

otherwise, for \(\xi\ge-\ln\left(1-p\right)\),

\[ 1-p^{*}=\left(1-p\right)\mathrm{e}^{\xi}\ge1 \]

and the series diverges, so \(M_{{\scriptscriptstyle Y}}\left(\xi\right)\) does not exist there.
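A numerical sketch of \(M_{{\scriptscriptstyle Y}}\left(\xi\right)\) inside its domain (hypothetical snippet, assuming `numpy`, whose `negative_binomial` counts failures, i.e. samples \(Y=X-r\)):

```python
import numpy as np

rng = np.random.default_rng(3)
r, p = 4, 0.4
y = rng.negative_binomial(r, p, size=1_000_000)   # Y = X - r, failures before r-th success

for xi in (0.1, 0.2):                             # domain: xi < -ln(1-p) ~ 0.5108
    closed = (p / (1 - (1 - p) * np.exp(xi))) ** r
    print(xi, np.mean(np.exp(xi * y)), closed)    # sample average vs closed form
# for xi >= -ln(1-p) the series diverges and the sample average fails to stabilize
```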

18.1.2.8.1.7 geometric distribution

\[ \begin{array}{c} X\sim\mathcal{NB}\left(r,p\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|r,p\right)=\dbinom{x-1}{r-1}\left(1-p\right)^{x-r}p^{r}\\ x\in X\left(\Omega\right)=\left\{ r,r+1,\cdots\right\} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \dfrac{r}{p}\\ \mathrm{V}\left[X\right]= & r\dfrac{1-p}{p^{2}}\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \left[\dfrac{p\mathrm{e}^{\xi}}{1-\left(1-p\right)\mathrm{e}^{\xi}}\right]^{r} \end{cases} \end{array} \]

\(r=1\),

\[ \begin{array}{c} X\sim\mathcal{NB}\left(r=1,p\right)=\mathcal{G}\left(p\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|r=1,p\right)=\left[\dbinom{x-1}{r-1}\left(1-p\right)^{x-r}p^{r}\right]_{{\scriptscriptstyle r=1}}=\dbinom{x-1}{1-1}\left(1-p\right)^{x-1}p^{1}=\left(1-p\right)^{x-1}p\\ x\in X\left(\Omega\right)=\left\{ r,r+1,\cdots\right\} _{{\scriptscriptstyle r=1}}=\left\{ 1,2,\cdots\right\} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \dfrac{1}{p}\\ \mathrm{V}\left[X\right]= & \dfrac{1-p}{p^{2}}\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \dfrac{p\mathrm{e}^{\xi}}{1-\left(1-p\right)\mathrm{e}^{\xi}} \end{cases} \end{array} \]

the only “memoryless” discrete distribution; there is also exactly one “memoryless” continuous distribution, the exponential distribution.

memoryless property

\[ \forall s>t\left[\mathrm{P}\left(X>s|X>t\right)=\mathrm{P}\left(X>s-t\right)\right] \]

Survival depends on the elapsed length instead of the initial point; this may be a proper assumption for the lifetime of manufactured items, but may not be proper for human survival.
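A simulation sketch of the memoryless property (hypothetical snippet, assuming `numpy`; `s`, `t` illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
p, s, t = 0.25, 9, 4
x = rng.geometric(p, size=1_000_000)     # support {1, 2, ...}: trials until first success

cond  = (x > s).sum() / (x > t).sum()    # P(X > s | X > t)
plain = (x > s - t).mean()               # P(X > s - t)
print(cond, plain, (1 - p) ** (s - t))   # all ~ (1-p)^(s-t)
```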

18.1.2.8.2 continuous distribution
18.1.2.8.2.1 uniform distribution = continuous uniform distribution

\[ \begin{array}{c} X\sim\mathcal{U}\left(a,b\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|a,b\right)=\dfrac{1\left(a\le x\le b\right)}{b-a}=\dfrac{1\left(x\in\left[a,b\right]\right)}{b-a}\\ x\in X\left(\Omega\right)=\left[a,b\right] \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \dfrac{a+b}{2}\\ \mathrm{V}\left[X\right]= & \dfrac{\left(b-a\right)^{2}}{12}\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \dfrac{\mathrm{e}^{b\xi}-\mathrm{e}^{a\xi}}{\left(b-a\right)\xi} \end{cases} \end{array} \]

\(a=0,b=1\)

\[ \begin{array}{c} X\sim\mathcal{U}\left(a=0,b=1\right)=\mathcal{U}\left(0,1\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|a=0,b=1\right)=\left[\dfrac{1\left(a\le x\le b\right)}{b-a}\right]_{{\scriptscriptstyle a=0,b=1}}=1\left(0\le x\le1\right)=1\left(x\in\left[0,1\right]\right)\\ x\in X\left(\Omega\right)=\left[a,b\right]|_{{\scriptscriptstyle a=0,b=1}}=\left[0,1\right] \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \dfrac{1}{2}\\ \mathrm{V}\left[X\right]= & \dfrac{1}{12}\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \dfrac{\mathrm{e}^{\xi}-1}{\xi} \end{cases} \end{array} \]

18.1.2.8.2.2 gamma distribution
18.1.2.8.2.3 exponential distribution
18.1.2.8.2.4 Chi-square distribution
18.1.2.8.2.5 Weibull distribution

https://www.youtube.com/watch?v=ojZb6nZWdvI

18.1.2.8.2.6 normal distribution

\[ \begin{array}{c} X\sim\mathcal{N}\left(\mu,\sigma^{2}\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|\mu,\sigma^{2}\right)=\dfrac{1}{\sigma\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}=\frac{\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}}{\sigma\sqrt{2\pi}}=\dfrac{\mathrm{e}^{-\left(x-\mu\right)^{2}/\left(2\sigma^{2}\right)}}{\sigma\sqrt{2\pi}}=\dfrac{\mathrm{exp}\left\{ \frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right\} }{\sigma\sqrt{2\pi}}\\ x\in X\left(\Omega\right)=\mathbb{R} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \mu\\ \mathrm{V}\left[X\right]= & \sigma^{2}\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \mathrm{e}^{\left(\mu\xi+\frac{\sigma^{2}}{2}\xi^{2}\right)}=\mathrm{e}^{\mu\xi+\frac{\sigma^{2}}{2}\xi^{2}} \end{cases} \end{array} \]

18.1.2.8.2.7 beta distribution

random success probability

18.1.2.8.2.8 Cauchy distribution

https://www.youtube.com/watch?v=ojZb6nZWdvI&t=33m34s

https://en.wikipedia.org/wiki/Cauchy_distribution

\[ \begin{array}{c} X\sim\mathcal{C}\left(\theta,\sigma\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|\theta,\sigma\right)=\dfrac{1}{\pi\sigma}\left[1+\left(\dfrac{x-\theta}{\sigma}\right)^{2}\right]^{-1}=\dfrac{1}{\pi\sigma\left[1+\left(\dfrac{x-\theta}{\sigma}\right)^{2}\right]}=\dfrac{1}{\pi}\left[\dfrac{\sigma}{\left(x-\theta\right)^{2}+\sigma^{2}}\right]\\ x\in X\left(\Omega\right)=\mathbb{R} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right] & \text{diverges}\\ \mathrm{V}\left[X\right] & \text{diverges}\\ M_{{\scriptscriptstyle X}}\left(\xi\right) & \text{diverges} \end{cases} \end{array} \]
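The diverging mean is visible in simulation (hypothetical sketch, assuming `numpy`): the running mean of Cauchy samples keeps jumping with each extreme draw, while the normal running mean settles at \(0\).

```python
import numpy as np

rng = np.random.default_rng(5)
n = np.arange(1, 100_001)
cauchy_mean = np.cumsum(rng.standard_cauchy(100_000)) / n
normal_mean = np.cumsum(rng.standard_normal(100_000)) / n

print(cauchy_mean[[99, 9_999, 99_999]])  # wanders; no law of large numbers here
print(normal_mean[[99, 9_999, 99_999]])  # ~ 0, shrinking like 1/sqrt(n)
```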


Fig. 18.1: not completed: Cauchy distribution vs. normal distribution

heavy tail

the \(t\) distribution is also heavy-tailed

box plot

18.1.2.8.2.9 log-normal distribution

18.1.2.9 exponential family

Definition 18.18 exponential family: A family of PDFs/PMFs is called an exponential family if

\[ f\left(x|\boldsymbol{\theta}\right)=h\left(x\right)c\left(\boldsymbol{\theta}\right)\mathrm{e}^{{\scriptscriptstyle \sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)}}=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right) \]

with \(\boldsymbol{\theta}=\boldsymbol{\theta}\left(\theta_{{\scriptscriptstyle 1}},\cdots,\theta_{{\scriptscriptstyle k}}\right)=\left(\theta_{{\scriptscriptstyle 1}},\cdots,\theta_{{\scriptscriptstyle k}}\right)\) for some \(h\left(x\right),c\left(\boldsymbol{\theta}\right),w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right),t_{{\scriptscriptstyle j}}\left(x\right)\), where

\[ h\left(x\right)c\left(\boldsymbol{\theta}\right)\ge0\Rightarrow f\left(x|\theta\right)\ge0 \]

so that the parameters \(\boldsymbol{\theta}\) and the statistic or real number \(x\) interact only through the products \(w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\) in the exponent.

\[ \mathcal{E}^{f}=\left\{ f\middle|f=f\left(x|\boldsymbol{\theta}\right)=h\left(x\right)c\left(\boldsymbol{\theta}\right)\mathrm{e}^{{\scriptscriptstyle \sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)}}=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \]

18.1.2.9.1 binomial distribution with known \(n\)

\[ f\left(x|p\right)=\dbinom{n}{x}\left(1-p\right)^{n-x}p^{x} \]

or

\[ f\left(x|p\right)=\dbinom{n}{x}\left(1-p\right)^{n-x}p^{x}=f\left(x|n=n,p\right) \]


not

\[ f\left(x|n,p\right)=\dbinom{n}{x}\left(1-p\right)^{n-x}p^{x} \]


\[ \begin{aligned} f\left(x|p\right)= & \dbinom{n}{x}\left(1-p\right)^{n-x}p^{x}\\ = & \dbinom{n}{x}\left(1-p\right)^{n}\left(\dfrac{p}{1-p}\right)^{x}\\ = & \dbinom{n}{x}\left(1-p\right)^{n}\mathrm{e}^{\left(\ln\frac{p}{1-p}\right)x}\\ = & h\left(x\right)c\left(\boldsymbol{\theta}\right)\mathrm{e}^{{\scriptscriptstyle \sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)}},\begin{cases} h\left(x\right)= & \dbinom{n}{x}\\ c\left(\boldsymbol{\theta}\right)=c\left(\theta_{{\scriptscriptstyle 1}}\right)=c\left(p\right)= & \left(1-p\right)^{n}\\ w_{{\scriptscriptstyle 1}}\left(\boldsymbol{\theta}\right)=w\left(\theta_{{\scriptscriptstyle 1}}\right)=w\left(p\right)= & \ln\frac{p}{1-p}\\ t_{{\scriptscriptstyle 1}}\left(x\right)= & x\\ k= & 1 \end{cases}\\ \Downarrow\\ f\left(x|p\right)=\dbinom{n}{x}\left(1-p\right)^{n-x}p^{x}\in & \mathcal{E}^{f}=\left\{ f\middle|f=f\left(x|\boldsymbol{\theta}\right)=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \end{aligned} \]
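A numerical check of this factorization (hypothetical snippet, assuming `numpy`/`scipy`): rebuilding the pmf from \(h\left(x\right)\), \(c\left(p\right)\), \(w\left(p\right)\), \(t\left(x\right)\) reproduces \(\mathrm{b}\left(n,p\right)\) exactly.

```python
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, p = 8, 0.35
x = np.arange(0, n + 1)
# h(x) * c(p) * exp(w(p) * t(x)) with h = C(n, x), c = (1-p)^n, w = ln(p/(1-p)), t = x
rebuilt = comb(n, x) * (1 - p) ** n * np.exp(np.log(p / (1 - p)) * x)
print(np.allclose(rebuilt, binom.pmf(x, n, p)))   # True
```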

why known \(n\)?

known \(n\)

\[ \dbinom{n}{x}=h\left(x\right) \]

unknown \(n\)

\[ \dbinom{n}{x}=h\left(x,n\right)\ne h_{{\scriptscriptstyle 1}}\left(n\right)h_{{\scriptscriptstyle 2}}\left(x\right) \]

18.1.2.9.2 continuous uniform distribution not in exponential family

\[ X\sim\mathcal{U}\left(a,b\right) \]

\[ \begin{aligned} f_{{\scriptscriptstyle X}}\left(x|a,b\right)= & \dfrac{1\left(x\in\left[a,b\right]\right)}{b-a}\\ = & \dfrac{1}{b-a},x\in\left[a,b\right] \end{aligned} \]

\[ \dfrac{1}{b-a}=h\left(x\right)c\left(\boldsymbol{\theta}\right)\mathrm{e}^{{\scriptscriptstyle \sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)}},\begin{cases} h\left(x\right)= & 1\\ c\left(\boldsymbol{\theta}\right)=c\left(\theta_{{\scriptscriptstyle 1}},\theta_{{\scriptscriptstyle 2}}\right)=c\left(a,b\right)= & \dfrac{1}{b-a}\\ w_{{\scriptscriptstyle 1}}\left(\theta_{{\scriptscriptstyle 1}}\right)=w_{{\scriptscriptstyle 2}}\left(\theta_{{\scriptscriptstyle 2}}\right)= & 0\\ t_{{\scriptscriptstyle 1}}\left(x\right)=t_{{\scriptscriptstyle 2}}\left(x\right)= & x\\ k= & 2 \end{cases} \]

however, the indicator couples \(x\) with the parameters \(\boldsymbol{\theta}=\left(a,b\right)\) and cannot be factored:

\[ 1\left(x\in\left[a,b\right]\right)\ne h\left(x\right)c\left(\boldsymbol{\theta}\right)=h\left(x\right)c\left(a,b\right) \]

thus

\[ f_{{\scriptscriptstyle X}}\left(x|a,b\right)=\dfrac{1\left(x\in\left[a,b\right]\right)}{b-a}\notin\mathcal{E}^{f}=\left\{ f\middle|f=f\left(x|\boldsymbol{\theta}\right)=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \]

18.1.2.9.3 normal distribution is in exponential family

\[ \begin{array}{c} X\sim\mathcal{N}\left(\mu,\sigma^{2}\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|\mu,\sigma^{2}\right)=\dfrac{1}{\sigma\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}=\frac{\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}}{\sigma\sqrt{2\pi}}=\dfrac{\mathrm{e}^{-\left(x-\mu\right)^{2}/\left(2\sigma^{2}\right)}}{\sigma\sqrt{2\pi}}=\dfrac{\mathrm{exp}\left\{ \frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}\right\} }{\sigma\sqrt{2\pi}}\\ x\in X\left(\Omega\right)=\mathbb{R} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \mu\\ \mathrm{V}\left[X\right]= & \sigma^{2}\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \mathrm{e}^{\left(\mu\xi+\frac{\sigma^{2}}{2}\xi^{2}\right)}=\mathrm{e}^{\mu\xi+\frac{\sigma^{2}}{2}\xi^{2}} \end{cases} \end{array} \]


\[ \begin{aligned} f_{{\scriptscriptstyle X}}\left(x|\mu,\sigma^{2}\right)=f_{{\scriptscriptstyle X}}\left(x|\mu,\sigma\right)= & \dfrac{1}{\sigma\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\\ = & \dfrac{1}{\sigma\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2\sigma^{2}}\left(x^{2}-2x\mu+\mu^{2}\right)}=\dfrac{\mathrm{e}^{\frac{-1}{2}\left(\frac{\mu}{\sigma}\right)^{2}}}{\sigma\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2\sigma^{2}}x^{2}+\frac{\mu}{\sigma^{2}}x}=\dfrac{\mathrm{e}^{\frac{-1}{2}\left(\frac{\mu}{\sigma}\right)^{2}}}{\sigma\sqrt{2\pi}}\mathrm{e}^{\frac{\mu}{\sigma^{2}}x-\frac{1}{2\sigma^{2}}x^{2}}\\ = & h\left(x\right)c\left(\boldsymbol{\theta}\right)\mathrm{e}^{{\scriptscriptstyle \sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)}},\begin{cases} h\left(x\right)= & 1\\ c\left(\boldsymbol{\theta}\right)=c\left(\theta_{{\scriptscriptstyle 1}},\theta_{{\scriptscriptstyle 2}}\right)=c\left(\mu,\sigma\right)= & \dfrac{\mathrm{e}^{\frac{-1}{2}\left(\frac{\mu}{\sigma}\right)^{2}}}{\sigma\sqrt{2\pi}}\\ w_{{\scriptscriptstyle 1}}\left(\theta_{{\scriptscriptstyle 1}},\theta_{{\scriptscriptstyle 2}}\right)=w_{{\scriptscriptstyle 1}}\left(\mu,\sigma\right)= & \frac{\mu}{\sigma^{2}}\\ w_{{\scriptscriptstyle 2}}\left(\theta_{{\scriptscriptstyle 2}}\right)=w_{{\scriptscriptstyle 2}}\left(\sigma\right)= & \frac{-1}{2\sigma^{2}}\\ t_{{\scriptscriptstyle 1}}\left(x\right)= & x\\ t_{{\scriptscriptstyle 2}}\left(x\right)= & x^{2}\\ k= & 2 \end{cases}\\ \Downarrow\\ f_{{\scriptscriptstyle X}}\left(x|\mu,\sigma\right)=\dfrac{1}{\sigma\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\in & \mathcal{E}^{f}=\left\{ f\middle|f=f\left(x|\boldsymbol{\theta}\right)=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \end{aligned} \]

18.1.2.9.4 normal distribution with unknown equal mean and standard deviation in curved exponential family

\[ \begin{array}{c} X\sim\mathcal{N}\left(\mu,\sigma^{2}=\mu^{2}\right)=\mathcal{N}\left(\mu,\mu^{2}\right)\\ \Updownarrow\\ \begin{cases} f_{{\scriptscriptstyle X}}\left(x|\mu,\mu^{2}\right)=f_{{\scriptscriptstyle X}}\left(x|\mu\right)=\left[\dfrac{1}{\sigma\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\right]_{\sigma=\mu}=\dfrac{1}{\mu\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\mu}\right)^{2}}\\ x\in X\left(\Omega\right)=\mathbb{R} \end{cases}\\ \Downarrow\\ \begin{cases} \mathrm{E}\left[X\right]= & \mu\\ \mathrm{V}\left[X\right]= & \mu^{2}\\ M_{{\scriptscriptstyle X}}\left(\xi\right)= & \mathrm{e}^{\left(\mu\xi+\frac{\mu^{2}}{2}\xi^{2}\right)}=\mathrm{e}^{\mu\xi+\frac{\mu^{2}}{2}\xi^{2}} \end{cases} \end{array} \]


\[ \begin{aligned} f_{{\scriptscriptstyle X}}\left(x|\mu,\mu^{2}\right)=f_{{\scriptscriptstyle X}}\left(x|\mu\right)= & \dfrac{1}{\mu\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\mu}\right)^{2}}\\ = & \dfrac{1}{\mu\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2\mu^{2}}\left(x^{2}-2x\mu+\mu^{2}\right)}=\dfrac{\mathrm{e}^{\frac{-1}{2}}}{\mu\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2\mu^{2}}x^{2}+\frac{\mu}{\mu^{2}}x}=\dfrac{\mathrm{e}^{\frac{-1}{2}}}{\mu\sqrt{2\pi}}\mathrm{e}^{\frac{1}{\mu}x-\frac{1}{2\mu^{2}}x^{2}}\\ = & h\left(x\right)c\left(\boldsymbol{\theta}\right)\mathrm{e}^{{\scriptscriptstyle \sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)}},\begin{cases} h\left(x\right)= & 1\\ c\left(\boldsymbol{\theta}\right)=c\left(\theta_{{\scriptscriptstyle 1}}\right)=c\left(\mu\right)= & \dfrac{\mathrm{e}^{\frac{-1}{2}}}{\mu\sqrt{2\pi}}\\ w_{{\scriptscriptstyle 1}}\left(\theta_{{\scriptscriptstyle 1}}\right)=w_{{\scriptscriptstyle 1}}\left(\mu\right)= & \frac{1}{\mu}\\ w_{{\scriptscriptstyle 2}}\left(\theta_{{\scriptscriptstyle 1}}\right)=w_{{\scriptscriptstyle 2}}\left(\mu\right)= & \frac{-1}{2\mu^{2}}\\ t_{{\scriptscriptstyle 1}}\left(x\right)= & x\\ t_{{\scriptscriptstyle 2}}\left(x\right)= & x^{2}\\ k= & 2>1=p \end{cases}\\ \Downarrow & \begin{cases} p=\dim\boldsymbol{\theta}\\ k=\dim\boldsymbol{w} \end{cases}\\ f_{{\scriptscriptstyle X}}\left(x|\mu\right)=\dfrac{1}{\mu\sqrt{2\pi}}\mathrm{e}^{\frac{-1}{2}\left(\frac{x-\mu}{\mu}\right)^{2}}\in & \mathcal{C}^{f}=\left\{ f\middle|f=f\left(x|\boldsymbol{\theta}\right)=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k>p}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \\ \subset & \mathcal{E}^{f}=\left\{ f\middle|f=f\left(x|\boldsymbol{\theta}\right)=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \end{aligned} \]


curved exponential family

\[ \mathcal{C}^{f}=\left\{ f\middle|f=f\left(x|\boldsymbol{\theta}\right)=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k>p}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \]


exponential family

\[ \mathcal{E}^{f}=\left\{ f\middle|f=f\left(x|\boldsymbol{\theta}\right)=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \]


https://tex.stackexchange.com/questions/145969/filling-specified-area-by-random-dots-in-tikz


Fig. 18.2: curved exponential family vs. exponential family

18.1.2.9.5 properties of exponential family
18.1.2.9.5.1 fundamentals of statistical inference

Lemma 18.1 Leibniz rule

\[ \begin{aligned} \dfrac{\mathrm{d}}{\mathrm{d}\theta}\int_{a\left(\theta\right)}^{b\left(\theta\right)}g\left(x,\theta\right)\mathrm{d}x= & \dfrac{\mathrm{d}\int_{a\left(\theta\right)}^{b\left(\theta\right)}g\left(x,\theta\right)\mathrm{d}x}{\mathrm{d}\theta}\\ = & g\left(b\left(\theta\right),\theta\right)\dfrac{\mathrm{d}b\left(\theta\right)}{\mathrm{d}\theta}-g\left(a\left(\theta\right),\theta\right)\dfrac{\mathrm{d}a\left(\theta\right)}{\mathrm{d}\theta}+\int_{a\left(\theta\right)}^{b\left(\theta\right)}\dfrac{\partial g\left(x,\theta\right)}{\partial\theta}\mathrm{d}x \end{aligned} \]

\[ \dfrac{\mathrm{d}}{\mathrm{d}\theta}\int_{a\left(\theta\right)}^{b\left(\theta\right)}g\left(x,\theta\right)\mathrm{d}x=g\left(b\left(\theta\right),\theta\right)\dfrac{\mathrm{d}b\left(\theta\right)}{\mathrm{d}\theta}-g\left(a\left(\theta\right),\theta\right)\dfrac{\mathrm{d}a\left(\theta\right)}{\mathrm{d}\theta}+\int_{a\left(\theta\right)}^{b\left(\theta\right)}\dfrac{\partial g\left(x,\theta\right)}{\partial\theta}\mathrm{d}x \]


\[ \begin{aligned} \dfrac{\mathrm{d}}{\mathrm{d}\theta}\int_{a\left(\theta\right)=a}^{b\left(\theta\right)=b}g\left(x,\theta\right)\mathrm{d}x= & g\left(x,b\left(\theta\right)\right)\dfrac{\mathrm{d}b\left(\theta\right)}{\mathrm{d}\theta}-g\left(x,a\left(\theta\right)\right)\dfrac{\mathrm{d}a\left(\theta\right)}{\mathrm{d}\theta}+\int_{a\left(\theta\right)}^{b\left(\theta\right)}\dfrac{\mathrm{d}g\left(x,\theta\right)}{\mathrm{d}\theta}\mathrm{d}x\\ = & g\left(x,b\right)\dfrac{\mathrm{d}b}{\mathrm{d}\theta}-g\left(x,a\right)\dfrac{\mathrm{d}a}{\mathrm{d}\theta}+\int_{a}^{b}\dfrac{\mathrm{d}g\left(x,\theta\right)}{\mathrm{d}\theta}\mathrm{d}x\\ = & g\left(x,b\right)0-g\left(x,a\right)0+\int_{a}^{b}\dfrac{\mathrm{d}g\left(x,\theta\right)}{\mathrm{d}\theta}\mathrm{d}x=0+0+\int_{a}^{b}\dfrac{\mathrm{d}g\left(x,\theta\right)}{\mathrm{d}\theta}\mathrm{d}x\\ = & \int_{a}^{b}\dfrac{\mathrm{d}g\left(x,\theta\right)}{\mathrm{d}\theta}\mathrm{d}x\\ \dfrac{\mathrm{d}}{\mathrm{d}\theta}\int_{X\left(\Omega\right)\perp\theta}g\left(x,\theta\right)\mathrm{d}x= & \dfrac{\mathrm{d}}{\mathrm{d}\theta}\int_{a\left(\theta\right)=a}^{b\left(\theta\right)=b}g\left(x,\theta\right)\mathrm{d}x=\int_{a}^{b}\dfrac{\mathrm{d}g\left(x,\theta\right)}{\mathrm{d}\theta}\mathrm{d}x=\int_{X\left(\Omega\right)\perp\theta}\dfrac{\mathrm{d}g\left(x,\theta\right)}{\mathrm{d}\theta}\mathrm{d}x\\ \dfrac{\mathrm{d}}{\mathrm{d}\theta}\int_{X\left(\Omega\right)\perp\theta}g\left(x,\theta\right)\mathrm{d}x= & \int_{X\left(\Omega\right)\perp\theta}\dfrac{\mathrm{d}g\left(x,\theta\right)}{\mathrm{d}\theta}\mathrm{d}x \end{aligned} \]

\[ \dfrac{\mathrm{d}}{\mathrm{d}\theta}\int_{X\left(\Omega\right)\perp\theta}g\left(x,\theta\right)\mathrm{d}x=\int_{X\left(\Omega\right)\perp\theta}\dfrac{\partial g\left(x,\theta\right)}{\partial\theta}\mathrm{d}x \]
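
A numerical sanity check of differentiating under the integral sign when the limits do not depend on \(\theta\) (a sketch assuming Python with scipy; the integrand \(g\left(x,\theta\right)=\mathrm{e}^{-\theta x^{2}}\) on \(\left[0,1\right]\) is a hypothetical choice):

```python
import numpy as np
from scipy.integrate import quad

# g(x, theta) and its theta-derivative; the limits [0, 1] are theta-free.
g = lambda x, th: np.exp(-th * x**2)
dg_dth = lambda x, th: -x**2 * np.exp(-th * x**2)

th, eps = 1.5, 1e-6
integral = lambda th: quad(g, 0.0, 1.0, args=(th,))[0]

lhs = (integral(th + eps) - integral(th - eps)) / (2 * eps)  # d/dtheta of the integral
rhs = quad(dg_dth, 0.0, 1.0, args=(th,))[0]                  # integral of d/dtheta
print(lhs, rhs)
assert abs(lhs - rhs) < 1e-6  # the two orders of operation agree
```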


point estimation

Lemma 18.2 parameter-independent expectation:

Assume the domain of \(X\) is independent of \(\theta\); then

\[ \mathrm{E}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=0,\forall i=1,\cdots,p \]

Proof:

\[ \begin{aligned} \mathrm{E}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]= & \int\dfrac{\partial\ln f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x\\ = & \int\left[\dfrac{1}{f\left(x|\boldsymbol{\theta}\right)}\dfrac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x\\ = & \int\dfrac{f\left(x|\boldsymbol{\theta}\right)}{f\left(x|\boldsymbol{\theta}\right)}\dfrac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\mathrm{d}x=\int\dfrac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\mathrm{d}x\\ \overset{\begin{cases} \text{Leibniz rule}\\ X\left(\Omega\right)\perp\boldsymbol{\theta} \end{cases}}{=} & \dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\int f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x=\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}1=0 \end{aligned} \]

\[ \tag*{$\Box$} \]
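
The lemma can be checked numerically for one family (a sketch assuming scipy; Exponential(\(\lambda\)), whose log-density is \(\ln\lambda-\lambda x\) with score \(1/\lambda-x\), is chosen for illustration):

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * np.exp(-lam * x)  # Exponential(lam) density
score = lambda x: 1.0 / lam - x       # d/d lam of ln f(x|lam)

expected_score = quad(lambda x: score(x) * f(x), 0.0, np.inf)[0]
print(expected_score)                 # ~0 up to quadrature error
assert abs(expected_score) < 1e-7
```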

Note (the score may also be evaluated at a trial value \(\boldsymbol{\theta}^{*}\) while \(X\sim f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)\)):

\[ \begin{aligned} X\sim f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)\Rightarrow\mathrm{E}\left[\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=\int\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)\mathrm{d}x= & \int\left(\frac{\frac{\partial f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}}{f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)}\right)f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)\mathrm{d}x\\ = & \int\dfrac{\partial f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\mathrm{d}x\\ X\sim f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)\Rightarrow\mathrm{E}\left[\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(X|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=\int\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)\mathrm{d}x= & \int\left(\frac{\frac{\partial f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}}{f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}^{*}\right)}\right)f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)\mathrm{d}x\\ = & \int\dfrac{\partial f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\left(\dfrac{f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)}{f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}^{*}\right)}\right)\mathrm{d}x \end{aligned} \]


\[ \mathrm{E}\left[\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(X|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=\mathrm{E}_{{\scriptscriptstyle \boldsymbol{\theta}^{*}}}\left[\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(X|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=\int\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}^{*}\right)\mathrm{d}x \]

\[ \mathrm{E}\left[\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(X|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=\mathrm{E}_{{\scriptscriptstyle \boldsymbol{\theta}}}\left[\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(X|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=\int\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)\mathrm{d}x \]

\[ \mathrm{E}_{{\scriptscriptstyle \boldsymbol{\theta}}}\left[\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(X|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=\int\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}f_{{\scriptscriptstyle X}}\left(x|\boldsymbol{\theta}\right)\mathrm{d}x \]

\[ \mathrm{E}_{{\scriptscriptstyle \boldsymbol{\theta}}}\left[\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(X|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]\not\equiv0 \]

\[ \mathrm{E}_{{\scriptscriptstyle \boldsymbol{\theta}}}\left[\dfrac{\partial\ln f_{{\scriptscriptstyle X}}\left(X|\boldsymbol{\theta}^{*}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]\overset{\boldsymbol{\theta}^{*}=\boldsymbol{\theta}}{=}0 \]

The fact that the expected score vanishes exactly at the true parameter \(\boldsymbol{\theta}^{*}=\boldsymbol{\theta}\) is the foundation for estimating parameters.
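
Continuing the Exponential(\(\lambda\)) sketch (scipy assumed): the expected score evaluated at a trial value \(\lambda^{*}\) is \(1/\lambda^{*}-1/\lambda\), nonzero except at the true parameter:

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0                             # true parameter
f = lambda x: lam * np.exp(-lam * x)

def expected_score(lam_star):
    # E_lam[ d/d lam ln f(X|lam_star) ] = 1/lam_star - 1/lam
    return quad(lambda x: (1.0 / lam_star - x) * f(x), 0.0, np.inf)[0]

print(expected_score(1.0))  # 1/1 - 1/2 = 0.5: nonzero away from the truth
print(expected_score(2.0))  # ~0: vanishes exactly at lam_star = lam
```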

interval estimation

Lemma 18.3 parameter-independent variance:

Assume the domain of \(X\) is independent of \(\theta\); then

\[ \mathrm{E}\left[\dfrac{\partial^{2}\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}\right]=-\mathrm{E}\left[\left(\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right)^{2}\right],\forall i=1,\cdots,p \]

Proof:

\[ \begin{aligned} & \mathrm{E}\left[\dfrac{\partial^{2}\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}\right]=\int\dfrac{\partial^{2}\ln f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x=\int\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\dfrac{\partial\ln f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x\\ = & \int\left\{ \dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\left(\dfrac{1}{f\left(x|\boldsymbol{\theta}\right)}\dfrac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right)\right\} f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x=\int\left\{ \dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\left(\frac{\frac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}}{f\left(x|\boldsymbol{\theta}\right)}\right)\right\} f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x\\ = & \int\dfrac{\dfrac{\partial^{2}f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}f\left(x|\boldsymbol{\theta}\right)-\dfrac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\dfrac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}}{\left[f\left(x|\boldsymbol{\theta}\right)\right]^{2}}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x=\int\dfrac{\dfrac{\partial^{2}f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}\left[f\left(x|\boldsymbol{\theta}\right)\right]^{2}-\left[\dfrac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]^{2}\left[f\left(x|\boldsymbol{\theta}\right)\right]}{\left[f\left(x|\boldsymbol{\theta}\right)\right]^{2}}\mathrm{d}x\\ = & \int\dfrac{\partial^{2}f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}\mathrm{d}x-\int\left(\frac{\frac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}}{f\left(x|\boldsymbol{\theta}\right)}\right)^{2}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x\overset{\begin{cases} \text{Leibniz rule}\\ X\left(\Omega\right)\perp\boldsymbol{\theta} \end{cases}}{=}\dfrac{\partial^{2}}{\partial\theta_{{\scriptscriptstyle i}}^{2}}\int f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x-\int\left(\frac{\frac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}}{f\left(x|\boldsymbol{\theta}\right)}\right)^{2}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x\\ = & \dfrac{\partial^{2}}{\partial\theta_{{\scriptscriptstyle i}}^{2}}1-\int\left(\frac{\frac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}}{f\left(x|\boldsymbol{\theta}\right)}\right)^{2}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x=0-\int\left(\frac{\frac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}}{f\left(x|\boldsymbol{\theta}\right)}\right)^{2}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x=-\int\left(\frac{\frac{\partial f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}}{f\left(x|\boldsymbol{\theta}\right)}\right)^{2}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x\\ = & -\int\left(\dfrac{\partial\ln f\left(x|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right)^{2}f\left(x|\boldsymbol{\theta}\right)\mathrm{d}x=-\mathrm{E}\left[\left(\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right)^{2}\right] \end{aligned} \]

\[ \tag*{$\Box$} \]


\[ \begin{cases} \mathrm{E}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]= & 0\\ \mathrm{E}\left[\dfrac{\partial^{2}\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}\right]= & -\mathrm{E}\left[\left(\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right)^{2}\right] \end{cases},\forall i=1,\cdots,p \]
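
The second of these can be checked numerically as well (a sketch assuming scipy): for Exponential(\(\lambda\)), \(\partial^{2}\ln f/\partial\lambda^{2}=-1/\lambda^{2}\) while \(\mathrm{E}\left[\left(1/\lambda-X\right)^{2}\right]=\mathrm{V}\left(X\right)=1/\lambda^{2}\):

```python
import numpy as np
from scipy.integrate import quad

lam = 2.0
f = lambda x: lam * np.exp(-lam * x)

# E[ d^2 ln f / d lam^2 ]: the second derivative is the constant -1/lam^2.
lhs = quad(lambda x: (-1.0 / lam**2) * f(x), 0.0, np.inf)[0]
# -E[ (d ln f / d lam)^2 ] with score 1/lam - x.
rhs = -quad(lambda x: (1.0 / lam - x)**2 * f(x), 0.0, np.inf)[0]
print(lhs, rhs)               # both -1/lam^2 = -0.25
assert abs(lhs - rhs) < 1e-7
```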

18.1.2.9.5.1.1 exponential family expectation and variance

Theorem 18.15 exponential family expectation

\[ \begin{array}{c} X\in\mathcal{E}^{f}=\left\{ f\middle|f=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \\ \Downarrow\forall i=1,\cdots,p\\ \mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]=\dfrac{-\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}} \end{array} \]

Proof:

\[ \begin{aligned} f\left(x|\boldsymbol{\theta}\right)= & h\left(x\right)c\left(\boldsymbol{\theta}\right)\mathrm{e}^{{\scriptscriptstyle \sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)}}=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\\ \ln f\left(x|\boldsymbol{\theta}\right)= & \ln h\left(x\right)c\left(\boldsymbol{\theta}\right)\mathrm{e}^{{\scriptscriptstyle \sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)}}\\ = & \ln h\left(x\right)+\ln c\left(\boldsymbol{\theta}\right)+\ln\mathrm{e}^{{\scriptscriptstyle \sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)}}\\ = & \ln h\left(x\right)+\ln c\left(\boldsymbol{\theta}\right)+\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\\ \ln f\left(x|\boldsymbol{\theta}\right)= & \ln h\left(x\right)+\ln c\left(\boldsymbol{\theta}\right)+\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\\ \ln f\left(X|\boldsymbol{\theta}\right)= & \ln h\left(X\right)+\ln c\left(\boldsymbol{\theta}\right)+\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(X\right)\\ \dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\ln f\left(X|\boldsymbol{\theta}\right)= & \dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\left[\ln h\left(X\right)+\ln c\left(\boldsymbol{\theta}\right)+\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ = & \dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\ln h\left(X\right)+\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\ln c\left(\boldsymbol{\theta}\right)+\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(X\right)\\ = & 0+\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\ln c\left(\boldsymbol{\theta}\right)+\sum\limits _{j=1}^{k}\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\left\{ w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(X\right)\right\} \\ = & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\left\{ \left(\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)\right)t_{{\scriptscriptstyle j}}\left(X\right)+w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)\left(\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right)\right\} \\ = & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\left\{ \left(\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)\right)t_{{\scriptscriptstyle j}}\left(X\right)+w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)0\right\} \\ = & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\left\{ 
\left(\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)\right)t_{{\scriptscriptstyle j}}\left(X\right)\right\} \\ \dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}= & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\left\{ \dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right\} \\ \end{aligned} \]

\[ \begin{aligned} \dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}= & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\left\{ \dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right\} \\ 0\overset{\text{lemma: }\mathrm{E}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=0}{=}\mathrm{E}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]= & \mathrm{E}\left[\dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\left\{ \dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right\} \right]\\ = & \mathrm{E}\left[\dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]+\mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ \overset{\mathrm{E}\left[c\right]=c}{=} & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ = & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\mathrm{E}\left[\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ = & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\mathrm{E}\left[t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ 0= & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ \mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]= & \dfrac{-\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}} \end{aligned} \]

\[ \tag*{$\Box$} \]
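
A symbolic check of the theorem for one member of the family (a sketch assuming sympy; Exponential(\(\lambda\)) with \(h=1\), \(c\left(\lambda\right)=\lambda\), \(w\left(\lambda\right)=-\lambda\), \(t\left(x\right)=x\), so \(w'\left(\lambda\right)=-1\)):

```python
import sympy as sp

x, lam = sp.symbols('x lam', positive=True)
f = lam * sp.exp(-lam * x)                     # Exponential(lam) density

lhs = sp.integrate(-1 * x * f, (x, 0, sp.oo))  # E[w'(lam) t(X)] = E[-X]
rhs = -sp.diff(sp.log(lam), lam)               # -d ln c(lam) / d lam
print(lhs, rhs)                                # both -1/lam
assert sp.simplify(lhs - rhs) == 0
```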

Theorem 18.16 exponential family variance

\[ \begin{array}{c} X\in\mathcal{E}^{f}=\left\{ f\middle|f=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \\ \Downarrow\forall i=1,\cdots,p\\ \mathrm{V}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]=\dfrac{-\partial^{2}\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}-\mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial^{2}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}t_{{\scriptscriptstyle j}}\left(X\right)\right] \end{array} \]

Proof:

Expanding \(\ln f\left(X|\boldsymbol{\theta}\right)\) exactly as in the proof of Theorem 18.15 yields the same expression for the score, so

\[ \begin{aligned} \dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}= & \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\\ \mathrm{V}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]= & \mathrm{V}\left[\dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ \overset{\mathrm{V}\left[aX+b\right]=a^{2}\mathrm{V}\left[X\right]}{=} & \mathrm{V}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ \mathrm{V}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]= & \mathrm{V}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right] \end{aligned} \]

\[ \begin{aligned} \mathrm{V}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]= & \mathrm{V}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=\mathrm{E}\left[\left(\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}-\mathrm{E}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]\right)^{2}\right]\\ \overset{\text{lemma: }\mathrm{E}\left[\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]=0}{=} & \mathrm{E}\left[\left(\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}-0\right)^{2}\right]=\mathrm{E}\left[\left(\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right)^{2}\right]\\ \overset{\text{lemma: }\mathrm{E}\left[\dfrac{\partial^{2}\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}\right]=-\mathrm{E}\left[\left(\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right)^{2}\right]}{=} & -\mathrm{E}\left[\dfrac{\partial^{2}\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}\right]=-\mathrm{E}\left[\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right]\\ \overset{\dfrac{\partial\ln f\left(X|\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}=\dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)}{=} & -\mathrm{E}\left[\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\left\{ \dfrac{\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}+\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right\} \right]\\ = & -\mathrm{E}\left[\dfrac{\partial^{2}\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}+\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ = & \dfrac{-\partial^{2}\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}-\mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\left\{ \dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right\} \right]\\ = & \dfrac{-\partial^{2}\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}-\mathrm{E}\left[\sum\limits _{j=1}^{k}\left\{ \left(\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\right)t_{{\scriptscriptstyle j}}\left(X\right)+\dfrac{\partial 
w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\dfrac{\partial}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right\} \right]\\ = & \dfrac{-\partial^{2}\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}-\mathrm{E}\left[\sum\limits _{j=1}^{k}\left\{ \dfrac{\partial^{2}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}t_{{\scriptscriptstyle j}}\left(X\right)+\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}0\right\} \right]\\ = & \dfrac{-\partial^{2}\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}-\mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial^{2}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}t_{{\scriptscriptstyle j}}\left(X\right)\right]\\ \mathrm{V}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]= & \dfrac{-\partial^{2}\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}-\mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial^{2}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}t_{{\scriptscriptstyle j}}\left(X\right)\right] \end{aligned} \]

\[ \tag*{$\Box$} \]
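
And the matching symbolic check of the variance identity (sympy assumed); for Exponential(\(\lambda\)), \(w''=0\), so the right-hand side reduces to \(-\partial^{2}\ln c\left(\lambda\right)/\partial\lambda^{2}\):

```python
import sympy as sp

x, lam = sp.symbols('x lam', positive=True)
f = lam * sp.exp(-lam * x)

m1 = sp.integrate(-x * f, (x, 0, sp.oo))    # E[w' t(X)] = -1/lam
m2 = sp.integrate(x**2 * f, (x, 0, sp.oo))  # E[(w' t(X))^2] = E[X^2]
lhs = sp.simplify(m2 - m1**2)               # V[w' t(X)]
rhs = -sp.diff(sp.log(lam), lam, 2)         # -d^2 ln c/d lam^2; the w'' term is 0
print(lhs, rhs)                             # both 1/lam^2
assert sp.simplify(lhs - rhs) == 0
```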


exponential family expectation and variance

The practical sense: moments of the sufficient statistics are obtained by differentiating \(\ln c\left(\boldsymbol{\theta}\right)\), downgrading an integration problem to a differentiation problem.

\[ \begin{aligned} & X\in\mathcal{E}^{f}=\left\{ f\middle|f=h\left(x\right)c\left(\boldsymbol{\theta}\right)\exp\left(\sum\limits _{j=1}^{k}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)t_{{\scriptscriptstyle j}}\left(x\right)\right)\right\} \\ \Rightarrow & \begin{cases} \mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]= & \dfrac{-\partial\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}\\ \mathrm{V}\left[\sum\limits _{j=1}^{k}\dfrac{\partial w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}}t_{{\scriptscriptstyle j}}\left(X\right)\right]= & \dfrac{-\partial^{2}\ln c\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}-\mathrm{E}\left[\sum\limits _{j=1}^{k}\dfrac{\partial^{2}w_{{\scriptscriptstyle j}}\left(\boldsymbol{\theta}\right)}{\partial\theta_{{\scriptscriptstyle i}}^{2}}t_{{\scriptscriptstyle j}}\left(X\right)\right] \end{cases},\forall i=1,\cdots,p \end{aligned} \]

18.1.3 multivariate distribution

https://www.youtube.com/watch?v=y-Oi5voWQKo

recall the univariate case (a single random variable) before moving to random vectors

discrete: PMF equals probability function

\[ f_{{\scriptscriptstyle X}}\left(x\right)=\mathrm{P}\left(X=x\right) \]

continuous: PDF equals probability intensity

\[ f_{{\scriptscriptstyle X}}\left(x\right)\mathrm{d}x=\mathrm{d}\mathrm{P}\left(X\le x\right) \]

Given \(\left(X,Y\right)=\left(X_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\right)=\left\langle X,Y\right\rangle =\left\langle X_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\right\rangle \sim f_{{\scriptscriptstyle XY}}=f_{{\scriptscriptstyle XY}}\left(x,y\right)=f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}=f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right)\)

discrete:

Definition 18.19 JPMF = joint probability mass function: the JPMF of \(\left(X_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\right)=\left\langle X_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\right\rangle\) is

\[ f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}=f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right)=\mathrm{P}\left(X_{{\scriptscriptstyle 1}}=x_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}=x_{{\scriptscriptstyle 2}}\right) \]

joint vs. marginal

Theorem 18.17 the joint probability mass function determines the marginal probability mass function:

The marginal PMF of \(X_{{\scriptscriptstyle 1}}\), \(f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=\mathrm{P}\left(X_{{\scriptscriptstyle 1}}=x_{{\scriptscriptstyle 1}}\right)\), is given by

\[ f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=\sum_{x_{{\scriptscriptstyle 2}}\in X_{{\scriptscriptstyle 2}}\left(\Omega\right)}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right)=\sum_{x_{{\scriptscriptstyle 2}}\in X_{{\scriptscriptstyle 2}}\left(\Omega\right)}\mathrm{P}\left(X_{{\scriptscriptstyle 1}}=x_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}=x_{{\scriptscriptstyle 2}}\right) \]

Proof:

\[ \begin{aligned} f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)= & \mathrm{P}\left(X_{{\scriptscriptstyle 1}}=x_{{\scriptscriptstyle 1}}\right)\\ = & \mathrm{P}\left(\left\{ X_{{\scriptscriptstyle 1}}=x_{{\scriptscriptstyle 1}}\right\} \cap\left\{ X_{{\scriptscriptstyle 2}}\in\left(-\infty,\infty\right)\right\} \right)\\ = & \mathrm{P}\left(\bigcup_{x_{{\scriptscriptstyle 2}}\in X_{{\scriptscriptstyle 2}}\left(\Omega\right)}\left\{ X_{{\scriptscriptstyle 1}}=x_{{\scriptscriptstyle 1}}\wedge X_{{\scriptscriptstyle 2}}=x_{{\scriptscriptstyle 2}}\right\} \right)\\ \overset{\text{disjoint, countable additivity}}{=} & \sum_{x_{{\scriptscriptstyle 2}}\in X_{{\scriptscriptstyle 2}}\left(\Omega\right)}\mathrm{P}\left(X_{{\scriptscriptstyle 1}}=x_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}=x_{{\scriptscriptstyle 2}}\right)=\sum_{x_{{\scriptscriptstyle 2}}\in X_{{\scriptscriptstyle 2}}\left(\Omega\right)}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right) \end{aligned} \]

\[ \tag*{$\Box$} \]
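
In code, the theorem is just summing the joint table over the other index (a sketch assuming numpy; the table entries are hypothetical):

```python
import numpy as np

# Joint PMF on {0,1,2} x {0,1}; rows index x1, columns index x2.
joint = np.array([[0.10, 0.15],
                  [0.20, 0.25],
                  [0.05, 0.25]])
assert np.isclose(joint.sum(), 1.0)  # (1) total event

f_x1 = joint.sum(axis=1)  # marginal PMF of X1: sum over x2
f_x2 = joint.sum(axis=0)  # marginal PMF of X2: sum over x1
print(f_x1)               # [0.25 0.45 0.3 ]
print(f_x2)               # [0.35 0.65]
```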

continuous:

Definition 18.20 JCDF = joint cumulative distribution function: the JCDF of \(\left(X_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\right)=\left\langle X_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\right\rangle\) is

\[ F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}=F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right)=\mathrm{P}\left(X_{{\scriptscriptstyle 1}}\le x_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\le x_{{\scriptscriptstyle 2}}\right) \]

Definition 18.21 JPDF = joint probability density function: the JPDF of \(\left(X_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\right)=\left\langle X_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\right\rangle\) is

\[ f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}=f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right)=\dfrac{\partial^{2}}{\partial x_{{\scriptscriptstyle 1}}\partial x_{{\scriptscriptstyle 2}}}F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right) \]

joint vs. marginal

Theorem 18.18 the joint cumulative distribution function determines the marginal cumulative distribution function:

The marginal CDF of \(X_{{\scriptscriptstyle 1}}\) is

\[ F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},\infty\right) \]

Proof:

\[ \begin{aligned} F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)= & \mathrm{P}\left(X_{{\scriptscriptstyle 1}}\le x_{{\scriptscriptstyle 1}}\right)\\ = & \mathrm{P}\left(X_{{\scriptscriptstyle 1}}\le x_{{\scriptscriptstyle 1}}\wedge X_{{\scriptscriptstyle 2}}\le\infty\right)\\ = & \mathrm{P}\left(X_{{\scriptscriptstyle 1}}\le x_{{\scriptscriptstyle 1}},X_{{\scriptscriptstyle 2}}\le\infty\right)\\ = & F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},\infty\right) \end{aligned} \]

\[ \tag*{$\Box$} \]

Note:

\(F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)\)

\[ F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},\infty\right)=\int_{-\infty}^{\infty}\int_{-\infty}^{x_{{\scriptscriptstyle 1}}}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(u_{{\scriptscriptstyle 1}},u_{{\scriptscriptstyle 2}}\right)\mathrm{d}u_{{\scriptscriptstyle 1}}\mathrm{d}u_{{\scriptscriptstyle 2}} \]


\[ F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=\int_{-\infty}^{\infty}\int_{-\infty}^{x_{{\scriptscriptstyle 1}}}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(u_{{\scriptscriptstyle 1}},u_{{\scriptscriptstyle 2}}\right)\mathrm{d}u_{{\scriptscriptstyle 1}}\mathrm{d}u_{{\scriptscriptstyle 2}} \]


\(f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)\)

According to the fundamental theorem of calculus,

\[ f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=\dfrac{\mathrm{d}}{\mathrm{d}x_{{\scriptscriptstyle 1}}}F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=\dfrac{\mathrm{d}}{\mathrm{d}x_{{\scriptscriptstyle 1}}}F_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},\infty\right)=\dfrac{\mathrm{d}}{\mathrm{d}x_{{\scriptscriptstyle 1}}}\int_{-\infty}^{\infty}\int_{-\infty}^{x_{{\scriptscriptstyle 1}}}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(u_{{\scriptscriptstyle 1}},u_{{\scriptscriptstyle 2}}\right)\mathrm{d}u_{{\scriptscriptstyle 1}}\mathrm{d}u_{{\scriptscriptstyle 2}}=\int_{-\infty}^{\infty}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},u_{{\scriptscriptstyle 2}}\right)\mathrm{d}u_{{\scriptscriptstyle 2}} \]


\[ f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=\int_{-\infty}^{\infty}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},u_{{\scriptscriptstyle 2}}\right)\mathrm{d}u_{{\scriptscriptstyle 2}} \]

\[ f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=\int_{-\infty}^{\infty}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right)\mathrm{d}x_{{\scriptscriptstyle 2}} \]


\[ f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}}}\left(x_{{\scriptscriptstyle 1}}\right)=\begin{cases} \int_{-\infty}^{\infty}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right)\mathrm{d}x_{{\scriptscriptstyle 2}} & \text{continuous}\\ \sum\limits _{x_{{\scriptscriptstyle 2}}\in X_{{\scriptscriptstyle 2}}\left(\Omega\right)}f_{{\scriptscriptstyle X_{{\scriptscriptstyle 1}}X_{{\scriptscriptstyle 2}}}}\left(x_{{\scriptscriptstyle 1}},x_{{\scriptscriptstyle 2}}\right) & \text{discrete} \end{cases} \]
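
The continuous branch can be checked by quadrature (a sketch assuming scipy), using the joint density \(f_{{\scriptscriptstyle XY}}\left(x,y\right)=\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\) from the example below; integrating out \(y\) should give the marginal \(\mathrm{e}^{-x}\):

```python
import numpy as np
from scipy.integrate import quad

def f_x(x):
    # Integrate out y; the indicator 1(0 < x < y) restricts y to (x, inf).
    return quad(lambda y: np.exp(-y), x, np.inf)[0]

for x in (0.5, 1.0, 2.0):
    print(x, f_x(x), np.exp(-x))  # quadrature matches exp(-x)
```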

Theorem 18.19 A function \(f\left(x,y\right)\) is a joint PDF/PMF (JPDF/JPMF) iff

\[ \begin{cases} f\left(x,y\right)\ge0 & \left(ne\right)\text{non-negative}\\ \begin{cases} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f\left(x,y\right)\mathrm{d}x\mathrm{d}y=1 & \text{continuous}\\ \sum\limits _{x\in X\left(\Omega_{{\scriptscriptstyle X}}\right)}\sum\limits _{y\in Y\left(\Omega_{{\scriptscriptstyle Y}}\right)}f\left(x,y\right)=1 & \text{discrete} \end{cases} & \left(1\right)\text{total event} \end{cases} \]

Definition 18.22 expected value: The expected value of a random vector \(g\left(X,Y\right)\) is

\[ \mathrm{E}_{{\scriptscriptstyle X,Y}}\left[g\left(X,Y\right)\right]=\mathrm{E}\left[g\left(X,Y\right)\right]=\begin{cases} \intop\limits _{-\infty}^{+\infty}\intop\limits _{-\infty}^{+\infty}g\left(x,y\right)f_{{\scriptscriptstyle XY}}\left(x,y\right)\mathrm{d}x\mathrm{d}y & \text{continuous}\\ \sum\limits _{x\in X\left(\Omega_{{\scriptscriptstyle X}}\right)}\sum\limits _{y\in Y\left(\Omega_{{\scriptscriptstyle Y}}\right)}g\left(x,y\right)f_{{\scriptscriptstyle XY}}\left(x,y\right) & \text{discrete} \end{cases} \]
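
As a worked instance of this definition (a sketch assuming scipy), take \(g\left(x,y\right)=x\) under the same example density \(f_{{\scriptscriptstyle XY}}=\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\); since the marginal of \(X\) is \(\mathrm{e}^{-x}\), analytically \(\mathrm{E}\left[X\right]=1\):

```python
import numpy as np
from scipy.integrate import dblquad

# dblquad integrates func(y, x) with y-limits that may depend on x.
val, err = dblquad(lambda y, x: x * np.exp(-y),  # g(x,y) * f_XY(x,y)
                   0.0, np.inf,                   # x over (0, inf)
                   lambda x: x,                   # y > x (indicator)
                   lambda x: np.inf)              # y < inf
print(val)  # ~1.0
```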

By Definition 18.14,

\[ \begin{aligned} \mathrm{P}\left(\left(X,Y\right)\in E\right)=\mathrm{E}\left[1\left(\left(X,Y\right)\in E\right)\right]= & \begin{cases} \int_{y\in Y\left(\Omega_{{\scriptscriptstyle Y}}\right)}\int_{x\in X\left(\Omega_{{\scriptscriptstyle X}}\right)}1\left(\left(x,y\right)\in E\right)f_{{\scriptscriptstyle XY}}\left(x,y\right)\mathrm{d}x\mathrm{d}y & \text{continuous}\\ \sum\limits _{x\in X\left(\Omega_{{\scriptscriptstyle X}}\right)}\sum\limits _{y\in Y\left(\Omega_{{\scriptscriptstyle Y}}\right)}1\left(\left(x,y\right)\in E\right)f_{{\scriptscriptstyle XY}}\left(x,y\right) & \text{discrete} \end{cases}\\ = & \begin{cases} \iint_{\left(x,y\right)\in E}f_{{\scriptscriptstyle XY}}\left(x,y\right)\mathrm{d}x\mathrm{d}y & \text{continuous}\\ \sum\limits _{\left(x,y\right)\in E}f_{{\scriptscriptstyle XY}}\left(x,y\right) & \text{discrete} \end{cases} \end{aligned} \]


example

Given \(f_{{\scriptscriptstyle XY}}\left(x,y\right)=\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\), find \(\mathrm{P}\left(X+Y\le1\right)\).

Check if

\[ f_{{\scriptscriptstyle XY}}\left(x,y\right)=\mathrm{e}^{-y}1\left(0<x<y<\infty\right) \]

is a JPDF:

\[ \begin{cases} f_{{\scriptscriptstyle XY}}\left(x,y\right)=\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\overset{\begin{cases} \mathrm{e}^{-y}>0\\ 1\left(0<x<y<\infty\right)=\begin{cases} 0\\ 1 \end{cases} \end{cases}}{\ge}0 & \left(ne\right)\text{non-negative}\\ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f_{{\scriptscriptstyle XY}}\left(x,y\right)\mathrm{d}x\mathrm{d}y=? & \left(1\right)\text{total event} \end{cases} \]

\[ \begin{aligned} & \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f_{{\scriptscriptstyle XY}}\left(x,y\right)\mathrm{d}x\mathrm{d}y=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\mathrm{d}x\mathrm{d}y,x\in\left(0,\infty\right)\\ = & \int_{-\infty}^{\infty}\int_{0}^{\infty}\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\mathrm{d}x\mathrm{d}y,y>x\in\left(0,\infty\right)\\ = & \int_{0}^{\infty}\int_{0}^{\infty}\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\mathrm{d}x\mathrm{d}y\overset{\text{Fubini}}{=}\int_{0}^{\infty}\int_{0}^{\infty}\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\mathrm{d}y\mathrm{d}x,y>x\\ = & \int_{0}^{\infty}\int_{x}^{\infty}\mathrm{e}^{-y}1\mathrm{d}y\mathrm{d}x=\int_{0}^{\infty}\int_{x}^{\infty}\mathrm{e}^{-y}\mathrm{d}y\mathrm{d}x=\int_{0}^{\infty}\left[-\mathrm{e}^{-y}\right]_{{\scriptscriptstyle y=x}}^{\infty}\mathrm{d}x=\int_{0}^{\infty}\left[-\mathrm{e}^{-\infty}-\left(-\mathrm{e}^{-x}\right)\right]\mathrm{d}x\\ = & \int_{0}^{\infty}\left[-0-\left(-\mathrm{e}^{-x}\right)\right]\mathrm{d}x=\int_{0}^{\infty}\mathrm{e}^{-x}\mathrm{d}x=\left[-\mathrm{e}^{-x}\right]_{{\scriptscriptstyle x=0}}^{\infty}=\left[-\mathrm{e}^{-\infty}-\left(-\mathrm{e}^{-0}\right)\right]=\left[-0-\left(-1\right)\right]=1 \end{aligned} \]

\[ \begin{cases} f_{{\scriptscriptstyle XY}}\left(x,y\right)=\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\ge0 & \left(ne\right)\text{non-negative}\\ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}f_{{\scriptscriptstyle XY}}\left(x,y\right)\mathrm{d}x\mathrm{d}y=1 & \left(1\right)\text{total event} \end{cases} \]

\(\mathrm{P}\left(X+Y\le1\right)\)

https://tex.stackexchange.com/questions/75933/how-to-draw-the-region-of-inequality

https://tex.stackexchange.com/questions/352511/how-to-fill-in-inequality-where-all-inequalities-overlap

interpolation dashed lines[16.5.6]


Fig. 16.17: \(X+Y\le1\)

\[ \begin{aligned} \mathrm{P}\left(X+Y\le1\right)= & \iint_{\left(X,Y\right)\in E}f_{{\scriptscriptstyle XY}}\left(x,y\right)\mathrm{d}x\mathrm{d}y,E=\left\{ X+Y\le1\right\} \\ = & \iint_{E}\mathrm{e}^{-y}1\left(0<x<y<\infty\right)\mathrm{d}x\mathrm{d}y,E=\left\{ X+Y\le1\right\} \\ = & \int_{0}^{0.5}\int_{x}^{1-x}\mathrm{e}^{-y}\mathrm{d}y\mathrm{d}x=\int_{0}^{0.5}\left[-\mathrm{e}^{-y}\right]_{{\scriptscriptstyle y=x}}^{{\scriptscriptstyle 1-x}}\mathrm{d}x=\int_{0}^{0.5}\left[-\mathrm{e}^{-\left(1-x\right)}-\left(-\mathrm{e}^{-x}\right)\right]\mathrm{d}x\\ = & \int_{0}^{0.5}\left[\mathrm{e}^{-x}-\mathrm{e}^{x-1}\right]\mathrm{d}x=\left[-\mathrm{e}^{-x}-\mathrm{e}^{x-1}\right]_{{\scriptscriptstyle x=0}}^{{\scriptscriptstyle 0.5}}=\left[-\mathrm{e}^{-0.5}-\mathrm{e}^{0.5-1}\right]-\left[-\mathrm{e}^{-0}-\mathrm{e}^{0-1}\right]\\ = & 1+\mathrm{e}^{-1}-2\mathrm{e}^{-0.5}\fallingdotseq0.154818\cdots \end{aligned} \]

\[ \tag*{$\Box$} \]
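
The hand computation can be cross-checked by integrating directly over the region \(0<x<1/2\), \(x<y<1-x\) (a sketch assuming scipy):

```python
import numpy as np
from scipy.integrate import dblquad

val, err = dblquad(lambda y, x: np.exp(-y),  # f_XY on the region where 1(...) = 1
                   0.0, 0.5,                  # x over (0, 1/2)
                   lambda x: x,               # y > x
                   lambda x: 1.0 - x)         # y <= 1 - x  (X + Y <= 1)
print(val)                                    # ~0.154818
print(1 + np.exp(-1) - 2 * np.exp(-0.5))      # closed form derived above
```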

https://www.youtube.com/watch?v=r-bsPw2-sRg

https://www.youtube.com/watch?v=EwMFsvERFVw

https://www.youtube.com/watch?v=Os6HJLDtYgM

https://www.youtube.com/watch?v=8GefzYp6iBE

https://www.youtube.com/watch?v=nOSXlzxXrPY

https://www.youtube.com/watch?v=hwU7JLCRCgo

https://www.youtube.com/watch?v=duKMBB1bdCU

https://www.youtube.com/watch?v=H7wBXhvjZfg

https://www.youtube.com/playlist?list=PLTp0eSi9MdkNZB4kyLSzIXIUy9JQOJ5AM

https://www.youtube.com/watch?v=qgef6G9rzts

https://www.youtube.com/watch?v=telisdm9Aus

https://www.youtube.com/watch?v=qeRsAId5f5U

https://www.youtube.com/watch?v=cuR-HsEq_fs

https://www.youtube.com/watch?v=Ue1mgEVDwq0

https://www.youtube.com/watch?v=1meJoxJ5_UA

https://www.youtube.com/watch?v=UB0kwppDucI

https://www.youtube.com/watch?v=YrqcdCPM1nw

https://www.youtube.com/watch?v=GCw1T_lUunw

https://www.youtube.com/watch?v=MF0mZ5MpcSw

https://www.youtube.com/watch?v=zX8L8yGlYaU

https://www.youtube.com/watch?v=XJgvphHKXwY

https://www.youtube.com/watch?v=8Qzqf51O6ZE

https://www.youtube.com/watch?v=7x8QN1pYT7c

https://www.youtube.com/watch?v=oTf3n-OD7EI

https://www.youtube.com/watch?v=Igo2EJPz3sU

https://www.youtube.com/watch?v=NVemOifoMqw

https://www.youtube.com/watch?v=HrrBA3v-xTQ

https://www.youtube.com/watch?v=9HbtJ3ZIPxQ

https://www.youtube.com/watch?v=elPnSU4AF1o

https://www.youtube.com/watch?v=_OVHGnQ7Rug

https://www.youtube.com/watch?v=pqHEXAAW_vk

https://www.youtube.com/watch?v=BTK_1Lz5ox8

https://www.youtube.com/watch?v=iATKGYnlomU

https://www.youtube.com/watch?v=5LlTeUeAqDc

https://www.youtube.com/watch?v=usCaJRQ2i6E

https://www.youtube.com/watch?v=p8NXibyDKDo

references

6. 張翔 & 廖崇智. 提綱挈領學統計. (大碩教育, 2021).
7. Mittelhammer, R. C. Mathematical Statistics for Economics and Business. (Springer, 2013).