1.4 \(O_\mathbb{P}\) and \(o_\mathbb{P}\) notation
In computer science the \(O\) notation is used to measure the complexity of algorithms. For example, when an algorithm is \(O(n^2),\) it is said to be quadratic in time and we know that it is going to take on the order of \(n^2\) operations to process an input of size \(n.\) We do not care about the specific number of computations, but focus on the big picture by looking for an upper bound on the sequence of computation times in terms of \(n.\) This upper bound disregards constants. For example, the dot product between two vectors of size \(n\) is an \(O(n)\) operation, even though it takes \(n\) multiplications and \(n-1\) sums, hence \(2n-1\) operations.
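To make the operation count concrete, here is a minimal sketch in Python (the function name and counting scheme are illustrative, not from any particular library) that tallies the \(2n-1\) elementary operations of a dot product:

```python
# Count the elementary operations in a dot product of two length-n vectors:
# n multiplications and n - 1 additions, i.e., 2n - 1 = O(n) operations.
def dot_with_count(x, y):
    assert len(x) == len(y)
    ops = 0
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        prod = a * b  # one multiplication
        ops += 1
        if i == 0:
            total = prod
        else:
            total += prod  # one addition
            ops += 1
    return total, ops

value, ops = dot_with_count([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
# value = 32.0 and ops = 2 * 3 - 1 = 5, yet the complexity is simply O(n)
```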
In mathematical analysis, \(O\)-related notation is mostly used with the aim of bounding sequences that shrink to zero. The formalism, however, is the same.
Definition 1.5 (Big $O$) Given two strictly positive sequences \(a_n\) and \(b_n,\)
\[\begin{align*} a_n=O(b_n):\iff \limsup_{n\to\infty}\frac{a_n}{b_n}\leq C,\text{ for some }C>0. \end{align*}\]
If \(a_n=O(b_n),\) then we say that \(a_n\) is big-\(O\) of \(b_n\). To indicate that \(a_n\) is bounded, we write \(a_n=O(1).\)
Definition 1.6 (Little $o$) Given two strictly positive sequences \(a_n\) and \(b_n,\)
\[\begin{align*} a_n=o(b_n):\iff \lim_{n\to\infty}\frac{a_n}{b_n}=0. \end{align*}\]
If \(a_n=o(b_n),\) then we say that \(a_n\) is little-\(o\) of \(b_n\). To indicate that \(a_n\to0,\) we write \(a_n=o(1).\)
The interpretation of these two definitions is simple:
- \(a_n=O(b_n)\) means that \(a_n\) is “not larger than” \(b_n\) asymptotically. If \(a_n,b_n\to0,\) then it means that \(a_n\) “does not decrease slower” than \(b_n.\)
- \(a_n=o(b_n)\) means that \(a_n\) is “smaller than” \(b_n\) asymptotically. If \(a_n,b_n\to0,\) then it means that \(a_n\) “decreases faster” than \(b_n.\)
Obviously, little-\(o\) implies big-\(O\). Playing with limits, we can derive a list of useful facts:
Proposition 1.7 Consider three sequences \(a_n,b_n,c_n\to0.\) The following properties hold:
- \(kO(a_n)=O(a_n),\) \(ko(a_n)=o(a_n),\) \(k\in\mathbb{R}.\)
- \(o(a_n)+o(b_n)=o(a_n+b_n),\) \(O(a_n)+O(b_n)=O(a_n+b_n).\)
- \(o(a_n)+O(b_n)=O(a_n+b_n),\) \(o(a_n)o(b_n)=o(a_nb_n).\)
- \(O(a_n)O(b_n)=O(a_nb_n),\) \(o(a_n)O(b_n)=o(a_nb_n).\)
- \(o(1)O(a_n)=o(a_n).\)
- \(a_n^r=o(a_n^s),\) for \(r>s\geq 0.\)
- \(a_nb_n=O(a_n^2+b_n^2).\)
- \(a_nb_n=o(a_n+b_n).\)
- \((a_n+b_n)^k=O(a_n^k+b_n^k).\)
The last result is a consequence of a useful inequality:
Lemma 1.1 ($C_p$ inequality) Given \(a,b\in\mathbb{R}\) and \(p>0,\)
\[\begin{align*} |a+b|^p\leq C_p(|a|^p+|b|^p), \quad C_p=\begin{cases} 1,&p\leq1,\\ 2^{p-1},&p>1. \end{cases} \end{align*}\]
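As a sanity check (a simulation sketch, not a proof), one can probe the \(C_p\) inequality on random pairs \((a,b)\) for several exponents:

```python
import random

# Spot check of the C_p inequality |a + b|^p <= C_p (|a|^p + |b|^p),
# where C_p = 1 for p <= 1 and C_p = 2^(p - 1) for p > 1.
def c_p(p):
    return 1.0 if p <= 1 else 2.0 ** (p - 1)

random.seed(42)
violated = False
for p in (0.5, 1.0, 2.0, 3.5):
    for _ in range(10_000):
        a = random.uniform(-10.0, 10.0)
        b = random.uniform(-10.0, 10.0)
        lhs = abs(a + b) ** p
        rhs = c_p(p) * (abs(a) ** p + abs(b) ** p)
        if lhs > rhs + 1e-9:  # small tolerance for floating-point error
            violated = True
print(violated)  # no counterexample is found
```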
The previous notation is purely deterministic. Let’s add some stochastic flavor by establishing the stochastic analogue of little-\(o.\)
Definition 1.7 (Little $o_\mathbb{P}$) Given a strictly positive sequence \(a_n\) and a sequence of random variables \(X_n,\)
\[\begin{align*} X_n=o_\mathbb{P}(a_n):\iff& \frac{|X_n|}{a_n}\stackrel{\mathbb{P}}{\longrightarrow}0 \\ \iff& \lim_{n\to\infty}\mathbb{P}\left[\frac{|X_n|}{a_n}>\varepsilon\right]=0,\;\forall\varepsilon>0. \end{align*}\]
If \(X_n=o_\mathbb{P}(a_n),\) then we say that \(X_n\) is little-\(o_\mathbb{P}\) of \(a_n\). To indicate that \(X_n\stackrel{\mathbb{P}}{\longrightarrow}0,\) we write \(X_n=o_\mathbb{P}(1).\)
Therefore, little-\(o_\mathbb{P}\) makes it easy to quantify the speed at which a sequence of random variables converges to zero in probability.
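For instance (an illustrative simulation; the \(\mathrm{Unif}(-1,1)\) distribution is chosen here for convenience), the sample mean of centered iid variables is \(o_\mathbb{P}(1)\): the exceedance frequency of \(\{|\bar{X}_n|>\varepsilon\}\) vanishes as \(n\) grows.

```python
import random

# The sample mean of iid Uniform(-1, 1) draws is o_P(1): the empirical
# frequency of the event {|mean| > eps} shrinks toward 0 as n grows.
random.seed(1)

def exceed_freq(n, eps, reps=2000):
    hits = 0
    for _ in range(reps):
        mean = sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n
        hits += abs(mean) > eps
    return hits / reps

p_small_n = exceed_freq(n=10, eps=0.2)
p_large_n = exceed_freq(n=1000, eps=0.2)
print(p_small_n, p_large_n)  # the second frequency is essentially zero
```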
Example 1.3 Let \(Y_n=o_\mathbb{P}\big(n^{-1/2}\big)\) and \(Z_n=o_\mathbb{P}\left(n^{-1}\right).\) Then \(Z_n\) converges faster to zero in probability than \(Y_n.\) To visualize this, recall that the little-\(o_\mathbb{P}\) and limit definitions entail that
\[\begin{align*} \forall\varepsilon,\delta>0,\,\exists \,n_0=n_0(\varepsilon,\delta):\forall n\geq n_0,\, \mathbb{P}\left[|X_n|>a_n\varepsilon\right]<\delta. \end{align*}\]
Therefore, for fixed \(\varepsilon,\delta>0\) and a fixed \(n\geq\max(n_{0,Y},n_{0,Z}),\) \(\mathbb{P}\left[Y_n\in\big(-n^{-1/2}\varepsilon,n^{-1/2}\varepsilon\big)\right]>1-\delta\) and \(\mathbb{P}\left[Z_n\in(-n^{-1}\varepsilon,n^{-1}\varepsilon)\right]>1-\delta,\) but the latter interval is much shorter, hence \(Z_n\) is more tightly concentrated around \(0.\)
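To see this numerically, one hypothetical construction (not from the text) takes \(Y_n=n^{-3/4}Z\) and \(Z_n=n^{-3/2}Z\) with \(Z\sim\mathcal{N}(0,1),\) which satisfy \(Y_n=o_\mathbb{P}\big(n^{-1/2}\big)\) and \(Z_n=o_\mathbb{P}\left(n^{-1}\right)\):

```python
import random

# Y_n = n^(-3/4) * Z is o_P(n^(-1/2)), since |Y_n| / n^(-1/2) = n^(-1/4) |Z| -> 0
# in probability; likewise Z_n = n^(-3/2) * Z is o_P(n^(-1)). At a fixed n,
# Z_n concentrates around 0 on a much shorter scale than Y_n.
random.seed(7)
n = 10_000
z_draws = [random.gauss(0.0, 1.0) for _ in range(5000)]
spread_Y = max(abs(n ** -0.75 * z) for z in z_draws)
spread_Z = max(abs(n ** -1.5 * z) for z in z_draws)
print(spread_Y, spread_Z)  # spread_Z is smaller by the factor n^(-0.75)
```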
Definition 1.8 (Big $O_\mathbb{P}$) Given a strictly positive sequence \(a_n\) and a sequence of random variables \(X_n,\)
\[\begin{align*} X_n=O_\mathbb{P}(a_n):\iff&\forall\varepsilon>0,\,\exists\,C=C_\varepsilon>0,\,n_0=n_0(\varepsilon):\\ &\forall n\geq n_0,\, \mathbb{P}\left[\frac{|X_n|}{a_n}>C_\varepsilon\right]<\varepsilon\\ \iff& \lim_{C\to\infty}\limsup_{n\to\infty}\mathbb{P}\left[\frac{|X_n|}{a_n}>C\right]=0. \end{align*}\]
If \(X_n=O_\mathbb{P}(a_n),\) then we say that \(X_n\) is big-\(O_\mathbb{P}\) of \(a_n\).
Example 1.4 Chebyshev's inequality entails that \(\mathbb{P}[|X-\mathbb{E}[X]|\geq t]\leq \frac{\mathbb{V}\mathrm{ar}[X]}{t^2},\) \(\forall t>0.\) Setting \(\varepsilon:=\frac{\mathbb{V}\mathrm{ar}[X]}{t^2}\) and \(C_\varepsilon:=\frac{1}{\sqrt{\varepsilon}},\) then \(\mathbb{P}\left[|X-\mathbb{E}[X]|\geq \sqrt{\mathbb{V}\mathrm{ar}[X]}C_\varepsilon\right]\leq \varepsilon.\) Therefore,
\[\begin{align*} X-\mathbb{E}[X]=O_\mathbb{P}\left(\sqrt{\mathbb{V}\mathrm{ar}[X]}\right). \end{align*}\]
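An empirical check of this bound (the \(\mathrm{Exp}(1)\) distribution is chosen here purely for illustration; it has mean and variance equal to \(1\)):

```python
import random

# For X ~ Exp(1), E[X] = Var[X] = 1. Chebyshev's inequality gives
# P[|X - E[X]| >= C * sd(X)] <= 1 / C^2, so X - E[X] = O_P(sd(X)).
random.seed(3)
C = 5.0
reps = 100_000
hits = sum(abs(random.expovariate(1.0) - 1.0) >= C for _ in range(reps))
empirical = hits / reps
chebyshev_bound = 1.0 / C ** 2
print(empirical, chebyshev_bound)  # the empirical frequency obeys the bound
```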
Exercise 1.3 Prove that if \(X_n\stackrel{d}{\longrightarrow}X,\) then \(X_n=O_\mathbb{P}(1).\) Hint: use the double-limit definition and note that \(X=O_\mathbb{P}(1).\)
Example 1.5 (Example 1.18 in DasGupta (2008)) Suppose that \(a_n(X_n-c_n)\stackrel{d}{\longrightarrow}X\) for deterministic sequences \(a_n\) and \(c_n\) such that \(c_n\to c.\) Then, if \(a_n\to\infty,\) \(X_n-c=o_\mathbb{P}(1).\) The argument is simple:
\[\begin{align*} X_n-c&=X_n-c_n+c_n-c\\ &=\frac{1}{a_n}a_n(X_n-c_n)+o(1)\\ &=\frac{1}{a_n}O_\mathbb{P}(1)+o(1)\\ &=o(1)O_\mathbb{P}(1)+o(1)\\ &=o_\mathbb{P}(1), \end{align*}\]
where the fourth equality uses that \(1/a_n=o(1)\) because \(a_n\to\infty.\)
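The example can be illustrated by simulation, taking \(X_n\) to be the sample mean of iid \(\mathrm{Unif}(0,1)\) draws, \(c_n=c=1/2,\) and \(a_n=\sqrt{n}\) (an illustrative choice, not prescribed by the text):

```python
import random

# By the CLT, sqrt(n) * (mean_n - 1/2) converges in distribution, so with
# a_n = sqrt(n) -> infinity the example gives mean_n - 1/2 = o_P(1): the
# frequency of the event {|mean_n - 1/2| > eps} shrinks as n grows.
random.seed(11)

def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

eps = 0.05
freqs = {}
for n in (10, 100, 10_000):
    freqs[n] = sum(abs(sample_mean(n) - 0.5) > eps for _ in range(1000)) / 1000
print(freqs)  # the exceedance frequencies decrease with n
```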
Exercise 1.4 Using the previous example, derive the weak law of large numbers as a consequence of the CLT, for iid random variables.
Proposition 1.8 Consider two strictly positive sequences \(a_n,b_n\to 0.\) The following properties hold:
- \(o_\mathbb{P}(a_n)=O_\mathbb{P}(a_n)\) (little-\(o_\mathbb{P}\) implies big-\(O_\mathbb{P}\)).
- \(o(1)=o_\mathbb{P}(1),\) \(O(1)=O_\mathbb{P}(1)\) (deterministic implies probabilistic).
- \(kO_\mathbb{P}(a_n)=O_\mathbb{P}(a_n),\) \(ko_\mathbb{P}(a_n)=o_\mathbb{P}(a_n),\) \(k\in\mathbb{R}.\)
- \(o_\mathbb{P}(a_n)+o_\mathbb{P}(b_n)=o_\mathbb{P}(a_n+b_n),\) \(O_\mathbb{P}(a_n)+O_\mathbb{P}(b_n)=O_\mathbb{P}(a_n+b_n).\)
- \(o_\mathbb{P}(a_n)+O_\mathbb{P}(b_n)=O_\mathbb{P}(a_n+b_n),\) \(o_\mathbb{P}(a_n)o_\mathbb{P}(b_n)=o_\mathbb{P}(a_nb_n).\)
- \(O_\mathbb{P}(a_n)O_\mathbb{P}(b_n)=O_\mathbb{P}(a_nb_n),\) \(o_\mathbb{P}(a_n)O_\mathbb{P}(b_n)=o_\mathbb{P}(a_nb_n).\)
- \(o_\mathbb{P}(1)O_\mathbb{P}(a_n)=o_\mathbb{P}(a_n).\)
- \((1+o_\mathbb{P}(1))^{-1}=O_\mathbb{P}(1).\)
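The last property can be probed numerically (a sketch with \(R_n=Z/\sqrt{n},\) \(Z\sim\mathcal{N}(0,1),\) a convenient choice satisfying \(R_n=o_\mathbb{P}(1)\)): the upper quantiles of \(|1/(1+R_n)|\) remain bounded as \(n\) grows.

```python
import random

# With R_n = Z / sqrt(n) = o_P(1), the sequence 1 / (1 + R_n) is O_P(1):
# its empirical 99% quantile in absolute value stays bounded as n grows.
random.seed(5)

def q99_abs_inverse(n, reps=5000):
    vals = sorted(abs(1.0 / (1.0 + random.gauss(0.0, 1.0) / n ** 0.5))
                  for _ in range(reps))
    return vals[int(0.99 * reps)]

q_n100, q_n10000 = q99_abs_inverse(100), q99_abs_inverse(10_000)
print(q_n100, q_n10000)  # both quantiles are moderate; the second is near 1
```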