1.4 \(O_\mathbb{P}\) and \(o_\mathbb{P}\) notation
In computer science the \(O\) notation is used to measure the complexity of algorithms. For example, when an algorithm is \(O(n^2),\) it is said to be quadratic in time and we know that it is going to take on the order of \(n^2\) operations to process an input of size \(n.\) We do not care about the specific number of computations, but focus on the big picture by looking for an upper bound on the sequence of computation times in terms of \(n.\) This upper bound disregards constants. For example, the dot product between two vectors of size \(n\) is an \(O(n)\) operation, even though it takes \(n\) multiplications and \(n-1\) sums, hence \(2n-1\) operations.
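To make the operation count concrete, here is a minimal sketch in Python (the function name and counting scheme are illustrative, not from any particular library) that tallies the \(2n-1\) elementary operations of a dot product:

```python
# Count the elementary operations in a dot product of two length-n vectors:
# n multiplications and n - 1 additions, i.e., 2n - 1 = O(n) operations.
def dot_with_count(x, y):
    assert len(x) == len(y)
    ops = 0
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        prod = a * b  # one multiplication
        ops += 1
        if i == 0:
            total = prod
        else:
            total += prod  # one addition
            ops += 1
    return total, ops

value, ops = dot_with_count([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
# value = 32.0 and ops = 2 * 3 - 1 = 5, yet the complexity is simply O(n)
```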
In mathematical analysis, \(O\)-related notation is mostly used with the aim of bounding sequences that shrink to zero. The formalism, however, is the same.
Definition 1.5 (Big $O$) Given two strictly positive sequences \(a_n\) and \(b_n,\)
\[\begin{align*} a_n=O(b_n):\iff \limsup_{n\to\infty}\frac{a_n}{b_n}\leq C,\text{ for some }C>0. \end{align*}\]
If \(a_n=O(b_n),\) then we say that \(a_n\) is big-\(O\) of \(b_n\). To indicate that \(a_n\) is bounded, we write \(a_n=O(1).\)
Definition 1.6 (Little $o$) Given two strictly positive sequences \(a_n\) and \(b_n,\)
\[\begin{align*} a_n=o(b_n):\iff \lim_{n\to\infty}\frac{a_n}{b_n}=0. \end{align*}\]
If \(a_n=o(b_n),\) then we say that \(a_n\) is little-\(o\) of \(b_n\). To indicate that \(a_n\to0,\) we write \(a_n=o(1).\)
The interpretation of these two definitions is simple:
- \(a_n=O(b_n)\) means that \(a_n\) is “not larger than” \(b_n\) asymptotically. If \(a_n,b_n\to0,\) then it means that \(a_n\) “does not decrease slower” than \(b_n.\)
- \(a_n=o(b_n)\) means that \(a_n\) is “smaller than” \(b_n\) asymptotically. If \(a_n,b_n\to0,\) then it means that \(a_n\) “decreases faster” than \(b_n.\)
Obviously, little-\(o\) implies big-\(O\). Playing with limits, we can derive a list of useful facts:
Proposition 1.7 Consider three sequences \(a_n,b_n,c_n\to0.\) The following properties hold:
- \(kO(a_n)=O(a_n),\) \(ko(a_n)=o(a_n),\) \(k\in\mathbb{R}.\)
- \(o(a_n)+o(b_n)=o(a_n+b_n),\) \(O(a_n)+O(b_n)=O(a_n+b_n).\)
- \(o(a_n)+O(b_n)=O(a_n+b_n),\) \(o(a_n)o(b_n)=o(a_nb_n).\)
- \(O(a_n)O(b_n)=O(a_nb_n),\) \(o(a_n)O(b_n)=o(a_nb_n).\)
- \(o(1)O(a_n)=o(a_n).\)
- \(a_n^r=o(a_n^s),\) for \(r>s\geq 0.\)
- \(a_nb_n=O(a_n^2+b_n^2).\)
- \(a_nb_n=o(a_n+b_n).\)
- \((a_n+b_n)^k=O(a_n^k+b_n^k).\)
The last result is a consequence of a useful inequality:
Lemma 1.1 ($C_p$ inequality) Given \(a,b\in\mathbb{R}\) and \(p>0,\)
\[\begin{align*} |a+b|^p\leq C_p(|a|^p+|b|^p), \quad C_p=\begin{cases} 1,&p\leq1,\\ 2^{p-1},&p>1. \end{cases} \end{align*}\]
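As a sanity check (a simulation sketch, not a proof), one can probe the \(C_p\) inequality on random pairs \((a,b)\) for several exponents:

```python
import random

# Spot check of the C_p inequality |a + b|^p <= C_p (|a|^p + |b|^p),
# where C_p = 1 for p <= 1 and C_p = 2^(p - 1) for p > 1.
def c_p(p):
    return 1.0 if p <= 1 else 2.0 ** (p - 1)

random.seed(42)
violated = False
for p in (0.5, 1.0, 2.0, 3.5):
    for _ in range(10_000):
        a = random.uniform(-10.0, 10.0)
        b = random.uniform(-10.0, 10.0)
        lhs = abs(a + b) ** p
        rhs = c_p(p) * (abs(a) ** p + abs(b) ** p)
        if lhs > rhs + 1e-9:  # small tolerance for floating-point error
            violated = True
print(violated)  # no counterexample is found
```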
The previous notation is purely deterministic. Let’s add some stochastic flavor by establishing the stochastic analogue of little-\(o.\)
Definition 1.7 (Little $o_\mathbb{P}$) Given a strictly positive sequence \(a_n\) and a sequence of random variables \(X_n,\)
\[\begin{align*} X_n=o_\mathbb{P}(a_n):\iff& \frac{|X_n|}{a_n}\stackrel{\mathbb{P}}{\longrightarrow}0 \\ \iff& \lim_{n\to\infty}\mathbb{P}\left[\frac{|X_n|}{a_n}>\varepsilon\right]=0,\;\forall\varepsilon>0. \end{align*}\]
If \(X_n=o_\mathbb{P}(a_n),\) then we say that \(X_n\) is little-\(o_\mathbb{P}\) of \(a_n\). To indicate that \(X_n\stackrel{\mathbb{P}}{\longrightarrow}0,\) we write \(X_n=o_\mathbb{P}(1).\)
Therefore, little-\(o_\mathbb{P}\) makes it easy to quantify the speed at which a sequence of random variables converges to zero in probability.
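For instance (an illustrative simulation; the \(\mathrm{Unif}(-1,1)\) distribution is chosen here for convenience), the sample mean of centered iid variables is \(o_\mathbb{P}(1)\): the exceedance frequency of \(\{|\bar{X}_n|>\varepsilon\}\) vanishes as \(n\) grows.

```python
import random

# The sample mean of iid Uniform(-1, 1) draws is o_P(1): the empirical
# frequency of the event {|mean| > eps} shrinks toward 0 as n grows.
random.seed(1)

def exceed_freq(n, eps, reps=2000):
    hits = 0
    for _ in range(reps):
        mean = sum(random.uniform(-1.0, 1.0) for _ in range(n)) / n
        hits += abs(mean) > eps
    return hits / reps

p_small_n = exceed_freq(n=10, eps=0.2)
p_large_n = exceed_freq(n=1000, eps=0.2)
print(p_small_n, p_large_n)  # the second frequency is essentially zero
```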
Example 1.3 Let \(Y_n=o_\mathbb{P}\big(n^{-1/2}\big)\) and \(Z_n=o_\mathbb{P}\left(n^{-1}\right).\) Then \(Z_n\) converges faster to zero in probability than \(Y_n.\) To visualize this, recall that the little-\(o_\mathbb{P}\) and limit definitions entail that
\[\begin{align*} \forall\varepsilon,\delta>0,\,\exists \,n_0=n_0(\varepsilon,\delta):\forall n\geq n_0,\, \mathbb{P}\left[|X_n|>a_n\varepsilon\right]<\delta. \end{align*}\]
Therefore, for fixed \(\varepsilon,\delta>0\) and a fixed \(n\geq\max(n_{0,Y},n_{0,Z}),\) \(\mathbb{P}\left[Y_n\in\big(-n^{-1/2}\varepsilon,n^{-1/2}\varepsilon\big)\right]>1-\delta\) and \(\mathbb{P}\left[Z_n\in(-n^{-1}\varepsilon,n^{-1}\varepsilon)\right]>1-\delta,\) but the latter interval is much shorter, hence \(Z_n\) is more tightly concentrated around \(0.\)
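To see this numerically, one hypothetical construction (not from the text) takes \(Y_n=n^{-3/4}Z\) and \(Z_n=n^{-3/2}Z\) with \(Z\sim\mathcal{N}(0,1),\) which satisfy \(Y_n=o_\mathbb{P}\big(n^{-1/2}\big)\) and \(Z_n=o_\mathbb{P}\left(n^{-1}\right)\):

```python
import random

# Y_n = n^(-3/4) * Z is o_P(n^(-1/2)), since |Y_n| / n^(-1/2) = n^(-1/4) |Z| -> 0
# in probability; likewise Z_n = n^(-3/2) * Z is o_P(n^(-1)). At a fixed n,
# Z_n concentrates around 0 on a much shorter scale than Y_n.
random.seed(7)
n = 10_000
z_draws = [random.gauss(0.0, 1.0) for _ in range(5000)]
spread_Y = max(abs(n ** -0.75 * z) for z in z_draws)
spread_Z = max(abs(n ** -1.5 * z) for z in z_draws)
print(spread_Y, spread_Z)  # spread_Z is smaller by the factor n^(-0.75)
```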
Definition 1.8 (Big $O_\mathbb{P}$) Given a strictly positive sequence \(a_n\) and a sequence of random variables \(X_n,\)
\[\begin{align*} X_n=O_\mathbb{P}(a_n):\iff&\forall\varepsilon>0,\,\exists\,C=C_\varepsilon>0,\,n_0=n_0(\varepsilon):\\ &\forall n\geq n_0,\, \mathbb{P}\left[\frac{|X_n|}{a_n}>C_\varepsilon\right]<\varepsilon\\ \iff& \lim_{C\to\infty}\limsup_{n\to\infty}\mathbb{P}\left[\frac{|X_n|}{a_n}>C\right]=0. \end{align*}\]
If \(X_n=O_\mathbb{P}(a_n),\) then we say that \(X_n\) is big-\(O_\mathbb{P}\) of \(a_n\).
Example 1.4 Chebyshev's inequality entails that \(\mathbb{P}[|X-\mathbb{E}[X]|\geq t]\leq \frac{\mathbb{V}\mathrm{ar}[X]}{t^2},\) \(\forall t>0.\) Setting \(\varepsilon:=\frac{\mathbb{V}\mathrm{ar}[X]}{t^2}\) and \(C_\varepsilon:=\frac{1}{\sqrt{\varepsilon}},\) then \(\mathbb{P}\left[|X-\mathbb{E}[X]|\geq \sqrt{\mathbb{V}\mathrm{ar}[X]}C_\varepsilon\right]\leq \varepsilon.\) Therefore,
\[\begin{align*} X-\mathbb{E}[X]=O_\mathbb{P}\left(\sqrt{\mathbb{V}\mathrm{ar}[X]}\right). \end{align*}\]
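An empirical check of this bound (the \(\mathrm{Exp}(1)\) distribution is chosen here purely for illustration; it has mean and variance equal to \(1\)):

```python
import random

# For X ~ Exp(1), E[X] = Var[X] = 1. Chebyshev's inequality gives
# P[|X - E[X]| >= C * sd(X)] <= 1 / C^2, so X - E[X] = O_P(sd(X)).
random.seed(3)
C = 5.0
reps = 100_000
hits = sum(abs(random.expovariate(1.0) - 1.0) >= C for _ in range(reps))
empirical = hits / reps
chebyshev_bound = 1.0 / C ** 2
print(empirical, chebyshev_bound)  # the empirical frequency obeys the bound
```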
Exercise 1.3 Prove that if \(X_n\stackrel{d}{\longrightarrow}X,\) then \(X_n=O_\mathbb{P}(1).\) Hint: use the double-limit definition and note that \(X=O_\mathbb{P}(1).\)
Example 1.5 (Example 1.18 in DasGupta (2008)) Suppose that \(a_n(X_n-c_n)\stackrel{d}{\longrightarrow}X\) for deterministic sequences \(a_n\) and \(c_n\) such that \(c_n\to c.\) Then, if \(a_n\to\infty,\) \(X_n-c=o_\mathbb{P}(1).\) The argument is simple:
\[\begin{align*} X_n-c&=X_n-c_n+c_n-c\\ &=\frac{1}{a_n}a_n(X_n-c_n)+o(1)\\ &=\frac{1}{a_n}O_\mathbb{P}(1)+o(1)\\ &=o(1)O_\mathbb{P}(1)+o(1)\\ &=o_\mathbb{P}(1), \end{align*}\]
where the fourth equality uses that \(1/a_n=o(1)\) because \(a_n\to\infty.\)
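The example can be illustrated by simulation, taking \(X_n\) to be the sample mean of iid \(\mathrm{Unif}(0,1)\) draws, \(c_n=c=1/2,\) and \(a_n=\sqrt{n}\) (an illustrative choice, not prescribed by the text):

```python
import random

# By the CLT, sqrt(n) * (mean_n - 1/2) converges in distribution, so with
# a_n = sqrt(n) -> infinity the example gives mean_n - 1/2 = o_P(1): the
# frequency of the event {|mean_n - 1/2| > eps} shrinks as n grows.
random.seed(11)

def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

eps = 0.05
freqs = {}
for n in (10, 100, 10_000):
    freqs[n] = sum(abs(sample_mean(n) - 0.5) > eps for _ in range(1000)) / 1000
print(freqs)  # the exceedance frequencies decrease with n
```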
Exercise 1.4 Using the previous example, derive the weak law of large numbers as a consequence of the CLT, for iid random variables.
Proposition 1.8 Consider two strictly positive sequences \(a_n,b_n\to 0.\) The following properties hold:
- \(o_\mathbb{P}(a_n)=O_\mathbb{P}(a_n)\) (little-\(o_\mathbb{P}\) implies big-\(O_\mathbb{P}\)).
- \(o(1)=o_\mathbb{P}(1),\) \(O(1)=O_\mathbb{P}(1)\) (deterministic implies probabilistic).
- \(kO_\mathbb{P}(a_n)=O_\mathbb{P}(a_n),\) \(ko_\mathbb{P}(a_n)=o_\mathbb{P}(a_n),\) \(k\in\mathbb{R}.\)
- \(o_\mathbb{P}(a_n)+o_\mathbb{P}(b_n)=o_\mathbb{P}(a_n+b_n),\) \(O_\mathbb{P}(a_n)+O_\mathbb{P}(b_n)=O_\mathbb{P}(a_n+b_n).\)
- \(o_\mathbb{P}(a_n)+O_\mathbb{P}(b_n)=O_\mathbb{P}(a_n+b_n),\) \(o_\mathbb{P}(a_n)o_\mathbb{P}(b_n)=o_\mathbb{P}(a_nb_n).\)
- \(O_\mathbb{P}(a_n)O_\mathbb{P}(b_n)=O_\mathbb{P}(a_nb_n),\) \(o_\mathbb{P}(a_n)O_\mathbb{P}(b_n)=o_\mathbb{P}(a_nb_n).\)
- \(o_\mathbb{P}(1)O_\mathbb{P}(a_n)=o_\mathbb{P}(a_n).\)
- \((1+o_\mathbb{P}(1))^{-1}=O_\mathbb{P}(1).\)
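The last property can be probed numerically (a sketch with \(R_n=Z/\sqrt{n},\) \(Z\sim\mathcal{N}(0,1),\) a convenient choice satisfying \(R_n=o_\mathbb{P}(1)\)): the upper quantiles of \(|1/(1+R_n)|\) remain bounded as \(n\) grows.

```python
import random

# With R_n = Z / sqrt(n) = o_P(1), the sequence 1 / (1 + R_n) is O_P(1):
# its empirical 99% quantile in absolute value stays bounded as n grows.
random.seed(5)

def q99_abs_inverse(n, reps=5000):
    vals = sorted(abs(1.0 / (1.0 + random.gauss(0.0, 1.0) / n ** 0.5))
                  for _ in range(reps))
    return vals[int(0.99 * reps)]

q_n100, q_n10000 = q99_abs_inverse(100), q99_abs_inverse(10_000)
print(q_n100, q_n10000)  # both quantiles are moderate; the second is near 1
```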