## 1.4$$O_\mathbb{P}$$ and $$o_\mathbb{P}$$ notation

In computer science the $$O$$ notation used to measure the complexity of algorithms. For example, when an algorithm is $$O(n^2)$$, it is said that it is quadratic in time and we know that is going to take on the order of $$n^2$$ operations to process an input of size $$n$$. We do not care about the specific amount of computations, but focus on the big picture by looking for an upper bound for the sequence of computation times in terms of $$n$$. This upper bound disregards constants. For example, the dot product between two vectors of size $$n$$ is an $$O(n)$$ operation, albeit it takes $$n$$ multiplications and $$n-1$$ sums, hence $$2n-1$$ operations.

In mathematical analysis, $$O$$-related notation is mostly used with the aim of bounding sequences that shrink to zero. The technicalities are however the same.

Definition 1.5 (Big $$O$$) Given two strictly positive sequences $$a_n$$ and $$b_n$$,

\begin{align*} a_n=O(b_n):\iff \lim_{n\to\infty}\frac{a_n}{b_n}\leq C,\text{ for a }C>0. \end{align*}

If $$a_n=O(b_n)$$, then we say that $$a_n$$ is big-$$O$$ of $$b_n$$. To indicate that $$a_n$$ is bounded, we write $$a_n=O(1)$$.

Definition 1.6 (Little $$o$$) Given two strictly positive sequences $$a_n$$ and $$b_n$$,

\begin{align*} a_n=o(b_n):\iff \lim_{n\to\infty}\frac{a_n}{b_n}=0. \end{align*}

If $$a_n=o(b_n)$$, then we say that $$a_n$$ is little-$$o$$ of $$b_n$$. To indicate that $$a_n\to0$$, we write $$a_n=o(1)$$.

The interpretation of these two definitions is simple:

• $$a_n=O(b_n)$$ means that $$a_n$$ “not larger than” $$b_n$$ asymptotically. If $$a_n,b_n\to0$$, then it means that $$a_n$$ “does not decrease slower” than $$b_n$$.
• $$a_n=o(b_n)$$ means that $$a_n$$ is “smaller than” $$b_n$$ asymptotically. If $$a_n,b_n\to0$$, then it means that $$a_n$$ “decrease faster” than $$b_n$$.

Obviously, little-$$o$$ implies big-$$O$$. Playing with limits we can get a list of useful facts:

Proposition 1.7 Consider three sequences $$a_n,b_n,c_n\to0$$. The following properties hold:

1. $$kO(a_n)=O(a_n)$$, $$ko(a_n)=o(a_n)$$, $$k\in\mathbb{R}$$.
2. $$o(a_n)+o(b_n)=o(a_n+b_n)$$, $$O(a_n)+O(b_n)=O(a_n+b_n)$$.
3. $$o(a_n)+O(b_n)=O(a_n+b_n)$$, $$o(a_n)o(b_n)=o(a_nb_n)$$.
4. $$O(a_n)O(b_n)=O(a_nb_n)$$, $$o(a_n)O(b_n)=o(a_nb_n)$$.
5. $$o(1)O(a_n)=o(a_n)$$.
6. $$a_n^r=o(a_n^s)$$, for $$r>s\geq 0$$.
7. $$a_nb_n=o(a_n^2+b_n^2)$$.
8. $$a_nb_n=o(a_n+b_n)$$.
9. $$(a_n+b_n)^k=O(a_n^k+b_n^k)$$.

The last result is a consequence of a useful inequality:

Lemma 1.1 ($$C_p$$ inequality) Given $$a,b\in\mathbb{R}$$ and $$p>0$$,

\begin{align*} |a+b|^p\leq C_p(|a|^p+|b|^p), \quad C_p=\begin{cases} 1,&p\leq1,\\ 2^{p-1},&p>1. \end{cases} \end{align*}

The previous notation is purely deterministic. Let’s add some stochastic flavor by establishing the stochastic analogous of little-$$o$$.

Definition 1.7 (Little $$o_\mathbb{P}$$) Given a strictly positive sequence $$a_n$$ and a sequence of random variables $$X_n$$,

\begin{align*} X_n=o_\mathbb{P}(a_n):\iff& \frac{|X_n|}{a_n}\stackrel{\mathbb{P}}{\longrightarrow}0 \\ \iff& \lim_{n\to\infty}\mathbb{P}\left[\frac{|X_n|}{a_n}>\varepsilon\right]=0,\;\forall\varepsilon>0. \end{align*}

If $$X_n=o_\mathbb{P}(a_n)$$, then we say that $$X_n$$ is little-$$o_\mathbb{P}$$ of $$a_n$$. To indicate that $$X_n\stackrel{\mathbb{P}}{\longrightarrow}0$$, we write $$X_n=o_\mathbb{P}(1)$$.

Therefore, little-$$o_\mathbb{P}$$ allows to quantify easily the speed at which a sequence of random variables converges to zero in probability.

Example 1.3 Let $$Y_n=o_\mathbb{P}\big(n^{-1/2}\big)$$ and $$Z_n=o_\mathbb{P}\left(n^{-1}\right)$$. Then $$Z_n$$ converges faster to zero in probability than $$Y_n$$. To visualize this, recall that the little-$$o_\mathbb{P}$$ and limit definitions entail that

\begin{align*} \forall\varepsilon,\delta>0,\,\exists \,n_0=n_0(\varepsilon,\delta):\forall n\geq n_0,\, \mathbb{P}\left[|X_n|>a_n\varepsilon\right]<\delta. \end{align*}

Therefore, for fixed $$\varepsilon,\delta>0$$, and a fixed $$n\geq\max(n_{0,Y},n_{0,Z})$$, then $$\mathbb{P}\left[Y_n\in\big(-n^{-1/2}\varepsilon,n^{-1/2}\varepsilon\big)\right]<1-\delta$$ and $$\mathbb{P}\left[Z_n\in(-n^{-1}\varepsilon,n^{-1}\varepsilon)\right]<1-\delta$$, but the latter interval is much shorter, hence $$Z_n$$ is more tightly concentrated around $$0$$.

Definition 1.8 (Big $$O_\mathbb{P}$$) Given a strictly positive sequence $$a_n$$ and a sequence of random variables $$X_n$$,

\begin{align*} X_n=O_\mathbb{P}(a_n):\iff&\forall\varepsilon>0,\,\exists\,C=C_\varepsilon>0,\,n_0=n_0(\varepsilon):\\ &\forall n\geq n_0,\, \mathbb{P}\left[\frac{|X_n|}{a_n}>C_\varepsilon\right]<\varepsilon\\ \iff& \lim_{C\to\infty}\lim_{n\to\infty}\mathbb{P}\left[\frac{|X_n|}{a_n}>C\right]=0. \end{align*}

If $$X_n=O_\mathbb{P}(a_n)$$, then we say that $$X_n$$ is big-$$O_\mathbb{P}$$ of $$a_n$$.

Example 1.4 Chebyshev inequality entails that $$\mathbb{P}[|X-\mathbb{E}[X]|\geq t]\leq \frac{\mathbb{V}\mathrm{ar}[X]}{t^2}$$, $$\forall t>0$$. Setting $$\varepsilon:=\frac{\mathbb{V}\mathrm{ar}[X]}{t^2}$$ and $$C_\varepsilon:=\frac{1}{\sqrt{\varepsilon}}$$, then $$\mathbb{P}\left[|X-\mathbb{E}[X]|\geq \sqrt{\mathbb{V}\mathrm{ar}[X]}C_\varepsilon\right]\leq \varepsilon$$. Therefore,

\begin{align*} X-\mathbb{E}[X]=O_\mathbb{P}\left(\sqrt{\mathbb{V}\mathrm{ar}[X]}\right). \end{align*}

Exercise 1.3 Prove that if $$X_n\stackrel{d}{\longrightarrow}X$$, then $$X_n=O_\mathbb{P}(1)$$. Hint: use the double-limit definition and note that $$X=O_\mathbb{P}(1)$$.

Example 1.5 (Example 1.18 in DasGupta (2008)) Suppose that $$a_n(X_n-c_n)\stackrel{d}{\longrightarrow}X$$ for deterministic sequences $$a_n$$ and $$c_n$$ such that $$c_n\to c$$. Then, if $$a_n\to\infty$$, $$X_n-c=o_\mathbb{P}(1)$$. The argument is simple:

\begin{align*} X_n-c&=X_n-c_n+c_n-c\\ &=\frac{1}{a_n}a_n(X_n-c_n)+o(1)\\ &=\frac{1}{a_n}O_\mathbb{P}(1)+o(1). \end{align*}

Exercise 1.4 Using the previous example, derive the weak law of large numbers as a consequence of the CLT, for iid random variables.

Proposition 1.8 Consider two strictly positive sequences $$a_n,b_n\to 0$$. The following properties hold:

1. $$o_\mathbb{P}(a_n)=O_\mathbb{P}(a_n)$$ (little-$$o_\mathbb{P}$$ implies big-$$O_\mathbb{P}$$).
2. $$o(1)=o_\mathbb{P}(1)$$, $$O(1)=O_\mathbb{P}(1)$$ (deterministic implies probabilistic).
3. $$kO_\mathbb{P}(a_n)=O_\mathbb{P}(a_n)$$, $$ko_\mathbb{P}(a_n)=o_\mathbb{P}(a_n)$$, $$k\in\mathbb{R}$$.
4. $$o_\mathbb{P}(a_n)+o_\mathbb{P}(b_n)=o_\mathbb{P}(a_n+b_n)$$, $$O_\mathbb{P}(a_n)+O_\mathbb{P}(b_n)=O_\mathbb{P}(a_n+b_n)$$.
5. $$o_\mathbb{P}(a_n)+O_\mathbb{P}(b_n)=O_\mathbb{P}(a_n+b_n)$$, $$o_\mathbb{P}(a_n)o_\mathbb{P}(b_n)=o_\mathbb{P}(a_nb_n)$$.
6. $$O_\mathbb{P}(a_n)O_\mathbb{P}(b_n)=O_\mathbb{P}(a_nb_n)$$, $$o_\mathbb{P}(a_n)O_\mathbb{P}(b_n)=o_\mathbb{P}(a_nb_n)$$.
7. $$o_\mathbb{P}(1)O_\mathbb{P}(a_n)=o_\mathbb{P}(a_n)$$.
8. $$(1+o_\mathbb{P}(1))^{-1}=O_\mathbb{P}(1)$$.

### References

DasGupta, A. 2008. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/978-0-387-75971-5.