1.4 OP and oP notation

In computer science, the O notation is used to measure the complexity of algorithms. For example, when an algorithm is O(n^2), it is said to be quadratic in time, and we know that it is going to take on the order of n^2 operations to process an input of size n. We do not care about the specific number of computations, but focus on the big picture by looking for an upper bound on the sequence of computation times in terms of n. This upper bound disregards constants. For example, the dot product between two vectors of size n is an O(n) operation, although it takes n multiplications and n-1 sums, hence 2n-1 operations.
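To make the constant-dropping concrete, here is a minimal Python sketch (an illustration, not part of the text) that counts the exact number of operations in a naive dot product:

```python
# Naive dot product of two length-n vectors, counting every arithmetic
# operation: n multiplications and n - 1 additions, i.e. 2n - 1 in total.
def dot_with_count(x, y):
    assert len(x) == len(y)
    mults, adds = 0, 0
    total = x[0] * y[0]
    mults += 1
    for i in range(1, len(x)):
        total += x[i] * y[i]
        mults += 1
        adds += 1
    return total, mults + adds

n = 1000
value, ops = dot_with_count(list(range(n)), [1.0] * n)
# ops == 2n - 1; the constant 2 is what the O(n) bound disregards.
```

The exact count 2n-1 grows, up to the constant 2, like n, which is all that O(n) records.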

In mathematical analysis, O-related notation is mostly used with the aim of bounding sequences that shrink to zero. The technicalities, however, are the same.

Definition 1.5 (Big $O$) Given two strictly positive sequences a_n and b_n,

\begin{align*} a_n=O(b_n):\iff \limsup_{n\to\infty}\frac{a_n}{b_n}<\infty. \end{align*}

If a_n=O(b_n), then we say that a_n is big-O of b_n. To indicate that a_n is bounded, we write a_n=O(1).

Definition 1.6 (Little $o$) Given two strictly positive sequences a_n and b_n,

\begin{align*} a_n=o(b_n):\iff \lim_{n\to\infty}\frac{a_n}{b_n}=0. \end{align*}

If a_n=o(b_n), then we say that a_n is little-o of b_n. To indicate that a_n\to0, we write a_n=o(1).
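A quick numerical check of these definitions (a sketch with illustrative sequences): with a_n = 1/n^2 and b_n = 1/n, the ratio a_n/b_n = 1/n tends to 0, so a_n = o(b_n), and in particular a_n = O(b_n).

```python
# The ratio a_n / b_n = (1/n^2) / (1/n) = 1/n shrinks to 0 as n grows,
# which is exactly the little-o condition a_n = o(b_n).
ratios = [(1 / n**2) / (1 / n) for n in (10, 100, 1000, 10000)]
monotone_to_zero = all(r2 < r1 for r1, r2 in zip(ratios, ratios[1:]))
```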

The interpretation of these two definitions is simple:

  • a_n=O(b_n) means that a_n is “not larger than” b_n asymptotically. If a_n,b_n\to0, then it means that a_n “does not decrease slower” than b_n.
  • a_n=o(b_n) means that a_n is “smaller than” b_n asymptotically. If a_n,b_n\to0, then it means that a_n “decreases faster” than b_n.

Obviously, little-o implies big-O. Playing with limits we can get a list of useful facts:

Proposition 1.7 Consider three sequences a_n,b_n,c_n\to0. The following properties hold:

  1. kO(a_n)=O(a_n), ko(a_n)=o(a_n), k\in\mathbb{R}.
  2. o(a_n)+o(b_n)=o(a_n+b_n), O(a_n)+O(b_n)=O(a_n+b_n).
  3. o(a_n)+O(b_n)=O(a_n+b_n), o(a_n)o(b_n)=o(a_nb_n).
  4. O(a_n)O(b_n)=O(a_nb_n), o(a_n)O(b_n)=o(a_nb_n).
  5. o(1)O(a_n)=o(a_n).
  6. a_n^r=o(a_n^s), for r>s\geq 0.
  7. a_nb_n=o(a_n^2+b_n^2).
  8. a_nb_n=o(a_n+b_n).
  9. (a_n+b_n)^k=O(a_n^k+b_n^k).
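These properties can be sanity-checked numerically. For instance, for property 7 with the illustrative choices a_n = 1/n and b_n = 1/n^2 (my own, not from the text), the ratio a_nb_n/(a_n^2+b_n^2) behaves like 1/n and hence tends to 0:

```python
# Property 7 check: a_n * b_n = o(a_n^2 + b_n^2) for a_n = 1/n, b_n = 1/n^2.
# The ratio (a_n * b_n) / (a_n^2 + b_n^2) should shrink to 0 with n.
def ratio(n):
    a, b = 1 / n, 1 / n**2
    return (a * b) / (a**2 + b**2)

vals = [ratio(n) for n in (10, 100, 1000)]
decreasing = vals[0] > vals[1] > vals[2]
```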

The last result is a consequence of a useful inequality:

Lemma 1.1 ($C_p$ inequality) Given a,b\in\mathbb{R} and p>0,

\begin{align*} |a+b|^p\leq C_p(|a|^p+|b|^p), \quad C_p=\begin{cases} 1,&p\leq1,\\ 2^{p-1},&p>1. \end{cases} \end{align*}
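The inequality is easy to probe by simulation. The following sketch draws random a, b and p covering both regimes of C_p (a numerical check, not a proof):

```python
import random

# Monte Carlo check of the C_p inequality: |a+b|^p <= C_p (|a|^p + |b|^p),
# with C_p = 1 for p <= 1 and C_p = 2^(p-1) for p > 1.
random.seed(42)

def c_p(p):
    return 1.0 if p <= 1 else 2.0 ** (p - 1)

holds = all(
    abs(a + b) ** p <= c_p(p) * (abs(a) ** p + abs(b) ** p) + 1e-12
    for a, b, p in (
        (random.uniform(-5, 5), random.uniform(-5, 5), random.choice([0.5, 1, 2, 3]))
        for _ in range(1000)
    )
)
```

The small tolerance 1e-12 guards against floating-point rounding in the borderline cases.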

The previous notation is purely deterministic. Let’s add some stochastic flavor by establishing the stochastic analogue of little-o.

Definition 1.7 (Little $o_\mathbb{P}$) Given a strictly positive sequence a_n and a sequence of random variables X_n,

\begin{align*} X_n=o_\mathbb{P}(a_n):\iff& \frac{|X_n|}{a_n}\stackrel{\mathbb{P}}{\longrightarrow}0 \\ \iff& \lim_{n\to\infty}\mathbb{P}\left[\frac{|X_n|}{a_n}>\varepsilon\right]=0,\;\forall\varepsilon>0. \end{align*}

If X_n=o_\mathbb{P}(a_n), then we say that X_n is little-o_\mathbb{P} of a_n. To indicate that X_n\stackrel{\mathbb{P}}{\longrightarrow}0, we write X_n=o_\mathbb{P}(1).
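As a quick Monte Carlo illustration (a sketch under the assumption X_n = Z/n with Z\sim\mathcal{N}(0,1), which is not part of the text): then |X_n|/n^{-1/2} = |Z|/\sqrt{n}\to0 in probability, so X_n=o_\mathbb{P}(n^{-1/2}), and the exceedance probability in the definition visibly vanishes with n.

```python
import random

# X_n = Z / n with Z ~ N(0, 1), so |X_n| / n^{-1/2} = |Z| / sqrt(n).
# Estimate P[|X_n| / n^{-1/2} > eps] for growing n: it should decay to 0.
random.seed(1)
eps, reps = 0.5, 5000

def exceed_prob(n):
    hits = sum(abs(random.gauss(0, 1)) / n ** 0.5 > eps for _ in range(reps))
    return hits / reps

probs = [exceed_prob(n) for n in (1, 10, 100)]
```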

Therefore, little-o_\mathbb{P} allows us to easily quantify the speed at which a sequence of random variables converges to zero in probability.

Example 1.3 Let Y_n=o_\mathbb{P}\big(n^{-1/2}\big) and Z_n=o_\mathbb{P}\left(n^{-1}\right). Then Z_n converges faster to zero in probability than Y_n. To visualize this, recall that the little-o_\mathbb{P} and limit definitions entail that

\begin{align*} \forall\varepsilon,\delta>0,\,\exists \,n_0=n_0(\varepsilon,\delta):\forall n\geq n_0,\, \mathbb{P}\left[|X_n|>a_n\varepsilon\right]<\delta. \end{align*}

Therefore, for fixed \varepsilon,\delta>0 and a fixed n\geq\max(n_{0,Y},n_{0,Z}), \mathbb{P}\left[Y_n\in\big[-n^{-1/2}\varepsilon,n^{-1/2}\varepsilon\big]\right]>1-\delta and \mathbb{P}\left[Z_n\in[-n^{-1}\varepsilon,n^{-1}\varepsilon]\right]>1-\delta, but the latter interval is much shorter, hence Z_n is more tightly concentrated about 0.
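A small simulation can make this concentration visible. Under the hypothetical choices Y_n = V/n and Z_n = V/n^2 with V\sim\mathrm{Unif}(-1,1) (so that Y_n=o_\mathbb{P}(n^{-1/2}) and Z_n=o_\mathbb{P}(n^{-1})), at a fixed n the draws of Z_n live in a far shorter interval around 0 than those of Y_n:

```python
import random

# At a fixed n, compare the spread of Y_n = V/n (rate o_P(n^{-1/2})) against
# Z_n = V/n^2 (rate o_P(n^{-1})), with V ~ Unif(-1, 1).
random.seed(7)
n, reps = 50, 2000
y = [random.uniform(-1, 1) / n for _ in range(reps)]
z = [random.uniform(-1, 1) / n**2 for _ in range(reps)]
max_abs_y = max(abs(v) for v in y)  # at most 1/n
max_abs_z = max(abs(v) for v in z)  # at most 1/n^2, much smaller
```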

Definition 1.8 (Big $O_\mathbb{P}$) Given a strictly positive sequence a_n and a sequence of random variables X_n,

\begin{align*} X_n=O_\mathbb{P}(a_n):\iff&\forall\varepsilon>0,\,\exists\,C=C_\varepsilon>0,\,n_0=n_0(\varepsilon):\\ &\forall n\geq n_0,\, \mathbb{P}\left[\frac{|X_n|}{a_n}>C_\varepsilon\right]<\varepsilon\\ \iff& \lim_{C\to\infty}\lim_{n\to\infty}\mathbb{P}\left[\frac{|X_n|}{a_n}>C\right]=0. \end{align*}

If X_n=O_\mathbb{P}(a_n), then we say that X_n is big-O_\mathbb{P} of a_n.

Example 1.4 Chebyshev’s inequality entails that \mathbb{P}[|X-\mathbb{E}[X]|\geq t]\leq \frac{\mathbb{V}\mathrm{ar}[X]}{t^2}, \forall t>0. Setting \varepsilon:=\frac{\mathbb{V}\mathrm{ar}[X]}{t^2} and C_\varepsilon:=\frac{1}{\sqrt{\varepsilon}}, we get \mathbb{P}\left[|X-\mathbb{E}[X]|\geq \sqrt{\mathbb{V}\mathrm{ar}[X]}C_\varepsilon\right]\leq \varepsilon. Therefore,

\begin{align*} X-\mathbb{E}[X]=O_\mathbb{P}\left(\sqrt{\mathbb{V}\mathrm{ar}[X]}\right). \end{align*}
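The bound behind this example can be checked by simulation. With the illustrative choice X\sim\mathrm{Exp}(1) (mean 1, variance 1, my own assumption), the empirical exceedance frequency indeed sits below the Chebyshev bound 1/C^2:

```python
import random

# Chebyshev check for X ~ Exp(1): P[|X - 1| >= C] <= Var[X] / C^2 = 1 / C^2.
random.seed(3)
reps, C = 10000, 3.0
exceed = sum(abs(random.expovariate(1.0) - 1.0) >= C for _ in range(reps)) / reps
chebyshev_bound = 1 / C**2  # Var[X] / C^2 with Var[X] = 1
```

Chebyshev's bound is loose here (the true probability is e^{-4}\approx0.018 versus the bound 1/9\approx0.111), but it holds for any X with finite variance.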

Exercise 1.3 Prove that if X_n\stackrel{d}{\longrightarrow}X, then X_n=O_\mathbb{P}(1). Hint: use the double-limit definition and note that X=O_\mathbb{P}(1).

Example 1.5 (Example 1.18 in DasGupta (2008)) Suppose that a_n(X_n-c_n)\stackrel{d}{\longrightarrow}X for deterministic sequences a_n and c_n such that c_n\to c. Then, if a_n\to\infty, X_n-c=o_\mathbb{P}(1). The argument is simple:

\begin{align*} X_n-c&=X_n-c_n+c_n-c\\ &=\frac{1}{a_n}a_n(X_n-c_n)+o(1)\\ &=o(1)O_\mathbb{P}(1)+o(1)\\ &=o_\mathbb{P}(1), \end{align*}

where the last two equalities follow from a_n\to\infty, Exercise 1.3, and Proposition 1.8.
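The conclusion can be visualized with hypothetical concrete choices (mine, not from DasGupta (2008)): take a_n=\sqrt{n}, c_n=c+1/n\to c, and X_n=c_n+Z/a_n with Z\sim\mathcal{N}(0,1), so that a_n(X_n-c_n)=Z exactly. Then \mathbb{P}[|X_n-c|>\varepsilon] should vanish as n grows:

```python
import random

# X_n = c_n + Z / a_n with a_n = sqrt(n), c_n = c + 1/n, Z ~ N(0, 1), so that
# a_n (X_n - c_n) = Z. Estimate P[|X_n - c| > eps] for growing n.
random.seed(11)
c, eps, reps = 2.0, 0.1, 4000

def exceed_prob(n):
    a_n, c_n = n ** 0.5, c + 1 / n
    hits = sum(abs(c_n + random.gauss(0, 1) / a_n - c) > eps for _ in range(reps))
    return hits / reps

probs = [exceed_prob(n) for n in (10, 100, 1000)]
```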

Exercise 1.4 Using the previous example, derive the weak law of large numbers as a consequence of the CLT, for iid random variables.

Proposition 1.8 Consider two strictly positive sequences a_n,b_n\to 0. The following properties hold:

  1. o_\mathbb{P}(a_n)=O_\mathbb{P}(a_n) (little-o_\mathbb{P} implies big-O_\mathbb{P}).
  2. o(1)=o_\mathbb{P}(1), O(1)=O_\mathbb{P}(1) (deterministic implies probabilistic).
  3. kO_\mathbb{P}(a_n)=O_\mathbb{P}(a_n), ko_\mathbb{P}(a_n)=o_\mathbb{P}(a_n), k\in\mathbb{R}.
  4. o_\mathbb{P}(a_n)+o_\mathbb{P}(b_n)=o_\mathbb{P}(a_n+b_n), O_\mathbb{P}(a_n)+O_\mathbb{P}(b_n)=O_\mathbb{P}(a_n+b_n).
  5. o_\mathbb{P}(a_n)+O_\mathbb{P}(b_n)=O_\mathbb{P}(a_n+b_n), o_\mathbb{P}(a_n)o_\mathbb{P}(b_n)=o_\mathbb{P}(a_nb_n).
  6. O_\mathbb{P}(a_n)O_\mathbb{P}(b_n)=O_\mathbb{P}(a_nb_n), o_\mathbb{P}(a_n)O_\mathbb{P}(b_n)=o_\mathbb{P}(a_nb_n).
  7. o_\mathbb{P}(1)O_\mathbb{P}(a_n)=o_\mathbb{P}(a_n).
  8. (1+o_\mathbb{P}(1))^{-1}=O_\mathbb{P}(1).
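Property 8 can also be probed numerically. With the illustrative choice e_n = Z/\sqrt{n}, Z\sim\mathcal{N}(0,1) (so e_n=o_\mathbb{P}(1)), a single constant C bounds |(1+e_n)^{-1}| with high probability for all large n:

```python
import random

# Property 8 check: if e_n = Z / sqrt(n) = o_P(1), then (1 + e_n)^{-1} is
# bounded in probability. Estimate P[|1 / (1 + e_n)| > C] for growing n.
random.seed(5)
reps, C = 5000, 3.0

def exceed_prob(n):
    hits = sum(abs(1 / (1 + random.gauss(0, 1) / n ** 0.5)) > C
               for _ in range(reps))
    return hits / reps

probs = [exceed_prob(n) for n in (100, 1000, 10000)]
```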

References

DasGupta, A. 2008. Asymptotic Theory of Statistics and Probability. Springer Texts in Statistics. New York: Springer. https://doi.org/10.1007/978-0-387-75971-5.