4.2 Pseudo Random Numbers

We will investigate ways to simulate random numbers using algorithms on a computer. Because they are produced by an algorithm, such numbers are usually called pseudo-random numbers. Pseudo means false, in the sense that the numbers are not really random! They are generated according to a deterministic algorithm whose aim is to imitate as closely as possible what randomness would look like. In particular, for numbers \(u_1,\dots,u_N\) it means that they should look like independent instances of a Uniform distribution between zero and one.

Possible departures from ideal numbers are:

  • the numbers are not uniformly distributed;

  • the mean of the numbers might not be 1/2;

  • the variance of the numbers might not be 1/12;

  • the numbers might be discrete-valued instead of continuous;

  • independence might not hold.

We already looked at examples of departures from the assumptions, but we will later study how to assess these departures more formally.
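As an informal illustration of the checks listed above, the following sketch draws a sample with R's built-in runif function and compares simple summaries with their ideal values. The sample size of 10000 and the choice of ten bins are arbitrary assumptions made for this example, and these quick checks are not a substitute for formal tests.

```r
# Fix the starting point so that the output is repeatable.
set.seed(2021)

# Draw a large sample from R's built-in uniform generator.
u <- runif(10000)

# Sample mean and variance: for ideal Uniform(0, 1) numbers these
# should be close to 1/2 and 1/12 (about 0.0833).
mean(u)
var(u)

# All values should lie between zero and one.
range(u)

# Rough check of uniformity: each of the ten equal-width bins
# should contain about 10% of the values.
table(cut(u, breaks = seq(0, 1, by = 0.1))) / length(u)

# Rough check of independence: the correlation between
# consecutive values should be close to zero.
cor(u[-1], u[-length(u)])
```

Summaries close to the ideal values do not prove that the numbers are good, but large discrepancies would point to one of the departures above.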

Before looking at how we can construct pseudo-random numbers, let’s discuss some important properties/considerations that need to be taken into account when generating pseudo-random numbers:

  • the random generation should be very fast. In practice, we want to use random numbers to do other computations (for example simulate a little donut shop) and such computations might be computationally intensive: if random generation were to be slow, we would not be able to perform them.

  • the cycle of the randomly generated numbers should be long. The cycle is the length of the sequence before the numbers start to repeat themselves.

  • the random numbers should be repeatable. Given a starting point of the algorithm, it should be possible to repeat the exact same sequence of numbers. This is fundamental for debugging and for reproducibility.

  • the method should be applicable in any programming language/platform.

  • and of course most importantly, the random numbers should be independent and uniformly distributed.
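To make the notion of a cycle concrete, here is a minimal sketch based on a made-up toy rule on the integers 0 to 15. The function name toy_next and the constants 5, 3 and 16 are hypothetical choices for illustration only and do not correspond to a generator used in practice.

```r
# Hypothetical toy rule on the integers 0, ..., 15:
# next value = (5 * current value + 3) mod 16.
toy_next <- function(x) (5 * x + 3) %% 16

# Apply the rule repeatedly from a starting value and count
# the steps needed before the starting value reappears.
start <- 7
x <- toy_next(start)
cycle <- 1
while (x != start) {
  x <- toy_next(x)
  cycle <- cycle + 1
}

cycle  # 16: after 16 steps the sequence repeats itself
```

A rule like this can never produce more than 16 distinct values, so it would be useless for simulation. Practical generators have vastly longer cycles: R's default generator, the Mersenne Twister, has a cycle of length 2^19937 - 1.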

Repeatability of the pseudo-random numbers is worth further consideration. It is fundamental in science to be able to reproduce experiments so that the validity of results can be assessed. In R there is a specific function that allows us to do this, called set.seed. Its argument, the seed, is an integer that fixes the starting point of the algorithm. Any integer will do, and a common convention is to use the current year. So henceforth you will see the command:

set.seed(2021)

This ensures that every time the code following set.seed is run, the same results will be observed. Examples of this are given below.
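A minimal sketch of this behaviour, using runif as the source of random numbers (the variable names first, second and third are chosen only for this example):

```r
# Set the starting point and draw five numbers.
set.seed(2021)
first <- runif(5)

# Without resetting the seed the generator simply continues,
# so these five numbers differ from the first five.
second <- runif(5)

# Resetting the seed restarts the sequence from the same point.
set.seed(2021)
third <- runif(5)

identical(first, third)   # TRUE: same seed, same numbers
identical(first, second)  # FALSE: the sequence has moved on
```

Note that set.seed has to be called again each time the same sequence is wanted: it fixes where the sequence starts, not every later call to runif.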