6.1 Random Number Generation

In order to use the simulation-based techniques described in this book, we will need to be able to generate sequences of random numbers. Most of the time in the software we are using, there is a function that will do this for us. For example, in R, if we want to generate uniformly distributed random numbers over a fixed interval, we can use the runif() function.

Nevertheless, there are two issues that are worth considering here:

It is useful to know a little about what is going on under the hood of these random number generators. How exactly is the sequence of numbers created?
Built-in functions in R are only useful for well-known or well-characterized distributions. However, wil many simulation-based techniques, we will want to generate random numbers from distributions that we have likely never seen before and for which there will not be any built-in function.

6.1.1 Pseudo-random Numbers

The truth is that R, along with most other analytics packages, does not generate genuine random numbers. R generates pseudo-random numbers that appear to be random but are actually generated in a deterministic way. This approach sounds worse, but it’s actually better for two reasons. First, generating genuine random numbers can be slow and often will depend on some outside source of entropy/randomness. Second, genuine random numbers are not reproducible, so if you wanted to re-create some results based on simulations, you would not ever be able to do so.

Pseudo-random number generators (PRNGs have a long history that we will not cover here. One useful thing to know is that this is tricky area and it is not simple to wander in and start developing your own PRNGs. It is useful to know how the systems work, but after that it’s best to leave the specifics to the experts.

The most commonly used class of PRNGs in scientific applications is the linear congruential generator. The basic idea behind an LCG is that we have a starting seed, and then from there we generate pseudo-random numbers via a recurrence relation. Most LCGs have the form $X_{n+1} = (aX_n + c)~\text{mod}~m$ where $a$ is called the multiplier, $c$ is the increment, and $m$ is the modulus. For $n=0$ , the value $X_0$ is the seed. Modular arithmetic is needed in order to prevent the sequence from going off to infinity. For most generators, the values $X_0,X_1,\dots$ are integers. However, we could for example generate Uniform(0, 1) variates by taking $U_n = X_n / m$ .

Given the recurrence relation above and the modular arithmetic, the maximum number of distinct values that can be generated by an LCG is $m$ , so we would need $m$ to be very large. The hope is that if the PRNG is well-designed, the sequence should hit every number from $0$ to $m-1$ before repeating. If a number is repeated before all numbers are seen, then the generator has a period in it that is shorter than the maximal period that is possible. Setting the values of $a$ , $c$ , and $m$ is a tricky business and can require some experimentation. In summary, don’t do this at home. As an (historical) example, the random number generator proposed by the book Numerical Recipes specified that $a = 1664525$ , $c = 1013904223$ , and $m = 2^{32}$ .

Perhaps the biggest problem with using the historical LCGs for generating random numbers is that their periods are too short, even if they manage to hit the maximal period. Given the scale of simulations being conducted today, even a period of $2^{32}$ would likely be too short to appear sufficiently random. Most analytical software systems have since moved on to other more sophisticated generators. For example, the default in R is the Mersenne-Twister, which has a long period of $2^{19937}-1$ .

Further notes:

The randomness of a pseudo-random sequence can be checked via statistical tests of uniformity, such as the Kolmogorov-Smirnoff test, Chi-square test, or the Marsaglia “die hard” tests.
Many PRNGs generate sequences that look random in one dimension but do not look random when embedded into higher dimensions. It is possible for PRNGs to generate numbers that lie on a higher-dimensional hyperplane that still look random in one dimension.