2.5 Why is the Bayesian approach not that popular?

At this stage, we may wonder why the Bayesian statistical framework is not the dominant inferential approach, despite having its historical origin in 1763 (Thomas Bayes 1763), whereas the Frequentist statistical framework was largely developed in the early 20th century. The scientific battle over the Bayesian inferential approach lasted for 150 years, and this may be explained by some of the following facts.

There is an issue regarding apparent subjectivity, as the Bayesian inferential approach runs counter to the strong conviction that science demands objectivity, and Bayesian probability is a measure of degrees of belief, where the initial prior may be just a guess; this was not accepted as objective and rigorous science. Early critics said that Bayes was quantifying ignorance, since he assigned equal probabilities to all potential outcomes. As a consequence, prior distributions were damned (McGrayne 2011).

Bayes himself seems not to have believed in his own idea. Although he apparently achieved his breakthrough during the late 1740s, he did not send it to the Royal Society for publication. It was his friend Richard Price, another Presbyterian minister, who rediscovered Bayes’ idea, polished it and published it.

However, it was Laplace who independently generalized Bayes’ theorem in 1781. He used it initially in gambling problems, and soon after in astronomy, combining different sources of information to advance research in situations where data were scarce. He then wanted to use his discovery to find the probability of causes, thought that this required large data sets, and turned to demography. In this field, he had to perform large calculations that demanded the development of clever approximations, leading to Laplace’s approximation and the central limit theorem (P. Laplace 1812), although apparently at the cost of abandoning his research on Bayesian inference.

After Laplace’s death in 1827, Bayes’ rule disappeared from the scientific landscape for almost a century. The rule was forgotten partly because of personal attacks against Laplace, and partly because of the old-fashioned view that statistics has nothing to say about causation and that the prior is too subjective to be compatible with science. Nonetheless, practitioners used it to solve problems in astronomy, communication, medicine, and military and social issues, with remarkable results.

Thus, the concept of degrees of belief as a way to operationalize probability was abandoned in the name of scientific objectivity, and probability as the frequency with which an event occurs in many repeatable trials became the rule. Laplace’s critics argued that those concepts were diametric opposites, although Laplace considered them basically equivalent when large sample sizes are involved (McGrayne 2011).

The era of the Frequentists, or sampling theorists, began, led by Karl Pearson and his nemesis, Ronald Fisher. Both were brilliant, persuasive and dominant characters opposed to the inverse probability approach, which made it nearly impossible to argue against their ideas. Karl Pearson’s legacy was carried on by his son Egon and Egon’s friend, Jerzy Neyman, both of whom inherited the anti-Bayesian and anti-Fisher stance.

Despite the anti-Bayesian campaign among statisticians, there were some independent thinkers developing Bayesian ideas: Borel, Ramsey and de Finetti, each working in isolation in France, England and Italy, respectively. However, the anti-Bayesian trio of Fisher, Neyman and Egon Pearson got all the attention during the 1920s and 1930s. Only a geophysicist, Harold Jeffreys, kept Bayesian inference alive in the 1930s and 1940s. Jeffreys was a very quiet, shy, uncommunicative gentleman working at Cambridge in the astronomy department. Thanks to his character, he remained Fisher’s friend, although they were diametric opposites regarding the Bayesian inferential approach and fought intense intellectual battles. Unfortunately for the Bayesian approach, Jeffreys lost: he was very technical, using confusing high-level mathematics, and was concerned with inference from scientific evidence rather than with guiding future actions based on decision theory, which was very important for mathematical statistics in that era due to the Second World War. Fisher, on the other hand, was a dominant character, persuasive in public and a master of practice, and his techniques were written in a popular style with a minimum of mathematics.

However, Bayes’ rule achieved remarkable results in applied settings, such as at the AT&T company and in the social security system of the USA. Bayesian inference also played a relevant role during the Second World War and the Cold War. Alan Turing used inverse probability at Bletchley Park to crack the Enigma code used in German U-boat messages, Andrei Kolmogorov used it to improve the firing tables of Russia’s artillery, Bernard Koopman applied it to searching for targets in the open sea, and the RAND Corporation used it during the Cold War. Unfortunately, these Bayesian developments remained top secret for almost 40 years, keeping the contribution of inverse probability to modern human history classified.

During the 1950s and 1960s three mathematicians led the rebirth of the Bayesian approach: Good, Savage and Lindley. However, it seems that they were unwilling to apply their theories to real problems, and although the Bayesian approach proved its worth, for instance in business decisions, naval search and lung cancer studies, it was applied only to simple models due to its mathematical complexity and demand for large computations. But there were some breakthroughs that changed this. First, hierarchical models, introduced by Lindley and Smith, in which a complex model is decomposed into many easier-to-solve models; and second, Markov chain Monte Carlo (MCMC) methods, developed by Hastings in the 1970s (Hastings 1970) and the Geman brothers in the 1980s (Geman and Geman 1984). These methods were introduced into the Bayesian inferential framework in the 1990s by Gelfand and Smith (A. E. Gelfand and Smith 1990), when desktop computers gained enough computational power to solve complex models. Since then, the Bayesian inferential framework has gained increasing popularity among practitioners and scientists.
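To give a flavor of what these MCMC methods do, the following is a minimal sketch (not taken from the sources above) of a random-walk Metropolis-Hastings sampler, the algorithm in the spirit of Hastings (1970), applied to the posterior mean of a normal model with known unit variance and a standard normal prior. The simulated data, the proposal scale and the number of iterations are illustrative assumptions only.

```python
# Minimal sketch: random-walk Metropolis-Hastings for the posterior of a
# normal mean with known unit variance and a N(0, 1) prior.
# All numerical settings here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=50)  # simulated observations

def log_post(mu):
    # log prior N(0, 1) plus log likelihood N(mu, 1), up to an additive constant
    return -0.5 * mu**2 - 0.5 * np.sum((data - mu) ** 2)

samples = []
mu = 0.0                                        # starting value of the chain
for _ in range(10_000):
    prop = mu + rng.normal(scale=0.5)           # random-walk proposal
    log_alpha = log_post(prop) - log_post(mu)   # log acceptance ratio
    if np.log(rng.uniform()) < log_alpha:       # accept with probability min(1, alpha)
        mu = prop
    samples.append(mu)

print(np.mean(samples[2000:]))                  # posterior mean estimate after burn-in
```

The chain only requires the posterior density up to a normalizing constant, which is precisely why such simulation methods made complex Bayesian models computationally tractable.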

References

Bayes, Thomas. 1763. “LII. An Essay Towards Solving a Problem in the Doctrine of Chances. By the Late Rev. Mr. Bayes, F.R.S. Communicated by Mr. Price, in a Letter to John Canton, A.M.F.R.S.” Philosophical Transactions of the Royal Society of London, no. 53: 370–418.
Gelfand, A. E., and A. F. M. Smith. 1990. “Sampling-Based Approaches to Calculating Marginal Densities.” Journal of the American Statistical Association 85: 398–409.
Geman, S., and D. Geman. 1984. “Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images.” IEEE Transactions on Pattern Analysis and Machine Intelligence 6: 721–41.
Hastings, W. 1970. “Monte Carlo Sampling Methods Using Markov Chains and Their Application.” Biometrika 57: 97–109.
Laplace, P. 1812. Théorie Analytique Des Probabilités. Courcier.
McGrayne, Sharon Bertsch. 2011. The Theory That Would Not Die: How Bayes’ Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy. Yale University Press.