Taking Bayesianism Seriously

And thereby proving it wrong

Cedric Warny
15 min read · Aug 18, 2023

There are two main competing explanations of how knowledge is created: Bayesianism, and Popperianism. In this post, I set out to evaluate both explanations, and show that one of them is a bad explanation, by taking its tenets seriously. First, I present the two competing explanations of knowledge creation. Then, I argue the various ways in which Bayesian epistemology is misguided. I conclude that there is no such thing as a real-life Bayesian. This is an opinionated post, and I make no pretense of impartiality.

Popperian epistemology

Popper states that knowledge creation always starts with a problem. A problem consists of evidence that can be accounted for by two or more competing explanations (usually just two). Solving such a problem consists in ruling out all but one explanation. There are two ways to do that: (1) via criticism, or (2) via a crucial experimental test. The one remaining (“best”) explanation then becomes part of knowledge. Such knowledge is said to be objective, i.e. it exists independently of a knower. For instance, it can be embodied in technology, or as symbols in a textbook. The technology or the textbook could be deciphered by aliens. Or it could be put to productive use, effecting real-world transformations, independently of whoever or whatever produced that knowledge.

Option 1 for problem-solving (criticism) is non-experimental. It’s about taking an explanation seriously and determining whether it is a good explanation. There is no definitive theory of what a good explanation is, merely criteria, such as falsifiability, self-consistency, and whether or not it follows the rules of logic (which, by the way, are themselves subservient to a “theory of logic”, since the rules of logic, far from being God-given, are merely human conjectures, subject to falsification like any conjecture). One additional criterion, proposed by David Deutsch, is that a good explanation is hard to vary (I won’t dwell on what that means exactly).

Option 2 for problem-solving (testing) is about devising a test such that we can decisively rule out one of the competing explanations. In fact, that is the only point of experimentation. In particular, Popper argues that the point of experimentation is never to confirm an explanation — only to reject one. It also follows from this that truth is inaccessible, since no amount of evidence can make a theory “more true”. That is not the point of evidence. The role of science is not to weigh theories with evidence.

A discipline that allows for the second method of solving problems (testing) is a scientific discipline. A discipline that only allows for the first method of solving problems (criticism) may not be scientific, but it is rational. A good example of a rational yet unscientific discipline is mathematics, because it does not allow for empirical testing.

We’ve seen how we rule out explanations. But how do we come up with them in the first place? Through a creative process of conjecture. It’s a mysterious process that no one so far really understands, and that includes Popper. But he does explain what it is not: in particular, conjectures are not derived from observations. Rather than theories being derived from observations, Popper says that observations are interpreted through the lens of pre-existing theories. Theory always precedes observation. As Popper would say: observations are theory-laden. No amount of evidence can make a theory more true because the theory precedes evidence, and is not derived from it. It can only be ruled out by it. In other words, induction is impossible. It’s just not a thing! Russell’s chicken cannot know, purely from past observations, whether the farmer feeds him out of love or out of hunger.

Chicken, turkey, same difference (source)

One might be alarmed at the idea that theories cannot be “justified” (in the sense of being justified as true). Wouldn’t that preclude rational decision-making? How should we guide our actions? Don’t we need to act according to what is true? How do we make predictions if we can’t justify the theory we use to make those predictions? Isn’t there an infinity of explanations that can fit the evidence? This is known as “the problem of induction”.

The kind of questions raised by people who consider the impossibility of induction to be a problem betrays their concern for holding justified true beliefs. Those questions also highlight that a concern for justification and a concern for prediction are two sides of the same coin. And just as Popper argues that epistemology is not about justifying beliefs, he argues that it is not about prediction. For Popper, a theory/explanation is never justified. However, the use of a theory can be “justified” only in the sense that it is the only one that has not been ruled out by evidence. Not in the sense that it is the most probable. To Popperians, the impossibility of induction is not a problem. It’s irrelevant. Popperian epistemology seeks explanations, not predictions. Predictions are merely happy side effects when an explanation has reach.

Bayesian epistemology

To introduce Bayesianism, I’ll refer to what’s usually considered the Bayesian Mecca, i.e. the LessWrong website. And in this blog post, LessWrong presents the three core tenets of Bayesianism:

Core tenet 1: Any given observation has many different possible causes.

Core tenet 2: How we interpret any event, and the new information we get from anything, depends on information we already had.

Core tenet 3: We can use the concept of probability to measure our subjective belief in something. Furthermore, we can apply the mathematical laws regarding probability to choosing between different beliefs. If we want our beliefs to be correct, we must do so.

Tenet 1 underscores the first major difference between Popperianism and Bayesianism. While Popper states that an observation can only have a single explanation, Bayesians claim that an observation has many possible explanations. That said, it’s unclear whether Tenet 1 is denying that there’s only one true cause. Maybe what a Bayesian calls “multiple possible causes” corresponds to a Popperian’s “competing explanations”. Tenet 1 simply doesn’t specify whether the different possible causes eventually turn into a single actual cause. But Tenet 3 refers to “mathematical laws” ruling the probability of these causes. And those mathematical laws (of which Bayes’ rule is the centerpiece) do not offer a mechanism to set a probability to zero. Therefore, we have to conclude that, in Bayesian epistemology, an observation always has many possible causes. Since an infinity of causes/theories can fit a given set of observations, if we don’t have a mechanism to rule out causes, the “problem of induction” is exacerbated. (In this paragraph, I have taken the word “cause” to be synonymous with “explanation”. This might be confusing to some who regard multi-causality as an obvious fact of life. I don’t disagree; we’re just using the word differently. So when I say “single cause” I really mean “non-disjoint causal graph”, where the causal graph is the “explanation”.)
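To see why, consider Bayes’ rule itself. A minimal sketch (the numbers are invented purely for illustration):

```python
def bayes_update(prior: float, likelihood: float, evidence_prob: float) -> float:
    """Posterior P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / evidence_prob

# Invented numbers: a hypothesis with a tiny prior, and evidence
# that fits it poorly.
posterior = bayes_update(prior=0.001, likelihood=0.01, evidence_prob=0.5)
print(posterior)  # 2e-05: vanishingly small, but never exactly zero

# The only way to get a posterior of zero is to have started with a
# prior (or likelihood) of exactly zero, i.e. the hypothesis was never
# on the table to begin with. The update rule itself offers no
# mechanism for ruling a live hypothesis out.
```

Multiplying positive numbers can never produce zero: any hypothesis admitted into the prior stays in the posterior forever, merely shrinking.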

David Hume was the first to lose sleep over the problem of induction.

Now, having said that, most flesh-and-blood Bayesians will recognize that ruling out explanations is indeed possible and useful. But, within the framework of Bayesianism, this is a deus ex machina! Nothing allows for it. (The idea that real-life Bayesians routinely violate Bayesianism, hinting at the idea that there is no such thing as an actual Bayesian, will be a recurring theme in this post.)

Tenet 2 at first glance seems to correspond to Popper’s notion that “observation is theory-laden”. However, it’s unclear what this prior information is exactly. In Popper, such prior information is simply the best available explanation. But for a Bayesian, for whom there are always many possible causes (i.e. explanations), what “prior information” refers to is much less clear.

Tenet 3, in my opinion, is the key one. According to that tenet, probabilities capture subjective degrees of belief across multiple possible causes. First, this contrasts with the Popperian notion that knowledge is objective (it is objectively instantiated information, capable of self-perpetuation independently of a knower). Second, this tenet reveals the centrality of justification in Bayesian epistemology. The main point of the epistemology is to determine what belief is most probable, and as such, justified. Bayes’ rule tells us how to update one’s degree of belief in some theory upon witnessing new evidence. Those mathematical laws are basically the “laws of justification”. While Bayesians may recognize that theories are not derived from observations, they still seek justification, or confirmation, of theories. For them, the impossibility of induction is very much a problem.
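For reference, the update rule in question is Bayes’ theorem:

P(theory | evidence) = P(evidence | theory) × P(theory) / P(evidence)

Read it as: the posterior degree of belief in a theory after seeing some evidence equals the prior degree of belief, reweighted by how strongly the theory predicted that evidence, relative to how expected the evidence was overall.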

Structurelessness

Now, none of those tenets explains what makes a theory good or bad. In fact, the blog post argues that the Newtonian theory of gravity, and a theory “simply stating that the Flying Spaghetti Monster pushes the planets forwards with His Noodly Appendage”, are only set apart by “prior knowledge”, but it doesn’t explain which “prior knowledge” is relevant in choosing between those two theories that make equal predictions.

The post generalizes this question by distinguishing between “mundane explanations” and “supernatural explanations” for the same phenomenon, and arguing that the former should generally be preferred over the latter because “the mundane causes already have lots of evidence in their favor and supernatural causes have none”. So it seems like what is meant by “prior knowledge”, then, is simply “amount of evidence so far”. But if the theory of gravity and the Spaghetti Monster make the same predictions (which is what the blog post assumes), then they have the same amount of evidence thus far. It’s a circular argument! Just as Bayesianism has to resort to a deus ex machina to set probabilities to zero, it has to resort to another deus ex machina to reject the Flying Spaghetti Monster hypothesis. After all, it is backed by as much evidence as the Newtonian theory of gravity!
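The circularity is easiest to see in the odds form of Bayes’ rule, where posterior odds equal prior odds times the likelihood ratio. A minimal sketch, with the two theories assumed (as in the blog post) to make identical predictions:

```python
def posterior_odds(prior_odds: float, likelihood_ratio: float) -> float:
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * likelihood_ratio

# By assumption, Newtonian gravity and the Flying Spaghetti Monster
# predict the observed planetary motions equally well, so every
# observation carries a likelihood ratio of exactly 1.
odds = 10.0  # hypothetical prior odds favoring Newton
for _ in range(1000):  # a thousand confirming observations
    odds = posterior_odds(odds, likelihood_ratio=1.0)
print(odds)  # still 10.0: no amount of evidence ever separates the two
```

Whatever separates the theories must be smuggled in through the prior odds, which is exactly the question that was supposed to be answered.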

The Flying Spaghetti Monster (source)

By the time of the Eddington experiment, Newton’s theory of gravity had reigned supreme for centuries, “confirmed” by countless observations. Surely the strength of its “prior” must’ve been immense. How would a Bayesian “update” after Eddington’s experiment? What numbers do you plug into Bayes’ update rule? This highlights that Bayes’ rule, which is obviously central in Bayesianism, is very often totally impractical. Most examples of actual usage of Bayes’ rule are rather contrived, or involve completely made up probabilities. Outside of a few domains such as medicine, probabilities aren’t really available, and so they tend to be made up to give the appearance of rigor (i.e. scientism). I won’t dwell too much on the impracticality of this foundational theorem. In this post, I’m more interested in the tacit influence of Bayesian epistemology on people’s worldview.
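To make the impracticality concrete, here is what the update would have to look like. Every number below is necessarily invented (a sketch, not anyone’s actual calculation):

```python
# What was P(Newton) after two centuries of "confirmation"? 0.99?
# 0.999999? Nobody knows; the numbers below are pure invention,
# which is precisely the point.
p_newton = 0.999_999           # made-up prior after centuries of confirmations
p_data_given_newton = 0.01     # made-up: observed deflection ~ twice Newton's
p_data_given_einstein = 0.95   # made-up: close to general relativity's prediction

p_data = (p_data_given_newton * p_newton
          + p_data_given_einstein * (1 - p_newton))
p_newton_posterior = p_data_given_newton * p_newton / p_data
print(p_newton_posterior)  # ~0.99990: Newton still overwhelmingly "probable"
```

On these (invented) numbers, the dutiful Bayesian emerges from the Eddington experiment still assigning Newton a posterior of about 0.9999, yet physicists rightly abandoned the theory. Either the prior was “wrong” (by what standard?), or the calculus isn’t what’s doing the epistemic work.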

The structurelessness of Bayesian epistemology comes from not having inherent criteria to evaluate a theory besides base rates and replication rates. In other words, it is lacking the all-important process of criticism, which figures prominently in Popper’s epistemology. As David Deutsch puts it, Bayesianism has a “structureless conception of how theories can fail”. This is also why the impossibility of induction is such a problem in Bayesianism: no matter how much evidence one accumulates, one can always come up with an infinite number of explanations for that evidence. Popperians do not have that problem because they have this additional tool of criticism, which allows them to reject theories without any reference to evidence, simply by taking the theory seriously (this is the “structurefulness” of Popperian epistemology). (Again, this argument is a result of taking Bayesian tenets seriously; I suspect many real-life self-styled Bayesians would push back against this account, but my claim is that they have no basis in their own principles for such a pushback.)

Without a process of criticism (the Spaghetti Monster was not criticized for being a bad explanation!), Bayesian epistemology is left to be a mostly mechanical process of updating a probability distribution over beliefs. In passing, this is also why there tends to be an affinity between Bayesians and the current AI paradigm of inductive learning. Bayesians tend to see intelligence essentially as curve-fitting (see Gwern on the scaling hypothesis). This tends to reduce rationality to compute power, and emphasizes the role of “hardware” (speed and bandwidth) in intelligence, rather than the role of “software”, which is independent of hardware specs. Popperians, in contrast, especially since the contributions of David Deutsch, tend to de-emphasize the role of hardware and emphasize that of software, by arguing that a brain is a universal computer, and that, as such, all brains, while they may differ in speed and bandwidth, are equally capable in terms of the software they can run. Given Bayesians’ emphasis on hardware, it’s no surprise that many of them focus on differences in compute power (IQ) as significant and problematic. In contrast, Popperians typically focus on differences in interests and passions, and argue that, in the fullness of time, hardware differences are a rounding error relative to the universally capable software that that hardware supports.

Prophecy

As we saw earlier, Bayesianism’s emphasis on justification naturally leads to a focus on a theory’s prediction ability, usually correlated with its degree of certainty, itself derived from repeatedly applying Bayesian updates. And the mechanical aspect of those updates naturally further lends itself to prediction.

In addition to this focus on prediction, Bayesianism has a tendency to assume a static set of causes/explanations/theories. I’ll call this the fixed knowledge assumption. Nothing in the Bayesian tenets seems to condemn a Bayesian to a fixed set of choices for the causes of some phenomenon, so why do I make that claim? The centrality of belief updates places a mechanical process, rather than a creative process, at the heart of the epistemology. And in the course of re-computing a probability distribution, the things the probabilities are distributed over tend to remain fixed. The common exhortation amongst Bayesians to “bet on one’s beliefs” not only highlights the focus on prediction, it also underlines this staticity of choices. The emphasis isn’t on going out and actively solving a problem by ruling out some of the “choices”; instead, one places bets over that set of choices. One attitude tends to be more dynamic, while the other tends to be more static.

When Bayesians want to illustrate why belief updates are useful, they typically use examples from games of chance, or examples from medicine (the latter being more relevant to real life). Those are the typical examples because they are some of the few domains where probabilities are readily available. However, what Bayesians don’t seem to realize is that even the more realistic examples from medicine actually illustrate an unrealistically static mindset: you suspect a condition, which has some prevalence, order some test, and interpret the test results knowing the test’s sensitivity and specificity; you dutifully run the math to get your posterior, and now you’re ready to “bet”? Real life is never that way. You discuss at length, seek additional opinions, schedule further tests, etc., to either rule out some explanations, or create new ones. Again, there is no such thing as a real-life Bayesian. Bayes’ theorem is a helpful logical step in an explanation, but it cannot be the basis of an epistemology.
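For concreteness, here is the canonical calculation, with textbook-style numbers (a condition with 1% prevalence and a test with 90% sensitivity and 95% specificity; the figures are illustrative, not from any real test):

```python
prevalence = 0.01    # prior: 1% of patients have the condition
sensitivity = 0.90   # P(positive test | condition)
specificity = 0.95   # P(negative test | no condition)

# Posterior probability of the condition given a positive test.
p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior = sensitivity * prevalence / p_positive
print(f"{posterior:.1%}")  # ~15.4%: the textbook "surprising" result

# And then what? In the static picture, you "bet" at 15%. In practice,
# the clinician orders a different kind of test, asks new questions,
# conjectures alternative diagnoses: she changes the hypothesis space itself.
```

The arithmetic is fine as far as it goes; the trouble is the pretense that the hypothesis space stays fixed while the probabilities do all the epistemic work.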

Trolleyology, game theory, multipolar traps (Moloch), impossibility theorems — all these things, typically trendy in rationalist circles, when expanded into whole philosophies, acquire the same root flaw: the fixed knowledge assumption. I’m not claiming this assumption is an explicit component of Bayesianism, or even a necessary corollary, only that the epistemology subtly tends to bias toward that mindset. The emphasis on prediction, combined with this fixed knowledge assumption, leads one to prophecy: the tendency to make predictions beyond the time horizon where new knowledge would have had a material impact on the prediction.

Knowledge transforms the world. People are knowledge-creation machines. By definition, new knowledge is impossible to predict. Since the Enlightenment began, the world has been on a path of continual radical transformation. The list is long of people who made grand proclamations about the far future, only to be humbled by reality: from Thomas Malthus predicting mass starvation in the late 18th century, to Karl Marx predicting the inevitability of a proletarian uprising, to physicists prognosticating the near end of physics in the late 19th century, to the Simon-Ehrlich bet, to the recurrent theme of Peak Oil, etc. In every case, it is new knowledge that ended up invalidating the prophecies.

While deriding those Cassandras has become a tired meme, few point out that the root cause of these prognostication failures is unforeseeable knowledge creation. That is what makes them prophecies, and not simply predictions. In all of these examples, the forecasting horizon was way past the point where new knowledge could completely upend the forecast. There’s always a forecasting horizon for anything. We cannot even predict that the Sun will indeed become a red giant, which would seem like the easiest thing to forecast. There’s plenty of time for new knowledge to come online that would upend that prediction. Who would have predicted a couple of centuries ago that planet Earth would be able to deflect asteroids rather than attract them? As Andrej Karpathy said, “it looks like if you bombard Earth with photons for a while, it can emit a Roadster”. In the fullness of time, physics becomes primarily the study of human choices.

Physics says: shoot photons at Earth, and you shall get a Tesla (source)

Therefore, if your epistemology leads one to adopt a fixed knowledge assumption, it’s no wonder that one will tend to prophesy. Bayesians famously do this, with their focus on all sorts of existential risks (see this book for instance, dear to many rationalists, meticulously cataloging our paths to extinction, from climate change to engineered pathogens, to artificial intelligence, and many more). Many Bayesian rationalists are notoriously hysterical about AI risk. I’m not interested here in refuting any specific prophecy, only in highlighting a deep affinity between Bayesianism and prophecy. I’ll just say that the more long-termist the view, the more biased the forecast will be toward negative prophecy (because the greater the underestimation of new knowledge), therefore the more likely to advocate for radical policies today that will actually harm the future.

In reality, the only viable long-term policy for survival is to maximize knowledge production — that’s it. Make sure we have the general tooling ready for when the presently unknowable, actual future problems come up. For an illustration of how that way of thinking can be applied to the case of the future of AI, see this post.

Probabilism

I now want to go back to Core Tenet 3 and its definition of probability as subjective belief. At first glance, this seems to imply that Bayesians are ready to accept that probabilities are not necessarily an objective fact of reality, just something in their heads. But is that self-consistent?

While many Bayesians will readily (wrongly) argue that the universe is objectively probabilistic (“God plays dice!”), let’s focus on the more interesting case of a Bayesian who recognizes that real-world causes are not “probabilistic” but either obtain or don’t; it’s just that he can’t know for sure, and so, subjectively, he has to keep a probability distribution in his head. But, in our discussion of Core Tenet 1, we saw that Bayesianism doesn’t allow one to drop a cause from one’s belief updates by setting its probability to zero, and concluded that one necessarily always carries around in one’s head an infinity of causes for everything. Yet Bayesians go around claiming that rationality is all about having “correspondence” between what’s in one’s mind and what’s out there. Therefore, a Bayesian who claims that probabilities exist subjectively, but not objectively, is a contradiction. A self-consistent Bayesian must necessarily conclude that probabilities objectively exist. Yet this contradicts our deepest scientific explanation of the world, namely quantum physics, and in particular the Everettian interpretation of quantum physics, which explains that the world is emphatically not probabilistic in any way.

Probabilities do not exist objectively. We live in a multiverse where everything that can happen happens, via a process of differentiation into quasi-parallel universes, modulo the phenomenon of interference, where parallel universes “re-join” (a phenomenon that quantum computing takes skillful advantage of). But this is definitely outside the scope of this post, as well as largely outside my skill set. Time for a conclusion.

There’s no such thing as a Bayesian

As I’ve argued throughout this post, a self-consistent Bayesian is a logical impossibility. He insists on using the impossible process of induction. He carries around in his head an infinite set of causes for everything. He is obsessed with predictions beyond any meaningful forecasting horizon. He views probabilities as objective, while physics tells us they aren’t.

“That Bayesian out there is NOT REAL” (source) (Yes, Distinguished Reader of the faraway future, this image was a popular meme of the good old summer of 2023)

In reality, whatever Bayesians do in real life that actually works, it’s probably because they are acting like Popperians (also known as “critical rationalists”). One may call oneself a Bayesian, but, when push comes to shove, one actually seeks explanations that survive criticism, not a “high posterior probability”. No one centers their real-life decision-making on probability calculus. Scientists don’t belief-update their way to a new law of nature; they face problems raised by evidence, conjecture explanations, and rule out all but one. That is a thing actual people actually do. What they do not do: plug probabilities into a mechanical rule.
