Scott Alexander writes in defense of probabilities without models. I denied the possibility of this before, also in the context of Scott’s steel-manning of Yudkowskyanism, but back then the focus of the discussion was slightly different. So this is a response to the new post, and if I wasn’t trying to revive this here blog it would probably be a comment. It’s not really intelligible without first reading the post I’m replying to.
I
For starters, what’s a probability model? Actually, while we’re at it, what’s a probability? Even more actually, that’s one step too far, because nobody has a general answer and perhaps there is none, though philosophers have lots of theories that seem wrong in different interesting ways. Modern mathematicians aren’t really bothered by this, because they don’t care what things are as long as they know how they behave. (In bigger words, this is called the axiomatic method.)
OK, so how do probabilities behave? Well, they are numbers, and they belong to events. What’s an event? Well, the mathematician doesn’t know, but here’s how they behave: 1. There is one event that is always true. 2. If something is an event, then “not that” is also an event. 3. If we have a bunch of events, then “all of them” is also an event. In bigger words, the events, whatever they are, form a σ-algebra. Then the events get probabilities, and they behave like this: 1. All probabilities are between 0 and 1 (where 1 is sometimes written as 100%), and the always-true event gets probability 1. 2. If we have a bunch of incompatible events, then the probability for “any of those” is just all the individual probabilities added together. In bigger words, the probabilities are the values of a probability measure. As far as the abstract mathematician is concerned, that’s it.
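For reference, here are the same rules in the usual notation, with \(\Omega\) standing for the always-true event, \(\mathcal{F}\) for the collection of events, and \(P\) for the probability measure:

\[
\Omega \in \mathcal{F}; \qquad A \in \mathcal{F} \Rightarrow \neg A \in \mathcal{F}; \qquad A_1, A_2, \ldots \in \mathcal{F} \Rightarrow \bigcap_i A_i \in \mathcal{F};
\]
\[
0 \le P(A) \le 1; \qquad P(\Omega) = 1; \qquad P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i) \ \text{ whenever the } A_i \text{ are pairwise incompatible.}
\]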
In the real world, we don’t really need to know what a probability is either, as long as we can make sure that the things we are talking about obey these rules. Stereotypical example: We want to do probabilities with dice throws. We have a bunch of events, such as “1”, “6”, “an even number”, “either 2 or 5” and so on. (64 events total, if you are counting.) Then the events get probabilities. Often those will be \(\frac{1}{6}\) for each of “1”, “2”, “3”, “4”, “5” and “6”. But not always; perhaps we want to talk about loaded dice. But however we do it, the probability for “either 1 or 2” had better be the summed probabilities of “1” and “2”.
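To make the bookkeeping concrete, here is a minimal sketch in Python (everything about it is my illustration, not anything from Scott’s post) that builds exactly this model: the 64 events, probabilities attached to them, and additivity holding for a fair and a loaded die alike.

```python
from itertools import combinations
from fractions import Fraction

# A toy probability model for one fair die: the events are all 64 subsets
# of the outcomes {1, ..., 6}; each outcome gets probability 1/6.
outcomes = range(1, 7)
events = [frozenset(c) for r in range(7) for c in combinations(outcomes, r)]
assert len(events) == 64  # the 64 events mentioned above

def prob(event, weights=None):
    """Probability of an event: the summed weights of the outcomes it contains."""
    weights = weights or {o: Fraction(1, 6) for o in outcomes}  # fair die by default
    return sum(weights[o] for o in event)

# Additivity: "either 1 or 2" must get the summed probabilities of "1" and "2".
assert prob({1, 2}) == prob({1}) + prob({2}) == Fraction(1, 3)

# A loaded die just means different weights; the rules stay the same.
loaded = {1: Fraction(1, 2), **{o: Fraction(1, 10) for o in range(2, 7)}}
assert prob({1, 2}, loaded) == prob({1}, loaded) + prob({2}, loaded)
```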
In summary, when we use probabilities in the real world, we have some idea what the events are, we have probabilities of these events and these probabilities aren’t blatantly contradictory. This is – tada – called a probability model.
II
In practice we will also often need conditional probabilities, i.e. probabilities for something happening given that something else happens first. For example, someone could have two different probabilities for being in a car accident, depending on whether they drive drunk or not. That doesn’t change the story much, though: conditional probabilities are reducible to non-conditional ones (the probability of “A given B” is just the probability of “A and B” divided by the probability of B), and the Rev. T. Bayes’ theorem lets us turn such conditionals around as needed. In the example, if we have probabilities for drunken and non-drunken accidents and a probability for drunk driving, then we can calculate the probability of an accident given drunk driving and given sober driving.
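A tiny sketch with made-up numbers (all of them hypothetical), just to show the reduction:

```python
from fractions import Fraction

# Made-up joint probabilities for one driver over some fixed period:
p_drunk = Fraction(1, 20)                 # P(drives drunk)
p_accident_and_drunk = Fraction(1, 100)   # P(accident AND drunk)
p_accident_and_sober = Fraction(1, 200)   # P(accident AND not drunk)

# The conditional probabilities fall out of the unconditional ones:
p_accident_given_drunk = p_accident_and_drunk / p_drunk        # = 1/5
p_accident_given_sober = p_accident_and_sober / (1 - p_drunk)  # = 1/190

print(p_accident_given_drunk, p_accident_given_sober)
```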
III
Coming back to the dice example, what if the dice, while still in motion, gets swallowed by a dog? What’s the probability of that? Well, the model didn’t account for that, so there is no answer: I silently assumed this to be a non-event, and only events get probabilities.
OK, so maybe I should have used a better model including “the dog will eat it”, for a total of 65 events. That probability will be small, but not quite 0, because the whole point of including it is that canine dice engobblement is actually possible. But note that the probabilities for all possible numbers used to add up to 1. Now they will have to add up to 1 minus that small number. So in other words, if my model changes, then so do the probabilities.
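Here is one possible revision, sketched in code; the 1/100 for the dog is made up, and spreading the correction proportionally over the six faces is just one choice among many:

```python
from fractions import Fraction

# Original model: six faces, each with probability 1/6.
faces = {str(n): Fraction(1, 6) for n in range(1, 7)}
assert sum(faces.values()) == 1

# Revised model: add "the dog will eat it" with a small made-up probability,
# and scale the faces down so everything still sums to 1.
p_dog = Fraction(1, 100)
revised = {outcome: p * (1 - p_dog) for outcome, p in faces.items()}
revised["dog eats it"] = p_dog

assert sum(revised.values()) == 1
assert revised["1"] == Fraction(99, 600)  # each face: 99/600 instead of 100/600
```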
Fine, but what if puritan space aliens destroy the dice with their anti-vice laser? If I want to account for that possibility, I’ll have to change my model again, and, in doing so, change my probabilities. And so on, every time I think of a new possibility the model changes and the probabilities change with it.
So what are the real probabilities? You might say those of the correct model. But then what’s the correct model? Well, if you need to bother with probabilities you’re not omniscient and if you’re not omniscient you can’t ever figure it out. Strictly speaking all the probabilities you’ll ever use are wrong.
IV
Can we get around this by just adding an event “all the possibilities I didn’t think of”? Not really. Remember, if you add a new event, you must assign new probabilities, and you can’t just do that by taking a bit from all the other events equally. This is easiest to see in the case of conditional probabilities.
Contrived but sufficiently insightful example: For me it would be slightly dangerous to drive without my glasses. For most other people it would be more or less dangerous to drive with my glasses. Now suppose the puritanical space aliens, figuring that driving more dangerously than necessary is also a vice, try to figure out the best way for humans to drive. Naturally they also study the effect of glasses. Unfortunately they don’t quite understand that glasses-wearers are compensating for bad eyes, so they calculate accident probabilities with and without glasses and universalize those to the entire population. Maybe their sample has a lot of glasses-needers who occasionally forget their glasses. Then they will conclude everybody needs to drive with glasses. Or maybe the people who need glasses always wear them. Then they will probably decide that wearing glasses is actually dangerous, because glasses aren’t perfect and glasses-needers still have more accidents. Either way, their probabilities for some specific person wearing glasses and being in an accident can be almost arbitrarily wrong. After they start zapping the wrong people with their anti-vice laser, someone may tell them why some people wear glasses and others don’t, and they will have to revise their probabilities.
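To make the second scenario concrete, here is a sketch with entirely made-up numbers; the only point is that the aliens’ pooled conditional probabilities answer a different question than the one that matters for any specific driver:

```python
# Made-up accident probabilities, split by whether the person actually
# needs glasses -- the variable the aliens don't know about:
p = {
    "needs glasses": {"with glasses": 0.03, "without glasses": 0.15},
    "doesn't need":  {"with glasses": 0.06, "without glasses": 0.02},
}

# Second scenario from the text: everyone who needs glasses always wears them,
# so the aliens only ever observe two groups:
p_accident_given_glasses = p["needs glasses"]["with glasses"]        # 0.03
p_accident_given_no_glasses = p["doesn't need"]["without glasses"]   # 0.02

# The aliens' conclusion: glasses-wearers crash more, so glasses must be a vice.
assert p_accident_given_glasses > p_accident_given_no_glasses

# But for a specific person who needs glasses, taking them off makes things worse:
assert p["needs glasses"]["without glasses"] > p["needs glasses"]["with glasses"]
```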
But suppose they had wanted to account for possibly missing a possibility beforehand. They could have assigned, say, a 30% probability to “we are missing something very important”. Fine, but for that to help them at all, they also need probabilities for you crashing while you wear glasses and they miss something very important. And they can’t come up with that probability without knowing exactly what they are missing. In other words, you can’t just use catchall misspecification events in probability models.
V
Then why are probabilities so useful? Because often we can make assumptions that are good enough for a given purpose. In the dice example we are making calculations within a game and are fine with assuming that the game will proceed in an orderly fashion. So we get probabilities, and we don’t care that they become meaningless if that assumption turns out to be false.
Similarly, if we design an airplane, we might start with probabilities for its various parts failing and then calculate a probability for the whole thing falling down. This doesn’t tell us anything about planes crashing because of drunk pilots, but that’s not the question the model was made for. Actually, aviation has a few decades of experience with improving the models whenever a plane crashes and then adding regulations to make that particular failure very improbable. So nowadays commercial planes basically only crash for new reasons. There remain situations where the crash probabilities are meaningless, but they are still extremely useful in their proper context.
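A toy version of that first calculation, with invented failure rates and an explicit independence assumption doing the heavy lifting:

```python
# A toy version of the part-failure calculation. All numbers are invented,
# and the crucial modeling assumptions are spelled out in the comments.
critical_part_failure_probs = [1e-6, 5e-7, 2e-6, 1e-7]  # per flight, hypothetical

# Assume the plane crashes if any critical part fails, and that the parts
# fail independently of each other. Then, per flight:
p_no_failure = 1.0
for p_fail in critical_part_failure_probs:
    p_no_failure *= 1 - p_fail
p_crash = 1 - p_no_failure  # roughly the sum of the part probabilities, since they are tiny

print(f"{p_crash:.2e}")  # about 3.6e-06 -- and it says nothing about drunk pilots
```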
Similarly, an investment banker might assume that everything that could possibly happen has already happened in the last twenty years, and, well, in that case the assumption turns out not to have been such a great idea.
VI
Now that might sound fine in theory, but hasn’t Scott given practical examples of probabilities without models? Darn right he hasn’t. To see that, let me nitpick the alleged examples to explain why they don’t count.
In the president and aliens example, Scott himself considers the possibility that the involved probabilities are
only the most coarse-grained ones, along the lines of “some reasonable chance aliens will attack, no reasonable chance they will want bananas.” Where “reasonable chance” can mean anything from 1% to 99%, and “no reasonable chance” means something less than that.
But actual probabilities aren’t coarse-grained. In probabilistic reasoning you get to be uncertain, but you don’t get to be uncertain about how uncertain you are. All those theorems about Bayesian reasoning being the very bestest conceivable method of reasoning evar presume probabilities to be real numbers, i.e. not coarse-grained. In other words, these coarse-grained probabilities are called so only by analogy. They don’t make for examples of real probabilities any more than my printer’s device driver is an example of vehicular locomotion.
At this point my mental model of Scott protests thusly: “Hey, I didn’t admit that! I mentioned it as something a doubter might say, but actually the president should be using real probabilities.” (No actual Scott consulted, so my mental model may or may not be mental in more ways than intended.) Fine, but then the example doesn’t work anymore. The president can make his decisions without knowing anything about probability theory. He is making judgments about some things being more likely than others, but he isn’t attaching numbers to them. In fact he could be more innumerate than a lawyer and it wouldn’t affect his decisions one bit. If we want to make the example about real probabilities, it simply doesn’t show anything about their necessity.
Concerning the research funding agency, first of all I’ll question the hypothetical. It’s hard to imagine a proposal for a research project that has a \(\frac{1}{1000}\) chance of success given the best information. The starry-eyed idealists proposing it surely don’t think so. So if there is actual disagreement, there will be reasons for that disagreement, and if the decision is important, the correct answer is to examine those reasons, i.e. to improve the models. This is actually a large part of why real funding agencies rely on peer review. Using probabilities here is somewhat like the global warming example in my above-mentioned prior post on a similar subject, where the entire point is that the probability relates to a model I wouldn’t want to use for real decisions.
But perhaps all competent reviewers got killed in a fire at their last conference, so let’s skip that objection for least-convenient-possible-world reasons. Also, Scott notes that similar decisions can and often should be made informally, in which case we’re not dealing with real probabilities, exactly like in the president and aliens example.
Let’s advance to the interesting point though. Scott says:
But refusing to frame choices in terms of probabilities also takes away a lot of your options. If you use probabilities, you can check your accuracy – the foundation director might notice that of a thousand projects she had estimated as having 1/1000 probabilities, actually about 20 succeeded, meaning that she’s overconfident. You can do other things. You can compare people’s success rates. You can do arithmetic on them (“if both these projects have 1/1000 probability, what is the chance they both succeed simultaneously?”), you can open prediction markets about them.
I’ll start with the accuracy-checking use. If you’re guessing only once, the probability doesn’t help at all. You’ll either be right or wrong, but you’re not getting a correct probability to compare against. That’s why the foundation director makes the check over a thousand similar projects. If she has a thousand similar projects, that’s a strategy I can endorse. But at that point she has informally established a probability model. She has established events, namely combinations of succeeding and failing projects. This rules out any unforeseen possibilities, like the agency’s funding being slashed next year, nuclear war, the communist world revolution, and raptures both Christian and nerdy. That’s fine, because those aren’t the kind of circumstances these probabilities are meant to think about. Furthermore, she has established probabilities of the events by (perhaps implicitly) assuming that the individual projects are equivalent for success-probability purposes and statistically independent, so they won’t all succeed or fail together. As long as she wants to reason within these assumptions, I’m fine with her doing so probabilistically. But notice that the probabilities become totally useless as soon as these assumptions fail. For example, there may be a new idea about what the natural laws might be and 200 proposals to exploit it. Those proposals will fail or succeed together and thus won’t be any help in predicting the rest. Or the next batch of proposals may be about ideas she knows more or less about, so they can’t be lumped with the old ones for accuracy judgment.
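Under exactly those implicit assumptions (a thousand comparable, independent projects, each estimated at 1/1000), the director’s check is a routine binomial calculation; here is a sketch, with the numbers taken from Scott’s quoted example:

```python
from math import comb

# The director's check, under the implicit model described above: a thousand
# comparable, statistically independent projects, each judged to have a
# 1/1000 chance of success. (The numbers come from the quoted example.)
n, p, observed = 1000, 1 / 1000, 20

expected = n * p  # about 1 success expected if the estimates were right
# Probability of seeing 20 or more successes under those estimates:
p_at_least_20 = sum(
    comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(observed, n + 1)
)
print(expected, p_at_least_20)  # ~1.0, and an astronomically small tail probability

# The same independence assumption is what licenses arithmetic like
# "what is the chance both of two 1/1000 projects succeed?":
print((1 / 1000) ** 2)  # 1e-06
```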
In summary, the probabilities are very useful as long as the implicit assumptions hold and totally worthless when they don’t. Also note that the probability calculations are the boring part of the judgment process. All the interesting stuff happens in deciding which projects are comparable, which is the non-probabilistic part of the thought process.
It’s very similar with comparing people’s success rates. Here the implicit model is that each person has a given success rate and that their individual successes are independent. That’s often fine, but as always it breaks down when the (implicit) modeling assumptions cease to apply. For example, people may be experts on some things and not others, and then their success rates will depend on what the problems of the day are. They may also listen to each other and come to a consensus or, worse, two consensuses neatly aligned with political camps. Those are situations where the probabilities won’t help much.
The prediction market is slightly different: Here we are assuming that someone else has a better model than we do and is willing to bet money on it. This is indeed a case where we use probabilities from a model we don’t know. Still, someone else does know it, and our probabilities won’t be any more correct than that model.
Bottom line: All examples of probabilities being really useful are also examples of (at least implicitly) established models. And then the usefulness extends exactly as far as the modeling assumptions.
VII
Scott actually came up with this in a context: estimating the probability of Yudkowskyan eschatology.
Personally, I find this a lot less interesting than the part about probabilities without models. But still I’ll comment on it very briefly: I think in that context probability talk is a distraction from the real question. The real question is whether we should worry about those future scenarios or not.
That question can get clouded by discussions about whole categories of models. For example, I would be inclined to reason about Yudkowsky the way Scott reasons about Jesus: There are 7.3 billion people in the world and world-saving-wise most of them don’t seem to be at a disadvantage compared to Eliezer. So models assigning chances much above \(\small 1.3\times 10^{-10} \) are implausible. Scott would probably reply that Eliezer is more likely to save the world than other people, and that could lead to a very long argument if either of us had the time and nerve to follow it through.
But notice that that argument wouldn’t be about probabilities at all. The arguments would be about what kind of scenarios should be considered and compared with what, and the potential conclusions would be about what we should do with our money. Also, probabilities wouldn’t help with accuracy checking or comparing us, because we’re talking about a one-time event. There is nothing the number would add except an illusion of doing math.