It turns out that Scott Alexander is even smarter than I thought. This is somewhat disappointing. Perhaps I should slow down on explaining that?

The proof of his smartness is of course in agreeing with me. On his blog he has a defense of handwavy utilitarianism as a false but still useful heuristic. His conclusion:

It’s not that I think it will work. It’s that I think it will fail to work in a different way than our naive opinions fail to work, and we might learn something from it.

You still need to read the whole thing or else what follows won't make much sense.

Well I agree with his main point and that is sad because now who will live up to my stereotype?

More specifically, if I met a CFAR employee in real life and asked them to pitch their version of rationality to me, I could easily imagine his dialog happening with me in the role of the student. And I would be that stubborn because giving them a number - any number - would seem like a concession that there *is* a correct if unknown number and only one such number and I don't believe that because I'm a frequentist. Let me explain that in a more long-winded way.

CFAR has a page on what (they think) is rationality and I think it's simply wrong. According to them

For a cognitive scientist, “rationality” is defined in terms of what a perfect reasoner would look like. [...] its beliefs would obey the laws of logic [...] its degrees of belief would obey the laws of probability theory [...] its choices would obey the laws of rational choice theory.

(and they clearly want to use the word like that cognitive scientist)

According to their claims, the reason we aren't perfect reasoners is basically because we run on sucky hardware.

And that actually is *one* reason, but another reason is simply that a perfect reasoner is a mathematical model abstracting from many of the most interesting problems actual reasoning deals with.

For starters, a perfect reasoner lives in a probability space and all its reasoning is about the events of that probability space. It can be mistaken, but, by definition, it can't fail to at least consider the correct answer. And then it has consistent if possibly wrong probabilities for every one of those possibilities.

A perfect reasoner never needs to ask what probability space a probability lives on, because, by definition, every probability it encounters lives on the same one. The lack of that convenience is a problem for any real reasoner, because *you can't have a probability without a probability space*. Probabilities without probability spaces are quite simply meaningless, exactly like asking if a crocodile is greener than long.

Fortunately we have a practical solution to that problem. We just *define* a probability space and then we come up with some rules of which real world circumstances we will map to which events of that probability space. In other words we make models. Or, again in other words, we *assume* for the purpose of a particular probability argument that the world is structured in a certain way. And there is nothing wrong with that as long as we remember two things: First, outside of the model the probability is not even wrong; it is simply meaningless. And second, the model is not true; it is at best useful.

The most clear-cut example of probability models being useful is for random events that can be arbitrarily repeated. The classical examples are throwing dice, flipping coins, etc. but also e.g. repeated error-prone measurement of a physical property. In those cases probabilities will correspond to frequencies in the long run. There are still some imposed assumptions in such models. For example, in the real world a thrown die could fly into the fire and burn rather than producing a number from one to six and the probability model doesn't account for that. But still, the numbers have a very clear meaning in that kind of model. If I say there's a 50% chance of the coin coming up heads, I know what those words mean. I'm not radical enough to call those the *only* kind of good probabilities (let's call that *radical* frequentism) but I do think it's the prototype and other probabilities are probabilities by analogy to this kind.
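To make the "frequencies in the long run" reading concrete, here is a minimal simulation sketch. The function name `heads_frequency`, the seed, and the flip counts are illustrative assumptions, not anything from CFAR or Scott. Note that the model contains exactly two outcomes by construction, so the coin can't roll under the sofa any more than the die can fly into the fire.

```python
import random

def heads_frequency(n_flips, p_heads=0.5, seed=0):
    """Relative frequency of heads in n_flips simulated flips.

    The model is assumed, not discovered: each flip is independent and
    has only two outcomes, with heads occurring with probability p_heads.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same numbers
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

# The relative frequency settles near p_heads as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, heads_frequency(n))
```

Within this model, "a 50% chance of heads" is a claim about where that printed frequency settles as the number of flips grows.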

The next best kind is if I have non-repeatable events that are still similar enough for me to make good guesses by forcing them into a model that treats them as identical and thus repeatable. This is e.g. what an insurer does. Of course there aren't thousands of identical houses occupied by identical residents but treating many different houses and residents as identical is close enough to mostly work. Sometimes I can do the same. For example, I don't get a few dozen identical worlds to see in how many of them a prediction turns out true, but I can come up with categories of predictions I am about equally sure about and then see how often I'm right on that kind of prediction. If I am reasonably sure about my categorization I'm fairly comfortable doing that. A popular way to formalize this kind of thing is betting odds and *in some situations* that works fairly well. To be honest, I probably would be willing to grant a probability for my computer breaking down under this rubric, *as long as that interpretation is understood*. So if I say there is a 0.6% chance of my computer breaking down this month, I know what those words mean.
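The "categories of predictions I am about equally sure about" idea can be sketched as a small calibration table. Everything here is hypothetical: the function name `calibration_table` and the track record are made up for illustration, and grouping by the exact stated confidence is just one possible way of categorizing.

```python
from collections import defaultdict

def calibration_table(predictions):
    """Group (stated_confidence, came_true) pairs by stated confidence.

    Returns {confidence: (hit_rate, count)}, where hit_rate is the share
    of predictions in that category that turned out true.
    """
    buckets = defaultdict(list)
    for confidence, came_true in predictions:
        buckets[confidence].append(came_true)
    return {c: (sum(hits) / len(hits), len(hits))
            for c, hits in sorted(buckets.items())}

# A made-up track record: five predictions at 60% and five at 90%.
record = [
    (0.6, True), (0.6, False), (0.6, True), (0.6, True), (0.6, False),
    (0.9, True), (0.9, True), (0.9, True), (0.9, False), (0.9, True),
]

for confidence, (hit_rate, count) in calibration_table(record).items():
    print(f"said {confidence:.0%}, right {hit_rate:.0%} of {count}")
```

The numbers only mean something to the extent that the predictions lumped into one category really are similar, which is exactly the assumption the paragraph above says I have to be reasonably sure about.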

This approach gets gradually worse as my similarity judgments get less confident or relevant. For example, what's the probability that human emissions of greenhouse gases play a large role in global warming? A few years ago I looked into what I would have to read before I would feel confident actually arguing on that question and it turned out it would take about two months of full time work before I would think myself worth listening to. Given that I don't have any power or influence, I decided my having a correct opinion on this question isn't important enough to justify that investment. This doesn't quite prevent me from categorizing and assigning a probability. I know that that question is studied in comparatively hard sciences and that organizations aggregating the opinions of the scientists involved report the answer to probably be yes. I also know that the area of research is fairly young, that there are some dissenting voices and that the question is politically charged on both sides. *Ex ante et ex ano,* I'd guess that the majority will be right about two out of three situations like this one. So technically I can assign a probability of about 67% to human emissions of greenhouse gases playing a large role in global warming. But this probability is far less useful than the ones I talked of before. Because if I ever needed to make an important decision this question was relevant to, the correct approach would *not* be to plug 2/3 into any calculation; it would be to do my homework. I wouldn't even offer bets on this kind of probability because that would just invite better informed people to take my money. Given that probability models are justified by being useful that is quite an indictment. Still there is some connection to reality here. In fifty years everyone will know the answer and then I can see if the scientists were actually right on two out of three such questions. So if I say there is a 67% chance of them being right, I know what those words mean.

OK, let's add another straw. According to present science, the law of gravitation is . In principle there could be another small term, so maybe it really is . If wasn't too large, our measurements wouldn't pick up on this and science wouldn't know about it. I don't believe this to be true because of Friar Occam's razor, but I do think it's possible. So if it's possible, what is the probability of it being true? If you shrug that is exactly the right answer, because if you told me that chance was , I seriously, honestly *wouldn't* know what those words mean. This is a different kind of not knowing, because my uncertainty about models is so much more important than my uncertainty in models. With global warming my model sucked, but I at least knew which sucky model I was reasoning in. Here I don't even know that.

Or what's the chance of murder being evil? I'm quite sure it is evil and I do think that is a question about objective reality rather than mere preference, but I'm also quite sure it is not an empirical question. There is just no way to tie this up with any kind of prediction so talking about probabilities in this context is simply a category error.

Some people try to get around this by just pretending they can do probability without models. This strategy could be called by names like hubris, superstition, or delusion, but the most commonly accepted euphemism is *Bayesianism*. (Or maybe *radical* Bayesianism because there are some people who mysteriously call themselves Bayesians despite knowing this.) My reaction to this is similar to what Bertrand Russell said in another context:

The method of "postulating" what we want has many advantages; they are the same as the advantages of theft over honest toil.

Of course in the real world communication is contextual and often people talk about probabilities with the model being understood implicitly. And that is fine as long as it isn't the question at hand, but if I'm talking Bayesianism with a Bayesian, giving them a belief number they will interpret in a model-free way is just letting them beg the question. So if I was the student in Scott's dialog I wouldn't let Ms Galef put any belief number on the computer breaking down either, even if she acknowledged that number was a guess, unless she also acknowledged even the right number relating to a specific model and possibly being different in other models. It's not just that this number may be wrong, it's that there may be no such thing as the right number. And behold, Scott gets this: in his comments he even talks about an example of "weird failure[s] of math to model the domain space like Pascal’s Mugging".

So much for probability theory, now on to the same complaint for rational choice theory. The rational agent of rational choice theory is basically a simplified consequentialist. It has ranges of possible consequences and preferences among them and ranges of possible actions and probability distributions for what the actions might lead to. Then it acts in a way it expects to result in the consequences it likes best. Depending on what additional simplifications we are willing to make, the math of those choices can sometimes be made fairly simple. Of course the probability distribution part incorporates all the problems I already mentioned, so consider them repeated. In addition to that, the rational agent can't consider actions themselves, only their consequences. For example, its choices would be obviously wrong on at least one of the trolley and the fat man problems. In the mathematically most simple versions it also can't have lexical preferences. Sometimes this model is useful but on other occasions it fails not only as a description of how people do work but also as a description of how people should work. So in addition to all the problems of the probability theory part, using rational choice theory in the definition of rationality also assumes, rather than argues for, a whole lot of ethical positions I disagree with.
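For readers who want the agent spelled out, here is a minimal sketch of the simplest version, an expected-utility maximizer. The umbrella scenario, the utility numbers, and the names `expected_utility` and `best_action` are all made up for illustration.

```python
def expected_utility(outcome_probs, utility):
    """Probability-weighted sum of the utilities of an action's outcomes."""
    return sum(p * utility[outcome] for outcome, p in outcome_probs.items())

def best_action(actions, utility):
    """Pick the action whose outcome distribution has the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name], utility))

# Hypothetical preferences over consequences, expressed as numbers.
utility = {"dry": 1.0, "dry_but_encumbered": 0.5, "wet": -2.0}

# Each action is represented ONLY by a probability distribution over consequences.
actions = {
    "take_umbrella": {"dry_but_encumbered": 1.0},
    "leave_umbrella": {"dry": 0.7, "wet": 0.3},
}

print(best_action(actions, utility))
```

Notice that an action shows up in this code only as a label attached to a distribution over outcomes. There is no place to say anything about the action itself, which is the limitation described above. A single real number per outcome also leaves no room for lexical preferences.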

Finally, let's consider utilitarianism. This is actually not something CFAR seems to talk about, but it clearly is a major defining belief of the cognitive culture both Scott and CFAR hail from. Basically the idea here is that the preferences all agents have can somehow be aggregated into one preference and then morality consists in acting so as to get the best result according to those aggregated preferences. As a fundamental account of ethics that is pretty much hopeless because we know such an aggregation is mathematically impossible. And unlike the rest of the Less Wrong collective, Scott is actually aware of that fact. So now he basically says that even if it is not a correct model it is still often a useful model.

So looking at it we still have some differences (for example while he no longer seems like an orthodox utilitarian, I think he still is an orthodox consequentialist), but on the big picture our attitudes to mathematical decision making now mainly differ in emphasis. I say those models may be very useful, *but they are still wrong* and he says those models may be wrong, *but they are still useful*. The still useful part may be captured in his assertion that "imperfect information still beats zero information", and that is quite true as long as we remember the complement that I might formulate as "but zero information still beats false information".

The disappointing part of this comes from me having assigned Scott the role of the intelligent champion for the Yudkowskian/MIRI/Less Wrong/CFAR world-view. From my viewpoint the *entire point* of that philosophy is believing that those models are not only useful but correct. They actually want to build a perfect reasoner *and expect it to be a perfect reasoner in the non-technical sense of the word*. And when they don't call it an AI they call it a Bayesian superintelligence because they actually expect it to work by manipulating Bayesian probabilities all the way down. (Which is presumably why they don't call it a polyheuristic superintelligence.) And they talk of programming that machine to maximize "the coherent extrapolated volition of humankind" *as if those words had a meaning*. Actually that's my central criticism of Yudkowskyanism: Basically all its teachings are based on this one mistake of taking a vaguely mathematical or at least math-sounding model vaguely appropriate for some problem domain, absolutizing it and then running far far beyond the problem domain it was vaguely appropriate for.

Of course if your role is defending that philosophy, casually converting from Bayesianism and fundamental utilitarianism to moderate frequentism and some weird kind of contractualism even while noting the old views are still great heuristics is just a failure to stay in character.

Of course more seriously I do realize Scott isn't defined by the roles I privately assign him and I do think it is better for him to be less wrong than before and I'll just have to reassign him as the champion of his own syncretic philosophy. But now I have an unpaid vacant position for the champion of Yudkowskyanism, for Scott is no longer qualified. So there.

Are you saying that CFAR people are wrong because they fail to believe in a rational philosophy of reality so end up reducing rationality down to a probability based pragmatism?

No.

Thanks for the nice words.

I think Eliezer, Luke, Julia, and other people you think of as your foils would agree with me (and with you) here. In fact, I had an interesting talk yesterday with Luke in which he pointed out several problems of population ethics I wasn't aware of and we discussed the various ways in which a system trying to use population ethics could fudge around these problems. Reading CFAR/MIRI promotional literature might not be the best way to get a nuanced picture of exactly how the writers think.

I've been trying to figure out where exactly our disagreement lies ever since your post on cluster-structures in thingspace, where I got the impression that you and I felt exactly the same way about them but you added "and therefore, this view is wrong" and I added "and therefore, this view is right".

The closest I'm able to come is that you expect philosophy to produce beautiful elegant answers with a perfect one-to-one correspondence to real features of the world, and I expect philosophy to produce a series of kludgy models which can be gradually refined and which might at some point be able to predict the real world pretty accurately given infinite computational power and a lot of other functions it can call.

So when CFAR or someone fails to add "...and this is a kludgy model that only approximates the real world, but it's still better than nothing", that's because everything accessible to humans is only a kludgy model that only approximates the real world but is still better than nothing, and by Gricean implicature you don't have to add that phrase after every single utterance.

(Am I modeling our disagreement correctly, or is this not how you're thinking at all?)

And if even the brain is also only a kludgy mathematical model of reality (as the computational theory of intelligence claims), then it should be possible to get these models to be as good as human intelligence and probably better. And if you can get information-gathering, decision-making, morality-using models at least as good as human intelligence onto a computer, you've finished the hard part of creating benevolent superintelligence.

[EDIT: I would add that although that post about parliamentary consequentialism you linked to was in fact a real change of heart, the post about hand-wavy statistics is something I think (though of course it's hard to remember accurately) I could have written and believed any time in the past few years. And one of the mathematicians working with MIRI told me people at MIRI had already been considering (I don't know if believing is the appropriate word here) the parliamentary model for several years before I came around to it.]

Actually I'm fine with models being kludgy but better than nothing, but then they are better than nothing for particular purposes, and often we can think about what those purposes are. Then if we come to areas where they are not better than nothing we can be honest about our ignorance.

I'll probably need a follow-up post to make that clearer, hopefully not quite as long as this one.

As for your last (pre-edit) paragraph, isn't that basically a switch from Yudkowskyanism to Hansonism? Because as far as I understand Yudkowskyan eschatology, starting with something only as good as a human is supposed to be a recipe for results worse than death. And that part actually seems to make more sense than most of it, because making individual humans very powerful turns out to be a terrible idea even in more realistic scenarios.

You're right and I misspoke, in two ways.

First, Eliezer focuses on the ability to remain stable under self-modification. I tend not to think about that enough simply because I don't know enough computer science for the problem to even make sense to me.

Second, Eliezer believes that an uploaded human would be untrustworthy for the same reason non-uploaded humans would make untrustworthy world dictators. I think this is a small problem compared to everything else that could possibly go wrong, but I agree it is a problem. By "human-level morality" I suppose I meant something with a human's ability to think about moral issues, but with a computer's ability to be programmed to always try to do the moral thing and not be selfish at all.