[I was planning on this main metaphor before that other filthy water metaphor shook the Catholic blogosphere. Absolutely no reference intended.]
When journalists asked Konrad Adenauer, the first post-war German chancellor, why his foreign office had so many employees who had been Nazis just a few years earlier, he answered
Man schüttet kein dreckiges Wasser aus, wenn man kein reines hat!
(One doesn’t pour out filthy water if one doesn’t have pure [water]!)
I think he was right. Running a government with loads of allegedly reformed Nazis was terrible and had some very bad consequences in actual policy, but it’s not like the people complaining about it had any realistic alternatives to offer. So sometimes we must make do with filthy water.
This is one good objection to my last post, where I ranted about Less Wrong/MIRI/CFAR folks trying to eat the menu by promoting some simplified mathematical models as the definition of rationality. Sure, someone might say, these models are filthy water, but we can’t think without simplifying, so we don’t have pure water, so we can’t throw out the filthy water.
My main reply is that filthiness is contextual. For example, coffee is very filthy water for purposes of washing clothes but better than pure for drinking. On the other hand, some dangerous bacteria can be killed off by drying, so they can make water too filthy for drinking but still pure enough for washing clothes if totally pure water isn’t available.
You may believe I’m getting carried away by my metaphor, but actually the metaphor is getting carried away by me. The thing is, models too can be pure enough for some purposes while being too filthy for others. In other words, they have a domain of things they describe fairly well and get worse as we extrapolate beyond that domain. So a model can easily be the best we can do for a certain kind of question but still give worse than worthless answers for others. Those other questions might be better answerable by other models (perhaps even the informal model of our intuitions) or they may not be describable by any available model (i.e. we may actually know nothing about them).
As an example, let’s look at probabilistic reasoning. I argued that it is useful if (a) the range of potentially relevant events is properly understood, (b) some way exists to assign them to categories, and (c) we have enough experience to have some idea of how often our guesses for a given category turn out correct. This is most paradigmatically the case if the events are arbitrarily repeatable (in which case probabilities are expected frequencies), but some other use cases are close enough. Essentially this makes probabilistic reasoning into a kind of meta-model: it works as long as the simplifying assumptions making (a-c) true are pure enough water. Which, if any, simplifying assumptions are pure enough depends on the context, which is why we shouldn’t talk about probabilities without the simplifications being at least implicitly specified. So probabilities are great for handling one specific kind of uncertainty. On the other hand, they totally suck for the kind of uncertainty that is rooted in sui generis cases or in the knowledge that our model is misspecified while we don’t have any better model yet. And we do have purer water for that kind of uncertainty: it’s the “classical rationality” the canonical writings of Less Wrong consider outdated.
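To make conditions (a-c) concrete, here is a minimal sketch (class name, category labels, and data all invented for illustration) of probability as expected frequency: the model only answers for event categories it has an actual track record on, and refuses otherwise.

```python
from collections import defaultdict

class FrequencyModel:
    """Probability as expected frequency: answers only for event
    categories with a recorded track record (conditions a-c)."""

    def __init__(self):
        self.trials = defaultdict(int)
        self.hits = defaultdict(int)

    def record(self, category, outcome):
        """Log one observed event and whether the guess came true."""
        self.trials[category] += 1
        self.hits[category] += int(outcome)

    def probability(self, category):
        if self.trials[category] == 0:
            # Condition (c) fails: no experience, no probability.
            raise ValueError(f"no track record for category {category!r}")
        return self.hits[category] / self.trials[category]

model = FrequencyModel()
for outcome in [True, True, False, True]:
    model.record("rain-after-red-sky", outcome)

print(model.probability("rain-after-red-sky"))  # 0.75
```

For a sui generis question there is simply no category with a track record, and the model refuses rather than answering badly, which is the point being made above.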
For example, this recent post at Less Wrong is all about hanging epicycles on a probability model used outside the domain probability models are good for. These problems are entirely homemade: they arise only from the assumption that probabilistic reasoning always has better water purity than the old-fashioned methods used by hoi polloi.
I’m not saying probabilities are bad, I’m saying they are sometimes good and sometimes bad and we have at least some vague rules of thumb for when they are good and when they are bad.
So, illustrated with the example of probability, this is my criticism of the thinking system promoted at Less Wrong: they take mathematical (or sometimes mathematical-sounding) models that are somewhat pure for some purposes, canonize them as the definition of rationality, and then use them for other purposes they are too filthy for.
Okay, we’re starting to converge here but I still have some problems and objections.
You oppose “technical” rationality to classical rationality, but classical rationality (and I think you agree here) isn’t really a “model”. It’s just taking our intuitions and then running them through a couple of heuristics. This is indeed better than using technical rationality for most things (at least in Near Mode) and I think most CFAR/MIRI people agree. CFAR teaches a lot of lessons on how to improve your intuitions, although they use science-y terminology like “System 1” or “Inner Simulator”.
But I’m surprised you are so quick to dismiss things like figuring out Pascal’s Mugging. That seems like a very high value activity to me for a couple of reasons. First, there is always the hope that mathematical models, properly fleshed out, can one day outperform humans either quantitatively or qualitatively. For example, humans have amazing face-recognition intuitions, but mathematizing them into things like “ratio of distance between the eyes to distance from the nose to the center of the upper lip” has allowed people to make very useful face-recognition machines which can compare airport passengers’ faces to a database of terrorist faces and therefore outperform humans at terrorist-recognition tasks. We’re already at the point where diagnostic algorithms can outperform doctors in specific areas, and being able to expand those areas, which requires further mathematizing decision-making (of which Bayes is probably the most promising lead), would be hugely helpful. And of course if you are a transhumanist, or just have a standard optimistic belief that one day we can create some form of AI, that’s going to need a mathematization of decision-making too.
And since computers and algorithms don’t have intuitions, it’s important not only to get the math right, but to get the math right in a way that solves problems a being with intuition would be unlikely to fall into, like Pascal’s Mugging. I would also add that solving Pascal’s Mugging would solve problems that *do* bother real humans, like Pascal’s Wager. Also, science for science’s sake! The map is not the territory, but if you had a GPS that mysteriously failed as soon as you got too far from home, mapping the boundaries of exactly where it failed and figuring out why would be interesting – anyone who says “It’s only a map, it’s unsurprising that it fails to match the territory in some areas” just wouldn’t be curious enough!
Once again I worry the real difference here is that I believe the brain is deterministic/mathematical (I’m trying to avoid the word “computational” to avoid a metaphor war) and you…possibly don’t? If the brain is deterministic/mathematical, then we already know for sure that there are mathematical models that can do everything the brain can, and it seems very very likely that there are others that can do even better. If you don’t believe the brain is deterministic/mathematical, then I guess you might think trying to find superhuman decision-making algorithms is a dead end.
I think this also explains the tendency to assume there’s some canonical form of rationality. If the brain is implementing a mathematical algorithm, then we can ask which algorithm that is and whether there’s something else that can implement it perfectly. It’s very tempting to say that our failures of rationality correspond to failures of implementation – for example, not enough nutrients to build a neural net that models the algorithm exactly, or having to use a less-powerful approximation to the algorithm because processing power is limited by head size. If the brain’s rationality runs off a mathematical model, it seems totally reasonable to expect that model, freed from the limitations of the form, to be better than that model-as-embodied in a lossy system.
I actually am somewhat confused about the philosophy of mind in some contexts, but I don’t think that factors into this disagreement. My present objections hold even if the reductionistic view turns out to be the whole picture.
As I said before, a human-level AI is almost certainly possible in principle. In practice we’re not even close to having the technology for whole-brain emulation and we still know very little about the brain’s workings at higher levels of abstraction. That sums up to nobody having even the slightest hint of a shadow of a trace of a vague image of an approximation of a bird’s-eye view of a big picture of a mirage of a foggy clue of how to do it in practice. And at the speed our understanding is progressing, that seems unlikely to change during our lifetimes. And, as with most hard problems, the people loudest about having figured out the principle of intelligence tend to understand it least. Still, in the long run someone probably will build a human-level AI unless the world ends beforehand. Mainstream science is actually making some sloooow progress on problems that need to be solved before other problems that need to be solved before the question of how the brain implements the mind can become less mysterious. For example, I would have thought the concept of subsymbolic information self-contradictory before I understood it, but it now looks like a small puzzle piece of how a mind can work.
So I do think there is some kind of algorithm the brain is executing and hence some (unknown) mathematical description of how our intuitions work. But I’m quite dismissive of claims of having figured it out even in its outlines. One such claim is of course that it’s basically a version of Bayesian reasoning. I’m not even sure the mind has a native statistical subsystem (it could be that we can only run such reasoning as one application of the Turing-completeness of the language system, which would explain why so much of it is so counter-intuitive to most people), but if there is such a subsystem it is precisely that: a subsystem, and one much less important than, for example, whatever systems provide language and conceptual analysis. Modeling the human mind as basically a Bayesian reasoner with some extensions makes about as much sense as modeling a human as basically an appendix with some extensions.
So let me tune your GPS metaphor. Until a few years ago the civilian version of the GPS signal was artificially distorted by a few hundred meters so it wouldn’t be good enough to aim missiles. Of course people came up with a simple workaround (which presumably made the distortion so pointless the US government eventually gave up on it): there used to be towers with GPS and radio broadcasting electronics in known locations. They would continuously compare where GPS said they were with where they actually were and then broadcast the displacement. GPS devices would receive that signal in addition to the GPS signal, correct the GPS position by that displacement, and then display the corrected position. Since this is a fairly simple idea, I suppose many people understood how those towers worked without understanding how the actual GPS system works. And within range of the towers, tower-corrected GPS was actually better than plain GPS. Figuring out how far that signal carried might have been a moderately interesting question and fairly complicated to answer in individual cases. But saying that geolocation unsurprisingly got worse out of range of the towers would have been entirely satisfactory even if the person saying so had no clue how the actual GPS system works. And nothing you could learn about towers could promise a better answer to that question. Likewise, saying that probabilistic reasoning unsurprisingly fails far away from frequencies is entirely satisfactory even though I can’t give a full account of how ordinary human reasoning works there. The only reason why it doesn’t seem satisfactory is mixing up probabilistic reasoning with some idealized version of intuitive reasoning. But that’s roughly analogous to mixing up the satellites with the towers.
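The tower workaround is simple enough to sketch in a few lines (all coordinates and the broadcast range are invented; real differential GPS corrects per-satellite signals, but the displacement-subtraction idea is the one described above):

```python
# Toy sketch of tower-corrected GPS. The tower sits at a surveyed
# location, measures its own GPS error, and broadcasts it; receivers
# in range subtract that error from their own fix.

TOWER_TRUE = (1000.0, 2000.0)   # surveyed tower position (metres, invented)
TOWER_RANGE = 50_000.0          # assumed reach of the tower's radio signal

def displacement(tower_gps_fix):
    """What the tower broadcasts: its GPS reading minus its true position."""
    return (tower_gps_fix[0] - TOWER_TRUE[0],
            tower_gps_fix[1] - TOWER_TRUE[1])

def corrected_fix(receiver_gps_fix, dx_dy, distance_to_tower):
    """Apply the broadcast correction; out of range, fall back to raw GPS."""
    if distance_to_tower > TOWER_RANGE:
        return receiver_gps_fix  # plain (distorted) GPS, nothing better available
    dx, dy = dx_dy
    return (receiver_gps_fix[0] - dx, receiver_gps_fix[1] - dy)

# The tower reads (1230.0, 1850.0) instead of its true position:
d = displacement((1230.0, 1850.0))                   # (230.0, -150.0)
print(corrected_fix((5230.0, 6850.0), d, 10_000.0))  # (5000.0, 7000.0)
```

Note that understanding this correction step tells you nothing about how the satellites compute positions in the first place, which is the point of the metaphor.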
Basically if your model features infinitesimal probabilities you’re clearly much too far away from the towers to be using probabilities in the first place. The entire problem comes from using a wrong model and that in turn comes from having confused that model with the missing model of how non-probabilistic reasoning works. Refining the details won’t help any more than refining the rituals of a cargo cult.
As for building machines smarter than humans I’d say there are probably some technological options, like the AI simply running faster, having perfect memory, not getting depressed &c. The Hansonian version of the future may well be technologically feasible, though of course highly unethical. But I don’t see any rational reasons to believe in anything close to the Yudkowskyan AI explosion scenario.
I’m curious why you’re so convinced that the brain isn’t natively Bayesian. I find arguments like http://mrl.isr.uc.pt/pub/bscw.cgi/d27540/ReviewKnillPouget2.pdf extremely intriguing, although obviously they’re far from proven thus far. I also disagree with claims like that the Bayesian system is less fundamental than the language system – some of the best natural language processing systems that human programmers have invented run on Bayesian algorithms, so a native Bayes architecture being able to figure out language sounds more plausible to me than a native language architecture being able to figure out Bayes – especially since so much of language has to be learned as a young child and learning is a basically Bayesian process.
I worry we might not be talking about the same thing here. Take an analogy to computers. Computers’ fundamental operation is arithmetic on binary strings. This doesn’t show up very well in the stuff we do with them – right now I’m interacting with Notepad and for all the world it looks like computers’ fundamental operation involves text – but a sufficiently clever alien archaeologist could notice the sorts of things computers do and the structure of transistors and say it sure looks like binary arithmetic is involved in a suspicious number of computer functions. This is totally consistent with, for example, a particular freeware program I download to let me-the-user do binary arithmetic for my school homework being terrible and buggy.
Bayes’ Theorem is exactly the sort of thing that the brain <i>should</i> use to work, in the same way that if we found something that was consistently adding numbers correctly, it <i>should</i> be using something like addition. There are some suggestive structural details of neurons, a few promising early experiments, and a couple people using the model to gain what I think are some pretty impressive insights about psychiatric disease. Why would you dismiss this promising possibility in favor of “something different no one understands”?
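For reference, the theorem being argued over is a one-liner; here is a standard toy sketch (the base rate and test accuracies are invented numbers) of the kind of update it licenses:

```python
def bayes_update(prior, p_e_given_h, p_e_given_not_h):
    """Posterior P(H|E) via Bayes' theorem."""
    numer = p_e_given_h * prior
    denom = numer + p_e_given_not_h * (1 - prior)
    return numer / denom

# Disease with 1% base rate, test with 90% sensitivity and a
# 5% false-positive rate; a positive result updates the prior to:
posterior = bayes_update(0.01, 0.90, 0.05)
print(round(posterior, 3))  # 0.154
```

The disagreement above is not about whether this equation is correct (it is a theorem) but about whether brains natively run something like it.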
Can you give an example of probabilistic reasoning failing far away from frequencies? Pascal’s Mugging seems to be more about decision theory than probability – the probabilities are usually taken as a given.
I’m a little more optimistic about human-level AI in our lifetimes than you are. We’ve got sixty years or so left – imagine how wrong anyone in 1953, staring at a monstrosity of vacuum tubes and punch cards and saying “I bet computers won’t X in our lifetimes”, would probably have been. I think a lot of what seem like unsolvable questions will end up yielding to more processing power, the same way chess yielded to the ability to consider millions of different moves and natural language translation yielded to the ability to consider a very large corpus of different works. Advance in processing power is pretty predictable and should reach human-brain levels sometime in the 2030s. I admit I could be totally wrong about this – I’m mostly working from a perspective of “not sure enough one way or the other to say it definitely won’t happen”.
so much of language has to be learned as a young child and learning is a basically Bayesian process.
Noam Chomsky famously argued that language in a certain sense isn’t really “learned” at all, and computational complexity theorists have apparently formalized this argument.
(I do mainly agree with you about Bayes, however.)
On the article you linked, I think the headline seems to claim more than the body.
I agree the brain uses density functions to model location. Actually, based on some conversations with a friend who used to be in a neuroscience-y field, I had thought that a settled fact, so this article has much more doubt about that part than I did. I think calling these density functions probabilities is already a bit of a stretch, because they are also used for extended objects, where it is not only about uncertainty about the object’s place but also about it actually being distributed over several places. But then such a function should still behave mathematically like a probability density, so one could say it counts, because probability is as the Kolmogorov axioms do. Well, actually that sounds like stuff frequentists say, but moderate Bayesians borrowing some frequentist insight is OK I guess.
Now the article’s claim is that the brain aggregates such functions in a Bayes-optimal way. I’m happy to believe that, because Bayes-optimality is a mathematical formulation of the best result, so it’s just another way of saying the brain works well. What the article doesn’t actually seem to argue for is Bayesian updating being the underlying mechanism here. Let me introduce a metaphor to make that point: assume you want to keep a house’s temperature stable in winter. By conservation of energy, the way to do that is to add exactly as much energy through heating as the house loses to its environment. And any optimally functioning heating system will have complied with that equation. In principle, one could calculate how much heat a house loses through various channels and then regulate the amount of fuel burnt to match it. Or one could do what everybody does and just use a thermostat. The thermostat will turn the heater on and off whenever the desired and actual temperatures differ by more than a given tolerance.
This will result in balancing the equation on the conservation of energy, but that doesn’t mean the thermostat actually uses that equation. Rather, it has a tight feedback loop that results in an energy flow obeying the equation simply because that also happens to be the optimal energy flow. Now back to the brain: we already know the brain uses tight feedback loops through the external world all the time. For example, people asked to make their walking movements with their feet hanging in the air get it very wrong. That’s because the brain actually has tactile feedback goals rather than a predetermined muscle-contraction sequence. Now if the brain permanently and rapidly up- or downgraded the weight of input factors according to them having been better or worse than the aggregated prediction, that alone would result in Bayes-optimal updating without the brain having any representation of The Rev. T. Bayes’ theorem. So since we already know the brain uses tight feedback loops through external reality a whole lot, adding Bayes-hardware for something explainable in that way would look like a violation of Friar Occam’s razor. Now I’d note that I’m not really disagreeing with the article here, because it doesn’t claim such hardware exists. It’s just that that’s how I and apparently you would interpret the catchy title.
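The point that a dumb feedback rule can land on the Bayes-optimal answer without representing Bayes’ theorem can be sketched numerically. In this toy model (all parameters invented; feedback here comes from external reality, as in the walking example, rather than from the aggregated prediction) two noisy cues are reweighted by their running error, and the weights drift toward exactly the inverse-variance weighting that Bayesian cue combination prescribes:

```python
import random

random.seed(0)
true_val = 0.0
sigma = (1.0, 3.0)   # cue noise levels: cue 0 is the reliable one

mse = [1.0, 1.0]     # running error estimate per cue
alpha = 0.001        # feedback learning rate

for _ in range(50_000):
    cues = [true_val + random.gauss(0, s) for s in sigma]
    # Feedback from external reality: nudge each cue's running
    # error estimate toward its most recent squared miss.
    for i in range(2):
        mse[i] += alpha * ((cues[i] - true_val) ** 2 - mse[i])

# Weight each cue inversely to its running error...
w = [1 / m for m in mse]
learned = w[0] / sum(w)
# ...and compare with the Bayes-optimal inverse-variance weight:
optimal = (1 / sigma[0] ** 2) / sum(1 / s ** 2 for s in sigma)
print(learned, optimal)  # both close to 0.9
```

No line of this loop computes a prior, a likelihood, or a posterior, yet the resulting weighting is the one Bayes-optimal aggregation requires, which is the thermostat point in miniature.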
Also, we’re talking about sensory perception here, not intelligence. A cat easily outdoes me on this kind of ear-eye-paw coordination task, and for some applications so does a computer right now. I would still claim to be more intelligent than either. So this is information about how the brain works, and that is highly interesting, but it doesn’t actually tell us anything about the function of the most mysterious parts particular to humans.
On Bayesian language processing systems I would turn the argument around: Language processing systems are already Bayesian and still suck, so Bayesianism clearly isn’t the secret sauce.
The claim that learning is basically a Bayesian process seems question-begging. Learning of Bayesian reasoners is basically a Bayesian process, but learning in general isn’t necessarily and I don’t think we are basically defective Bayesian reasoners.
It’s conceivable that we could run on Bayes-gates the way a computer runs on binary gates, because networks of either are Turing complete. But I don’t see why this particular Turing-complete mechanism should seem more likely than others.
On the failures of probabilistic reasoning far away from frequencies, that’s a large part of what my last post was about. I mentioned climate change as a question where it is possible but useless, and additional terms to the law of gravitation and murder being evil as questions where it isn’t even possible.
I don’t think processing power is an important limit of brain simulation. Given the slow clock rate of neurons, a big house full of FPGAs on a fast network could probably reach a human-brain level of processing power with today’s technology. It’s just that we wouldn’t know how to program such a computer to do anything useful. The more relevant constraint would be our ignorance of how the brain works. So we don’t know what to simulate, and even if we knew, we would probably have to simulate at too low a level of detail. For comparison, a modern PC can’t emulate a C64 in real time at the transistor level, but doing it at higher levels of abstraction is easy. I think the canonical transhumanist answer is that nanotech(TM) is going to allow full brain scans real soon now(TM). I don’t take that very seriously.
Hm. Our main difference might be a linguistic difference of exactly what a process has to do to be “Bayesian” versus “Bayes-optimal”. Especially since I’m not sure what it would mean to have an equation “represented” in the brain.
But if you agree brain processes should be Bayes-optimal, that seems like a departure from calling Bayes merely a model. Let’s take addition. I’m sure there are some brain processes that add different things – for example, maybe summing the light from individual photoreceptors to get an idea of the total amount of ambient light. First of all, I’m skeptical there’s a meaningful difference between saying the brain adds these light values versus “processes these light values in an addition-optimal way that comes up with the same value addition would”. But more importantly, in this sort of case addition isn’t a model, it’s the Correct Way To Do This Task.
If you believe that learning and belief-updating are Bayes-optimal, then I’m confused why you still think it’s filthy water instead of water that rains directly down from Platonia and is objectively correct.
I would be surprised if there were huge differences between sensory perception and higher-level cognition in terms of the equations used, just because evolution is conservative and the amount of time between animals and humans isn’t really enough to come up with brilliant new math.
[I was planning on this main metaphor before that other filthy water metaphor shook the Catholic blogosphere. Absolutely no reference intended.]
Ah … what metaphor are you referring to? The only native of said blogosphere I know in RL was mystified by the question 🙁