Thursday, October 30, 2008

Zarathustra, Plato, Saving Boxes, Oracle Machines and Pineal Antennae

Reading over the conversation I had (with Abram Demski) in the Comments to a prior blog post, I was reminded of a conversation I had once with my son Zarathustra when he was 4 years old.

Zar was defending his claim that he actually was omniscient, and explaining how this was consistent with his apparent ignorance on many matters. His explanation went something like this:

"I actually do know everything, Ben! It's just that with all that stuff in my memory, it can take me a really really long time to get the memories out ... years sometimes...."

Of course, Zar didn't realize Plato had been there before (since they didn't cover Plato in his pre-school...).

He also had the speculation that this infinite memory store, called his "saving box", was contained in his abdomen somewhere, separate from his ordinary, limited-scope memories in his brain. Apparently his intuition for philosophy was better than for biology... or he would have realized it was actually in the pineal gland (again, no Descartes in preschool either ;-p).

This reminded me of the hypothesis that arose in the conversation with Abram, that in effect all humans might have some kind of oracle machine in their brains.

If we all have the same internal neural oracle machine (or if, say, we all have pineal-gland antennas to the same Cosmic Oracle Machine (operated by the ghost of Larry Ellison?)), then we can communicate about the uncomputable even though our language can never actually encapsulate what it is we're talking about.

Terence McKenna, of course, had another word for these quasi-neural oracle machines: machine elves ;-)

This means that the real goal of AGI should be to create a software program that can serve as a proper antenna 8-D

Just a little hi-fi sci-fi weirdness to brighten up your day ... I seem to have caught a bad cold and it must be interfering with my thought processes ... or messing up the reception of my pineal antenna ...


perhaps some evidence for Zar's saving-box theory:

Tuesday, October 28, 2008

Random Memory of a Creative Mind (Paul Feyerabend)

I had a brief but influential (for me, anyway; I'm sure he quickly forgot it) correspondence with the philosopher-of-science Paul Feyerabend when I was 19.

I sent him a philosophical manuscript of mine, printed on a crappy dot matrix printer ... I think it was called "Lies and False Truths." I asked him to read it, and also asked his advice on where I should go to grad school to study philosophy. I was in the middle of my second year of grad school, working toward my PhD in math, but I was having second thoughts about math as a career....

He replied with a densely written postcard, saying he wasn't going to read my book because he was spending most of his time on non-philosophy pursuits ... but that he'd glanced it over and it looked creative and interesting (or something like that: I forget the exact words) ... and, most usefully, telling me that if I wanted to be a real philosopher I should not study philosophy academically nor become a philosophy professor, but should study science and/or arts and then pursue philosophy independently.

His advice struck the right chord, and the temporary insanity that had caused me to briefly consider becoming a professional philosopher vanished into the mysterious fog from which it had emerged ...

(I think there may have been another couple brief letters back and forth too, not sure...)

(I had third thoughts about math grad school about 6 months after that, and briefly moved to Vegas to become a telemarketer and Henry-Miller-meets-Nietzsche style prose-poem ranter ... but that's another story ... and anyways I went back to grad school and completed my PhD fairly expeditiously by age 22...)


Even at that absurdly young age (but even more so now), I had a lot of disagreements with Feyerabend's ideas on philosophy of science -- but I loved his contentious, informal-yet-rigorous, individualistic style. He thought for himself, not within any specific school of thought or tradition. That's why I wrote to him -- I viewed him as a sort of kindred maverick (if that word is still usable anymore, given what Maverick McCain has done to it ... heh ;-p)

My own current philosophy of science has very little to do with his, but I'm sure we would have enjoyed arguing the issues together!

He basically argued that science was a social phenomenon with no fixed method. He gave lots of wonderful examples of how creative scientists had worked outside of any known methods.

While I think that's true, I don't think it's the most interesting observation one can make about science ... it seems to me there are some nice formal models you can posit that are good approximations explaining a lot about the social phenomenon of science, even though they're not complete explanations. The grungy details (in chronological order) are at:

But, one thing I did take from Feyerabend and his friend/argument-partner Imre Lakatos was the need to focus on science as a social phenomenon. What I've tried to do in my own philosophy of science is to pull together the social-phenomenon perspective with the Bayesian-statistics/algorithmic-information perspective on science.... But, as usual, I digress!

hiccups on the path to superefficient financial markets

A political reporter emailed me the other day asking my opinion on the role AI technology played in the recent financial crisis, and what this might imply for the future of finance.

Here's what I told him. Probably it freaked him out so much he deleted it and wiped it from his memory, but hey...

There's no doubt that advanced software programs using AI and other complex techniques played a major role in the current global financial crisis. However, it's also true that the risks and limitations of these software programs were known by many of the people involved, and in many cases were ignored intentionally rather than out of ignorance.

To be more precise: the known mathematical and AI techniques for estimating the risk of complex financial instruments (like credit default swaps, and various other exotic derivatives) all depend on certain assumptions. At this stage, some human intelligence is required to figure out whether the assumptions of a given mathematical technique really apply in a certain real-world situation. So, if one is confronted with a real-world situation where it's unclear whether the assumptions of a certain mathematical technique really apply, it's a human decision whether to apply the technique or not.

A historical example of this problem was the LTCM debacle in the 1990s. In that case, the mathematical techniques used by LTCM assumed that the economies of various emerging markets were largely statistically independent. Based on that assumption, LTCM entered into some highly leveraged investments that were low-risk unless the assumption failed. The assumption failed.

Similarly, more recently, Iceland's financial situation was mathematically assessed to be stable, based on the assumption that (to simplify a little bit) a large number of depositors wouldn't decide to simultaneously withdraw a lot of their money. This assumption had never been violated in past situations that were judged as relevant. Oops.
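The LTCM and Iceland stories are both failures of an independence assumption in the tails. Here's a minimal Monte Carlo sketch of the point, using a toy Gaussian factor model of my own devising (not anything those firms actually used), showing how sharply the joint-crash probability depends on that one assumption:

```python
import random

def joint_crash_probability(correlation, trials=100_000, threshold=-2.0):
    """Estimate the chance that two markets crash together, in a toy model
    where each market mixes a shared systemic factor with its own noise."""
    load = correlation                      # loading on the shared factor
    noise_weight = (1 - load ** 2) ** 0.5   # keeps each market's variance at 1
    crashes = 0
    for _ in range(trials):
        systemic = random.gauss(0, 1)
        m1 = load * systemic + noise_weight * random.gauss(0, 1)
        m2 = load * systemic + noise_weight * random.gauss(0, 1)
        if m1 < threshold and m2 < threshold:
            crashes += 1
    return crashes / trials

random.seed(1)
p_independent = joint_crash_probability(correlation=0.0)
p_correlated = joint_crash_probability(correlation=0.8)
print(p_independent, p_correlated)
```

With zero correlation, the joint crash probability is roughly the square of the single-market crash probability; add a strong shared factor and it jumps by an order of magnitude or more. That gap is exactly the kind of assumption failure that sank LTCM's leveraged bets.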

A related, obvious phenomenon is that sometimes humans assigned with the job of assessing risk are given a choice between:

  1. assessing risk according to a technique whose assumptions don't really apply to the real-world situation, or whose applicability is uncertain
  2. saying "sorry, I don't have any good technique for assessing the risk of this particular financial instrument"

Naturally, the choice commonly taken is 1 rather than 2.

In another decade or two, I'd predict, we'll have yet more intelligent software, which is able to automatically assess whether the assumptions of a certain mathematical technique are applicable in a certain context. That would avoid the sort of problem we've recently seen.

So the base problem is that the software we have now is good at making predictions and assessments based on contextual assumptions ... but it is bad at assessing the applicability of contextual assumptions. The latter is left to humans, who often make decisions based on emotional bias, personal greed and so forth rather than rationality.

Obviously, the fact that a fund manager shares more in their fund's profit than in its loss has some impact on their assessments. This will bias fund managers to take risks, because if the gamble comes out well, they get a huge bonus, but if it comes out badly, the worst that happens is that they find another job.

My feeling is that these sorts of problems we've seen recently are hiccups on the path to superefficient financial markets based on advanced AI. But it's hard to say exactly how long it will take for AI to achieve the needed understanding of context, to avoid this sort of "minor glitch."


After I posted the above, there was a followup discussion on the AGI mailing list, in which someone asked me about applications of AGI to investment.

My reply was:

Until we have a generally very powerful AGI, application of AI to finance will be in the vein of narrow-AI. Investment is a hard problem, not for toddler-minds.

Narrow-AI applications to finance can be fairly broad in nature, though; e.g., I helped build a website that analyzes financial sentiment in news.

Once we have a system with roughly adult-human-level AGI, then of course it will be possible to create specialized versions of this that are oriented toward trading, and these will be far superior to humans or narrow AIs at trading the markets, and whoever owns them will win a lot of everybody's money unless the government stops them.


Someone on a mailing list pushed back on my mention of "AI and other mathematical techniques."

This seems worth clarifying, because the line between narrow-AI and other-math-techniques is really very fuzzy.

To give an indication of how fuzzy the line is ... consider the (very common) case of multiextremal optimization.

GAs (genetic algorithms) are optimization algorithms that are considered AI ... but is multistart hillclimbing AI? Many would say so. Yet some multiextremal optimization algorithms are considered operations research instead of AI -- say, multistart conjugate gradients...
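To make the fuzziness concrete, here's a sketch of multistart hillclimbing (my own toy version, nothing canonical) on a multiextremal function: a dead-simple restart loop that plenty of people would happily call AI if you renamed it:

```python
import math
import random

def hillclimb(f, x, step=0.1, iters=200):
    """Greedy local search: move to a random nearby point if it scores higher."""
    for _ in range(iters):
        candidate = x + random.uniform(-step, step)
        if f(candidate) > f(x):
            x = candidate
    return x

def multistart_hillclimb(f, lo, hi, starts=20):
    """Restart the local search from random points; keep the best local optimum."""
    best = None
    for _ in range(starts):
        x = hillclimb(f, random.uniform(lo, hi))
        if best is None or f(x) > f(best):
            best = x
    return best

# A multiextremal objective: many local maxima, global maximum at x = 0.
f = lambda x: math.cos(3 * x) - 0.1 * x * x

random.seed(0)
best = multistart_hillclimb(f, -10, 10)
print(best, f(best))
```

Swap the restart loop for a population with crossover and mutation and you'd have a GA; swap the random step for a gradient step and you'd have a cousin of multistart conjugate gradients. The label "AI" tracks the culture more than the math.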

Similarly, backprop NNs are considered AI ... yet polynomial or exponential regression algorithms aren't. But they pretty much do the same stuff...

Or, think about assessment of credit risk, to determine who is allowed to get what kind of mortgage. This is done by AI data mining algorithms. OTOH it could also be done by some statistical algorithms that wouldn't normally be called AI (though I think it is usually addressed using methods like frequent itemset mining and decision trees, that are considered AI).

Are Uncomputable Entities Useless for Science?

When I first learned about uncomputable numbers, I was profoundly disturbed. One of the first things you prove about uncomputable numbers, when you encounter them in advanced math classes, is that it is provably never possible to explicitly display any example of an uncomputable number. But nevertheless, you can prove that (in a precise mathematical sense) "almost all" numbers on the real number line are uncomputable. This is proved indirectly, by showing that the real number line as a whole has one order of infinity (the uncountable cardinality of the continuum) and the set of all computer programs has another (aleph-null, the countable infinity).
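The countability half of that argument is easy to make concrete: every computer program is a finite string over a finite alphabet, so all possible programs can be listed in a single sequence. A small sketch:

```python
from itertools import count, product

def all_finite_strings(alphabet="01"):
    """Enumerate every finite string over the alphabet, shortest first.
    Any program text (in any fixed encoding) appears at some finite
    position, so the set of all programs is countable (aleph-null)."""
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

gen = all_finite_strings()
first_six = [next(gen) for _ in range(6)]
print(first_six)  # ['0', '1', '00', '01', '10', '11']
```

No such listing can exist for the reals: Cantor's diagonal argument takes any proposed list and builds a real number differing from the n-th entry in its n-th digit. That mismatch in cardinalities is the whole proof that almost all reals are uncomputable.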

I never liked this, and I burned an embarrassing amount of time back then (I guess this was from ages 16-20) trying to find some logical inconsistency there. Somehow, I thought, it must be possible to prove that this notion of "a set of things, none of which can ever actually be precisely characterized by any finite description" was inconsistent, impossible.

Of course, try as I might, I found no inconsistency with the math -- only inconsistency with my own human intuitions.

And of course, I wasn't the first to tread that path (and I knew it). There's a philosophy of mathematics called "constructivism" which essentially bans any kind of mathematical entity whose existence can only be proved indirectly. Related to this is a philosophy of math called "intuitionism."

A problem with these philosophies of math is that they rule out some of the branches of math I most enjoy: I always favored continuous math -- real analysis, complex analysis, functional analysis -- over discrete math about finite structures. And of course these are incredibly useful branches of math: for instance, they underlie most of physics.

These continuity-based branches of math also underlie, for example, mathematical finance, even though the world of financial transactions is obviously discrete and computable, so one can't possibly need uncomputable numbers to handle it.

There always seemed to me something deeply mysterious in the way the use of the real line, with its unacceptably mystical uncomputable numbers, made practical mathematics in areas like physics and finance so much easier.

Notice, this implicitly uncomputable math is never necessary in these applications. You could reformulate all the equations of physics or finance in terms of purely discrete, finite math; and in most real applications, these days, the continuous equations are solved using discrete approximations on computers anyway. But, the theoretical math (that's used to figure out which discrete approximations to run on the computer) often comes out more nicely in the continuous version than the discrete version. For instance, the rules of traditional continuous calculus are generally far simpler and more elegant than the rules of discretized calculus.
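As a tiny illustration of that last point: the continuous rule d/dx(x³) = 3x² is exact and clean, while its discrete counterpart, the finite difference, carries correction terms that vanish only as the step h shrinks. A sketch:

```python
def finite_difference(f, x, h):
    """Discrete stand-in for the derivative: (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

f = lambda x: x ** 3

# Continuous calculus: d/dx(x^3) = 3x^2, so the derivative at x = 2 is 12.
analytic = 3 * 2.0 ** 2

# Discrete calculus: the exact forward difference of x^3 is
# 3x^2 + 3xh + h^2 -- the simple continuous rule plus h-dependent clutter.
approx = finite_difference(f, 2.0, 1e-6)

print(analytic, approx)
```

This is the usual pattern: the theoretical work is done with the clean continuous rule, and the computer quietly runs the cluttered discrete approximation.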

And, note that the uncomputability is always in the background when you're using continuous mathematics. Since you can't explicitly write down any of these uncomputable numbers anyway, they don't play much role in your practical work with continuous math. But the math you're using, in some sense, implies their "existence."

But what does "existence" mean here?

To quote former President Bill Clinton, "it all depends on what the meaning of the word 'is' is."

A related issue arises in the philosophy of AI. Most AI theorists believe that human-like intelligence can ultimately be achieved within a digital computer program (most of them are in my view overpessimistic about how long it's going to take us to figure out exactly how to write such a program, but that's another story). But some mavericks, most notably Roger Penrose, have argued otherwise (see his books The Emperor's New Mind and Shadows of the Mind, for example). Penrose has argued specifically that the crux of human intelligence is some sort of mental manipulation of uncomputable entities.

And Penrose has also gone further: he's argued that some future theory of physics is going to reveal that the dynamics of the physical world is also based on the interaction of uncomputable entities. So that mind is an uncomputable consequence of uncomputable physical reality.

This argument always disturbed me, also. There always seemed something fundamentally wrong to me about the notion of "uncomputable physics." Because, science is always, in the end, about finite sets of finite-precision data. So, how could these mysterious uncomputable entities ever really be necessary to explain this finite data?

Obviously, it seemed to me, they could never be necessary. Any finite dataset has a finite explanation. But the question then becomes whether in some cases invoking uncomputable entities is the best way to explain some finite dataset. Can the best way of explaining some set of, say, 10 or 1000 or 1000000 numbers be "This uncomputable process, whose details you can never write down or communicate in ordinary language in a finite amount of time, generated these numbers"?

This really doesn't make sense to me. It seems intuitively wrong -- more clearly and obviously so than the notion of the "existence" of uncomputable numbers and other uncomputable entities in some abstract mathematical sense.

So, my goal in this post is to give a careful explanation of why this is wrong. The argument I'm going to give here could be fully formalized as mathematics, but I don't have the time for that right now, so I'll just give it semi-verbally/semi-mathematically -- though I'll try to choose my words carefully.

As often happens, the matter turned out to be a little subtler than I initially thought it would be. To argue that uncomputables are useless for science, one needs some specific formal model of what science itself is. And this is of course a contentious issue. However, if one does adopt the formalization of science that I suggest, then the scientific uselessness of uncomputables falls out fairly straightforwardly. (And I note that this was certainly not my motivation for conceiving the formal model of science I'll suggest; I cooked it up a while ago for quite other reasons.)

Maybe someone else could come up with a different formal model of science that gives a useful role to uncomputable entities ... though one could then start a meta-level analysis of the usefulness of this kind of formal model of science! But I'll defer that till next year ;-)

Even though it's not wholly rigorous math, this is a pretty mathematical blog post that will make for slow reading. But if you have suitable background and are willing to slog through it, I think you'll find it an interesting train of thought.

NOTE: the motivation to write up these ideas (which have been bouncing around in my head for ages) emerged during email discussions on the AGI list with a large group, most critically Abram Demski, Eric Baum and Mark Waser.

A Simple Formalization of the Scientific Process

I'll start by giving a simplified formalization of the process of science.

This formalization is related to the philosophy of science I outlined in an essay (included in The Hidden Pattern) and more recently extended in a blog post. But those prior writings consider many aspects not discussed here.

Let's consider a community of agents that use some language L to communicate. By a language, what I mean here is simply a set of finite symbol-sequences ("expressions"), utilizing a finite set of symbols.

Assume that a dataset (i.e., a finite set of finite-precision observations) can be expressed as a set of pairs of expressions in the language L. So a dataset D can be viewed as a set of pairs

((d11, d12), (d21,d22) ,..., (dn1,dn2))

or else as a pair D = (D1, D2), where D1 = (d11, d21, ..., dn1) and D2 = (d12, d22, ..., dn2).
Then, define an explanation of a dataset D as a set E_D of expressions in L, so that if one agent A1 communicates E_D to another agent A2 that has seen D1 but not D2, nevertheless A2 is able to reproduce D2.

(One can look at precise explanations versus imprecise ones, where an imprecise explanation means that A2 is able to reproduce D2 only approximately, but this doesn't affect the argument significantly, so I'll leave this complication out from here on.)

If D2 is large, then for E_D to be an interesting explanation, it should be more compact than D2.

Note that I am not requiring E_D to generate D2 from D1 on its own. I am requiring that A2 be able to generate D2 based on E_D and D1. Since A2 is an arbitrary member of the community of agents, the validity of an explanation, as I'm defining it here, is relative to the assumed community of agents.
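Here's a toy instance of this setup, with Python source strings standing in for the language L (my choice for illustration only, obviously not part of the formal definition): A1 transmits an explanation E_D, and A2, who has seen only D1, uses it to regenerate D2.

```python
# D1: the inputs both agents have seen; D2: the observations to be predicted.
D1 = list(range(1, 13))
D2 = [x * x for x in D1]  # pretend these were measured, not computed

# E_D: a finite expression in the shared language (here, Python source)
# that lets agent A2 regenerate D2 given D1.
E_D = "lambda d1: [x * x for x in d1]"

reproduced = eval(E_D)(D1)  # A2 applies the explanation to D1
print(reproduced == D2)     # True: the explanation is valid

# The explanation is "interesting" only if it is more compact than D2 itself.
print(len(E_D), len(str(D2)))
```

Here E_D is 30 characters while the raw listing of D2 is 48, and the gap widens as the dataset grows; that compression is what makes the explanation worth transmitting at all.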

Note also that, although expressions in L are always finitely describable, that doesn't mean that the agents A1, A2, etc. are. According to the framework I've set up here, these agents could be infinite, uncomputable, and so forth. I'm not assuming anything special about the agents, but I am considering them in the special context of finite communications about finite observations.

The above is my formalization of the scientific process, in a general and abstract sense. According to this formalization, science is about communities of agents linguistically transmitting to each other knowledge about how to predict some commonly-perceived data, given some other commonly-perceived data.

The (Dubious) Scientific Value of the Uncomputable

Next, getting closer to the theme of this post, I turn to consider the question of what use it might be for A2 to employ some uncomputable entity U in the process of using E_D to generate D2 from D1. My contention is that, under some reasonable assumptions, there is no value to A2 in using uncomputable entities in this context.

D1 and E_D are sets of L-expressions, and so is D2. So what A2 is faced with, is a problem of mapping one set of L-expressions into another.

Suppose that A2 uses some process P to carry out this mapping. Then, if we represent each set of L-expressions as a bit string (which may be done in a variety of straightforward ways), P is a mapping from bit strings into bit strings. To keep things simple we can assume some maximum cap on the size of the bit strings involved (corresponding, for instance, to the maximum-size expression-set that can be uttered by any agent during a trillion years).

The question then becomes whether it is somehow useful for A2 to use some uncomputable entity U to compute P, rather than using some sort of set of discrete operations comparable to a computer program.

One way to address this question is to introduce a notion of simplicity. The question then becomes whether it is simpler for A2 to use U to compute P, rather than using some computer program.

And this, then, boils down to one's choice of simplicity measure.

Consider the situation where A2 wants to tell A3 how to use U to compute P. In this case, A2 must represent U somehow in the language L.

In the simplest case, A2 may represent U directly in the language, using a single symbolic expression S_U (which may then be included in other expressions). There will then be certain rules governing the use of S_U in the language, such that A2 can successfully, reliably communicate "use of U to compute P" to A3 only if these rules are followed. Call this rule-set R_U. Let us assume that R_U is a finite set of expressions, and may also be expressed in the language L.

Then, the key question is whether we can have

complexity(U) < complexity(R_U)

That is, can U be less complex than the set of rules prescribing the use of its symbol S_U within the community of agents?

If we say NO, then it follows there is no use for A2 to use U internally to produce D2, in the sense that it would be simpler for A2 to just use R_U internally.

On the other hand, if we say YES, then according to the given complexity measure, it may be easier for A2 to internally make use of U, rather than to use R_U or something else finite.

So, if we choose to define complexity in terms of complexity of expression in the community's language L, then we conclude that uncomputable entities are useless for science. Because, we can always replace any uncomputable entity U with a set of rules for manipulating the symbol S_U corresponding to it.

If you don't like this complexity measure, you're of course free to propose another one, and argue why it's the right one to use to understand science. In a previous blog post I've presented some of the intuitions underlying my assumption of this "communication prior" as a complexity measure underlying scientific reasoning.

The above discussion assumes that U is denoted in L by a single symbolic L-expression S_U, but the same basic argument holds if the expression of U in L is more complex.

What does all this mean about calculus, for example ... and the other lovely uses of uncomputable math to explain science data?

The question comes down to whether, for instance, we have

complexity(real number line R) < complexity(axioms for R)

If NO, then it means the mind is better off using the axioms for R than using R directly. And, I suggest, that is what we actually do when using R in calculus. We don't use R as an "actual entity" in any strong sense, we use R as an abstract set of axioms.

What would YES mean? It would mean that somehow we, as uncomputable beings, used R as an internal source of intuition about continuity ... not thus deriving any conclusions beyond the ones obtainable using the axioms about R, but deriving conclusions in a way that we found subjectively simpler.

A Postcript about AI

And, as an aside, what does all this mean about AI? It doesn't really tell you anything definitive about whether humanlike mind can be achieved computationally. But what it does tell you is that, if
  • humanlike mind can be studied using the communicational tools of science (that is, using finite sets of finite-precision observations, and languages defined as finite strings on finite alphabets)
  • one accepts the communication prior (length of linguistic expression as a measure of complexity)
then IF mind is fundamentally noncomputational, science is no use for studying it. Because science, as formalized here, can never distinguish between use of U and use of S_U. According to science, there will always be some computational explanation of any set of data, though whether this is the simplest explanation depends on one's choice of complexity measure.

Tuesday, October 07, 2008

Cosmic, overblown Grand Unified Theory of Development

In the 80's I spent a lot of time in the "Q" section of various libraries, which hosted some AI books, and a lot of funky books on "General Systems Theory" and related forms of interdisciplinary scientifico-philosophical wackiness.

GST is way out of fashion in the US, supplanted by Santa Fe Institute style "complexity theory" (which takes the same basic ideas but fleshes them out differently using modern computer tech), but I still have a soft spot in my heart for it....

Anyway, today when I was cleaning out odd spots of the house looking for a lost item (which I failed to find and really need, goddamnit!!) I found some scraps of paper that I scribbled on a couple years back while on some airline flight or another, sketching out the elements of a general-systems-theory type Grand Unified Theory of Development ... an overall theory of the stages of development that complex systems go through as they travel from infancy to maturity.

I'm not going to type in the whole thing here right now, but I made a table depicting part of it, so as to record the essence of the idea in some nicer, more permanent form than the fading dirty pieces of notebook paper....

The table shows the four key stages any complex system goes through, described in general terms, and then explained in a little more detail in the context of two examples: the human (or humanlike) mind as it develops from infancy to maturity, and the maturation of life from proto-life up to its modern form.

I couldn't get the table to embed nicely in this blog interface, so it's here as a PDF:

This was in fact the train of thought that led to two papers Stephan Bugaj and I wrote over the last couple years, on the stages of cognitive development of uncertain-inference based AI systems, and the stages of ethical development of such AI systems. While not presented as such in those papers, the stages given there are really specialized manifestations of the more general stages outlined in the above table.

Stephan and I are (slowly) brewing a book on hyperset models of mind and reality, which will include some further-elaborated, rigorously-mathematized version of this general theory of development...

Long live General Systems thinking ;-)

Monday, October 06, 2008

Parable of the Researcher and the Tribesman

I run an email discussion list on Artificial General Intelligence, which is often interesting, but lately the discussions there have been more frustrating than fascinating, unfortunately.

One recent email thread has involved an individual repeatedly claiming that I have not presented any argument as to why my designs for AGI could possibly work.

When I point to my published or online works, which do present such arguments, this individual simply says that if my ideas make any sense, I should be able to summarize my arguments nontechnically in a few paragraphs in an email.

Foolishly, I managed to get sufficiently annoyed at this email thread that I posted a somewhat condescending and silly parable to the email list, which I thought I'd record here, just for the heck of it....

What I said was:

In dialogues like this, I feel somewhat like a medical researcher talking to a member of a primitive tribe, trying to explain why he thinks he has a good lead on a potential drug to cure a disease. Imagine a dialogue like this:

  • RESEARCHER: I'm fairly sure that I'll be able to create a drug curing your son's disease within a decade or so
  • TRIBESMAN: Why do you believe that? Have you cured anyone with the drug?
  • RESEARCHER: No, in fact I haven't even created the drug yet
  • TRIBESMAN: Well, do you know exactly how to make the drug?
  • RESEARCHER: No, not exactly. In fact there is bound to be some inventive research involved in making the drug.
  • TRIBESMAN: Well then how the hell can you be so confident it's possible?
  • RESEARCHER: Well I've found a compound that blocks the production of the protein I know to be responsible for causing the disease. This compound has some minor toxic effects in rats, but it's similar in relevant respects to other compounds that have shown toxic effects in rats, and then been minorly modified to yield variant compounds with the same curative impacts without toxic effects
  • TRIBESMAN: So you're saying it's cured the same disease in rats?
  • RESEARCHER: Yes, although it also makes the rats sick ... but if it didn't make them sick, it would cure them. And I'm pretty sure I know how to change it so as to make it not make the rats sick. And then it will cure them.
  • TRIBESMAN: But my son is not a rat. Are you calling my son a rat? You don't seem to understand what a great guy my son is. All the women love him. His winky is twice as long as yours. What does curing a rat have to do with curing my son? And it doesn't even cure the rat. It makes him sick. You just want to make my son sick.
  • RESEARCHER: Look, you don't understand. If you look at all the compounds in that class, you'll see there are all sorts of ways to modify them to avoid these toxic effects.
  • TRIBESMAN: So you're saying I should believe you because you're a big important scientist. But your drug hasn't actually cured anyone. I don't believe it'll possibly work. People come by here all the time trying to sell me drugs and they never work. Those diet pills were supposed to make my wife 100 pounds thinner, but she still looks like a boat.
  • RESEARCHER: I'm not responsible for the quacks who sold you diet pills
  • TRIBESMAN: They had white lab coats just like yours
  • RESEARCHER: Look, read my research papers. Then let's discuss it.
  • TRIBESMAN: I can't read that gobbledygook. Do all the other researchers agree with you?
  • RESEARCHER: Some of them do, some of them don't. But almost all of them who have read my papers carefully think I at least have a serious chance of turning my protein blocker into a cure. Even if they don't think it's the best possible approach.
  • TRIBESMAN: So all the experts don't even agree, and you expect me to take you seriously?
  • RESEARCHER: Whatever. I'll talk to you again when I actually have the cure. Have a nice few years.
  • TRIBESMAN: We won't need your cure by then, Mr. Scientist. We're curing him with leeches already.

That just about sums it up....

The point is, the researcher's confidence comes from his intuitive understanding of a body of knowledge that the tribesman cannot appreciate due to lack of education.

The tribesman says "you haven't cured anyone, therefore you know nothing about the drug" ... but the researcher has a theoretical framework that lets him understand something about the drug's activity even before trying it on people.

Similarly, some of us working on AGI have a theoretical framework that lets us understand something about our AGI systems even before they're complete ... this is what guides our work building the systems. But conveying our arguments to folks without this theoretical framework is, unfortunately, close to impossible.... If I were to write some sort of popular treatment of my AGI work, the first 75% of it would have to consist of a generic explanation of background ideas (which is part of the reason I don't take the time to write such a thing ... it seems like an awful lot of work!!).

Obvious stuff, of course. I'm metaphorically kicking myself for burning half an hour in this sort of absurd email argument tonight ... gotta be more rigorous about conserving my time and attention, there's a lot of work to be done!!!

Saturday, October 04, 2008

Reflections on "Religulous" ... and introducing the Communication Prior

I saw the documentary Religulous w/ my kids last night (well, the two who still live at home) ... it's a sort of goofball documentary involving comedian Bill Maher interviewing people with absurd religious beliefs (mostly focusing on Christians, Jews and Muslims, with a few other oddities like a Scientologist street preacher and an Amsterdam cannabis-worshipper) ...

This blog post records some of my random reactions to the movie, and then at the end gets a little deeper and presents a new theoretical idea that popped into my head while thinking about the difficulty of making a really sound intellectual refutation of common religious beliefs.

The new theoretical idea is called the Communication Prior ... and the crux is the notion that in a social group, the prior probability of a theory may be defined in terms of the ease with which one group member can rapidly and accurately communicate the theory to another. My suggestion is that the Communication Prior can serve as the basis for a pragmatic everyday interpretation of Occam's Razor (as opposed to the Solomonoff-Levin Prior, which is a formal-computer-science interpretation). This is important IMHO because science ultimately boils down to pragmatic everyday social phenomena, not formal mathematical phenomena.

Random Reactions to Religulous

First a bit about Religulous, which spurred the train of thought reported here....

Some of the interviews in the movie were really funny -- for instance a fat Puerto Rican preacher named Jesus who claims to literally be the Second Coming of Christ, and to have abolished sin and hell ...

and as a whole the interviews certainly made Maher's point that all modern religions are based on beliefs that seem bizarre and twisted in the light of the modern scientific world-view ... the talking snake in the Garden of Eden ... Judgment Day when God comes to Earth and sorts the goodies from the baddies ... the notion that rapture will come only when the Muslims have finally killed all the Jews ... etc. etc. etc. etc. etc. ...

Some interesting historical tidbits were presented as well, e.g. the Egyptian figure Horus, who well predated Christ and whose life-story bears remarkable similarities to the Biblical tale of Jesus....

I've never been a huge fan of stand-up comedians, and Maher doesn't really match my taste that well ... he's not outrageous or absurd enough ... so I got a bit weary of his commentary throughout the film, but I felt the interviews and interspersed film and news snippets were well-done and made his point really well.

Of course, it's a damn easy point to make, which was part of his point: Of course all religions ancient and modern have been based on bizarre, wacky, impossible-for-any-sane-person-to-believe, fictional-sounding ideas...

One point that came up over and over again in his dialogues with religious folks was his difference with them over the basic importance (or lack thereof) of faith. "Why," he kept asking, "is faith a GOOD thing? Why is it a good thing to believe stuff that has no evidence in favor of it? Why is it a good thing to believe stuff that makes no sense and contradicts observation and apparent reality?"

The answer the religious folks invariably give him is something like "Faith is a good thing because it saved my life."

Dialogue like: "I used to be a Satan worshipper and wasted decades of my life on sex and drugs ... Getting saved by Jesus saved my life blahblaa..."

Religion and Politics: Egads!

Maher's interview with a religious fundamentalist US Senator is a bit disturbing. Indeed, to have folks who believe Judgment Day is nigh, in charge of running the most powerful country in the world, is, uh, scary....

And note that our outgoing President, W Bush, repeatedly invokes his religious beliefs in justifying his policies. He explicitly states that his faith in God is the cornerstone of his policies. Scary, scary, scary. I don't want to live in a society that is regulated based on someone's faith in a supernatural being ... based on someone's faith in the literal or metaphorical truth of some book a bunch of whacked-out, hallucinating Middle-Easterners wrote 2000 years ago....

As Maher points out, this is a completely senseless and insane foundation for a modern society....

Maher's Core Argument

I don't expect Maher's movie to un-convert a substantial number of religious folks...

Their natural reaction will be: "OK, but you just interviewed a bunch of kooks and then strung their kookiest quotes together."

Which is pretty much what he did ... and in a way that may well be compelling as a tool for helping atheists feel more comfortable publicly voicing their beliefs (which I imagine was much of his purpose) ...

And it has to be noted that a deep, serious, thorough treatment of the topic of religion and irrationality would probably never get into movie theaters.

Modern culture, especially US culture but increasingly world culture as well, has little time for deep rational argumentation. Al Gore made this point quite nicely in his book The Assault on Reason ... which, however, not that many people read (the book contained too much rational argumentation...).

So it's hard to fault Maher's film for staying close to the surface and presenting a shallow argument against religion ... this is the kind of argument that our culture is presently willing to accept most easily ... and if atheists restricted themselves to careful, thorough, reflective rational arguments, the result would be that even fewer people would listen to them than is now the case....

Maher's argument is basically: All religions have absurd, apparently-delusional, anti-scientific beliefs at their core ... and these absurd beliefs are directly tied to a lot of bad things in the world ... Holy Wars and so forth ....

He also, correctly, traces the bizarre beliefs at the heart of religions to altered brain-states on the part of religious prophets.

As he notes, if someone today rambled around telling everyone they'd been talking to a burning bush up on a hill, they'd likely get locked into a mental institution and force-fed antipsychotics. Yet, when this sort of experience is presented as part of the history of religion, no one seems to worry too much -- it's no longer an insane delusion, it's a proper foundation for the government of the world ;-p

What Percentage of the Population Has a World View Capable of Sensibly Confronting the Singularity?

One thing that struck me repeatedly when listening to Maher's interviews was:

Wow, given all the really HARD issues the human race faces during this period of rapidly-approaching Singularity ... it's pathetic that we're still absorbed with these ridiculous debates about talking snakes and Judgment Day and praying to supreme beings ... egads!!!

While a digression from this blog post, this is something I think about a lot, in the context of trying to figure out the most ethical and success-probable approach to creating superhuman AI....

On the one hand, due to various aspects of human psychology, I don't trust elitism much: the idea of a small group of folks (however gifted and thoughtful) creating a superhuman AI and then transforming the world, without broader feedback and dialogue, is a bit scary....

On the other hand, I've got to suspect that folks who believe in supreme beings, Judgment Day, jihad, reincarnation and so forth are not really likely to have much useful contribution to the actual hard issues confronting us as Singularity approaches....

Of course, one can envision a lot of ways of avoiding the difficulties alluded to in the prior two paragraphs ... but also a lot of ways of not avoiding them....

One hope is that Maher's movie and further media discourse legitimizing atheism will at least somewhat improve the intellectual level of broad public conversation ... so that, maybe, in a decade or so it won't be political suicide for a US Senatorial candidate to admit they're not religious or superstitious, for example...

On the other hand, it may well eventuate that this process of de-superstitionizing the world will be damn slow compared to the advent of technology ...

But, that's a topic for another lengthy blog post, some other weekend....

The Issues Posed by the "Problem of Induction" and the Philosophy of Science for the Argument Against Religion

Now I'll start creeping, ever so slowly, toward the more original intellectual content of this post, by asking: What might a more deeply reasoned, reflective argument against religion look like?

This topic is actually fairly subtle, because it gets at deep issues in the philosophy of science ... such as I reviewed in an essay a few years ago (included in my 2006 book The Hidden Pattern)...

Although Maher talks a lot about scientific evidence ... and correctly points out that there is no scientific evidence for the various kooky-sounding claims at the core of modern religions ... he doesn't seem to have thought much about the nature of scientific evidence itself. (Which is no surprise as he's a professional comedian and actor ... but of course, he's now a self-styled commentator on politics, science and religion, so....)

Evidence, in the sense of raw data, is not disputed that often among scientists -- and even religious folks don't dispute raw data collected by scientists that often. Statements like "this laboratory instrument, at this point in time, recorded this number on its dial" are not oft disputed. Sometimes argumentation may be made that not enough data were recorded to evaluate an empirical statement like the above (say, the temperature in the room, or the mind-state of the lab assistant, were not recorded): but this still isn't really an argument that the data are wrong, more an argument that the data are too incomplete to draw useful conclusions from them.

(The only area of research I know where raw data is routinely disputed is psi ... which I already addressed in a prior blog post.)

But the step from raw items of evidence to theory is a big one -- a bigger one than Maher or most naively-pro-science advocates care to admit.

This of course relates to the uncomfortable fact that the Humean problem of induction was never solved.

As Maher points out repeatedly in his film, we just don't really know anything for sure ... and it appears that by the basic logic of the universe and the nature of knowledge itself, we never can.

What he doesn't point out (because it's not that kind of movie) is that without making some kind of background assumptions (going beyond the raw evidence collected), we also can't really make probability estimates, or probabilistic predictions about the outcomes of experiments or situations.

Given a set of observations, can we predict the next observations we'll see? Even probabilistically? As Hume pointed out, we can do so only by making some background assumptions.

For instance, we can adopt the Occam's Razor heuristic and assume that there will be some simple pattern binding the past observations to the future ones.... But that begs the question: what is the measure of simplicity?

Hume says, in essence, that the relevant measure of simplicity is human nature.

But this conclusion may, initially, seem a bit disturbing in the context of the religion vs. science dichotomy.

Because human nature is, in many ways, not to put it too tactlessly, more than a bit fucked-up.

Maher doesn't review the evidence in this regard, but he does allude to it, e.g. interviewing the discoverer of the "God gene" ... the point is: it seems to be the case that religious experience and religious delusions are deeply tied to intrinsic properties of the human brain.

What this suggests is that the reason religion is so appealing to people is precisely that it is assigned a high prior probability by their Humean "human nature" ... that our brain structure, which evolved in superstitious pre-civilized societies, biases us towards selecting theories that not only explain our everyday empirical observations, but also involve talking animals, voices speaking from the sky, tribalism, physical rewards or punishments for moral transgressions, and so forth...

So when Maher says that "it's time for us to grow up" and let go of these ancient religious superstitions and just be rational and scientific ... two big problems initially appear to arise, based on cursory consideration of the philosophy of science:

  • There is no such thing as "just being rational" ... applying rationality to real observations always involves making some background assumptions
  • The ancient religious superstitions are closely related to patterns wired into our brains by evolution ... which are naturally taken by us as background assumptions...

So when he asks folks to drop their religious beliefs, is Maher really asking folks to self-modify their brains so as not to apply prior distributions supplied by evolution (which has adapted our cognitive patterns to superstitious, tribal society), and to instead apply prior distributions supplied by the scientific and rationalist tradition...?

If so, that would seem a really tough battle to fight. If this were the case, then essentially, the transcendence of religious superstitions would require a kind of cognitive transhumanism.

Fortunately, though, I don't think the situation is quite that bad. Cognitive transhumanism (which I define as the attempt to go beyond innately-human patterns of thinking) certainly can be a huge help in the transcendence of superstitions, but it's not strictly necessary.

It appears to me that it's enough "just" to get people to think more clearly about the relationship between their theories and ideas, their community, and their community's collective observations. If people understand this relationship clearly, then it's not actually necessary for them to transcend their various superstition-oriented human biases in order for them to go beyond naive religious ideas.

To elaborate on this point further I'll need to get technical for a moment and introduce a bit of Bayesian statistics and algorithmic information theory...

The Communication Prior

I'll now shift from philosophical babbling to basic math for a few paragraphs.

Recall the basics of Bayes Theorem. Setting T for "theory" and E for "evidence", it says:

P(T|E) = P(T) P(E|T)/P(E)

... i.e., it says that a person's subjective probability that a theory T is true, given that they receive evidence E, should be equal to their prior probability that T is true, times the probability that they would receive evidence E if theory T were true, divided by the probability of E (the latter usually found by summing over the weighted conditional probabilities given all potential theories).

It is critical to note that, according to Bayes rule, one's conclusion about the probability of theory T given evidence E depends upon one's prior assignment of probabilities.
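To make this concrete, here's a tiny sketch of Bayes rule in action, with completely made-up numbers, showing how a theory with a low prior can still win if it explains the evidence much better:

```python
# Toy illustration of Bayes rule: P(T|E) = P(T) * P(E|T) / P(E).
# All numbers are invented purely for illustration.

def posterior(priors, likelihoods):
    """Return P(T|E) for each theory T, given prior probabilities
    P(T) and likelihoods P(E|T); P(E) is the normalizing sum over
    all the candidate theories."""
    p_evidence = sum(p * l for p, l in zip(priors, likelihoods))
    return [p * l / p_evidence for p, l in zip(priors, likelihoods)]

# Two competing theories: T1 is a priori far more plausible,
# but T2 explains the observed evidence far better.
priors = [0.9, 0.1]        # P(T1), P(T2)
likelihoods = [0.01, 0.5]  # P(E|T1), P(E|T2)

print(posterior(priors, likelihoods))  # T2 dominates despite its low prior
```

The point of the sketch is just that the prior assignment (the first list) is an unavoidable input: change it, and the same evidence yields different conclusions.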

Now, a real mind with computational limitations cannot always apply Bayes rule accurately ... so the best we can do is approximate.

(Some cognitive theorists, such as Pei Wang, argue that a real mind shouldn't even try to approximate Bayes rule, but should utilize a different logic specially appropriate for cognitive systems with severe resource limitations ... but I don't agree with this and for the purpose of this blog post will assume it's not the case.)

But even if a mind has enough computational resources to apply Bayes rule correctly, there remains the problem of how to arrive at the prior assignment of probabilities.

The most commonsensical way is to use Occam's Razor, the maxim stating that simpler hypotheses should be considered a priori more probable. But this also leads to some subtleties....

The Occam maxim has been given mathematical form in the Solomonoff-Levin universal prior, which says very roughly that the probability of a hypothesis is higher if the computer-programs for computing that hypothesis are shorter (yes, there's more to it, so look it up if you're curious).

Slightly more rigorously, Wikipedia notes that:

The universal prior probability of any prefix p of a computable sequence x is the sum of the probabilities of all programs (for a universal computer) that compute something starting with p. Given some p and any computable but unknown probability distribution from which x is sampled, the universal prior and Bayes' theorem can be used to predict the yet unseen parts of x in optimal fashion.

Note in the above quote that the probability of a program may be estimated as the probability that the program is found by randomly selecting bits in the program-defining section of the memory of a computer.
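For intuition, here is a drastically simplified toy version of that length-based weighting. The real Solomonoff-Levin prior ranges over all programs for a universal machine and is uncomputable; this sketch just uses a hand-picked table of "programs" and their outputs to show how the 2^(-length) weighting works:

```python
# A drastically simplified sketch of the Solomonoff-Levin idea:
# weight each "program" by 2^(-length), i.e. the chance of hitting
# it by flipping one fair coin per bit of the program. The program
# table below is entirely made up for illustration.

def universal_prior_weight(program: str) -> float:
    """Prior weight of a bit-string program: 2^(-length)."""
    return 2.0 ** -len(program)

def prefix_prior(programs, outputs, prefix):
    """Sum the weights of all programs whose output starts with
    the given prefix -- the toy 'universal prior' of that prefix."""
    return sum(universal_prior_weight(p)
               for p, out in zip(programs, outputs)
               if out.startswith(prefix))

# Toy table: each program and the output sequence it computes.
programs = ["0", "10", "110"]
outputs  = ["0101", "0100", "1111"]

print(prefix_prior(programs, outputs, "010"))  # -> 0.75
```

Shorter programs contribute exponentially more mass, which is exactly the Occam bias in formal dress.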

Anyway: That's very nice for mathematicians, but it doesn't help us much in everyday life ... because even if we wanted to apply this kind of formalization in everyday life (say, to decide an issue like evolution vs. creationism), the mapping of real-world situations into mathematical formalisms is itself highly theory-laden....

So what we really need is not just a mathematical formalization of a universal prior, but a commonsensical formalization of a prior that is helpful for everyday human situations (even if not truly universal).

One suggestion I have is to use Solomonoff's core idea here, but interpret it a bit differently, in terms of everyday human communicational operations rather than mathematical, abstracted machine operations.

Paraphrasing the above quoted text, I propose that

The communicational prior probability of any prefix p of a computable sequence x, relative to a social group G and a body of evidence E, is the sum of the communicational probabilities (calculated relative to G and E) of all programs that compute something starting with p.

But how then to compute the communicational probability of a program relative to a social group G and body of evidence E?

As the name indicates, this is defined, not in terms of bit-flipping, but in terms of communication within the group.

I define the communicational probability of a program p as being inversely proportional to the average amount of time it would take a randomly chosen member A of group G to communicate p to another randomly chosen member B of group G, with sufficient accuracy that B can then evaluate the outputs of p on randomly selected inputs drawn from E.

(The assumption is that A already knows how to evaluate the program on inputs drawn from E.)

One can also bake a certain error rate into this definition, so that B has to be able to correctly evaluate the outputs of p only on a certain percentage of inputs drawn from E.

This defines what I suggest to call the Communication Prior.

A variant would be the communication-and-testing probability of a program p, definable as being inversely proportional to the average, for randomly chosen members A and B in the social group such that A already knows how to evaluate p on inputs in E, of

  • the amount of time it would take A to communicate p to B, with sufficient accuracy that B can then evaluate the outputs of p on randomly selected inputs drawn from E
  • the amount of time it actually takes B to evaluate p on a randomly selected element of E
(One can of course weight the two terms in this average, if one wants to.)

Taking a bit of terminological liberty, I will also group this communication-testing variant as being under the umbrella of the "Communication Prior."
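As a toy sketch of how a Communication Prior might be computed: the definition above leaves the exact functional form open, so the exponential fall-off 2^(-time) below is purely my own assumption, chosen by analogy with the 2^(-length) weighting in the Solomonoff-Levin prior; the communication times are made up:

```python
# Toy sketch of the Communication Prior. Theories that are faster
# to communicate accurately within the group get more prior mass.
# The 2^(-time) form is an illustrative assumption, not part of
# the definition in the post.

def communication_prior(comm_times):
    """Map each theory's average communication time (in some fixed
    unit, e.g. minutes) to a normalized prior probability."""
    weights = {t: 2.0 ** -time for t, time in comm_times.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

# Made-up average times to communicate each theory, with sufficient
# accuracy for the hearer to apply it, within some social group.
times = {"simple_theory": 1.0, "elaborate_theory": 5.0}

print(communication_prior(times))  # simple_theory gets most of the mass
```

For the communication-and-testing variant, one would simply feed in the (possibly weighted) average of communication time and evaluation time instead.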

Pragmatically, what does this mean about theories?

Roughly speaking, it means that the a priori probability of a theory (i.e. the "bias toward" a theory) has to do with ease of effectively communicating that theory within a social group ... and (in the communication-testing variant), the ease of effectively communicating how to efficiently apply the theory.

Of course, the a priori probability of a theory doesn't tell you how good the theory is. Communicating a theory may be very simple, but so what ... unless the theory explains something. But the "explanation" part is taken care of in Bayes Rule, in the P(E | T) / P(E) fraction. If the observed evidence is not surprisingly likely given the assumption of the theory, then this fraction will be small.

The Communication Prior is similar in spirit to the Solomonoff-Levin Universal Prior ... but it's not about formal, mathematical, theoretical systems, it's about real-world social systems, such as human communities of scientists. In terms of philosophy of science, this is sort-of a big deal, as it bridges the gap between formalist and social-psychology-based theories of science.

What's the Take-Away from All That Techno-babble?

So, roughly speaking, the nontechnical take-away from the above technical excursion should be the following suggestion:

A theory should be considered good within a social group, to the extent that it explains the evidence better than it would explain a bunch of randomly selected evidence -- and it's reasonably rapid to effectively communicate, to others in the group, information about how to efficiently apply the theory to explain the available evidence.

This may seem simple or almost obvious, but it doesn't seem to have been said before in quite so crisp a way.

(In my prior essay on philosophy of science, I left off without articulating any sort of specific simplicity measure: the Communication Prior fills in that gap, thus bringing the ideas in that essay closer to practical applicability.)

Consider for instance the evolution vs. creationism argument. For my new suggestion to favor evolution over creationism, what would have to be true?

Whether the simple essential core of creationism or evolution is easier to communicate within a human social group, really depends on the particular social group.

However, the simple essential core of creationism does an extremely bad job of explaining why the observed body of evidence (e.g. the fossil record) is more likely than a lot of other possible bodies of evidence.

To make a version of creationism that would explain why the observed body of evidence is particularly likely, one would need to add a heck of a lot of special-pleading-type explanations onto the essential core of creationism. This is because creationism does not effectively compress or compactify the body of observed data.

So, to get a version of creationism that is equally explanatory of the particulars of the evidence as evolution, one needs to make a version of creationism that takes a long time to communicate.

Conclusion: creationism is worse than evolution.

(OK, we don't really need to go through so much complexity to get to such an obvious conclusion! But I'm just using that example to make a more general point, obviously.)
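The comparison above can be boiled down to a few lines of arithmetic. All the numbers here are invented purely for illustration; the point is only the shape of the trade-off:

```python
# Toy version of the evolution-vs-creationism comparison: combine a
# communication-based prior with an explanatory likelihood. To rank
# theories it's enough to compare the unnormalized products
# P(T) * P(E|T); all numbers are made up.

def bayes_score(prior, likelihood):
    """Unnormalized posterior P(T) * P(E|T)."""
    return prior * likelihood

# Evolution: moderately quick to communicate, explains the evidence well.
evolution = bayes_score(prior=0.10, likelihood=0.40)

# "Patched" creationism: matches evolution's explanatory power only by
# piling on special pleading, so it takes ages to communicate.
patched_creationism = bayes_score(prior=0.001, likelihood=0.40)

# Bare creationism: quick to communicate, but explains almost nothing
# about the particulars of the fossil record etc.
bare_creationism = bayes_score(prior=0.30, likelihood=0.0001)

print(evolution, patched_creationism, bare_creationism)
# Evolution scores highest on either comparison.
```

Either way you try to rescue creationism (keep it short, or make it explanatory), it loses on the product.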

Why Is Religion a Bad Idea?

Getting back to the initial theme of this overlong, overdiverse blog post, then: why is religion a bad idea?

Because we should judge our theories using Bayes rule with a communication prior ... or in other words, by asking that they explain the particulars of observed reality in a relatively rapidly communicable way.

There is a balance between success-at-detailed-explanation and rapid-communicability, and the exact way to strike this balance is going to be subtle and in some cases subjective. But, in the case of religious beliefs, the verdict is quite clear: the religious world view, compared to the scientific world view, fails miserably at explaining the particulars of observed reality in a relatively rapidly communicable way.

The key point here is that, even if people want to stick with their evolutionary-legacy-based inductive biases (which make them intuitively favor superstitious explanations), the failure of religious theories to explain the particulars of observed reality is now so drastic and so obvious, that anyone who really carefully considers the evidence should reject these religious theories anyway.

Maher's film points out sensationalistically silly aspects of religious belief systems. But these aren't really the right anti-religion argument to use, in terms of philosophy of science and the theory of rationality. After all, are the Big Bang and Big Crunch and the evolution of humans from apes really any less everyday-ishly wacky than Judgment Day and the talking snake in the Garden of Eden?

The right argument to use is that, if one assumes Bayes rule plus a Communication Prior (or any other sensible, everyday-reality-based prior), then religious theories fail miserably.

Of course, almost no one on the planet can understand the previous sentence ... which is why his approach of dramatically emphasizing the most absurdly wacky religious beliefs and believers is probably a way more effective PR strategy!

The Emotion Prior

Finally, another suggestion I have regarding the popularity of religious beliefs has to do with something my ex-wife said to me once, shortly after her religious conversion to Buddhism, a topic about which we had numerous arguments (some heated, some more rational and interesting, none usefully conclusive nor convincing to either of us). What she said was: "I believe what I need to believe in order to survive."

She didn't just mean "to survive physically" of course ... that was never at issue (except insofar as emotional issues could have threatened her physical survival) ... what she meant was "to survive emotionally" ... to emotionally flourish ...

My (rather uncontroversial) suggestion is that in many cases religious people -- and others -- have a strong bias toward theories that they enjoy believing.

Or in other words: "If believing it feels good, it can't be wrong!"

This is probably the main issue in preaching atheism: one is asking people to

  • adopt (some approximant of) Bayes rule with a Communication Prior (or similar)
  • actually carefully look at the evidence that would be used in Bayes rule

... rather than to, on the other hand,

  • avoid looking at evidence that might disconfirm one's theory
  • utilize an Emotion Prior when evaluating various theories that might explain the evidence

The question is then whether, in each individual case,

  • the Emotion Prior outweighs the Communication Prior (or similar)
  • the sociopsychological pressure to look at evidence outweighs the sociopsychological pressure to ignore it
Ignoring evidence gets harder and harder as the Internet broadcasts data to everyone, all the time....

To study these choices in an interesting way, one would need to model the internals of the believer's mind more subtly than has been done in this post so far....

But anyway ... the evidence of the clock in front of me is that I have spent too much time amusing myself by writing this blog post, and now have more useful things to do ... so, till next time!

P.S. Thanks to my wife Izabela for discussions leading to the introduction of the communication-testing variant of the Communication Prior, after the more basic version had already been formulated....