Sunday, April 06, 2008

Artificial Wisdom (... episodic memory, general intelligence, the Tao of John Coltrane, and so forth)

Every now and then, someone suggests to me that, alongside the pursuit of Artificial Intelligence, we should also be pursuing "Artificial Wisdom."

I always figured the "artificial wisdom" idea was probably just a bunch of useless English-language wordplay -- but one night last week, while watching Idiocracy with the kids for the second time (great movie exploring a non-Singularity-based future by the way ... highly recommend it!), I spent a while surfing the Web on my laptop refreshing my memory on how others have construed the "wisdom" concept and musing on what it might mean for AI.

Surprisingly enough, this led in some moderately interesting directions -- nothing revolutionary, but enough to justify the couple hours spent musing about it (and another 90 minutes or so synthesizing and writing up my glorious conclusions).

My main conclusion was a perspective in which wisdom is viewed as one of three core aspects of intelligence, associated with three distinct types of memory:

  • cleverness, associated with declarative memory (and the ability to manipulate abstract, certain or uncertain declarative knowledge)
  • skillfulness, associated with procedural memory (and the ability to effectively learn and adapt new procedures based on experience)
  • wisdom, associated with episodic memory (and insightful drawing of large-scale conclusions therefrom)

This being a blog post, though, rather than just presenting my conclusion, I'll start out by recounting some of the winding and mostly irrelevant path that led me there ;-)

Classical Conceptions of Wisdom

I started out with the dictionary, and as usual found it close to useless....

A typical dictionary definition of "wisdom" (not a heck of a lot of help) comes from Wiktionary, which tells us that

wisdom (plural wisdoms)

  1. An element of personal character that enables one to distinguish the wise from the unwise.
  2. A piece of wise advice.
  3. The discretionary use of knowledge for the greatest good.
  4. The ability to apply relevant knowledge in an insightful way, especially to different situations from that in which the knowledge was gained.
  5. The ability to make a decision based on the combination of knowledge, experience, and intuitive understanding.
  6. (theology) The ability to know and apply spiritual truths.
and furthermore that

Showing good judgement or the benefit of experience.

Hoo haw.

These definitions don't give us any particularly interesting way of distinguishing "wisdom" from "intelligence." Essentially they define wisdom as either intelligence, spiritual insight, or the application of intelligence for ethical ends. Nothing new here.

Wikipedia is slightly more useful (but only slightly). Firstly it notes that

A standard philosophical, (philos-sophia: literally "lover of wisdom"), definition says that wisdom consists of making the best use of available knowledge.

It then notes some psychological research demonstrating that in popular culture, wisdom is considered distinct from intelligence. Psychological researchers are quoted as saying that though "there is an overlap of the implicit theory of wisdom with intelligence, perceptiveness, spirituality and shrewdness, it is evident that wisdom is a distinct term and not a composite of other terms."

More interestingly, Wikipedia notes, Erik Erikson and other psychologists have argued that it is, in large part, the imminence of death that gives older human beings wisdom.

The knowledge of imminent death is seen as focusing the mind on concerns beyond its own individual well-being and survival, thus inducing a broader scope of understanding and an identification with the world at large, which are associated with the concept of wisdom.

This is interesting from a transhumanist perspective in that it suggests that the death of death would be the death of wisdom! I have seen some evidence for that in the incredible, shallow-minded selfishness of a certain subset of the transhumanist community -- people who are dead-set on having their own selves live forever, without any real thought as to why this might be valuable or what this might mean in a larger perspective. But of course, I don't really think death is the only or ultimate source of wisdom, though in a human context I can believe it's one of the main forces nudging us toward wisdom.

Paul Graham on Wisdom

One of the more interesting theories of wisdom I've run across (I found it a while ago for some random reason I've forgotten, and dug it up again last week) came from a contemporary blogger, Paul Graham, who distinguishes wisdom from intelligence in the following way:

"Wise" and "smart" are both ways of saying someone knows what to do. The difference is that "wise" means one has a high average outcome across all situations, and "smart" means one does spectacularly well in a few.

This explanation also suggests why wisdom is such an elusive concept: there's no such thing. "Wise" means something—that one is on average good at making the right choice. But giving the name "wisdom" to the supposed quality that enables one to do that doesn't mean such a thing exists. To the extent "wisdom" means anything, it refers to a grab-bag of qualities as various as self-discipline, experience, and empathy.

Graham considers wisdom as partly a kind of de-biasing and cleansing of the mind, a notion that has some resonance with the modern notion of "Bayesian calibration" of the mind:

Recipes for wisdom, particularly ancient ones, tend to have a remedial character. To achieve wisdom one must cut away all the debris that fills one's head on emergence from childhood, leaving only the important stuff. Both self-control and experience have this effect: to eliminate the random biases that come from your own nature and from the circumstances of your upbringing respectively. That's not all wisdom is, but it's a large part of it. Much of what's in the sage's head is also in the head of every twelve year old. The difference is that in the head of the twelve year old it's mixed together with a lot of random junk.

Provocatively, Graham also posits that intelligence is quite different from wisdom, in that it has to do with accentuating rather than avoiding biases:

The path to intelligence seems to be through working on hard problems. You develop intelligence as you might develop muscles, through exercise. But there can't be too much compulsion here. No amount of discipline can replace genuine curiosity. So cultivating intelligence seems to be a matter of identifying some bias in one's character -- some tendency to be interested in certain types of things -- and nurturing it. Instead of obliterating your idiosyncrasies in an effort to make yourself a neutral vessel for the truth, you select one and try to grow it from a seedling into a tree.

To avoid confusion, from here on I'll sometimes refer to Graham's interpretation of these concepts as Graham-style wisdom and Graham-style intelligence, respectively.

There is some ambiguity in Graham's essay as to how far he thinks the kind of focusing and bias-accentuation that's part of Graham-style intelligence has to involve irrationality. My own view is that Graham-style intelligence definitely does NOT require an individual to be irrational, in the sense of making suboptimal judgments about a particular problem given the resources devoted to thinking about the problem. However, a finite system in a complex environment is always going to be irrational to some degree, because it lacks the resources to make a full analysis of any complex situation. To the extent that Graham-style intelligence involves heavy focus on some particular set of topic areas, it's going to drain resources from other areas, thus making the mind less intelligent regarding these other areas.

So, in Graham's view, intelligence has to do with focusing loads of resources on processing in a handful of narrow domains that match one's innate biases, whereas wisdom has to do with evenly distributing processing across all the different domains in one's environment.

Along these lines Graham also notes (correctly, I think) that:

The wise are all much alike in their wisdom, but very smart people tend to be smart in distinctive ways.

As Graham conceives it, wisdom is basically equivalent to general intelligence: it's intelligence averaged across a variety of situations. In mathematics there exist various sorts of averages, some of which weight extreme values more heavily than others (for instance the p-th power means, in which larger values of p give more weight to the largest values). Graham's view would be that "wisdom" and "intelligence" are both estimates of general intelligence (defined as intelligence averaged over different domains/tasks), but with different sorts of averaging: in the case of intelligence, an averaging that pays special attention to extremes (say a p-th power mean with p=5, or whatever); and in the case of wisdom, a more typical arithmetic averaging.
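To make the averaging idea concrete, here is a minimal Python sketch of the p-th power mean, M_p = ((x_1^p + ... + x_n^p) / n)^(1/p). The two performance profiles and the choice p=5 are made-up illustrations, not anything from Graham's essay, and scores are assumed to be positive so the mean is well defined.

```python
def power_mean(scores, p):
    """Power (generalized) mean: M_p = (sum(x**p) / n) ** (1/p).

    p = 1 gives the ordinary arithmetic mean; larger p weights the
    highest scores more heavily. Scores must be positive.
    """
    if not scores:
        raise ValueError("need at least one score")
    if any(x <= 0 for x in scores):
        raise ValueError("scores must be positive")
    return (sum(x ** p for x in scores) / len(scores)) ** (1.0 / p)


# Hypothetical per-domain performance scores (0-10 scale), purely illustrative.
even_profile = [6, 6, 6, 6, 6]     # decent everywhere: "wise" in Graham's sense
spiky_profile = [10, 9, 3, 2, 2]   # brilliant in a few domains: "smart"

for name, profile in [("even", even_profile), ("spiky", spiky_profile)]:
    print(f"{name:5s}  arithmetic={power_mean(profile, 1):.2f}  p=5={power_mean(profile, 5):.2f}")
```

With these made-up numbers the spiky profile scores lower than the even one under the arithmetic mean (5.2 vs 6) but higher under the p=5 mean (about 7.96 vs 6), which is the sense in which the two kinds of averaging can rank the same two minds in opposite orders.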

This is all sort of nice, but (as will become clear as the essay unfolds) I don't really think it gets at the crux of the matter.

Wisdom Goes Beyond the Individual

Another interesting perspective (that I also think doesn't get at the crux of the matter) is given in the paper "Meaning generation and artificial wisdom," whose abstract reads:

We propose an interpretation of wisdom in terms of meaning generation in social groups. Sapient agents are able to generate useful meanings for other agents beyond their own capability of generation of self-meanings. This makes sapient agents specially valuable entities in agent societies because they provide interagent reliable third-person meaning generation that provides some functional redundancy that contributes to enhance individual and social robustness and global performance.

Here wisdom is identified with the ability to generate meaning in the social group, going beyond meaning that is perceptible by the individual doing the meaning-generating. This harks back to Erikson's understanding of wisdom as related to identification with the world at large, beyond one's own mind and body.

This view also reminds me vaguely of Aldous Huxley's Perennial Philosophy, an attempt to distill the "wisdom teachings" of all the world's religions. In the Perennial Philosophy, wisdom teaches that the individual self is an illusion and all of us are one with the universe (and yet in a sense still distinct and individual).

Mulling over all this, none of it really satisfied me. Of course, a folk concept like "wisdom" can't be expected to have a crisp and sensible formalistic definition ... but it still seemed to me that all the attempts at systematization and formalization I'd read about were missing some really essential aspects of the folk concept.

Wisdom, Cleverness and Skillfulness

And so, I came up with a totally different idea....

After a fair bit of musing, my mind kept drifting to the familiar distinction between declarative, procedural and episodic memory (drawn from textbook cognitive psych).


  • Declarative knowledge = knowledge of facts, conjectures, hypotheses (abstract or concrete)
  • Procedural knowledge = knowledge of how to do things (could be physical, mental, social, etc.)
  • Episodic knowledge = knowledge of stories that have occurred in the history of intelligent beings (oneself, others one knows, others one has heard about,...)

One interesting thought that popped into my head is: The concept of wisdom, in its folk-psychology sense, has a lot to do with the ability to solve problems that are heavily dependent on context, using intuition that's based on large-scale analysis of one's episodic-memory store.

Or, less geekily: Wisdom consists of making intelligent use of experience.

A subtlety here is that this need not be one's own experience. Direct experience may be the best way to acquire wisdom (and surely this is part of the reason that wisdom is commonly associated with age) but some rare folks are remarkably gifted at absorbing wisdom from the experience of others -- absorbed via observation, via reading, or conversation, or whatever.
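One crude way to picture "making intelligent use of experience," including other people's, is case-based retrieval: find the remembered episodes most similar to the current situation and let them vote on what to do. The sketch below assumes situations can be encoded as numeric feature vectors and discounts second-hand episodes by a hypothetical other_weight factor; it illustrates the idea and makes no claim about how wisdom, human or artificial, actually works.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Episode:
    context: Sequence[float]  # hypothetical numeric encoding of the situation
    action: str               # what was done
    outcome: float            # how well it went, 0.0 (badly) to 1.0 (well)
    source: str               # "self", or "other" if observed, read about or told


def suggest_action(context: Sequence[float], memory: List[Episode],
                   k: int = 3, other_weight: float = 0.5) -> str:
    """Vote among the k remembered episodes closest to the current context.

    Each vote is weighted by the episode's outcome, and second-hand
    episodes count other_weight times as much as first-hand ones.
    """
    nearest = sorted(memory, key=lambda ep: math.dist(context, ep.context))[:k]
    votes: Dict[str, float] = {}
    for ep in nearest:
        weight = ep.outcome * (1.0 if ep.source == "self" else other_weight)
        votes[ep.action] = votes.get(ep.action, 0.0) + weight
    return max(votes, key=votes.get)


# Made-up memory with two features, e.g. (how friendly the other party is, how hostile).
memory = [
    Episode((0.90, 0.10), "negotiate", 0.8, "self"),
    Episode((0.80, 0.20), "negotiate", 0.6, "other"),
    Episode((0.85, 0.15), "walk_away", 0.2, "self"),
    Episode((0.10, 0.90), "walk_away", 0.9, "other"),
]

print(suggest_action((0.88, 0.12), memory))        # draws mostly on one's own experience
print(suggest_action((0.15, 0.85), memory, k=1))   # only someone else's story applies
```

The second query can only be answered from someone else's episode, which is the point of the paragraph above: a memory store that included only first-hand stories would have nothing to say there.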

More broadly, this train of thought leads me to a sort of fundamental trinity of aspects of intelligence: cleverness, skillfulness and wisdom.

There's cleverness, which is the ability to appropriately manipulate, create and absorb declarative knowledge toward one's goals. This declarative knowledge may be abstract, or it may be concrete facts. Declarative knowledge is largely symbolic in nature, and cleverness is largely founded on adeptness at symbol-manipulation.

There's skillfulness, which is the ability to effectively do stuff in service of one's goals. This covers physical skills but also highly abstract mental skills like writing an essay, proving a theorem, or closing a business deal.

In some domains skillfulness can exist in the total absence of cleverness. The vast majority of shred metal guitarists would seem to fit in this category (to choose a somewhat random example based on what's playing in my headphones at the moment). These guys are so damn skilled, yet there's not much adept manipulation of meaning in their solos, or compositions. Compare the typical shred guitarist to Yngwie Malmsteen or Buckethead, who are also massively skilled (and in similar ways) -- but who are also highly clever in their symbolic manipulation of the abstract patterns characterizing the concrete sonic forms they're so skilled at producing.

In other domains, it's really hard for cleverness and skillfulness to emerge in any way except exquisitely intercombined. Mathematics is an example. Procedural knowledge of how to do proofs is needed for fully understanding complex proofs -- because so many steps are left out in proofs as typically written down, if you don't know how to do proofs, you won't be able to fill in all the gaps in your head when you read a proof, so you'll never get more than a general understanding. On the other hand, it's even more obvious that deep declarative understanding and manipulation-ability regarding mathematical content is necessary to do mathematical proofs. Math is a domain where procedural and declarative intelligence have got to work in extremely tight synergy.

Finally, there's wisdom, which as I'm conceiving it here is the ability to intelligently draw conclusions from a vast repository of data regarding specific situations.

Human minds tend to organize data regarding specific situations using story-like, "narrative" structure, so that in human practice, wisdom often takes the form of the ability to mine appropriate abstract patterns from a vast pool of remembered stories.

Of course, the operation of human episodic memory is largely constructive -- we don't actually grab experiential data out of some sort of neurological database; rather, we synthesize stories from fragmentary images, stories, and such. Wisdom is about synthesizing appropriate stories from large databases of partially-remembered, ambiguous, fractional stories -- and then, as appropriate, using these stories to guide the creation of declarative or procedural knowledge.
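The constructive side of recall can be caricatured as pattern completion: gather the fragments consistent with a cue, then fill each missing slot with the value that shows up most often among them. This is a toy sketch under loudly simplifying assumptions (memories as slot/value dictionaries, majority vote as the synthesis rule), not a model of human episodic memory.

```python
from collections import Counter
from typing import Dict, List

Fragment = Dict[str, str]  # a partial memory: only some slots are filled in


def reconstruct(cue: Fragment, fragments: List[Fragment]) -> Fragment:
    """Synthesize a complete-looking story from partial fragments.

    A fragment counts as relevant if it shares at least one slot with the
    cue and does not contradict the cue on any shared slot. Each slot the
    cue lacks is filled with its most common value among relevant fragments.
    """
    relevant = [
        f for f in fragments
        if any(k in f for k in cue)
        and all(f[k] == v for k, v in cue.items() if k in f)
    ]
    story = dict(cue)
    missing = {slot for f in relevant for slot in f} - set(cue)
    for slot in sorted(missing):
        counts = Counter(f[slot] for f in relevant if slot in f)
        story[slot] = counts.most_common(1)[0][0]
    return story


# Made-up fragments; none of them records the whole episode.
fragments = [
    {"place": "beach", "weather": "storm"},
    {"place": "beach", "who": "grandfather"},
    {"place": "beach", "weather": "storm", "feeling": "fear"},
    {"place": "beach", "weather": "sun"},
    {"place": "city", "weather": "sun"},
]

print(reconstruct({"place": "beach"}, fragments))
```

The reconstructed beach story combines slots that never appeared together in any single fragment. That is roughly the point: recall here is a synthesis, and the same mechanism that makes it flexible also lets it confabulate.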

In mathematics, wisdom is closely related to what's called "mathematical maturity" ... the general sense of how mathematics is done. Mathematical maturity guides the mind to interesting problems and interesting concepts ... and helps you choose an overall proof strategy (whereas it's cleverness and skillfulness that help you carry out the proof).

The transition from {cleverness + skillfulness} to wisdom in music is epitomized to me by the mid-to-late John Coltrane ... the Coltrane of "My Favorite Things" and "A Love Supreme." These are the solos of a man who has listened so much and played so much that he's disassembled thousands of different musical narratives and reassembled them to tell different kinds of stories, like no one ever told before. So much richer than the merely clever, skillful and emotionally moving solos of the early Coltrane. Certain works of great art manage to be intensely personal and dramatically universal at the same time, and this often results from wisdom in the sense I'm defining it here.

Note that a mature mathematician or a world-changing jazz soloist need not be "wise" in the sense of a Taoist sage. The classical conception of wisdom has to do with making intelligent judgments based on large stores of experience in everyday human life. In the old days this was pretty much the only experience there was -- everyday human life plus various shamanic and psychedelic experiences.... But now the human world has become far more specialized, and it's possible to have a specialized wisdom, because it's possible to have a huge and rich store of episodic knowledge that's restricted to some special domain, like music or mathematics, or even a sufficiently complex game like Go or chess.

This vision of wisdom would seem to contradict Graham's, cited above -- he views wisdom as related to the ability to achieve goals over a broad variety of domains, in contrast to intelligence, which he conceives as more narrowly domain-specialized.

But I don't think the contradiction is total.

I think that within a sufficiently rich and complex domain, one requires wisdom as I've defined it in order to achieve a really high level of intelligence. Learning skills and manipulating symbols is not enough. Direct and intelligent mining of massive experience-stores is needed.

I also think that wisdom, even if achieved initially and primarily within a certain domain, has a striking power to transcend domains. There are a lot of universal patterns among large stores of stories, no matter what the domain.

But even if the wisdom achieved by a great mathematician or chess player or jazz soloist helps that person to intuitively understand the way things work in other domains, this won't necessarily lead them to practical greatness in these other domains -- great achievement seems to require a synthesis of wisdom with either cleverness or skillfulness, and in some domains (like math or jazz improvisation) all three.

Defined-Problem versus Contextual Intelligence

Next, what does all this have to do with artificial intelligence?

One of the lessons learned in the last few decades of AI practice is that there is a pretty big difference between:

  1. Defined-problem intelligence: problem-solving that occurs "after a crisply-defined problem statement has been identified", versus
  2. Contextual intelligence: problem-solving that is mainly concerned with interpreting general goals in the context of a complex situation and "figuring out what the context-specific problem is, in the first place" -- i.e. figuring out what crisply-defined problem, if solved in the relevant context, is likely to work toward the general goals at hand

I think this might be a more useful and more precise distinction than the "narrow AI" versus "general AI" distinction that I've often made before. It's ultimately getting at the same thing, but it's putting the point in a better way, I think.

What's narrow about "narrow AI" systems like chess-playing programs and medical diagnostic expert systems isn't merely that they're focused on specific, narrow domains. It's the fact that they operate based on defined-problem intelligence. It happens, though, that in some sufficiently specialized domains, defined-problem intelligence is enough to yield ass-kicking performance. In other domains it's not -- because in these other domains, figuring out what the problem is, is basically the problem.

I suggest that defined-problem intelligence is focused on declarative and procedural knowledge: i.e. it consists of cleverness or skillfulness or some combination thereof.

Logical reasoning systems, for example, are focused on declarative knowledge, and possess in some cases great facility at manipulating declarative knowledge.

Evolutionary learning systems and neural nets, on the other hand, are mainly focused on procedural knowledge -- on learning how to do stuff, without need for symbolic representations or symbol manipulations.

By contrast, contextual intelligence, I suggest, is a matter of knowing how to synthesize declarative and procedural knowledge (representing problem statements and problem solutions) out of the combination of general goals and real-world situations.

I suggest that powerful contextual intelligence always relies upon powerful use of episodic memory, and associated mechanisms for storing, accessing, manipulating and analyzing sets of stories.
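Here is a toy sketch of that division of labor, with hypothetical assumptions throughout: situations are sets of tags, each past episode records which crisply-defined problem was posed and how well solving it served the general goal, and the objectives are made-up one-variable functions. The contextual step decides which problem to solve by consulting similar episodes; the defined-problem step is plain brute-force optimization of whatever objective it is handed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


@dataclass
class Episode:
    situation: FrozenSet[str]  # hypothetical tag-based description of the context
    formulation: str           # which crisply-defined problem was posed
    payoff: float              # how well solving it served the general goal (0..1)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def choose_formulation(situation: FrozenSet[str], history: List[Episode],
                       min_similarity: float = 0.3) -> str:
    """Contextual step: decide which well-defined problem to pose, by averaging
    similarity-weighted payoffs of past episodes that resemble the situation."""
    scores: Dict[str, List[float]] = {}
    for ep in history:
        sim = jaccard(situation, ep.situation)
        if sim >= min_similarity:
            scores.setdefault(ep.formulation, []).append(sim * ep.payoff)
    if not scores:
        raise LookupError("no relevant experience for this situation")
    return max(scores, key=lambda f: sum(scores[f]) / len(scores[f]))


def solve(objective: Callable[[int], float], candidates: List[int]) -> int:
    """Defined-problem step: exhaustively maximize an objective already given."""
    return max(candidates, key=objective)


# Made-up formulations of the vague goal "do well in this market".
OBJECTIVES: Dict[str, Callable[[int], float]] = {
    "maximize_revenue": lambda price: price * (100 - price),  # toy demand curve
    "minimize_cost": lambda batch: -((batch - 30) ** 2),      # toy cost curve, negated
}

history = [
    Episode(frozenset({"new_market", "few_rivals"}), "maximize_revenue", 0.9),
    Episode(frozenset({"new_market", "price_war"}), "minimize_cost", 0.8),
    Episode(frozenset({"mature_market", "price_war"}), "minimize_cost", 0.7),
    Episode(frozenset({"mature_market", "few_rivals"}), "maximize_revenue", 0.4),
]

now = frozenset({"new_market", "price_war"})
chosen = choose_formulation(now, history)
best = solve(OBJECTIVES[chosen], list(range(101)))
print(chosen, best)
```

The averaging rule and the similarity threshold are arbitrary choices. The point is structural: solve never questions its objective, while choose_formulation exists only to pick one, and it can do so only because there is a store of past episodes to consult.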

Or, briefly getting less geeky again: contextual intelligence requires wisdom.

Not at the level of the Taoist sage, John Coltrane or Riemann ... but at a way higher level than possessed by any currently operational AI system.

Note that defined-problem intelligence may sometimes draw on a wide body of background knowledge -- but it uses this background knowledge in a manner constrained by certain well-defined declarative propositions, or practical constraints on procedure-learning. It uses the background knowledge in a manner that doesn't require the background knowledge to be organized or accessed episodically -- rather, it uses background knowledge as a set of declarative facts, or data items, or constraints on actions, or procedures for doing specific things in specific types of situations.

"How to make a lot of money in Russia" is a problem that requires intense contextual as well as defined-problem intelligence. Whereas "how to make a lot of money by trading oil futures on the Russian stock exchange" is more heavily weighted toward defined-problem (or "calculational") intelligence, though it could be approached in a contextual-intelligence-heavy manner as well.

For instance, in the domain of bioinformatics, figuring out a rule that can diagnose a disease based on a gene expression microarray dataset is a well-defined problem -- a problem that can be solved by focusing strictly on a small set of reasonably well-encapsulated information items. Declarative- and/or procedural-focused AI works well here ... much better than human intelligence.
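For a sense of how crisply defined that diagnosis problem is, here is a minimal Python sketch of one very simple rule learner: a one-gene threshold rule (a "decision stump") found by exhaustive search. The expression values are synthetic toy numbers, not real microarray data, and nothing here describes how Biomind's systems actually work.

```python
from typing import List, Tuple


def best_stump(X: List[List[float]], y: List[int]) -> Tuple[int, float, int, float]:
    """Exhaustively search rules of the form
    "predict 1 if sign * (expression[gene] - threshold) > 0, else 0".

    Returns (gene, threshold, sign, training_accuracy) for the best rule found.
    """
    n_samples, n_genes = len(X), len(X[0])
    best = (0, 0.0, 1, -1.0)
    for gene in range(n_genes):
        values = sorted({row[gene] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or [values[0]]
        for threshold in thresholds:
            for sign in (1, -1):
                correct = sum(
                    1 for row, label in zip(X, y)
                    if int(sign * (row[gene] - threshold) > 0) == label
                )
                accuracy = correct / n_samples
                if accuracy > best[3]:
                    best = (gene, threshold, sign, accuracy)
    return best


# Synthetic toy data: 6 samples x 3 genes; labels 0 = healthy, 1 = disease.
X = [
    [2.0, 0.5, 3.1],
    [1.8, 0.7, 2.9],
    [2.2, 0.6, 3.3],
    [2.1, 2.4, 3.0],
    [1.9, 2.8, 3.2],
    [2.0, 2.6, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

gene, threshold, sign, accuracy = best_stump(X, y)
print(f"gene {gene}, threshold {threshold:.2f}, sign {sign:+d}, accuracy {accuracy:.2f}")
```

Everything the search needs is already in the table. Training accuracy on six samples says nothing about real diagnostic value, and deciding whether such a table deserves to be trusted at all is exactly the kind of judgment that falls outside the well-defined problem.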

On the other hand, figuring out which datasets are likely to be reliable, and figuring out how to normalize these datasets in a reasonable way based on the experimental apparatus described in the associated research paper, are tasks that require much more understanding of context, more milking of subtle patterns in episodic memory. I.e., I'm suggesting, more wisdom.

In the current practice of bioinformatic data analysis, human wisdom is needed to craft well-defined problems to feed into the superior (in this domain) declarative and procedural intelligence of narrow-AI bioinformatic data-analysis systems like the ones we've created at Biomind LLC.

Doing Time in the Universal Mind

Getting back to some of the ideas introduced at the start of this essay ... it seems all this ties in moderately closely with Erikson's definition and the Perennial Philosophy definition of "wisdom."

These definitions conceive wisdom as related to an understanding of life situations in a broader context than that of the individual body and mind. Wisdom, as these thinkers conceive it, is a higher level of contextual intelligence than average humans display -- an ability to conceive daily situations in a broader-than-usual context.

This corresponds, really, to relying on a kind of collective episodic memory store, rather than just the episodic memory store corresponding to one's own life. By the time one is old, one is reviewing a longer life, and reviewing the past and future lives of one's children and grandchildren, and thinking about the whole scope of stories all these people may be involved in. A much richer contextuality.

Another ingredient of the Perennial Philosophy notion of wisdom is self-understanding, and I think that ties in here very closely too. One's own self is always part of the context, and to carry out really deep contextual understanding or problem-solving, one needs to appreciate how one's own history, knowledge and biases are affecting the situation and affecting one's own judgments. Powerful contextual intelligence -- unlike powerful calculational intelligence -- requires deep and broad self-understanding.

Wrapping Up

Sooo ... if we conceive wisdom as contextual intelligence powered by rich analysis of episodic memory, then it is clear that wisdom is a key aspect of general intelligence -- and is precisely the aspect that the AI research field has most abjectly ignored to date.

And it is also clear that ethical judgment is richly bound up with wisdom, as here conceived. Ethical judgment, in real life, is all about contextual understanding. It's not about following logical principles of ethics -- even when such principles are agreed upon, real-life application always comes down to tricky context-specific intuitive judgments. And those come down to understanding a vast pool of different situations, different episodes, that have occurred in the lives of different human beings and groups.

Defined-problem intelligence can be useful for ethical judgments. For instance in cases where scarce resources need to be divided fairly among a large number of parties with complex interrelationships and constraints, one has a well-defined problem of figuring out the optimally ethical balance, or a reasonable approximation thereof. But this actually seems an exceptional case, and the default case of ethical judgment seems to be to rely much more heavily on contextual than defined-problem intelligence.
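As an illustration of the well-defined end of this spectrum, here is a minimal sketch of one standard formalization of fair division, max-min fairness computed by "water-filling," for a single divisible resource and a set of claims. This is a swapped-in textbook technique chosen for brevity: it ignores the complex interrelationships and constraints mentioned above, and the party names and numbers are made up.

```python
from typing import Dict


def max_min_fair(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Water-filling allocation of one divisible resource.

    Parties are served in order of increasing demand; each gets the smaller
    of its demand and an equal split of what is still left among the parties
    not yet served. The result is max-min fair: no party's share can be raised
    without lowering the share of a party that has no more than it.
    """
    allocation: Dict[str, float] = {}
    remaining = capacity
    order = sorted(demands, key=demands.get)
    for i, party in enumerate(order):
        equal_split = remaining / (len(order) - i)
        allocation[party] = min(demands[party], equal_split)
        remaining -= allocation[party]
    return allocation


# Made-up example: 10 units to divide among three parties with different needs.
print(max_min_fair(10.0, {"clinic": 2.0, "school": 4.0, "factory": 10.0}))
```

Once "fair" has been pinned down this way, the rest is calculation. Choosing max-min fairness over, say, proportional division, and deciding whose claims count and how much, is where the contextual judgment lives.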

Just to be clear: I'm not claiming that the conception of "wisdom" I've outlined here thoroughly captures all aspects of the natural-language/folk-psychology term "wisdom." Like "mind", "intelligence" and so forth, "wisdom" is a fuzzy term that amalgamates various different overlapping meanings ... it's not the kind of thing that CAN be crisply defined and analyzed once and for all.

What I hope to have done is to extract from the folk concept of wisdom some more precise, interesting and productive ideas that closely relate to this folk concept but don't pretend to exhaust it.

In short...

  • General intelligence = defined-problem intelligence + contextual (problem-defining) intelligence
  • Calculational intelligence = cleverness (declarative intelligence) + skillfulness (procedural intelligence)
  • Contextual intelligence = in the human context, highly reliant on large-scale analysis of episodic memory
  • Wisdom = interestingly interpreted as contextual intelligence
  • Ethics = heavily reliant on wisdom

In this view, not surprisingly, the pursuit of Artificial Wisdom emerges as a subtask of the pursuit of Artificial General Intelligence. But what's interesting is that it emerges as a complementary subtask to the one that most of the AI community is working on at the moment -- narrow AI, or artificial defined-problem intelligence.

There is a bit of work in the AI community on narrative and story understanding. But most of this work seems, well, overly artificial. It has to do with formalistic systems for representing story structure. That is just not how we do things, in our human minds, and I suspect it's not an effective path at all.

At the moment I don't know of any better way to give an AGI system a rich understanding of episodes in the world than to actually embed it in the world and let it learn via experience. Virtual worlds may be a great start, given the amount of rich social interaction now occurring therein.

Thus I conclude that an excessive focus on narrow-AI research is, well, un-wise ;-)

And physically or virtually embodied AGI may potentially be a wise approach...

And I return again to the apparent wisdom of integrative AI approaches. Cleverness, skillfulness and wisdom are, I suggest, separate aspects of intelligence, which are naturally implemented in an AI system as separate modules -- but modules which must be architected for close inter-operation, because the real crux of general intelligence is the synergetic fusion of the three.