Friday, April 16, 2010

"Conceptual Spaces" and AGI

One of my AI collaborators from the late 1990s, Alexandru Czimbor, recently suggested I take a look at Peter Gardenfors' book "Conceptual Spaces."

I read it and found it interesting, and closely related to some aspects of my own AGI approach ... this post contains some elements of my reaction to the book.

Gardenfors' basic thesis is that it makes sense to view a lot of mind-stuff in terms of topological or geometrical spaces: for example topological spaces with betweenness, or metric spaces, or finite-dimensional real spaces. He views this as a fundamentally different mind-model than the symbolic or connectionist perspectives we commonly hear about. Many of his examples are drawn from perception (e.g. color space) but he also discusses abstract concepts. He views both conceptual spaces and robust symbolic functionality as (very different) emergent properties of intelligent systems. Specific cognitive functions that he analyzes in terms of conceptual spaces include concept formation, classification, and inductive learning.

About the Book Itself

This blog post is mainly a review of the most AGI-relevant ideas in Gardenfors' book, and their relationship to my own AI work ... not a thorough review of the book itself. But I'll start with a few comments on the book as a book.

Basically, the book reads sorta like a series of academic philosophy journal papers, carefully woven together into a book. It's carefully written, and technical points are elucidated in ordinary language. There are a few equations here and there, but you could skip them without being too baffled. The pace is "measured." The critiques of alternative perspectives on AI strike me as rather facile in some places (more on that below), and -- this is a complaint lying on the border between exposition and content -- there is a persistent lack of clarity regarding which of his ideas require a dimensional model of mind-stuff, versus which merely require a metric-space or weaker topological model. More on the latter point below.

If you're interested in absorbing a variety of well-considered perspectives on the nature of the mind, this is certainly a worthwhile book to pay attention to. I'd stop short of calling it a must-read, though.

Mindspace as Metric Space

I'll start with the part of Gardenfors' thesis that I most firmly agree with.

I agree that it makes sense to view mind-stuff as a metric space. Percepts, concepts, actions, relationships and so forth can be used as elements of a metric space, so that one can calculate distances and similarities between them.

As Gardenfors points out, this metric structure lets one do a lot of interesting things.

For instance, it gives us a notion of between-ness. As an example of why this is helpful, suppose one wants to find a way of drawing conclusions about Chinese politics from premises about Chinese individual personality. It's very helpful, in this case, to know which concepts lie in some sense "between" personality and politics in the conceptual metric space.
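
In a metric space with no coordinates at all, one crude way to operationalize betweenness is via the triangle equality. Here is a minimal sketch in Python (my own illustration, not anything from the book; the distance function and the candidate concept set are assumed to be supplied by the system):

  def between(a, b, c, distance, tol=0.05):
      """True if b lies (approximately) between a and c, in the sense that the
      detour a -> b -> c is scarcely longer than the direct route a -> c."""
      return distance(a, b) + distance(b, c) <= (1.0 + tol) * distance(a, c)

  def concepts_between(a, c, candidates, distance, tol=0.05):
      """Return the candidate concepts lying roughly between a and c --
      e.g. candidate stepping stones from 'personality' toward 'politics'."""
      return [b for b in candidates if between(a, b, c, distance, tol)]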

It also lets us specify the "exemplar" theory of concepts in an elegant way. Suppose that we have N prototypes, or more generally N "prototype-sets", each corresponding to a certain concept. We can then assign a new entity X to one of these concepts, based on which prototype or prototype-set it's closest to (where "close" is defined in terms of the metric structure).
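
As a concrete sketch of this exemplar scheme (my own toy code, not the book's; only a distance function is assumed, no coordinates):

  import math

  def classify_by_prototypes(x, prototype_sets, distance):
      """Assign x to the concept whose nearest prototype (or prototype-set
      member) is closest to x, using nothing but the metric structure."""
      def dist_to_set(protos):
          return min(distance(x, p) for p in protos)
      return min(prototype_sets, key=lambda concept: dist_to_set(prototype_sets[concept]))

  # toy usage: classify an RGB color by Euclidean distance to color prototypes
  def euclidean(a, b):
      return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

  concepts = {"red": [(255, 0, 0)], "blue": [(0, 0, 255)], "purple": [(128, 0, 128)]}
  print(classify_by_prototypes((200, 30, 60), concepts, euclidean))  # -> red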

Mindspace as Dimensional Space

Many of Gardenfors' ideas only require a metric space, but others go further and require a dimensional space -- and one of my complaints with the book is that he's not really clear on which ideas fall into which category.

For instance, he cites results showing that if one defines concepts via proximity to prototypes (as suggested above) in a dimensional space, then it follows that concepts are convex sets. The theorem he gives holds in dimensional spaces, but it seems to me something similar should also hold in more general metric spaces, though I haven't checked the mathematics.
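
For the Euclidean case with single prototypes, the convexity result is essentially the familiar fact that Voronoi cells are convex; a quick sketch of the standard argument (mine, not a reproduction of the proof Gardenfors cites):

  C_i = \{ x : \|x - p_i\| \le \|x - p_j\| \ \forall j \}
      = \bigcap_{j \ne i} \{ x : 2 (p_j - p_i) \cdot x \le \|p_j\|^2 - \|p_i\|^2 \}

Each set in the intersection is a half-space, hence convex, and an intersection of convex sets is convex. Note that this argument leans on the linear structure of the space; in a general metric space "convex" has to be re-defined via betweenness, which is part of why the more general case needs separate checking.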

This leads up to his bold and interesting hypothesis that natural concepts are convex sets in mindspace.

I find this hypothesis fascinating, partly because it ties in with the heuristic assumption made in my own Probabilistic Logic Networks book, that natural concepts are spheres in mindspace. Of course I don't really believe natural concepts are spheres, but this was a convenient assumption to make to derive certain probabilistic inference formulas.

So my own suspicion is that cognitively natural concepts don't need to be convex, but there is a bias for them to be. And they also don't need to be roughly spherical, but again I suspect there is a bias for them to be.

So I suspect that Gardenfors' hypothesis about the convexity of natural concepts is an exaggeration of the reality -- but still a quite interesting idea.

If one is designing a fitness function F for a concept-formation heuristic, so that F(C) estimates the likely utility of concept C, then it may be useful to incorporate both convexity and sphericality as part of the fitness function.
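
One rough way such a fitness function could look (purely my own illustration, assuming numpy; the convexity and sphericality terms below are crude proxies, and the weights are arbitrary):

  import numpy as np

  def concept_fitness(points, rng=None, n_pairs=200, w_convex=0.5, w_sphere=0.5):
      """Score a candidate concept C, given as an (n, d) array of member points.
      'Convexity' penalizes gaps along chords between members; 'sphericality'
      rewards members sitting at similar distances from the centroid."""
      if rng is None:
          rng = np.random.default_rng(0)
      centroid = points.mean(axis=0)
      radii = np.linalg.norm(points - centroid, axis=1)
      sphericality = 1.0 / (1.0 + radii.std() / (radii.mean() + 1e-9))

      idx = rng.integers(0, len(points), size=(n_pairs, 2))
      a, b = points[idx[:, 0]], points[idx[:, 1]]
      midpoints = (a + b) / 2.0
      chords = np.linalg.norm(a - b, axis=1) + 1e-9
      # distance from each midpoint to its nearest member, relative to chord length
      gaps = np.linalg.norm(midpoints[:, None, :] - points[None, :, :], axis=2).min(axis=1) / chords
      convexity = 1.0 / (1.0 + gaps.mean())

      return w_convex * convexity + w_sphere * sphericality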

Conceptual Space and the Problem of Induction

Gardenfors presents the "convexity of natural concepts" approach as a novel solution to the problem of induction, via positing a hypothesis that when comparing multiple concepts encapsulating past observations, one should choose the convex concepts as the basis for extrapolation into the future. This is an interesting and potentially valuable idea, but IMO positing it as a solution to the philosophical induction problem is a bit peculiar.

What he's doing is making an a priori assumption that convex concepts -- in the dimensional space that the brain has chosen -- are more likely to persist from past to future. Put differently, he is assuming that "the tendency of convex concepts to continue from past into future", a pattern he has observed during his past, is going to continue into his future. So, from the perspective of the philosophical problem of induction, his approach still requires one to make a certain assumption about some properties of past experience continuing into the future.

He doesn't really solve the problem of induction -- what he does is suggest a different a priori assumption, a different "article of faith", which, if accepted, can be used to guide induction. Hume (when he first posed the problem of induction) suggested that "human nature" guides induction, and perhaps Gardenfors' suggestion is part of human nature.

Relating Probabilistic Logic and Conceptual Geometry

Gardenfors conceives the conceptual-spaces perspective as a radically different alternative to
the symbolic and subsymbolic perspectives. However, I don't think this is the right way to look at it. Rather, I think that

  1. a probabilistic logic system can be considered as a metric space (and this is explained in detail in the PLN book)
  2. either a probabilistic logic system or a neural network system can be projected into a dimensional space (using dimensional embedding algorithms such as those developed by Harel and Koren among others, and discussed on the OpenCog wiki site)

Because of point 1, it seems that most of Gardenfors' points actually apply within a probabilistic logic system. One can even talk about convexity in a general metric space context.

However, there DO seem to be advantages to projecting logical knowledge bases into dimensional spaces, because certain kinds of computation are much more efficient in dimensional spaces than in straightforward logical representations. Gardenfors doesn't make this point in this exact way, but he hints at it when he says that dimensional spaces get around some of the computational problems plaguing symbolic systems. For instance, if you want to quickly get a list of everything reasonably similar to a given concept -- or everything along a short path between concept A and concept B -- these queries are much more efficiently done in a dimensional-space representation than in a traditional logic representation.
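
As a toy illustration of the efficiency point (my own sketch, only loosely inspired by the pivot idea in Harel and Koren's algorithm rather than a faithful implementation of it, and assuming numpy and scipy are available): embed each concept as its vector of distances to a handful of pivot concepts, then answer similarity queries with a k-d tree instead of scanning the whole knowledge base.

  import numpy as np
  from scipy.spatial import cKDTree

  def pivot_embed(items, distance, pivots):
      """Map each item to the vector of its distances to a small set of pivots."""
      return np.array([[distance(x, p) for p in pivots] for x in items])

  def build_similarity_index(items, distance, pivots):
      """Precompute the embedding once and index it for fast neighbor queries."""
      return cKDTree(pivot_embed(items, distance, pivots))

  def similar_to(query, items, distance, pivots, tree, k=5):
      """Return the k items whose pivot-coordinates are nearest the query's."""
      q = [distance(query, p) for p in pivots]
      _, idx = tree.query(q, k=k)
      return [items[i] for i in np.atleast_1d(idx)]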

Gardenfors points out that, in a dimensional formulation, prototype-based concepts correspond to cells in Voronoi or generalized Voronoi tessellations. This is interesting, and in a system that generates dimensional spaces from probabilistic logical representations, it suggests a nice concept formation heuristic: tessellate the dimensional space based on a set of prototypes, and then create new concepts based on the cells in the tessellation.

This brings up the question of how to choose the prototypes. If one uses the Harel and Koren embedding algorithm, it's tempting to choose the prototypes as equivalent to the pivots, for which we already have a heuristic algorithm. But this deserves more thought.
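
A minimal sketch of that concept-formation heuristic (my own illustration, assuming numpy; the embedding and the choice of prototypes -- e.g. the embedding pivots -- are taken as given):

  import numpy as np
  from collections import defaultdict

  def concepts_from_tessellation(entity_coords, prototype_coords):
      """Group embedded entities by nearest prototype, i.e. by Voronoi cell.
      entity_coords: (n, d) array; prototype_coords: (m, d) array. Returns a
      dict mapping prototype index -> list of entity indices in its cell; each
      non-empty cell is a candidate new concept."""
      # distance of every entity to every prototype, then argmin over prototypes
      d = np.linalg.norm(entity_coords[:, None, :] - prototype_coords[None, :, :], axis=2)
      nearest = d.argmin(axis=1)
      cells = defaultdict(list)
      for entity_idx, proto_idx in enumerate(nearest):
          cells[proto_idx].append(entity_idx)
      return dict(cells)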

Summary

Gardenfors' book presents many interesting ideas, and in an AGI design/engineering context, suggests some potentially valuable new heuristics. However, its claim to offer a fundamentally novel approach to modeling and understanding intelligence seems a bit exaggerated. Rather than a fundamentally disjoint sort of representation, "topological and geometric spaces" are just a different way of looking at the same knowledge represented by other methods such as probabilistic logic. Probabilistic logic networks are metric spaces, and can be projected into dimensional spaces; and the same things are likely true for many other representation schemes as well. But Gardenfors gives some insightful and maybe useful new twists on the use of dimensional spaces in intelligent systems.

Owning Our Actions: Natural Autonomy versus Free Will

At the Toward a Science of Consciousness conference earlier this week, I picked up a rather interesting book to read on the flight home: Henrik Walter's "The Neurophilosophy of Free Will" ....

It's an academic philosophy tome -- fairly well-written and clear for such, but still possessing the dry and measured style that comes with that genre.

But the ideas are quite interesting!

Walter addresses the problem: what kind of variant of the intuitive "free will" concept might be compatible with what neuroscience and physics tell us.

He decomposes the intuitive notion of free will into three aspects:

  1. Freedom: being able to do otherwise
  2. Intelligibility: being able to understand the reasons for one's actions
  3. Agency: being the originator of one's actions

He argues, as many others have done, that there is no way to salvage all three of these, in their obvious forms, in a manner consistent with known physics and neuroscience. I won't repeat those arguments here. [There are much better references, but I summarized some of the literature here, along with some of my earlier ideas on free will (which don't contradict Walter's ideas, but address different aspects).]

Walter then argues for a notion of "natural autonomy," which replaces the first and third of these aspects with weaker things, but has the advantage of being compatible with known science.

First I'll repeat his capsule summary of his view, and then translate it into my own language, which may differ slightly from his intentions.

He argues that "we possess natural autonomy when

  1. under very similar circumstances we could also do other than what we do (because of the chaotic nature of the brain)
  2. this choice is understandable (intelligible -- it is determined by past events, by immediate adaptation processes in the brain, and partially by our linguistically formed environment)
  3. it is authentic (when through reflection loops with emotional adjustments we can identify with that action)"

The way I think about this is that, in natural autonomy as opposed to free will,

  • Freedom is replaced with: being able to do otherwise in very similar circumstances
  • Agency is replaced with: emotionally identifying one's phenomenal self as closely dynamically coupled with the action

Another way to phrase this is: if an action is something that

  • depends sensitively on our internals, in the sense that slight variations in the environment or our internals could cause us to do something significantly different
  • we can at least roughly model and comprehend in a rational way, as a dynamical unfolding from precursors and environment into action that was closely coupled with our holistic structure and dynamics, as modeled by our phenomenal self

then there is a sense in which "we own the action." And this sense of "ownership of an action" or "natural autonomy" is compatible with both classical and quantum physics, and with the known facts of neurobiology.

Perhaps "owning an action" can take the place of "willing an action" in the internal folk psychology of people who are not comfortable with the degree to which the classical notion of free will is illusory.

Another twist that Walter doesn't emphasize is that even actions which we do own, often

  • depend with some statistical predictability upon our internals, in the sense that agents with internals and environments very similar to ours have a distinct but not necessarily overwhelming probabilistic bias to take actions similar to ours

This is important for reasoning rationally about our own past and future actions -- it means we can predict ourselves statistically even though we are naturally autonomous agents who own our own actions.

Free will is often closely tied with morality, and natural autonomy retains this. People who don't "take responsibility for their actions" in essence aren't accepting a close dynamical coupling between their phenomenal self and their actions. They aren't owning their actions, in the sense of natural autonomy -- they are modeling themselves as NOT being naturally autonomous systems, but rather as systems whose actions are relatively uncoupled with their phenomenal self, and perhaps coupled with other external forces instead.

None of this is terribly shocking or revolutionary-sounding -- but I think it's important nonetheless. What's important is that there are rational, sensible ways of thinking about ourselves and our decisions that don't require the illusion of free will, and also don't necessarily make us feel like meaningless, choiceless deterministic or stochastic automata.

Friday, March 26, 2010

The GOLEM Eats the Chinese Parent (Toward An AGI Meta-Architecture Enabling Both Goal Preservation and Radical Self-Improvement)

I thought more about the ideas in my previous blog post on the "Chinese Parent Theorem," and while I haven't done a formal proof yet, I did write up the ideas a lot more carefully:

GOLEM: Toward An AGI Meta-Architecture Enabling Both Goal Preservation and Radical Self-Improvement

and IMHO they make even more sense now....

Also, I changed the silly name "Chinese Parent Meta-Architecture" to the sillier name "GOLEM," which stands for "Goal-Oriented LEarning Meta-architecture."

The GOLEM ate the Chinese Parent!

I don't fancy that GOLEM, in its present form, constitutes a final solution to the problem of "making goal preservation and radical self-improvement compatible" -- but I'm hoping it points in an interesting and useful direction.

(I still have some proofs about GOLEM sketched in the margins of a Henry James story collection, but the theorems are pretty weak and I'm not sure when I'll have time to type them in. If they were stronger theorems I would be more inspired to do so. Most of the work in typing them in would be in setting up the notation ;p ....)

But Would It Be Creative?

In a post on the Singularity email list, Mike Tintner made the following complaint about GOLEM:

Why on earth would you want a "steadfast" AGI? That's a contradiction of AGI.

If your system doesn't have the capacity/potential to revolutionise its goals - to have a major conversion, for example, from religiousness to atheism, totalitarianism to free market liberalism, extreme self-interest and acquisitiveness to extreme altruism, rational thinking to mystical thinking, and so on (as clearly happens with humans), gluttony to anorexia - then you don't have an AGI, just another dressed-up narrow AI.

The point of these examples should be obviously not that an AGI need be an intellectual, but rather that it must have the capacity to drastically change

  1. the priorities of its drives/goals,
  2. the forms of its goals

and even in some cases:

  3. eliminate certain drives (presumably secondary ones) altogether.

My answer was as follows:

I believe one can have an AGI that is much MORE creative and flexible in its thinking than humans, yet also remains steadfast in its top-level goals...

As an example, imagine a human whose top-level goal in life was to do what the alien god on the mountain wanted. He could be amazingly creative in doing what the god wanted -- especially if the god gave him

  • broad subgoals like "do new science", "invent new things", "help cure suffering", "make artworks", etc.
  • real-time feedback about how well his actions were fulfilling the goals, according to the god's interpretation
  • advice on which hypothetical actions seemed most likely to fulfill the goals, according to the god's interpretation

But his creativity would be in service of the top-level goal of serving the god...

This is like the GOLEM architecture, where

  • the god is the GoalEvaluator
  • the human is the rest of the GOLEM architecture

I fail to see why this restricts the system from having incredible, potentially far superhuman creativity in working on the goals assigned by the god...


Part of my idea is that the GoalEvaluator can be a narrow AI, thus avoiding an infinite regress where we need an AGI to evaluate the goal-achievement of another AGI...

Can the Goal Evaluator Really Be a Narrow AI?


A dialogue with Abram Demski on the Singularity email list led to some changes to the original GOLEM paper.

The original version of GOLEM stated that the GoalEvaluator would be a narrow AI, and did not make the GoalEvaluator rely on the Searcher to do its business...

Abram's original question, about this original version, was "Can the Goal Evaluator Really Be a Narrow AI?"

My answer was:

The terms narrow-AI and AGI are not terribly precise...

The GoalEvaluator needs to basically be a giant simulation engine, that tells you: if program P is run, then the probability of state W ensuing is p. Doing this effectively could involve some advanced technologies like probabilistic inference, along with simulation technology. But it doesn't require an autonomous, human-like motivational system. It doesn't require a system that chooses its own actions based on its goals, etc.
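
Schematically, the kind of interface I have in mind for the GoalEvaluator looks something like this (a sketch under my own naming assumptions; the simulator argument stands in for whatever simulation and probabilistic-inference machinery is actually used):

  import random

  class GoalEvaluator:
      """Scores possible worlds against the goal, and estimates (by simulation)
      the probability that running a given program leads to a given world.
      It evaluates outcomes; it does not choose actions or have its own goals."""

      def __init__(self, simulator, goal_score, n_samples=1000, seed=0):
          self.simulator = simulator    # callable: (program, rng) -> resulting world state
          self.goal_score = goal_score  # callable: world -> goal fulfillment in [0, 1]
          self.n_samples = n_samples
          self.rng = random.Random(seed)

      def prob_of_world(self, program, world_predicate):
          """Monte Carlo estimate of P(world_predicate holds | program is run)."""
          hits = sum(world_predicate(self.simulator(program, self.rng))
                     for _ in range(self.n_samples))
          return hits / self.n_samples

      def expected_goal_achievement(self, program):
          """Average goal fulfillment over simulated runs of `program`."""
          return sum(self.goal_score(self.simulator(program, self.rng))
                     for _ in range(self.n_samples)) / self.n_samples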

The question arises, though: how does the GoalEvaluator's algorithmics get improved? This is where the potential regress occurs. One can have AGI_2 improving the algorithms inside AGI_1's GoalEvaluator. The regress can continue, till eventually one reaches AGI_n whose GoalEvaluator is relatively simple and AGI-free...

...

After some more discussion, Abram made some more suggestions, which led me to generalize and rephrase his suggestions as follows:

If I understand correctly, what you want to do is use the Searcher to learn programs that predict the behavior of the GoalEvaluator, right? So, there is a "base goal evaluator" that uses sensory data and internal simulations, but then you learn programs that do approximately the same thing as this but much faster (and maybe using less memory)? And since this program learning has the specific goal of learning efficient approximations to what the GoalEvaluator does, it's not susceptible to wire-heading (unless the whole architecture gets broken)...
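
In rough pseudocode-style Python, that amounts to something like the following (my own sketch, not code from the GOLEM paper; the Searcher is abstracted as any program-learning routine, and its minimize method is hypothetical):

  def learn_goal_evaluator_surrogate(searcher, base_evaluator, training_programs):
      """Use the Searcher to find a fast program approximating the (slow) base
      GoalEvaluator. The target is fixed by the base evaluator's own outputs,
      which is what blocks wire-heading: a surrogate that disagrees with the
      base evaluator simply scores as a poor approximation."""
      targets = {p: base_evaluator.expected_goal_achievement(p) for p in training_programs}

      def approximation_error(candidate_program):
          return sum((candidate_program(p) - targets[p]) ** 2 for p in training_programs)

      # the searcher is assumed to minimize the given objective over program space
      return searcher.minimize(approximation_error)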

After the dialogue, I incorporated this suggestion into the GOLEM architecture (and the document linked from this blog post).

Thanks Abram!!

Wednesday, March 17, 2010

"Chinese Parent Theorem"?: Toward a Meta-Architecture for Provably Steadfast AGI

Continuing my series of (hopefully edu-taining ;) blog posts presenting speculations on goal systems for superhuman AGI systems, this one deals with the question of how to create an AGI system that will maintain its initial goal system even as it revises and improves itself -- and becomes so much smarter that in many ways it becomes incomprehensible to its creators, and very different from its initial condition.

This is closely related to the problem Eliezer Yudkowsky has described as "provably Friendly AI." However, I would rather not cast the problem that way, because (as Eliezer of course realizes) there is an aspect of the problem that isn't really about "Friendliness" or any other particular goal system content, but is "merely" about the general process of goal-content preservation under progressive self-modification.

Informally, I define an intelligent system as steadfast if it continues to pursue the same goals over a long period of time. In this terminology, one way to confront the problem of creating predictably beneficial AGI, is to solve the two problems of:

  1. Figuring out how to encapsulate the goal of beneficialness in an AGI's goal system
  2. Figuring out how to create (perhaps provably) steadfast AGI, in a way that applies to the "beneficialness" goal among others

My previous post on Coherent Aggregated Volition (CAV) dealt with the first of these problems. This post deals with the second. My earlier post on predictably beneficial AGI deals with both.

The meat of this post is a description of an AGI meta-architecture that I label the Chinese Parent Meta-Architecture -- and that I conjecture could be proved to be steadfast, under some reasonable (though not necessarily realistic, since the universe is a mysterious place!) assumptions about the AGI system's environment.

I don't actually prove any steadfastness result here -- I just sketch a vague conjecture, which if formalized and proved would deserve the noble name "Chinese Parent Theorem."

I got partway through a proof yesterday and it seemed to be going OK, but I've been distracted by more practical matters, and so for now I decided to just post the basic idea here instead...

Proving Friendly AI

Eliezer Yudkowsky has described his goal concerning “proving Friendly AI” informally as follows:

The putative proof in Friendly AI isn't proof of a physically good outcome when you interact with the physical universe.

You're only going to try to write proofs about things that happen inside the highly deterministic environment of a CPU, which means you're only going to write proofs about the AI's cognitive processes.

In particular you'd try to prove something like "this AI will try to maximize this goal function given its beliefs, and it will provably preserve this entire property (including this clause) as it self-modifies".

It seems to me that proving something like this shouldn’t be sooooo hard to achieve if one assumes some basic fixed “meta-architectural” structure on the part of the AI, rather than permitting total unrestricted self-modification. Such a meta-architecture can be assumed without placing any limits on the AI’s algorithmic information content, for example.

Of course, preservation of the meta-architecture can be assumed as part of the AI system's goal function. So by assuming a meta-architecture, one may be able to prove a result restricted to a certain broad class of goal functions ... and the question becomes whether that class is broad enough to be interesting.

So my feeling is that, if one wants to pursue such a research direction, it makes sense to begin by proving theorems restricted to goals embodying some assumptions about fixed program structure -- and then try to improve the theorems by relaxing the assumptions.

A Simple AGI Meta-Architecture with the Appearance of Steadfastness

After writing the first draft of this post, I discussed the "provably steadfast AGI" problem with a clever Chinese friend, and she commented that what the self-modifying AGI needs (in order to maintain its original goal content as it self-modifies) is a traditional Chinese parent, who will watch the system from the outside as it self-modifies, and continually nag it and pester it and remind it of its original goals.

At first I thought this was just funny, but then it occurred to me that it was actually the same idea as my meta-architecture! My GoalEvaluator component (in the meta-architecture below) is basically a ChineseParent component, living separately from the rest of the system and providing ongoing, fixed goal evaluation.

The thought-experiment I've been exploring is: an AGI system S with the following high-level meta-architecture:

  • Goal Evaluator = external system (not part of S) that tells S, for each possible future world (including environment states and internal program states), how well this world fulfills the goal. This is the "Chinese parent."
  • HistoricalRepository = database storing the past history of S’s internal states and actions, as well as information about the environment during S’s past
  • Operating Program = the program that S is governing its actions by, at a given point in time
    • chosen by the Metaprogram as the best program the Searcher has found, where "best" is judged as "highest probability of goal achievement" based on the output of the Predictor and the Goal Evaluator
  • Predictor = program that estimates, given a candidate operating program P and a possible future world W, the odds of P leading to W
  • Searcher = program that searches through program space to find a new program optimizing a provided objective function
  • Tester = hard-wired program that estimates the quality of a candidate Predictor, using a simple backtesting methodology
    • That is, the Tester assesses how well a Predictor would have performed in the past, using the data in the HistoricalRepository
  • Metaprogram = fixed program that uses Searcher program to find a good
    • Searcher program (judged by the quality of the programs it finds, as judged by the Predictor program)
    • Predictor program (as judged by the Tester’s assessments of its predictions)
    • Operating Program (judged by Predictor working with Goal Evaluator, according to the idea of choosing an Operating Program with the maximum expected goal achievement)
    • The metaprogram also determines the amount of resources to allocate to searching for a Searcher versus a Predictor versus an OP, according to a fixed algorithm for parameter adaptation.
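
To make the relationships among these components concrete, here is a bare-bones skeleton of the meta-architecture (entirely my own sketch of the description above; every component's internals are left abstract and all method names are hypothetical):

  class GOLEM:
      """Skeleton of the meta-architecture described above. The GoalEvaluator
      (the "Chinese parent") lives outside the system and is passed in; the
      Metaprogram logic below is fixed, and its preservation is assumed to be
      part of the goal content."""

      def __init__(self, goal_evaluator, history, searcher, predictor, operating_program):
          self.goal_evaluator = goal_evaluator      # external: scores possible worlds
          self.history = history                    # HistoricalRepository of past states/actions
          self.searcher = searcher                  # searches program space for good programs
          self.predictor = predictor                # estimates P(world | operating program)
          self.operating_program = operating_program

      def tester_score(self, candidate_predictor):
          """Tester: backtest a candidate Predictor against the HistoricalRepository."""
          return self.history.backtest(candidate_predictor)

      def expected_goal_achievement(self, program):
          """Judge a candidate Operating Program by combining Predictor and GoalEvaluator."""
          return sum(prob * self.goal_evaluator.score(world)
                     for world, prob in self.predictor.world_distribution(program))

      def metaprogram_step(self, budgets):
          """One cycle of the fixed Metaprogram: look for a better Predictor and a
          better Operating Program (the search for an improved Searcher, judged by
          the quality of the programs it finds, is omitted here for brevity)."""
          candidate = self.searcher.search(self.tester_score, budget=budgets["predictor"])
          if self.tester_score(candidate) > self.tester_score(self.predictor):
              self.predictor = candidate

          candidate = self.searcher.search(self.expected_goal_achievement,
                                           budget=budgets["operating_program"])
          if self.expected_goal_achievement(candidate) > self.expected_goal_achievement(self.operating_program):
              self.operating_program = candidate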

Next, assume that the goal embodied by the GoalEvaluator includes, as a subgoal, the preservation of the overall meta-architecture described above.

Then, it seems to me one can very likely prove that the program will maintain its architecture while seeking to maximize the goal function F embodied by the GoalEvaluator, under the assumptions that

  • the Goal Evaluator (i.e. the "Chinese parent") does its job correctly
  • no one messes with the underlying hardware.

If proved, this would be the so-called "Chinese Parent Theorem" ;-) .... I don't have time to work on such a proof right now, but would be happy to collaborate with someone on this!

As noted above, this approach doesn't allow full self-modification; it assumes certain key parts of the AGI (meta)architecture are hard-wired. But the hard-wired parts are quite basic and leave a lot of flexibility. So a "Chinese Parent Theorem" of this nature would cover a fairly broad and interesting class of goal functions, it seems to me.

What happens if one implements the Goal Evaluator according to the same architecture, though? In this case, one must postulate a meta-Goal-Evaluator, whose goal is to specify the goals for the first Goal Evaluator: the Chinese Grandparent! Eventually the series must end, and one must postulate an original ancestor Goal Evaluator that operates according to some other architecture. Maybe it's a human, maybe it's CAV, maybe it's some hard-wired code. Hopefully it's not a bureaucratic government committee ;-)

Niggling Practical Matters and Future Directions

Of course, this general schema could be implemented using OpenCog or any other practical AGI architecture as a foundation -- in this case, OpenCog is "merely" the initial condition for the Predictor and Searcher. In this sense, the approach is not extraordinarily impractical.

However, one major issue arising with the whole meta-architecture proposed is that, given the nature of the real world, it's hard to estimate how well the Goal Evaluator will do its job! If one is willing to assume the above meta-architecture, and if a proof along the lines suggested above can be found, then the “predictably beneficial” part of the problem of "predictably beneficial AGI" is largely pushed into the problem of the Goal Evaluator.

Returning to the "Chinese parent" metaphor, what I suggest may be possible to prove is that given an effective parent, one can make a steadfast child -- if the child is programmed to obey the parent's advice about its goals, which include advice about its meta-architecture. The hard problem is then ensuring that the parent's advice about goals is any good, as the world changes! And there's always the possibility that the parents ideas about goals shift over time based on their interaction with the child (bringing us into the domain of modern or postmodern Chinese parents ;-D)

Thus, I suggest, the really hard problem of making predictably beneficial AGI probably isn't "preservation of formally-defined goal content under self-modification." This may be hard if one enables total self-modification, but I suggest it's probably not that hard if one places some fairly limited restrictions on self-modification. The hypothetical Chinese Parent Theorem vaguely outlined here can probably be proved and then strengthened pretty far, reducing meta-architectural assumptions considerably.

The really hard problem, I suspect, is how to create a GoalEvaluator that correctly updates goal content as new information about the world is obtained, and as the world changes -- in a way that preserves the spirit of the original goals even if the details of the original goals need to change. Because the "spirit" of goal content is a very subjective thing.

One approach to this problem, hinted at above, would be to create a GoalEvaluator operating according to CAV. In that case, one would be counting on (a computer-aggregated version of) collective human intuition to figure out how to adapt human goals as the world, and human information about it, evolves. This is of course what happens now -- but the dynamic will be much more complex and more interesting with superhuman AGIs in the loop. Since interacting with the superhuman AGI will change human desires and intuitions in all sorts of ways, it's to be expected that such a system would NOT eternally remain consistent with the original "legacy human" goals, but would evolve in some new and unpredicted direction....

A deep and difficult direction for theory, then, would be to try to understand the expected trajectories of development of systems including

  • a powerful AGI, with a Chinese Parent meta-architecture as outlined here (or something similar), whose GoalEvaluator is architected via CAV based on the evolving state of some population of intelligent agents
  • the population of intelligent agents, as ongoingly educated and inspired by both the world and the AGI

as they evolve over time and interact with a changing environment that they explore ever more thoroughly.

Sounds nontrivial!