Friday, March 26, 2010

The GOLEM Eats the Chinese Parent (Toward An AGI Meta-Architecture Enabling Both Goal Preservation and Radical Self-Improvement)

I thought more about the ideas in my previous blog post on the "Chinese Parent Theorem," and while I haven't done a formal proof yet, I did write up the ideas a lot more carefully in

GOLEM: Toward An AGI Meta-Architecture Enabling Both Goal Preservation and Radical Self-Improvement

and IMHO they make even more sense now....

Also, I changed the silly name "Chinese Parent Meta-Architecture" to the sillier name "GOLEM" which stands for "Goal-Oriented LEarning Meta-architecture"

The GOLEM ate the Chinese Parent!

I don't fancy that GOLEM, in its present form, constitutes a final solution to the problem of "making goal preservation and radical self-improvement compatible" -- but I'm hoping it points in an interesting and useful direction.

(I still have some proofs about GOLEM sketched in the margins of a Henry James story collection, but the theorems are pretty weak and I'm not sure when I'll have time to type them in. If they were stronger theorems I would be more inspired to do it. Most of the work in typing them in would be in setting up the notation ;p ....)

But Would It Be Creative?

In a post on the Singularity email list, Mike Tintner made the following complaint about GOLEM:

Why on earth would you want a "steadfast" AGI? That's a contradiction of AGI.

If your system doesn't have the capacity/potential to revolutionise its goals - to have a major conversion, for example, from religiousness to atheism, totalitarianism to free market liberalism, extreme self-interest and acquisitiveness to extreme altruism, rational thinking to mystical thinking, and so on (as clearly happens with humans), gluttony to anorexia - then you don't have an AGI, just another dressed-up narrow AI.

The point of these examples should be obviously not that an AGI need be an intellectual, but rather that it must have the capacity to drastically change

  1. the priorities of its drives/goals,
  2. the forms of its goals

and even in some cases:

  3. eliminate certain drives (presumably secondary ones) altogether

My answer was as follows:

I believe one can have an AGI that is much MORE creative and flexible in its thinking than humans, yet also remains steadfast in its top-level goals...

As an example, imagine a human whose top-level goal in life was to do what the alien god on the mountain wanted. He could be amazingly creative in doing what the god wanted -- especially if the god gave him

  • broad subgoals like "do new science", "invent new things", "help cure suffering", "make artworks", etc.
  • real-time feedback about how well his actions were fulfilling the goals, according to the god's interpretation
  • advice on which hypothetical actions seemed most likely to fulfill the goals, according to the god's interpretation

But his creativity would be in service of the top-level goal of serving the god...

This is like the GOLEM architecture, where

  • the god is the GoalEvaluator
  • the human is the rest of the GOLEM architecture

I fail to see why this restricts the system from having incredible, potentially far superhuman creativity in working on the goals assigned by the god...

Part of my idea is that the GoalEvaluator can be a narrow AI, thus avoiding an infinite regress where we need an AGI to evaluate the goal-achievement of another AGI...

Can the Goal Evaluator Really Be a Narrow AI?

A dialogue with Abram Demski on the Singularity email list led to some changes to the original GOLEM paper.

The original version of GOLEM stated that the GoalEvaluator would be a Narrow AI, and failed to make the GoalEvaluator rely on the Searcher to do its business...

Abram's original question, about this original version, was "Can the Goal Evaluator Really Be a Narrow AI?"

My answer was:

The terms narrow-AI and AGI are not terribly precise...

The GoalEvaluator needs to basically be a giant simulation engine, that tells you: if program P is run, then the probability of state W ensuing is p. Doing this effectively could involve some advanced technologies like probabilistic inference, along with simulation technology. But it doesn't require an autonomous, human-like motivational system. It doesn't require a system that chooses its own actions based on its goals, etc.
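To make the "giant simulation engine" idea a bit more concrete, here is a toy Python sketch of the interface: given a program P and a goal predicate over world states, estimate by repeated simulation the probability that running P lands in a goal state. Everything here is an illustrative assumption rather than anything from the GOLEM paper: the names estimate_goal_probability, always_increment and noisy_walk are made up, world states are plain integers, and brute-force Monte Carlo stands in for whatever probabilistic inference a real GoalEvaluator would use.

```python
import random
from typing import Callable

# Hypothetical toy types: a world state is an int, and a "program" maps
# the current state (plus a random source) to the next state.
State = int
Program = Callable[[State, random.Random], State]
Goal = Callable[[State], bool]


def estimate_goal_probability(
    program: Program,
    goal: Goal,
    start: State,
    steps: int = 10,
    trials: int = 2000,
    seed: int = 0,
) -> float:
    """Estimate P(goal holds after running `program` for `steps` steps).

    This only simulates and counts. It never chooses actions and has no
    goals of its own, which is the sense in which it is "narrow".
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    hits = 0
    for _ in range(trials):
        state = start
        for _ in range(steps):
            state = program(state, rng)
        if goal(state):
            hits += 1
    return hits / trials


def always_increment(state: State, rng: random.Random) -> State:
    """A deterministic toy program: add one each step."""
    return state + 1


def noisy_walk(state: State, rng: random.Random) -> State:
    """A stochastic toy program: step up or down with equal probability."""
    return state + rng.choice((-1, 1))


if __name__ == "__main__":
    print(estimate_goal_probability(always_increment, lambda s: s == 10, 0))
    print(estimate_goal_probability(noisy_walk, lambda s: s > 0, 0))
```

The point of the sketch is only the shape of the thing: a function from (program, goal) to a probability estimate, with no motivational system anywhere in it.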

The question arises, how does the GoalEvaluator's algorithmics get improved, though? This is where the potential regress occurs. One can have AGI_2 improving the algorithms inside AGI_1's GoalEvaluator. The regress can continue, till eventually one reaches AGI_n whose GoalEvaluator is relatively simple and AGI-free...


After some more discussion, Abram made some more suggestions, which led me to generalize and rephrase his suggestions as follows:

If I understand correctly, what you want to do is use the Searcher to learn programs that predict the behavior of the GoalEvaluator, right? So, there is a "base goal evaluator" that uses sensory data and internal simulations, but then you learn programs that do approximately the same thing as this but much faster (and maybe using less memory)? And since this program learning has the specific goal of learning efficient approximations to what the GoalEvaluator does, it's not susceptible to wire-heading (unless the whole architecture gets broken)...
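Here is a minimal Python sketch of that idea, under heavy simplifying assumptions: the "base goal evaluator" is a deliberately slow toy function, the Searcher is reduced to picking the best of a few hand-written candidate programs, and all the names (base_goal_evaluator, approximation_error, search_for_surrogate, CANDIDATES) are hypothetical. What it is meant to show is that the search objective is "agree with the base evaluator on sample inputs", not "output a high score".

```python
from typing import Callable, Dict, Sequence

Evaluator = Callable[[int], float]


def base_goal_evaluator(x: int) -> float:
    """Stand-in for the slow, trusted evaluator (a toy, not GOLEM's)."""
    total = 0
    for i in range(x + 1):  # deliberately wasteful loop
        total += i
    return total / 100.0


def approximation_error(
    candidate: Evaluator, reference: Evaluator, samples: Sequence[int]
) -> float:
    """Mean squared disagreement between a candidate and the reference."""
    return sum((candidate(x) - reference(x)) ** 2 for x in samples) / len(samples)


def search_for_surrogate(
    candidates: Dict[str, Evaluator],
    reference: Evaluator,
    samples: Sequence[int],
) -> str:
    """Return the name of the candidate that best matches the reference.

    A candidate that always reports a high score gains nothing here,
    because it is judged on fidelity to the reference, not on the score.
    """
    return min(
        candidates,
        key=lambda name: approximation_error(candidates[name], reference, samples),
    )


# Hypothetical candidate programs the Searcher might have proposed.
CANDIDATES: Dict[str, Evaluator] = {
    "linear": lambda x: x / 100.0,
    "closed_form": lambda x: x * (x + 1) / 200.0,  # fast equivalent of the loop
    "always_high": lambda x: 1e9,  # a "wire-heading" style candidate
}

if __name__ == "__main__":
    samples = list(range(0, 50, 5))
    print(search_for_surrogate(CANDIDATES, base_goal_evaluator, samples))
```

In a real system the candidates would come from program learning rather than a fixed dictionary, and the base evaluator would still need to be consulted now and then to check that the fast approximation has not drifted.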

After the dialogue, I incorporated this suggestion into the GOLEM architecture (and the document linked from this blog post).

Thanks Abram!!

1 comment:

Tim Tyler said...

Thanks for the thoughts.

I felt some bits needed spelling out a bit more. I wondered which bits were considered to be the "control code", for instance.