- Create an AGI architecture that makes it very likely the AGI will pursue its goal-system content in a rational way based on the information available to it
- Create a goal system whose structure and dynamics render it likely that the AGI will maintain the spirit of its initial goal system content, even as it encounters radically different environmental phenomena or revises its own ideas or source code
- Create goal system content that, if maintained as goal system content and pursued rationally, will lead the AGI system to be beneficial to humans
One potential solution proposed for the third problem, the goal system content problem, is Eliezer Yudkowsky's "Coherent Extrapolated Volition" (CEV) proposal. Roko Mijic has recently proposed some new ideas related to CEV, which place the CEV idea within a broader and (IMO) clearer framework. This blog post presents some ideas in the same direction, describing a variant of CEV called Coherent Aggregated Volition (CAV), which is intended to capture much of the same spirit as CEV, but with the advantage of being more clearly sensible and more feasibly implementable (though still very difficult to implement in full). In fact CAV is simple enough that it could be prototyped now, using existing AI tools.
(One side note before getting started: Some readers may be aware that Yudkowsky has often expressed the desire to create provably beneficial ("Friendly" in his terminology) AGI systems, and CAV does not accomplish this. It also is not clear that CEV, even if it were fully formalizable and implementable, would accomplish this. Also, it may be possible to prove interesting theorems about the benefits and limitations of CAV, even if not to prove some kind of absolute guarantee of CAV beneficialness; but the exploration of such theorems is beyond the scope of this blog post.)
Coherent Extrapolated Volition
In brief, Yudkowsky's CEV idea is described as follows:
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
This is a rather tricky notion, as exemplified by the following example, drawn from the CEV paper:
Suppose Fred decides to murder Steve, but when questioned, Fred says this is because Steve hurts other people, and needs to be stopped. Let's do something humans can't do, and peek inside Fred's mind-state. We find that Fred holds the verbal moral belief that hatred is never an appropriate reason to kill, and Fred hopes to someday grow into a celestial being of pure energy who won't hate anyone. We extrapolate other aspects of Fred's psychological growth, and find that this desire is expected to deepen and grow stronger over years, even after Fred realizes that the Islets worldview of "celestial beings of pure energy" is a myth. We also look at the history of Fred's mind-state and discover that Fred wants to kill Steve because Fred hates Steve's guts, and the rest is rationalization; extrapolating the result of diminishing Fred's hatred, we find that Fred would repudiate his desire to kill Steve, and be horrified at his earlier self.
I would construe Fred's volition not to include Fred's decision to kill Steve...
Personally, I would be extremely wary of any being that extrapolated my volition in this sort of manner, and then tried to impose my supposed "extrapolated volition" on me, telling me "But it's what you really want, you just don't know it." I suppose the majority of humans would feel the same way. This point becomes clearer if one replaces the above example with one involving marriage rather than murder:
Suppose Fred decides to marry Susie, but when questioned, Fred says this is because Susie is so smart and sexy. Let's do something humans can't do, and peek inside Fred's mind-state. We find that Fred holds the verbal moral belief that sex appeal is never an appropriate reason to marry, and Fred hopes to someday grow into a celestial being of pure energy who won't lust at all. We extrapolate other aspects of Fred's psychological growth, and find that this desire is expected to deepen and grow stronger over years, even after Fred realizes that the Islets worldview of "celestial beings of pure energy" is a myth. We also look at the history of Fred's mind-state and discover that Fred wants to marry Susie because Susie reminds him of his mother, and the rest is rationalization; extrapolating the result of diminishing Fred's unconscious sexual attraction to his mother, we find that Fred would repudiate his desire to marry Susie, and be disgusted with his earlier self.
I would construe Fred's volition not to include Fred's decision to marry Susie...
Clearly, the Yudkowskian notion of "volition" really has little to do with "volition" as commonly construed!!
While I can see the appeal of extrapolating Fred into "the Fred that Fred would like to be," I also think there is a lot of uncertainty in this process. If Fred has inconsistent aspects, there may be many possible future-Freds that Fred could evolve into, depending on both environmental feedback and internal (sometimes chaotic) dynamics. If one wishes to define the coherent extrapolated Future-Fred as the average of all these, then one must choose what kind of average to use, and one may get different answers depending on the choice. This kind of extrapolation is far from a simple matter -- and since "self" is not a simple matter either, it's not clear that current-Fred would consider all or any of these Future-Freds as being the same person as him.
In CAV as described here, I consider "volition" in the more typical sense -- rather than in the sense of Yudkowskian "extrapolated volition" -- as (roughly) "what a person or other intelligent agent chooses." So according to my conventional definition of volition, Fred's volition is to kill Steve and marry Susie.
Mijic's List of Desirable Properties
Roko Mijic has posited a number of general "desirable properties" for a superintelligence, and presented CEV as one among many possible concrete instantiations of these principles:
- Meta-algorithm: Most goals the AI has will be harvested at run-time from human minds, rather than explicitly programmed in before run-time.
- Factually correct beliefs: Using the AI's superhuman ability to ascertain the correct answer to any factual question in order to modify preferences or desires that are based upon false factual beliefs.
- Singleton: Only one superintelligence is to be constructed, and it is to take control of the entire future light cone with whatever goal function is decided upon.
- Reflection: Individual or group preferences are reflected upon and revised, in the style of Rawls' reflective equilibrium.
- Preference aggregation: The set of preferences of a whole group are to be combined somehow.
The "factually correct beliefs" requirement also seems problematic, if enforced too harshly, in the sense that it's hard to tell how a person, who has adapted their beliefs and goals to certain factually incorrect beliefs, would react if presented with corresponding correct beliefs. Hypothesizing that a future AI will be able to correctly make this kind of extrapolation is not entirely implausible, but certainly seems speculative. After all, each individual's reaction to new beliefs is bound to depend on the reactions of others around them, and human minds and societies are complex systems, whose evolution may prove difficult for even a superintelligence to predict, given chaotic dynamics and related phenomena. My conclusion is that there should be a bias toward factual correctness, but that it shouldn't be taken to override individual preferences and attitudes in all cases. (It's not clear to me whether this contradicts Mijic's perspective or not.)
Coherent Aggregated Volition
What I call CAV is an attempt to capture much of the essential spirit of CEV (according to my own perspective on CEV), in a way that is more feasible to implement than the original CEV, and that is prototype-able now in simplified form.
Use the term "gobs" to denote "goal and belief set" (and use "gobses" to denote the plural of "gobs"). It is necessary to consider goals and beliefs together, rather than just looking at goals, because real-world goals are typically defined in terms whose interpretation depends on certain beliefs. Each human being or AGI may be interpreted to hold various gobses to various fuzzy degrees. There is no requirement that a gobs be internally logically consistent.
A "gobs metric" is then a distance on the space of gobses. Each person or AI may also agree with various gobs metrics to various degrees, but it seems likely that individuals' gobs metrics will differ less than their gobses.
Suppose one is given a population of intelligent agents -- like the human population -- with different gobses. Then one can try to find a gobs that maximizes the four criteria of
- compactness of computational representation
- logical consistency
- average similarity to the various gobses in the population
- amount of evidence in support of the various beliefs in the gobs
The use of a multi-extremal optimization algorithm to seek a gobs defined as above is what I call CAV. The "CAV" label seems appropriate since this is indeed a system attempting to achieve both coherence (measured via compactness + consistency) and an approximation to the "aggregate volition" of all the agents in the population.
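As one illustration of what "maximizing the four criteria" might mean concretely, here is a hedged sketch of a scalarized CAV objective. The scoring functions, the weighting scheme, and the p'th-power similarity average are all placeholder assumptions of mine; each corresponds to one of the free parameters discussed below:

```python
from typing import Callable, Sequence, TypeVar

G = TypeVar("G")  # any gobs representation, e.g. the Gobs class sketched above

def cav_objective(candidate: G,
                  population: Sequence[G],
                  distance: Callable[[G, G], float],   # a gobs metric, normalized to [0, 1]
                  compactness: Callable[[G], float],
                  consistency: Callable[[G], float],
                  evidence: Callable[[G], float],
                  weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0),
                  p: float = 1.0) -> float:
    """One possible scalarization of the four CAV criteria for a candidate gobs."""
    # p'th-power average of the candidate's similarity to each gobs in the population
    sims = [1.0 - distance(candidate, g) for g in population]
    avg_similarity = (sum(s ** p for s in sims) / len(sims)) ** (1.0 / p)

    w_compact, w_consist, w_sim, w_evid = weights
    return (w_compact * compactness(candidate)
            + w_consist * consistency(candidate)
            + w_sim * avg_similarity
            + w_evid * evidence(candidate))
```

Scalarizing with weights is only one option; one could instead treat the four criteria as a genuine multi-objective problem and search for Pareto-optimal gobses, as the list of free parameters below notes.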
Of course there are many "free parameters" here, such as
- how to carry out the averaging (for instance one could use a p'th-power average with various p values)
- what underlying computational model to use to measure compactness (different gobses may come along with different simplicity metrics on the space of computational models)
- what logical formalism to use to gauge consistency
- how to define the multi-extremal optimization: does one seek a Pareto optimum?; does one weight the different criteria and if so according to what weighting function?
- how to measure evidence
- what optimization algorithm to use
However, the basic notion should be clear, even so.
If one wants to take the idea a step further, one can seek to use a gobs metric that maximizes the criteria of
- compactness of computational representation
- average similarity to the gobs metrics of the minds in the population
where one must then assume some default similarity measure (i.e. metric) among gobs metrics. (Carrying it further than this certainly seems to be overkill.)
One can also use a measure of evidence defined in a similar manner, via combination of a compactness criterion and an average similarity criterion. These refinements don't fundamentally change the nature of CAV.
Relation between CEV and CAV
It is possible that CEV, as roughly described by Yudkowsky, could lead to a gobs that would serve as a solution to the CAV maximization problem. However, there seems no guarantee of this. It is possible that the above maximization problem may have a reasonably good solution, and yet Yudkowskian CEV may still diverge or lead to a solution very far from any of the gobses in the population.
As a related data point, I have found in some experiments with the PLN probabilistic reasoning system that if one begins with a set of inconsistent beliefs, and attempts to repair it iteratively (by replacing one belief with a different one that is more consistent with the others, and then repeating this process for multiple beliefs), one sometimes arrives at something VERY different from the initial belief-set. And this can occur even if there is a consistent belief set that is fairly close to the original belief-set by commonsensical similarity measures. While this is not exactly the same thing as CEV, the moral is clear: iterative refinement is not always a good optimization method for turning inconsistent belief-sets into nearby consistent ones.
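The phenomenon is easy to reproduce in a toy propositional setting. The following sketch is a caricature of those PLN experiments, not a reconstruction of them; the repair heuristic and all names are my own illustrative assumptions (clauses are assumed to mention only variables present in the belief set):

```python
from itertools import product
from typing import Dict, List, Tuple

Literal = Tuple[str, bool]     # ("A", True) means the literal "A"
Clause = List[Literal]         # a clause is a disjunction of literals
Beliefs = Dict[str, bool]      # a belief set: a truth value for each proposition

def violated(beliefs: Beliefs, clauses: List[Clause]) -> List[Clause]:
    """Background clauses not satisfied by the agent's current beliefs."""
    return [c for c in clauses if not any(beliefs[v] == val for v, val in c)]

def hamming(b1: Beliefs, b2: Beliefs) -> int:
    """A commonsensical distance: number of propositions the two sets disagree on."""
    return sum(1 for v in b1 if b1[v] != b2[v])

def greedy_repair(beliefs: Beliefs, clauses: List[Clause], max_steps: int = 100) -> Beliefs:
    """Iterative local repair: repeatedly flip the belief involved in the most
    violated clauses.  This mirrors the 'replace one belief at a time' process
    described in the text; like GSAT-style local search it can drift or cycle,
    hence the step limit."""
    current = dict(beliefs)
    for _ in range(max_steps):
        bad = violated(current, clauses)
        if not bad:
            break
        counts: Dict[str, int] = {}
        for clause in bad:
            for v, _ in clause:
                counts[v] = counts.get(v, 0) + 1
        worst = max(counts, key=counts.get)
        current[worst] = not current[worst]
    return current

def nearest_consistent(beliefs: Beliefs, clauses: List[Clause]) -> Beliefs:
    """Brute-force search for the consistent belief set closest to the original."""
    variables = list(beliefs)
    best, best_d = None, None
    for values in product([True, False], repeat=len(variables)):
        candidate = dict(zip(variables, values))
        if not violated(candidate, clauses):
            d = hamming(beliefs, candidate)
            if best is None or d < best_d:
                best, best_d = candidate, d
    return best
```

Comparing the Hamming distance of greedy_repair's output against that of nearest_consistent's output, over various constraint sets and starting beliefs, gives a crude way of seeing how far a chain of locally sensible revisions can drift from the nearest consistent alternative -- PLN's probabilistic revision is of course far richer than this propositional caricature.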
Another, more qualitative, observation is that I have the uneasy feeling that CEV seeks to encapsulate the essence of humanity in a way that bypasses the essential nature of being human...
CEV wants to bypass the process of individual and collective human mental growth, and provide a world that is based on the projected future of this growth. But, part of the essence of humanity is the process of growing past one's illusions and shortcomings and inconsistencies.... Part of Fred's process-of-being-Fred is his realizing on his own that he doesn't really love Susie in the right way ... and, having the super-AI decide this for him and then sculpt his world accordingly, subtracts a lot of Fred's essential humanity.
Maybe the end-state of resolving all the irrationalities and inconsistencies in a human mind (including the unconscious mind) is something that's not even "human" in any qualitative, subjective sense...
On the other hand, CAV tries to summarize humanity, and then would evolve along with humanity, thus respecting the process aspect of humanity, not trying to replace the process of humanity with its expected end-goal... And of course, because of this CAV is likely to inherit more of the "bad" aspects of humanity than CEV -- qualitatively, it just feels "more human."
Relation of CAV to Mijic's Criteria
CAV appears to adhere to the spirit of Mijic's Meta-algorithm, Factually correct beliefs, and Preference aggregation criteria. It addresses factual correctness in a relatively subtle way, differentiating between "facts" supported by different amounts of evidence according to a chosen theory of evidence.
CAV is independent of Mijic's "singleton" criterion -- it could be used to create a singleton AI, or an AI intended to live in a population of roughly equally powerful AIs. It could also be used to create an ensemble of AIs, by varying the various internal parameters of CAV.
CAV does not explicitly encompass Mijic's "reflection" criterion. It could be modified to do so, in a fairly weak way, for instance by replacing the criterion
- average similarity to the various gobses in the population
with the criterion
- average similarity to the various gobses displayed by individuals in the population when in a reflective frame of mind
This might be wise, as it would avoid including gobses from people in the throes of rage or mania. However, it falls far short of the kind of deep reflection implied in the original CEV proposal.
One could also try to teach the individuals in the population to be more reflective on their goals and beliefs before applying CAV. This would surely be a good idea, but doesn't modify the definition of CAV, of course.
It seems that it would be possible to prototype CAV in a fairly simple way, by considering a restricted class of AI agents, for instance OpenCog-controlled agents, or even simple agents whose goals and beliefs are expressed explicitly in propositional-logic form. The results of such an experiment would not necessarily reflect the results of CAV on humans or highly intelligent AGI agents, but nevertheless such prototyping would doubtless teach us something about the CAV process.
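For the very simplest such agents -- those whose gobses are just assignments of truth values to propositions, with consistency judged against a set of background clauses -- a brute-force CAV prototype is only a few dozen lines. The following sketch is purely illustrative (exhaustive search obviously does not scale beyond toy problems, and all names here are my own):

```python
from itertools import product
from typing import Dict, List, Tuple

Beliefs = Dict[str, bool]   # a propositional-logic gobs: proposition -> truth value

def similarity(a: Beliefs, b: Beliefs) -> float:
    """Fraction of propositions on which two belief sets agree."""
    return sum(1 for v in a if a[v] == b[v]) / len(a)

def satisfies(beliefs: Beliefs, clauses: List[List[Tuple[str, bool]]]) -> bool:
    """Consistency with background knowledge: every clause has a true literal."""
    return all(any(beliefs[v] == val for v, val in clause) for clause in clauses)

def brute_force_cav(population: List[Beliefs],
                    clauses: List[List[Tuple[str, bool]]]) -> Beliefs:
    """Exhaustively enumerate candidate belief sets, keep only those consistent
    with the background clauses, and return the one with the highest average
    similarity to the population -- a toy CAV."""
    variables = sorted(population[0])
    best, best_score = None, -1.0
    for values in product([True, False], repeat=len(variables)):
        candidate = dict(zip(variables, values))
        if not satisfies(candidate, clauses):
            continue
        score = sum(similarity(candidate, g) for g in population) / len(population)
        if score > best_score:
            best, best_score = candidate, score
    return best

# Example: three agents with conflicting views on propositions A, B, C,
# plus one background constraint "A or not B".
agents = [{"A": True,  "B": True,  "C": False},
          {"A": False, "B": True,  "C": True},
          {"A": True,  "B": False, "C": True}]
background = [[("A", True), ("B", False)]]
print(brute_force_cav(agents, background))
```

A real prototype with OpenCog-controlled agents would presumably replace the exhaustive search with the multi-extremal optimization discussed earlier, and the clause-satisfaction check with a richer (e.g. PLN-style) consistency evaluation.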
I have formulated a method for arriving at AGI goal system content, intended to serve as part of an AGI system oriented beneficially toward humans and other sentient beings. This method is called Coherent Aggregated Volition, and is in the general spirit of Yudkowsky's CEV proposal as understood by the author, but differs dramatically from CEV in detail. It may be understood as a simpler, more feasible approach than CEV to fulfilling Mijic's criteria.
One thing that is apparent from the above detailed discussion of CAV is the number of free parameters involved. We consider this a feature, not a bug, and we strongly suspect that CEV would also have this property if it were formulated with a similar degree of precision. Furthermore, the parameter-dependence of CEV may seem particularly disturbing if one considers it in the context of one's own personal extrapolated volitions. Depending on the setting of some weighting parameter, CEV may make a different decision as to whether Fred "really" wants to marry Susie or not!!
What this parameter-dependence means is that CAV is not an automagical recipe for producing a single human-friendly goal system content set, but rather a general approach that can be used by thoughtful humans or AGIs to produce a family of different human-friendly goal system content sets. Different humans or groups applying CAV might well argue about the different parameters, each advocating different results! But this doesn't eliminate the difference between CAV and other approaches to goal system content that don't even try to achieve broad-based beneficialness.
Compared to CEV, CAV is rather boring and consists "merely" of a coherent, consistent variation on the aggregate of a population's goals and beliefs, rather than an attempt to extrapolate what the members of the population in some sense "wish they wanted or believed." As the above discussion indicates, CAV in itself is complicated and computationally expensive enough. However, it is also prototype-able; and we suspect that in the not too distant future, CAV may actually be a realistic thing to implement on the human-population scale, whereas we doubt the same will be true of CEV. Once the human brain is well understood and non-invasively scannable, then some variant of CAV may well be possible to implement in powerful computers; and if the projections of Kurzweil and others are to be believed, this may well happen within the next few decades.
Returning to the three aspects of beneficial AGI outlined at the start of this essay: I believe that development of the currently proposed OpenCog design has a high chance of leading to an AGI architecture capable of pursuing its goal-system content in a rational way; and this means that (in my world-view) the main open question regarding beneficial AGI pertains to the stability of goal systems under environmental variation and systemic self-modification. I have some ideas for how to handle this using dynamical systems theory, but these must wait for a later post!