The relation between minds, goals and values is complex and subtle. Here I will sketch a theory that aims to come to grips with key aspects of this subtlety -- articulating what it means for a value system to be coherent, and how one can start with incoherent value systems (like humans currently have) and use them as seeds to evolve coherent value systems. I will also argue that as AGI moves beyond human level toward superintelligence, there is reason to believe coherent value systems will become the norm.

## Interdependence of Goals and Minds

In modern AI it’s become standard to model intelligent systems as goal-achieving systems, and often more specifically as systems that seek to maximize expected future reward, for some precisely defined reward function.

In a blog post 12 years ago I articulated some limitations to the expected-reward-maximization approach typical in reinforcement learning work; however these limitations do not apply to goal-maximization construed more broadly as “acting so as to maximize some mathematical function of expected future histories” (where this function doesn’t have to be a time-discounted expected reward).

In the intervening years, much broader perspectives on the nature of intelligence such as Open-Ended Intelligence have also become part of the discourse.

My position currently is that goal-achievement is a major part of what humans do, and will be a major part of what any human-like AGI does. There are also non-goal-focused self-organization processes that are critical to human intelligence, and this will probably also be true for any human-like, roughly human-level AGI. There may also be other sorts of general intelligences in which goal pursuit plays a much smaller role.

Nick Bostrom (e.g. in his book *Superintelligence*) and others have advanced the idea that a mind’s goal system content should be considered as basically independent of other aspects of that mind, and on this basis have written a lot about examples like massively superhumanly intelligent minds with goals like turning all matter in the universe into paperclips. But looking at how goals co-evolve with the rest of cognitive content and processing in human minds, I have never been convinced of this proposed independence. One question is to what extent various sorts of minds could in principle be paired with various sorts of goals; another (and more interesting and relevant) question is, given a particular sort of mind, what are the actual odds of this mind evolving into a condition where it pursues a particular sort of goal.

If treating goals as separate from the rest of cognitive processing and cognitive content isn’t going to work in an AGI context, then supplying an externally-defined goal to an AGI system can only be considered as seeding the process by which that AGI constructs its own goals according to its own understanding of itself and the world. Goals will generally co-evolve with the goal-pursuing cognitive processes in the AGI’s mind, and also with the non-goal-oriented self-organizing processes in the AGI’s mind.

## Goals and Values -- for Humanity and Beyond

The relation between goals and *values* is somewhat complex, but to simplify, we can say that often:

- a mind values something to the degree it estimates that thing can contribute to its goals

- a mind’s goals can be viewed as aiming at a world in which its values are realized

But whether one thinks about goals or values, from an AGI standpoint the question remains: what sorts of goals and values should we encourage for our AGI systems, given that humanity's value systems are clearly deeply flawed, self-contradictory and fractious, yet are what we currently have? We don't want our AGIs to slavishly emulate our current screwed-up values, but we also don't want them to go off in a totally different direction that has no resemblance to anything meaningful to us. So what's the right strategy -- or is it just: teach the AGI well, let it learn and evolve, and hope for the best?

Eliezer Yudkowsky has advocated some interesting ideas about how to create appropriate values for a superhuman AGI system, via starting with human values and then iterating (“In CEV [Coherent Extrapolated Volition], an AI would predict what an idealized version of us would want, ‘if we knew more, thought faster, were more the people we wished we were, had grown up farther together’”).

I have explored variations of this such as Coherent Blended Volition, which have some practical advantages relative to the original CEV concept, but which I was never entirely happy with.

Overall, I have long considered it an important and under-appreciated pursuit to understand what kinds of goals are most likely to be found in a highly intelligent and evolved AGI mind — and what kinds of goals we should be focusing on putting into our early-stage AGI systems right now.

Clearly it is a better idea to fill our current AGI systems with goals related to compassion, love, mutual aid and learning and understanding — as opposed to say, world domination or pure selfish personal resource accumulation — but beyond this, are there subtler properties of AGI goal systems we should be thinking about?

## Coherent Value Systems

I will argue here that some value systems have “better” intrinsic properties than others in a purely formal sense, setting aside their particular contents.

I will give a simple mathematical characterization of what I call “coherent value systems”, and discuss the qualitative properties of such value systems. Basically, a coherent value system is one that evaluates the value of each localized action or state in a way that’s consistent with its evaluation of the value of all the other actions or states that this localized one contains, is part of, or interacts with: it values each part in a way that is completely consistent with its valuing of sub-parts, greater wholes and co-parts of wholes.

I will argue that coherent value systems are intrinsically more efficient than incoherent ones — suggesting (quite speculatively but with clear logic) that ultimately, in a setting supporting flexible evolution of multiple kinds of minds, those with coherent value systems are likely to dominate.

While different in the details of formulation and argument, conceptually this is along the lines of an argument long made by Mark Waser and others: that as human-level intelligence gives way to superintelligence, primitive human values are likely to give way to values that are in some sense superbeneficial. Qualitatively, Waser’s “Universalist” value system appears to meet the coherence criteria outlined here.

On the other hand, typical human value systems clearly are not very coherent in this sense. With this in mind, I will explore the question of how, starting with an incoherent value system (like a current human value system), one might create a coherent value system that is seeded by and substantially resembles this initial incoherent value system. This addresses basically the same problem that Yudkowsky’s CEV tries to address, but in what seems to be a clearer and more scientifically/mathematically grounded manner.

## Toward a Formal Theory of Value Coherence

The key property I want to explore here is “coherence” of value systems, meaning that when one has an entity decomposable into parts, what the value system rates as high value for the parts is consistent with what the value system rates as high value for the whole.

Human value systems, if inferred implicitly from human behavior, often appear to violate this coherence principle. However it seems feasible to take a value system that is “incoherent” in this sense and (in a very rough sense) normalize it into coherence.

To see how this may be possible, we have to dig a bit into the math of value system coherence and some of its indirect consequences.

Consider a universe U as a set of atomic entities. Let P denote the power set of U (the set of subsets of U). Then consider “individuals” as subsets of P; e.g. the person Ben Goertzel and the country USA are individuals. (Ideally we should consider individuals as fuzzy subsets of P, but we will set things up so that without loss of generality we can look at the crisp case.) Let V denote the “indiverse”, or set of individuals associated with U. The members of the set (of subsets of U) defining the individual A will be referred to as “instances” of A.

One can posit some criteria for what constitutes an admissible individual — e.g. one can posit there needs to be some process of finite complexity that generates all members of the individual. The particulars of these criteria are not critical to the notions we’re developing here.

Next consider a value function v that maps from P x V into (some subset of) the real numbers. In this picture a “value system” is the graph of a value function.

We can interpret v(x,A) as the value of subset x in the context of individual A.

Let # denote a disjoint union operator on individuals in V (one could generalize and look at disjoint coproducts in a categorical setting, but I’m not sure we need that for starters).

Then:

Define a value function v to be *coherent* if for all individuals A, B in V,

*argmax { v(x, A#B) | x in A#B } = (argmax { v(y, A) | y in A }) # (argmax { v(z, B) | z in B })*

I.e., what this says is: The instance of the individual A # B with maximum value according to v, is obtained by taking the instance of the individual A with maximum value and joining it (via #) with the instance of the individual B with maximum value.
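To make the definition concrete, here is a minimal Python sketch under assumptions that are not in the post: atoms are strings, an instance is a `frozenset` of atoms, an individual is a finite `frozenset` of instances, and `A # B` is taken to be the individual whose instances are all `y # z` with `y` in `A` and `z` in `B` (the atoms of `A` and `B` being disjoint). All helper names, the atom weights, and the two value functions `v_additive` and `v_crowded` are hypothetical illustrations. Comparing full argmax sets, rather than single maximizers, avoids arbitrary tie-breaking.

```python
from itertools import combinations, product

# Toy encoding (an assumption for illustration, not from the post): atoms are strings,
# an instance is a frozenset of atoms, and an individual is a frozenset of instances.

def powerset_individual(atoms):
    """Individual whose instances are all subsets of the given atoms."""
    atoms = sorted(atoms)
    return frozenset(frozenset(c) for k in range(len(atoms) + 1)
                     for c in combinations(atoms, k))

def join(x, y):
    """x # y: disjoint union of two instances, undefined if they overlap."""
    if x & y:
        raise ValueError("x # y is undefined for overlapping instances")
    return x | y

def join_individuals(A, B):
    """A # B, assumed here to have instances y # z for y in A and z in B."""
    return frozenset(join(y, z) for y, z in product(A, B))

def argmax_set(v, A):
    """All instances of A attaining max v(., A); returning the whole set avoids tie-breaking."""
    best = max(v(x, A) for x in A)
    return {x for x in A if v(x, A) == best}

def is_coherent_on(v, A, B):
    """Test argmax{v(x, A#B)} == argmax{v(y, A)} # argmax{v(z, B)} for one pair (A, B)."""
    lhs = argmax_set(v, join_individuals(A, B))
    rhs = {join(y, z) for y in argmax_set(v, A) for z in argmax_set(v, B)}
    return lhs == rhs

# Hypothetical atom weights; both value functions ignore the context argument.
WEIGHTS = {"a1": 2.0, "a2": -1.0, "b1": 3.0, "b2": 1.0}

def v_additive(x, A):
    return sum(WEIGHTS[atom] for atom in x)

def v_crowded(x, A):
    # The squared-size penalty couples the parts, so the best whole need not
    # be the join of the best parts.
    return v_additive(x, A) - len(x) ** 2

A = powerset_individual(["a1", "a2"])
B = powerset_individual(["b1", "b2"])
print(is_coherent_on(v_additive, A, B))  # True
print(is_coherent_on(v_crowded, A, B))   # False
```

Because `v_additive` scores y # z as the sum of the scores of its parts, the best whole is always the join of the best parts. Under `v_crowded` the best instance of `A # B` is `{b1}` alone rather than `{a1} # {b1}`, so this single pair already witnesses incoherence.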

One could generalize this a bit by asking e.g. that the instances x of A#B for which v(x) is in the top decile across A#B, are mostly of the form y#z where v(y) is in the top decile across A and v(z) is in the top decile across B. But this doesn’t seem to change the conceptual picture much, so for the moment we’ll stick with the stricter definition of coherence in terms of argmax.

In the case of fuzzy individuals, the definition of coherence might look more like

*argmax { v(x, A#B) * m(x, A#B) | x in P } = (argmax { v(y, A) * m(y, A) | y in P }) # (argmax { v(z, B) * m(z, B) | z in P })*

where e.g. m(x,A) denotes the fuzzy membership degree of x in individual A. However, the story is the same here as in the crisp case because we can simply define

*v1(x, A) = v(x,A) * m(x, A)*

and then apply the crisp definition. I.e. on a formal level, the fuzziness can be baked into the context-dependence of the value function.
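The reduction from fuzzy to crisp can be sketched in a few lines, assuming (hypothetically) that a fuzzy individual is stored as a dict mapping each candidate instance to its membership degree m(x, A), and that the argmax ranges over those stored candidates. `bake_membership`, `crisp_argmax`, `membership`, `v_size` and the membership degrees are illustrative names and numbers, not part of the original framework.

```python
def bake_membership(v, m):
    """Return v1(x, A) = v(x, A) * m(x, A), folding fuzziness into the context."""
    def v1(x, A):
        return v(x, A) * m(x, A)
    return v1

def crisp_argmax(v, candidates, A):
    """Ordinary (crisp) argmax of v(., A) over a finite collection of candidate instances."""
    return max(candidates, key=lambda x: v(x, A))

# Hypothetical fuzzy individual: instance -> membership degree m(x, A).
A_fuzzy = {
    frozenset({"a"}): 1.0,
    frozenset({"a", "b"}): 0.4,
    frozenset({"a", "b", "c"}): 0.2,
}

def membership(x, A):
    return A.get(x, 0.0)

def v_size(x, A):
    # Hypothetical value function: larger instances are worth more.
    return float(len(x))

v1 = bake_membership(v_size, membership)
print(crisp_argmax(v_size, A_fuzzy, A_fuzzy))
print(crisp_argmax(v1, A_fuzzy, A_fuzzy))
```

The crisp `v_size` prefers the largest instance `{a, b, c}`, while the baked `v1` prefers `{a}` (scores 1.0, 0.8 and roughly 0.6): once membership is folded into the context-dependent value function, the ordinary crisp argmax does the rest.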

## Intuitive Meaning of Value System Coherence

For a coherent value system, what is best for a society of humans will necessarily involve each human within the society doing what the value system considers the best thing for them to do. Likewise, for a coherent value system, doing the best thing over a long period of time involves, over each shorter subinterval of time, doing what the value system considers the best thing to do then.

Consider, for instance, the function that assigns an activity A the value v(A) defined as the amount of pleasure that doing A brings directly to a certain human mind. This value system is almost never coherent for real humans. This means that for almost all humans, short-term “living in the moment” hedonism is not coherent (for the obvious reason that deferring gratification often brings more pleasure altogether, given the way the real human world works).

For an incoherent value system, there will exist “evil” from the view of that value system — i.e. there will exist tradeoffs wherein maximizing value for one entity results in some other entity not maximizing value.

Intuitively, for a value system to be coherent, what’s best for an individual entity E has to be: What’s best overall for the totality of entities influenced by E.

There is some wiggle room in the definition of “overall” here, which becomes clear when one looks at how to normalize an incoherent value system to obtain a coherent one.

## Formal Properties of Coherent Value Systems

The definition of “coherence” turns out to enforce some fairly strict requirements on what a coherent value function can be.

This can be seen in an elegant way via a minor adaptation of the arguments in Knuth and Skilling’s classic paper Foundations of Inference.

This section of the post gets a bit nitty-gritty, and readers who haven't read (and don't want to take the time now to read) Knuth and Skilling's paper may want to skip it.

In essence, one just needs to replace the set-theoretic union in their framework with disjoint union # on individuals defined as follows: If x and y are disjoint then x#y is their disjoint union, and if x and y intersect then x#y is undefined.

Looking at Section 3 of *Foundations of Inference*, let us consider

*r(A) = max{ v(x,A) | x in A }*

as the real number corresponding to the individual A.

*Symmetry 0: A < B ==> r(A) < r(B)*

This is clearly true due to the nature of maximum.

*Symmetry 1: A < B ==> { A # C < B # C, and C # A < C # B } , in the case all the disjoint unions are well-defined.*

This is true due to the nature of union, in the case that all the disjoint unions are well-defined.

*Symmetry 2: (A # B) # C = A # ( B # C) , where either both sides are well-defined or neither are.*

*Symmetry 3: (A x T) # (B x T) = (A # B) x T, where either both sides are well-defined or neither are*

*Symmetry 4: (A x B) x C = A x (B x C)*

Finally, consider an ordered chain of individuals e.g. A < B < C < T, and use the notation e.g. [A,T] to signify that A precedes T in this chain. We can then define a derived chaining operation that acts on adjacent intervals, so that e.g. [A,B] , [B,C] = [A, C].

If we use the notation

*a = [A,B]*

*b = [B, C]*

*c = [C, T]*

then we have

*Symmetry 5: (a, b), c = a, (b, c)*

which works unproblematically in our setting, as the distinction between ordinary and disjoint union is not relevant.

Looking at the mapping between individuals and values in the context of Knuth and Skilling's mapping between lattice elements and numerical values, how can we interpret

*c = a + b corresponds to C = A # B*

in the present context? To calculate c = a + b if A and B are known, one would

- find z_a in A so that v(z_a, A) = a
- find z_b in B so that v(z_b, B) = b
- let z_c = z_a # z_b

Then, via the coherence rule,

*z_c = argmax { v(x, C) | x in C }*

and one can set

*c = v(z_c, C)*

The treatment of direct product and chain composition in *Foundations of Inference* carries over directly here, as there is nothing different about direct products and inclusion in our setting versus the setting they consider.

Axioms 1-5 from *Foundations of Inference* Section 4 appear to follow directly, the only caveat being that the equations are only to be used when the individuals involved are disjoint.

Section 5.1 in Foundations of Inference deals specifically with the case of disjoint arguments, which is the case of core interest here.

The overall conclusion is: If v is a coherent value function, then value-assignments of the form

*r(A) = max{ v(x,A) | x in A}*

should behave like monotone scalings of probabilities.

This means that, for instance, they should obey the formula

*r(S1 # … # Sn) = r(S1) + … + r(Sn)*

-- or else

*r(S1 # … # Sn) = r(S1) +^ … +^ r(Sn)*

where

*a +^ b = f( f^{-1}(a) + f^{-1}(b) )*

for some monotone function f.
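A small numerical check of this conclusion, again under a hypothetical toy encoding (instances as `frozenset`s of atoms, individuals as sets of instances, and `S1 # ... # Sn` built from parts over pairwise disjoint atoms). With the made-up additive value function `v_add`, r combines by ordinary addition; with `v_exp`, a monotone transform of `v_add` that has the same argmaxes, r combines by the generalized operator +^ with f = exp, which in this case reduces to multiplication. The names `join_all`, `make_oplus`, `v_add`, `v_exp` and the weights are assumptions for illustration only.

```python
import math
from functools import reduce
from itertools import combinations, product

def powerset_individual(atoms):
    """Individual whose instances are all subsets of the given atoms."""
    atoms = sorted(atoms)
    return frozenset(frozenset(c) for k in range(len(atoms) + 1)
                     for c in combinations(atoms, k))

def join_all(*parts):
    """S1 # ... # Sn for parts over pairwise disjoint atoms (assumed, not checked)."""
    return frozenset(frozenset().union(*combo) for combo in product(*parts))

def r(v, S):
    """r(S) = max{ v(x, S) | x in S }."""
    return max(v(x, S) for x in S)

def make_oplus(f, f_inv):
    """a +^ b = f( f^{-1}(a) + f^{-1}(b) ) for a monotone f with inverse f_inv."""
    return lambda a, b: f(f_inv(a) + f_inv(b))

WEIGHTS = {"a1": 2.0, "a2": -1.0, "b1": 3.0, "b2": 1.0, "c1": 0.5}  # hypothetical

def v_add(x, S):
    return sum(WEIGHTS[atom] for atom in x)

def v_exp(x, S):
    # Monotone transform of v_add: same argmaxes, so still coherent.
    return math.exp(v_add(x, S))

parts = [powerset_individual(["a1", "a2"]),
         powerset_individual(["b1", "b2"]),
         powerset_individual(["c1"])]
whole = join_all(*parts)

# Ordinary addition for v_add.
print(r(v_add, whole), sum(r(v_add, S) for S in parts))
# Generalized +^ with f = exp for v_exp.
oplus = make_oplus(math.exp, math.log)
print(math.isclose(r(v_exp, whole), reduce(oplus, [r(v_exp, S) for S in parts])))
```

The choice of f = exp is just one convenient monotone function; any strictly monotone f with a known inverse could be plugged into `make_oplus` in the same way.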

## Using Incoherent Value Systems to Seed Learning of Coherent Value Systems

Now let's get to the punchline -- given the above notion and characterization of value coherence, how might one create a coherent value system that still retains some of the core qualitative aspects of an incoherent value system such as, say, current human value systems?

Given an incoherent value system v, one can define a related, derived coherent value system v' as follows.

The basic idea is to define an error function E1(r’) via the sum over all pairs (S1, S2) of disjoint individuals (so that S1 # S2 is defined) of

*( r’(S1 # S2) - ( r’(S1) + r’(S2) ) )^2*

and another error function E2(r’) as the sum over all S of

*( w(S) * ( r’(S) - r(S) ) )^2*

where w(S) is an a priori weight specifying how much a given individual S is valued — e.g. S could be valued proportional to simplicity or proportional to relatedness to a specific base system, etc. ...

[this could be made more sophisticated, e.g. via accounting for intersection of different S in various ways, but this simple version will be sufficient for making the current conceptual points].

… and then look for Pareto optima of the problem of minimizing E1 and E2.

One can then use an iterative algorithm to find a v’(x,A) leading to r’(S) that live on this Pareto frontier, using the original v(x,A) as an initial condition.
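One simple way to realize such an iterative search, sketched here under explicit assumptions, is weighted-sum scalarization: minimize lam * E1 + (1 - lam) * E2 by plain gradient descent starting from r’ = r, and sweep lam over (0, 1). For any lam strictly between 0 and 1, a minimizer of the weighted sum is Pareto optimal for (E1, E2), so each value of lam lands on the frontier. This is a generic multi-objective technique swapped in for illustration, not an algorithm proposed in the post; it works directly on a table of r’ values rather than on v’(x, A), it sums over ordered pairs (which only rescales E1), and the toy individuals `ME`, `SPOUSE`, `FAMILY`, the incoherent starting values and the step size are all hypothetical.

```python
# Hypothetical toy data (not from the post): individuals are frozensets of base
# labels, and S1 # S2 is the union of two disjoint label sets.
ME, SPOUSE = frozenset({"me"}), frozenset({"spouse"})
FAMILY = ME | SPOUSE
INDIVIDUALS = [ME, SPOUSE, FAMILY]

def disjoint_pairs(family):
    """Ordered pairs (S1, S2) of disjoint individuals whose join S1 # S2 is also present."""
    present = set(family)
    return [(s1, s2) for s1 in family for s2 in family
            if not (s1 & s2) and (s1 | s2) in present]

def E1(rp, family):
    """Additivity error: sum of (r'(S1 # S2) - (r'(S1) + r'(S2)))^2."""
    return sum((rp[s1 | s2] - (rp[s1] + rp[s2])) ** 2 for s1, s2 in disjoint_pairs(family))

def E2(rp, r, w):
    """Fidelity error: sum of (w(S) * (r'(S) - r(S)))^2."""
    return sum((w[s] * (rp[s] - r[s])) ** 2 for s in r)

def scalarized_descent(r, w, family, lam, step=0.05, iters=2000):
    """Gradient descent on lam * E1 + (1 - lam) * E2, starting from r' = r.

    The fixed step size is an assumption that is small enough for this toy
    problem; larger families may need a smaller step.
    """
    rp = dict(r)
    pairs = disjoint_pairs(family)
    for _ in range(iters):
        # Gradient of the fidelity term.
        grad = {s: 2 * (1 - lam) * w[s] ** 2 * (rp[s] - r[s]) for s in rp}
        # Gradient of the additivity term, one residual per ordered pair.
        for s1, s2 in pairs:
            e = rp[s1 | s2] - rp[s1] - rp[s2]
            grad[s1 | s2] += 2 * lam * e
            grad[s1] -= 2 * lam * e
            grad[s2] -= 2 * lam * e
        rp = {s: rp[s] - step * grad[s] for s in rp}
    return rp

r0 = {ME: 3.0, SPOUSE: 2.0, FAMILY: 4.0}  # incoherent: 4.0 != 3.0 + 2.0
w = {s: 1.0 for s in INDIVIDUALS}
results = {lam: scalarized_descent(r0, w, INDIVIDUALS, lam) for lam in (0.5, 0.9)}
for lam, rp in results.items():
    print(lam, E1(rp, INDIVIDUALS), E2(rp, r0, w))
```

With these toy numbers, raising lam from 0.5 to 0.9 pulls r’(FAMILY) closer to r’(ME) + r’(SPOUSE) (smaller E1) at the cost of drifting further from the original r (larger E2), which is the trade-off the Pareto frontier describes.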

This is somewhat analogous to Eliezer Yudkowsky’s notion of “coherent extrapolated volition”, but much more clearly defined.

The optimal iterative algorithm to use here is not clear, and this is likely a quite subtle question at the intersection of machine learning/reasoning and numerical analysis. However, some simple evocative thoughts pointing in the direction of an appropriate heuristic algorithm may be conceptually interesting.

Along these lines, one can think about an iterative algorithm of the following nature.

Given A = B#C, let

*v1(A | B)*

denote the maximum value for A that is achievable via choosing the maximum-value instance of B, and then choosing the maximal-value instance of C that can co-exist with this.

Qualitatively, this means: How much value can we provide for A via maximizing the value of some sub-individual B of A. For instance, how much value can we provide for me by first maximizing value for my lungs, or how much value can we provide for my family by first maximizing value for me personally?

Given E = A # D, let

*v2(A | E)*

denote the maximum value for A that is achievable via choosing the maximum-value instance of E, and then choosing the maximal-value instance of D that can co-exist with this, and let

*v3(A | D)*

denote the maximum value for A that is achievable via choosing the maximum-value instance of D, and then choosing the maximal-value instance of E that can co-exist with this.

These measure: How much value can we provide for A via maximizing the value of some individual containing A, or of some individual that is composed with A to form a commonly containing individual? For instance, how much value can we provide for me via first maximizing the value of my family, or via first maximizing the value of the other people in my family?

If v is coherent, then v1=v2=v3=v.
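The three estimators can be sketched on the same kind of hypothetical toy encoding. The sketch commits to one reading of definitions the text leaves open: after fixing the best instance of the first individual, the remaining part is chosen to maximize the value of the target individual, and for v2 and v3 the value for A is read off by restricting the chosen instance of E = A # D to A’s atoms. The names `v1`, `v2`, `v3`, `best`, `atoms_of` and the value functions `v_additive` and `v_crowded` are illustrative assumptions. Because `max` breaks ties arbitrarily, the sketch is only meaningful when maximizers are unique, as they are for these weights.

```python
from itertools import combinations, product

# Hypothetical toy encoding (an assumption, not from the post): an instance is a
# frozenset of atoms and an individual is a frozenset of instances.
def powerset_individual(atoms):
    atoms = sorted(atoms)
    return frozenset(frozenset(c) for k in range(len(atoms) + 1)
                     for c in combinations(atoms, k))

def join_individuals(A, D):
    """A # D for individuals over disjoint atoms: all unions y | z."""
    return frozenset(y | z for y, z in product(A, D))

def atoms_of(I):
    return frozenset().union(*I)

def best(v, I):
    """One maximum-value instance of I (max breaks ties arbitrarily)."""
    return max(I, key=lambda x: v(x, I))

def r(v, I):
    return max(v(x, I) for x in I)

def v1(v, A, B, C):
    """v1(A | B) for A = B # C: fix B's best instance, then pick C's part to maximize A."""
    z_b = best(v, B)
    return max(v(z_b | z_c, A) for z_c in C)

def v2(v, A, D):
    """v2(A | E) for E = A # D: maximize E first, then read off the part lying in A."""
    E = join_individuals(A, D)
    return v(best(v, E) & atoms_of(A), A)

def v3(v, A, D):
    """v3(A | D): maximize D first, complete E = A # D around it, then read off A's part."""
    E = join_individuals(A, D)
    z_d = best(v, D)
    compatible = [z for z in E if z & atoms_of(D) == z_d]
    z_e = max(compatible, key=lambda z: v(z, E))
    return v(z_e & atoms_of(A), A)

WEIGHTS = {"a1": 2.0, "a2": -1.0, "b1": 3.0, "b2": 1.0}  # hypothetical weights

def v_additive(x, I):
    return sum(WEIGHTS[atom] for atom in x)

def v_crowded(x, I):
    # A squared-size penalty couples the parts of an instance.
    return v_additive(x, I) - len(x) ** 2

A = powerset_individual(["a1", "a2"])
D = powerset_individual(["b1", "b2"])
E = join_individuals(A, D)
for v in (v_additive, v_crowded):
    print(v.__name__, r(v, E), v1(v, E, A, D), r(v, A), v2(v, A, D), v3(v, A, D))
```

For the additive value function, v1(E | A) equals r(E), and v2(A | E) and v3(A | D) both equal r(A), matching the claim that the estimators agree with v when v is coherent. For `v_crowded` they fall short: v1 gives 1.0 against r(E) = 2.0, and v2 and v3 give 0 against r(A) = 1.0.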

In general, one could think about using the average (v1 + v2 + v3)(A) / 3 as an estimator for v’(A) to help guide the iterative optimization algorithm. This average is an estimate of the value providable for A via maximizing the value of a randomly chosen sub-individual, super-individual or connected-individual of A. It will often be a useful pointer in the direction of a more coherent value system than v, i.e. (v1 + v2 + v3)() / 3 is likely to be more coherent than v().

This particular estimator is relatively crude and much more sophisticated, qualitatively similar estimators can surely be created. But the idea I want to get across is that iterative pursuit of a coherent value system that is close to a given incoherent value system, with search seeded from this incoherent value system, may involve iterative steps through intermediate value systems that are conceptually reminiscent of the thinking behind CEV. That is, one can think about

- What kind of people would current humans like to be, if they could more fully realize their own values

- What would these hypothetical “revised better humans” value, and what kind of humans would THEY like to be

- etc.

This sort of iterative process, while rough and poorly-defined, is similar to v1 + v2 + v3 as defined above, and could be interesting as an avenue for iterating from current incoherent human values to a coherent value system living on the above-described Pareto frontier.

##
**Varieties of Coherent Value System**

If there are multiple coherent value systems on the Pareto frontier, then one could guide the iterative search process toward a coherent value system in various different ways.

For instance, referencing the above simple estimator for simplicity of discussion, in constructing v1, v2 and v3 one could weight certain A, D and E more highly.

If one considers this weighting to be achieved via some cost function c(A) applied to individuals A, then one can think about the way different choices for c may impact the ultimate coherent value system obtained. (Of course the weight c() could be chosen the same as the weight w() used in the error function itself, and this would probably be the optimal choice in terms of effective guidance of optimization, but it’s not the only choice.)

E.g. a “selfish” v’ could be obtained by using a c that weights those S relating to a specific person very highly, and other S much less.

A consistently short-term-gratification-oriented v’ could be obtained by using a c that weights S restricted to short periods of time very highly, and S restricted to longer periods of time much less. In many cultures, coherence would rule out, e.g., a value system that values being happily married over the time-scale of years, but over the time-scale of hours values having sex with whomever one finds attractive. But a purely hedonistic value system that values a long period of time precisely according to the sum of the time-localized pleasures experienced during that period may be perfectly coherent. Likewise, there can be a coherent value system that puts value on a time-local experience based substantially on both its immediate characteristics and its contribution to longer-term goals.

A value system that puts extremely high value on freedom of choice for individuals, but also extremely high value on societal order and structure, may be incoherent within the scope of human societies feasible in the context of modern human psychology and culture. A value system that prioritizes order and structure for society and obedience and submission for individuals is more likely to be coherent, as is one that values both freedom of choice and creative anarchic social chaos. The professed value systems of most contemporary influential political parties are, in this sense, obviously extremely incoherent.

## Are Intelligences with Coherent Value Systems More Efficient?

Arguably an intelligent system that directs its actions according to a coherent value system will, all else equal, be more efficient than one that directs its actions according to an incoherent one. This is because a mind with an incoherent value system will choose actions oriented toward maximizing value of one subset S1 of the world, and then later choose actions oriented toward maximizing some other subset S2 of the world — and will find that what it did in the context of S2 acts against what it did in the context of S1, and vice versa. Whereas for a mind with a coherent value system, actions chosen with respect to different subsets of the world will tend to reinforce and support each other, except where inference errors or unexpected properties of the world intervene.

This argument suggests that, in an evolutionary context involving competition between multiple intelligences, there will be a certain advantage to the ones with coherent value systems. However, this advantage doesn’t have to be decisive, because there may be other advantages enjoyed by entities with incoherent value systems. For instance, maintaining a coherent value system may sometimes be highly expensive in terms of space, time and energetic resources (it can be quite complex to figure out the implications of one’s actions for all the subsets of the world they impinge upon!).

My suspicion is that as computational and energetic resources become more ample and easily accessible by the competing cognizers in an evolutionary system, the efficiency advantage of a coherent value system becomes an increasingly significant factor. This suspicion seems a very natural and important candidate for further formal and qualitative exploration.
