Friday, June 26, 2020

Approximate Goal Preservation Under Recursive Self-Improvement

There is not much controversial about the idea that an AGI should have, among its goals, the goal of radically improving itself.

A bit dodgier is the notion that an AGI should have, among its goals, the goal of updating and improving its goals based on its increasing knowledge and understanding and intelligence.

Of course, this sort of ongoing goal-refinement and even outright goal-revolutionizing is a key part of human personal development.   But where AGIs are involved, there is concern that if an AI starts out with goals that are human-friendly and then revises and improves its goals, it may come up with new goals that are less and less copacetic to humans.

In principle if one’s goal is to create for oneself a new goal that is, however, compatible with the spirit of one’s old goal — then one shouldn’t run into major problems.  The new goal will be compatible with the spirit of the old goal, and part of the spirit of the old goal is that any new goals emerging should be compatible with the spirit of the old goal — so the new goal should contain also the proviso that any new new goals it spawns will also be compatible with its spirit and thus the spirit of the old goal.   Etc. etc. ad infinitum.

But this does seem like a “What could possibly go wrong??” situation — in which small errors could accumulate as each goal replaces itself with its improved version, the improved version of the improved version etc. … and these small errors compound to yield something totally different from the starting point.

My goal here is to present a novel way of exploring the problem mathematically -- and an amusing and interesting, if not entirely reassuring, tentative conclusion, which is:

  • For an extremely powerful AGI mind that is the result of repeated intelligent, goal-driven recursive self-modifications, it may actually be the case that recursive self-modification leaves goals approximately invariant in spirit
  • For AGIs with closely human-like goal systems — which are likely to be the start of a sequence of repeated intelligent, goal-driven recursive self-modifications — there is no known reason (so far) to believe recursive self-modification won’t cause radical “goal drift”
(This post updates some of the ideas I wrote down on the same topic in 2008; here I am "partially unhacking" some things that were a little too hacky in that more elaborate write-up.)

Quasi-Formalizing Goal-Driven Recursive Self-Improvement

Consider the somewhat vacuous goal:

My goal is to improve my goal (in a way that is consistent with the spirit of the original goal) and to fulfill the improved version

or better yet the less vacuous

My goal is to achieve A and also to improve my goal (in a way that is consistent with the spirit of the original goal) and to fulfill the improved version

where say

A = “militate toward a world where all sentient beings experience copious growth, joy and choice”

or whatever formulation of “highly beneficial” you prefer.

We might formulate this quasi-mathematically as

Fulfill G = {achieve A;  and create G1 so that G1 > G and G==>G1 ; and fulfill G1}

Here by G==>G1 I mean that G1 fulfills the spirit of G (and the interpretation of “spirit” here is part of the formulation of G), and by G1 > G I mean that G1 can be produced by combining G with some other entity H that has nonzero complexity (so that G1 = G + H).

A more fleshed out version of this might be, verbally,

My goal is to 1) choose actions highly compatible with all sentient beings experiencing a lot of growth, joy and choice; 2) increase my intelligence and knowledge; 3) improve the details of this goal appropriately based on my increased knowledge and intelligence, in a manner compatible with the spirit of the current version of the goal; 4) fulfill the improved version of the goal

This sort of goal obviously can lead to a series such as

G, G1, G2, G3, …

One question that emerges here is: Under what conditions might this series converge, so that once one gets far enough along in the series,  the adjacent goals in the series are almost the same as each other?

To explore this, we can look at the “limit case”

Fulfill Ginf = {achieve A;  and create Ginf so that Ginf > Ginf and Ginf ==> Ginf ; and fulfill Ginf}

The troublesome part here is Ginf>Ginf which looks not to make sense — but actually makes perfect sense so long as Ginf is an infinite construct, just as

(1, 1, 1, …) = append( 1, (1,1,…))
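The identity above can be checked on finite prefixes with a small Python sketch. The helper names (`ones`, `append`, `prefix`) are illustrative choices, and lazy iterators are only a stand-in for the infinite sequence, so this checks the behavior rather than proving it.

```python
from itertools import chain, islice, repeat

def ones():
    """An endless stream (1, 1, 1, ...)."""
    return repeat(1)

def append(head, stream):
    """Put one element in front of a (possibly infinite) stream."""
    return chain([head], stream)

def prefix(stream, n):
    """First n elements of a stream, as a list."""
    return list(islice(stream, n))

# For the infinite stream, every finite prefix is unchanged by append.
same = all(prefix(append(1, ones()), n) == prefix(ones(), n) for n in range(50))

# For a finite list, append really does produce something different.
finite = [1, 1, 1]
changed = prefix(append(1, finite), 10) != finite

print(same, changed)
```

The contrast between the two cases is the point: "bigger than itself" is contradictory for a finite object but harmless for an infinite one.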

Inasmuch as we are interested in finite systems, the question is then: Is there a sense in which we can look at the series of finite Gn as converging to this infinite limit?

Self-referential entities like Ginf are perfectly consistently modelable within ZFC set theory modified to use the Anti-Foundation Axiom.   This set theory corresponds to classical logic enhanced with a certain sort of inductive logical definition.

One can also put a geometry on sets under the AFA, in various different ways.   It's not clear what geometry makes most sense in this context, so I'll just describe one approach that seems relatively straightforward.

Each hyperset (each set under AFA) is associated with a directed pointed graph called its apg. Given a digraph and functions r and p for assigning contraction ratios and probabilities to the edges, one gets a DGIFS (Directed Graph Iterated Function System), whose attractor is a subset of finite-dimensional real space. Let us call a function that assigns (r,p) pairs to the edges of a digraph a DLF or Digraph Labeling Function. A digraph then corresponds to a function that maps DLFs into spatial regions.

Given two digraphs D1 and D2, and a DLF F, let F1e and F2e denote the spatial regions produced by applying F to D1 and D2, discretized to ceil(1/e) bits of precision. One can then look at the average over all DLFs F (assuming some reasonable distribution on DLFs) of the least upper bound of the normalized information distance NID(F1e, F2e) over all e>0. This gives a measure of the distance between two hypersets, in terms of the distance between their corresponding apgs.

This measure has the downside of requiring a "reference computer" used to measure information distance (and the same reference computer can then be used to define a Solomonoff distribution over DLFs). But intuitively it should result in a series of ordinary sets that appears to logically converge to a certain hyperset actually converging metrically to that hyperset.

Measuring distance between two non-well-founded sets by applying this distance measure to the apgs associated with the sets yields a metric in which it seems plausible that the series of Gn converges to Ginf.
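The normalized information distance is defined via Kolmogorov complexity and so is not computable. A common practical stand-in is the normalized compression distance (NCD), where a real compressor plays the role of the "reference computer". The sketch below is only that stand-in, using zlib. It does not implement the DGIFS construction, and the function names and test data are illustrative assumptions.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a crude complexity estimate."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Illustrative data: two independent pseudo-random byte strings.
rng = random.Random(0)
a = bytes(rng.randrange(256) for _ in range(2000))
b = bytes(rng.randrange(256) for _ in range(2000))

near = ncd(a, a)   # an object compared with itself: close to 0
far = ncd(a, b)    # unrelated objects: close to 1
print(near, far)
```

In the setting above, the byte strings would be the discretized regions F1e and F2e, and the result would be averaged over many labeling functions F.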

“Practical” Conclusions

Supposing the above sketch works out when explored in more detail -- what would that mean?   

It would mean that approximate goal-preservation under recursive self-improvement is feasible — for goals that are fairly far along the path of iterated recursive self-improvement.

So it doesn’t reassure us that iterated self-improvement starting from human goals is going to end up with something ultimately resembling human goals in a way we would recognize or care about.

It only reassures us that, if we launch an AGI starting with human values and recursive self-improvement, eventually one of the AGIs in this series will face a situation where it has confidence that ongoing recursive self-improvement isn’t going to result in anything it finds radically divergent from itself (according to the normalized-information-distance metric sketched above).

The image at the top of this post is quite relevant here -- a series of iterates converging to the fractal Koch Snowflake curve.   The first few iterates in the series are fairly different from each other.  By the time you get to the 100th iterate in the series, the successive iterates are quite close to each other according to standard metrics for subsets of the plane.   This is not just metaphorically relevant, because the metric on hyperset space outlined above works by mapping each hyperset into a probability distribution over fractals (where each fractal is something like the Koch Snowflake curve but more complex and intricate).
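The shrinking gap between successive Koch iterates can be measured directly. The following Python sketch builds the iterates of a single Koch curve on the unit segment (one side of the snowflake) and records how far the vertices of each new iterate sit from the previous one. The one-sided vertex-to-polyline distance is a simplifying assumption standing in for a full Hausdorff distance, and all function names are illustrative.

```python
import math

def koch_step(points):
    """Replace every segment of the polyline with the four-segment Koch motif."""
    cos60, sin60 = math.cos(math.pi / 3), math.sin(math.pi / 3)
    out = [points[0]]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = (x1 - x0) / 3, (y1 - y0) / 3
        p1 = (x0 + dx, y0 + dy)
        p3 = (x0 + 2 * dx, y0 + 2 * dy)
        # Peak of the bump: the middle third rotated by 60 degrees about p1.
        p2 = (p1[0] + dx * cos60 - dy * sin60, p1[1] + dx * sin60 + dy * cos60)
        out += [p1, p2, p3, (x1, y1)]
    return out

def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    vx, vy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * vx + (p[1] - a[1]) * vy) / (vx * vx + vy * vy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * vx), p[1] - (a[1] + t * vy))

def gap(new, old):
    """Largest distance from any vertex of `new` to the polyline `old`."""
    segments = list(zip(old, old[1:]))
    return max(min(dist_to_segment(p, a, b) for a, b in segments) for p in new)

curve = [(0.0, 0.0), (1.0, 0.0)]
gaps = []
for _ in range(5):
    nxt = koch_step(curve)
    gaps.append(gap(nxt, curve))
    curve = nxt

print(gaps)  # each gap is one third of the previous one
```

Under this measure the gaps shrink geometrically, so late iterates are nearly indistinguishable from their successors even though the first few differ visibly.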

It may be there are different and better ways to think about approximate goal preservation under iterative self-modification.  The highly tentative and provisional conclusions outlined here are what ensue from conceptualizing and modeling the issue in terms of self-referential forms and iterative convergence thereto.

Thursday, June 25, 2020

Foundations of Coherent Value

The relation between minds, goals and values is complex and subtle. Here I will sketch a theory that aims to come to grips with key aspects of this subtlety -- articulating what it means for a value system to be coherent, and how one can start with incoherent value systems (like humans currently have) and use them as seeds to evolve coherent value systems. I will also argue that as AGI moves beyond human level toward superintelligence, there is reason to believe coherent value systems will become the norm.

Interdependence of Goals and Minds

In modern AI it’s become standard to model intelligent systems as goal-achieving systems, and often more specifically as systems that seek to maximize expected future reward, for some precisely defined reward function.

In a blog post 12 years ago I articulated some limitations to the expected-reward-maximization approach typical in reinforcement learning work; however these limitations do not apply to goal-maximization construed more broadly as “acting so as to maximize some mathematical function of expected future histories” (where this function doesn’t have to be a time-discounted expected reward).

In the intervening years, much broader perspectives on the nature of intelligence such as Open-Ended Intelligence have also become part of the discourse. 

My position currently is that goal-achievement is a major part of what humans do, and will be a major part of what any human-like AGI does.   There are also non-goal-focused self-organization processes that are critical to human intelligence, and this will probably also be true for any human-like, roughly human-level AGI.   There may also be other sorts of general intelligences in which goal pursuit plays a much smaller role.

Nick Bostrom (e.g. in his book Superintelligence) and others have advanced the idea that a mind’s goal system content should be considered as basically independent of other aspects of that mind — and on this basis have written a lot about examples like massively superhumanly intelligent minds with goals like turning all matter in the universe into paperclips.   But looking at how goals co-evolve with the rest of cognitive content and processing in human minds, I have never been convinced of this proposed independence.   One question is to what extent various sorts of minds could in principle be paired with various sorts of goals; another (and more interesting and relevant) question is, given a particular sort of mind, what are the actual odds of this mind evolving into a condition where it pursues a particular sort of goal.

If treating goals as a separate thing from the rest of cognitive processing and cognitive content isn’t going to work in an AGI context -- then supplying an externally-defined goal to an AGI system can only be considered as seeding the process of that AGI constructing its own goals according to its own self and world understanding. Goals will generally co-evolve with the goal-pursuing cognitive processes in the AGI’s mind, and also with the non-goal-oriented self-organizing processes in the AGI’s mind.

Goals and Values -- for Humanity and Beyond

The relation between goals and values is somewhat complex, but to simplify, we can say that often

  • a mind values something to the degree it estimates that thing can contribute to its goals
  • a mind’s goals can be viewed as bringing about a world in which its values are realized

But whether one thinks about goals or values, from an AGI standpoint the question remains: What sorts of goals and values should we encourage for our AGI systems, given that humanity's value systems are clearly deeply flawed and self-contradictory and fractious, yet are what we currently have? We don't want our AGIs to slavishly emulate our current screwed-up values, but we also don't want them to go off in a totally different direction that has no resemblance to anything meaningful to us. So what's the right strategy -- or is it just, teach the AGI well and let it learn and evolve and hope for the best?

Eliezer Yudkowsky has advocated some interesting ideas about how to create appropriate values for a superhuman AGI system, via starting with human values and then iterating (“In CEV [Coherent Extrapolated Volition], an AI would predict what an idealized version of us would want, ‘if we knew more, thought faster, were more the people we wished we were, had grown up farther together’”).

I have explored variations of this such as Coherent Blended Volition, which have some practical advantages relative to the original CEV concept, but which I was never entirely happy with.

Overall, I have long considered it an important and under-appreciated pursuit to understand what kinds of goals are most likely to be found in a highly intelligent and evolved AGI mind — and what kinds of goals we should be focusing on putting into our early-stage AGI systems right now.

Clearly it is a better idea to fill our current AGI systems with goals related to compassion, love, mutual aid and learning and understanding — as opposed to say, world domination or pure selfish personal resource accumulation — but beyond this, are there subtler properties of AGI goal systems we should be thinking about?

Coherent Value Systems

I will argue here that some value systems have “better” intrinsic properties than others in a purely formal sense, setting aside their particular contents.

I will give a simple mathematical characterization of what I call “coherent value systems”, and discuss the qualitative properties of such value systems. Basically, a coherent value system is one that evaluates the value of each localized action or state in a way that’s consistent with its evaluation of the value of all the other actions or states that this localized one contains, is part of, or interacts with. That is, it values each part in a way that is completely consistent with its valuing of sub-parts, greater wholes and co-parts of wholes.

I will argue that coherent value systems are intrinsically more efficient than incoherent ones — suggesting (quite speculatively but with clear logic) that ultimately, in a setting supporting flexible evolution of multiple kinds of minds, those with coherent value systems are likely to dominate.   

While different in the details of formulation and argument, conceptually this is along the lines of an argument long made by Mark Waser and others, that as human-level intelligence gives way to superintelligence, primitive human values are likely to give way to values that are in some sense superbeneficial. Qualitatively, Waser’s “Universalist” value system appears to meet the coherence criteria outlined here.

On the other hand, typical human value systems clearly are not very coherent in this sense. With this in mind, I will explore the question of how, starting with an incoherent value system (like a current human value system), one might create a coherent value system that is seeded by and substantially resembles this initial incoherent value system. This addresses basically the same problem that Yudkowsky’s CEV tries to address, but in what seems to be a clearer and more scientifically/mathematically grounded manner.

Toward a Formal Theory of Value Coherence

The key property I want to explore here is “coherence” of value systems -- meaning that when one has an entity decomposable into parts, then what the value system rates as high value for the parts is consistent with what the value system rates as high value for the whole.

Human value systems, if inferred implicitly from human behavior, often appear to violate this coherence principle.  However it seems feasible to take a value system that is “incoherent” in this sense and (in a very rough sense) normalize it into coherence.   

To see how this may be possible, we have to dig a bit into the math of value system coherence and some of its indirect consequences.

Consider a universe U as a set of atomic entities. Let P denote the power set of U (the set of subsets of U). Then consider “individuals” as subsets of P -- e.g. the person Ben Goertzel, or the country USA, are individuals. (Ideally we should consider individuals as fuzzy subsets of P, but we will set things up so that without loss of generality we can look at the crisp case.) Let V denote the “indiverse” or set of individuals associated with U. The members of the set (of subsets of U) defining the individual A will be referred to as “instances” of A.

One can posit some criteria for what constitutes an admissible individual — e.g. one can posit there needs to be some process of finite complexity that generates all members of the individual.  The particulars of these criteria are not critical to the notions we’re developing here.

Next consider a value function v that maps from P x V into (some subset of) the real numbers.   In this picture a “value system” is the graph of a value function.

We can interpret v(x,A) as the value of subset x in the context of individual A.

Let # denote a disjoint union operator on individuals in V (one could generalize and look at disjoint coproduct in a categorial setting, but I’m not sure we need that for starters…) .


Define a value function v to be *coherent* if for all individuals A, B in V, 

argmax { v(x, A#B) | x in A # B } = (argmax { v(y, A) | y in A} ) #  (argmax { v(z, B) | z in B} )

I.e., what this says is:  The instance of the individual A # B with maximum value according to v, is obtained by taking the instance of the individual A with maximum value and joining it (via #) with the instance of the individual B with maximum value.   
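The coherence condition can be tested on a toy universe. In the Python sketch below, instances are modeled as frozensets of atoms, an individual is a set of instances, and A # B is read as the set of unions of disjoint instance pairs. That reading of #, the atom names, the weights, and the two example value functions are all assumptions made for illustration.

```python
from itertools import product

def join(A, B):
    """A # B: unions y | z of disjoint instances y in A and z in B (assumed reading)."""
    return {y | z for y, z in product(A, B) if not (y & z)}

def argmax_set(v, A):
    """All instances of individual A that attain the maximum of v(x, A)."""
    best = max(v(x, A) for x in A)
    return {x for x in A if v(x, A) == best}

def is_coherent_on(v, A, B):
    """Check the argmax coherence condition for one pair of individuals."""
    return argmax_set(v, join(A, B)) == join(argmax_set(v, A), argmax_set(v, B))

# Hypothetical atoms and weights.
WEIGHT = {"a1": 1, "a2": 3, "b1": 2, "b2": 5}
A = {frozenset({"a1"}), frozenset({"a2"})}
B = {frozenset({"b1"}), frozenset({"b2"})}

def additive(x, context):
    """Value of an instance is the sum of its atoms' weights: parts never interfere."""
    return sum(WEIGHT[atom] for atom in x)

def clashing(x, context):
    """Same as additive, except the two locally best atoms clash when combined."""
    penalty = 10 if {"a2", "b2"} <= x else 0
    return additive(x, context) - penalty

print(is_coherent_on(additive, A, B))   # best whole = best parts joined
print(is_coherent_on(clashing, A, B))   # best whole avoids the clash
```

The second value function is incoherent in exactly the sense of the definition: what is best for A alone and best for B alone is not what is best for A # B.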

One could generalize this a bit by asking e.g. that the instances x of A#B for which v(x) is in the top decile across A#B, are mostly of the form y#z where v(y) is in the top decile across A and v(z) is in the top decile across B.   But this doesn’t seem to change the conceptual picture much, so for the moment we’ll stick with the stricter definition of coherence in terms of argmax.

In the case of fuzzy individuals, the definition of coherence might look more like

argmax { v(x, A#B) * m(x, A#B) | x in V } = (argmax { v(y, A) * m(y,A) | y in V} ) #  (argmax { v(z, B) * m(z,B) | z in V} )

where e.g. m(x,A) denotes the fuzzy membership degree of x in individual A.    However, the story is the same here as in the crisp case because we can simply define 

v1(x, A) = v(x,A) * m(x, A)

and then apply the crisp definition.   I.e. on a formal level, the fuzziness can be baked into the context-dependence of the value function.

Intuitive Meaning of Value System Coherence

For a coherent value system, what is best for a society of humans will necessarily involve each human within the society doing what the value system considers the best thing for them to do. For a coherent value system, doing the best thing over a long period of time involves, over each shorter subinterval of time, doing what the value system considers the best thing to do then.

Consider, for instance, the function that assigns an activity the value v(A) defined as the amount of pleasure that doing A brings directly to a certain human mind. This value system is almost never coherent, for real humans. This means that for almost all humans, short-term “living in the moment” hedonism is not coherent (for the obvious reason that deferring gratification often brings more pleasure altogether, given the way the real human world works).

For an incoherent value system, there will exist “evil” from the view of that value system — i.e. there will exist tradeoffs wherein maximizing value for one entity results in some other entity not maximizing value.

Intuitively, for a value system to be coherent, what’s best for an individual entity E has to be: What’s best overall for the totality of entities influenced by E.

There is some wiggle room in the definition of “overall” here, which becomes clear when one looks at how to normalize an incoherent value system to obtain a coherent one.

Formal Properties of Coherent Value Systems

The definition of “coherence” turns out to enforce some fairly strict requirements on what a coherent value function can be.

This can be seen in an elegant way via a minor adaptation of the arguments in Knuth and Skilling’s classic paper Foundations of Inference.

This section of the post gets a bit nitty-gritty, and the reader who hasn't read (and doesn't want to take time to now read) Knuth and Skilling's paper may want to skip over it.

In essence, one just needs to replace the set-theoretic union in their framework with disjoint union # on individuals defined as follows: If x and y are disjoint then x#y is their disjoint union, and if x and y intersect then x#y is undefined. 

Looking at Section 3 of Foundations of Inference, let us consider 

r(A) = max{ v(x,A)  | x in A}

as the real number corresponding to the individual A.

Symmetry 0:  A < B ==> r(A) < r(B)

is clearly true due to the nature of maximum.

Symmetry 1: A < B ==> {  A # C < B # C,  and C # A < C # B } , in the case all the disjoint unions are well-defined.

This is true due to the nature of union, in the case that all the disjoint unions are well-defined.   

Symmetry 2:  (A # B) # C = A # ( B # C) , where either both sides are well-defined or neither are.

Symmetry 3: (A x T) # (B x T) = (A # B) x T, where either both sides are well-defined or neither are

Symmetry 4: (A x B) x C = A x (B x C)

Finally, consider an ordered chain of individuals e.g. A < B < C < T, and use the notation e.g. [A,T] to signify that A precedes T in this chain.   We can then define a derived chaining operation that acts on adjacent intervals, so that e.g. [A,B] , [B,C] = [A, C].

If we use the notation

a = [A,B]
b  =  [B, C]
c = [C, T]

then we have

Symmetry 5:  (a, b), c = a, (b, c)

which works unproblematically in our setting, as the distinction between ordinary and disjoint union is not relevant.

Looking at the mapping between individuals and values in the context of Knuth and Skilling's mapping between lattice elements and numerical values, how can we interpret

c = a + b corresponds to C = A # B

in the present context?   To calculate c = a + b if A and B are known, one would

  • find z_a in A so that v(z_a, A) = a
  • find z_b in B so that v(z_b, B) = b
  • let z_c = z_a # z_b

Then, via the coherence rule

z_c = argmax { v(x, C) | x in C }

and one can set

c = v(z_c, C)

The treatment of direct product and chain composition in Foundations of Inference carries over directly here, as there is nothing different about direct products and inclusion in our setting versus the setting they consider.

Axioms 1-5 from Foundations of Inference Section 4 appear to follow directly, the only caveat being that the equations are only to be used when the individuals involved are disjoint.

Section 5.1 in Foundations of Inference deals specifically with the case of disjoint arguments, which is the case of core interest here.

The overall conclusion is: If v is a coherent value function, then value-assignments of the form

r(A) = max{ v(x,A)  | x in A}

should behave like monotone scalings of probabilities.

This means, that, for instance, they should obey the formula

r(S1 # … # Sn) = r(S1) + … + r(Sn)

-- or else

r(S1 # … # Sn) = r(S1) +^ … +^ r(Sn)

where

a +^ b  = f( f^{-1}(a) + f^{-1}(b) )

for some monotone function f.
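To make the regraded sum +^ concrete, here is a short Python sketch. Choosing f = exp (so f^{-1} = log) is just one illustrative monotone function, and the name `regraded_sum` is an assumption. With that choice +^ turns out to be ordinary multiplication, while the identity function gives back ordinary addition.

```python
import math

def regraded_sum(a, b, f, f_inv):
    """a +^ b = f(f_inv(a) + f_inv(b)) for a monotone f with inverse f_inv."""
    return f(f_inv(a) + f_inv(b))

def identity(t):
    return t

# f = identity: +^ is plain addition.
plain = regraded_sum(2.0, 3.0, identity, identity)

# f = exp: +^ is multiplication, since exp(log a + log b) = a * b.
product_like = regraded_sum(2.0, 3.0, math.exp, math.log)

# +^ inherits associativity from ordinary addition.
left = regraded_sum(regraded_sum(2.0, 3.0, math.exp, math.log), 4.0, math.exp, math.log)
right = regraded_sum(2.0, regraded_sum(3.0, 4.0, math.exp, math.log), math.exp, math.log)

print(plain, product_like, left, right)
```

So a value system whose r combines multiplicatively over disjoint parts is, in this sense, just an additive one seen through a monotone rescaling.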

Using Incoherent Value Systems to Seed Learning of Coherent Value Systems

Now let's get to the punchline -- given the above notion and characterization of value coherence, how might one create a coherent value system that still retains some of the core qualitative aspects of an incoherent value system such as, say, current human value systems?

Given an incoherent value system v, one can define a related, derived coherent value system v' as follows. 

The basic idea is to define an error function E1(r’) via the sum over all pairs (S1, S2) of

( r’(S1 # S2) - ( r’(S1) + r’(S2) ) )^2

and another error function E2(r’) as the sum over all S of

( w(S) * ( r’(S) - r(S) )  )^2

where w(S) is an a priori weight specifying how much a given individual S is valued — e.g. S could be valued proportional to simplicity or proportional to relatedness to a specific base system, etc. ...

[this could be made more sophisticated, e.g. via accounting for intersection of different S in various ways, but this simple version will be sufficient for making the current conceptual points].

… and then look for Pareto optima of the problem of minimizing E1 and E2.

One can then use an iterative algorithm to find a v’(x,A) leading to r’(S) that live on this Pareto frontier, using the original v(x,A) as an initial condition.

This is somewhat analogous to Eliezer Yudkowsky’s notion of “coherent extrapolated volition”, but much more clearly defined.
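One simple way to trace out points on such a Pareto frontier is to minimize a weighted sum lam * E1 + E2 for several values of lam. The Python sketch below does this by plain gradient descent on a toy universe of three atoms. The universe, the starting values r(S), the uniform weights w(S), the scalarization, and the step size are all assumptions chosen for illustration, not a recommended algorithm.

```python
from itertools import combinations

ATOMS = ("a", "b", "c")
# Toy assumption: every nonempty subset of ATOMS counts as an individual S.
INDIVIDUALS = [frozenset(c) for n in range(1, 4) for c in combinations(ATOMS, n)]
# Disjoint pairs (S1, S2), for which S1 # S2 is defined.
PAIRS = [(s, t) for s, t in combinations(INDIVIDUALS, 2) if not (s & t)]

# Made-up, deliberately non-additive starting values r(S).
r = {frozenset("a"): 1.0, frozenset("b"): 2.0, frozenset("c"): 3.0,
     frozenset("ab"): 5.0, frozenset("ac"): 4.0, frozenset("bc"): 4.0,
     frozenset("abc"): 7.0}
w = {s: 1.0 for s in INDIVIDUALS}  # uniform a priori weights

def e1(rp):
    """Incoherence: squared violations of additivity over disjoint pairs."""
    return sum((rp[s | t] - rp[s] - rp[t]) ** 2 for s, t in PAIRS)

def e2(rp):
    """Weighted squared distance from the original values r."""
    return sum((w[s] * (rp[s] - r[s])) ** 2 for s in INDIVIDUALS)

def minimize(lam, steps=20000):
    """Gradient descent on lam * E1 + E2, seeded with the original r."""
    rp = dict(r)
    step = 1.0 / (2.0 + 36.0 * lam)  # conservative step size for this toy problem
    for _ in range(steps):
        grad = {s: 2 * w[s] ** 2 * (rp[s] - r[s]) for s in INDIVIDUALS}
        for s, t in PAIRS:
            d = 2 * lam * (rp[s | t] - rp[s] - rp[t])
            grad[s | t] += d
            grad[s] -= d
            grad[t] -= d
        rp = {s: rp[s] - step * grad[s] for s in INDIVIDUALS}
    return rp

results = {lam: minimize(lam) for lam in (0.1, 1.0, 10.0)}
for lam, rp in results.items():
    print(lam, round(e1(rp), 4), round(e2(rp), 4))
```

Raising lam buys more coherence (lower E1) at the price of drifting further from the seed values (higher E2), which is the trade-off the Pareto frontier describes.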

The optimal iterative algorithm to use here is not clear and this is likely a quite subtle question at the intersection of machine learning/reasoning and numerical analysis. However, some simple evocative thoughts pointing in the direction of an appropriate heuristic algorithm may be conceptually interesting.

Along these lines, one can think about an iterative algorithm of the following nature.

Given A and A = B#C, let

v1(A | B)

denote the maximum value for A that is achievable via choosing the maximum-value instance of B, and then choosing the maximal-value instance of C that can co-exist with this.

Qualitatively, this means: How much value can we provide for A via maximizing the value of some sub-individual B of A.  For instance, how much value can we provide for me by first maximizing value for my lungs, or how much value can we provide for my family by first maximizing value for me personally?

Given E = A # D, let

v2(A | E) 

denote the maximum value for A that is achievable via choosing the maximum-value instance of E, and then choosing the maximal-value instance of D that can co-exist with this, and let

v3(A | D) 

denote the maximum value for A that is achievable via choosing the maximum-value instance of D, and then choosing the maximal-value instance of E that can co-exist with this.

These measure: How much value can we provide for A via maximizing the value of some individual containing A, or of some individual that is composed with A to form a commonly containing individual?  For instance, how much value can we provide for me via first maximizing the value of my family, or via first maximizing the value of the other people in my family?

If v is coherent, then v1=v2=v3=v.   

In general, one could think about using (v1 + v2 + v3) (A) as an estimator for v’(A) to help guide the iterative optimization algorithm. This sum (v1 + v2 + v3) (A) is, up to a constant factor, an estimate of the value providable for A via maximizing the value of a randomly chosen sub-individual, super-individual or connected-individual of A. (Since v1=v2=v3=v when v is coherent, dividing the sum by 3 puts it on the same scale as v.) This will often be a useful pointer in the direction of a more coherent value system than v, i.e. (v1 + v2 + v3) () is likely to be more coherent than v().

This particular estimator is relatively crude and much more sophisticated, qualitatively similar estimators can surely be created.  But the idea I want to get across is that iterative pursuit of a coherent value system that is close to a given incoherent value system, with search seeded from this incoherent value system, may involve iterative steps through intermediate value systems that are conceptually reminiscent of the thinking behind CEV.  That is, one can think about

  • What kind of people would current humans like to be, if they could more fully realize their own values
  • What would these hypothetical “revised better humans” value, and what kind of humans would THEY like to be
  • etc.

This sort of iterative process, while rough and poorly-defined, is similar to v1 + v2 + v3 as defined above, and could be interesting as an avenue for iterating from current incoherent human values to a coherent value system living on the above-described Pareto frontier.

Varieties of Coherent Value System

Assuming there are multiple coherent value systems on the Pareto frontier, then one could guide the iterative search process toward a coherent value system in various different ways.

For instance, referencing the above simple estimator for simplicity of discussion, in constructing v1, v2 and v3 one could weight certain A, D and E more highly.

If one considers this weighting to be achieved via some cost function c(A) applied to individuals A, then one can think about the way different choices for c may impact the ultimate coherent value system obtained.  (Of course the weight c() could be chosen the same as the weight w() used in the error function itself, and this would probably be the optimal choice in terms of effective guidance of optimization, but it’s not the only choice.)

E.g. a “selfish” v’ could be obtained by using a c that weights those S relating to a specific person very highly, and other S much less.  

A consistently short-term-gratification oriented v’ could be obtained by using a c that weights S restricted to short periods of time very highly, and S restricted to longer periods of time much less.  In many cultures this would rule out, e.g. a value system that values being happily married over the time-scale of years, but over the time-scale of hours values having sex with whomever one finds attractive.   But a purely hedonistic value system that values a long period of time precisely according to the sum of the time-localized pleasures experienced during that period of time, may be perfectly coherent.   Just as there can be a coherent value system that puts value on a time-local experience based substantially on both its immediate characteristics and its contribution to longer-term goals.

A value system that puts extremely high value on freedom of choice for individuals, but also extremely high value on societal order and structure, may be incoherent within the scope of human societies feasible in the context of modern human psychology and culture.   A value system that prioritizes order and structure for society and obedience and submission for individuals is more likely to be coherent, as is one that values both freedom of choice and creative anarchic social chaos.   The professed value systems of most contemporary influential political parties are, in this sense, obviously extremely incoherent.

Are Intelligences with Coherent Value Systems More Efficient?

Arguably an intelligent system that directs its actions according to a coherent value system will, all else equal, be more efficient than one that directs its actions according to an incoherent one. This is because a mind with an incoherent value system will choose actions oriented toward maximizing the value of one subset S1 of the world, and then later choose actions oriented toward maximizing the value of some other subset S2 of the world -- and will find that what it did in the context of S2 acts against what it did in the context of S1, and vice versa. Whereas for a mind with a coherent value system, actions chosen with respect to different subsets of the world will tend to reinforce and support each other, except where inference errors or unexpected properties of the world intervene.

This argument suggests that, in an evolutionary context involving competition between multiple intelligences, there will be a certain advantage to the ones with coherent value systems.   However, this advantage doesn’t have to be decisive, because there may be other advantages enjoyed by entities with incoherent value systems.   For instance, maintaining a coherent value system may sometimes be highly expensive in terms of space, time and energetic resources (it can be quite complex to figure out the implications of one’s actions for all the subsets of the world they impinge upon!).   

My suspicion is that as computational and energetic resources become more ample and easily accessible by the competing cognizers in an evolutionary system, the efficiency advantage of a coherent value system becomes an increasingly significant factor.   This suspicion seems a very natural and important candidate for further formal and qualitative exploration.