
Saturday, July 04, 2020

The Developmental Role of Incoherent Multi-Value Systems in Open-Ended Intelligence


In a recent post I wrote about what it would mean for a value system to be coherent -- i.e. fully self-consistent -- and noted that human value systems tend to be wildly incoherent.   I posited that coherence is an interesting property to think about in terms of designing and fostering the emergence of AGI value systems.

Now it's time for the other shoe to drop -- I want to talk a bit about Open-Ended Intelligence and why incoherence in value systems (and multivalue systems) may be valuable and productive in the context of minds that are undergoing radical developmental changes in the context of an intelligent broader world.

(For more on open-ended intelligence, see the panel at AGI-20 a couple weeks ago, and Weaver's talk at AGI-16)

My earlier post on value system coherence focused on the case where a mind is concerned with maximizing a single value function.   Here I will broaden the scope a bit to minds that have multiple value functions -- which is how we have generally thought about values and goals in OpenCog, and which I think is a less inaccurate mathematical model of human intelligence.   This shift from value systems to multivalue systems opens the door to a bunch of other issues related to the nature of mental development, and the relationship between developing minds and their external environments.

TL;DR of my core point here is -- in an open-ended intelligence that is developing in a world filled with other broader intelligences, incoherence with respect to current value function sets may build toward coherence with respect to future value function sets.

As a philosophical aphorism, this may seem obvious, once you sort through all the technical-ish terminology.  However, building a bridge leading to this philosophical obvious-ness from the math of goal-pursuit as value-function-optimization is somewhat entertaining (to those of us with certain peculiar tastes, anyway) and highlights a few other interesting points along the way.

In the next section of this post I will veer fairly far into the formal logic/math direction, but then in the final two sections will veer back toward practical and philosophical aspects...

So let's go step by step...


1) Conceptual starting-point: Open-ended intelligence is better approximated by the quest for Pareto-optimality across a possibly large set of different objective functions, than by attempting to optimize any one objective function...   (This is not to say that Pareto-optimality questing fully captures the nature of open-ended intelligence or complex self-organization and autopoiesis etc. -- it surely doesn't -- just that it captures some core aspects that single-goal-function-optimization doesn't.)
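
To make the Pareto-optimality framing concrete, here is a minimal toy sketch -- the candidate states and the value functions f1, f2, f3 are made up purely for illustration -- that computes which candidates sit on the Pareto frontier of several objectives at once:

```python
# Toy sketch: find the Pareto-optimal candidates under several value functions.
# The candidate "world states" and value functions are invented for illustration.

def dominates(a, b, fs):
    """True if candidate a is at least as good as b on every value function
    and strictly better on at least one."""
    return all(f(a) >= f(b) for f in fs) and any(f(a) > f(b) for f in fs)

def pareto_frontier(candidates, fs):
    """Candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(d, c, fs) for d in candidates if d is not c)]

# hypothetical value functions over 2-D "world states" (x, y)
f1 = lambda s: s[0]                 # e.g. "growth"
f2 = lambda s: s[1]                 # e.g. "joy"
f3 = lambda s: -abs(s[0] - s[1])    # e.g. "balance"

states = [(0, 3), (1, 1), (2, 2), (3, 0), (2, 3), (3, 1)]
print(pareto_frontier(states, [f1, f2, f3]))   # [(2, 2), (2, 3), (3, 1)]
```

The frontier here contains several mutually non-dominating states, which is the basic structural difference from single-function argmax.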

2) One can formulate a notion of what it means for a set of value functions to be coherent as a group.  Basically, the argmax(F) in the definition of value-system-coherence is just replaced with "being located on the Pareto frontier of F1, F2...,Fn".  The idea is that the Pareto frontier of the values for a composite system should be the composition of the Pareto frontiers of the values for the components of the composite.
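
Continuing in the same toy vein (and repeating the two helper functions so the snippet stands alone), one can test this composition property directly in the easy case where each value function looks at only one component of the composite; the states and functions below are again invented for illustration:

```python
from itertools import product

def dominates(a, b, fs):
    return all(f(a) >= f(b) for f in fs) and any(f(a) > f(b) for f in fs)

def pareto_frontier(candidates, fs):
    return [c for c in candidates
            if not any(dominates(d, c, fs) for d in candidates if d is not c)]

# two components, each with its own (invented) value functions
states1, fs1 = [0, 1, 2, 3], [lambda x: x, lambda x: x % 3]
states2, fs2 = ["p", "q"], [lambda y: 1.0]     # degenerate second component

# composite: states are pairs; each lifted value function reads one component only
composite_states = list(product(states1, states2))
composite_fs = [lambda s, f=f: f(s[0]) for f in fs1] + \
               [lambda s, f=f: f(s[1]) for f in fs2]

lhs = set(pareto_frontier(composite_states, composite_fs))
rhs = set(product(pareto_frontier(states1, fs1), pareto_frontier(states2, fs2)))
print(lhs == rhs)   # True: the composite frontier factors into component frontiers
```

When the value functions couple the components (i.e. there is interaction information between them), this equality can fail -- which is exactly the kind of incoherence discussed in the points below.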

3) One can also think about the "crypticity" or difficulty of discovering a certain value system (a term due to Charles H. Bennett from way back).  Given a certain amount R of resources and a constraint C and a probability p, one can ask what is the most coherent value system one can find with probability >p that satisfies C, using the available resources.  Or if C is fuzzy, one can ask what is the most coherent value system one can find with probability >p that is on the Pareto frontier of coherence and C, given the available resources.
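
As a cartoon of this resource-bounded framing -- the coherence score, the constraint C, the sampling scheme and the quantile reading of "with probability > p" below are all made-up stand-ins, just to show the shape of the question:

```python
import random

def coherence_score(vs):
    """Hypothetical stand-in for coherence: how additive vs is across one part/whole split."""
    return -abs(vs[("a", "b")] - (vs[("a",)] + vs[("b",)]))

def satisfies_constraint(vs):
    """Hypothetical constraint C: the whole must be valued at least 1.0."""
    return vs[("a", "b")] >= 1.0

def random_value_system(rng):
    return {k: rng.uniform(0, 2) for k in [("a",), ("b",), ("a", "b")]}

def budgeted_search(R, rng):
    """Best coherence found among at most R constraint-satisfying samples."""
    best = float("-inf")
    for _ in range(R):
        vs = random_value_system(rng)
        if satisfies_constraint(vs):
            best = max(best, coherence_score(vs))
    return best

rng = random.Random(0)
p = 0.9
trials = sorted(budgeted_search(R=200, rng=rng) for _ in range(100))
# roughly: the coherence level reachable with probability > p under budget R
print(trials[int((1 - p) * len(trials))])
```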

4) So open-ended intelligence involves [among other things] the emergence of coherent multivalued value-systems (multivalue systems) that involve a large number of different value functions, and that are tractably-discoverable (i.e. not too cryptic)

5) Suppose one is given a set of value-functions as initial "constraints", say C1, C2, ..., CK -- and is then looking for the most coherent multivalue system one can find with high odds using limited resources, that is compatible with C1,...,CK.   I.e. one is asking: what is the most coherent tractably-findable value system compatible with the initial values?

Then, suppose one alternatively looks at a subset of the initial values, say C1,...,Ck (with k < K), and looks for the most coherent tractably-findable value system compatible with these.

6) The most coherent tractably-findable value systems according to C1,...,Ck may not be compatible with the most coherent tractably-findable value systems according to C1,...,CK.   Why? The reason is that, in some cases, adding in the extra value functions (k+1,...,K) may make it computationally simpler to find Pareto optima involving the original k value functions (1,...,k).   This could be the case if there is interaction information between the value functions 1,...,k and the value functions k+1,...,K.

7) So we have here a sort of Fundamental Principle of Valuable Value-Incoherence -- i.e. if you have limited resources and you want to build toward multivalued coherence in the context of a bunch of different initial value-functions, the best routes could be through value-systems that are fairly incoherent in the context of various subsets of this bunch of initial value-functions.

8) So if a system is in a situation where new external value functions that will serve as constraints are progressively revealed over time, and these new external value functions have interaction information with one's previous constraint-value-functions, then one may find that one's current incoherence helps build toward one's future coherence.   

9) This seems especially relevant to development in the context of a world filled with intelligences broader than oneself -- in which case one is indeed being confronted with (and developing to internalize) new external value functions that are related to one's prior value functions in complex ways.

10) So in this sort of context (development in a world that keeps feeding new stuff that's informationally interactive w/ the old), it could be that seeking coherence is suboptimal in a similar way to how seeking piece count in the early stages of a chess game, or seeking board coverage in the early stages of an Othello game, is suboptimal....  Instead one often wants to seek mobility and maximization of options, in the early to mid stages of such games ... and the same may be the case w/ value systems in this sort of situation...

11) A major question then becomes: When do actual tradeoffs between multivalue system coherence and open-mindedness (aka agility/mobility) arise, and how big are they?   What is the sense in which an incoherent system can have more information than a coherent one?

12) It is possible that the theory of paraconsistent logic might yield some insight here.    If you assume value system coherence as an axiom, then for a mind to have an incoherent value system will make it an overall inconsistent system (what sort of paraconsistency it will have depends on various details) -- whereas for a mind to have a coherent value system will land it in the realm of Godelian restrictions (i.e. via Godel's Second Incompleteness Theorem and its variants...)

13)  If you look at the set of theorems provable by a consistent logic, there's a limit due to Godel.  If you look at the set of theorems provable in a paraconsistent logic (e.g. a dialetheist logic, aka a logic in which there are true statements whose negations are also true) it can be "larger" in a sense, e.g. a dialetheic logic can prove its own Godel sentence as well as its own soundness.  This doesn't show that a paraconsistent logic can be more informative than a consistent one, but it opens the door for this to maybe be true...   It seems we are now pushing in directions where modern math-logic isn't yet fully fleshed out.  
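
For readers who want to see a dialetheic logic in action, here is a small sketch of Priest's three-valued Logic of Paradox (LP) -- standard material, nothing specific to value systems -- illustrating that designating the value "both" blocks explosion: from p and not-p one cannot derive an arbitrary q.

```python
# Priest's Logic of Paradox (LP): truth values T (true only), B (both), F (false only).
# Designated values are T and B; an inference is valid iff every valuation that
# designates all premises also designates the conclusion.

from itertools import product

T, B, F = 2, 1, 0                    # ordered so min/max give conjunction/disjunction

def neg(a):     return 2 - a         # negation maps B to B
def conj(a, b): return min(a, b)     # standard strong-Kleene table (shown for completeness)
def disj(a, b): return max(a, b)

def designated(a):
    return a >= B

def valid(premises, conclusion, num_vars):
    """Brute-force check of LP-validity over all valuations of num_vars variables."""
    for vals in product((T, B, F), repeat=num_vars):
        if all(designated(prem(vals)) for prem in premises) and not designated(conclusion(vals)):
            return False
    return True

p      = lambda v: v[0]
not_p  = lambda v: neg(v[0])
q      = lambda v: v[1]
p_or_q = lambda v: disj(v[0], v[1])

print(valid([p, not_p], q, 2))       # False: a contradiction does not explode in LP
print(valid([p_or_q, not_p], q, 2))  # False: disjunctive syllogism also fails
print(valid([p], p_or_q, 2))         # True: ordinary inferences like addition still hold
```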

14) The notion of an "experimental logic" also seems relevant here -- basically a dynamic process in which new axioms are added to one's logic over time.   This is one analogue in logic-system-land of "development" in psychology-land...   Of course if one assumes there is a finite program whose behavior corresponds to some fixed logic generating the new axioms, then one can't escape Godel this way.  But if one assumes the new axioms are emanating in part from some imperfectly understood external source (which could be a hypercomputer for all one knows... or at least could be massively more intelligent/complex than stuff one can understand), then one has a funky situation.

15) Also it seems one could capture a sort of experimental logic as a relevance-logic layer on top of dialetheic logic.  I.e. assume a dialetheic logic that can generate everything, and then put a relevance/importance distribution on axioms, and then the development process is one of gradually extending importance to more and more axioms....  This sort of open-ended logic potentially is in some useful senses fundamentally informationally richer than consistent logic... and in the domain of reasoning about values, incoherent value systems could open the door to this sort of breadth...

(Possibly relevantly -- While researching the above, I encountered the paper "Expanding the Logic of Paradox with a Difference-Making Relevant Implication" by Peter Verdée, which made me wonder whether relevance logic is somehow morphic to the theory of algorithmic causal dags....   I.e. in a relevance logic one basically only accepts the conclusion to follow from the premises, if there is some compressibility of the conclusion based on the premise list alone, without including the other axioms of the logic  ... )

Back to basics

OK well that got pretty deep and convoluted...

So let's go back to the basic conclusion/concept I gave at the beginning -- in an open-ended intelligence that is developing in a world filled with other broader intelligences, incoherence with respect to current value function sets may build toward coherence with respect to future value function sets.

In the current commercial/academic AI mainstream, the default way of thinking about AI motivation is in terms of the maximization of expected reward.   Hutter's beautiful and important theory of Universal AI takes this as a premise for many of its core theorems, for example.


In my practical proto-AGI work with OpenCog, I have preferred to use motivational systems with multiple goals and not average these into a single meta-goal.

On the other hand, I have also been intrigued by the notion of open-ended intelligence, and in general by the conceptualization and modeling of intelligences as SCADS, Self-organizing Complex Adaptive Dynamical Systems, in which goals arise and are pursued and then discarded as part of the broader self-organizing dynamics of system and environment.

What I'm suggesting here is that approximations of the SCADS perspective on open-ended intelligences may be constructed by looking at systems with large numbers of goals (aka. multivalue systems) that are engaged in developmental processes wherein new values are ongoingly added in an informationally rich interaction with  an intelligent external environment.

The ideas sketched here may form a partial bridge between the open-ended intelligence perspective -- which captures the fundamental depth of intelligence and mind -- and the function-optimization perspective, which has a lot of practical value in terms of current real-world system engineering and experimentation.

This line of thinking also exposes some areas in which modern math, logic and computing are not yet adequately developed.   There are relations between paraconsistent logic, gradual typing systems as are likely valuable in integrative multi-paradigm AGI systems,  the fundamental nature of value in developing intelligences, and the nature of creativity and radical novelty -- which we are barely at the edge of being able to formalize ... which is both fascinating and frustrating, in that there clearly are multiple PhD theses and research papers between here and a decent mathematical/conceptual understanding of these matters... (or alternately, a few seconds of casual thought by a decent posthuman AGI mind...)

Philosophical Post-lude

If one digs a bit deeper in a conceptual sense, beyond the math and the AI context, what we're talking about here in a way is a bridge between utilitarian-type thinking  (which has been highly valuable in economics and evolutionary biology and other areas, yet also clearly has fundamental limits) and more postmodernist type thinking (which views minds as complex self-organizing systems ongoingly reconstructing themselves and their realities in a polyphonic interactive inter-constructive process with other minds).   

Conventional RL based ML is utilitarianism projected into the algorithmic and mechanical domain, whereas Open-Ended Intelligence is postmodernism and a bit of Eastern philosophy projected into the realm of modern science.  

Expanding and generalizing the former so that it starts to approximate significant aspects of the latter, is interesting both for various practical engineering and science reasons, and as part of the general project of stretching the contemporary technosphere to a point where it can make rich contact with broader "non-reductionist" aspects of the universe it has hitherto mainly ignored.

Om!








Friday, June 26, 2020

Approximate Goal Preservation Under Recursive Self-Improvement


There is not much controversial about the idea that an AGI should have, among its goals, the goal of radically improving itself.

A bit dodgier is the notion that an AGI should have, among its goals, the goal of updating and improving its goals based on its increasing knowledge and understanding and intelligence.

Of course, this sort of ongoing goal-refinement and even outright goal-revolutionizing is a key part of human personal development.   But where AGIs are involved, there is concern that if an AI starts out with goals that are human-friendly and then revises and improves its goals, it may come up with new goals that are less and less copacetic to humans.

In principle if one’s goal is to create for oneself a new goal that is, however, compatible with the spirit of one’s old goal — then one shouldn’t run into major problems.  The new goal will be compatible with the spirit of the old goal, and part of the spirit of the old goal is that any new goals emerging should be compatible with the spirit of the old goal — so the new goal should contain also the proviso that any new new goals it spawns will also be compatible with its spirit and thus the spirit of the old goal.   Etc. etc. ad infinitum.

But this does seem like a “What could possibly go wrong??” situation — in which small errors could accumulate as each goal replaces itself with its improved version, the improved version of the improved version etc. … and these small errors compound to yield something totally different from the starting point.

My goal here is to present a novel way of exploring the problem mathematically — and an amusing and interesting, if not entirely reassuring tentative conclusion, which is:

  • For an extremely powerful AGI mind that is the result of repeated intelligent, goal-driven recursive self-modifications, it may actually be the case that recursive self-modification leaves goals approximately invariant in spirit
  • For AGIs with closely human-like goal systems — which are likely to be the start of a sequence of repeated intelligent, goal-driven recursive self-modifications — there is no known reason (so far) to believe recursive self-modification won’t cause radical “goal drift”
(This post updates some of the ideas I wrote down on the same topic in 2008; here I am "partially unhacking" some things that were a little too hacky in that more elaborate write-up.)

Quasi-Formalizing Goal-Driven Recursive Self-Improvement



Consider the somewhat vacuous goal:

My goal is to improve my goal (in a way that is consistent with the spirit of the original goal) and to fulfill the improved version

or better yet the less vacuous

My goal is to achieve A and also to improve my goal (in a way that is consistent with the spirit of the original goal) and to fulfill the improved version

where say

A = “militate toward a world where all sentient being experience copious growth, joy and choice”

or whatever formulation of “highly beneficial” you prefer.

We might formulate this quasi-mathematically as

Fulfill G = {achieve A;  and create G1 so that G1 > G and G==>G1 ; and fulfill G1}

Here by G==>G1 I mean that G1 fulfills the spirit of G (and interpretation of “spirit” here is part of the formulation of G), and by G1 > G I mean that G1  can be produced by combining G with some other entity H that has nonzero complexity (so that G1 = G + H)

A more fleshed out version of this might be, verbally,

My goal is to 1) choose actions highly compatible with all sentient beings experiencing a lot of growth, joy and choice; 2) increase my intelligence and knowledge; 3) improve the details of this goal appropriately based on my increased knowledge and intelligence, in a manner compatible with the spirit of the current version of the goal; 4) fulfill the improved version of the goal

This sort of goal obviously can lead to a series such as

G, G1, G2, G3, …

One question that emerges here is: Under what conditions might this series converge, so that once one gets far enough along in the series,  the adjacent goals in the series are almost the same as each other?

To explore this, we can look at the “limit case”

Fulfill Ginf = {achieve A;  and create Ginf so that Ginf > Ginf and Ginf ==> Ginf ; and fulfill Ginf}

The troublesome part here is Ginf>Ginf which looks not to make sense — but actually makes perfect sense so long as Ginf is an infinite construct, just as

(1, 1, 1, …) = append( 1, (1,1,…))

Inasmuch as we are interested in finite systems, the question is then: Is there a sense in which we can look at the series of finite Gn as converging to this infinite limit?

Self-referential entities like Ginf are perfectly consistently modelable within ZFC set theory modified to use the Anti-Foundation Axiom.   This set theory corresponds to classical logic enhanced with a certain sort of inductive logical definition.

One can also put a geometry on sets under the AFA, in various different ways.   It's not clear what geometry makes most sense in this context, so I'll just describe one approach that seems relatively straightforward.

Each hyperset (each set under AFA) is associated with a directed pointed graph called its apg.   Given a digraph and functions r and p for assigning contraction ratios and probabilities to the edges, one gets a DGIFS (Directed Graph Iterated Function System), whose attractor is a subset of finite-dimensional real space.   Let us call a function that assigns (r,p) pairs to a digraph a DLF or Digraph Labeling Function.   A digraph then corresponds to a function that maps DLFs into spatial regions.   Given two digraphs D1 and D2, and a DLF F, let F1e and F2e denote the spatial regions produced by applying F to D1 and D2, discretized to ceil(1/e) bits of precision.   One can then look at the average over all DLFs F (assuming some reasonable distribution on DLFs) of the least upper bound of the normalized information distance NID(F1e, F2e) over all e>0.   This gives a distance measure between two hypersets, in terms of the distance between their corresponding apgs.   It has the downside of requiring a "reference computer" used to measure information distance (and the same reference computer can then be used to define a Solomonoff distribution over DLFs).   But intuitively it should result in a series of ordinary sets that appear to logically converge to a certain hyperset, actually metrically converging to that hyperset.

Measuring the distance between two non-well-founded sets by applying this distance measure to the apgs associated with the sets yields a metric in which it seems plausible that the series of Gn converges to Ginf.
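
To make this slightly more tangible, here is a rough illustrative sketch of comparing two digraphs this way -- with drastic simplifications I am introducing for illustration only: a single shared family of contraction maps, a crude chaos-game approximation of the DGIFS attractors, and normalized compression distance (via zlib) standing in for the uncomputable normalized information distance:

```python
import random
import zlib

def chaos_game(graph, maps, n_points=3000, seed=0):
    """Approximate a DGIFS attractor by a random walk ("chaos game").
    graph: dict node -> list of (target_node, probability, map_index)
    maps:  list of contractive affine maps (x, y) -> (x', y')
    """
    rng = random.Random(seed)
    node = next(iter(graph))
    x, y = 0.5, 0.5
    pts = []
    for _ in range(n_points):
        r, acc = rng.random(), 0.0
        for target, prob, mi in graph[node]:
            acc += prob
            if r <= acc:
                x, y = maps[mi](x, y)
                node = target
                break
        pts.append((x, y))
    return pts

def discretize(pts, bits=8):
    """Quantize to `bits` bits per coordinate and serialize, sorted for stability."""
    q = (1 << bits) - 1
    cells = sorted({(int(x * q), int(y * q)) for x, y in pts})
    return b"".join(cx.to_bytes(2, "big") + cy.to_bytes(2, "big") for cx, cy in cells)

def ncd(a, b):
    """Normalized compression distance: a computable proxy for NID."""
    ca, cb, cab = len(zlib.compress(a)), len(zlib.compress(b)), len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)

# three simple contraction maps (a Sierpinski-style IFS), shared by both graphs
maps = [lambda x, y: (0.5 * x, 0.5 * y),
        lambda x, y: (0.5 * x + 0.5, 0.5 * y),
        lambda x, y: (0.5 * x + 0.25, 0.5 * y + 0.5)]

# two toy apgs: a one-node self-loop graph vs. a two-node cycle
g1 = {0: [(0, 0.25, 0), (0, 0.25, 1), (0, 0.5, 2)]}
g2 = {0: [(1, 0.5, 0), (1, 0.5, 1)], 1: [(0, 1.0, 2)]}

a = discretize(chaos_game(g1, maps))
b = discretize(chaos_game(g2, maps))
print(ncd(a, b))   # a rough "distance" between the two digraphs' attractors
```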

“Practical” Conclusions


Supposing the above sketch works out when explored in more detail -- what would that mean?   

It would mean that approximate goal-preservation under recursive self-improvement is feasible — for goals that are fairly far along the path of iterated recursive self-improvement.

So it doesn’t reassure us that iterated self-improvement starting from human goals is going to end up with something ultimately resembling human goals in a way we would recognize or care about.

It only reassures us that, if we launch an AGI starting with human values and recursive self-improvement, eventually one of the AGIs in this series will face a situation where it has confidence that ongoing recursive self-improvement isn’t going to result in anything it finds radically divergent from itself (according to the hyperset metric outlined above).

The image at the top of this post is quite relevant here -- a series of iterates converging to the fractal Koch Snowflake curve.   The first few iterates in the series are fairly different from each other.  By the time you get to the 100th iterate in the series, the successive iterates are quite close to each other according to standard metrics for subsets of the plane.   This is not just metaphorically relevant, because the metric on hyperset space outlined above works by mapping each hyperset into a probability distribution over fractals (where each fractal is something like the Koch Snowflake curve but more complex and intricate).

It may be there are different and better ways to think about approximate goal preservation under iterative self-modification.  The highly tentative and provisional conclusions outlined here are what ensue from conceptualizing and modeling the issue in terms of self-referential forms and iterative convergence thereto.


Thursday, June 25, 2020

Foundations of Coherent Value

The relation between minds, goals and values is complex and subtle.   Here I will sketch a theory that aims to come to grips with key aspects of this subtlety -- articulating what it means for a value system to be coherent, and how one can start with incoherent value systems (like humans currently have) and use them as seeds to evolve coherent value systems.   I will also argue that as AGI moves beyond human level toward superintelligence, there is reason to believe coherent value systems will become the norm.

Interdependence of Goals and Minds


In modern AI it’s become standard to model intelligent systems as goal-achieving systems, and often more specifically as systems that seek to maximize expected future reward, for some precisely defined reward function.

In a blog post 12 years ago I articulated some limitations to the expected-reward-maximization approach typical in reinforcement learning work; however these limitations do not apply to goal-maximization construed more broadly as “acting so as to maximize some mathematical function of expected future histories” (where this function doesn’t have to be a time-discounted expected reward).

In the intervening years, much broader perspectives on the nature of intelligence such as Open-Ended Intelligence have also become part of the discourse. 

My position currently is that goal-achievement is a major part of what humans do, and will be a major part of what any human-like AGI does.   There are also non-goal-focused self-organization processes that are critical to human intelligence, and this will probably also be true for any human-like, roughly human-level AGI.   There may also be other sorts of general intelligences in which goal pursuit plays a much smaller role.

Nick Bostrom (e.g. in his book Superintelligence) and others have advanced the idea that a mind’s goal system content should be considered as basically independent of other aspects of that mind — and on this basis have written a lot about examples like massively superhumanly intelligent minds with goals like turning all matter in the universe into paperclips.   But looking at how goals co-evolve with the rest of cognitive content and processing in human minds, I have never been convinced of this proposed independence.   One question is to what extent various sorts of minds could in principle be paired with various sorts of goals; another (and more interesting and relevant) question is, given a particular sort of mind, what are the actual odds of this mind evolving into a condition where it pursues a particular sort of goal.

If treating goals as a separate thing from the rest of cognitive processing and cognitive content isn’t going to work in an AGI context -- then supplying an externally-defined goal to an AGI system can only be considered as seeding the process of that AGI constructing its own goals according to its own self and world understanding.   Goals will generally co-evolve with the goal-pursuing cognitive processes in the AGI’s mind, and also with the non-goal-oriented self-organizing processes in the AGI’s mind.

Goals and Values -- for Humanity and Beyond


The relation between goals and values is somewhat complex, but to simplify, we can say that often

  • a mind values something to the degree it estimates that thing can contribute to its goals
  • a mind’s goals can be viewed as aiming at a world in which its values are realized

But whether one thinks about goals or values, from an AGI standpoint the question remains: What sorts of goals and values should we encourage for our AGI systems, given that humanity's value systems are clearly deeply flawed and self-contradictory and fractious, yet are what we currently have?  We don't want our AGIs to slavishly emulate our current screwed-up values, but we also don't want them to go off in a totally different direction that has no resemblance to anything meaningful to us.  So what's the right strategy -- or is it just: teach the AGI well and let it learn and evolve and hope for the best?

Eliezer Yudkowsky has advocated some interesting ideas about how to create appropriate values for a superhuman AGI system, via starting with human values and then iterating (“In CEV [Coherent Extrapolated Volition], an AI would predict what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together”). 

I have explored variations of this such as Coherent Blended Volition, which have some practical advantages relative to the original CEV concept, but which I was never entirely happy with.

Overall, I have long considered it an important and under-appreciated pursuit to understand what kinds of goals are most likely to be found in a highly intelligent and evolved AGI mind — and what kinds of goals we should be focusing on putting into our early-stage AGI systems right now.

Clearly it is a better idea to fill our current AGI systems with goals related to compassion, love, mutual aid and learning and understanding — as opposed to say, world domination or pure selfish personal resource accumulation — but beyond this, are there subtler properties of AGI goal systems we should be thinking about?

Coherent Value Systems


I will argue here that some value systems have “better” intrinsic properties than others in a purely formal sense, setting aside their particular contents.

I will give a simple mathematical characterization of what I call “coherent value systems”, and discuss the qualitative properties of such value systems — basically, a coherent value system is one that evaluates the value of each localized action or state in a way that’s consistent with its evaluation of the value of all the other actions or states that this localized one contains, is part of, or interacts with.   Valuing each part in a way that is completely consistent with its valuing of sub-parts, greater wholes and co-parts of wholes.

I will argue that coherent value systems are intrinsically more efficient than incoherent ones — suggesting (quite speculatively but with clear logic) that ultimately, in a setting supporting flexible evolution of multiple kinds of minds, those with coherent value systems are likely to dominate.   

While different in the details of formulation and argument, conceptually this is along the lines of an argument long made by Mark Waser and others, that as human-level intelligence gives way to superintelligence, primitive human values are likely to give way to values that are in some sense superbeneficial.  Qualitatively, Waser’s “Universalist” value system appears to meet the coherence criteria outlined here.

On the other hand, typical human value systems clearly are not very coherent in this sense.  With this in mind, I will explore the question of how, starting with an incoherent value system (like a current human value system), one might create a coherent value system that is seeded by and substantially resembles this initial incoherent value system.   This addresses basically the same problem that Yudkowsky’s CEV tries to address, but in what seems to be a clearer and more scientifically/mathematically grounded manner.

Toward a Formal Theory of Value Coherence


The key property I want to explore here is “coherence” of value systems — meaning that when one has an entity decomposable into parts, then what the value system rates as high value for the parts is consistent with what the value system rates as high value for the whole.

Human value systems, if inferred implicitly from human behavior, often appear to violate this coherence principle.  However it seems feasible to take a value system that is “incoherent” in this sense and (in a very rough sense) normalize it into coherence.   

To see how this may be possible, we have to dig a bit into the math of value system coherence and some of its indirect consequences.

Consider a universe U as a set of atomic entities.   Let P denote the power set of U (the set of subsets of U).   Then consider “individuals” as subsets of P — e.g. the person Ben Goertzel, or the country USA are individuals.   (Ideally we should consider individuals as fuzzy subsets of P, but we will set things up so that without loss of generality we can look at the crisp case.)   Let V denote the “indiverse” or set of individuals associated with U.   The members of the set (of subsets of U) defining the individual A will be referred to as “instances” of A.

One can posit some criteria for what constitutes an admissible individual — e.g. one can posit there needs to be some process of finite complexity that generates all members of the individual.  The particulars of these criteria are not critical to the notions we’re developing here.

Next consider a value function v that maps from P x V into (some subset of) the real numbers.   In this picture a “value system” is the graph of a value function.

We can interpret v(x,A) as the value of subset x in the context of individual A.

Let # denote a disjoint union operator on individuals in V (one could generalize and look at disjoint coproduct in a categorial setting, but I’m not sure we need that for starters…) .

Then: 

Define a value function v to be *coherent* if for all individuals A, B in V, 

argmax { v(x, A#B) | x in A # B } = (argmax { v(y, A) | y in A} )  #  (argmax { v(z, B) | z in B} )

I.e., what this says is:  The instance of the individual A # B with maximum value according to v, is obtained by taking the instance of the individual A with maximum value and joining it (via #) with the instance of the individual B with maximum value.   

One could generalize this a bit by asking e.g. that the instances x of A#B for which v(x) is in the top decile across A#B, are mostly of the form y#z where v(y) is in the top decile across A and v(z) is in the top decile across B.   But this doesn’t seem to change the conceptual picture much, so for the moment we’ll stick with the stricter definition of coherence in terms of argmax.
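
As a sanity check on the argmax definition above, here is a small toy computation -- the individuals, atoms and value functions are all invented for illustration: an additive, context-independent value function passes the coherence condition, while one with a part-part interaction term fails it.

```python
from itertools import product

def djoin(x, y):
    """Disjoint union of two instances; undefined (None) if they intersect."""
    return x | y if x.isdisjoint(y) else None

def instances_of(A, B):
    """Instances of the composite individual A # B."""
    return [djoin(a, b) for a, b in product(A, B) if djoin(a, b) is not None]

def argmax_instance(instances, v, ctx):
    return max(instances, key=lambda x: v(x, ctx))

def coherent_on(A, B, v):
    """Check the argmax coherence condition for the pair (A, B)."""
    best_AB = argmax_instance(instances_of(A, B), v, "A#B")
    best_A = argmax_instance(list(A), v, "A")
    best_B = argmax_instance(list(B), v, "B")
    return best_AB == djoin(best_A, best_B)

# toy individuals: sets of instances, each instance a frozenset of atoms
A = [frozenset("a"), frozenset("ab")]
B = [frozenset("c"), frozenset("cd")]

weights = {"a": 1, "b": 2, "c": 3, "d": -1}

def v_additive(x, ctx):        # context-independent and additive: coherent
    return sum(weights[atom] for atom in x)

def v_interacting(x, ctx):     # penalizes mixing b and c: incoherent
    return v_additive(x, ctx) - (5 if {"b", "c"} <= x else 0)

print(coherent_on(A, B, v_additive))      # True
print(coherent_on(A, B, v_interacting))   # False
```

In the additive case the best instance of A # B really is the join of the best instances of A and B; the interaction term is what breaks this.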

In the case of fuzzy individuals, the definition of coherence might look more like

argmax { v(x, A#B) * m(x, A#B) | x in V } = (argmax { v(y, A) * m(y,A) | y in V} ) #  (argmax { v(z, B) * m(z,B) | z in V} )

where e.g. m(x,A) denotes the fuzzy membership degree of x in individual A.    However, the story is the same here as in the crisp case because we can simply define 

v1(x, A) = v(x,A) * m(x, A)

and then apply the crisp definition.   I.e. on a formal level, the fuzziness can be baked into the context-dependence of the value function.


Intuitive Meaning of Value System Coherence


For a coherent value system, what is best for a society of humans will necessarily involve each human within the society doing what the value system considers the best thing for them to do.   For a coherent value system, doing the best thing over a long period of time involves, over each shorter subinterval of time, doing what the value system considers the best thing to do then.

Consider, for instance, the function that assigns an activity the value v(A) defined as the amount of pleasure that doing A brings directly to a certain human mind.   This value system is almost never coherent, for real humans.  This means that for almost all humans, short-term “living in the moment” hedonism is not coherent (for the obvious reason that deferring gratification often brings more pleasure altogether, given the way the real human world works).

For an incoherent value system, there will exist “evil” from the view of that value system — i.e. there will exist tradeoffs wherein maximizing value for one entity results in some other entity not maximizing value.

Intuitively, for a value system to be coherent, what’s best for an individual entity E has to be: What’s best overall for the totality of entities influenced by E.

There is some wiggle room in the definition of “overall” here, which becomes clear when one looks at how to normalize an incoherent value system to obtain a coherent one.

Formal Properties of Coherent Value Systems


The definition of “coherence” turns out to enforce some fairly strict requirements on what a coherent value function can be.

This can be seen in an elegant way via a minor adaptation of the arguments in Knuth and Skilling’s classic paper Foundations of Inference  

This section of the post gets a bit nitty-gritty and the reader who hasn't read (and doesn't want to take time to now read) Knuth and Skilling's paper may want to skip it over.

In essence, one just needs to replace the set-theoretic union in their framework with disjoint union # on individuals defined as follows: If x and y are disjoint then x#y is their disjoint union, and if x and y intersect then x#y is undefined. 

Looking at Section 3 of Foundations of Inference, let us consider 

r(A) = max{ v(x,A)  | x in A}

as the real number corresponding to the individual A.

Symmetry 0:  A < B ==> r(A) < r(B)

is clearly true due to the nature of maximum.

Symmetry 1: A < B ==> {  A # C < B # C,  and C # A < C # B } , in the case all the disjoint unions are well-defined.

This is true due to the nature of union, in the case that all the disjoint unions are well-defined.   

Symmetry 2:  (A # B) # C = A # ( B # C) , where either both sides are well-defined or neither are.

Symmetry 3: (A x T) # (B x T) = (A # B) x T, where either both sides are well-defined or neither are

Symmetry 4: (A x B) x C = A x (B x C)

Finally, consider an ordered chain of individuals e.g. A < B < C < T, and use the notation e.g. [A,T] to signify that A precedes T in this chain.   We can then define a derived chaining operation that acts on adjacent intervals, so that e.g. [A,B] , [B,C] = [A, C].

If we use the notation

a = [A,B]
b  =  [B, C]
c = [C, T]

then we have

Symmetry 5:  (a, b), c = a, (b, c)

which works unproblematically in our setting, as the distinction between ordinary and disjoint union is not relevant.

Looking at the mapping between individuals and values in the context of Knuth and Skilling's mapping between lattice elements and numerical values, how can we interpret

c = a + b corresponds to C = A # B

in the present context?   To calculate c = a + b if A and B are known, one would

  • find z_a in A so that v(z_a, A) = a
  • find z_b in B so that v(z_b, B) = b
  • let z_c = z_a # z_b


Then, via the coherence rule

z_c = argmax { v(x, C) | x in C }

and one can set

c = v(z_c, C)

The treatment of direct product and chain composition in Foundations of Inference carries over directly here, as there is nothing different about direct products and inclusion in our setting versus the setting they consider.

Axioms 1-5 from Foundations of Inference Section 4 appear to follow directly, the only caveat being that the equations are only to be used when the individuals involved are disjoint.

Section 5.1 in Foundations of Inference deals specifically with the case of disjoint arguments, which is the case of core interest here.

The overall conclusion is: If v is a coherent value function, then value-assignments of the form

r(A) = max{ v(x,A)  | x in A}

should behave like monotone scalings of probabilities.

This means that, for instance, they should obey the formula

r(S1 # … # Sn) = r(S1) + … + r(Sn)

-- or else

r(S1 # … # Sn) = r(S1) +^ … +^ r(Sn)

where

a +^ b  = f( f^{-1}(a) + f^{-1}(b) )

for some monotone function f.
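
For instance, taking f = log (so f^{-1} = exp) gives the "log-sum-exp" style combination a +^ b = log(exp(a) + exp(b)); here is a quick numeric check of this (purely illustrative) choice:

```python
import math

def generalized_add(a, b, f, f_inv):
    """a +^ b = f( f^{-1}(a) + f^{-1}(b) ) for a monotone f."""
    return f(f_inv(a) + f_inv(b))

# example choice: f = log, f^{-1} = exp
add_hat = lambda a, b: generalized_add(a, b, math.log, math.exp)

print(add_hat(1.0, 2.0))                   # log(e^1 + e^2) ≈ 2.313
print(add_hat(add_hat(1.0, 2.0), 3.0))     # associativity check...
print(add_hat(1.0, add_hat(2.0, 3.0)))     # ...these two agree up to float error
```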

Using Incoherent Value Systems to Seed Learning of Coherent Value Systems


Now let's get to the punchline -- given the above notion and characterization of value coherence, how might one create a coherent value system that still retains some of the core qualitative aspects of an incoherent value system such as, say, current human value systems?

Given an incoherent value system v, one can define a related, derived coherent value system v' as follows. 

The basic idea is to define an error function E1(r’) via the sum over all disjoint pairs (S1, S2) of

( r’(S1 # S2) - ( r’(S1) + r’(S2) ) )^2

and another error function E2(r’) as the sum over all S of

( w(S) * ( r’(S) - r(S) )  )^2

where w(S) is an a priori weight specifying how much a given individual S is valued — e.g. S could be valued proportional to simplicity or proportional to relatedness to a specific base system, etc. ...

[this could be made more sophisticated, e.g. via accounting for intersection of different S in various ways, but this simple version will be sufficient for making the current conceptual points].

… and then look for Pareto optima of the problem of minimizing E1 and E2.

One can then use an iterative algorithm to find a v’(x,A), leading to an r’(S) that lives on this Pareto frontier, using the original v(x,A) as an initial condition.
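
As a concrete but deliberately crude illustration of this setup -- the three-atom universe, the random initial r, the weights and the scalarized finite-difference optimizer below are all my own stand-ins, not a proposed algorithm -- one can trace an approximate Pareto frontier by sweeping the mixing weight between E1 and E2:

```python
import itertools
import random

atoms = ["a", "b", "c"]
individuals = [frozenset(s) for n in range(1, len(atoms) + 1)
               for s in itertools.combinations(atoms, n)]

random.seed(0)
r = {S: random.random() for S in individuals}   # original, incoherent value assignment
w = {S: 1.0 for S in individuals}               # a priori weights w(S)

def E1(rp):
    """Coherence error: penalize r'(S1 # S2) differing from r'(S1) + r'(S2)."""
    err = 0.0
    for S1, S2 in itertools.combinations(individuals, 2):
        if S1.isdisjoint(S2):
            err += (rp[S1 | S2] - (rp[S1] + rp[S2])) ** 2
    return err

def E2(rp):
    """Fidelity error: penalize r' drifting from the original r, weighted by w."""
    return sum((w[S] * (rp[S] - r[S])) ** 2 for S in individuals)

def scalarized_minimize(lam, steps=4000, lr=0.02):
    """Crude stochastic finite-difference descent on lam*E1 + (1-lam)*E2."""
    rp = dict(r)
    for _ in range(steps):
        S = random.choice(individuals)
        eps = 1e-4
        base = lam * E1(rp) + (1 - lam) * E2(rp)
        rp[S] += eps
        grad = (lam * E1(rp) + (1 - lam) * E2(rp) - base) / eps
        rp[S] -= eps + lr * grad      # undo the probe step, then take a gradient step
    return rp

# sweep lam to trace out (approximately) the E1/E2 Pareto frontier
for lam in (0.1, 0.5, 0.9):
    rp = scalarized_minimize(lam)
    print(f"lambda={lam}:  E1={E1(rp):.4f}  E2={E2(rp):.4f}")
```

As lam increases one expects E1 (incoherence) to drop and E2 (distance from the original values) to grow, which is the trade-off the Pareto frontier formalizes.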

This is somewhat analogous to Eliezer Yudkowsky’s notion of “coherent extrapolated volition”, but much more clearly defined.

The optimal iterative algorithm to use here is not clear, and this is likely a quite subtle question at the intersection of machine learning/reasoning and numerical analysis.   However, some simple evocative thoughts pointing in the direction of an appropriate heuristic algorithm may be conceptually interesting.

Along these lines, one can think about an iterative algorithm of the following nature.

Given A and A = B#C, let

v1(A | B)

denote the maximum value for A that is achievable via choosing the maximum-value instance of B, and then choosing the maximal-value instance of C that can co-exist with this.

Qualitatively, this means: How much value can we provide for A via maximizing the value of some sub-individual B of A.  For instance, how much value can we provide for me by first maximizing value for my lungs, or how much value can we provide for my family by first maximizing value for me personally?

Given E = A # D, let

v2(A | E) 

denote the maximum value for A that is achievable via choosing the maximum-value instance of E, and then choosing the maximal-value instance of D that can co-exist with this, and let

v3(A | D) 

denote the maximum value for A that is achievable via choosing the maximum-value instance of D, and then choosing the maximal-value instance of A that can co-exist with this.

These measure: How much value can we provide for A via maximizing the value of some individual containing A, or of some individual that is composed with A to form a commonly containing individual?  For instance, how much value can we provide for me via first maximizing the value of my family, or via first maximizing the value of the other people in my family?

If v is coherent, then v1=v2=v3=v.   

In general, one could think about using (v1 + v2 + v3) (A) as an estimator for v’(A) to help guide the iterative optimization algorithm.   This sum (v1 + v2 + v3) (A)  is an estimate of the value providable for A via maximizing the value of a randomly chosen sub-individual, super-individual or connected-individual of A.   This will often be a useful pointer in the direction of a more coherent value system than v, i.e. (v1 + v2 + v3) () is likely to be more coherent than v().
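
Under one plausible reading of these definitions (my reading; the individuals, atoms and the deliberately incoherent value function below are invented for illustration), the three estimators can be computed on a tiny example as follows, with A = B # C and E = A # D:

```python
# Toy illustration of v1, v2, v3: an individual is a list of instances,
# each instance a frozenset of atoms; A = B # C and E = A # D.
# The value function v(x, ctx) is a made-up, deliberately incoherent stand-in.

atoms = "abcdefgh"
weights = {a: i + 1 for i, a in enumerate(atoms)}

def v(x, ctx):
    base = sum(weights[a] for a in x)
    if {"c", "f"} <= x:
        base -= 9                     # invented interaction penalty (incoherence)
    return base * (1.1 if ctx == "A" else 1.0)

B = [frozenset("ab"), frozenset("ac")]
C = [frozenset("de"), frozenset("df")]
D = [frozenset("gh"), frozenset("fh")]

A_instances = [b | c for b in B for c in C if b.isdisjoint(c)]

def v1():
    """Value for A via first maximizing the sub-individual B, then a compatible C."""
    b = max(B, key=lambda x: v(x, "B"))
    c = max([c for c in C if c.isdisjoint(b)], key=lambda x: v(x, "C"))
    return v(b | c, "A")

def v2():
    """Value for A via first maximizing the containing individual E = A # D."""
    pairs = [(a, d) for a in A_instances for d in D if a.isdisjoint(d)]
    a, _ = max(pairs, key=lambda p: v(p[0] | p[1], "E"))
    return v(a, "A")

def v3():
    """Value for A via first maximizing the connected individual D, then a compatible A."""
    d = max(D, key=lambda x: v(x, "D"))
    return max(v(a, "A") for a in A_instances if a.isdisjoint(d))

a1, a2, a3 = v1(), v2(), v3()
print(a1, a2, a3, a1 + a2 + a3)
```

In this toy case the interaction penalty makes v incoherent, and v1 -- greedily maximizing the sub-individual B first -- gets locked into a low-value instance of A, while v2 and v3 fare better; summing them gives a crude signal that partially washes out such lock-ins.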

This particular estimator is relatively crude, and much more sophisticated but qualitatively similar estimators can surely be created.  But the idea I want to get across is that iterative pursuit of a coherent value system that is close to a given incoherent value system, with search seeded from this incoherent value system, may involve iterative steps through intermediate value systems that are conceptually reminiscent of the thinking behind CEV.  That is, one can think about

  • What kind of people would current humans like to be, if they could more fully realize their own values
  • What would these hypothetical “revised better humans” value, and what kind of humans would THEY like to be
  • etc.




This sort of iterative process, while rough and poorly-defined, is similar to v1 + v2 + v3 as defined above, and could be interesting as an avenue for iterating from current incoherent human values to a coherent value system living on the above-described Pareto frontier.

Varieties of Coherent Value System


Assuming there are multiple coherent value systems on the Pareto frontier, one could guide the iterative search process toward a coherent value system in various different ways.

For instance, referencing the above simple estimator for simplicity of discussion, in constructing v1, v2 and v3 one could weight certain A, D and E more highly.

If one considers this weighting to be achieved via some cost function c(A) applied to individuals A, then one can think about the way different choices for c may impact the ultimate coherent value system obtained.  (Of course the weight c() could be chosen the same as the weight w() used in the error function itself, and this would probably be the optimal choice in terms of effective guidance of optimization, but it’s not the only choice.)

E.g. a “selfish” v’ could be obtained by using a c that weights those S relating to a specific person very highly, and other S much less.  

A consistently short-term-gratification oriented v’ could be obtained by using a c that weights S restricted to short periods of time very highly, and S restricted to longer periods of time much less.  In many cultures this would rule out, e.g. a value system that values being happily married over the time-scale of years, but over the time-scale of hours values having sex with whomever one finds attractive.   But a purely hedonistic value system that values a long period of time precisely according to the sum of the time-localized pleasures experienced during that period of time, may be perfectly coherent.   Just as there can be a coherent value system that puts value on a time-local experience based substantially on both its immediate characteristics and its contribution to longer-term goals.

A value system that puts extremely high value on freedom of choice for individuals, but also extremely high value on societal order and structure, may be incoherent within the scope of human societies feasible in the context of modern human psychology and culture.   A value system that prioritizes order and structure for society and obedience and submission for individuals is more likely to be coherent, as is one that values both freedom of choice and creative anarchic social chaos.   The professed value systems of most contemporary influential political parties are, in this sense, obviously extremely incoherent.

Are Intelligences with Coherent Value Systems More Efficient?


Arguably an intelligent system that directs its actions according to a coherent value system will, all else equal, be more efficient than one that directs its actions according to an incoherent one.  This is because a mind with an incoherent value system will choose actions oriented toward maximizing the value of one subset S1 of the world, and then later choose actions oriented toward maximizing the value of some other subset S2 of the world — and will find that what it did in the context of S2 acts against what it did in the context of S1, and vice versa.  Whereas for a mind with a coherent value system, actions chosen with respect to different subsets of the world will tend to reinforce and support each other, except where inference errors or unexpected properties of the world intervene.

This argument suggests that, in an evolutionary context involving competition between multiple intelligences, there will be a certain advantage to the ones with coherent value systems.   However, this advantage doesn’t have to be decisive, because there may be other advantages enjoyed by entities with incoherent value systems.   For instance, maintaining a coherent value system may sometimes be highly expensive in terms of space, time and energetic resources (it can be quite complex to figure out the implications of one’s actions for all the subsets of the world they impinge upon!).   

My suspicion is that as computational and energetic resources become more ample and easily accessible by the competing cognizers in an evolutionary system, the efficiency advantage of a coherent value system becomes an increasingly significant factor.   This suspicion seems a very natural and important candidate for further formal and qualitative exploration.