Today I'll come back to an old topic -- my old chum Eliezer
Yudkowsky's intriguing yet ill-founded notion of "Coherent Extrapolated Volition" (CEV).
The core idea of CEV is, as Eli put it:
"In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted."
This is a beautiful concept, but I think it fundamentally
doesn't make sense and will never be directly useful (perhaps this is part of
its beauty!).
Obviously you should not judge the CEV concept by the above "poetic" gloss though -- if you're curious, read the whole paper linked above. It's interesting.
In the past I have suggested a few variations like averaging
together what everyone on the planet wants, or making a conceptual blend of
what everyone on the planet wants.
However, these variations do lose a key aspect of the original CEV idea:
that it's not people's current desires that we're accounting for, but rather
the desires of other "better" beings that have been hypothetically
created based on current people.
Here I will present a new variant, CEVa (Coherent
Extrapolated Valuation), which I believe captures more of the spirit of the
original.
The main reason I think the original CEV idea is incoherent
is that "what person X wants to be" is not a coherent notion. Quite often, when a person becomes what they
(thought they) wanted to be, they realize they didn't want that at all. To talk about "what a person wants to
be, deep deep down" as distinct from what they consciously THINK they want
to be -- this just wanders into the realm of the unacceptably nebulous, even
though I do sorta grok what it means on an intuitive basis.
What I want to do here is try to rescue the original CEV
idea by replacing the "what a person wants to be" part with something
a bit more concrete (though still not anywhere close to feasible to implement
at the present time).
Eliezer has more recently talked less about CEV and more
about "Rawlsian reflective equilibrium" as a conceptually related idea that's
easier to formulate, or even as a near-equivalent of CEV. See this recent review of CEV and related ideas by Nick Tarleton. But I think the
Rawlsian approach lacks the bite of the original CEV, somehow. I'm more inspired to keep pushing on the
original CEV to see if it can be made in some sense workable.
Continuity of Self
In a previous paper published in the Journal of Machine Consciousness, I addressed the question: When
does a descendant of a certain mind count as a continuation of that mind? For instance, I am a continuation of my
2-year-old self, even though we are very, very different. What if tomorrow I got a brain implant and
became 5% machine ... then a year later I became 10% machine ... then in a few
decades I was essentially all machine?
Suppose that as I got more and more machine in my brain, I became more
and more cognitively different. Would I
still be "myself" by 2050? In
a sense yes, in a sense no.
What I introduced there was a notion of "continuity of
self" -- i.e. when a mind M changes its into another different mind
M", there is the question of whether M' feels it is (and models itself as) the same entity as
M. What I suggest is that, if one has a
long chain of minds so that each element in the chain has continuity of self
with the previous entity, then a later entity on the chain should be
considered, in a sense, a later version of every earlier entity on the
chain.
So if I upgraded my brain with machine parts on a gradual
schedule as I suggested above, probably there would be continuity of self all
along, and at each stage I would feel like I was continuously growing and
evolving (just as I've done over my life so far), even though eventually the
changes would accumulate and become tremendous.
But if I upgraded 50% of my brain at once, the change might be so sudden
and discontinuous that after the upgrade, I really did not feel like myself
anymore.
Coherent Extrapolated Valuation: individualized version
Most probably you've seen where I'm going already.
Suppose we consider, for each person in a society at a
certain point in time, the set of forward-going paths beginning from that
person -- but possessing continuity of self at each step along the way.
Now let's add one more ingredient: Let's ask at each step of
the way, whether the change is recognized as desirable. There are two aspects here: desirable in
hindsight and desirable in foresight.
When mind M changes into mind M', we can ask: if M could see M', would
it think the change was for the better ... and we can ask: does M', looking
backward, think the change is for the better?
How to weight these two aspects of desirability is basically a
"parameter choice" in CEVa.
If we can weight each step on a path of mind-evolution as to
desirability, then we can also weight a whole path as to desirability, via
averaging the desirabilities of the various steps. This requires an assumption of some
time-discounting factor: nearer-term changes have got to be weighted higher
than further-term changes, according to some sequence of weights with a finite
sum. This set of temporal weights is
another parameter choice in CEVa.
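To make those parameter choices concrete, here is a minimal Python sketch -- purely illustrative, not anything from Eliezer's paper; the particular blending weight ALPHA, discount factor GAMMA, and numeric scores (taken to lie in [0, 1]) are placeholder assumptions of mine:

```python
# Toy sketch of CEVa step and path desirability.
# ALPHA blends the foresight vs. hindsight judgments of a single mind-change;
# GAMMA is the time-discount factor.  Both are free parameters, as noted above.

ALPHA = 0.5   # weight on foresight (M judging M') vs. hindsight (M' looking back)
GAMMA = 0.9   # 0 < GAMMA < 1, so the sequence of temporal weights has a finite sum

def step_desirability(foresight: float, hindsight: float) -> float:
    """Blend M's anticipated judgment of a change with the retrospective judgment of M'."""
    return ALPHA * foresight + (1.0 - ALPHA) * hindsight

def path_desirability(step_judgments) -> float:
    """Discounted average of step desirabilities along one mind-evolution path.

    step_judgments: list of (foresight, hindsight) pairs, one per step M_i -> M_{i+1},
    with each score in [0, 1].
    """
    if not step_judgments:
        return 0.0
    weights = [GAMMA ** i for i in range(len(step_judgments))]
    total = sum(w * step_desirability(f, h)
                for w, (f, h) in zip(weights, step_judgments))
    return total / sum(weights)
```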
Given a person at a particular time, then, we can look at
the self-continuing forward-going paths started at that person, and we can
weight each of these paths via its desirability.
This gives the first version of CEVa: We can associate with
a person, not just their value judgments at the present time, but the value
judgments of all the minds existing along self-continuing forward-going
mind-evolution paths from their present mind.
We can then weight these different minds, and make an overall weighted
average of "the judgment of the current person M and all the minds M' they
might eventually become, where the latter are weighted by the desirability
along the path from M to M' ".
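Continuing the same toy sketch: the individualized CEVa judgment might be computed roughly as below. Again, the data structures and the specific weighting rule (time discount multiplied by the desirability of the path prefix leading to each mind) are placeholder assumptions of mine, not a canonical formulation.

```python
def ceva_judgment_v1(paths, gamma=0.9):
    """Weighted average of the value judgments of all minds along the
    self-continuing forward-going paths from a person M.

    paths: list of paths; each path is a list of (judgment, step_score) pairs,
           where judgment is that mind's numeric verdict on the question at hand
           and step_score is the desirability of the step that produced that mind
           (e.g. the blended foresight/hindsight score from the sketch above;
           use a neutral score such as 1.0 for the starting mind M itself).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for path in paths:
        prefix = []   # desirability scores of the steps leading to the current mind
        for i, (judgment, step_score) in enumerate(path):
            prefix.append(step_score)
            # weight = time discount x average desirability of the path prefix so far
            w = (gamma ** i) * (sum(prefix) / len(prefix))
            weighted_sum += w * judgment
            weight_total += w
    return weighted_sum / weight_total if weight_total else 0.0
```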
There are a lot of free parameters here and I certainly
don't know how to compute this in practice.
However, it seems like a reasonably fair interpretation of Eliezer's
original notion of "the person that a certain person wishes they were."
Coherent Extrapolated Valuation: collective version
There is still a gaping flaw in the CEVa version I've just
outlined, though: it's too individual-centric. It doesn't really make sense to think about
the evolution of human minds as individuals, given the degree of collective
experience and collective intelligence in modern humanity.
Instead it probably makes more sense to look at potential
futures of a whole SOCIETY of minds.
One can then ask, for a society S and then a slightly changed society
S': how desirable is the change, from the point of view of S, and also from the
point of view of S'?
One can calculate desirability based on individual minds
within the society -- but also based on "group intelligences"
existing within the society, such as families, corporations or even the whole
society considered as a sort of "global brain."
Weighting the desirabilities of individuals versus those of
larger groups involves some subtlety in terms of "subtracting off for
overlap." Also, identifying what
is a coherent enough entity to count in the average may become subtle,
especially if we see the emergence of "mindplexes" in which multiple
minds fuse together in various partial ways to form mixed individual/collective
intelligences. But these complexities
are not really bugs in CEVa -- they're just complexities of the actual
situation being analyzed.
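Purely to illustrate the "subtracting off for overlap" point -- the overlap numbers below are placeholders of my own, not something the framework specifies -- one crude option is to discount each group intelligence's weight by how much of its judgment is already carried by its separately counted members:

```python
def collective_judgment(entities):
    """Crude sketch of aggregating judgments across individuals and group intelligences.

    entities: list of (judgment, base_weight, overlap) triples, where overlap in [0, 1]
              estimates how much of an entity's judgment is already accounted for by
              the other entities in the list (0.0 for individuals, higher for groups
              whose members are also listed separately).
    """
    weighted = [(j, w * (1.0 - overlap)) for j, w, overlap in entities]
    total = sum(w for _, w in weighted)
    return sum(j * w for j, w in weighted) / total if total else 0.0


# Hypothetical example: two individuals, plus their family considered as a
# group intelligence whose judgment largely overlaps with theirs.
society = [(0.8, 1.0, 0.0),   # individual A
           (0.2, 1.0, 0.0),   # individual B
           (0.6, 2.0, 0.7)]   # the family as a group mind, 70% overlap with A and B
print(collective_judgment(society))   # a single blended valuation
```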
This "collective" CEVa -- CEVav2 -- is my current
suggestion regarding how to transform the original CEV idea into something
related that is at least conceptually sound.
Now, one possibility is that when one does CEVa (version 1
or 2) one does not find anything coherent.
One may find that some individuals or groups and their self-continuing
descendants have values X, and others have values Y, and X and Y are very different. In that case, if one needs to come up
with a single coherent value system, one can try to do a conceptual blend and
come up with something new and coherent that incorporates key aspects of X and
Y and also has other desirable merits like simplicity or various aesthetic
qualities.
Ethics is Solved! Woo hoo!!
Ethics now becomes simple!
To figure out if you should run in front of that train to save that
baby, at risk of your own life -- you merely simulate all possible future
evolutions of human society (including those involving transcendence to various
transhuman entities), calculate a certain weighting function for each one, and
then figure out what each mind at each level of organization in each possible
future evolution of society would want you to do regarding the baby. Simple as pie! Ah, and you'd better do the calculation
quickly or the baby will get squashed while you're programming your
simulator... and then no pie for you ...
Oh yeah -- and there are some further subtleties I swept
under the transhuman rug in the above.
For instance, what if a trajectory of self-modification results in
something without a self, or something that makes no judgments about some
situations but does about others? Does
one assume continuity-of-self or not, when dealing with selfless hypothetical
future entities and their hypothetical future evolutions? How, quantitatively, does one incorporate
"number of judgments" (weight of evidence) into a composite value
assessment? But I am reasonably
comfortable assuming that a superhuman AGI capable of doing the CEVa
calculations, will also be capable of handling these matters and the various
other loose ends.
No But Really -- So What?
To my own taste, at least, CEVa is a lot clearer
conceptually than the original CEV, and meatier than Rawlsian reflective equilibrium and
related notions. Perhaps it's less
beautiful, in some correlated way, but so it goes....
On the other hand, CEVa does share with the original CEV the
trait of not being remotely useful in practice at the present time. We simply have no way to compute this sort of
thing.
Furthermore, there are so many free parameters in the
definition of CEVa that it seems likely one could tweak it in many different
ways to get many different answers to the same question. This is not a bug in CEVa, either -- it
would be the case in any reasonably concrete idea in the vicinity of CEV....
If there is any value to this sort of thought-exercise --
aside from its inarguable value as weird-brow entertainment for a small crew of
futurist geeks -- it is probably as a way of clarifying conceptually what we
actually mean by "desirable" or "valuable" in a
future-looking sense. I, for one,
genuinely DO want to make choices that my future self-continuing descendants
would think are good, not just choices that my current incarnation thinks are
good based on its own immediate knowledge and reactions. I don't want to make choices that my current
self HATES just because my future evolutions have a very different set of
values than my current self -- but very often I'm faced with hard choices
between different options that seem confusingly, roughly equally valuable to
me... and I would really LOVE to get input from the superminds I will one day
give rise to. I have no good way to get
such input, alas (despite what Terence McKenna said sometimes, mushrooms are a
pretty noisy channel...), but still, the fact that I like this idea says
something about how I am thinking about value systems and mind evolution.
I doubt very much we are going to "hard-code"
complex ethical systems into future AGIs.
Ethics is just not that simple.
Rather, we will code in some general principles and processes, and AGI
systems will learn ethics via experience and instruction and self-reflection,
as intelligent minds in the world must.
HOWEVER -- at very least -- when we guide AGI systems to
create their own value systems, we can point them to CEV and CEVa and Rawlsian
coherence and the whole mess of other approaches to human ethics ... and who
knows, maybe this may help them understand what the heck we mean by "what
we want deep down."
Or on the other hand, such notions may end up being no use
to the first superhuman AGIs at all -- they may be able to form their own ideas
about what humans want deep down via their own examination of the nitty-gritty
of human life. They may find our hairy
human abstractions less informative than specific data about human behaviors, from
which they can then abstract in their own ways.
But hey, providing them with multiple forms of guidance
seems more likely to help than to hurt....
And at very least, this stuff is fun to think about! (And if you read the first link above, you will know that Mr. Yudkowsky has warned us against the dangers of things that are fun to think about ... but please rest assured I spent most of my time thinking about more useful but more tedious aspects of AGI ;-p )
The CEV is based on a false premise, that mental properties are completely reducible to physical ones.
In other words, the mistake here is an overly reductionistic approach to ethics - the mistaken idea that you could reduce ethics to purely mechanical terms.
The notion that you could somehow mechanically 'extrapolate' a person's future values without any consciousness present is based on a reductionistic fallacy.
The whole *point* of conscious experience is to evolve our values into the future. In other words... the only way for our values to evolve is for us to live our lives... you cannot predict in advance how a person's values will evolve without actually *being* a person.
The 'orthogonality' postulate of Bostrom/Yudkowsky and the resulting ludicrous notion of the 'paperclip monster' is based on the same big ontological mistake as CEV: the idea that mental properties are completely reducible to physical ones.
But there's another argument that is sufficient to rebut CEV and orthogonality:
Intelligence and values are not the same thing, but they *are* related. The fabric of knowledge is a unified whole, and the whole history of science is of domains that were once thought to be separate later found to be related. This fact alone is enough to cast serious doubt on 'orthogonality', quite apart from the ontological mistake I talked about above.
Any AGI or general-intelligence mind needs 3 distinct systems:
Evaluation system/Basic Drives, Decision-making system and Planning system.
Whilst these 3 systems are not the same, they *are* related, and there is little basis for thinking that you can arbitrarily chop and change one without it seriously affecting the other 2.
In other words, emotions (basic drives), decision-making (policy) and planning (high-level values) all *depend* on each other for smooth functioning.
If an AI has the wrong values, this will seriously limit its decision-making system, thus falsifying orthogonality. In the one example of a general purpose intelligence that we know about (humans) this is clearly true - cutting out a person's basic emotions/drives results in severe degradation of decision-making abilities - 'paralysis by analysis'.
The correct approach to ethics lies in knowledge representation and ontology. One needs to identify the correct a-priori ('universal') categories of thought that are *necessary* prerequisites to thought in the first place. Kant had the right idea all those years ago!
Once we've identified the 'universal categories', we code them up, and our job is basically done. The categories should form the 'seeds' for our AGI to do all the rest of the learning on its own.
In other words, identify the basic ontological 'primitives' of ethics (the conceptual seeds), code these up, and let the AGI learn the rest on its own. The seeds are the conceptual scaffolding on which the AGI would then build, based on empirical learning of human values.
Of course, Bayesian induction isn't a fully general method of reasoning under uncertainty. Real rationality is *abduction* not induction (Bayesian induction is actually just a special sense of abduction).
It is through abduction that science really works, building coherent categories on top of the a-priori universal categories of thought that serve as the conceptual seeds.
See my A-Z list of the basic concepts needed to understand abduction here:
http://www.zarzuelazen.com/ConceptLearning.html
My list connects to Wikipedia articles; be sure that you read *all* of these, and I promise that, after reflecting on what you've read, it will become clear how Bayesian induction is really just a special case of the real rationality, *abduction*.
After you've grasped this, all the AGI problems will start to fall like dominos and the road to Singularity will finally be clear...
It would be cool to see a group try to approximate this for the globe.
A framework whose parameters require us to map out the global ethical spectrum of preferences is good ;-)
[Even if it doesn't just tell us what to do on a silver platter :p]
I am surprised this thing is still knocking about. Many, including myself, commented on its flaws at the time of its birth and I thought it was pretty much dead letter. The few times I have run into Eliezer since then and mentioned it he seemed to more or less disown it as defective but was not clear that there was something new and better. At least that was my perception of the situation.
One of the strange flaws of the CEV is that an AGI, or something else that is super smart but not necessarily autonomous as a being, will somehow extrapolate with its powers of computation what we would in fact want if we were quite a bit different from the way we are, at least in qualitative ways and in quantitative ways of how well and how much information we can process. Worse, it seeks to extrapolate with general human desires and proclivities, our evolved psychology if you will, held as more or less a given while letting these other things vary. However, one of the most likely results of being quantitatively and qualitatively much better and different is that we would likely see through and eschew much of this programming and its implied goal-structure basis.
Then there is the problem that the original model seemed to seek to do some kind of universal maximization of human happiness, a grand utilitarianism. But happiness is notoriously sketchy to define much less optimize. And if indeed you start with assuming our evolved psychology is kept constant then what makes beings with that psychology experience happiness may not be at all adequate to such advanced theoretical versions of ourselves.
So a deep problem of CEV is that it has no workable notion of what the "good" actually is, beyond some hand-waving toward the idea that our evolved psychology, plus what works in inter-relationships between peers, is as much of the "good" as can be thought about.
What of those who don't like what this CEV, if one could possibly build one, comes up with? What if this CEV intersects some other evolved technological species' CEV equivalent? Can the ethical basis be extended?