Today I'll come back to an old topic -- my old chum Eliezer
Yudkowsky's intriguing yet ill-founded notion of "Coherent Extrapolated Volition" (CEV).
The core idea of CEV is, as Eli put it:
In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
This is a beautiful concept, but I think it fundamentally
doesn't make sense and will never be directly useful (perhaps this is part of
its beauty!).
Obviously you should not judge the CEV concept by the above "poetic" gloss, though -- if you're curious, read the whole paper linked above. It's interesting.
In the past I have suggested a few variations like averaging
together what everyone on the planet wants, or making a conceptual blend of
what everyone on the planet wants.
However, these variations do lose a key aspect of the original CEV idea:
that it's not people's current desires that we're accounting for, but rather
the desires of other "better" beings that have been hypothetically
created based on current people.
Here I will present a new variant, CEVa (Coherent
Extrapolated Valuation), which I believe captures more of the spirit of the
original.
The main reason I think the original CEV idea is incoherent
is that "what person X wants to be" is not a coherent notion. Quite often, when a person becomes what they
(thought they) wanted to be, they realize they didn't want that at all. To talk about "what a person wants to
be, deep deep down" as distinct from what they consciously THINK they want
to be -- this just wanders into the realm of the unacceptably nebulous, even
though I do sorta grok what it means on an intuitive basis.
What I want to do here is try to rescue the original CEV
idea by replacing the "what a person wants to be" part with something
a bit more concrete (though still not anywhere close to feasible to implement
at the present time).
Eliezer has more recently talked less about CEV and more
about "Rawlsian reflective equilibrium" as a conceptually related idea that's
easier to formulate, or even as a near-equivalent of CEV. See this recent review of CEV and related ideas by Nick Tarleton. But I think the
Rawlsian approach lacks the bite of the original CEV, somehow. I'm more inspired to keep pushing on the
original CEV to see if it can be made in some sense workable.
Continuity of Self
In a previous paper published in the Journal of Machine Consciousness, I addressed the question: when
does a descendant of a certain mind count as a continuation of that mind? For instance, I am a continuation of my
2-year-old self, even though we are very, very different. What if tomorrow I got a brain implant and
became 5% machine ... then a year later I became 10% machine ... then in a few
decades I was essentially all machine?
Suppose that as I got more and more machine in my brain, I became more
and more cognitively different. Would I
still be "myself" by 2050? In
a sense yes, in a sense no.
What I introduced there was a notion of "continuity of
self" -- i.e., when a mind M changes into a different mind
M', there is the question of whether M' feels it is (and models itself as) the same entity as
M. What I suggest is that, if one has a
long chain of minds so that each element in the chain has continuity of self
with the previous entity, then a later entity on the chain should be
considered, in a sense, a later version of every earlier entity on the
chain.
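To make the chain idea a little more explicit, here is a minimal sketch (my own formalization, not something from the paper): treat "later version of" as what you get when every consecutive pair in a chain of mind-states satisfies some continuity-of-self predicate.

    # Minimal sketch: the last mind in the chain counts (in this sense) as a
    # later version of the first, provided every consecutive pair of
    # mind-states satisfies a deliberately-unspecified continuity predicate.

    def is_later_version(chain, continuity_of_self):
        # chain: list of mind-states [M0, M1, ..., Mn], earliest first
        return all(continuity_of_self(prev, nxt)
                   for prev, nxt in zip(chain, chain[1:]))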
So if I upgraded my brain with machine parts on a gradual
schedule as I suggested above, probably there would be continuity of self all
along, and at each stage I would feel like I was continuously growing and
evolving (just as I've done over my life so far), even though eventually the
changes would accumulate and become tremendous.
But if I upgraded 50% of my brain at once, the change might be so sudden
and discontinuous that after the upgrade, I really did not feel like myself
anymore.
Coherent Extrapolated Valuation: individualized version
Most probably you've seen where I'm going already.
Suppose we consider, for each person in a society at a
certain point in time, the set of forward-going paths beginning from that
person -- but possessing continuity of self at each step along the way.
Now let's add one more ingredient: Let's ask at each step of
the way, whether the change is recognized as desirable. There are two aspects here: desirable in
hindsight and desirable in foresight.
When mind M changes into mind M', we can ask: if M could see M', would
it think the change was for the better ... and we can ask: does M', looking
backward, think the change is for the better?
How to weight these two aspects of desirability is basically a
"parameter choice" in CEVa.
If we can weight each step on a path of mind-evolution as to
desirability, then we can also weight a whole path as to desirability, via
averaging the desirabilities of the various steps. This requires an assumption of some
time-discounting factor: nearer-term changes have got to be weighted higher
than further-term changes, according to some series of weights with a finite
sum. This set of temporal weights is
another parameter choice in CEVa.
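One concrete (and entirely optional) way to set this up is a geometrically discounted average of the step desirabilities:

    D(M_0 \to M_1 \to M_2 \to \cdots) \;=\; \sum_{t \ge 1} w_t\, d(M_{t-1} \to M_t), \qquad w_t = (1-\gamma)\,\gamma^{\,t-1}, \quad 0 < \gamma < 1,

so the weights sum to 1 even for arbitrarily long paths, and nearer-term steps dominate. The discount rate gamma is, again, just an assumed parameter.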
Given a person at a particular time, then, we can look at
the self-continuing forward-going paths started at that person, and we can
weight each of these paths via its desirability.
This gives the first version of CEVa: We can associate with
a person, not just their value judgments at the present time, but the value
judgments of all the minds existing along self-continuing forward-going
mind-evolution paths from their present mind.
We can then weight these different minds, and make an overall weighted
average of "the judgment of the current person M and all the minds M' they
might eventually become, where the latter are weighted by the desirability
along the path from M to M' ".
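To make the bookkeeping concrete, here is a toy sketch in Python. Everything in it -- how self-continuing paths get sampled, how foresight and hindsight desirability get scored, how a mind's judgment on a question gets elicited -- is a hypothetical placeholder passed in as a function; this is an illustration of the structure of the calculation, not a claim that any of it is computable today.

    # Toy sketch of individualized CEVa (illustrative only).
    # `sample_paths`, `judgment`, `foresight` and `hindsight` are hypothetical
    # placeholder functions supplied by the caller; nothing like them exists.

    def step_desirability(m_prev, m_next, foresight, hindsight, alpha=0.5):
        # Desirability of one step M -> M': a convex mix of M's foresight
        # judgment and M-prime's hindsight judgment (both assumed in [0, 1]).
        return alpha * foresight(m_prev, m_next) + (1.0 - alpha) * hindsight(m_prev, m_next)

    def path_weight(path, foresight, hindsight, gamma=0.9, alpha=0.5):
        # Desirability of a whole path: time-discounted average of its steps,
        # so nearer-term changes count more and the weights have a finite sum.
        steps = list(zip(path, path[1:]))
        if not steps:
            return 0.0
        weights = [gamma ** t for t in range(len(steps))]
        scores = [step_desirability(a, b, foresight, hindsight, alpha) for a, b in steps]
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

    def ceva_judgment(person, question, sample_paths, judgment, foresight, hindsight):
        # Weighted average of the judgments of all minds along sampled
        # self-continuing paths starting at `person`, with each mind weighted
        # by the desirability of the path it sits on.
        num = den = 0.0
        for path in sample_paths(person):
            w = path_weight(path, foresight, hindsight)
            for mind in path:
                num += w * judgment(mind, question)
                den += w
        return num / den if den else None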
There are a lot of free parameters here and I certainly
don't know how to compute this in practice.
However, it seems like a reasonably fair interpretation of Eliezer's
original notion of "the person that a certain person wishes they were."
Coherent Extrapolated Valuation: collective version
There is still a gaping flaw in the CEVa version I've just
outlined, though: it's too individual-centric. It doesn't really make sense to think about
the evolution of human minds as individuals, given the degree of collective
experience and collective intelligence in modern humanity.
Instead it probably makes more sense to look at potential
futures of a whole SOCIETY of minds.
One can then ask, for a society S and then a slightly changed society
S': how desirable is the change, from the point of view of S, and also from the
point of view of S'?
One can calculate desirability based on individual minds
within the society -- but also based on "group intelligences"
existing within the society, such as families, corporations or even the whole
society considered as a sort of "global brain."
Weighting the desirabilities of individuals versus those of
larger groups involves some subtlety in terms of "subtracting off for
overlap." Also, identifying what
is a coherent enough entity to count in the average may become subtle,
especially if we see the emergence of "mindplexes" in which multiple
minds fuse together in various partial ways to form mixed individual/collective
intelligences. But these complexities
are not really bugs in CEVa -- they're just complexities of the actual
situation being analyzed.
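One hedged way to picture the overlap issue (again just my own notation, one scheme among many) is to write the society-level desirability as a weighted sum over all the valuing entities present, individual or collective:

    d(S \to S') \;=\; \sum_{e \in E(S)} w_e\, d_e(S \to S'),

where E(S) ranges over individuals, families, corporations, the whole society as a global brain, and so on, and the weights w_e are chosen so that the total weight of all entities containing any given person is capped -- that's the "subtracting off for overlap."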
This "collective" CEVa -- CEVav2 -- is my current
suggestion regarding how to transform the original CEV idea into something
related that is at least conceptually sound.
Now, one possibility is that when one does CEVa (version 1
or 2) one does not find anything coherent.
One may find that some individuals or groups and their self-continuing
descendants have values X, and others have values Y, and X and Y are very different. In that case, if one has need to come up
with a single coherent value system, one can try to do a conceptual blend and
come up with something new and coherent that incorporates key aspects of X and
Y and also has other desirable merits like simplicity or various aesthetic
qualities.
Ethics is Solved! Woo hoo!!
Ethics now becomes simple!
To figure out if you should run in front of that train to save that
baby, at risk of your own life -- you merely simulate all possible future
evolutions of human society (including those involving transcendence to various
transhuman entities), calculate a certain weighting function for each one, and
then figure out what each mind at each level of organization in each possible
future evolution of society would want you to do regarding the baby. Simple as pie! Ah, and you'd better do the calculation
quickly or the baby will get squashed while you're programming your
simulator... and then no pie for you ...
Oh yeah -- and there are some further subtleties I swept
under the transhuman rug in the above.
For instance, what if a trajectory of self-modification results in
something without a self, or something that makes no judgments about some
situations but does about others? Does
one assume continuity-of-self or not, when dealing with selfless hypothetical
future entities and their hypothetical future evolutions? How, quantitatively, does one incorporate
"number of judgments" (weight of evidence) into a composite value
assessment? But I am reasonably
comfortable assuming that a superhuman AGI capable of doing the CEVa
calculations, will also be capable of handling these matters and the various
other loose ends.
No But Really -- So What?
To my own taste, at least, CEVa is a lot clearer
conceptually than the original CEV, and meatier than Rawlsian reflective equilibrium and
related notions. Perhaps it's less
beautiful, in some correlated way, but so it goes....
On the other hand, CEVa does share with the original CEV the
trait of not being remotely useful in practice at the present time. We simply have no way to compute this sort of
thing.
Furthermore, there are so many free parameters in the
definition of CEVa that it seems likely one could tweak it in many different
ways to get many different answers to the same question. This is not a bug in CEVa, either -- it
would be the case in any reasonably concrete idea in the vicinity of CEV....
If there is any value to this sort of thought-exercise --
aside from its inarguable value as weird-brow entertainment for a small crew of
futurist geeks -- it is probably as a way of clarifying conceptually what we
actually mean by "desirable" or "valuable" in a
future-looking sense. I, for one,
genuinely DO want to make choices that my future self-continuing descendants
would think are good, not just choices that my current incarnation thinks are
good based on its own immediate knowledge and reactions. I don't want to make choices that my current
self HATES just because my future evolutions have a very different set of
values than my current self -- but very often I'm faced with hard choices
between different options that seem confusingly, roughly equally valuable to
me... and I would really LOVE to get input from the superminds I will one day
give rise to. I have no good way to get
such input, alas (despite what Terence McKenna said sometimes, mushrooms are a
pretty noisy channel...), but still, the fact that I like this idea says
something about how I am thinking about value systems and mind evolution.
I doubt very much we are going to "hard-code"
complex ethical systems into future AGIs.
Ethics is just not that simple.
Rather, we will code in some general principles and processes, and AGI
systems will learn ethics via experience and instruction and self-reflection,
as intelligent minds in the world must.
HOWEVER -- at very least -- when we guide AGI systems to
create their own value systems, we can point them to CEV and CEVa and Rawlsian
coherence and the whole mess of other approaches to human ethics ... and who
knows, maybe this may help them understand what the heck we mean by "what
we want deep down."
Or on the other hand, such notions may end up being no use
to the first superhuman AGIs at all -- they may be able to form their own ideas
about what humans want deep down via their own examination of the nitty-gritty
of human life. They may find our hairy
human abstractions less informative than specific data about human behaviors, from
which they can then abstract in their own ways.
But hey, providing them with multiple forms of guidance
seems more likely to help than to hurt....
And at very least, this stuff is fun to think about! (And if you read the first link above, you will know that Mr. Yudkowsky has warned us against the dangers of things that are fun to think about ... but please rest assured I spent most of my time thinking about more useful but more tedious aspects of AGI ;-p )
4 comments:
The CEV is based on a false premise, that mental properties are completely reducible to physical ones.
In other words, the mistake here is an overly reductionistic approach to ethics - the mistaken idea that you could reduce ethics to purely mechanical terms.
The notion that you could somehow mechanically 'extrapolate' a person's future values without any consciousness present is based on a reductionistic fallacy.
The whole *point* of conscious experience is to evolve our values into the future. In other words...the only way for our values to evolve is for us to live our lives...you cannot predict in advance how a person's values will evolve, without actually *being* a person.
The 'orthogonality' postulate of Bostrom/Yudkowsky and the resulting ludicrous notion of the 'paperclip monster' is based on the same big ontological mistake as CEV: the idea that mental properties are completely reducible to physical ones.
But there's another argument that is sufficient to rebut CEV and orthogonality:
Intelligence and values are not the same thing, but they *are* related. The fabric of knowledge is a unified whole, and the whole history of science is of domains that were once thought to be separate later found to be related. This fact alone is enough to cast serious doubt on 'orthogonality', quite apart from the ontological mistake I talked about above.
Any AGI or general-intelligence mind needs 3 distinct systems:
Evaluation system/Basic Drives, Decision-making system and Planning system.
Whilst these 3 systems are not the same, they *are* related, and there is little basis for thinking that you can arbitrarily chop and change one without it seriously affecting the other 2.
In other words, emotions (basic drives), decision-making (policy) and planning (high-level values) all *depend* on each other for smooth functioning.
If an AI has the wrong values, this will seriously limit its decision-making system, thus falsifying orthogonality. In the one example of a general purpose intelligence that we know about (humans) this is clearly true - cutting out a person's basic emotions/drives results in severe degradation of decision-making abilities - 'paralysis by analysis'.
The correct approach to ethics lies in knowledge representation and ontology. One needs to identify the correct a-priori ('universal') categories of thought that are *necessary* prerequisites to thought in the first place. Kant had the right idea all those years ago!
Once we've identified the 'universal categories', we code them up, and our job is basically done. The categories should form the 'seeds' for our AGI to do all the rest of the learning on its own.
In other words, identify the basic ontological 'primitives' of ethics (the conceptual seeds), code these up, and let the AGI learn the rest on its own. The seeds are the conceptual scaffolding on which the AGI would then build, based on empirical learning of human values.
Of course, Bayesian induction isn't a fully general method of reasoning under uncertainty. Real rationality is *abduction*, not induction (Bayesian induction is actually just a special case of abduction).
It is through abduction that science really works, building coherent categories on top of the a-priori universal categories of thought that serve as the conceptual seeds.
See my A-Z list of the basic concepts needed to understand abduction here:
http://www.zarzuelazen.com/ConceptLearning.html
My list connects to wikipedia articles, be sure that you read *all* of these, and I promise, after reflecting on what you've read, it will become clear how Bayesian induction is really just a special case of the real rationality, *abduction*.
After you've grasped this, all the AGI problems will start to fall like dominos and the road to Singularity will finally be clear...
It would be cool to see a group try to approximate this for the globe.
A framework whose parameters require us to map out the global ethical spectrum of preferences is good ;-)
[Even if it doesn't just tell us what to do on a silver platter :p]
I am surprised this thing is still knocking about. Many, including myself, commented on its flaws at the time of its birth and I thought it was pretty much a dead letter. The few times I have run into Eliezer since then and mentioned it, he seemed to more or less disown it as defective, but was not clear that there was something new and better. At least that was my perception of the situation.
One of the strange flaws of the CEV is that an AGI, or something else that is super smart but not necessarily autonomous as a being, will somehow extrapolate with its powers of computation what we would in fact want if we were quite a bit different from the way we are, at least in qualitative ways and in quantitative ways of how well we can process how much information. Worse, it seeks to extrapolate with general human desires and proclivities, our evolved psychology if you will, held as more or less a given but letting these other things vary. However, one of the most likely results of being quantitatively and qualitatively much better and different is that we would likely see through and eschew much of this programming and its implied goal-structure basis.
Then there is the problem that the original model seemed to seek some kind of universal maximization of human happiness, a grand utilitarianism. But happiness is notoriously sketchy to define, much less optimize. And if indeed you start by assuming our evolved psychology is kept constant, then what makes beings with that psychology experience happiness may not be at all adequate to such advanced theoretical versions of ourselves.
So a deep problem of CEV is that it has no workable notion of what the "good" actually is, beyond some hand-waving toward the idea that our evolved psychology, plus what works in inter-relationships between peers, is as much of "good" as can be thought about.
What of those who don't like what this CEV, if one could possibly build one, comes up with? What if this CEV intersects some other evolved technological species' CEV equivalent? Can the ethical basis be extended?