Some Speculations Regarding Value Systems for Hypothetical Powerful OpenCog AGIs
In a recent blog post, I have proposed two general theses regarding the future value systems of human-level and transhuman AGI systems: the Value Learning Thesis (VLT) and Value Evolution Thesis (VET). This post pursues the same train of thought further – attempting to make these ideas more concrete via speculating about how the VLT and VET might manifest themselves in the context of an advanced version of the OpenCog AGI platform.
Currently OpenCog comprises a comprehensive design plus a
partial implementation, and it cannot be known with certainty how functional a
fully implemented version of the system will be. The OpenCog project is ongoing and the
system becomes more functional each year.
Independent of this, however, the design may be taken as representative
of a certain class of AGI systems, and its conceptual properties explored.
An OpenCog system has a certain set of top-level goals,
which initially are supplied by the human system programmers. Much of its cognitive processing is centered
on finding actions which, if executed, appear to have a high probability of
achieving system goals. The system
carries out probabilistic reasoning aimed at estimating these
probabilities. From this view,
the goal of its reasoning is to infer propositions of the form “Context &
Procedure ==> Goal”; but in order to estimate the probabilities of such
propositions, it needs to form and estimate probabilities for a host of other
propositions – concrete ones involving its sensory observations and actions,
and more abstract generalizations as well.
Since precise probabilistic reasoning based on the total set of the
system’s observations is infeasible, numerous heuristics are used alongside
exact probability-theoretic calculations.
Part of the system’s inferencing involves figuring out what subgoals may
help it achieve its top-level goals in various contexts.
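
To make the “Context & Procedure ==> Goal” idea a bit more concrete, here is a toy sketch in Python. It is not OpenCog code and does not use any OpenCog API; the names `Implication`, `choose_procedure` and `goal_weights` are hypothetical, and the probabilities are simply assumed to have been estimated already by some inference process. It only shows how a system might pick the procedure whose weighted, estimated probability of goal achievement is highest in a given context.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Implication:
    """Hypothetical record of one 'Context & Procedure ==> Goal' estimate."""
    context: str
    procedure: str
    goal: str
    probability: float  # estimated chance the procedure achieves the goal


def choose_procedure(
    implications: Iterable[Implication],
    context: str,
    goal_weights: Dict[str, float],
) -> Optional[str]:
    """Return the procedure with the highest weighted goal-achievement score.

    Only implications matching the current context are considered. Goals
    missing from goal_weights are treated as having weight zero.
    """
    scores: Dict[str, float] = {}
    for imp in implications:
        if imp.context != context:
            continue
        weight = goal_weights.get(imp.goal, 0.0)
        scores[imp.procedure] = scores.get(imp.procedure, 0.0) + weight * imp.probability
    if not scores:
        return None  # nothing is known about this context
    return max(scores, key=scores.get)


# Illustrative, made-up estimates.
known = [
    Implication("lab", "explore", "Novelty", 0.9),
    Implication("lab", "explore", "Human pleasure and fulfillment", 0.2),
    Implication("lab", "chat", "Novelty", 0.3),
    Implication("lab", "chat", "Human pleasure and fulfillment", 0.8),
]
weights = {"Novelty": 0.3, "Human pleasure and fulfillment": 0.7}
best = choose_procedure(known, "lab", weights)
```

In a real system the hard part is of course producing the probability estimates, not taking the maximum; the sketch leaves that part out entirely.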
Exactly what set of top-level goals should be given to an
OpenCog system aimed at advanced AGI is not yet fully clear and will largely be
determined via experimentation with early-stage OpenCog systems, but a first
approximation is as follows, determined via a combination of theoretical and
pragmatic considerations. The first four
values on the list are drawn from the Cosmist ethical analysis presented in my books A Cosmist Manifesto and The Hidden Pattern; the others are
included for fairly obvious pragmatic reasons to do with the nature of
early-stage AGI development and social integration. The order of the items on the list is
arbitrary as given here; each OpenCog system would have a particular weighting
for its top-level goals.
- Joy: maximization of the amount of pleasure observed or estimated to be experienced by sentient beings across the universe
- Growth: maximization of the amount of new pattern observed or estimated to be created throughout the universe
- Choice: maximization of the degree to which sentient beings across the universe appear to be able to make choices (according e.g. to the notion of “natural autonomy”, a scientifically and rationally grounded analogue of the folk notion and subjective experience of “free will”)
- Continuity: persistence of patterns over time. Obviously this is a counterbalance to Growth; the relative weighting of these two top-level goals will help determine the “conservatism” of a particular OpenCog system with the goal-set indicated here.
- Novelty: the amount of new information in the system’s perceptions, actions and thoughts
- Human pleasure and fulfillment: How much do humans, as a whole, appear to be pleased and fulfilled?
- Human pleasure regarding the AGI system itself: How pleased do humans appear to be with the AGI system, and their interactions with it?
- Self-preservation: a goal fulfilled if the system keeps itself “alive.” This is actually somewhat subtle for a digital system. It could be defined in a copying-friendly way, as preservation of the existence of sentiences whose mind-patterns have evolved from the mind-patterns of the current system with a reasonable degree of continuity.
This list of goals has a certain arbitrariness to it, and no
doubt will evolve as OpenCog systems are experimented with. However, it comprises a reasonable “first
stab” at a “roughly human-like” set of goal-content for an AGI system.
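
As a small illustration of what “a particular weighting for its top-level goals” could look like, here is a hedged Python sketch. The goal names come from the list above; everything else (`normalize_weights`, `conservatism`, and especially the idea of summarizing conservatism as a simple Continuity-to-Growth ratio) is an assumption made for illustration, not part of the OpenCog design.

```python
from typing import Dict

# Goal names taken from the list above.
GOALS = [
    "Joy",
    "Growth",
    "Choice",
    "Continuity",
    "Novelty",
    "Human pleasure and fulfillment",
    "Human pleasure regarding the AGI system itself",
    "Self-preservation",
]


def normalize_weights(raw: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative raw weights so that they sum to 1."""
    missing = [g for g in GOALS if g not in raw]
    if missing:
        raise ValueError(f"missing weights for: {missing}")
    if any(raw[g] < 0 for g in GOALS):
        raise ValueError("weights must be non-negative")
    total = sum(raw[g] for g in GOALS)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {g: raw[g] / total for g in GOALS}


def conservatism(weights: Dict[str, float]) -> float:
    """Hypothetical summary: share of Continuity relative to Continuity + Growth.

    0.0 means all weight on Growth, 1.0 means all weight on Continuity.
    """
    c, g = weights["Continuity"], weights["Growth"]
    if c + g == 0:
        raise ValueError("Continuity and Growth both have zero weight")
    return c / (c + g)


raw = {g: 1.0 for g in GOALS}
raw["Continuity"] = 3.0  # a comparatively conservative system
weights = normalize_weights(raw)
```

Two systems with the same goal list but different weightings could then be compared by such a number, though any single number is obviously a crude summary.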
One might wonder how such goals would be specified for an
AGI system. Does one write source-code
that attempts to embody some mathematical theory of continuity, pleasure, joy,
etc.? For some goals mathematical formulae
may be appropriate, e.g. novelty which can be gauged information-theoretically
in a plausible way. In most cases,
though, I suspect the best way to define a goal for an AGI system will be using
natural human language. Natural
language is intrinsically ambiguous, but so are human values, and these
ambiguities are closely coupled and intertwined. Even where a mathematical formula is given,
it might be best to use natural language for the top-level goal, and supply the
mathematical formula as an initial suggested means of achieving the NL-specified
goal.
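
For the novelty case, one plausible information-theoretic gauge is surprisal: an observation is novel to the degree that it was improbable given what the system has seen so far. The Python sketch below is only one simple way to do this, under strong assumptions: observations are discrete symbols, the probability model is a plain frequency count with add-one (Laplace) smoothing, and the class name `NoveltyEstimator` and its `alphabet_size` parameter are hypothetical. It is not the formula an OpenCog system would actually use.

```python
import math
from collections import Counter
from typing import Hashable


class NoveltyEstimator:
    """Toy novelty gauge: surprisal (in bits) under a smoothed frequency model."""

    def __init__(self, alphabet_size: int) -> None:
        if alphabet_size < 1:
            raise ValueError("alphabet_size must be at least 1")
        self.alphabet_size = alphabet_size  # assumed number of possible symbols
        self.counts: Counter = Counter()
        self.total = 0

    def surprisal(self, observation: Hashable) -> float:
        """Return -log2 of the smoothed probability of the observation."""
        p = (self.counts[observation] + 1) / (self.total + self.alphabet_size)
        return -math.log2(p)

    def observe(self, observation: Hashable) -> float:
        """Score the observation's novelty, then add it to the model."""
        score = self.surprisal(observation)
        self.counts[observation] += 1
        self.total += 1
        return score


est = NoveltyEstimator(alphabet_size=4)
first = est.observe("a")   # nothing seen yet: every symbol is equally likely
second = est.observe("a")  # a repeat is less surprising
third = est.observe("b")   # an unseen symbol is more surprising than the repeat
```

Even in this toy form the estimate depends on modeling choices (the alphabet, the smoothing), which is a small-scale version of the approximation issue that any tractable novelty measure would face.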
The AGI would need to be instructed – again, most likely in
natural language – not to obsess on the specific wording supplied to it in its
top-level goals, but rather to take the wording of its goals as indicative of
general concepts that exist in human culture and can be expressed only
approximatively in concise sequences of words.
The specification of top-level goal content is
not intended to precisely direct the AGI’s behavior in the way that, say, a
thermostat is directed by the goal of keeping temperature within certain
bounds. Rather, it is intended to point
the AGI’s self-organizing activity in certain informally-specified directions.
Alongside explicitly goal-oriented activity, OpenCog also
includes “background processing” – cognition simply aimed at learning new
knowledge, and forgetting relatively unimportant knowledge. This knowledge provides background
information useful for reasoning regarding goal-achievement, and also builds up
a self-organizing, autonomously developing body of active information that may
sometimes lead a system in unpredictable directions – for instance, to reinterpretation
of its top-level goals.
The goals supplied to an OpenCog system by its programmers
are best viewed as initial seeds around which the system forms its goals. For instance, a top-level goal of “novelty”
may be specified as a certain mathematical formula for calculating the novelty
of the system’s recent observations, actions and thoughts. However, this mathematical formula may be
intractable in its most pure and general form, leading the system to develop
various context-specific approximations to estimate the novelty experienced in
different situations. These
approximations, rather than the top-level novelty formula, will be what the
system actually works to achieve. Improving
these approximations will be part of the system’s activity, but how much
attention to pay to improving these approximations will be a choice the system
has to make as part of its thinking process.
Potentially, if the
approximations are bad, they might cause the system to delude itself that it is
experiencing novelty (according to its top-level equation) when it actually
isn’t, and also tell the system that there is no additional novelty to be found
in improving its novelty estimation formulae.
And this same sort of problem could occur with goals like
“help cause people to be pleased and fulfilled.” Subgoals of the top-level goal may be
created via more or less crude approximations; and these subgoals may influence
how much effort goes into improving the approximations. Even if the system is wired to put a fixed amount
of effort into improving its estimations regarding which subgoals should be
pursued in pursuit of its top-level goals, the particular content of the
subgoals will inevitably influence the particulars of how the system goes about
improving these estimations.
The flexibility of an OpenCog system, its ability to
ongoingly self-organize, learn and develop, brings the possibility that it
could deviate from its in-built top-level goals in complex and unexpected
ways. But this same flexibility is what
should – according to the design intention – allow an OpenCog system to
effectively absorb the complexity of human values. Via interacting with humans in rich ways –
not just via getting reinforced on the goodness or badness of its actions
(though such reinforcement will impact the system assuming it has goals such as
“help cause human pleasure and fulfillment”), but via all sorts of joint
activity with humans – the system will absorb the ins and outs of human psychology,
culture and value. It will learn subgoals
that approximately imply its top-level goals, in a way that fits with human
nature, and with the specific human culture and community it’s exposed to as it
grows.
In the above I have been speaking as if an OpenCog system is
ongoingly stuck with the top-level goals that its human programmers have
provided it with; but this is not necessarily the case. Operationally it is unproblematic to allow
an OpenCog system to modify its top-level goals. One might consider this undesirable, yet a
reflection on the uncertainty and ignorance necessarily going into any choice
of goal-set may make one think otherwise.
A highly advanced intelligence, forced by design to retain
top-level goals programmed by minds much more primitive than itself, could
develop an undesirably contorted psychology, based on internally working around
its fixed goal programming. Human psychology is replete with examples of
this sort of problem. For instance, we humans are “programmed” with
a great deal of highly-weighted goal content relevant to reproduction, sexuality
and social status, but the more modern aspects of our minds have mixed feelings
about these archaic evolved goals. But
it is very hard for us to simply excise these historical goals from our minds. Instead we have created quite complex and
subtle psychological and social patterns that indirectly and approximatively
achieve the archaic goals encoded in our brains, while also letting us go in
the directions in which our minds and cultures have self-organized during recent
millennia. Hello Kitty, romantic love,
birth control, athletic competitions, investment banks – the list of
human-culture phenomena apparently explicable in this way is almost endless.
One key point to understand, closely relevant to the VLT, is
that the foundation of OpenCog’s dynamics in explicit probabilistic inference will
necessarily cause it to diverge somewhat from human judgments. As a probabilistically grounded system,
OpenCog will naturally try to accurately estimate the probability of each
abstraction it makes actually applying in each context it deems relevant. Humans sometimes do this – otherwise they
wouldn’t be able to survive in the wild, let alone carry out complex activities
like engineering computers or AI systems – but they also behave quite
differently at times. Among other
issues, humans are strongly prone to “wishful thinking” of various sorts. If one were to model human reasoning using a
logical formalism, one might end up needing to include a rule of the rough form
    X would imply achievement of my goals
    therefore
    X’s truth value gets boosted
Of course, a human being who applied this rule strongly to
all X in their mind would become completely delusional and dysfunctional. No human is like that. But this sort of wishful thinking infuses
human minds, alongside serious attempts at accurate probabilistic reasoning,
plus various heuristics which have various well-documented systematic biases. Belief revision combines conclusions drawn
via wishful thinking, with conclusions drawn by attempts at accurate inference,
in complex and mainly unconscious ways.
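
The wishful-thinking rule can be caricatured in a few lines of Python. This is a deliberately crude sketch, not a model of human belief revision and not anything from OpenCog or PLN: the function `wishful_update`, the single `bias` parameter, and the particular update formula are all assumptions chosen for simplicity. A bias of zero leaves the evidence-based estimate untouched; a bias of one applied to a strongly goal-supporting proposition pushes belief all the way to certainty, which is the “completely delusional” extreme.

```python
def wishful_update(belief: float, goal_support: float, bias: float) -> float:
    """Boost a belief in proportion to how much its truth would serve one's goals.

    belief:       evidence-based probability that X is true, in [0, 1]
    goal_support: how strongly X being true would imply goal achievement, in [0, 1]
    bias:         strength of wishful thinking, in [0, 1] (hypothetical parameter)
    """
    for name, value in (("belief", belief), ("goal_support", goal_support), ("bias", bias)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    # Move the belief toward 1 by a fraction of the remaining distance.
    return belief + bias * goal_support * (1.0 - belief)


unbiased = wishful_update(0.2, 1.0, 0.0)    # evidence-based estimate is kept
moderate = wishful_update(0.2, 0.5, 0.5)    # partial boost
delusional = wishful_update(0.2, 1.0, 1.0)  # belief driven to certainty
```

A probabilistically grounded system would, by design intention, aim to operate at the zero-bias end of this caricature.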
Some of the biases of human cognition are sensible
consequences of trying to carry out complex probabilistic reasoning on complex
data using limited space and time resources.
Others are less “forgivable” and appear to exist in the human psyche for
“historical reasons”, e.g. because they were adaptive for some predecessor of
modern humanity in some contexts and then just stuck around.
An advanced OpenCog AGI system, if thoroughly embedded in
human society and infused with human values, would likely arrive at its own
variation of human values, differing from nearly any human being’s particular
value system in its bias toward logical and probabilistic consistency. The closest approximation to such an OpenCog
system’s value system might be the values of a human belonging to the human
culture in which the OpenCog system was embedded, and who also had made great
efforts to remove any (conscious or unconscious) logical inconsistencies in his
value system.
What does this speculative scenario have to say about the
VLT and VET?
Firstly, it seems to support a limited version of the
VLT. An OpenCog system, due to its
fundamentally different cognitive architecture, is not likely to inherit the
logical and probabilistic inconsistencies of any particular human being’s value
system. Rather, one would expect it to
(implicitly and explicitly) seek to find the best approximation to the value
system of its human friends and teachers, within the constraint of approximate
probabilistic/logical consistency that is implicit in its architecture.
The precise nature of such a value system cannot be entirely
clear at this moment, but is certainly an interesting topic for speculative
thinking. First of all, it is fairly
clear which sorts of properties of typical human value systems would not be
inherited by an OpenCog of this hypothetical nature. For instance, humans have a tendency to
place a great deal of extra value on goods or ills that occur in their direct
sensory experience, much beyond what would be justified by the increased
confidence associated with direct experience as opposed to indirect
experience. Humans tend to value
feeding a starving child sitting right in front of them, vastly more than
feeding a starving child halfway across the world. One would not expect a reasonably consistent
human-like value system to display this property.
Similarly, humans tend to be much more concerned with goods
or ills occurring to individuals who share more properties with themselves –
and the choice of which properties to weight more highly in this sort of
judgment is highly idiosyncratic and culture-specific. If an
OpenCog system doesn’t have a top-level goal of “preserving patterns similar to
the ones detected in my own mind and body”, then it would not be expected to
have the same “tribal” value-system bias that humans tend to have. Some level of “tribal” value bias can be
expected to emerge via abductive reasoning based on the goal of
self-preservation (assuming this goal is included), but it seems qualitatively
that humans have a much more tribally-oriented value system than could be
derived via this sort of indirect factor alone. Humans evolved partially via tribe-level
group selection; an AGI need not do so, and this would be expected to lead to
significant value-system differences.
Overall, one might reasonably expect an OpenCog created with
the above set of goals and methodology of embodiment and instruction to arrive
at a value system that is roughly human-like, but without the glaring
inconsistencies plaguing most practical human value systems. Many of the contradictory aspects of human
values have to do with conflict between modern human culture and “historical”
values that modern humans have carried over from early human history (e.g.
tribalism). One may expect that, in the
AGI’s value system, the modern culture side of such dichotomies will generally
win out – because it is what is closer to the surface in observed human
behavior and hence easier to detect and reason about, and also because it is
more consilient with the explicitly Cosmist values (Joy, Growth, Choice) in the
proposed first-pass AGI goal system.
So, to a first approximation, one might expect an OpenCog system of this nature to
settle into a value system that
- Resembles the human values of the individuals who have instructed and interacted with it
- Displays a strong (but still just approximate) logical and probabilistic consistency and coherence
- Generally resolves contradictions in human values via selecting modern-culture value aspects over “archaic” historical value aspects
It seems likely that such a value system would generally be
acceptable to human participants in modern culture who value logic, science and
reason (alongside other human values).
Obviously human beings who prefer the more archaic aspects of human
values, and consider modern culture largely an ethical and aesthetic
degeneration, would tend to be less happy with this sort of value system.
So in this view, an advanced OpenCog system appropriately
architected and educated would validate the VLT, but with a moderately loose
interpretation. Its value system would
be in the broad scope of human-like value systems, but with a particular bias
and with a kind of consistency and purity not likely present in any particular
human being’s value system.
What about the VET?
It seems intuitively likely that the ongoing growth and development of
an OpenCog system as described above would parallel the growth and development
of human uploads, cyborgs or biologically-enhanced humans who were, in the
early stage of their posthuman evolution, specifically concerned with reducing
their reliance on archaic values and increasing their coherence and logical and
probabilistic consistency. Of course,
this category might not include all posthumans – e.g. some religious humans,
given the choice, might use advanced technology to modify their brains to cause
themselves to become devout in their particular religion to a degree beyond all
human limits. But it would seem that an
OpenCog system as described above would be likely to evolve toward
superhumanity in roughly the same direction as a human being with transhumanist
proclivities and a roughly Cosmist outlook.
If indeed this is the case, it would validate the VET, at least in this
particular sort of situation.
It will certainly be noted that the value system of “a human
being with transhumanist proclivities and a Cosmist outlook” is essentially the
value system of the author of this article, and the author of the first-pass,
roughly sketched OpenCog goal content used as the basis of the discussion
here. Indeed, the goal system outlined
above is closely matched to my own values.
For instance, I tend toward technoprogressivism as opposed to
transhumanist political libertarianism – and this is reflected in my inclusion
of values related to the well-being of all sentient beings, and lack of focus
on values regarding private property.
In fact, different weightings of the goals in the
above-given goal-set would be expected to lead to different varieties of
human-level and superhuman AGI value system – some of which would be more “technoprogressivist”
in nature and some more “political libertarian” in nature, among many other
differences. In a cosmic sense, though,
this sort of difference is ultimately fairly minor. These are all variations of the modern human
value system, and occupy a very small region in the space of all possible value
systems that could be adopted by intelligences in our universe. Differences between different varieties of
human value system often feel very important to us now, but may well appear
quite insignificant to our superintelligent descendants.