Here we sketch a possible explanation for the well-known difficulty of measuring intermediate progress toward human-level AGI is provided, via extending the notion of cognitive synergy to a more refined notion of ”tricky cognitive synergy.”
The Puzzle: Why Is It So Hard to Measure Partial Progress Toward Human-Level AGI?
A recurrent difficulty in the AGI field is the difficulty of creating a good test for intermediate progress toward the goal of human-level AGI.
It’s not entirely straightforward to create tests to measure the final achievement of human-level AGI, but there are some fairly obvious candidates here. There’s the Turing Test (fooling judges into believing you’re human, in a text chat) the video Turing Test, the Robot College Student test (passing university, via being judged exactly the same way a human student would), etc. There’s certainly no agreement on which is the most meaningful such goal to strive for, but there’s broad agreement that a number of goals of this nature basically make sense.
On the other hand, how does one measure whether one is, say, 50 percent of the way to human-level AGI? Or, say, 75 or 25 percent?
It’s possible to pose many ”practical tests” of incremental progress toward human-level AGI, with the property that IF a proto-AGI system passes the test using a certain sort of architecture and/or dynamics, then this implies a certain amount of progress toward human-level AGI based on particular theoretical assumptions about AGI. However, in each case of such a practical test, it seems intuitively likely to a significant percentage of AGI researcher that there is some way to ”game” the test via designing a system specifically oriented toward passing that test, and which doesn’t constitute dramatic progress toward AGI.
Some examples of practical tests of this nature would be
- The Wozniak ”coffee test”: go into an average American house and figure out how to make coffee, including identifying the coffee machine, figuring out what the buttons do, finding the coffee in the cabinet, etc.
- Story understanding – reading a story, or watching it on video, and then answering questions about what happened (including questions at various levels of abstraction)
- Passing the elementary school reading curriculum (which involves reading and answering questions about some picture books as well as purely textual ones)
- Learning to play an arbitrary video game based on experience only, or based on experience plus reading instructions
One interesting point about tests like this is that each of them seems to some AGI researchers to encapsulate the crux of the AGI problem, and be unsolvable by any system not far along the path to human-level AGI – yet seems to other AGI researchers, with different conceptual perspectives, to be something probably game-able by narrow-AI methods. And of course, given the current state of science, there’s no way to tell which of these practical tests really can be solved via a narrow-AI approach, except by having a lot of people try really hard over a long period of time.
A question raised by these observations is whether there is some fundamental reason why it’s hard to make an objective, theory-independent measure of intermediate progress toward advanced AGI. Is it just that we haven’t been smart enough to figure out the right test – or is there some conceptual reason why the very notion of such a test is problematic?
We don’t claim to know for sure – but in this brief note we’ll outline one possible reason why the latter might be the case.
Is General Intelligence Tricky?
The crux of our proposed explanation has to do with the sensitive dependence of the behavior of many complex systems on the particulars of their construction. Often-times, changing a seemingly small aspect of a system’s underlying structures or dynamics can dramatically affect the resulting high-level behaviors. Lacking a recognized technical term to use here, we will refer to any high-level emergent system property whose existence depends sensitively on the particulars of the underlying system as tricky. Formulating the notion of trickiness in a mathematically precise way is a worthwhile pursuit, but this is a qualitative essay so we won’t go that direction here.
Thus, the crux of our explanation of the difficulty of creating good tests for incremental progress toward AGI is the hypothesis that general intelligence, under limited computational resources, is tricky.
Now, there are many reasons that general intelligence might be tricky in the sense we’ve defined here, and we won’t try to cover all of them here. Rather, we’ll focus on one particular phenomenon that we feel contributes a significant degree of trickiness to general intelligence.
Is Cognitive Synergy Tricky?
One of the trickier aspects of general intelligence under limited resources, we suggest, is the phenomenon of cognitive synergy.
The cognitive synergy hypothesis, in its simplest form, states that human-level AGI intrinsically depends on the synergetic interaction of multiple components (for instance, as in the OpenCog design, multiple memory systems each supplied with its own learning process). In this hypothesis, for instance, it might be that there are 10 critical components required for a human-level AGI system. Having all 10 of them in place results in human-level AGI, but having only 8 of them in place results in having a dramatically impaired system – and maybe having only 6 or 7 of them in place results in a system that can hardly do anything at all.
Of course, the reality is almost surely not as strict as the simplified example in the above paragraph suggests. No AGI theorist has really posited a list of 10 crisply-defined subsystems and claimed them necessary and sufficient for AGI. We suspect there are many different routes to AGI, involving integration of different sorts of subsystems. However, if the cognitive synergy hypothesis is correct, then human-level AGI behaves roughly like the simplistic example in the prior paragraph suggests. Perhaps instead of using the 10 components, you could achieve human-level AGI with 7 components, but having only 5 of these 7 would yield drastically impaired functionality – etc. Or the same phenomenon could be articulated in the context of systems without any distinguishable component parts, but only continuously varying underlying quantities. To mathematically formalize the cognitive synergy hypothesis in a general way becomes complex, but here we’re only aiming for a qualitative argument. So for illustrative purposes, we’ll stick with the ”10 components” example, just for communicative simplicity.
Next, let’s suppose that for any given task, there are ways to achieve this task using a system that is much simpler than any subset of size 6 drawn from the set of 10 components needed for human-level AGI, but works much better for the task than this subset of 6 components(assuming the latter are used as a set of only 6 components, without the other 4 components).
Note that this supposition is a good bit stronger than mere cognitive synergy. For lack of a better name, we’ll call it tricky cognitive synergy. The tricky cognitive synergy hypothesis would be true if, for example, the following possibilities were true:
- creating components to serve as parts of a synergetic AGI is harder than creating components intended to serve as parts of simpler AI systems without synergetic dynamics
- components capable of serving as parts of a synergetic AGI are necessarily more complicated than components intended to serve as parts of simpler AGI systems.
These certainly seem reasonable possibilities, since to serve as a component of a synergetic AGI system, a component must have the internal flexibility to usefully handle interactions with a lot of other components as well as to solve the problems that come its way. In terms of our concrete work on the OpenCog integrative proto-AGI system, these possibilities ring true, in the sense that tailoring an AI process for tight integration with other AI processes within OpenCog, tends to require more work than preparing a conceptually similar AI process for use on its own or in a more task-specific narrow AI system.
It seems fairly obvious that, if tricky cognitive synergy really holds up as a property of human-level general intelligence, the difficulty of formulating tests for intermediate progress toward human-level AGI follows as a consequence. Because, according to the tricky cognitive synergy hypothesis, any test is going to be more easily solved by some simpler narrow AI process than by a partially complete human-level AGI system.
We haven’t proved anything here, only made some qualitative arguments. However, these arguments do seem to give a plausible explanation for the empirical observation that positing tests for intermediate progress toward human-level AGI is a very difficult prospect. If the theoretical notions sketched here are correct, then this difficulty is not due to incompetence or lack of imagination on the part of the AGI community, nor due to the primitive state of the AGI field, but is rather intrinsic to the subject matter. And if these notions are correct, then quite likely the future rigorous science of AGI will contain formal theorems echoing and improving the qualitative observations and conjectures we’ve made here.
If the ideas sketched here are true, then the practical consequence for AGI development is, very simply, that one shouldn’t worry all that much about producing compelling intermediary results. Just as 2/3 of a human brain may not be much use, similarly, 2/3 of an AGI system may not be much use. Lack of impressive intermediary results may not imply one is on a wrong development path; and comparison with narrow AI systems on specific tasks may be badly misleading as a gauge of incremental progress toward human-level AGI.
Hopefully it’s clear that the motivation behind the line of thinking presented here is a desire to understand the nature of general intelligence and its pursuit – not a desire to avoid testing our AGI software! Truly, as AGI engineers, we would love to have a sensible rigorous way to test our intermediary progress toward AGI, so as to be able to pose convincing arguments to skeptics, funding sources, potential collaborators and so forth -- as well as just for our own edification. We really, really like producing exciting intermediary results, on projects where that makes sense. Such results, when they come, are extremely informative and inspiring to the researchers as well as the rest of the world! Our motivation here is not a desire to avoid having the intermediate progress of our efforts measured, but rather a desire to explain the frustrating (but by now rather well-established) difficulty of creating such intermediate goals for human-level AGI in a meaningful way.
If we or someone else figures out a compelling way to measure partial progress toward AGI, we will celebrate the occasion. But it seems worth seriously considering the possibility that the difficulty in finding such a measure reflects fundamental properties of the subject matter – such as the trickiness of cognitive synergy and other aspects of general intelligence.