In the background, using spare time here and there, over the last few years I've managed to write down a series of sketchy research papers summarizing key aspects of what has been a long-running thread in my mind for a very long time: A general theory of general intelligence.
And by that I mean a REALLY REALLY general theory of general intelligence ... including the phenomenological aspect of "what it is to be a general intelligence" ... including consciousness from first, second and third person perspectives ... and including the dynamics via which minds help construct each other, and minds and physical reality co-create each other. But also encompassing practical information about how human brains achieve general intelligence and why they achieve it the way they do, and how to effectively create general intelligence in various synthetic substrates, such as computer software.
I certainly don't claim to be there yet. However, after a few sketchy papers hastily typed out in late nights during the last year, I feel like I finally have a complete outline of such a theory. I know what needs to be in there, and quite a lot of what should be rigorous theorems in such a theory, I now have at least in the form of rough but explicitly articulated conjectures.
In this blog post I'm going to briefly run through these various papers and explain how I believe they build together toward a GTGI. I'll also highlight some of the gaps that I think will need to be filled in to complete the GTGI story along these lines.
Starting from the philosophical, this paper
outlines a high level perspective on "life, the universe and everything" that bridges cognitive science, theoretical physics, analytical philosophy, phenomenological philosophy and more.
In part this paper was intended as a sequel to the book
that I co-edited with Damien Broderick. The book reviews some of the copious evidence that psi phenomena exist, but doesn't try to explain how they might work. The Euryphysics paper tries to outline a world-model within which a rational yet non-reductive explanation of psi might be constructed -- by constructing a very broad world-model going beyond traditional categories such as physical, mental and cultural reality.
"Euryphysics" means "the wider world" -- a core concept is that there is a broader domain of existence of which our 4D spacetime continuum and our individual minds are just small parts. The regularities governing this broader domain are not entirely "physics-like" (evolution described by concise sets of differential equations) and may be more "mind-like" in some sense. Aspects of "consciousness" may be best considered at the level of euryphysics rather than physics or individual psychology.
But how to build a theory of the Eurycosm? (Among other things, this could be a theory explaining interesting details about how mind and physical reality create each other.)
Let's start with elementary observations. The most elementary sort of observation is a distinction -- just an act of distinguishing some stuff from some other stuff. (Yes, I had some influence from G. Spencer-Brown and his friend Lou Kauffman.) This paper
Distinction Graphs and Graphtropy: A Formalized Phenomenological Layer Underlying Classical and Quantum Entropy, Observational Semantics and Cognitive Computation
introduces a theory of "distinction graphs" -- in which a link is drawn between two observations if, relative to a given observer, the observer cannot distinguish them (while it's not clarified in the paper, basically an "observation" can be considered as "something that can be distinguished"). Graphtropy is introduced as an extension of logical entropy from partitions to distinction graphs, along with extensions like probabilistic and quantum distinction graphs. An analogue of the maximum entropy principle for distinction graphs is suggested.
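To make the partition case concrete: the logical entropy of a partition is the probability that two independently drawn observations land in different blocks, i.e. 1 minus the sum of squared block probabilities. Here is a minimal Python sketch of that, plus one assumed way of carrying the idea over to graphs: count the fraction of ordered observation pairs that are not linked, i.e. that the observer can distinguish. The function names and the pair-counting definition are my illustrative assumptions here, not the definitions from the paper.

```python
from itertools import product

def logical_entropy(block_sizes):
    """Logical entropy of a partition: 1 - sum of squared block probabilities."""
    n = sum(block_sizes)
    return 1 - sum((b / n) ** 2 for b in block_sizes)

def pair_graphtropy(nodes, indistinguishable):
    """Assumed graph analogue: fraction of ordered pairs the observer CAN distinguish.

    `indistinguishable` is a set of frozenset({u, v}) links; each observation
    is treated as indistinguishable from itself.
    """
    nodes = list(nodes)
    distinguished = sum(
        1
        for u, v in product(nodes, repeat=2)
        if u != v and frozenset((u, v)) not in indistinguishable
    )
    return distinguished / len(nodes) ** 2

# A partition {a, b} | {c, d} seen as a distinction graph: links inside blocks only.
links = {frozenset(("a", "b")), frozenset(("c", "d"))}
```

When the links form disjoint cliques (a partition), the two functions agree. The graph version also accepts non-transitive cases, such as a linked to b and b linked to c while a and c remain distinguishable, which a partition cannot express.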
Graphtropy gives a broad view of surprisingness, which has many uses, including giving a foundation for notions of time. As suggested in the Euryphysics paper, local time flow may be interpreted in terms of surprisingness gradients, and global time axes via stitching together compatible local time flows.
Among the various tools that can be useful for analyzing distinction graphs and associated structures is the notion of simplicity vs. complexity. This paper
attempts an axiomatic characterization of "what is simplicity"? Conventional simplicity measures like minimal program length emerge as a special case.
Simplicity allows articulation of "what is a pattern"? (A pattern is a representation-as-something-simpler.)
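A toy Python sketch of "representation as something simpler", under loudly stated assumptions: the simplicity measure is just the length of an object's Python repr, and the only kind of representation allowed is a hypothetical (unit, count) pair that expands by repetition. Neither choice comes from the paper; they are stand-ins to make the definition checkable.

```python
def cost(obj):
    """Stand-in simplicity measure: length of the object's Python repr."""
    return len(repr(obj))

def expand(rep):
    """Hypothetical 'production' step: a (unit, count) pair expands to unit * count."""
    unit, count = rep
    return unit * count

def is_pattern(rep, entity):
    """rep is a pattern in entity if it reproduces entity and is simpler than it."""
    return expand(rep) == entity and cost(rep) < cost(entity)

entity = "ab" * 50
```

With these definitions, `("ab", 50)` counts as a pattern in `entity`, while the trivial representation `(entity, 1)` reproduces it without being any simpler, so it does not.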
And this allows a nice formalization of the Peircean idea of "the tendency to take habits" -- which is equivalent to Smolin's Precedence Principle in quantum mechanics, or Sheldrake's morphic resonance principle, a plausible high level explanation for psi phenomena.
One would also like to construct something like probability theory that is natural on graphs (e.g. distinction graphs), in the same way that conventional probability is natural on sets. In this paper (inspired heavily by Knuth and Skilling's classic paper Foundations of Inference and its sequels),
I bite this bullet, giving a specific way of constructing intuitionistic "generalized probabilities" on top of graphs, hypergraphs, metagraphs and similar structures. The approach relies on some way of assigning "costs" to different graph transformations -- which is provided e.g. if one has a simplicity measure in hand.
It's also the case that if the nice symmetries needed to construct probabilities only hold approximately for a given domain -- then you get an uncertainty measure on that domain that is approximately probabilistic. I.e. the dependence of probability theory's rules on the underlying symmetry axioms is reasonably smooth, as I argued here:
Probability Theory Ensues from Assumptions of Approximate Consistency: A Simple Derivation and its Implications for AGI
(I only explicitly considered the case of classical probability theory, but the same arguments would hold for the intuitionistic case.)
Once you have probabilities, you have second order, third order and then... infinite-order probabilities (defined as distributions over spaces of infinite-order probabilities):
Are these useful? Well one can construct interesting models of aspects of phenomenological experience, using non-well-founded set theory (aka hypersets),
and layering uncertainty onto these models, you get infinite-order probabilities.
There is some unification not yet written out here: The hypersets I consider are modeled by apg's ("accessible pointed graphs", i.e. a digraph with a distinguished node N from which all other nodes can be reached), and a directed distinction graph can be interpreted as a patchwork of apg's. One can build apg's up from distinction graphs, though I haven't written up that paper yet. Basically you model distinctioning as a directional process -- you ask: if an observer has already made observation A, is it able to make observation B, considering B as distinct from A? This gives a directed distinction graph, which is then a patchwork of apg's, i.e. a mesh of overlapping hypersets.
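The apg condition itself is easy to check mechanically. Here is a minimal Python sketch using breadth-first search; the dict-of-successors encoding and the name `is_apg` are just illustrative choices.

```python
from collections import deque

def is_apg(edges, point):
    """True if every node in the digraph is reachable from `point`.

    `edges` maps each node to an iterable of its successors.
    """
    nodes = set(edges) | {v for succs in edges.values() for v in succs} | {point}
    seen, queue = {point}, deque([point])
    while queue:
        u = queue.popleft()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes

# The hyperset Omega = {Omega} is a single node with a self-loop.
omega = {"N": ["N"]}
```

The self-loop in `omega` is exactly the kind of structure that well-founded set theory forbids and hypersets allow; the reachability check doesn't care about cycles either way.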
Given probability distributions and simplicity measures, one can start measuring intelligence in traditional ways ("traditional" in the sense of Legg and Hutter or my first book The Structure of Intelligence) ... one can look at intelligence as the ability to achieve complex goals in complex environments using limited resources...
Though it is also worth keeping in mind the wider nature of intelligence as Weaver articulated so richly in his PhD thesis
Another paper I haven't yet written up is a formalization of open-ended intelligence in terms of richness of pattern creation.
One can formalize the three key values of "Joy, Growth and Choice" in terms of graphtropy and pattern theory (Joy is patterns continuing, Growth is new patterns being created, Choice is graphtropy across pattern space) -- so relative to any local time-axis one can look at the amount of Joy/Growth/Choice being manifested, which is one way of looking at the amount of open-ended intelligence.
One way to move from these intriguing generalities toward specific cognitive, computational and physics theories is to assume a specific computational model. In this paper
I articulate what seems an especially natural computational model for general intelligence (CoDDs, Combinatorial Decision Directed-acyclic-graphs), and I conjecture that if one assumes this computational model, then some nice compatibilities between graphtropic measures of complexity and simplicity-theoretic measures of complexity emerge. (Actually the paper talks about correlating algorithmic information with logical entropy but the generalization to graphtropy is not a big leap.)
A CoDD is basically a decision tree that is recursively nested so that a whole decision tree can serve as an input to a decision tree, and augmented with the ability to replace two identical subtrees with two instances of a certain token (memo-ization). Repetition-replacement and recursion are enough to tweak decision trees into a Turing-complete computational model (which is basically the insight that SK-combinator calculus is universal, phrased a bit differently).
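The repetition-replacement half of this can be illustrated with a small Python sketch that turns a decision tree into a DAG by giving identical subtrees a single shared id. This is ordinary hash-consing, used here as a stand-in; it is not the CoDD construction from the paper, it leaves out the recursive nesting entirely, and the tuple encoding `(test, if_true, if_false)` is an assumption made for the example.

```python
def share_subtrees(tree, table=None):
    """Return (node_id, table), where identical subtrees share one table entry.

    A tree is a leaf (any non-tuple value) or a tuple (test, if_true, if_false).
    """
    if table is None:
        table = {}
    if isinstance(tree, tuple):
        test, yes, no = tree
        yes_id, _ = share_subtrees(yes, table)
        no_id, _ = share_subtrees(no, table)
        key = ("node", test, yes_id, no_id)
    else:
        key = ("leaf", tree)
    # setdefault assigns a fresh id only the first time a structure is seen
    node_id = table.setdefault(key, len(table))
    return node_id, table

def tree_size(tree):
    """Number of nodes in the unshared tree."""
    if isinstance(tree, tuple):
        return 1 + tree_size(tree[1]) + tree_size(tree[2])
    return 1

sub = ("x>0", "accept", "reject")
tree = ("y>0", sub, sub)
root, dag = share_subtrees(tree)
```

Here the unshared tree has 7 nodes while the shared table has 4 entries, since the repeated subtree `sub` and its leaves are stored once and referenced twice.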
This computational model also leads to some interesting extensions of the basic model of pattern as "representation as something simpler", including the notion of "quattern" -- the quantum analogue of a classical pattern.
The paper doesn't draw any connections with distinction graphs -- but it's quite interesting to look at CoDDs whose leaves are observations related in a distinction graph.
My primary focus is on applying these GTGI-ish ideas to AI and cognitive science, but the applications to physics also can't be overlooked. In this verrrry sketchy notes-to-self type paper
I outline a possible path to creating unified (standard model + gravity) physics models via hypergraph models (including hypergraph links with causal interpretation). Spacetime is a hypergraph and event probabilities are estimated using Feynman type sums that add up terms corresponding to multiple spacetimes as well as multiple possible scenarios within each spacetime.
Ben Dribus, a mathematician who has developed his own much more in-depth graph-based physics models, has (in a personal communication) sketched a dynamical equation that works in my causal web model.
Another paper not yet written up regards the formal similarities between conservation of energy in physics and conservation of evidence (i.e. avoidance of double counting of evidence) in logic. One can view energy as the form that observation takes in a certain logic (that has observational semantics), and then physical dynamics as a process of derivation in this logic, with the consistency of the logic depending on the conservation of energy (which avoids double-counting evidence).
A recent paper with a messy title extends this physics-ish line of thinking in a direction that also encompasses the cognitive:
Maximal Algorithmic Caliber and Algorithmic Causal Network Inference: General Principles of Real-World General Intelligence?
The basic idea here was to come up with physics-ish "dynamical laws of cognition" by replacing Shannon entropy in MaxEnt-type principles with algorithmic information. Not yet done is to extend this to graphtropy -- by extending the Maximum Caliber Principle to distinction graphs that evolve over time, and then creating a corresponding form of Maximal Algorithmic Caliber that works with Combinatorial Decision Dags whose primitives are observations in a distinction graph.
The "maximum caliber principle" is extended to a "maximum algorithmic caliber principle" that characterizes the possible worlds most likely to accord with a given set of observations -- one should assume the world has evolved with the maximum algorithmic caliber consistent with observations (basically, the most computationally dense way consistent with observations). Basically, this just means that if you don't know how the world has made your observations come about, you need to make some assumption. Lacking some simplicity prior, there are more possible worlds involving a lot of distinctions than a few, so the odds will be high (based on simple Principle of Indifference type symmetry arguments) that the underlying reality makes a lot of distinctions. Given a simplicity prior, the most likely worlds will be the ones that make about as many distinctions as the prior considers in the "reasonably likely" range.
Algorithmic Markov processes, the algorithmic-information analogue of ordinary statistical Markov processes, turn out to be the most rational hypothesis to use when inferring processes based on data. There are more possible processes similar to an algorithmic Markov process that obey your given constraints, than any other sort of processes. If you looked in the mind of a near maximally generally intelligent AIXI-tl type agent, you would see that it was implicitly or explicitly making the assumption that the world is often roughly an algorithmic Markov process.
To move from these highly general "laws of mind" toward laws of human-like mind one needs to look at the special situations for which human-like minds evolved. In the paper
I suggest that symmetries and other regularities in the environments and goals that an intelligence needs to deal with, should be mappable via (uncertain) morphisms into corresponding symmetries/regularities in the structure and dynamics of the intelligent system itself. I roughly formalize this correspondence in terms of category theory (which ultimately needs an intuitionistic probability-like quantity like the one I mentioned above, which however I only discovered/invented a few years after writing the Mind-World Correspondence paper).
As for what are the symmetries and regularities human-like minds in particular need to deal with, I made some concrete suggestions in
THE EMBODIED COMMUNICATION PRIOR: A CHARACTERIZATION OF GENERAL INTELLIGENCE IN THE CONTEXT OF EMBODIED SOCIAL INTERACTION
It should be noted that my suggestions are far more specific than what the great Yoshua Bengio proposed in his "consciousness prior" paper. Basically there he suggests that AGI needs a prior distribution that favors joint distributions that factor into forms where most weight goes to a small number of factors. This is a very sensible idea and does indeed tie in with the way working memory works in current human and AI minds. However, I think the structure and dynamics of human-like minds have been adapted heavily to considerably more specialized assumptions to do with modeling events in 4D spacetime, and specifically to handling communication among spatiotemporally embodied agents who share the same sensation and action space.
One feature of the environments and goals human-like minds are faced with, is that they tend to factorize into qualitatively different types of knowledge / perception / action -- e.g. procedural vs. declarative/semantic vs. attentional vs. sensory, etc. This leads to minds that have distinct yet closely coupled subcomponents that need to have robust capability to help each other out of difficult cognitive spots -- "Cognitive Synergy", which underpins the OpenCog AGI design I've been working on for 1-2 decades (depending how you count). The different types of human memory correspond closely to different aspects of the everyday human physical and social environment.
The Embodied Communication Prior includes "tendency to take habits" as a corollary. This leads to the amusing notion that, via reflexive application of morphic resonance to itself, the human sphere within our physical spacetime may have some "spooky synchronistic correlation" with other portions of the Eurycosm that also happen to display the tendency to take habits!
More prosaically, the paper
formalizes the concept of cognitive synergy on a category-theoretic foundation.
What is not articulated fully there is that, ultimately, the cognitive processing of real-world AGI systems can be viewed as: a set of interacting cognitive algorithms, each of which in a sense results from doing program specialization on the universal algorithm "form an algorithmic Markov model consistent with one's observations, and use it to drive inference about what procedures will achieve one's goals given the observed context", relative to focus on a specific sort of knowledge, memory or situation (e.g. procedural, sensory, declarative...). These specialized cognitive algorithms must be learned/evolved based on multiple constraints including energetic usage, minimizing spatial extent and maximizing processing speed, and interoperability among the different cognitive algorithms (so that they can see each others' internal states so as to help each other out when they get stuck).
Design of a framework like OpenCog may be viewed as performing this sort of program specialization "by hand", as we don't have automated program specializers capable of this degree of complexity. An AGI program specializer will be able to do it, but then we have a chicken-egg problem -- which is solved by human AGI system designers performing the first round of the iteration.
explains how the connection between language, action, perception and memory works in terms of the category-theoretic model of cognitive synergy.
gives some speculative ideas regarding how the human brain may implement some of these abstract structures (using multiple neural-net modules interconnected, e.g. different, closely cooperating architectures for cortex and hippocampus -- but not as simplistically interconnected as in currently popular deep or shallow neural net architectures).
This lets us revisit the vexed issue of "consciousness." My view is that consciousness is a universal property immanent in all existence, but that "human-like consciousness" has some special properties, which come out of the Embodied Communication Prior along with other factors. This paper
aims to identify what is special about human-like consciousness as opposed to other flavors.
This includes physical and computationally-cognitive correlates of the hyperset models of self, will and awareness alluded to earlier. Mapping between distinction graphs and hyperset apg's, can be seen as mapping between sensate-oriented and reflection-oriented reflexive meta-views of the same base subjective experience.
deals with the question of identity under conditions of gradual change -- arguing that if a mind changes slowly enough that, at each stage, it models where it came from, where it is and where it's going in terms of a unified self-construct.... then in essence it IS a unified self. This IMO solves the issue of "continuity of consciousness and identity" in a mind uploading context.
To realize these abstract GTGI ideas in practical AGI systems, one needs a series of bridging formalisms, toolkits and systems. This is something I'm currently working on within the TrueAGI / Atomese 2.0 research initiative (still an early-stage non-public thing), but one paper has recently crawled out of this particular research swamp:
Among other things, what is advocated there is a gradually typed approach to AI programming, wherein different cognitive processes corresponding to different types of memory/knowledge are realized using different type systems. Casting between these type systems is part of the process of cognitive synergy.
There is a Curry-Howard correspondence between a gradually typed language like this, and a paraconsistent logic. As cognitive processes must be probabilistic, what we ultimately have is a Curry-Howard correspondence between intuitionistically-probabilistic paraconsistent logic and a gradually typed probabilistic functional programming language.
The intuitionistic aspect of this logic maps into the absence of highly general continuation-passing features in the language -- and it means that ultimately the logic can be reduced to operations on distinction graphs, and the corresponding programs can be reduced to e.g. CoDDs operating on elementary observations drawn from distinction graphs.
An AGI-oriented hypergraph knowledge store like the OpenCog Atomspace can be viewed as a CoDD that operates on the elementary observations made by a specific cognitive system, and abstracts from these observations to form programs for generating sets of observations from more compact descriptions. These include observations of what action-combinations tend to lead to what goals in what contexts. A programming language like Atomese 2.0 is a concise, workable way of creating higher level program constructs equivalent ultimately to CoDDs over distinction graphs.
So there you go. Turning all the above papers into a single coherent narrative would be many months of full-time work -- and then turning all the conjectures in the papers into actual theorems would be probably several years of full-time work. I'm not sure when I'll get to all that, since I have an insane number of other things on my plate. But I do feel like it's finally time for the "weaving together and rigorizing" phase of my GTGI quest -- I think that with the most recent few papers, among the ones listed above, the core ideas needed have finally fallen into place!