
Tuesday, May 12, 2020

Morphic Anti-Resonance

Morphic resonance -- in which patterns that have previously occurred are more likely to re-occur -- is a powerful force, characteristic of human minds and cultures and also of the quantum world (cf. Smolin’s Precedence Principle).

But it’s also interesting to think about cases in which morphic anti-resonance holds...

I.e. with morphic anti-resonance, when a pattern occurs, it then becomes LESS likely to occur again than it otherwise would have been...

Advanced financial markets could perhaps be like this (because a pattern, once it has occurred, is an exploitable behavior, so whoever sees that the pattern has already been exploited may be extra-disinclined to enact it again)

The decline effect in psi could also be like this... once an experiment has worked, anti-resonance will cause it to stop working…

Now, it might seem anti-resonance is also a meta-level regularity exploitable by intelligence ... except that via reflexive application to itself, once anti-resonance kicks in, it will kick itself out ;D

Trickstery indeed…

What if clusters of morphic resonance are somehow balanced by clusters of morphic anti-resonance, leaving the overall cosmos morphically neutral-on-average but wildly high-variance...?

Toward a Formal Model of This Madness/Anti-Madness

If we look at the distribution over patterns in the multiverse, where p(R) indicates the probability of observing pattern R during a certain big chunk of spacetime, then, compared to a multiverse with no such oddities:

  • A multiverse w/ morphic resonance will have a more pointy, peaked (i.e. lower entropy) distribution p()
  • A multiverse w/ morphic anti-resonance will have a flatter (i.e. higher entropy) distribution p()
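
The entropy claim in the two bullets above can be checked on a toy example. This is only a sketch under a loudly stated assumption: I model resonance as raising each pattern probability to a power beta > 1 and renormalizing (frequent patterns get more frequent), and anti-resonance as the same operation with beta < 1. The `tilt` function, the `baseline` numbers and the beta values are all made up for illustration and are not taken from any formal model.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def tilt(p, beta):
    """Raise each probability to the power beta and renormalize.

    ASSUMPTION: beta > 1 stands in for morphic resonance (common patterns
    become more common); beta < 1 stands in for anti-resonance.
    """
    powered = [x ** beta for x in p]
    total = sum(powered)
    return [x / total for x in powered]

# Hypothetical pattern distribution for a "no oddities" multiverse.
baseline = [0.4, 0.3, 0.2, 0.1]
resonant = tilt(baseline, 2.0)       # peakier
anti_resonant = tilt(baseline, 0.5)  # flatter

for name, dist in [("resonant", resonant),
                   ("baseline", baseline),
                   ("anti-resonant", anti_resonant)]:
    print(f"{name:14s} entropy = {entropy(dist):.3f}")
```

On this toy input the resonant distribution comes out with the lowest entropy and the anti-resonant one with the highest, matching the peaked-vs-flat picture.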

So if we assume we have a multi-multiverse described as a probability distribution over multiverses, then we may posit that the average multiverse has a no-resonance p(), but this is achieved via having some multiverses with higher-entropy p() and some with lower-entropy p()

Path integrals must then be taken in the multi-multiverse, not in any base multiverse.

Psi would then be a mix of

  • Morphic resonance and anti-resonance phenomena
  • Shifts from one multiverse to another, which involve shifts in the entropy of the multiversal pattern-probability distribution

Also note -- an observing mind's reality may be a paraconsistent patchwork of probability distributions (multiverses) rather than a single consistent multiverse...

Dialectics of Creativity

Going further out on the limb -- Perhaps morphic resonance and anti-resonance enact the dialectic dance of creation vs. destruction?

I am reminded of economics approaches where anti-money is used in place of debt.

In economics the conserved-ish quantity is money, in physics it's energy, and in morphic pattern-omics it's synchronicity (degree of spooky resonance).   

Conservation of synchronicity suggests that a bit of morphic resonance over here is balanced out by a bit of morphic anti-resonance over there.   That's how morphic resonance can exist without morphically resonating the cosmos into a repetitive mush...

GTGI -- General Theory of General Intelligence... coming gradually...

In the background, using spare time here and there, over the last few years I've managed to write down a series of sketchy research papers summarizing key aspects of what has been a long-running thread in my mind for a very long time: A general theory of general intelligence.

And by that I mean a REALLY REALLY general theory of general intelligence ... including the phenomenological aspect of "what it is to be a general intelligence" ... including consciousness from first, second and third person perspectives ... and including the dynamics via which minds help construct each other, and minds and physical reality co-create each other.   But also encompassing practical information about how human brains achieve general intelligence and why they achieve it the way they do, and how to effectively create general intelligence in various synthetic substrates, such as computer software.

I certainly don't claim to be there yet.   However, after a few sketchy papers hastily typed out in late nights during the last year, I feel like I finally have a complete outline of such a theory.   I know what needs to be in there, and quite a lot of what should be rigorous theorems in such a theory, I now have at least in the form of rough but explicitly articulated conjectures.

In this blog post I'm going to briefly run through these various papers and explain how I believe they build together toward a GTGI.   I'll also highlight some of the gaps that I think will need to be filled in to complete the GTGI story along these lines.

Starting from the philosophical, this paper

outlines a high level perspective on "life, the universe and everything" that bridges cognitive science, theoretical physics, analytical philosophy, phenomenological philosophy and more.
In part this paper was intended as a sequel to the book

that I co-edited with Damien Broderick.   The book reviews some of the copious evidence that psi phenomena exist, but doesn't try to explain how they might work.   The Euryphysics paper tries to outline a world-model within which a rational yet non-reductive explanation of psi might be constructed -- by constructing a very broad world-model going beyond traditional categories such as physical, mental and cultural reality.

"Euryphysics" means "the wider world" -- a core concept is that there is a broader domain of existence of which our 4D spacetime continuum and our individual minds are just small parts.   The regularities governing this broader domain are not entirely "physics-like" (evolution described by concise sets of differential equations) and may be more "mind-like" in some sense.   Aspects of "consciousness" may be best considered at the level of euryphysics rather than physics or individual psychology.

But how to build a theory of the Eurycosm?   (Among other things, this could be -- a theory explaining interesting details about how mind and physical reality create each other.)

Let's start with elementary observations.   The most elementary sort of observation is a distinction -- just an act of distinguishing some stuff from some other stuff.  (Yes, I had some influence from G. Spencer-Brown and his friend Lou Kauffman.)    This paper

introduces a theory of "distinction graphs" -- in which a link is drawn between two observations if, relative to a given observer, the observer cannot distinguish them (while it's not clarified in the paper, basically an "observation" can be considered as "something that can be distinguished").   Graphtropy is introduced as an extension of logical entropy from partitions to distinction graphs, along with extensions like probabilistic and quantum distinction graphs.   An analogue of the maximum entropy principle for distinction graphs is suggested.

Graphtropy gives a broad view of surprisingness, which has many uses, including giving a foundation for notions of time.   As suggested in the Euryphysics paper, local time flow may be interpreted in terms of surprisingness gradients, and global time axes via stitching together compatible local time flows.
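
To make the distinction-graph idea concrete, here is a minimal sketch. It ASSUMES graphtropy can be taken as the fraction of ordered pairs of observations that the observer can tell apart, normalized by n*n so that it coincides with logical entropy (1 minus the sum of squared block sizes) when the "cannot distinguish" links happen to form a partition. The paper's actual definition and normalization may differ; the function names here are mine.

```python
def graphtropy(observations, indistinguishable_links):
    """Fraction of ordered pairs (a, b) that the observer CAN distinguish.

    A link {a, b} means the observer cannot tell a from b. Pairs (a, a)
    count as indistinguishable. ASSUMED normalization: n * n.
    """
    links = {frozenset(link) for link in indistinguishable_links}
    n = len(observations)
    distinguished = sum(
        1
        for a in observations
        for b in observations
        if a != b and frozenset((a, b)) not in links
    )
    return distinguished / (n * n)

def logical_entropy(blocks):
    """Logical entropy of a partition: 1 - sum of squared block proportions."""
    n = sum(len(block) for block in blocks)
    return 1.0 - sum((len(block) / n) ** 2 for block in blocks)

# Partition case: {a, b} and {c, d} are the indistinguishability classes.
obs = ["a", "b", "c", "d"]
print(graphtropy(obs, [("a", "b"), ("c", "d")]))    # same as logical entropy
print(logical_entropy([["a", "b"], ["c", "d"]]))

# Non-transitive case, which no partition can express:
# a ~ b and b ~ c, yet a and c are distinguishable.
print(graphtropy(["a", "b", "c"], [("a", "b"), ("b", "c")]))
```

The second call is the point of moving from partitions to graphs: indistinguishability need not be transitive, and the pair-counting measure still makes sense.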

Among the various tools that can be useful for analyzing distinction graphs and associated structures is the notion of simplicity vs. complexity.   This paper

attempts an axiomatic characterization of "what is simplicity"?   Conventional simplicity measures like minimal program length emerge as a special case.

Simplicity allows articulation of "what is a pattern"?   (A pattern is a representation-as-something-simpler.)
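
As a rough illustration of "pattern = representation as something simpler", the sketch below uses zlib-compressed length as a stand-in simplicity measure. That choice is purely my ASSUMPTION for the example (a crude, computable proxy for program length); the axiomatic treatment in the paper is far more general, and `representation_cost` and `has_pattern` are hypothetical names.

```python
import zlib

def representation_cost(data: bytes) -> int:
    """Size in bytes of a zlib-compressed representation of the data.

    ASSUMPTION: compressed length is used as a crude simplicity measure.
    """
    return len(zlib.compress(data, 9))

def has_pattern(data: bytes) -> bool:
    """True if the compressed representation is simpler (shorter) than the data."""
    return representation_cost(data) < len(data)

repetitive = b"ab" * 500   # 1000 bytes with obvious regularity
tiny = b"q7#Zp"            # too short for compression to pay off

print(len(repetitive), representation_cost(repetitive), has_pattern(repetitive))
print(len(tiny), representation_cost(tiny), has_pattern(tiny))
```

The compressor finds a representation of the repetitive string that is much shorter than the string itself, so in this toy sense the string contains a pattern; the five-byte string does not, because the representation costs more than the thing represented.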

And this allows a nice formalization of the Peircean idea of "the tendency to take habits" -- which is equivalent to Smolin's Precedence Principle in quantum mechanics, or Sheldrake's morphic resonance principle, a plausible high level explanation for psi phenomena.

One would also like to construct something like probability theory that is natural on graphs (e.g. distinction graphs), in the same way that conventional probability is natural on sets.   In this paper (inspired heavily by Knuth and Skilling's classic paper Foundations of Inference and its sequels),

I bite this bullet, giving a specific way of constructing intuitionistic "generalized probabilities" on top of graphs, hypergraphs, metagraphs and similar structures.   The approach relies on some way of assigning "costs" to different graph transformations -- which is provided e.g. if one has a simplicity measure in hand.

It's also the case that if the nice symmetries needed to construct probabilities only hold approximately for a given domain -- then you get an uncertainty measure on that domain that is approximately probabilistic.   I.e. the dependence of probability theory's rules on the underlying symmetry axioms is reasonably smooth, as I argued here:

(I only explicitly considered the case of classical probability theory, but the same arguments would hold for the intuitionistic case.)

Once you have probabilities, you have second order, third order and then... infinite-order probabilities (defined as distributions over spaces of infinite-order probabilities):

Are these useful?   Well one can construct interesting models of aspects of phenomenological experience, using non-well-founded set theory (aka hypersets),

and layering uncertainty onto these models, you get infinite-order probabilities.

There is some unification not yet written out here: The hypersets I consider are modeled by apg's ("accessible pointed graphs", i.e. a digraph with a distinguished node N from which all other nodes can be reached), and a directed distinction graph can be interpreted as a patchwork of apg's.    One can build apg's up from distinction graphs, though I haven't written up that paper yet.   Basically you model distinctioning as a directional process -- you ask: if an observer has already made observation A, is it able to make observation B, considering B as distinct from A?    This gives a directed distinction graph, which is then a patchwork of apg's, i.e. a mesh of overlapping hypersets.
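
The apg definition above (a digraph with a distinguished node from which every other node can be reached) is easy to check mechanically. A minimal sketch, with the graph encoded as a dict from node to list of successors; the encoding and the function name `is_apg` are my own choices for illustration.

```python
from collections import deque

def is_apg(edges, point):
    """Check that every node of the digraph is reachable from `point`.

    `edges` maps each node to an iterable of its successor nodes.
    """
    nodes = set(edges) | {t for targets in edges.values() for t in targets}
    nodes.add(point)
    seen = {point}
    queue = deque([point])
    while queue:                      # breadth-first search from the point
        node = queue.popleft()
        for target in edges.get(node, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen == nodes

# The hyperset Omega = {Omega}: a single node with an edge to itself.
print(is_apg({"omega": ["omega"]}, "omega"))

# Not an apg when pointed at "a": node "c" cannot be reached from "a".
print(is_apg({"a": ["b"], "c": ["a"]}, "a"))
```

The self-loop example is the standard picture of a non-well-founded set that contains itself, which is exactly what ordinary (well-founded) set theory rules out.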

Given probability distributions and simplicity measures, one can start measuring intelligence in traditional ways ("traditional" in the sense of Legg and Hutter or my first book The Structure of Intelligence) ... one can look at intelligence as the ability to achieve complex goals in complex environments using limited resources...

Though it is also worth keeping in mind the wider nature of intelligence as Weaver articulated so richly in his PhD thesis

Another paper I haven't yet written up is a formalization of open-ended intelligence in terms of richness of pattern creation.  

One can formalize the three key values of "Joy, Growth and Choice" in terms of graphtropy and pattern theory (Joy is patterns continuing, Growth is new patterns being created, Choice is graphtropy across pattern space) -- so relative to any local time-axis one can look at the amount of Joy/Growth/Choice being manifested, which is one way of looking at the amount of open-ended intelligence.

One way to move from these intriguing generalities toward specific cognitive, computational and physics theories is to assume a specific computational model.   In this paper

I articulate what seems an especially natural computational model for general intelligence (CoDDs, Combinatorial Decision Directed-acyclic-graphs), and I conjecture that if one assumes this computational model, then some nice compatibilities between graphtropic measures of complexity and simplicity-theoretic measures of complexity emerge.  (Actually the paper talks about correlating algorithmic information with logical entropy but the generalization to graphtropy is not a big leap.)

A CoDD is basically a decision tree that is recursively nested so that a whole decision tree can serve as an input to a decision tree, and augmented with the ability to replace two identical subtrees with two instances of a certain token (memo-ization).    Repetition-replacement and recursion are enough to tweak decision trees into a Turing-complete computational model (which is basically the insight that SK-combinator calculus is universal, phrased a bit differently).
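
The repetition-replacement step can be sketched on an ordinary decision tree. This is NOT the paper's formal CoDD definition, just a toy version of the memo-ization idea: identical subtrees are replaced by a shared token, turning the tree into a dag. Trees are encoded here as `(variable, low_branch, high_branch)` tuples with integer leaves; that encoding and the name `memoize_subtrees` are my own assumptions.

```python
def memoize_subtrees(tree, table=None):
    """Replace every subtree with a token, sharing tokens between identical subtrees.

    Returns (root_token, table), where table maps
    (variable, low_token, high_token) -> token. Leaves must be ints so they
    cannot be confused with the string tokens.
    """
    if table is None:
        table = {}
    if not isinstance(tree, tuple):          # a leaf: keep as-is
        return tree, table
    variable, low, high = tree
    low_token, _ = memoize_subtrees(low, table)
    high_token, _ = memoize_subtrees(high, table)
    key = (variable, low_token, high_token)
    if key not in table:                     # first time we see this subtree
        table[key] = f"T{len(table)}"
    return table[key], table

# Both branches of "x" contain the same test on "y".
tree = ("x", ("y", 0, 1), ("y", 0, 1))
root, table = memoize_subtrees(tree)
print(root)
for key, token in table.items():
    print(token, "=", key)
```

The original tree has three internal nodes, but the table ends up with only two entries because the two identical "y" subtrees share one token.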

This computational model also leads to some interesting extensions of the basic model of pattern as "representation as something simpler", including the notion of "quattern" -- the quantum analogue of a classical pattern.

The paper doesn't draw any connections with distinction graphs -- but it's quite interesting to look at CoDDs whose leaves are observations related in a distinction graph.

My primary focus is on applying these GTGI-ish ideas to AI and cognitive science, but the applications to physics also can't be overlooked.    In this verrrry sketchy notes-to-self type paper

I outline a possible path to creating unified (standard model + gravity) physics models via hypergraph models (including hypergraph links with causal interpretation).    Spacetime is a hypergraph and event probabilities are estimated using Feynman type sums that add up terms corresponding to multiple spacetimes as well as multiple possible scenarios within each spacetime.  

Ben Dribus, a mathematician who has developed his own much more in-depth graph-based physics models, has (in a personal communication) sketched a dynamical equation that works in my causal web model.

Another paper not yet written up regards the formal similarities between conservation of energy in physics and conservation of evidence (i.e. avoidance of double counting of evidence) in logic.   One can view energy as the form that observation takes in a certain logic (that has observational semantics), and then physical dynamics as a process of derivation in this logic, with the consistency of the logic depending on the conservation of energy (which avoids double-counting evidence).

Extending this physics-ish line of thinking in a direction that also encompasses the cognitive, was a recent paper with a messy title:

The basic idea here was to come up with physics-ish "dynamical laws of cognition" by replacing Shannon entropy in MaxEnt-type principles with algorithmic information.     Not yet done is to extend this to graphtropy -- by extending the Maximum Caliber Principle to distinction graphs that evolve over time, and then creating a corresponding form of Maximal Algorithmic Caliber that works with Combinatorial Decision Dags whose primitives are observations in a distinction graph.

The "maximum caliber principle" is extended to a "maximum algorithmic caliber principle" that characterizes the possible worlds most likely to accord with a given set of observations -- one should assume the world has evolved with the maximum algorithmic caliber consistent with observations (basically, the most computationally dense way consistent with observations).   Basically, this just means that if you don't know how the world has made your observations come about, you need to make some assumption.   Lacking some simplicity prior, there are more possible worlds involving a lot of distinctions than a few, so the odds will be high (based on simple Principle of Indifference type symmetry arguments) that the underlying reality makes a lot of distinctions.   Given a simplicity prior, the most likely worlds will be the ones that make about as many distinctions as the prior considers in the "reasonably likely" range.

Algorithmic Markov processes, the algorithmic-information analogue of ordinary statistical Markov processes, turn out to be the most rational hypothesis to use when inferring processes based on data.   There are more possible processes similar to an algorithmic Markov process that obey your given constraints, than any other sort of processes.    If you looked in the mind of a near maximally generally intelligent AIXI-tl type agent, you would see that it was implicitly or explicitly making the assumption that the world is often roughly an algorithmic Markov process.

To move from these highly general "laws of mind" toward laws of human-like mind one needs to look at the special situations for which human-like minds evolved.   In the paper

I suggest that symmetries and other regularities in the environments and goals that an intelligence needs to deal with, should be mappable via (uncertain) morphisms into corresponding symmetries/regularities in the structure and dynamics of the intelligent system itself.   I roughly formalize this correspondence in terms of category theory (which ultimately needs an intuitionistic probability-like quantity like the one I mentioned above, which however I only discovered/invented a few years after writing the Mind-World Correspondence paper).

As for what are the symmetries and regularities human-like minds in particular need to deal with, I made some concrete suggestions in

It should be noted that my suggestions are far more specific than what the great Yoshua Bengio proposed in his "consciousness prior" paper.   Basically there he suggests that AGI needs a prior distribution that favors joint distributions that factor into forms where most weight goes to a small number of factors.   This is a very sensible idea and does indeed tie in with the way working memory works in current human and AI minds.   However, I think the structure and dynamics of human-like minds have been adapted heavily to considerably more specialized assumptions to do with modeling events in 4D spacetime, and specifically to handling communication among spatiotemporally embodied agents who share the same sensation and action space.

One feature of the environments and goals human-like minds are faced with, is that they tend to factorize into qualitatively different types of knowledge / perception / action -- e.g. procedural vs. declarative/semantic vs. attentional vs. sensory, etc.    This leads to minds that have distinct yet closely coupled subcomponents that need to have robust capability to help each other out of difficult cognitive spots -- "Cognitive Synergy", which underpins the OpenCog AGI design I've been working on for 1-2 decades (depending how you count).   The different types of human memory correspond closely to different aspects of the everyday human physical and social environment.

The Embodied Communication Prior includes "tendency to take habits" as a corollary.   This leads to the amusing notion that, via reflexive application of morphic resonance to itself, the human sphere within our physical spacetime may have some "spooky synchronistic correlation" with other portions of the Eurycosm that also happen to display the tendency to take habits!

More prosaically, the paper

formalizes the concept of cognitive synergy on a category-theoretic foundation.

What is not articulated fully there is that, ultimately, the cognitive processing of real-world AGI systems can be viewed as: a set of interacting cognitive algorithms, each of which in a sense results from doing program specialization on the universal algorithm "form an algorithmic Markov model consistent with one's observations, and use it to drive inference about what procedures will achieve one's goals given the observed context", relative to focus on a specific sort of knowledge, memory or situation (e.g. procedural, sensory, declarative...).   These specialized cognitive algorithms must be learned/evolved based on multiple constraints including energetic usage, minimizing spatial extent and maximizing processing speed, and interoperability among the different cognitive algorithms (so that they can see each others' internal states so as to help each other out when they get stuck).

Design of a framework like OpenCog  may be viewed as performing this sort of program specialization "by hand", as we don't have automated program specializers capable of this degree of complexity.  An AGI program specializer will be able to do it, but then we have a chicken-egg problem -- which is solved by human AGI system designers performing the first round of the iteration.

The paper

explains how the connection between language, action, perception and memory works in terms of the category-theoretic model of cognitive synergy.

The paper

gives some speculative ideas regarding how the human brain may implement some of these abstract structures (using multiple interconnected neural-net modules, e.g. different, closely cooperating architectures for cortex and hippocampus -- but not as simplistically interconnected as in currently popular deep or shallow neural net architectures).

This lets us revisit the vexed issue of "consciousness."   My view is that consciousness is a universal property immanent in all existence, but that "human-like consciousness" has some special properties, which come out of the Embodied Communication Prior along with other factors.   This paper

aims to identify what is special about human-like consciousness as opposed to other flavors.    

This includes physical and computationally-cognitive correlates of the hyperset models of self, will and awareness alluded to earlier.   Mapping between distinction graphs and hyperset apg's, can be seen as mapping between sensate-oriented and reflection-oriented reflexive meta-views of the same base subjective experience.

This paper

deals with the question of identity under conditions of gradual change -- arguing that if a mind changes slowly enough that, at each stage, it models where it came from, where it is and where it's going in terms of a unified self-construct.... then in essence it IS a unified self.    This IMO solves the issue of "continuity of consciousness and identity" in a mind uploading context.

To realize these abstract GTGI ideas in practical AGI systems, one needs a series of bridging formalisms, toolkits and systems.   This is something I'm currently working on within the TrueAGI / Atomese 2.0 research initiative (still an early-stage non-public thing), but one paper has recently crawled out of this particular research swamp:

Among other things, what is advocated there is a gradually typed approach to AI programming, wherein different cognitive processes corresponding to different types of memory/knowledge are realized using different type systems.   Casting between these type systems is part of the process of cognitive synergy.  

There is a Curry-Howard correspondence between a gradually typed language like this, and a paraconsistent logic.   As cognitive processes must be probabilistic, what we ultimately have is a Curry-Howard correspondence between intuitionistically-probabilistic paraconsistent logic and a gradually typed probabilistic functional programming language.

The intuitionistic aspect of this logic maps into the absence of highly general continuation-passing features in the language -- and it means that ultimately the logic can be reduced to operations on distinction graphs, and the corresponding programs can be reduced to e.g. CoDDs operating on elementary observations drawn from distinction graphs.

An AGI-oriented hypergraph knowledge store like the OpenCog Atomspace can be viewed as a CoDD that operates on the elementary observations made by a specific cognitive system, and abstracts from these observations to form programs for generating sets of observations from more compact descriptions.   These include observations of what action-combinations tend to lead to what goals in what contexts.   A programming language like Atomese 2.0 is a concise, workable way of creating higher level program constructs equivalent ultimately to CoDDs over distinction graphs.

So there you go.   Turning all the above papers into a single coherent narrative would be many months of full-time work -- and then turning all the conjectures in the papers into actual theorems would be probably several years of full-time work.   I'm not sure when I'll get to all that, since I have an insane number of other things on my plate.   But I do feel like it's finally time for the "weaving together and rigorizing" phase of my GTGI quest -- I think that with the most recent few papers, among the ones listed above, the core ideas needed have finally fallen into place!

Saturday, April 11, 2020

The Likely Nasty Social, Economic and Surveillance Aftereffects of COVID-19 -- and How to Combat Them

A lot of attention right now is going into the question of flattening the curve of global COVID-19 infection -- and this is exactly right.   I've been trying to do my own part here, via organizing the COVIDathon blockchain-AI-against-COVID-19 hackathon, and working with my SingularityNET colleagues on using some of our AI code for simulating COVID-19 spread and analyzing related biology.

It's also important, though, to think about the other side of the curve -- what happens once the virus starts to gradually recede into the background, and life resumes some variation of "normal."    How will things be different after COVID-19?  Which of the unusual things happening now in the midst of the pandemic are likely to continue to have impact in the post-pandemic world?

TL;DR it seems the answer is: Barring something unusual and countervailing happening, the impact of the pandemic will be the rich getting richer, the poor getting poorer, and Big Tech and Big Government getting more access to diverse personal data and more skill at mining it effectively.

Potentially these effects could be palliated by rolling out decentralized blockchain-based technologies for managing aspects of the pandemic and the pandemic-era economy.   But it appears likely that, even if we succeed in getting a few such technologies built and adopted rapidly via COVIDathon and other efforts, by and large it will be centralized technologies, centralized government agencies and companies and the traditionally financialized economy that will dominate COVID-19 response.

A more open question is whether, when the next pandemic or other global crisis rolls around, decentralized tech will be ready to play a major role.   Can COVID-19 and its impacts on society, economy and industry serve as a wake-up call regarding the risks global crises pose on multiple fronts including data sovereignty and economic fairness?   Will this wake-up call be loud enough to rouse a large open-source development community into action regarding the creation of decentralized, secure and democratically controlled technologies for doing things like, say, managing uploaded personal medical data ... tracking and predicting spread of epidemics ... carrying out precision medicine analytics on clinical trials ... and assessing lifestyle choices in the light of current medical realities and practicalities like weather and transportation?

Let's run through the probable future in more detail.   Social distancing and travel restrictions are most likely to cause the virus's spread to slow as 2020 progresses; and then before too long effective antiviral compounds or cocktails will be available.   Sometime in 2021, most likely, COVID-19 vaccines will hit the market; and then this virus currently wreaking so much havoc will be relegated to a status much like that of the lowly flu.

In the meantime, though, lots of low-wage service workers are getting laid off ... and many will not get re-hired, as many businesses will choose to rebuild in other ways after the pandemic fades (automation, anyone?).  For instance, many of the people who are now ordering groceries for home delivery for the first time will continue doing this a lot after COVID-19 is gone, resulting in fewer jobs for supermarket cashiers and other staff.   The same sort

At the same time, savvy investment funds are right now buying up every valuable asset they can at bargain prices -- so that after the pandemic fades they will own an even larger percentage of the planet.

And the techlash is already fading into the dim recesses of history along with net neutrality -- as everyone grows increasingly attached to Amazon, Netflix, Google etc. while trapped in their homes using the Internet for everything.  

Big Tech has been underhandedly striving to gather as much medical data as possible, for years now -- e.g. Google DeepMind's series of sweetheart deals with the British health system to garner access to people's medical records; or Project Nightingale, which saw Google quietly capture 50 million Americans' medical records.  Gathering medical data from a wide population with a view toward pandemic-related analysis and prediction is absolutely golden for Big Tech.   This data and the pipelines that bring it their way will continue to yield value for these companies and their government partners long after COVID-19 has been reduced to the level of one more irritating seasonal infection.

As everyone becomes increasingly fearful for the lives of their elderly friends and relations, centralized monitoring of everybody's location and movements and physiological data is increasingly taken as a Good Thing.   Today, uploaded temperature readings from a million+ wireless digital thermometers are letting us track the spread of COVID-19 around the US.    Stanford researchers have also shown that, by using AI anomaly detection on data from heart-rate variability, body temperature and pulse oximetry, one can identify that a person is sick even before they show any symptoms.

But then what happens when it becomes standard for your smartwatch, smartphone and fitness tracker to upload your data to Big Tech and Big Government so they can track and analyze disease spread?   Do you really trust these corporate and governmental entities not to use this data for other purposes -- and not to find ways to quietly keep collecting and utilizing similar data?   Edward Snowden has recently gone on record that, no, he does not.  As you may have guessed, I don't either.

Yet the UK is already going directly down this path, with a governmental software app that detects and tracks nearby COVID-19 sufferers.  Completely harmless, extremely helpful -- until the same tech and organizational setup is used to track other things of interest to the ruling politicos and their business and military allies.

Big Brother is watching your heart rate, your temperature and your blood oxygen level -- better be sure your heart doesn't pound too much when you walk past that political demonstration, or your credit rating's going way down!!

Global monitoring of human movement and human physiology can do wonders for optimizing global health, during a pandemic and otherwise -- but it should be done with decentralized, secure tools.   Otherwise one is placing great trust in the entities that are gathering and utilizing this data -- not only to do helpful things with it in the pandemic, but also not to leverage this data and related data-gathering capabilities later in the interest of goals different from that of global human benefit.

At the moment most decentralized networks and associated software tools are still in fairly early states of development -- so to combat COVID-19 fast we are understandably relying on centralized methods.   But this will not be the last pandemic nor the last acute, unprecedented global crisis that humanity faces.   It is important to work now so that, when the next such situation arises, decentralized frameworks will be fully prepared to play a leading role in helping humanity cope.

Otherwise, each successive crisis will serve to concentrate more and more wealth and power in the hands of a small elite -- which is not at all the best way to create a beneficial future for humanity and its technological children.

Friday, April 10, 2020

Can We "Discover" Semantic Primitives for Commonsense and Math via Semantic Relation Extraction from Corpora?

One more wild-ish train of thought completely unrelated to anything immediately practical … I was thinking about Chalmers’ idea from Constructing the World that the notion of universal semantic primitives underlying all human concepts might be rendered sensible by use of intensional logic … i.e. extensionally reducing all concepts to combinations of a few dozen primitives [plus raw perception/action primitives] is doomed to fail (as shown by eons of pedantic pickery in the analytical philosophy literature) but doing the reduction intensionally seems to basically work….

But in his book he argues why this is the case and gives lots of examples but doesn’t fully perform the reduction as that’s too big a job (there are a lot of concepts to intensionally reduce…)

So it occurred to me that if we managed to do decent semantic-relation-extraction from large NL corpora, then if Chalmers is right, there would be a set of a few dozen concepts such that doing intensional-logic operations to combine them (plus perception/action primitives) would yield close approximations (small Intensional Difference) to any given concept

In vector embedding space, it might mean that any concept can be expressed fairly closely via a combination of the embedding vectors from a few dozen concepts, using combinatory operators like vector sum and pointwise min …
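To make the embedding version of this concrete, here is a minimal sketch, assuming tiny hand-made 3-dimensional "embeddings" (the vectors, the primitive names and the helper `best_combination` are all made up for illustration, not drawn from any real corpus). It searches single primitives and all pairwise vector-sum / pointwise-min combinations for the one closest to a target concept vector.

```python
from itertools import combinations

def vec_sum(a, b):
    return [x + y for x, y in zip(a, b)]

def pointwise_min(a, b):
    return [min(x, y) for x, y in zip(a, b)]

def distance(a, b):
    # Euclidean distance between two equal-length vectors
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def best_combination(target, primitives):
    """Return (name, vector) of the closest single primitive or pairwise combination."""
    candidates = dict(primitives)
    for (n1, v1), (n2, v2) in combinations(primitives.items(), 2):
        candidates[f"sum({n1},{n2})"] = vec_sum(v1, v2)
        candidates[f"min({n1},{n2})"] = pointwise_min(v1, v2)
    return min(candidates.items(), key=lambda kv: distance(target, kv[1]))

# Hypothetical primitive embeddings (assumption: made up for this example)
primitives = {
    "animal": [1.0, 0.0, 0.2],
    "water": [0.0, 1.0, 0.1],
    "small": [0.1, 0.1, 1.0],
}
target = [1.0, 1.0, 0.3]  # a hypothetical "fish"-like concept vector

name, vec = best_combination(target, primitives)
print(name, round(distance(target, vec), 6))
```

A real test of the idea would need learned embeddings, a few dozen primitives and deeper combinations than pairs, so the search would have to be far smarter than this brute-force loop.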

As I recall it, the intensional-combination operators used in Chalmers’ philosophical arguments don’t involve much advanced quantifier-munging, so basic fuzzy-propositional-logic operators might do it…

Now if we cross-correlate this with Lakoff and Nunez’s thoughts in “Where Mathematics Comes From” -- where they argue that math theorem proving is done largely by unconscious analogy to reasoning about everyday physical situations -- then we get the idea that morphisms from common-sense domains to abstract domains guide math theorem-proving, and that these potential generators of the algebra of commonsense concepts can be mapped into abstract math-patterns (e.g. math-domain-independent proof strategies/tactics) that serve as generators of proofs for human-friendly mathematics….

Which led me to wonder if one could form an interesting corpus from videos of math profs going thru proofs online at the whiteboard.  One would then capture the verbal explanations along with proofs, hopefully capturing some of the commonsense intuitions/analogies behind the proof steps… from such a corpus one might be able to mine some of the correspondences Lakoff and Nunez wrote about….

There won’t be a seq2seq model mapping mathematicians’ mutterings into full Mizar proofs, but there could be useful guidance for pruning theorem-prover activity in models of the conceptual flow in mathematicians’ proof-accompanying verbalizations....

Can we direct proofs from premises to conclusions, via drawing a vector V pointing from the embedding vector for the premise to the embedding vector for the conclusion, and using say the midpoint of V as a subgoal for getting from premise to the conclusion ... and where the basis for the vector space is the primitive mathematical concepts that are the Lakoff-and-Nunez-ian morphic image of primitive everyday-human-world concepts?
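The midpoint-as-subgoal idea is easy to write down in toy form. This is only a sketch under strong assumptions: statements are already embedded as vectors (here 2-dimensional and invented), and the lemma names are hypothetical. It takes the midpoint between premise and conclusion and picks the known lemma nearest to it as the suggested subgoal.

```python
def midpoint(a, b):
    return [(x + y) / 2 for x, y in zip(a, b)]

def nearest(point, candidates):
    """Name of the candidate vector closest (in squared Euclidean distance) to point."""
    def sqdist(v):
        return sum((x - y) ** 2 for x, y in zip(point, v))
    return min(candidates, key=lambda name: sqdist(candidates[name]))

# Hypothetical embeddings (assumption: invented for illustration)
premise = [0.0, 0.0]
conclusion = [4.0, 2.0]
lemmas = {
    "lemma_a": [1.9, 1.2],
    "lemma_b": [4.0, 0.0],
    "lemma_c": [0.5, 3.0],
}

subgoal = nearest(midpoint(premise, conclusion), lemmas)
print(subgoal)
```

Applying the same step recursively (premise to subgoal, subgoal to conclusion) would give a bisection-style proof plan, though whether straight lines in embedding space track anything proof-relevant is exactly the open question.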

Alas making this sort of thing work is 8 billion times harder than conceptualizing it.   But conceptualization is a start ;)

Logical Inference Control via Quantum Partial Search — Maybe


While running SingularityNET and thinking about next-generation OpenCog and helping Ruiting with our charming little maniac Qorxi are taking up most of my time, I can’t help thinking here and there about quantum AI …

Quantum computing is moving toward practical realization — it’s still got a long way to go, but clearly the Schrodinger’s cat is out of the bag … the time when every server has a QPU alongside its GPU is now something quite concrete to foresee…

So I’m thinking a bit about how to use quantum partial search  (Grover's algorithm on a chunked database) to speed up backward-chaining logical inference dramatically. 

Suppose we are searching in some set S for those x in S that satisfy property P.   (The interesting case is where S is known implicitly rather than explicitly listed.)

Suppose we have some distribution f over S, which assigns a probability value f(x) to each element of S — interpretable as the prior probability that x will satisfy P

Suppose we divide S into bins S1, S2,…, Sk, so that the expected number of x that satisfy P is the same for each Si  (in which case the bins containing higher-probability x will have smaller cardinality) …
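Here is one simple way the binning could be done, as a sketch rather than a recommendation: sort items by prior and greedily close a bin whenever its accumulated prior mass reaches the per-bin target. The function name `equal_mass_bins` and the item names are assumptions, and the greedy rule only equalizes mass approximately in general (the toy priors below are chosen so it comes out exact).

```python
def equal_mass_bins(prior, k):
    """Split items into k bins of roughly equal total prior mass.

    prior: dict mapping item -> prior probability that the item satisfies P.
    Greedy and approximate: a bin is closed once its mass reaches total/k.
    """
    items = sorted(prior, key=prior.get, reverse=True)
    target = sum(prior.values()) / k
    bins, current, mass = [], [], 0.0
    for item in items:
        current.append(item)
        mass += prior[item]
        if mass >= target and len(bins) < k - 1:
            bins.append(current)
            current, mass = [], 0.0
    bins.append(current)  # whatever is left goes in the last bin
    return bins

# Toy prior over seven candidate items (assumption: made-up numbers)
prior = {"x1": 0.5, "x2": 0.25, "x3": 0.25,
         "x4": 0.125, "x5": 0.125, "x6": 0.125, "x7": 0.125}

bins = equal_mass_bins(prior, 3)
for b in bins:
    print(b, sum(prior[x] for x in b))
```

The high-prior item sits alone in its bin while the four low-prior items share one, which is the "higher probability, smaller cardinality" effect described above.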

Then we can use quantum partial search to find a bin that contains x that satisfies P. 

If the size of S is N and the number of items per bin were constant b, then the time required is (pi/4) sqrt(N/b).   Time required increases with uneven-ness of bins (which means non-uniformity of distribution f, in this setup).

In an inference context, for instance, suppose one has a desired conclusion C and n premises Pi.   One wants to know for what combinations Pi * Pj ==> C.  One then constructs an N = n^2 dimensional Hilbert space, which has a basis vector corresponding to each combination (i,j).  One call to the quantum oracle can tell us whether Pi * Pj ==> C for some particular (i,j) (note though that this call must be implementable as a unitary transformation on the Hilbert space — but following the standard math of quantum circuits it can be set up this way). 

Using straight Grover’s algorithm, one can then find which Pi * Pj ==> C in sqrt(N) time.
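For intuition, the search over premise pairs can be simulated classically on a small case. This is a sketch, assuming exactly one pair (i, j) satisfies the oracle; the function names and the choice of "winning" pair are invented. The simulation tracks all N = n^2 amplitudes explicitly, so it gets none of the quantum speedup; it only shows the oracle-plus-diffusion loop concentrating probability on the marked pair.

```python
import math

def grover_search(n_items, is_marked):
    """Statevector simulation of Grover's algorithm (assumes one marked item).

    Returns (most probable index, its probability, number of iterations).
    """
    amp = [1 / math.sqrt(n_items)] * n_items          # uniform superposition
    iterations = math.floor((math.pi / 4) * math.sqrt(n_items))
    for _ in range(iterations):
        # Oracle: flip the sign of the marked amplitudes
        amp = [-a if is_marked(i) else a for i, a in enumerate(amp)]
        # Diffusion: reflect every amplitude about the mean
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]
    probs = [a * a for a in amp]
    best = max(range(n_items), key=lambda i: probs[i])
    return best, probs[best], iterations

n = 4  # number of premises, so N = n^2 = 16 candidate pairs

def implies_c(index):
    # Hypothetical oracle: pretend only the pair (P1, P2) yields the conclusion C
    i, j = divmod(index, n)
    return (i, j) == (1, 2)

best, prob, iterations = grover_search(n * n, implies_c)
pair = divmod(best, n)
print(pair, iterations, round(prob, 3))
```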

If one wants to leverage the prior distribution, one can find which bin(s) the premise-pairs such that {Pi * Pj ==> C } live in, in time (pi/4)  sqrt(c*N/b) where c>1 is the correction for the non-uniformity of the prior and b is the average number of pairs per bin.

With a uniform prior, one is finding log(N/b) bits of information about what the premises are (and narrowing down to a search over b items).

With a non-uniform prior, one is still narrowing down *on average* to a search over b items, so is still finding log(N/b) bits on average about where the items are.

This could be useful e.g. in a hybrid classical-quantum context, where the quantum computer is used to narrow down a very large number of options to a more modest number, which are then searched through using classical methods.

It could also be useful as a heuristic layer on top of Grover’s algorithm.  I.e., one could do this prior-probability-guided search to narrow things down to a bin, and then do full-on Grover’s algorithm within the bin selected.
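A quick back-of-envelope calculator for this two-stage scheme, taking the estimates above at face value. Note the assumption baked in: the bin-finding stage is costed as (pi/4) sqrt(c*N/b), which implicitly treats "does this bin contain a solution?" as a single oracle call. With only an element-level oracle the savings from partial search are known to be much more modest, so treat these numbers as illustrative only. Function names are made up for the example.

```python
import math

def grover_queries(n):
    # Standard estimate for one marked item among n: (pi/4) * sqrt(n)
    return (math.pi / 4) * math.sqrt(n)

def two_stage_queries(n, b, c=1.0):
    """Bin-finding stage plus full Grover inside the chosen bin.

    Assumption: bin stage costs (pi/4) * sqrt(c * n / b), as estimated in the text.
    """
    return (math.pi / 4) * math.sqrt(c * n / b) + grover_queries(b)

def bits_gained(n, b):
    # Information gained by narrowing n items down to a bin of b items
    return math.log2(n / b)

N, b = 1_000_000, 100
print(round(grover_queries(N), 1))        # plain Grover over everything
print(round(two_stage_queries(N, b), 1))  # bin search, then Grover in the bin
print(round(bits_gained(N, b), 2))        # bits learned by the bin stage
```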

Constructing the bins in an artful way, so that e.g. bins tend to have similar entities in them, could potentially  make things work even faster.   Specifically, if the elements in each bin tend to be similar to each other, then the bin may effectively be a lower-dimensional subspace, which means the algorithm will work faster on that bin.   So there would be advantage to clustering the items being searched before constructing the bins.   If items that are clustered together tend to have similar prior probabilities, then the bins would tend to be lower-dimensional and things would tend to go faster.

Grover’s Algorithm and Natural Gradients

Now if we want to go even deeper down the rabbit hole — this funky paper shows that the quantum search problem reduces to finding optimal geodesic paths that minimize lengths on a manifold of pure density matrices with a metric structure defined by the Wigner-Yanase metric tensor …

Fisher metric geeks will simultaneously drop their jaws in amazement, and nod and grin in a self-satisfied way

So what we see here is that Grover’s algorithm is actually just following the natural gradient ... well sort of…

Putting some pieces together … We have seen that partial quantum search (Grover’s algorithm over a chunked database) can be set up to provide rapid (on average) approximate location of an item in an implicit database, where the average is taken relative to a given probability distribution (and the distribution is used to guide the chunking of the database)….

Well then — this partial quantum search on a database chunked-according-to-a-certain-distribution, should presumably correspond to following the natural gradient on a manifold of pure density matrices with a metric structure conditioned by that same distribution…

Which — if it actually holds up — is not really all that deep, just connecting some (quantum) dots, but sorta points in a nice quantum AI direction…

Post-Script: Wow, This Stuff May Be Implementable?

I was amazed/ amused to note some small-scale practical implementations of Grover’s Algorithm using Orbital Angular Momentum

It’s all classical optics except preparation of the initial state (which is where the Oracle gets packed).

Could this be how our quantum-accelerated logical inference control is going to work?   Quantum optics plugins for the server … or the cortex?

Monday, July 22, 2019

Toward an Abstract Energetics of Computational Processes (in Brains, Minds, Physics and Beyond)

Given the successes of energy-based formalisms in physics, it is natural to want to extend them into other domains like computation and cognition.

In this vein: My aim here is to sketch what I think is a workable approach to an energetics of computational processes (construed very broadly).  

By this I mean: I will explain how one can articulate highly general principles of the dynamics of computational processes, that take a similar form to physics principles such as the stationary action principle (which often takes the form of "least action") and the Second Law of Thermodynamics (the principle of entropy non-decrease).

Why am I interested in this topic?   Two related reasons, actually.

First, I would like to create a "General Theory of General Intelligence" -- or to be more precise, a general theory of what kinds of systems can display what levels of general intelligence in what environments given realistically limited (space, time and energy) resources.   Marcus Hutter's Universal AI theory is great but it doesn't say much about general intelligence under realistic resource assumptions; most of its power is limited to the case of AI systems with unrealistically massive processing power.   I have published some ideas on this before -- e.g. formalizing Cognitive Synergy in terms of category theory, and articulating the Embodied Communication Prior in regard to which human-like agents attempt to be intelligent -- but nothing remotely near fully satisfying.  So I'm searching for new directions.

Second, I would like to come up with a real scientific theory of psi phenomena.  I am inclined toward what I call "euryphysical" theories -- i.e. theories that involve embedding our 4D spacetime continuum in a larger space (which could be a higher dimensional space or could be a non-dimensional topological space of some sort).   However, this begs the question of what this large space is like -- what rules govern "dynamics" in this space?   In my paper on Euryphysics, I give some rough ideas in this direction, but again nothing fully satisfying.

It would be nice if mind dynamics -- both in a traditional AI setting and in a more out-there euryphysical setting -- could be modeled on dynamical theories in physics, which are based on ideas like stationary action.   After all, if as Peirce said "matter is just mind hide-bound with habit" then perhaps the laws of matter are in some way simplifications or specializations of the laws of mind -- and  maybe there are laws of mind with roughly analogous form to some of the current laws of physics.

A Few Comments on Friston's Free Energy Ideas

Friston's "free energy principle" represents one well-known effort in the direction of modeling cognition using physics-ish principles.  It seems to me that Friston's ideas have some fundamental shortcomings -- but reviewing these shortcomings has some value for understanding how to take a more workable approach.

I should clarify that my own thinking described in this blog post was not inspired by Friston's thinking to any degree, but more so by long-ago reading in the systems-theory literature -- i.e. reading stuff like Ilya Prigogine's Order out of Chaos and Erich Jantsch's The Self-Organizing Universe and Hermann Haken's Synergetics.    These authors represented a tradition within the complex-systems research community, of using far-from-equilibrium thermodynamics as a guide for thinking about life, the universe and everything.  

Friston's "free energy principle" seems to have a somewhat similar conceptual orientation, but confusingly to me, doesn't seem to incorporate the lessons of far-from-equilibrium thermodynamics that thoroughly, being based more on equilibrium-thermodynamics-ish ideas.  

I haven't read everything Friston has written, but have skimmed various papers of his over the years, and recently looked at the much-discussed paper The Markov blankets of life: autonomy, active inference and the free energy principle

My general confusion about Friston's ideas is largely the same as that expressed by the authors of blog posts such as

As the latter post notes, regarding perception, Friston basically posits that neural and cognitive systems are engaged with trying to model the world they live in, and do so by looking for models with maximum probability conditioned on the data they've observed.   This is a useful but not adventurous perspective, and one can formulate it in terms of trying to find models with minimum KL-divergence to reality, which is one among many ways to describe Bayesian inference ... and which can be mathematically viewed as attempting to minimize a certain "free energy" function.
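The "free energy" in question is easy to compute in a toy discrete case. The sketch below uses the standard definition of variational free energy, F = sum_z q(z) log(q(z) / p(z, x)), for a fixed observation x; the two-cause example and its numbers are invented. It checks the textbook facts that F is never below the surprise -log p(x), and equals it exactly when q is the Bayesian posterior.

```python
import math

def variational_free_energy(q, joint):
    """F = sum_z q(z) * log(q(z) / p(z, x)) for one fixed observation x."""
    return sum(qz * math.log(qz / joint[z]) for z, qz in q.items() if qz > 0)

# Hypothetical joint p(z, x = "wet grass") over two hidden causes z
joint = {"rain": 0.3, "sprinkler": 0.1}

evidence = sum(joint.values())                 # p(x)
surprise = -math.log(evidence)                 # -log p(x)
posterior = {z: p / evidence for z, p in joint.items()}
guess = {"rain": 0.5, "sprinkler": 0.5}        # a deliberately sloppy belief

print(round(surprise, 4))
print(round(variational_free_energy(posterior, joint), 4))  # equals the surprise
print(round(variational_free_energy(guess, joint), 4))      # strictly larger
```

So for perception, "minimizing free energy" over q is just another way of saying "move your beliefs toward the posterior", which is the unadventurous-but-useful reading described above.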

Friston then attempts to extend this principle to action via a notion of "active inference", and this is where things get dodgier.   As the above-linked "Markov Blankets" paper puts it,

"Active inference is a cornerstone of the free energy principle. This principle states that for organisms to maintain their integrity they must minimize variational free energy.  Variational free energy bounds surprise because the former can be shown to be either greater than or equal to the latter. It follows that any organism that minimizes free energy thereby reduces surprise—which is the same as saying that such an organism maximizes evidence for its own model, i.e. its own existence


This interpretation means that changing internal states is equivalent to inferring the most probable, hidden causes of sensory signals in terms of expectations about states of the environment


[A] biological system must possess a generative model with temporal depth, which, in turn, implies that it can sample among different options and select the option that has the greatest (expected) evidence or least (expected) free energy. The options sampled from are intuitively probabilistic and future oriented. Hence, living systems are able to ‘free’ themselves from their proximal conditions by making inferences about probabilistic future states and acting so as to minimize the expected surprise (i.e. uncertainty) associated with those possible future states. This capacity connects biological qua homeostatic systems with autonomy, as the latter denotes an organism’s capacity to regulate its internal milieu in the face of an ever-changing environment. This means that if a system is autonomous it must also be adaptive, where adaptivity refers to an ability to operate differentially in certain circumstances.


The key difference between mere and adaptive active inference rests upon selecting among different actions based upon deep (temporal) generative models that minimize the free energy expected under different courses of action.

This suggests that living systems can transcend their immediate present state and work towards occupying states with a free energy minimum."

If you are a math/physics oriented person and find the above quotes frustratingly vague, unfortunately you will find that the rest of the paper is equally vague on the confusing points, and Friston's other papers are also.   

What it sounds like to me (doing some "active inference" myself to try to understand what the paper is trying to say) is that active inference is being portrayed as a process by which cognitive systems take actions aimed at putting themselves in situations that will be minimally surprising, i.e. in which they will have the most accurate models of reality.    If taken literally this cannot be true, as it would predict that intelligent systems systematically seek simpler situations they can model better -- which is obviously not a full description of human motivation, for instance.   We do have a motivation to put ourselves in comprehensible, accurately model-able situations -- but we also have other motivations, such as the desire to perceive novelty and to challenge ourselves, which sometimes contradict our will to have a comprehensible environment.

The main thing that jumps out at me when reading what Friston and colleagues write about active inference is that it's too much about states and not enough about paths.   To model far-from-equilibrium thermodynamics using energy-based formalisms, one needs to think about paths and path entropies and such, not just about things like " work[ing] towards occupying states with a free energy minimum."    Instead of thinking about ideas like " selecting among different actions based upon deep (temporal) generative models that minimize the free energy expected under different courses of action." in terms of states with free energy  minimum, one needs to be thinking about action selection in terms of stationarity of action functions evaluated along multiple paths.

Energetics for Far-From-Equilibrium Thermodynamics

It seems clear that equilibrium thermodynamics isn’t really what we want to use as a guide for cognitive information processing.  Fortunately, the recent thermodynamics literature contains some quite interesting results regarding path entropy in far-from-equilibrium thermodynamics.

David Rogers and Susan Rempe in "A First and Second Law for Nonequilibrium Thermodynamics: Maximum Entropy Derivation of the Fluctuation-Dissipation Theorem and Entropy Production Functionals" describe explicitly the far from equilibrium “path free energy”, but only for the case of processes with short memory, i.e. state at time i+1 depends on state i but not earlier ones (which is often fine but not totally general). 

The following table from Rogers and Rempe summarizes some key points concisely.

Conceptually, the key point is that we need to think not about the entropy of a state, but about the "caliber" of a path -- a normalization of the number of ways that path can be realized.   This then leads to the notion of the free energy of a certain path.    

It follows from this body of work that ideas like "free energy minimization" need to be re-thought dynamically rather than statically.   One needs to think about systems as following paths with differential probability based on the corresponding path free energies.    This is in line with the "Maximum Caliber principle"  which is a generalization of the Maximum Entropy principle to dynamical systems (both first proposed in clear form by E.T. Jaynes, though Maximum Entropy has been more widely developed than Maximum Caliber so far).
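A toy illustration of thinking in paths rather than states, assuming the simplest possible setup: binary paths of length 4 and a single path observable, the number of flips between consecutive steps. With a constraint on the average of such an observable, the maximum-path-entropy distribution has the familiar exponential form p(path) proportional to exp(-lam * flips(path)). The function names and the choice of observable are assumptions made for the example, not taken from the papers cited.

```python
import math
from itertools import product

def flips(path):
    # Path observable: how many times the state changes along the path
    return sum(a != b for a, b in zip(path, path[1:]))

def max_caliber_distribution(length, lam):
    """Exponential-family distribution over all binary paths of a given length."""
    paths = list(product((0, 1), repeat=length))
    weights = [math.exp(-lam * flips(p)) for p in paths]
    z = sum(weights)  # path partition function
    return {p: w / z for p, w in zip(paths, weights)}

def path_entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def mean_flips(dist):
    return sum(p * flips(path) for path, p in dist.items())

uniform = max_caliber_distribution(4, 0.0)  # no constraint bites: all paths equal
sticky = max_caliber_distribution(4, 1.0)   # flips are penalized

print(round(path_entropy(uniform), 4), round(mean_flips(uniform), 4))
print(round(path_entropy(sticky), 4), round(mean_flips(sticky), 4))
```

Because the observable only couples consecutive steps, the resulting path distribution is a Markov chain, which matches the short-memory case mentioned above.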

Extending these notions further, Diego Gonzalez outlines a Hamiltonian formalism that is equivalent to path entropy maximization, building on math from his earlier paper  Inference of trajectories over a time-dependent phase space distribution.

Action Selection and Active Inference

Harking back to Friston for a moment, it follows that the dynamics of an intelligent system, should be viewed, not as an attempt by an intelligent system to find a state with minimum free energy or surprisingness etc., but rather as a process of a system evolving dynamically along paths chosen probabilistically to have stationary path free energy.  

But  of course, this would be just as true for an unintelligent system as for an intelligent system -- it's not a principle of intelligence but just a restatement of how physics works (in far from equilibrium cases; in equilibrium cases one can collapse paths to states).   

If we want to say something unique about intelligent systems in this context, we can look at the goals that an intelligent system is trying to achieve.   We may say that, along each potential path of the system's evolution, its various goals will be achieved to a certain degree.   The system can then be viewed as having a certain utility distribution across paths -- some paths are more desirable to it than others.   A guiding principle of action selection would then be: To take an action A so that, conditioned on action A, the predicted probability distribution across paths is as close as possible to the distribution implied by the system's goals.

This principle of action selection can be formalized as KL-divergence minimization if one wishes, and in that sense it can be formulated as a "free energy minimization" principle.   But it's a "free energy" defined across ensembles of paths, not across states.
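In toy form the selection rule looks like this. It is a sketch under stated assumptions: there are only three possible paths, the goal distribution and the per-action predictions are invented numbers, and the direction KL(predicted || goal) is one choice among several (the text does not fix a direction).

```python
import math

def kl(p, q):
    """KL divergence KL(p || q) between two distributions over the same paths."""
    return sum(pi * math.log(pi / q[k]) for k, pi in p.items() if pi > 0)

def select_action(goal, predicted):
    """Pick the action whose predicted path distribution is closest to the goal."""
    return min(predicted, key=lambda action: kl(predicted[action], goal))

# Distribution over paths implied by the system's goals (hypothetical)
goal = {"path_explore": 0.5, "path_rest": 0.3, "path_stall": 0.2}

# Predicted path distribution conditioned on each candidate action (hypothetical)
predicted = {
    "walk": {"path_explore": 0.6, "path_rest": 0.3, "path_stall": 0.1},
    "sleep": {"path_explore": 0.05, "path_rest": 0.9, "path_stall": 0.05},
}

choice = select_action(goal, predicted)
print(choice, {a: round(kl(d, goal), 4) for a, d in predicted.items()})
```

Note that the goal distribution here puts weight on exploring, so the rule does not collapse into "seek the least surprising situation".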

A side note is, it's important to understand that the desirability of a path to an intelligent system need not be expressible as the expected future utility at all moments of time along that path.   The desirability of a path may be some more holistic function of everything that happens along that path.    Considering only expected utility as a form of goal leads to various pathologies related to wireheading, as I argued in a long-ago blog post on ultimate orgasms and such.

Algorithmic Thermodynamics

Now let's dig a little deeper.   Can we apply these same ideas beyond the realm of physics, to more general types of processes that change over time?

I am inspired by a general Whiteheadean notion of processes as fundamental things.   However, to keep things concrete, for now I'm going to provisionally assume that the "processes" involved can be formulated as computer programs, in some standard Turing-equivalent framework, or maybe a quantum-computing framework.   I think the same ideas actually apply more broadly, but -- one step at a time...

Let us start with Kohtaro Tadaki's truly beautiful, simple, elegant paper titled A statistical mechanical interpretation of algorithmic information theory  

Section 6 of Tadaki outlines a majorly aesthetic, obvious-in-hindsight parallel between algorithmic information theory and equilibrium thermodynamics.   There is seen to be a natural mapping between temperature in thermodynamics and compression ratio in algorithmic information theory.   A natural notion of "algorithmic free energy"  is formulated, as a sort of weighted program-length over all possible computer programs (where the weights depend on the temperature).

The following table (drawn from Tadaki's presentation here) summarizes the key  mappings in Tadaki's theory

To ground the mappings he outlines, Tadaki gives a  simple statistical mechanical interpretation to algorithmic information theory.   He models an optimal computer as decoding equipment at the receiving end of a noiseless binary communication channel.   In this context, he regards programs for this computer as codewords (finite binary strings) and regards computation results (also finite binary strings) as decoded “symbols.”    For simplicity he assumes that the infinite binary string sent through the channel -- constituting a series of codewords in a prefix-free code is generated by infinitely repeated tosses of a fair coin.   Based on this simple reductive model, Tadaki formulates computation-theoretic analogues to core constructs of traditional equilibrium thermodynamics.  
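To get a feel for these quantities, here is a toy calculation. As I recall Tadaki's definitions, the partition function is Z(T) = sum over programs p of 2^(-|p|/T), the free energy is F(T) = -T log2 Z(T), the energy is E(T) = (1/Z) sum |p| 2^(-|p|/T), and the entropy is S(T) = (E - F)/T. The real sums run over all halting programs of an optimal prefix-free machine and are uncomputable; the sketch below assumes a made-up finite set of four program lengths instead.

```python
import math

def partition_function(lengths, temperature):
    return sum(2 ** (-l / temperature) for l in lengths)

def free_energy(lengths, temperature):
    return -temperature * math.log2(partition_function(lengths, temperature))

def energy(lengths, temperature):
    z = partition_function(lengths, temperature)
    return sum(l * 2 ** (-l / temperature) for l in lengths) / z

def entropy(lengths, temperature):
    return (energy(lengths, temperature) - free_energy(lengths, temperature)) / temperature

# Toy stand-in for the set of halting programs: the prefix-free code
# 0, 10, 110, 111 (assumption: invented for illustration)
program_lengths = [1, 2, 3, 3]

for T in (0.5, 1.0):
    print(T,
          round(partition_function(program_lengths, T), 4),
          round(free_energy(program_lengths, T), 4),
          round(energy(program_lengths, T), 4),
          round(entropy(program_lengths, T), 4))
```

At T = 1 this toy code has Z = 1, so the entropy comes out as the ordinary Shannon entropy (1.75 bits) of the fair-coin distribution over codewords; lowering T shifts weight toward the shortest program.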

Now let's start putting some pieces together.

Perhaps the most useful observation I will make in this blog post is:   It seems one could port the path-entropy based treatment of far-from-equilibrium thermodynamics (as seen in the papers I've linked above) to Tadaki's algorithmic-information context, by looking at sources emitting bits that are not independent of each other but rather have some probabilistic dependencies.

By doing so, one would obtain an “algorithmic energy” function that measures the energy of an algorithmic process over a period of time -- without assuming that it’s a memoryless process like Tadaki does in his paper.

To get this to work, so far as I can limn without doing all the math (which I don't have time for at the moment, alas), one needs to assume that the knowledge one has of the dependencies among the bits produced by the process is given in the form of expectations…  e.g. that we know the average value of f_k(x_{i+1}, x_i) for various observables f_k ….  Plus one needs to make some other slightly funny assumptions that are probably replaceable (the paper assumes “the number of possible transitions does not depend on the starting point”… but I wonder if this could be replaced by some assumption about causality…)

If I'm not mistaken, this should give us something like Friston’s free energy principle that actually works and has meaning….  I.e. we have a rigorous sense in which complex algorithmic systems are minimizing free energy.   The catch is that it’s an algorithmic path energy -- but hey...

More precisely, relative to an observer S who is observing a system S1 in a  certain way (by tabulating conditional probabilities of “how often some event of type A occurs at time T+s, given some event of type B occurred at time T”) … we may say the evolution of S1 in S’s perspective obeys an energy minimization principle, where energy is defined algorithmic-informationally (following my proposed, not-yet-fleshed-out non-equilibrium generalization of Tadaki’s approach)…

Into the Quantum Rabbit Hole...

Now that we've gone this far, we may as well plunge in a bit deeper, right?

Tadaki deals w/ classical computers but -- gesticulating only moderately wildly -- it seems one could generalize his approach to quantum computers OK.  

Then one is looking at series of qubits rather than bits, and instead of tabulating conditional probabilities one is tabulating amplitudes.  

The maximum entropy principle is replaced with the stationary quantropy principle and one still has the situation that: Relative to S who is observing S1 using some standard linear quantum observables, S1 may be said to evolve according to a stationary quantropy trajectory, where quantropy is here defined via generalizing the non-equilibrium generalization of Tadaki’s algorithmic-informational entropy via replacing the real values w/ complex values

So we may well get a kind of free-energy principle for quantum systems also.

If we want to model cognitive stuff using bits or qubits, then we have here a physics-ish theory of cognitive stuff….  Or at least a sketch of the start of one…

Out Toward the Eurycosm

One of the motivations for these investigations was some discussions on higher-dimensional and more broadly eurycosmic models of psi.  If there are non-physical dimensions that connect spatiotemporally distant entities, then what are the dynamical laws in these dimensions?   If we can model them as information dimensions, then maybe the dynamics should be modeled as I’m alluding here…

Physics dynamics should be recoverable as a special case of algorithmic-information dynamics where one adds special constraints.   I.e. the constraints posed by spatial structure and special relativity etc. should reflect themselves in the conditional probabilities observed btw various classes of bits or qubits.  

Then the linear symmetries of spacetime structure should mean that when you calculate maximum-path-algorithmic-information distributions relative to these physics constraints, you end up getting maximum-Shannon-path-entropy distributions.   Because macrophysics results from doing computing using ensembles of randomly chosen computer programs (i.e. chosen subject to given constraints…).

Suppose we want to model a eurycosm that works according to a principle like Peirce's "tendency to take habits" aka Smolin's Precedence Principle aka Sheldrake's morphic resonance?   Well then, one can assume that the probability distribution underlying the emanation of codewords in Tadaki's model obeys this sort of principle.   I.e., one can assume that the prior probability of a certain subsequence is higher if that subsequence, or another subsequence with some of the same patterns in that sequence, has occurred earlier in the overall sequence.   Of course there are many ways to modify Tadaki's precise computational model, and many ways to formalize the notion that "subsequences with historically more frequent patterns should be more frequent going forward."    But conceptually this is quite straightforward.
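One of the many possible formalizations, as a minimal sketch: replace the fair coin with a source whose next bit is biased toward whatever followed the same recent context earlier in the sequence. The function `resonant_bits`, the `strength` parameter and the two-bit context are all assumptions made for illustration; a negative strength gives the anti-resonant case, where what has happened before becomes less likely.

```python
import random

def resonant_bits(n, strength, seed=0, context=2):
    """Emit n bits from a history-dependent source.

    strength in [-1, 1]: > 0 favors what followed this context before
    (resonance), < 0 disfavors it (anti-resonance), 0 is a fair coin.
    """
    rng = random.Random(seed)
    counts = {}  # context tuple -> [times followed by 0, times followed by 1]
    bits = []
    for _ in range(n):
        ctx = tuple(bits[-context:])
        c0, c1 = counts.get(ctx, [0, 0])
        total = c0 + c1
        bias = (c1 - c0) / total if total else 0.0
        p1 = 0.5 + 0.5 * strength * bias   # probability that the next bit is 1
        bit = 1 if rng.random() < p1 else 0
        counts.setdefault(ctx, [0, 0])[bit] += 1
        bits.append(bit)
    return bits

fair = resonant_bits(40, 0.0, seed=1)
habit = resonant_bits(40, 1.0, seed=1)
print("".join(map(str, fair)))
print("".join(map(str, habit)))
```

At full strength every context locks in its first continuation, so the sequence quickly becomes periodic: habit-taking in its most extreme form.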

One is forced however to answer the following question.   Suppose we assume that the probability of pattern P occurring in a subsequence beginning at time T is in some way proportional to the intensity with which P has occurred as a pattern in the subsequence prior to time T.   What language of processes are we using to formalize the patterns P?   If -- in line with the framework I articulate in The Hidden Pattern and elsewhere -- we formalize a pattern P in X as a process P that produces X and is simpler than X -- what is the language in which patterns are expressed?    What is the implicit programming language of our corner of the eurycosm?  

For simplicity I have been following Tadaki's conventional Turing machine based computational models here -- with a brief gesture toward quantum computing -- but of course  the broad approach outlined here goes beyond these computing paradigms.   What if we ported Tadaki's ideas to series of bits emanated by, say, a hypercomputer like the Zeno Machine?   Then we don't just get a single infinite bit string as output, but a more complex ordinal construction with infinite bit strings of infinite bit strings etc. -- but the math could be worked.   If the size of a Zeno Machine program can be quantified by a single real number, then one can assess Zeno Machine programs as patterns in data, and one can define concepts like compression ratio and algorithmic entropy and energy.   The paradigm sketched here is not tied to a Turing Machine model of eurycosmic processes, though TMs are certainly easier for initial sketches and calculations than ZMs or even weirder things.

I have definitely raised more questions than I've answered in this long and winding blog post.   My goal has been to indicate a direction for research and thinking, one that seems not a huge leap from the current state of research in various fields, but perhaps dramatic in its utility and implications.