
Monday, July 22, 2019

Toward an Abstract Energetics of Computational Processes (in Brains, Minds, Physics and Beyond)


Given the successes of energy-based formalisms in physics, it is natural to want to extend them into other domains like computation and cognition.

In this vein: My aim here is to sketch what I think is a workable approach to an energetics of computational processes (construed very broadly).  

By this I mean: I will explain how one can articulate highly general principles of the dynamics of computational processes, that take a similar form to physics principles such as the stationary action principle (which often takes the form of "least action") and the Second Law of Thermodynamics (the principle of entropy non-decrease).

Why am I interested in this topic?   Two related reasons, actually.

First, I would like to create a "General Theory of General Intelligence" -- or to be more precise, a general theory of what kinds of systems can display what levels of general intelligence in what environments given realistically limited (space, time and energy) resources. Marcus Hutter's Universal AI theory is great, but it doesn't say much about general intelligence under realistic resource assumptions; most of its power is limited to the case of AI systems with unrealistically massive processing power. I have published some ideas on this before -- e.g. formalizing Cognitive Synergy in terms of category theory, and articulating the Embodied Communication Prior in regard to which human-like agents attempt to be intelligent -- but nothing remotely near fully satisfying. So I'm searching for new directions.

Second, I would like to come up with a real scientific theory of psi phenomena. I am inclined toward what I call "euryphysical" theories -- i.e. theories that involve embedding our 4D spacetime continuum in a larger space (which could be a higher dimensional space or could be a non-dimensional topological space of some sort). However, this raises the question of what this larger space is like -- what rules govern "dynamics" in this space? In my paper on Euryphysics, I give some rough ideas in this direction, but again nothing fully satisfying.

It would be nice if mind dynamics -- both in a traditional AI setting and in a more out-there euryphysical setting -- could be modeled on dynamical theories in physics, which are based on ideas like stationary action.   After all, if as Peirce said "matter is just mind hide-bound with habit" then perhaps the laws of matter are in some way simplifications or specializations of the laws of mind -- and  maybe there are laws of mind with roughly analogous form to some of the current laws of physics.

A Few Comments on Friston's Free Energy Ideas

Friston's "free energy principle" represents one well-known effort in the direction of modeling cognition using physics-ish principles.  It seems to me that Friston's ideas have some fundamental shortcomings -- but reviewing these shortcomings has some value for understanding how to take a more workable approach.

I should clarify that my own thinking described in this blog post was not inspired by Friston's thinking to any degree, but more so by long-ago reading in the systems-theory literature -- i.e. reading stuff like Ilya Prigogine's Order out of Chaos, Erich Jantsch's The Self-Organizing Universe and Hermann Haken's Synergetics. These authors represented a tradition within the complex-systems research community of using far-from-equilibrium thermodynamics as a guide for thinking about life, the universe and everything.

Friston's "free energy principle" seems to have a somewhat similar conceptual orientation, but confusingly to me, doesn't seem to incorporate the lessons of far-from-equilibrium thermodynamics that thoroughly, being based more on equilibrium-thermodynamics-ish ideas.  

I haven't read everything Friston has written, but have skimmed various papers of his over the years, and recently looked at the much-discussed papers The Markov blankets of life: autonomy, active inference and the free energy principle

My general confusion about Friston's ideas is largely the same as that expressed by the authors of blog posts such as

As the latter post notes, regarding perception, Friston basically posits that neural and cognitive systems are engaged in trying to model the world they live in, and do so by looking for models with maximum probability conditioned on the data they've observed. This is a useful but not adventurous perspective, and one can formulate it in terms of trying to find models with minimum KL-divergence to reality, which is one among many ways to describe Bayesian inference ... and which can be mathematically viewed as attempting to minimize a certain "free energy" function.
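To make the "free energy" reading of Bayesian inference concrete, here is a minimal numerical sketch. The three-model setup and the numbers in `prior` and `likelihood` are invented purely for illustration; the only thing being shown is the standard identity that variational free energy equals surprise plus a KL-divergence, so it is minimized exactly at the Bayesian posterior.

```python
import math

def variational_free_energy(q, prior, likelihood):
    """F(q) = sum_m q(m) * [log q(m) - log(prior(m) * likelihood(m))]."""
    return sum(
        qm * (math.log(qm) - math.log(pm * lm))
        for qm, pm, lm in zip(q, prior, likelihood)
        if qm > 0
    )

def posterior(prior, likelihood):
    """Exact Bayesian posterior over models, plus the model evidence p(data)."""
    joint = [pm * lm for pm, lm in zip(prior, likelihood)]
    evidence = sum(joint)
    return [j / evidence for j in joint], evidence

# Hypothetical numbers: three candidate models of the world.
prior = [0.5, 0.3, 0.2]          # p(model)
likelihood = [0.10, 0.40, 0.70]  # p(observed data | model)

post, evidence = posterior(prior, likelihood)
surprise = -math.log(evidence)   # -log p(data)

f_at_posterior = variational_free_energy(post, prior, likelihood)
f_at_prior = variational_free_energy(prior, prior, likelihood)

print(f"surprise          = {surprise:.4f}")
print(f"F at the posterior = {f_at_posterior:.4f}")  # equals the surprise
print(f"F at the prior     = {f_at_prior:.4f}")      # strictly larger
```

Any belief distribution other than the posterior pays an extra KL-divergence penalty, which is the sense in which free energy "bounds surprise" in the passage quoted below.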

Friston then attempts to extend this principle to action via a notion of "active inference", and this is where things get dodgier.   As the above-linked "Markov Blankets" paper puts it,

"Active inference is a cornerstone of the free energy principle. This principle states that for organisms to maintain their integrity they must minimize variational free energy.  Variational free energy bounds surprise because the former can be shown to be either greater than or equal to the latter. It follows that any organism that minimizes free energy thereby reduces surprise—which is the same as saying that such an organism maximizes evidence for its own model, i.e. its own existence

...

This interpretation means that changing internal states is equivalent to inferring the most probable, hidden causes of sensory signals in terms of expectations about states of the environment

...

[A] biological system must possess a generative model with temporal depth, which, in turn, implies that it can sample among different options and select the option that has the greatest (expected) evidence or least (expected) free energy. The options sampled from are intuitively probabilistic and future oriented. Hence, living systems are able to ‘free’ themselves from their proximal conditions by making inferences about probabilistic future states and acting so as to minimize the expected surprise (i.e. uncertainty) associated with those possible future states. This capacity connects biological qua homeostatic systems with autonomy, as the latter denotes an organism’s capacity to regulate its internal milieu in the face of an ever-changing environment. This means that if a system is autonomous it must also be adaptive, where adaptivity refers to an ability to operate differentially in certain circumstances.

...

The key difference between mere and adaptive active inference rests upon selecting among different actions based upon deep (temporal) generative models that minimize the free energy expected under different courses of action.

This suggests that living systems can transcend their immediate present state and work towards occupying states with a free energy minimum."

If you are a math/physics oriented person and find the above quotes frustratingly vague, unfortunately you will find that the rest of the paper is equally vague on the confusing points, as are Friston's other papers.

What it sounds like to me (doing some "active inference" myself to try to understand what the paper is trying to say) is that active inference is being portrayed as a process by which cognitive systems take actions aimed at putting themselves in situations that will be minimally surprising, i.e. in which they will have the most accurate models of reality.    If taken literally this cannot be true, as it would predict that intelligent systems systematically seek simpler situations they can model better -- which is obviously not a full description of human motivation, for instance.   We do have a motivation to put ourselves in comprehensible, accurately model-able situations -- but we also have other motivations, such as the desire to perceive novelty and to challenge ourselves, which sometimes contradict our will to have a comprehensible environment.

The main thing that jumps out at me when reading what Friston and colleagues write about active inference is that it's too much about states and not enough about paths. To model far-from-equilibrium thermodynamics using energy-based formalisms, one needs to think about paths and path entropies and such, not just about things like "work[ing] towards occupying states with a free energy minimum." Instead of thinking about ideas like "selecting among different actions based upon deep (temporal) generative models that minimize the free energy expected under different courses of action" in terms of states with minimum free energy, one needs to be thinking about action selection in terms of stationarity of action functions evaluated along multiple paths.
 
Energetics for Far-From-Equilibrium Thermodynamics

It seems clear that equilibrium thermodynamics isn’t really what we want to use as a guide for cognitive information processing.  Fortunately, the recent thermodynamics literature contains some quite interesting results regarding path entropy in far-from-equilibrium thermodynamics.


David Rogers and Susan Rempe, in "A First and Second Law for Nonequilibrium Thermodynamics: Maximum Entropy Derivation of the Fluctuation-Dissipation Theorem and Entropy Production Functionals", describe explicitly the far-from-equilibrium “path free energy”, but only for the case of processes with short memory, i.e. the state at time i+1 depends on the state at time i but not on earlier ones (which is often fine but not totally general).

The following table from Rogers and Rempe summarizes some key points concisely.





Conceptually, the key point is that we need to think not about the entropy of a state, but about the "caliber" of a path -- a normalization of the number of ways that path can be realized.   This then leads to the notion of the free energy of a certain path.    

It follows from this body of work that ideas like "free energy minimization" need to be re-thought dynamically rather than statically.   One needs to think about systems as following paths with differential probability based on the corresponding path free energies.    This is in line with the "Maximum Caliber principle"  which is a generalization of the Maximum Entropy principle to dynamical systems (both first proposed in clear form by E.T. Jaynes, though Maximum Entropy has been more widely developed than Maximum Caliber so far).
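As a toy illustration of assigning probabilities to whole paths rather than to states, the sketch below enumerates every path of a two-state system and weights each one exponentially in a path observable. The choice of observable (the number of state flips along the path), the multiplier `lam`, and the function names are assumptions made for this example, not anything taken from the papers cited above; the exponential form is the generic maximum-entropy solution under a constraint on the observable's mean.

```python
import itertools
import math

def flips(path):
    """Path observable: how many times the state changes along the path."""
    return sum(a != b for a, b in zip(path, path[1:]))

def max_caliber(n_steps, lam, observable=flips):
    """Maximum-entropy distribution over paths given a constraint on <observable>.

    Each path gets weight exp(-lam * observable(path)); Z normalizes the weights.
    """
    paths = list(itertools.product((0, 1), repeat=n_steps))
    weights = [math.exp(-lam * observable(p)) for p in paths]
    z = sum(weights)
    probs = {p: w / z for p, w in zip(paths, weights)}
    return probs, z

def caliber(probs):
    """Path entropy: Shannon entropy of the distribution over whole paths."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

lam = 0.8  # hypothetical Lagrange multiplier penalizing flips
probs, z = max_caliber(n_steps=5, lam=lam)
mean_flips = sum(p * flips(path) for path, p in probs.items())
path_entropy = caliber(probs)
path_free_energy = -math.log(z)

print(f"mean flips per path = {mean_flips:.4f}")
print(f"path entropy        = {path_entropy:.4f}")
print(f"path free energy    = {path_free_energy:.4f}")
```

Setting `lam` to zero recovers the uniform distribution over paths, and the usual thermodynamic identity carries over to the path level: the path entropy equals `lam * mean_flips` minus the path free energy.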

Extending these notions further, Diego Gonzalez outlines a Hamiltonian formalism that is equivalent to path entropy maximization, building on math from his earlier paper  Inference of trajectories over a time-dependent phase space distribution.

Action Selection and Active Inference

Harking back to Friston for a moment, it follows that the dynamics of an intelligent system should be viewed not as an attempt by the system to find a state with minimum free energy or surprisingness etc., but rather as a process of the system evolving dynamically along paths chosen probabilistically to have stationary path free energy.

But  of course, this would be just as true for an unintelligent system as for an intelligent system -- it's not a principle of intelligence but just a restatement of how physics works (in far from equilibrium cases; in equilibrium cases one can collapse paths to states).   

If we want to say something unique about intelligent systems in this context, we can look at the goals that an intelligent system is trying to achieve. We may say that, along each potential path of the system's evolution, its various goals will be achieved to a certain degree. The system can then be viewed as having a certain utility distribution across paths -- some paths are more desirable to it than others. A guiding principle of action selection would then be: To take an action A so that, conditioned on action A, the predicted probability distribution across paths is as close as possible to the distribution implied by the system's goals.

This principle of action selection can be formalized as KL-divergence minimization if one wishes, and in that sense it can be formulated as a "free energy minimization" principle.   But it's a "free energy" defined across ensembles of paths, not across states.
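A minimal sketch of this action-selection rule follows. Several things here are assumptions added for illustration: the goal-implied distribution is taken to be a softmax of per-path utilities with a hypothetical sharpness parameter `beta`, the divergence is taken in the direction KL(predicted || goal), and the action names, path set and numbers are made up.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same finite set of paths."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def goal_distribution(path_utilities, beta=1.0):
    """Assumed form of the goal-implied distribution: softmax of path utilities."""
    weights = [math.exp(beta * u) for u in path_utilities]
    total = sum(weights)
    return [w / total for w in weights]

def select_action(predicted, path_utilities, beta=1.0):
    """Pick the action whose predicted path distribution is closest to the goal one."""
    target = goal_distribution(path_utilities, beta)
    scores = {a: kl_divergence(dist, target) for a, dist in predicted.items()}
    return min(scores, key=scores.get), scores

# Hypothetical example: three possible paths, two candidate actions.
path_utilities = [0.0, 1.0, 3.0]
predicted = {
    "explore": [0.2, 0.3, 0.5],   # P(path | action)
    "stay_put": [0.7, 0.2, 0.1],
}

best, scores = select_action(predicted, path_utilities)
print(best, scores)
```

Note that the utility is attached to each path as a whole, so nothing in this sketch requires it to decompose into a sum of per-moment rewards.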

A side note: it's important to understand that the desirability of a path to an intelligent system need not be expressible as the expected future utility at all moments of time along that path. The desirability of a path may be some more holistic function of everything that happens along that path. Considering only expected utility as a form of goal leads to various pathologies related to wireheading, as I argued in a long-ago blog post on ultimate orgasms and such.


Algorithmic Thermodynamics

Now let's dig a little deeper.   Can we apply these same ideas beyond the realm of physics, to more general types of processes that change over time?

I am inspired by a general Whiteheadean notion of processes as fundamental things. However, to keep things concrete, for now I'm going to provisionally assume that the "processes" involved can be formulated as computer programs, in some standard Turing-equivalent framework, or maybe a quantum-computing framework. I think the same ideas actually apply more broadly, but -- one step at a time...

Let us start with Kohtaro Tadaki's truly beautiful, simple, elegant paper titled A statistical mechanical interpretation of algorithmic information theory  

Section 6 of Tadaki outlines a majorly aesthetic, obvious-in-hindsight parallel between algorithmic information theory and equilibrium thermodynamics.   There is seen to be a natural mapping between temperature in thermodynamics and compression ratio in algorithmic information theory.   A natural notion of "algorithmic free energy"  is formulated, as a sort of weighted program-length over all possible computer programs (where the weights depend on the temperature).

The following table (drawn from Tadaki's presentation here) summarizes the key  mappings in Tadaki's theory





To ground the mappings he outlines, Tadaki gives a simple statistical mechanical interpretation to algorithmic information theory. He models an optimal computer as decoding equipment at the receiving end of a noiseless binary communication channel. In this context, he regards programs for this computer as codewords (finite binary strings) and regards computation results (also finite binary strings) as decoded “symbols.” For simplicity he assumes that the infinite binary string sent through the channel -- constituting a series of codewords in a prefix-free code -- is generated by infinitely repeated tosses of a fair coin. Based on this simple reductive model, Tadaki formulates computation-theoretic analogues to core constructs of traditional equilibrium thermodynamics.
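To give a feel for these analogues, here is a toy computation. As I understand Tadaki's definitions, the partition function is a sum of 2^(-|p|/T) over all halting programs p of an optimal prefix-free machine, with free energy, energy and entropy derived from it in the usual way; those true quantities are uncomputable. The sketch below simply substitutes a small hand-picked prefix-free code for the set of programs, so the list `lengths` and the function name are illustrative assumptions only.

```python
import math

def tadaki_quantities(program_lengths, temperature):
    """Thermodynamic-style quantities over a finite stand-in for the program set."""
    t = temperature
    weights = [2.0 ** (-n / t) for n in program_lengths]
    z = sum(weights)                       # partition function Z(T)
    free_energy = -t * math.log2(z)        # F(T) = -T log2 Z(T)
    energy = sum(n * w for n, w in zip(program_lengths, weights)) / z  # E(T)
    entropy = (energy - free_energy) / t   # S(T) = (E(T) - F(T)) / T
    return z, free_energy, energy, entropy

# Lengths of the codewords 0, 10, 110, 1110, 1111 (a complete prefix-free code).
lengths = [1, 2, 3, 4, 4]

for t in (1.0, 0.5):
    z, f, e, s = tadaki_quantities(lengths, t)
    print(f"T={t}: Z={z:.4f}  F={f:.4f}  E={e:.4f}  S={s:.4f}")
```

In this toy, the "energy" is just the expected program length under weights that favor shorter programs more strongly as the temperature drops, and the "entropy" coincides with the Shannon entropy (in bits) of that weighting.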

Now let's start putting some pieces together.

Perhaps the most useful observation I will make in this blog post is: It seems one could port the path-entropy based treatment of far-from-equilibrium thermodynamics (as seen in the papers I've linked above) to Tadaki's algorithmic-information context, by looking at sources emitting bits that are not independent of each other but rather have some probabilistic dependencies.

By doing so, one would obtain an “algorithmic energy” function that measures the energy of an algorithmic process over a period of time -- without assuming that it’s a memoryless process like Tadaki does in his paper.

To get this to work, so far as I can limn without doing all the math (which I don't have time for at the moment, alas), one needs to assume that the knowledge one has of the dependencies among the bits produced by the process is given in the form of expectations…  e.g. that we know the average value of f_k(x_{i+1}, x_i) for various observables f_k ….  Plus one needs to make some other slightly funny assumptions that are probably replaceable (the paper assumes “the number of possible transitions does not depend on the starting point”… but I wonder if this could be replaced by some assumption about causality…)
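For concreteness, the generic maximum-entropy form of a path distribution under expectation constraints of this kind can be written as follows. This is the textbook Maximum Caliber construction rather than an equation copied from Rogers and Rempe; the multipliers \lambda_k, the normalizer Z, the path length N and the target averages \bar f_k are notation introduced here, and a uniform reference measure over paths is assumed.

```latex
\[
  P(x_0, \dots, x_N)
  \;=\;
  \frac{1}{Z(\lambda)}
  \exp\!\Big( -\sum_{k} \lambda_k \sum_{i=0}^{N-1} f_k(x_{i+1}, x_i) \Big),
  \qquad
  Z(\lambda) \;=\; \sum_{x_0, \dots, x_N}
  \exp\!\Big( -\sum_{k} \lambda_k \sum_{i=0}^{N-1} f_k(x_{i+1}, x_i) \Big),
\]
\[
  \text{with each } \lambda_k \text{ chosen so that }\quad
  \frac{1}{N} \sum_{i=0}^{N-1}
  \big\langle f_k(x_{i+1}, x_i) \big\rangle_{P} \;=\; \bar f_k .
\]
```

Because each f_k couples only neighboring time steps, the exponent is a sum over single transitions, which is what makes the resulting process short-memory in the sense discussed earlier.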

If I'm not mistaken, this should give us something like Friston’s free energy principle that actually works and has meaning….  I.e. we have a rigorous sense in which complex algorithmic systems are minimizing free energy.   The catch is that it’s an algorithmic path energy -- but hey...

More precisely, relative to an observer S who is observing a system S1 in a  certain way (by tabulating conditional probabilities of “how often some event of type A occurs at time T+s, given some event of type B occurred at time T”) … we may say the evolution of S1 in S’s perspective obeys an energy minimization principle, where energy is defined algorithmic-informationally (following my proposed, not-yet-fleshed-out non-equilibrium generalization of Tadaki’s approach)…

Into the Quantum Rabbit Hole...

Now that we've gone this far, we may as well plunge in a bit deeper, right?

Tadaki deals with classical computers but -- gesticulating only moderately wildly -- it seems one could generalize his approach to quantum computers OK.

Then one is looking at series of qubits rather than bits, and instead of tabulating conditional probabilities one is tabulating amplitudes.  

The maximum entropy principle is replaced with the stationary quantropy principle, and one still has the situation that: Relative to S who is observing S1 using some standard linear quantum observables, S1 may be said to evolve according to a stationary quantropy trajectory, where quantropy is here defined by taking the non-equilibrium generalization of Tadaki’s algorithmic-informational entropy and replacing the real values with complex values.

So we may well get a kind of free-energy principle for quantum systems also.

If we want to model cognitive stuff using bits or qubits, then we have here a physics-ish theory of cognitive stuff….  Or at least a sketch of the start of one…

Out Toward the Eurycosm

One of the motivations for these investigations was some discussions on higher-dimensional and more broadly eurycosmic models of psi.  If there are non-physical dimensions that connect spatiotemporally distant entities, then what are the dynamical laws in these dimensions?   If we can model them as information dimensions, then maybe the dynamics should be modeled as I’m alluding here…

Physics dynamics should be recoverable as a special case of algorithmic-information dynamics where one adds special constraints. I.e. the constraints posed by spatial structure and special relativity etc. should reflect themselves in the conditional probabilities observed between various classes of bits or qubits.

Then the linear symmetries of spacetime structure should mean that when you calculate maximum-path-algorithmic-information distributions relative to these physics constraints, you end up getting maximum-Shannon-path-entropy distributions.   Because macrophysics results from doing computing using ensembles of randomly chosen computer programs (i.e. chosen subject to given constraints…).

Suppose we want to model a eurycosm that works according to a principle like Peirce's "tendency to take habits" aka Smolin's Precedence Principle aka Sheldrake's morphic resonance? Well then, one can assume that the probability distribution underlying the emanation of codewords in Tadaki's model obeys this sort of principle. I.e., one can assume that the prior probability of a certain subsequence is higher if that subsequence, or another subsequence sharing some of its patterns, has occurred earlier in the overall sequence. Of course there are many ways to modify Tadaki's precise computational model, and many ways to formalize the notion that "subsequences with historically more frequent patterns should be more frequent going forward." But conceptually this is quite straightforward.
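One of the many possible formalizations can be sketched in a few lines. Here the "patterns" are simply short contexts of recent bits, and the probability of the next bit is weighted by how often each bit followed the same context earlier in the sequence. The function name, the context length and the smoothing parameter `bias` are all hypothetical choices made for this example; with no relevant history the rule falls back to a fair coin, as in Tadaki's original setup.

```python
from collections import Counter

def habit_next_bit_probs(history, context_len=2, bias=1.0):
    """P(next bit), weighted by how often each bit followed the current context before.

    `bias` acts as a pseudo-count: larger values keep the source closer to a fair coin.
    """
    context = history[-context_len:]
    counts = Counter()
    # Scan earlier occurrences of the current context (excluding the final one).
    for i in range(len(history) - context_len):
        if history[i:i + context_len] == context:
            counts[history[i + context_len]] += 1
    total = counts["0"] + counts["1"] + 2 * bias
    return {b: (counts[b] + bias) / total for b in "01"}

probs = habit_next_bit_probs("0110110110")
print(probs)                       # "10" was followed by "1" twice before
print(habit_next_bit_probs("01"))  # no precedent yet: fair coin
```

A source built this way emits bits with exactly the kind of probabilistic dependencies on the past that the path-entropy treatment above is meant to handle.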

One is forced however to answer the following question.   Suppose we assume that the probability of pattern P occurring in a subsequence beginning at time T is in some way proportional to the intensity with which P has occurred as a pattern in the subsequence prior to time T.   What language of processes are we using to formalize the patterns P?   If -- in line with the framework I articulate in The Hidden Pattern and elsewhere -- we formalize a pattern P in X as a process P that produces X and is simpler than X -- what is the language in which patterns are expressed?    What is the implicit programming language of our corner of the eurycosm?  

For simplicity I have been following Tadaki's conventional Turing machine based computational models here -- with a brief gesture toward quantum computing -- but of course  the broad approach outlined here goes beyond these computing paradigms.   What if we ported Tadaki's ideas to series of bits emanated by, say, a hypercomputer like the Zeno Machine?   Then we don't just get a single infinite bit string as output, but a more complex ordinal construction with infinite bit strings of infinite bit strings etc. -- but the math could be worked.   If the size of a Zeno Machine program can be quantified by a single real number, then one can assess Zeno Machine programs as patterns in data, and one can define concepts like compression ratio and algorithmic entropy and energy.   The paradigm sketched here is not tied to a Turing Machine model of eurycosmic processes, though TMs are certainly easier for initial sketches and calculations than ZMs or even weirder things.

I have definitely raised more questions than I've answered in this long and winding blog post.   My goal has been to indicate a direction for research and thinking, one that seems not a huge leap from the current state of research in various fields, but perhaps dramatic in its utility and implications.


 




5 comments:

Unknown said...

There may be thousands of processes that have to work together to create a functional general intelligence. Each process has to be able to share its data with every other process. This leads to a hierarchical series of processes that suggest probable conclusions. It may take a while to put all of this together.

Blogger said...

I think the facts and figures is the good try to explain, but I could not understand the last part of the articles. Overall to good information

Superior said...

All of the questions you raised in this article provide attraction to further research. Good Stuff buddy

Tim Tyler said...

Re: "state at time i+1 depends on state i but not earlier ones" - probably general enough to cover the laws of physics in our universe. The trick here is to note that if S(i) depends on two previous time intervals then you can make another model in which S(i) depends only on the previous time interval simply by doubling the size of the state.













