Given the successes of energy-based
formalisms in physics, it is natural to want to extend them into other domains
like computation and cognition.
In this vein: My aim here is to sketch what I
think is a workable approach to an energetics of computational processes
(construed very broadly).
By this I mean: I will explain how one can
articulate highly general principles of the dynamics of computational
processes, that take a similar form to physics principles such as the
stationary action principle (which often takes the form of "least
action") and the Second Law of Thermodynamics (the principle of entropy
non-decrease).
Why am I interested in this topic? Two related reasons, actually.
First, I would like to create a "General
Theory of General Intelligence" -- or to be more precise, a general theory
of what kinds of systems can display what levels of general intelligence in
what environments given realistically limited (space, time and energy)
resources. Marcus Hutter's Universal AI theory is great, but it doesn't say much about general intelligence under realistic resource assumptions; most of its power applies only to AI systems with unrealistically massive processing power. I have published some ideas on this before --
e.g. formalizing Cognitive Synergy in terms of category theory, and articulating the Embodied Communication Prior relative to which human-like agents attempt to be intelligent -- but nothing remotely near fully satisfying.
So I'm searching for new directions.
Second, I would like to come up with a real
scientific theory of psi phenomena. I am
inclined toward what I call "euryphysical" theories -- i.e. theories
that involve embedding our 4D spacetime continuum in a larger space (which
could be a higher dimensional space or could be a non-dimensional topological
space of some sort). However, this raises the question of what this larger space is like -- what rules govern
"dynamics" in this space? In my paper on Euryphysics, I
give some rough ideas in this direction, but again nothing fully satisfying.
It would be nice if mind dynamics -- both in
a traditional AI setting and in a more out-there euryphysical setting -- could
be modeled on dynamical theories in physics, which are based on ideas like
stationary action. After all, if, as Peirce said, "matter is just mind hide-bound with habit", then perhaps
the laws of matter are in some way simplifications or specializations of the
laws of mind -- and maybe there are laws
of mind with roughly analogous form to some of the current laws of physics.
A Few Comments on Friston's Free Energy Ideas
Friston's "free energy principle"
represents one well-known effort in the direction of modeling cognition using
physics-ish principles. It seems to me
that Friston's ideas have some fundamental shortcomings -- but reviewing these
shortcomings has some value for understanding how to take a more workable
approach.
I should clarify that my own thinking
described in this blog post was not inspired by Friston's thinking to any
degree, but more so by long-ago reading in the systems-theory literature --
i.e. reading stuff like Ilya Prigogine's Order out of Chaos, Erich Jantsch's The Self-Organizing Universe, and Hermann Haken's Synergetics. These authors represented a tradition
within the complex-systems research community, of using far-from-equilibrium
thermodynamics as a guide for thinking about life, the universe and
everything.
Friston's "free energy principle"
seems to have a somewhat similar conceptual orientation, but confusingly to me,
doesn't seem to incorporate the lessons of far-from-equilibrium thermodynamics
that thoroughly, being based more on equilibrium-thermodynamics-ish ideas.
I haven't read everything Friston has
written, but have skimmed various papers of his over the years, and recently
looked at the much-discussed paper The Markov blankets of life: autonomy, active inference and the free energy principle.
My general confusion about Friston's ideas is largely the same as that expressed by the authors of various critical blog posts on the topic.
As one such post notes, regarding
perception, Friston basically posits that neural and cognitive systems are
engaged with trying to model the world they live in, and do so by looking for
models with maximum probability conditioned on the data they've observed. This is a useful but not especially adventurous perspective, and one can formulate it in terms of trying to find models
with minimum KL-divergence to reality,
which is one among many ways to describe Bayesian inference ... and which can
be mathematically viewed as attempting to minimize a certain "free
energy" function.
Friston then attempts to extend this
principle to action via a notion of "active inference", and this is
where things get dodgier. As the
above-linked "Markov Blankets" paper puts it,
"Active
inference is a cornerstone of the free energy principle. This principle states
that for organisms to maintain their integrity they must minimize variational
free energy. Variational free energy bounds
surprise because the former can be shown to be either greater than or equal to
the latter. It follows that any organism that minimizes free energy thereby
reduces surprise—which is the same as saying that such an organism maximizes
evidence for its own model, i.e. its own existence
...
This
interpretation means that changing internal states is equivalent to inferring
the most probable, hidden causes of sensory signals in terms of expectations
about states of the environment
...
[A]
biological system must possess a generative model with temporal depth, which,
in turn, implies that it can sample among different options and select the
option that has the greatest (expected) evidence or least (expected) free
energy. The options sampled from are intuitively probabilistic and future
oriented. Hence, living systems are able to ‘free’ themselves from their
proximal conditions by making inferences about probabilistic future states and
acting so as to minimize the expected surprise (i.e. uncertainty) associated
with those possible future states. This capacity connects biological qua
homeostatic systems with autonomy, as the latter denotes an organism’s capacity
to regulate its internal milieu in the face of an ever-changing environment.
This means that if a system is autonomous it must also be adaptive, where
adaptivity refers to an ability to operate differentially in certain
circumstances.
...
The
key difference between mere and adaptive active inference rests upon selecting
among different actions based upon deep (temporal) generative models that
minimize the free energy expected under different courses of action.
This
suggests that living systems can transcend their immediate present state and
work towards occupying states with a free energy minimum."
If you are a math/physics-oriented person and find the above quotes frustratingly vague, unfortunately you will find that the rest of the paper is equally vague on the confusing points, and so are Friston's other papers.
What it sounds like to me (doing some
"active inference" myself to try to understand what the paper is
trying to say) is that active inference is being portrayed as a process by
which cognitive systems take actions aimed at putting themselves in situations
that will be minimally surprising, i.e. in which they will have the most
accurate models of reality. If taken
literally this cannot be true, as it would predict that intelligent systems
systematically seek simpler situations they can model better -- which is
obviously not a full description of human motivation, for instance. We do have a motivation to put ourselves in
comprehensible, accurately model-able situations -- but we also have other
motivations, such as the desire to perceive novelty and to challenge ourselves, which sometimes run counter to our drive for a comprehensible environment.
The main thing that jumps out at me
when reading what Friston and colleagues write about active inference is that
it's too much about states and not enough about paths. To model far-from-equilibrium thermodynamics
using energy-based formalisms, one needs to think about paths and path
entropies and such, not just about things like "work[ing] towards occupying states with a free energy minimum." And instead of construing "selecting among different actions based upon deep (temporal) generative models that minimize the free energy expected under different courses of action" in terms of states with minimum free energy, one needs to think about action selection in terms of the stationarity of action functionals evaluated along multiple paths.
Energetics for Far-From-Equilibrium Thermodynamics
It seems clear that equilibrium
thermodynamics isn’t really what we want to use as a guide for cognitive
information processing. Fortunately, the
recent thermodynamics literature contains some quite interesting results
regarding path entropy in far-from-equilibrium thermodynamics.
Abaimov's paper General
formalism of non-equilibrium statistical mechanics, path approach and
Raphael Chetrite and Hugo Touchette's paper Nonequilibrium
Microcanonical and Canonical Ensembles and Their Equivalence each tell part of the story.
David Rogers and Susan Rempe, in A First and Second Law for Nonequilibrium Thermodynamics: Maximum Entropy Derivation of the Fluctuation-Dissipation Theorem and Entropy Production Functionals, describe explicitly the far-from-equilibrium "path free energy", but only for the case of processes with short memory, i.e. where the state at time i+1 depends on the state at time i but not on earlier states (which is often fine but not totally general).
The following table from Rogers and Rempe
summarizes some key points concisely.
Conceptually, the key point is that we need
to think not about the entropy of a state, but about the "caliber" of a path -- roughly, a normalized count of the number of ways that path can be realized. This then leads to the notion of the free
energy of a certain path.
It follows from this body of work that ideas
like "free energy minimization" need to be re-thought dynamically
rather than statically. One needs to
think about systems as following paths with differential probability based on
the corresponding path free energies.
This is in line with the "Maximum
Caliber principle" which is a
generalization of the Maximum Entropy principle to dynamical systems (both
first proposed in clear form by E.T. Jaynes, though Maximum Entropy has been
more widely developed than Maximum Caliber so far).
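As a toy illustration of the Maximum Caliber idea (a minimal sketch of my own, not taken from Jaynes or the papers above): fix a constraint on the path-averaged value of some observable A, and the maximum-caliber distribution over whole paths is an exponential family, p(path) proportional to exp(-lambda * A(path)), with lambda tuned so the constraint is met. For instance:

```python
import itertools, math

# Toy Maximum Caliber sketch (illustrative assumptions only): paths are binary
# strings of length T, and the constrained observable A(path) is the number of
# state switches along the path. The max-caliber distribution over whole paths
# is p(path) ∝ exp(-lam * A(path)), with lam tuned to hit a target average <A>.

T = 6
paths = list(itertools.product([0, 1], repeat=T))

def switches(path):
    return sum(a != b for a, b in zip(path, path[1:]))

def mean_A(lam):
    weights = [math.exp(-lam * switches(p)) for p in paths]
    Z = sum(weights)
    return sum(w * switches(p) for w, p in zip(weights, paths)) / Z

target = 1.5           # desired path-averaged number of switches
lo, hi = -20.0, 20.0   # bisection on lam; mean_A is decreasing in lam
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_A(mid) > target:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2
print("lambda =", round(lam, 4), " <A> =", round(mean_A(lam), 4))
```

The equilibrium Maximum Entropy principle is recovered when the "paths" collapse to single states and the observables are ordinary state functions.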
Extending these notions further, Diego
Gonzalez outlines a Hamiltonian
formalism that is equivalent to path entropy maximization, building on math
from his earlier paper Inference
of trajectories over a time-dependent phase space distribution.
Action Selection and Active Inference
Harking back to Friston for a moment, it follows that the dynamics of an intelligent system should be viewed not as an attempt by the system to find a state with minimum free energy or surprisingness etc., but rather as a process of the system evolving dynamically along paths chosen probabilistically so as to have stationary path free energy.
But of
course, this would be just as true for an unintelligent system as for an
intelligent system -- it's not a principle of intelligence but just a
restatement of how physics works (in far from equilibrium cases; in equilibrium
cases one can collapse paths to states).
If we want to say something unique about
intelligent systems in this context, we can look at the goals that an
intelligent system is trying to achieve.
We may say that, along each potential path of the system's evolution,
its various goals will be achieved to a certain degree. The system can then be viewed as having a certain utility distribution across paths -- some paths are more desirable to it than others. A guiding principle of
action selection would then be: To take an action A so that, conditioned on
action A, the predicted probability distribution across paths is as close as
possible to the distribution implied by the system's goals.
This principle of action selection can be
formalized as KL-divergence minimization if one wishes, and in that sense it
can be formulated as a "free energy minimization" principle. But it's a "free energy" defined
across ensembles of paths, not across states.
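A minimal numerical sketch of that principle (my own illustrative formalization, with made-up path distributions): each candidate action induces a predicted distribution over future paths, the goal system defines a desired distribution over the same paths, and the agent picks the action minimizing the divergence between the two.

```python
import math

# Hypothetical example: 3 possible future paths, 2 candidate actions.
# predicted[a] = predicted probability distribution over paths, given action a
# goal = distribution over paths implied by the system's goals (normalized utilities)
predicted = {
    "explore": [0.5, 0.3, 0.2],
    "exploit": [0.1, 0.2, 0.7],
}
goal = [0.2, 0.3, 0.5]

def kl(p, q):
    # KL(p || q); assumes strictly positive entries
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Which direction of KL-divergence to use is itself a modeling choice;
# here each action is scored by KL(predicted || goal).
scores = {a: kl(dist, goal) for a, dist in predicted.items()}
best = min(scores, key=scores.get)
print(scores, "-> chosen action:", best)
```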
A side note: it's important to understand that the desirability of a path to an intelligent system need not be
expressible as the expected future utility at all moments of time along that
path. The desirability of a path may be
some more holistic function of everything that happens along that path. Considering only expected utility as a form
of goal leads to various pathologies related to wireheading, as I argued in a
long-ago blog post on ultimate orgasms and such.
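To illustrate the distinction with a contrived example of my own: a utility that sums per-moment rewards can always be written as a sum over time steps, whereas something like "the diversity of experiences along the path" cannot, yet is a perfectly coherent way to value a path.

```python
# Contrived illustration: two ways of scoring the same path of states.
path = ["wake", "work", "work", "play", "work"]

# Summable utility: a sum of per-moment rewards (the usual expected-utility setup)
reward = {"wake": 0, "work": 2, "play": 1}
summable_utility = sum(reward[s] for s in path)

# Holistic utility: a function of the path as a whole (here, diversity of states
# visited), which cannot be decomposed into a sum of per-moment terms
holistic_utility = len(set(path))

print(summable_utility, holistic_utility)
```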
Algorithmic Thermodynamics
Now let's dig a little deeper. Can we apply these same ideas beyond the
realm of physics, to more general types of processes that change over time?
I am inspired by a general Whiteheadian notion of processes as fundamental things.
However, to keep things concrete, for now I'm going to provisionally
assume that the "processes" involved can be formulated as computer
programs, in some standard Turing-equivalent framework, or maybe a
quantum-computing framework. I think
the same ideas actually apply more broadly, but -- one step at a time...
Let us start with Kohtaro Tadaki's truly
beautiful, simple, elegant paper titled A statistical mechanical
interpretation of algorithmic information theory.
Section 6 of Tadaki's paper outlines a majorly aesthetic, obvious-in-hindsight parallel between algorithmic information theory and equilibrium thermodynamics. There turns out to be a natural mapping between temperature in thermodynamics and compression ratio in algorithmic information theory. A natural notion of "algorithmic free
energy" is formulated, as a sort of
weighted program-length over all possible computer programs (where the weights
depend on the temperature).
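In rough form (my paraphrase of Tadaki's construction; see his paper for the precise definitions and convergence conditions), the partition function and free energy become sums over the halting programs p of an optimal prefix-free machine U, with the temperature T playing the role of a compression-rate parameter:

```latex
Z(T) \;=\; \sum_{p \,\in\, \mathrm{dom}(U)} 2^{-|p|/T},
\qquad
F(T) \;=\; -\,T \log_2 Z(T)
```

At T = 1 the partition function reduces to Chaitin's halting probability Omega.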
The following table (drawn from Tadaki's
presentation here) summarizes the key
mappings in Tadaki's theory.
To ground the mappings he outlines,
Tadaki gives a simple statistical
mechanical interpretation to algorithmic information theory. He models an optimal computer as decoding
equipment at the receiving end of a noiseless binary communication
channel. In this context, he regards
programs for this computer as codewords (finite binary strings) and regards
computation results (also finite binary strings) as decoded “symbols.” For simplicity he assumes that the infinite binary string sent through the channel -- constituting a series of codewords in a prefix-free code -- is generated by infinitely repeated tosses of a fair coin. Based on this simple reductive
model, Tadaki formulates computation-theoretic analogues to core constructs of
traditional equilibrium thermodynamics.
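As a toy numerical sketch of that setup (with a completely made-up miniature "machine", just to show the shape of the computation): take a small hypothetical set of halting-program lengths satisfying the Kraft inequality, weight each program p by 2^(-|p|/T) as in the temperature-generalized fair-coin model, and compute the partition function and free energy at a few temperatures.

```python
import math

# Hypothetical miniature "optimal machine": just a prefix-free set of program lengths.
# (A real version would enumerate the halting programs of a universal prefix-free
# machine, which is only semi-computable -- this is purely an illustrative stand-in.)
program_lengths = [2, 3, 3, 5, 7, 7, 8]   # |p| for each assumed halting program

def partition(T):
    # Z(T) = sum over programs of 2^(-|p|/T)
    return sum(2 ** (-L / T) for L in program_lengths)

def free_energy(T):
    # F(T) = -T * log2 Z(T), by analogy with F = -kT ln Z
    return -T * math.log2(partition(T))

for T in (0.25, 0.5, 0.75, 1.0):
    print(f"T={T:4.2f}  Z={partition(T):8.5f}  F={free_energy(T):8.5f}")
```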
Now let's start putting some pieces together.
Perhaps the most useful observation I will
make in this blog post is: It
seems one could port the path-entropy based treatment of far-from-equilibrium
thermodynamics (as seen in the papers I've linked above) to Tadaki's
algorithmic-information context, by looking at sources emitting bits that are not independent of each other but rather have some probabilistic dependencies.
By doing so, one would obtain an “algorithmic
energy” function that measures the energy of an algorithmic process over a
period of time -- without assuming that it’s a memoryless process like Tadaki
does in his paper.
To get this to work, so far as I can tell without doing all the math (which I don't have time for at the moment, alas), one needs to assume that the knowledge one has of the dependencies among the bits produced by the process is given in the form of expectations… e.g. that we know the average value of f_k(x_{i+1}, x_i) for various observables f_k…. Plus one needs to make some other slightly funny assumptions that are probably replaceable (the paper assumes “the number of possible transitions does not depend on the starting point”… but I wonder if this could be replaced by some assumption about causality…).
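For what it's worth, the resulting maximum-path-entropy answer then has a familiar exponential-family shape over whole paths (my sketch of the standard form, with lambda_k the Lagrange multipliers enforcing the expectation constraints):

```latex
P(x_0, x_1, \dots, x_N) \;\propto\; P_0(x_0)\,
\prod_{i=0}^{N-1} \exp\!\Big( -\sum_k \lambda_k \, f_k(x_{i+1}, x_i) \Big)
```

Turning this path-level weight into properly normalized step-by-step transition probabilities is where the extra technical assumptions come in.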
If I'm not mistaken, this should give us
something like Friston’s free energy principle that actually works and has
meaning…. I.e. we have a rigorous sense
in which complex algorithmic systems are minimizing free energy. The catch is that it’s an algorithmic path
energy -- but hey...
More precisely, relative to an observer S who
is observing a system S1 in a certain
way (by tabulating conditional probabilities of “how often some event of type A
occurs at time T+s, given some event of type B occurred at time T”) … we may
say the evolution of S1 in S’s perspective obeys an energy minimization
principle, where energy is defined algorithmic-informationally (following my
proposed, not-yet-fleshed-out non-equilibrium generalization of Tadaki’s
approach)…
Into the Quantum Rabbit Hole...
Now that we've gone this far, we may as well
plunge in a bit deeper, right?
Tadaki deals w/ classical computers but --
gesticulating only moderately wildly -- it seems one could generalize his
approach to quantum computers OK.
Then one is looking at series of qubits
rather than bits, and instead of tabulating conditional probabilities one is
tabulating amplitudes.
The maximum entropy principle is replaced with
the stationary quantropy principle
and one still has the situation that: relative to an observer S who is observing S1 using some standard linear quantum observables, S1 may be said to evolve according to a stationary quantropy trajectory, where quantropy is here defined by generalizing the non-equilibrium extension of Tadaki’s algorithmic-informational entropy, replacing the real values with complex values.
So we may well get a kind of free-energy
principle for quantum systems also.
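For orientation, the quantropy idea (as developed by John Baez and collaborators for ordinary path integrals, not for the algorithmic setting) replaces probabilities over paths with amplitudes, and replaces entropy maximization with a stationarity condition: making the complex-valued analogue of entropy stationary, subject to a fixed expected action, yields the usual path-integral amplitudes.

```latex
a(\Gamma) \;=\; \frac{e^{\,i S(\Gamma)/\hbar}}{\sum_{\Gamma'} e^{\,i S(\Gamma')/\hbar}},
\qquad
Q \;=\; -\sum_{\Gamma} a(\Gamma)\,\ln a(\Gamma)\ \ \text{stationary subject to fixed } \langle S \rangle
```

The speculation here is that an algorithmic-informational analogue of this construction would play the same role for quantum computational processes that the path free energy plays for classical ones.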
If we want to model cognitive stuff using
bits or qubits, then we have here a physics-ish theory of cognitive
stuff…. Or at least a sketch of the
start of one…
Out Toward the Eurycosm
One of the motivations for these
investigations was some discussions on higher-dimensional and more broadly
eurycosmic models of psi. If there are
non-physical dimensions that connect spatiotemporally distant entities, then
what are the dynamical laws in these dimensions? If we can model them as information
dimensions, then maybe the dynamics should be modeled along the lines I’m alluding to here…
Physics dynamics should be recoverable as a
special case of algorithmic-information dynamics where one adds special constraints. I.e. the constraints posed by spatial structure and special relativity etc. should reflect themselves in the conditional probabilities observed between various classes of bits or qubits.
Then the linear symmetries of spacetime
structure should mean that when you calculate
maximum-path-algorithmic-information distributions relative to these physics
constraints, you end up getting maximum-Shannon-path-entropy distributions -- because macrophysics results from doing computing using ensembles of randomly chosen computer programs (i.e. programs chosen subject to the given constraints…).
Suppose we want to model a eurycosm that works according to a principle like Peirce's "tendency to take habits", aka Smolin's Precedence Principle, aka Sheldrake's morphic resonance. Well then, one can assume that the
probability distribution underlying the emanation of codewords in Tadaki's
model obeys this sort of principle.
I.e., one can assume that the prior probability of a certain subsequence is higher if that subsequence, or another subsequence sharing some of the same patterns, has occurred earlier in the overall sequence. Of course there are many ways to modify
Tadaki's precise computational model, and many ways to formalize the notion
that "subsequences with historically more frequent patterns should be more
frequent going forward." But
conceptually this is quite straightforward.
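One crude way to cash this out (purely a toy construction of my own, one of the "many ways" just mentioned): bias the emission probability of each successive bit toward whichever continuation of the recent context has occurred most often so far -- a reinforcement scheme in the spirit of a Pólya urn over subsequences.

```python
import random
from collections import defaultdict

# Toy "tendency to take habits" bit source (one possible formalization among many):
# the probability of emitting bit b after context c grows with how often b has
# already followed c earlier in the sequence (Laplace-smoothed reinforcement).
def habit_biased_sequence(length, context_len=3, strength=1.0, seed=0):
    rng = random.Random(seed)
    counts = defaultdict(lambda: [1.0, 1.0])   # pseudo-counts for bits 0 and 1
    seq = [rng.randint(0, 1) for _ in range(context_len)]
    while len(seq) < length:
        c = tuple(seq[-context_len:])
        n0, n1 = counts[c]
        bit = 1 if rng.random() < n1 / (n0 + n1) else 0
        counts[c][bit] += strength             # reinforce the habit just exercised
        seq.append(bit)
    return seq

print("".join(map(str, habit_biased_sequence(80))))
```

Replacing the fair-coin source in Tadaki's model with a source of this general sort is one concrete way to get a "eurycosmic" variant of the algorithmic thermodynamics sketched above.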
One is, however, forced to answer the following question. Suppose we assume that the
probability of pattern P occurring in a subsequence beginning at time T is in
some way proportional to the intensity with which P has occurred as a pattern
in the subsequence prior to time T.
What language of processes are we using to formalize the patterns
P? If -- in line with the framework I
articulate in The
Hidden Pattern and elsewhere -- we formalize a pattern P in X as a process
P that produces X and is simpler than X -- what is the language in which patterns
are expressed? What is the implicit programming language of
our corner of the eurycosm?
For simplicity I have been following Tadaki's conventional Turing-machine-based computational models here -- with a brief gesture toward quantum computing -- but of course the broad approach outlined here goes beyond these computing paradigms. What if we ported Tadaki's ideas to series of bits emanated by, say, a hypercomputer like the Zeno Machine? Then we don't just get a single infinite bit string as output, but a more complex ordinal construction with infinite bit strings of infinite bit strings etc. -- but the math could be worked out. If the size of a Zeno Machine program can be quantified by a single real number, then one can assess Zeno Machine programs as patterns in data, and one can define concepts like compression ratio and algorithmic entropy and energy. The paradigm sketched here is not tied to a Turing Machine model of eurycosmic processes, though TMs are certainly easier for initial sketches and calculations than ZMs or even weirder things.
I have definitely raised more questions than
I've answered in this long and winding blog post. My goal has been to indicate a direction for
research and thinking, one that seems not a huge leap from the current state of
research in various fields, but perhaps dramatic in its utility and implications.