I will argue here that, in natural environments (I’ll
explain what this means below), intelligent agents will tend to change in ways
that locally maximize the amount of pattern created. I will refer to this putative principle as
MaxPat.
The argument I present here is fairly careful, but is still far from a
formal proof. I think a formal proof
could be constructed along the lines of this argument, but obviously it would
acquire various conditions and caveats along the route to full formalization.
What I mean by “locally maximize” is, roughly: If an
intelligent agent in a natural environment has multiple possible avenues it
may take, on the whole it will tend to take the one that involves more pattern
creation (where “degree of patternment” is measured relative to its own memory’s
notion of simplicity, a measure that is argued to be correlated with the
measurement of simplicity that is implicit in the natural environment).
This is intended to have roughly the same conceptual form as
the Maximum Entropy Production Principle (MEPP), and there may in fact be some
technical relationship between the two principles as well. I will indicate below that maximizing
pattern creation also involves maximizing entropy in a certain sense, though
this sense is complexly related to the sort of entropy involved in MEPP.
Basic Setting: Stable Systems and Natural Environments
The setting in which I will consider MaxPat is a universe
that contains a large number of small “atomic” entities (atoms, particles, whatever), which exist in space and time, and
are able to be assembled (or to self-assemble) into larger entities. Some of these larger entities are what I’ll
call Stable Systems (or SS’s), i.e. they can persist over time. A Stable System may be a certain pattern of
organization of small entities, i.e. some or all of the specific small entities
comprising it may change over time, and the Stable System may still be
considered the same system. (Note also that an SS as I conceive it here need not be permanent; stability is not an absolute concept.)
By a “natural environment” I mean one in which most Stable
Systems are forming via heavily stochastic processes of evolution and
self-organization, rather than e.g. by highly concerted processes of planning
and engineering.
In a natural environment, systems will tend to build up
incrementally. Small SS’s will build up
from atomic entities. Then larger SS’s
will build up from small SS’s and atomic entities, etc. Due to the stochastic nature of SS
formation, all else equal, smaller combinations will be more likely to get
formed than bigger ones. On the other
hand, if a bigger SS does eventually form and happens to be highly
stable, it may still persist for a while.
To put it a little more explicitly: The odds of an SS
surviving in a messy stochastic world are going to depend on various factors,
including its robustness and its odds of getting formed. If formation is largely stochastic and
evolutionary, there will be a bias toward smaller SS’s, and toward SS’s that can be
built up hierarchically via combination of previous ones. Thus there will be a bias toward survival of
SS’s that can be merged with others into larger SS’s. If a merger of S1 and S2 generally leads to
S3 in such a way that the imprint of S1 and S2 can still be seen in the observations
produced by S3 (a kind of syntax-semantics continuity), then we have a set of
observations with hierarchical patterns in it.
Intelligent Agents Observing Natural Environments
Now, consider the position of an intelligent agent in a
natural environment, collecting observations, and making hypotheses about what
future observations it might collect.
Suppose the agent has two hypotheses about what kind of SS
might have generated the observations it has made so far: a big SS of type X,
or a small SS of type Y. All else
equal, it should prefer the hypothesis Y, because (according to the ideas
outlined above) small SS’s are more likely to form in its (assumed natural)
environment. That is, in Bayesian
terms, the prior probability of small SS’s should be considered greater.
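To make the Bayesian point concrete, here is a minimal sketch in Python. The prior of the form 2^(-size), the function name posterior, and the example sizes and likelihoods are all assumptions chosen for illustration; the argument above only requires that the prior decrease with SS size.

```python
def posterior(hypotheses):
    """Return normalized posterior probabilities.

    hypotheses maps a hypothesis name to (size, likelihood), where size is a
    hypothetical count of atomic entities in the SS and likelihood is
    P(observations | hypothesis).
    """
    # Assumed prior: probability falls off exponentially with SS size.
    weights = {
        name: 2.0 ** (-size) * likelihood
        for name, (size, likelihood) in hypotheses.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Both hypotheses explain the observations equally well ("all else equal").
hyps = {"X_big": (10, 0.5), "Y_small": (3, 0.5)}
post = posterior(hyps)
```

With equal likelihoods, the posterior ordering is decided entirely by the size-based prior, so the small SS of type Y comes out ahead.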
Suppose the agent has memory capacity that is quite limited
compared to the number of observations it has to process. Then the SS’s it observes and conjectures
have to be saved in its memory, but some of them will need to be forgotten as
time passes; and compressing the SS’s it does remember will be important for it
to make the most of its limited memory capacity. Roughly speaking, the agent will do better to adopt
a memory code in which the SS’s that occur more often, and have a higher
probability of being relevant to the future, get a shorter code.
So, concision in the agent’s internal “computational model”
should end up corresponding roughly to concision in the natural environment’s
“computational model.”
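One standard way to realize such a memory code is to give each SS a code length of roughly -log2 of its observed frequency. The sketch below assumes this Shannon-style coding and uses made-up SS labels and counts; it captures only the "occurs more often" part of the criterion, not relevance to the future.

```python
import math
from collections import Counter


def code_lengths(observed):
    """Assign each observed SS a code length (in bits) of ceil(-log2(frequency))."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {ss: math.ceil(-math.log2(n / total)) for ss, n in counts.items()}


# Hypothetical observation history: small SS's show up more often than big ones.
observed = ["small_A"] * 8 + ["small_B"] * 4 + ["big_C"] * 2 + ["big_D"] * 2
lengths = code_lengths(observed)
```

In this toy history the frequently seen small SS's receive the shortest codes, which is the sense in which the agent's notion of concision comes to track the environment's.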
The agent should then estimate that the most likely future
observation-sets will be those that are most probable given the agent’s remembered
observational data, conditioned on the understanding that those generated by
smaller SS’s will be more likely.
To put it more precisely and more speculatively: I conjecture
that, if one formalizes all this and does the math a bit, it will turn out
that: The most probable observation-sets O will be the ones minimizing some
weighted combination of
- Kullback-Leibler distance between: A) the distribution over entity-combinations on various scales that O demonstrates, and B) the distribution over entity combinations on various scales that’s implicit in the agent’s remembered observational data
- The total size of the estimated-likely set of SS generators for O
As KL distance is relative entropy, this is basically a
“statistical entropy/information based on observations” term, and then an
“algorithmic information” type term reflecting a prior assumption that more
simply generated things are more likely.
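As a rough illustration of the conjectured objective, the following Python sketch scores a candidate observation-set by a weighted sum of the two terms. Everything specific here is an assumption: the direction of the KL divergence (candidate relative to memory), the weights alpha and beta, the use of a plain sum of generator sizes, and the toy distributions.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits; eps guards against division by zero in q."""
    return sum(
        pk * math.log2(pk / max(q.get(k, 0.0), eps))
        for k, pk in p.items()
        if pk > 0.0
    )


def score(obs_dist, memory_dist, generator_sizes, alpha=1.0, beta=0.1):
    """Lower is more probable: statistical term plus generator-size term."""
    return alpha * kl_divergence(obs_dist, memory_dist) + beta * sum(generator_sizes)


# Hypothetical distribution over entity-combinations implicit in memory.
memory = {"pair": 0.7, "triple": 0.2, "cluster": 0.1}

# Candidate A matches memory and needs only small generators.
score_a = score({"pair": 0.7, "triple": 0.2, "cluster": 0.1}, memory, [2, 3])
# Candidate B departs from memory and needs an extra, larger generator.
score_b = score({"pair": 0.1, "triple": 0.2, "cluster": 0.7}, memory, [2, 3, 9])
```

Candidate A receives the lower score on both terms, so under these assumptions it would be judged the more probable observation-set.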
Now, what does this mean in terms of “pattern theory”, in which a pattern in X is a function that is simpler than X but (at least approximately) produces X? If one holds the degree of approximation equal, then the simpler the function is, the more “intense” it is said to be as a pattern.
In the present case, the most probable observation-sets will be ones that are the most intense patterns relative to the background knowledge of the agent’s memory. They will be the ones that are most concise to express in terms of the agent’s memory, since the agent is expressing smaller SS generators more concisely in its memory, overall.
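A crude way to get a feel for "intensity relative to memory" is to use a general-purpose compressor as a stand-in for the agent's simplicity measure, and to prime it with the memory contents. The sketch below does this with zlib and a preset dictionary. This is an assumed proxy only: the formula 1 - compressed_size / original_size, the function names, and the example strings are illustrative and are not the formal definition from pattern theory.

```python
import zlib


def description_length(data: bytes, memory: bytes = b"") -> int:
    """Bytes zlib needs for data, optionally primed with memory as a preset dictionary."""
    if memory:
        comp = zlib.compressobj(level=9, zdict=memory)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


def intensity(data: bytes, memory: bytes = b"") -> float:
    """Assumed proxy: fraction of size saved by the compressed description, clamped at 0."""
    if not data:
        return 0.0
    return max(0.0, 1.0 - description_length(data, memory) / len(data))


# Hypothetical memory contents and two candidate observation-sets.
memory = b"small stable systems combine into larger stable systems"
familiar = b"small stable systems combine into larger stable systems"
unfamiliar = b"qzj xvk wpy bhg ufd mcr tne lis oab"
```

Calling intensity(familiar, memory) gives a higher value than either intensity(familiar) or intensity(unfamiliar, memory): the observation that is concise to express in terms of what is already remembered is the more intense pattern under this proxy.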
Intelligent Agents Acting In Natural Environments
Now let us introduce the agent’s actions into the picture.
If an agent, in interaction with a natural environment, has multiple possible
avenues of action, then ones involving setting up smaller SS’s will on the whole
be more appealing to the agent than ones involving setting up larger SS’s.
Why? Because they
will involve less effort -- and we can assume the agent has limited energetic
resources and hence wants to conserve effort.
Since smaller SS’s are the ones most concisely expressed in the agent’s memory,
and hence correspond to more intense patterns, the agent’s activity will be more likely to
result in possible scenarios with more patterns than in ones with fewer
patterns. That is, the agent’s
actions will, roughly speaking, tend to lead to maximal pattern generation, conditioned
on the constraints of moving in the direction of the agent’s goals according to
the agent’s “judgment.”
MaxPat
So, what we have concluded is that: Given the various
avenues open to it at a certain point in time, an intelligent agent in a natural environment will tend to choose
actions that locally maximize the amount of pattern it understands itself to create
(i.e., that maximize the amount of pattern created, where “pattern intensity”
is measured relative to the agent’s remembered observations, and its knowledge
of various SS’s in the world with various levels of complexity.)
This is what I call the Maximum
Pattern Creation Principle – MaxPat.
If the agent has enough observations in its memory, and has
a good enough understanding of which SS’s are small and which are not in the
world, then measuring pattern intensity relative to the agent will be basically
the same as measuring pattern intensity relative to the world. So a corollary is that: A sufficiently knowledgeable agent in a natural environment will tend to
choose actions that lead to locally maximum pattern creation, where pattern intensity
is measured relative to the environment itself.
There is nothing tremendously philosophically surprising here; however, I
find it useful to spell these conceptually plain things out in detail
sometimes, so I can more cleanly use them as ingredients in other ideas. And of course, going from something that is
conceptually plain to a real rigorous proof can still be a huge amount of work;
this is a task I have not undertaken here.
"...it understands itself to create" It forms the equivalent of (at least one) Theory about itself?
Maximizing pattern intensity is proposed to organize a system of pattern recognition nodes.
Conversely, pattern minimization leads to abstraction in the system of systems view.
This is awesome. I had a similar theory regarding the evolution of consciousness from our bilaterally-evolved neural network.
Our networks are algorithmically driven: assigned specific signal data to look out for, and to spring into action when such criteria are met.
We are “observant and predictive environmental manipulation machines”, bilaterally evolved for the express purpose of tripping genetically and existentially- programmed biochemical excitation and satiation triggers. The ensuing raw data throughput to achieve each “trigger” is inconsequential to the efficiency of the energetic processes required to trip it.
We are existentially driven to satisfy all of our neural networks -simultaneously, if possible- using the least amount of energy expenditure. In order to streamline the efficiency process, pain becomes a self-correcting, “navigation” system. Nociceptive, neuropathic and psychogenic message delivery algorithms inform the organism that it is wastefully pouring energy into potentially avoidable system redirection and repair functions. This holds true for “existential” pain as well as physical.
I believe the algorithm of life is to— Seek or create a perpetual, stable, pain-free environment using the least amount of energy possible and reproduce in case of failure.
The algorithm catalyzes creativity through invention and artistry — invention to manipulate the painful environmental barriers and artistry to counter the existential ones.
A more common dual to the maximum entropy principle is the maximum power principle. This avoids counter-intuitive claims such as that the purpose of fridges is to heat the environment, and the purpose of cars is to produce exhaust. The main problems I see with this attempt at a more positive framing are threefold. One is that it isn't clear that these maximands are significantly different. Two is that it looks as though where they do differ in predictions the "entropy" formulation might have an advantage. For example, if an agent can damage its competition through destroying resources it can't use itself, then it can be expected to do that. That fits with maximum entropy, but not so much with maximum power. Three is the scientific literature volume. Most scientists seem to favor maximum entropy formulations.
Maximum pattern shares some of these issues, but adds some concerns about the definition of pattern. Also, "pattern" is quite a bit like "negentropy". It's true that e.g. fridges minimize entropy loss locally - in specific areas - but this is a whole different idea from maximum entropy. It is much less controversial - but also not so interesting. Part of the interest in maximum entropy is the idea that it represents "God's utility function" - evolution's maximand. That organisms often benefit from hoarding negentropy seems like a more commonplace observation.
My current favorite author on this topic is Axel Kleidon and his most easily readable & complete paper seems to be this one: http://www.bgc-jena.mpg.de/bgc-theory/uploads/Pubs/2009-NaWi-AK.pdf
Argh, that URL did not work out. Try again: Axel Kleidon (2009) Non-equilibrium thermodynamics and maximum entropy production in the Earth system
This is in response to Tim Tyler's post: What Kleidon (and many many others) are saying is not that "nature is maximizing entropy", but rather that the flow of power through a system causes that system to try to extract as much "negentropy" as possible from that system -- or, what Ben is calling "extracting patterns".
What Kleidon points out is that maximum pattern extraction has nothing to do with AI, per se; rather, it is in the nature of hyperbolic dynamical systems -- any dynamical system with any region that has a "positive Lyapunov exponent" is going to do the Hopf-bifurcation thing -- as long as you keep driving it with power/energy (solar energy, geothermal energy, in Kleidon's papers).
The deal with hyperbolic systems is that, as one drives them by dumping power into them, the number of occupiable states increases -- that is, they naturally move to those configurations where there are more and more states that can be occupied. Now, occupying a state requires energy, so you suck energy out of the environment, and, since there are more states, the entropy increases. But since the path was hyperbolic, the states that get occupied look like fractals -- this is the bifurcation at work, the splitting that opens up new states makes the system look like it is self-similar -- that is *why* trees and river basins look fractal.
The next point is that the simplest systems bifurcate in the easiest ways -- that is, the simplest systems will lead to the Cantor-set-style fractals, where each branch is equally weighted (e.g. the Koch curve, the De Rham curves for example). Some slightly more complex (but inorganic) systems do not generate binary strings, but the p-adic numbers instead. If these are equally weighted, you still get the fairly standard fractals.
The place where it gets interesting is when the dynamical system has a digital component to it -- digital in the sense of "not analog" -- e.g. DNA. The digital component ends up causing certain branches of a binary or p-adic bifurcation to have exactly zero weight -- i.e. they do not get explored at all (e.g. if a gene is not expressed, that branch cannot be taken; it's not energetically possible). So, instead of Koch curves, you get the biological forms. Prusinkiewicz has done an excellent, beautiful job of exploring the digital vs. analog interplay in hyperbolic dynamical systems.
That much is clear. How to jump from the biological systems (digital+analog) to e.g. the patterns in language, I don't know. So really, for me, the question is: is human language just some digital+analog machine, much like that in biology, or are the patterns in language somehow different than the biological patterns? If they are different, then, in what way?
Promoters of the "maximum entropy production principle" do generally argue that nature maximizes entropy (subject to constraints). Some see this as being equivalent to power maximization - e.g. see "Beyond the Second Law: An Overview" by Roderick C. Dewar, Charles H. Lineweaver, Robert K. Niven and Klaus Regenauer-Lieb. I'm a bit skeptical about whether power maximization formulations are more useful.