Sunday, August 28, 2016

An Alternate Formulation of MaxEnt and Maximum Entropy Production

This (rather technical) blog post observes that, by using tensorial linearization of Boolean functions, one can give a novel formulation of the maximum entropy and maximum entropy production principles.

The relationship of these novel formulations to the traditional formulations has not yet been thoroughly explored.

For general background on maximum entropy production, see XX.   Maximum entropy is better known and understood; see XX for the basics.

Basic Setup

Suppose we have a world with N specific observations in it.  Suppose each observation occurs at a certain point in time (where time-points are defined relative to a particular observer O …)

Now suppose that observer O1 has a coarse-grained view and often cannot distinguish two different observations from each other. In this case O1 will view observations as coming with “count” values: n1 observations of type t1, n2 observations of type t2, etc. (O1 may also be a subset of O’s mind…). So what O1 sees, once the counts are divided by N, is a certain probability distribution over types t1, t2, ….

Next, consider a distribution over “possible worlds”. Note that this can be done in two ways:

·      Possible worlds as seen by O (who sees the individual observations)
·      Possible worlds as seen by O1 (who sees only distributions of counts over types)

For starters, assume that each observation is equally likely. Let’s look at the range of possible worlds as perceived by O1 in this case. The distribution in which O1 makes N observations and they all go into type t1 can occur in only 1 way. But the distribution in which O1 makes N observations and they go into m different types, with N/m in each type, can occur in many more ways. So if we ask which distributions occur most often from O1’s point of view, we conclude that the equiprobable (uniform) distribution over types occurs most often.

Looking at O1’s world from O’s point of view, the comparison comes out similarly. If we assume that each observation has an equal, independent chance of being assigned to each type, then the number of possible worlds in which O1 puts all the observations in bin t1 is smaller, and the number in which O1 distributes the observations evenly among the bins is larger.
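To make the counting concrete, here is a minimal Python sketch. The helper names `multiplicity` and `shannon_entropy`, and the choice of N = 12 observations over m = 3 types, are illustrative assumptions rather than anything fixed by the argument. It counts the ways N distinguishable observations can produce a given vector of type counts, and shows that the count grows as the counts become more even.

```python
from math import factorial, log

def multiplicity(counts):
    """Number of ways to assign sum(counts) distinguishable observations
    to types so that type i receives counts[i] of them (multinomial coefficient)."""
    ways = factorial(sum(counts))
    for c in counts:
        ways //= factorial(c)  # each partial quotient is an exact integer
    return ways

def shannon_entropy(counts):
    """Entropy (in nats) of the distribution obtained by normalizing the counts."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

# Hypothetical example: N = 12 observations, m = 3 types.
for counts in ([12, 0, 0], [6, 3, 3], [4, 4, 4]):
    print(counts, multiplicity(counts), round(shannon_entropy(counts), 4))
```

The all-in-one-type world occurs in exactly one way, while the even split occurs in 34650 ways, and the entropy of the normalized counts rises in the same order.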

The above argument yields the maximum entropy principle, according to the standard Boltzmann argument.  
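For reference, the standard counting step behind this can be written out as follows. This is a sketch using Stirling's approximation, with n_i the count of type t_i, m the number of types, and p_i = n_i / N; the symbols W and H are notation introduced here.

```latex
W(n_1,\dots,n_m) = \frac{N!}{n_1!\, n_2! \cdots n_m!}, \qquad \sum_{i=1}^{m} n_i = N

\ln W \approx N \ln N - \sum_{i=1}^{m} n_i \ln n_i
      = -N \sum_{i=1}^{m} p_i \ln p_i
      = N\, H(p), \qquad p_i = \frac{n_i}{N}
```

So for large N, the count distributions that can occur in the most ways are those whose normalized counts have the largest entropy H(p).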

Fun with Tensorial Linearization

The above is all well-known stuff, just phrased a little differently.

Now let's make things a bit more interesting.

Suppose one has a constraint more complex than equiprobable, independent observations. One can still ask what distributions of counts over types are more likely. If the constraints are linear in the type probabilities (for example, fixed expected values), then the answer still comes out as a maximum entropy distribution.

What if the constraints are nonlinear? In general, the standard maxent argument applies only with linear constraints. However, using tensorial linearization, any Boolean function can be written as a linear function on a very high dimensional space in which every conjunction of variables corresponds to a dimension. So if one has a set of Boolean constraints on the observations seen by O (or by O1, as a special case), then one can reformulate these as linear constraints on a higher dimensional space. One can then argue that the most likely distribution O1 should assume is the maximum entropy distribution over this higher dimensional space, whose axes are conjunctions of observations.
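As a small illustration of the linear-constraint case, here is a Python sketch of the familiar die example: six types with values 1 to 6 and a single linear constraint that the expected value is 4.5. The function name `maxent_given_mean`, the bisection bounds and the target value are assumptions chosen for illustration. With one expected-value constraint the maximum entropy distribution has the form p_i proportional to exp(lam * v_i), so the code only needs to solve for the single multiplier lam.

```python
from math import exp

def maxent_given_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum entropy distribution over `values` subject to a fixed mean.
    Uses p_i proportional to exp(lam * v_i) and finds lam by bisection,
    relying on the mean being an increasing function of lam."""
    def dist(lam):
        shift = max(lam * v for v in values)          # avoid overflow in exp
        weights = [exp(lam * v - shift) for v in values]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(dist(lam), values))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

faces = [1, 2, 3, 4, 5, 6]
p = maxent_given_mean(faces, 4.5)
print([round(x, 4) for x in p])
```

With the target mean set to 3.5 the same routine returns the uniform distribution, which recovers the unconstrained case discussed earlier.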
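One way to read the linearization step is as the multilinear expansion of a Boolean function over conjunctions of its variables. The Python sketch below takes that reading as an assumption; the function names are hypothetical. It computes one coefficient per conjunction by inclusion-exclusion, lifts an input to the space with one coordinate per conjunction, and checks that the function is an ordinary dot product there.

```python
from itertools import combinations, product

def subsets(indices):
    """All subsets of `indices`, as sorted tuples, smallest first."""
    indices = tuple(indices)
    for r in range(len(indices) + 1):
        yield from combinations(indices, r)

def conjunction_coefficients(f, n):
    """Coefficients c_S such that f(x) = sum_S c_S * prod_{i in S} x_i
    for every x in {0,1}^n (inclusion-exclusion over subsets of S)."""
    coeffs = {}
    for s in subsets(range(n)):
        total = 0
        for t in subsets(s):
            x = tuple(1 if i in t else 0 for i in range(n))
            total += (-1) ** (len(s) - len(t)) * int(f(x))
        if total != 0:
            coeffs[s] = total
    return coeffs

def lift(x):
    """Map x to the high dimensional space: one 0/1 coordinate per conjunction.
    The empty conjunction is the constant coordinate 1."""
    return {s: int(all(x[i] for i in s)) for s in subsets(range(len(x)))}

def evaluate_linear(coeffs, x):
    """Evaluate f as a linear function of the lifted coordinates."""
    phi = lift(x)
    return sum(c * phi[s] for s, c in coeffs.items())

xor = lambda x: x[0] ^ x[1]
xor_coeffs = conjunction_coefficients(xor, 2)
print(xor_coeffs)  # XOR(x, y) = x + y - 2xy

for x in product((0, 1), repeat=2):
    assert evaluate_linear(xor_coeffs, x) == xor(x)
```

XOR is not linear in x and y alone, but it becomes linear once the conjunction "x and y" is given its own axis. The price is dimensionality: n variables yield 2^n conjunctions.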

A New Look at Maximum Entropy of Dynamics

Now what if we want to apply this to dynamics?

In this case one has a system S at a certain point in time, T.   The observations in question are observations of S at the slight future of T (time T plus epsilon, say).   The constraints involved are basically the probabilities of each slight-future observation from the point of view of O, or O1.   Some slight-future observations are more likely than others based on the dynamics of S.  

Now the dynamics of S are going to make the dependencies between observations fairly complex. However, if we can express this complexity as a set of probabilities attached to Boolean combinations of observation types recognized by O1, then we can do tensorial linearization and obtain a set of linear constraints on a higher dimensional space. We then get the result that the system S evolves according to the maximum entropy distribution, from O1’s point of view (where the entropy is measured with regard to the higher-dimensional space of conjunctions). I.e., roughly speaking, the various conjunctions of basic observations are going to be as equally likely as they can be, while still remaining consistent with the given logical constraints.

Or suppose we apply this argument to paths, i.e. sequences of states of S occurring over time. If we coarse-grain paths (as we are doing from O1’s perspective) then paths will overlap, but we can again use tensorial linearization to account for overlaps. We can then say that the evolution of S will follow a maximum entropy distribution over the space of conjunctions of paths, from O1’s point of view.

The question is then in what sense this actually gives a law of “maximum entropy production” in a physics sense. Fairly clearly, if the various paths can be treated as statistically independent, then it works. But if the paths are subtly and significantly interdependent, then the logical space on which maxent holds may be different from the physical space in which thermodynamic entropy is measured.
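A toy case may help show what "as equally likely as they can be" means here. The Python sketch below assumes two binary observation types A and B at time T plus epsilon and a single hypothetical constraint, P(A or B) = 0.9; the function name and the numbers are illustrative only. With one constraint on the probability of a Boolean event, the maximum entropy distribution spreads the probability evenly inside the event and evenly outside it, so no numerical solver is needed.

```python
from itertools import product

def maxent_given_event_probability(event, n, q):
    """Maximum entropy distribution over {0,1}^n subject to P(event) = q.
    Assumes the event is neither always true nor always false.
    Probability q is split evenly over states satisfying the event,
    and 1 - q evenly over the remaining states."""
    states = list(product((0, 1), repeat=n))
    inside = [s for s in states if event(s)]
    outside = [s for s in states if not event(s)]
    dist = {s: q / len(inside) for s in inside}
    dist.update({s: (1 - q) / len(outside) for s in outside})
    return dist

# Hypothetical constraint: P(A or B) = 0.9.
dist = maxent_given_event_probability(lambda s: s[0] or s[1], 2, 0.9)

# Expected values of the conjunction coordinates A, B and (A and B).
e_a = sum(p for s, p in dist.items() if s[0])
e_b = sum(p for s, p in dist.items() if s[1])
e_ab = sum(p for s, p in dist.items() if s[0] and s[1])

# "A or B" linearizes as A + B - AB, so the constraint is linear in these moments.
print(round(e_a + e_b - e_ab, 6))
```

The three states in which A or B holds each receive probability 0.3 and the remaining state receives 0.1. The constraint, which is nonlinear in A and B, is a linear equation in the expectations of the conjunction coordinates.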

1 comment:

Mentifex said...

Ben, this Boltzmannesque post is a little beyond your usual mundanity and urbanity. There is no way that my latest artificial Mind at can understand your mathematical discourse. I've been waiting for you to post something so that I can visit here and let you know that I have been working furiously to port my Perl AI back into Forth so that the AGI-let can think continuously and perhaps even consciously. However, your post of today is actually very Goertzelian. Bye - Arthur T.