Saturday, September 17, 2016

Wild-ass shit: P-adic physics, p-adic complex probabilities and the Eurycosm

(some fairly out-there technical/cosmic musings/speculations …)

This post is best read while listening to 1:39:55 and onwards of https://www.youtube.com/watch?v=GzBslbMJdRU

I have been skimming (not yet carefully reading) some bits and pieces of the radically imaginative physics theories of Matti Pitkanen.   Among other innovations, he founds his physics theories on p-adic analysis rather than conventional differential and integral calculus.   Pitkanen has been looking at p-adic physics for  a long time; but in recent years various applications of p-adic math to physics have gotten more mainstream, with a large number of researchers jumping into the fray.

Inspired by a vague inkling of some of Pitkanen’s ideas, this afternoon I started thinking about the possibility of developing a notion of p-adic uncertainty.   Among other things, I have in mind Knuth and Skilling’s elegant derivation of probability theory from basic axioms regarding lattices and orders, and my sketchy ideas about how to extend their derivation to yield Youssef-style complex-valued probabilities.  

Now, one of Knuth and Skilling’s initial assumptions is the existence of a valuation mapping elements of a lattice of events into real numbers.   So, it becomes natural to wonder – what happens if one replaces this assumption with that of a valuation mapping elements of a lattice of events into p-adic numbers?  

Perhaps some variant of their symmetry arguments goes through … it seems at least plausible that it could, since the p-adic numbers also form a field, and also support a notion of continuity.

If so, then one would obtain a concept of p-adic probability, with an elegant foundation in symmetries.  

Extending this argument to complex numbers, one would obtain a p-adic analogue of Youssef-ian complex probability.

One key difference between p-adic numbers and real numbers is ultrametricity – p-adic numbers obey a strong triangle inequality of the form

d(x,z) <= max{ d(x,y), d(y,z) }
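
To make this concrete, here is a quick Python spot-check (a numerical sketch, not a proof) of the strong triangle inequality for the p-adic distance on the integers:

import random

def padic_val(n, p):
    """v_p(n): the largest k with p**k dividing n; v_p(0) treated as infinity."""
    if n == 0:
        return float('inf')
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def padic_dist(x, y, p):
    """|x - y|_p = p**(-v_p(x - y))."""
    v = padic_val(x - y, p)
    return 0.0 if v == float('inf') else p ** (-v)

random.seed(0)
p = 3
for _ in range(10000):
    x, y, z = (random.randrange(-10**6, 10**6) for _ in range(3))
    assert padic_dist(x, z, p) <= max(padic_dist(x, y, p), padic_dist(y, z, p))
print("strong triangle inequality held on all sampled triples")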

Conceptually, ultrametricity can be modeled via drawing a tree structure with the elements of the ultrametric space at the leaves.  The distance between x and y then corresponds to the number of levels one has to go up in the tree, to form a path between x and y.
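
For integers, the tree picture and the p-adic norm line up exactly: the p-adic distance between x and y is p**(-k), where k is the length of the common prefix of their base-p digit expansions (least significant digit first), i.e. how far up the digit-tree one must climb to connect them. A little sketch:

def digits(n, p, length):
    """First `length` base-p digits of a non-negative integer, least significant first."""
    out = []
    for _ in range(length):
        n, d = divmod(n, p)
        out.append(d)
    return out

def common_prefix_len(a, b):
    k = 0
    while k < len(a) and a[k] == b[k]:
        k += 1
    return k

p, L = 3, 12
x, y = 1234, 5671
k = common_prefix_len(digits(x, p, L), digits(y, p, L))
# equals |x - y|_p, since p**k divides (x - y) but p**(k+1) does not
print(p ** (-k))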

If one arranges the elements of one’s event lattice in a hierarchy, one can naturally define an ultrametric distance between the lattice elements using this hierarchy.   Intuitively, it seems that p-adic probability might provide a way of quantifying “size” of lattice elements that correlates with this sort of hierarchical distance.

Viewed in this way, and making all sorts of thinly-substantiated conceptual leaps, one is tempted to think about ordinary probability as heterarchical probability, and p-adic probability as hierarchical probability.  Or in the complex case: heterarchical vs. hierarchical complex probabilities.

It’s not immediately obvious why physics would make use of hierarchical rather than (or along with?) heterarchical complex probabilities.   But with a bit of funky lateral thinking, one can imagine why this might be so.

For instance, it seems to be the case that if one looks at the distribution of distances among sparse vectors in very high dimensional spaces, ultrametricity generally holds to within a close degree of approximation.   This suggests that if one embeds a set of sparse high-dimensional vectors into a lower-dimensional metric space, one may end up doing some serious injustice to the metric structure.   On the other hand, if one embeds the same set of sparse high-dimensional vectors into an ultrametric space, one may preserve the distance relations more closely.   But any ultrametric structure one imposes on finite datasets, if it’s going to have reasonable mathematical properties, is going to be equivalent to the p-adic numbers.  
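
Here is a quick experiment one could run (a sketch, with arbitrarily invented parameters) to measure how nearly ultrametric a cloud of sparse high-dimensional vectors is. For any metric, the ratio computed below is at most 2; for an exact ultrametric it is at most 1:

import numpy as np

rng = np.random.default_rng(0)
dim, n_points, nnz = 10000, 40, 20    # high dimension, few points, very sparse

# sample sparse random vectors
X = np.zeros((n_points, dim))
for i in range(n_points):
    support = rng.choice(dim, size=nnz, replace=False)
    X[i, support] = rng.standard_normal(nnz)

# pairwise Euclidean distances via the Gram matrix
sq = (X * X).sum(axis=1)
D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

# worst violation of d(x,z) <= max(d(x,y), d(y,z)) over all triples
worst = 0.0
for i in range(n_points):
    for j in range(n_points):
        for k in range(n_points):
            if i != j and j != k and i != k:
                worst = max(worst, D[i, k] / max(D[i, j], D[j, k]))
print("worst ultrametricity ratio over all triples:", worst)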

So, suppose the set of events in our universe is viewed as a sparse set of events drawn from a much higher-dimensional space --- then projected into some sort of smaller space to form our universe.   It follows then that, to preserve something of the metric structure that our universe has when embedded in the original higher-dimensional space, we want to model our universe as having ultrametric structure.  But if we also want our universe to have some nice reasonable symmetries, then we end up with a p-adic structure on our universe, rather than a traditional real metric structure.

And finally – I mean, since we’re already way out on a latticework of extremely flimsy and weird-looking ambiguously-dimensional limbs, we may as well go all-out, right? – this would appear to bring us back to my modest proposal that our universe can be viewed as embedded in some broader eurycosm.   If our universe is embedded in some “wider world” or “eurycosm” which is viewed as very high dimensional (perhaps as an approximation of some sort of nondimensional structure), then one would appear to have the beginning of an argument as to how a p-adic foundation for physics would emerge.

It’s also worth noting that a finite non-dimensional structure can be turned into a high-dimensional structure via tensorial linearization – so that, to the extent we can describe a eurycosmic order via Boolean descriptors, we can also describe it via very high-dimensional vectors.    So we have a path from any logical description of a eurycosm, to a picture of our universe as a sparse set of high-dimensional vectors, to a picture of our universe as an ultrametric low-dimensional embedding of these vectors, to a p-adic foundation for physics…

And that, dear friends, is some hi-fi fuckin’ sci-fi !!!!

(I mean ... conspiracy theories about the Rothschilds or the Reptilians or whatever just can’t compete, in my not so humble opinion…)

The question, a la Niels Bohr, is whether it’s crazy enough to be (after appropriate fleshing out and tweaking, yadda yadda) pointing in the vague direction of some sort of useful truth…

(“Truth?  What is truth?”)


This was a fun post to write – it was written on a flight from the Milken conference in Singapore, where I served on a panel about the future of AI in Asia, back home to Hong Kong.  After blowing much of the Milken audience’s minds with videos of OpenCog-controlled Hanson robots and rather obvious observations about the imminent obsolescence of humanity and the potentials of nanotech, femtotech and superintelligence … I needed to plunge a bit into deeper questions.   Mathematics gives us intriguing, amazing hints at aspects of superhuman realms; though of course superhuman minds are likely to create new cognitive disciplines far beyond our concept of mathematics….

Monday, September 12, 2016

Kafka in the morning

As Ruiting lay in bed this morning, I handed her a book I was reading — some lit-crit essays by Walter Benjamin — so she could read a passage about Kafka (a conversation between Max Brod and Kafka):

‘I remember,’ Brod writes, ‘a conversation with Kafka which began with present-day Europe and the decline of the human race. “We are nihilistic thoughts, suicidal thoughts that come into God’s head,” Kafka said. This reminded me at first of the Gnostic view of life: God as the evil demiurge, the world as his Fall. “Oh no,” said Kafka, “our world is only a bad mood of God, a bad day of his.” “Then there is hope outside this manifestation of the world that we know.” He smiled. “Oh, plenty of hope, an infinite amount of hope – but not for us.”’

A suicide talking about a suicide talking about how our world is a sort of cosmic suicidal thought.

At times I certainly can feel that way too -- that our world is a sort of glitch in a broader and better cosmic realm — Yes, the post-Singularity universe will be amazing; yes there are all sorts of potentials for growth, joy and choice in the universe — but fuck it, humanity is just a goddamned piece of shit — we are the minimal generally intelligent system and we are tangled and fucked up in all sorts of social knots and the best we can do is to make one last final gasp of creativity, and on our deathbed as a species, cough out of our dying collective throat some sort of superior mind, some sort of engineered being with less perversity and more creativity and understanding … some sort of mind that doesn’t waste 99% of its energy defeating itself and its kindred by tying bizarre cognitive-emotional knots and suffering from the tension they create…

And yet, it struck me that Kafka had so beautifully summarized the worthlessness of the human race that it was a self-defeating statement, right? the justification of the human race was, in fact, the exact sort of beauty illustrated in his statement as reported by Benjamin — as the minimum generally intelligent creature, we are the first species on earth (well except maybe some cetacea, who knows) capable of appreciating the poignance and beauty and terror and absurdity of our own limitations, and capable of understanding the glory and wonder of what lies beyond and what may come after us, and what may already exist in parallel with us (or in some sense within us) in different dimensions —

Amen! ...

Sunday, September 11, 2016

Does Modern Evidence Refute Chomskyan Universal Grammar?

Scientific American says, in a recent article, “Evidence Rebuts Chomsky’s Theory of Language Learning” …

Does it?  Well, sort of.  Partly.  But not as definitively as the article says.

Michael Tomasello, whose work I love, argues in the article that Chomsky’s old idea of a universal grammar is now obsoleted by a new “usage-based approach”:

In the new usage-based approach (which includes ideas from functional linguistics, cognitive linguistics and construction grammar), children are not born with a universal, dedicated tool for learning grammar. Instead they inherit the mental equivalent of a Swiss Army knife: a set of general-purpose tools—such as categorization, the reading of communicative intentions, and analogy making, with which children build grammatical categories and rules from the language they hear around them.

Here’s the thing, though.   Every collection of learning tools is going to be better at learning some things than others.   So, for any collection of learning tools that is set to the task of learning grammar, some grammars will be easier to learn than others.  That is, given a certain set of social and physical situations, any particular set of learning tools will be biased to learn certain grammars for communication in those situations, as opposed to other grammars.

So if humans have a certain set of universal learning tools, it follows that humans have a certain “universal probability distribution over (situation, grammar) pairs.”
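
To illustrate the point with a deliberately silly toy model (every number and "grammar" here is invented purely for illustration): if we treat the learner's toolkit as a prior over candidate grammars, then two learners with different built-in biases, shown the same sparse data, can settle on different grammars; the bias does real work precisely when evidence is scarce.

import math

def loglik_flat(corpus, r=0.5):
    # toy grammar A ("flat"): emit '(' or ')' i.i.d., stopping with probability r
    return sum(len(s) * math.log((1 - r) / 2) + math.log(r) for s in corpus)

def loglik_dyck(corpus, q=0.5):
    # toy grammar B ("recursive"): unambiguous PCFG  S -> '(' S ')' S  with
    # probability q, and  S -> ''  with probability 1-q; a balanced string
    # with n bracket pairs then has probability q**n * (1-q)**(n+1)
    total = 0.0
    for s in corpus:
        depth = 0
        for c in s:
            depth += 1 if c == '(' else -1
            if depth < 0:
                return float('-inf')     # unbalanced: impossible under B
        if depth != 0:
            return float('-inf')
        n = s.count('(')
        total += n * math.log(q) + (n + 1) * math.log(1 - q)
    return total

corpus = ["()"]   # very sparse evidence, so the built-in bias dominates
for name, prior_b in [("general-purpose learner", 0.5),
                      ("evolution-tuned learner", 0.95)]:
    la = math.log(1 - prior_b) + loglik_flat(corpus)
    lb = math.log(prior_b) + loglik_dyck(corpus)
    posterior_b = 1.0 / (1.0 + math.exp(la - lb))
    print(f"{name}: P(recursive grammar | data) = {posterior_b:.3f}")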

This is not exactly the same as a universal grammar in the classic Chomskyan sense.  But just how far off it is from what Chomsky was thinking, remains to be understood.

For instance, more recent versions of Chomsky’s ideas view a sort of linguistic recursion as the core principle and tool of universal grammar.   Does our collection of human learning tools give us a strong bias to learn grammars involving certain sorts of linguistic recursion, in humanly common physical/social situations?   It may well.

Does the fact that some obscure languages like Piraha appear not to have much recursion in their grammar refute such a possibility?  Not really.  It appears likely the Piraha have recursion in their linguistic repertoire, but just carry out this recursion more on the pragmatic and cross-sentential level, rather than on the level of syntax within individual sentences.  But that’s one obscure language — and the fact that a certain linguistic form does not appear in EVERY human language does not refute the idea that there is a universal probabilistic bias toward this form in the human brain.

I’m not just splitting hairs here.   The question is to what extent evolution has honed the set of learning tools in the human mind for learning particular sorts of linguistic forms.   Tomasello’s intuition seems to be: not that much.  That is, he seems to think that our learning tools basically evolved for more general perceptual, motor and social learning, and then we just use these for language learning as well.   This is possible.  However, it’s also possible that our toolset has been substantially honed by evolution for the particularities of language learning — in which case there is a meaningful “universal human bias for learning certain types of grammars”, which can be thought of as a more modern incarnation of many of Chomsky’s ideas about universal grammar.

This issue is also relevant to AGI, because it has to do with how much attention AGI designers should spend on learning algorithms that are tuned and tweaked for language learning in particular, as opposed to expecting language learning to just pop out from application of general-purpose learning tools without any special language-oriented tuning.

Clearly Chomsky proposed a lot of strong ideas that just don’t hold up in the light of modern data regarding child language learning.  However, sometimes science (like many other human endeavors) can be a bit too much of a swinging pendulum, going from one extreme all the way to the other.  I wonder if the wholesale rejection of universal-grammar-related ideas in favor of usage-based ideas may be an example of this.  I wonder if we will find that the specific assemblage of learning tools in the human mind is, in fact, very well tuned by evolution to make learning of some specific grammatical forms especially easy in evolutionarily commonplace human situations. 

Saturday, September 10, 2016

In What Sense Does Deep Learning Reflect the Laws of Physics?


“Technology Review” is making a fuss about an article by Lin and Tegmark on why deep learning works.   To wit:

Physicists have discovered what makes neural networks so extraordinarily powerful
Nobody understands why deep neural networks are so good at solving complex problems. Now physicists say the secret is buried in the laws of physics

It's a nice article, but as often happens, the conclusion is a bit more limited -- and rather less original -- than the popular media account suggests...

Stripping away the math, the basic idea they propose in their paper is a simple and obvious one: That the physical universe has a certain bias encoded into it, regarding what patterns tend to occur in the universe….   Some mathematically possible patterns are common in our physical universe, others are less common.

As one example, since the laws of physics limit communication between distant points but the universe is spread out, there tend to arise patterns involving multiple variables, many of which are only loosely dependent on each other.

As another example, hierarchical patterns are uncommonly common in our universe — because the laws of physics, at least in the regimes we’re accustomed to, tend to lead to the emergence of hierarchical structures (e.g. think particles building up atoms building up molecules building up compounds building up cells building up organisms building up ecosystems…).

Since the physical universe has certain habitual biases regarding what sorts of patterns tend to occur in it, it follows that a pattern recognition system that is biased to recognize THESE types of patterns, is going to be more efficient than one that has different biases.   It’s going to be inefficient for a pattern recognition system to spend a lot of time searching physical-world data for possible patterns that are extremely unlikely to occur in our physical universe, due to the nature of the laws of physics.

So -- this is a quite valid point, but not at all a new point — for instance I made that same point in this paper a few years ago  (presented at an IEEE conference on Human-Level Intelligence in Singapore, and published in the conference proceedings... and mostly reprinted in my book Engineering General Intelligence as part of the early preliminary material)…

Now my mathematical formalization of this idea was quite different than Lin and Tegmark’s, since I tend to be more abstract-mathy and computer-sciency than physicsy … what I said formally is

MIND-WORLD CORRESPONDENCE PRINCIPLE: For an organism with a reasonably high level of intelligence in a certain world, relative to a certain set of goals, the mind-world path transfer function is a goal-weighted approximate functor

Formalism aside, the basic idea here is that: If you have a system that is supposed to achieve a high degree of goal-achievement in a world with a certain habitual structure, then the best way for this system to do so using limited resources is to internally contain structures that are morphic to the habitual structures in the world.

I explicitly introduced the example of hierarchical structure in the world — and pointed out that intelligent systems trying to achieve goals in a hierarchical world will do best, using limited resources, if they internally have a hierarchical structure (in a way that manifests itself specifically in their goal-seeking behavior).

Deep neural networks are an example of a kind of system that manifests hierarchical structure internally in this way.
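
One can get a crude quantitative feel for this with some simple counting (a sketch with invented sizes, not a benchmark): a model whose parts mirror a hierarchical target function needs resources roughly linear in the number of inputs, whereas a flat tabulation of the same function needs exponentially many entries.

def tree_model_size(n_inputs, hidden=8):
    """Binary tree of small 2-input subnetworks, mirroring the hierarchy
    of a compositional target like f(g(h(x1,x2), h(x3,x4)), ...)."""
    internal_nodes = n_inputs - 1               # internal nodes of a binary tree
    params_per_node = 2 * hidden + hidden + 1   # 2 -> hidden -> 1 weights, plus output bias
    return internal_nodes * params_per_node

def flat_table_size(n_inputs, levels=10):
    """A lookup-table-style flat model covering all input combinations,
    with `levels` discretization levels per input."""
    return levels ** n_inputs

for n in (4, 8, 16):
    print(f"{n:2d} inputs: tree-structured ~{tree_model_size(n):5d} params, "
          f"flat table ~{flat_table_size(n):.0e} entries")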

Certainly I am not claiming any sort of priority regarding this general conceptual point, though — I am sure others made that same point way before I did, expressing it in different language...

One also shouldn’t overestimate the importance of this sort of point, though.  Lin and Tegmark point out that "properties such as symmetry, locality, compositionality and polynomial log-probability” come out of the laws of physics, and also are easily encoded into the structure of neural networks. This is all true and good … but of course self-organizing systems add a lot of complexity to the picture, so many patterns in the portion and level of the physical universe that is relevant to us, do NOT actually display these properties… which is why simply-structured neural networks like deep neural networks are not actually adequate for AGI....

Specifically, we may note that current deep neural networks do best at recognizing patterns in sensory data, which makes sense because sensory data (as opposed to stuff that is more explicitly constructed by mind and society) is more transparently  and directly structured via “physical law.”

It's cool to see the popular media, and more and more scientists from various disciplines, finally paying attention to these deep and important ideas....   But as more attention comes, we have to ward off oversimplification.  Tegmark and Lin are solid thinkers and smart people, and they know it's not so simple as "deep neural nets are the key to intelligence because they reflect aspects of the laws of physics" -- and they  may well even know that diverse others have made very similar points to theirs dozens of times over the preceding decades.  Let's just remember these are subtle matters, and there is still much to be understood -- and any one special class of algorithms and structures, like deep neural networks, is only going to be one modest part of the AGI picture, conceptually or pragmatically.  

Saturday, September 03, 2016

The Semantic Primitives Enabling Universal Computation and Probabilistic Logic...



I have been musing a bit about combinatory and probabilistic logic, and what they have to teach us philosophically.   It’s perhaps an obvious sort of point, but to me it’s interesting to reflect on these basic foundations and try to understand what are the core, minimum requirements for being able to perform computations and think about the world.

The conclusion of my musing here will be a list of commonsensical semantic primitives, so that, roughly speaking: Any mind that has mastered these, should be able to do universal computing and universal probabilistic logic … limited only by its available resources.

The Crux of Combinatory Logic

Let us start with computation, and with what is in a sense the simplest formalism for universal computation: combinatory logic.

One thing combinatory logic tells us is that to do any kind of computation, all we need is to be able to do the following things:

  • Sequential ordering: arranging two or more things in a list with a specific order
  • Grouping: drawing a boundary around a subsequence of two or more things, to distinguish the subsequence from other things before or after it
  • Insertion: drawing an arrow connecting one entity or sequence with a position in another sequence
  • Removal: noting that one entity or sequence consists of a certain other sequence with its {beginning, middle or end} removed.   (We could decompose this into the primitive concepts of beginning, middle and end, plus the primitive concept of removal.)
  • Action: actually carrying out an insertion (placing one entity in a certain position in another sequence) or removal (creating a copy of a sequence with its beginning, middle or end removed)


An elegant way to represent insertion and removal is to think of there as being four sorts of arrows: simple arrows, remove-beginning arrows, remove-middle arrows and remove-end arrows.   Enacting a remove-end arrow pointing from a sequence to a position in another sequence results in said position receiving a version of the first sequence with its end removed.
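
For concreteness, here is a tiny Python sketch (function and label names invented here) of enacting these four sorts of arrows on plain lists:

def enact_arrow(kind, source, target, position):
    """Apply the arrow's removal type to `source`, then insert the result
    at `position` in `target`.  For this sketch, "middle" means the second
    element of the source sequence."""
    payload = {
        'simple':           source,
        'remove-beginning': source[1:],
        'remove-middle':    [source[0]] + source[2:],
        'remove-end':       source[:-1],
    }[kind]
    return target[:position] + [payload] + target[position:]

# enacting a remove-end arrow from (a b c) into position 1 of (1 2)
print(enact_arrow('remove-end', ['a', 'b', 'c'], [1, 2], 1))
# -> [1, ['a', 'b'], 2]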

This set of basic operations is enough to do general-purpose computation.   To see this, note that the S and K combinators, defined as

K x y = x
S x y z = (x z) (y z)

are enough to do general-purpose computation.   For,

  • K is just a remove-end arrow
  • S is a sequencing of a grouping of a remove-middle arrow, and a grouping of a remove-beginning arrow (each applied to its own copy of the original sequence)

(Note that, given the sufficiency of S and K, these removal patterns are all we strictly need: any more elaborate deletion pattern can be derived by composing S and K.)
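
It’s easy to check these sufficiency claims directly, encoding the combinators as curried Python functions (a sketch, not tied to any particular combinatory logic library):

K = lambda x: lambda y: x                      # K x y = x        (remove end)
S = lambda x: lambda y: lambda z: x(z)(y(z))   # S x y z = (x z)(y z)

# The identity combinator is derivable:  I = S K K,  since
# S K K x = (K x)(K x) = x
I = S(K)(K)
assert I(42) == 42

# Function composition is also derivable:  B = S (K S) K,  with B f g x = f(g(x))
B = S(K(S))(K)
double = lambda n: 2 * n
inc = lambda n: n + 1
assert B(double)(inc)(10) == 22   # double(inc(10))
print("S and K really do suffice for these derivations")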

We can depict this graphically by adopting some simple conventions:

  • Entities written on a horizontal row, connected by thick arrows pointing from left to right, are considered as a sequence to be read from left to right
  • Arrows denoting insertion point down
  • Removals are represented as arrows labeled by –e (remove end), –m (remove middle) or –b (remove beginning)
  • A star indicates the result of doing an action (where an action is an arrow)
  • Grouping of a subsequence is indicated by drawing a box around the subsequence
  • An arrow pointing into a box means that, upon enaction, the result of the action denoted by the arrow is placed into the box
  • An arrow pointing from a bracket around a subsequence takes the bracketed subsequence as input when enacted
  • An arrow pointing from a grouping (box) around a subsequence takes the box and its contents as input when enacted



(Diagrams illustrating these conventions graphically are omitted here.)
In this formalism, an arbitrary computation can be represented as a series of rows, where each row consists of a sequence of boxes (some containing other sequences of boxes) or spaces; with each lowest-level box and each space containing the pointy end of an arrow, whose root is some box or bracketed subsequence on the row above.

We can support concurrency in this formalism by interpreting two sequences A and B, where neither is below the other, as concurrently enactable.  We say that X is immediately below Y if there is an arrow pointing from somewhere in X to somewhere in Y.  We say that X is below Y if you can reach Y from X by following a series of immediately-below relationships.
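
A small sketch of the “below” relation as plain reachability over arrows (names invented here):

def below(x, y, arrows):
    """True if y is reachable from x by following arrows, where each arrow
    (src, dst) makes src immediately below dst."""
    frontier, seen = [x], set()
    while frontier:
        cur = frontier.pop()
        if cur == y:
            return True
        if cur in seen:
            continue
        seen.add(cur)
        frontier.extend(dst for src, dst in arrows if src == cur)
    return False

def concurrent(a, b, arrows):
    """Two distinct sequences are concurrently enactable when neither is below the other."""
    return not below(a, b, arrows) and not below(b, a, arrows)

arrows = [('A', 'C'), ('B', 'C')]     # A and B each feed into C
print(concurrent('A', 'B', arrows))   # True: A and B can be enacted concurrently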

Conceptually, and to be a bit repetitive, what this means is: to do ANY computation whatsoever, all you need to be able to do is…

  • Arrange things in a linear sequence
  • Put a boundary around a subsequence, to mark it off from the rest of the sequence
  • Indicate to yourself that one sequence (with a boundary or not) should be inserted in a certain place in another sequence
  • Indicate to yourself that one sequence is obtained by removing the beginning, middle or end of some sequence
  • Enact some insertion or removal that you have conceptualized


That’s it.    In a very basic sense, that’s what computing is.
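
To underline the point, here is a minimal sketch of an S/K reducer in which a term is just a nested grouping (tuples, with application as ordered pairing), and each reduction step is nothing but copying, removal and insertion of subsequences:

def reduce_once(term):
    """Contract the leftmost redex, returning (new_term, changed)."""
    if isinstance(term, tuple):
        f, x = term
        if isinstance(f, tuple) and f[0] == 'K':          # (K a) b  ->  a
            return f[1], True
        if isinstance(f, tuple) and isinstance(f[0], tuple) and f[0][0] == 'S':
            a, b, c = f[0][1], f[1], x                    # ((S a) b) c  ->  (a c)(b c)
            return ((a, c), (b, c)), True
        nf, changed = reduce_once(f)
        if changed:
            return (nf, x), True
        nx, changed = reduce_once(x)
        if changed:
            return (f, nx), True
    return term, False

def normalize(term, max_steps=10000):
    for _ in range(max_steps):
        term, changed = reduce_once(term)
        if not changed:
            break
    return term

I = (('S', 'K'), 'K')      # I = S K K
print(normalize((I, 'v')))  # -> 'v'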

Boolean and Probabilistic Logic

Now, none of the above tells us what things should be arranged in sequence, or grouped in boxes, in the first place.    Combinatory logic as formulated above can only do purely mathematical computing.  To do computing relative to some body or other practical system, we need to add at the very least one more primitive:

  • Enact some primitive action, based on some sequence of (primitive or composite) perceptions


To handle perceptions and primitive actions elegantly, however, it seems this is not enough.   It seems very useful to add a few additional components to our basic model of computation, to wit:

  • Un-ordered grouping (sets)
  • Join (on sets)
  • Direct product (on sets)
  • Set difference
  • Comparison (a partial order operation)
  • Valuation: mapping sets of primitive perceptions into numbers


As Knuth and Skilling have shown, if we add these operations then we get not only Boolean logic, but also, fairly directly, probability theory.  If we let the valuation map into pairs of real numbers instead of single real numbers, then we get quantum probability theory.
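
As a quick sanity sketch of the valuation idea: the obvious “size” valuation on a lattice of subsets respects the sum rule m(A ∪ B) + m(A ∩ B) = m(A) + m(B), which is the sort of lattice-compatibility constraint the Knuth–Skilling derivation turns on:

import random

universe = frozenset(range(8))

def m(s):
    """Valuation: map a set of primitive outcomes to a normalized size."""
    return len(s) / len(universe)

random.seed(1)
for _ in range(1000):
    A = frozenset(random.sample(sorted(universe), k=random.randrange(len(universe) + 1)))
    B = frozenset(random.sample(sorted(universe), k=random.randrange(len(universe) + 1)))
    assert abs(m(A | B) + m(A & B) - (m(A) + m(B))) < 1e-12
print("sum rule holds for the counting valuation")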

On the other hand, what Schonfinkel showed when originally formulating combinatory logic was that he could emulate quantifier logic using combinatory logic simply by adding one more combinator, the U combinator, defined so that UAB means “not both A and B.”    But this comes immediately as a side-effect of the set operations introduced just above.  So if we put the basic combinator operations together with the basic lattice operations, we get quantifier logic.

So we then get general-purpose probabilistic-logical computation.

Graphically, to represent these additional algebraic operators, we would add the potential for unordered grouping.  For instance, in a diagram of this sort (omitted here), the outputs of the first two arrows can be in either order inside their grouping; and this grouping can be in either order as compared to the third arrow.  The left-to-right ordering on the page has no semantics here.

One would then add join (+) and direct product (x) operators between unordered groups.   Unordered groups can contain sequences; and sequences can contain unordered groups, or operator expressions built up from unordered groups using + or x.  A box around an unordered group may be used to allow + or x to act on that group as a whole.

(To emphasize, these additions are not strictly needed, because pure combinatory logic can be used to emulate these lattice operations.   I just think such emulation is counterintuitive and messy, and it’s better to add the lattice operations as primitives.  This is certainly debatable.)

So, to be a bit repetitive again -- philosophically, what this suggests is that to do computing nicely about observations in a world, you also want to have a few other processes at your disposal besides those strictly required for computation:

  • The ability to group things into unordered sets
  • The ability to join two sets together
  • The ability to combine two sets (by combining each element of one, with each element of the other)
  • The ability to compare two sets, to see if one is part of the other
  • The ability to measure how big a set is – i.e. to map sets to numbers representing “size”, in a way that works nicely with the above set operations
  • The ability to compare two size-measurements, so as to say if one set is bigger than another


These basic operations let you do probabilistic logic; and put together with combinators, they let you do quantifier logic … and this logic can then interoperate complexly with computation.

Musing a Bit

One nice thing is that all these operations are pretty simple and natural conceptually. They are not weird, obscure mathematical operations.  They are all quite basic, intuitive operations.

Looking at these basic operations, it’s not hard to see how the ability for general-purpose computation and logical reasoning could emerge from simpler organisms lacking this ability.   Each of the basic operations needed to support computation or reasoning seems likely to be useful in isolation, in various situations.  The ability to identify middles and ends, or the ability to insert sequences into other sequences, or the ability to mentally join together two sets of observations, etc. – each of these things is going to be useful in practice for many organisms that cannot do general logic or computation.   But once these basic capabilities have evolved, due to their utility (partly separately, partly synergetically in various groupings), they can come together to enable general-purpose computation and logic … and then the utility of this general-purpose capability will reinforce all the component capabilities and cause them to pass on through the generations.

Thinking about general intelligence, it seems that if we wanted to take a reasonably good stab at making an unbiased sort of general intelligence, we would make a system that tried to learn concise programs computing patterns in its observations, using a programming language comprising the simple operations enumerated above.   Such an AGI would aim to learn patterns in the world that were as simple as possible when expressed in terms of: basic lattice operations and the corresponding numerical size comparisons; and basic sequence groupings, insertions and beginning/middle/end removals.

In terms of teaching baby AGIs, it would perhaps be wise to include in the curriculum lessons focused on each of the elementary operations mentioned above.  Via combining together practical facility with these elementary operations, quite general practical facility becomes possible.

Semantic Primitives

Finally, it is a little bit interesting to take these basic mathematical operations and reframe them less formally as semantic primitives.

The verbal primitives we arrive at, from the above math operations, look like:

  • Before
  • After
  • Together with
  • Put into
  • Take out of
  • Beginning
  • Middle
  • End
  • Do
  • Or
  • Combine with
  • More than
  • Graded indication of how much more than


Every human language, so far as I know, contains good methods of referring to all of the above except perhaps the last.  That one, I suspect, is handled gesturally and/or via phonological variation in languages that have limited verbal mechanisms for expressing magnitudes.

At a basic level, it is mastery of these semantic primitives that enables universal computation and universal uncertain reasoning.

An Alternative Language for Automated Programming?

These considerations also suggest a concise elementary programming language, a bit different from existing ones.  What we would have are:

  • Basic types for: primitive action, primitive perception, list, set, group
  • The notion of an arrow from a set, group, list or sub-list to a specific location in another list
  • The notion of removing the beginning, middle or end of a list (or, for good measure, perhaps removing any specified sub-list between positions m and n in a list)
  • The notion of enacting a set or list (thus indirectly enacting some primitive actions, after following some arrows)
  • Join and direct-product as set operations
  • “Greater than” and “less than”, for sets based on inclusion, and for numbers produced as probabilities
  • Addition and multiplication of probabilities; and evaluation of probabilities via normalizing set sizes


Such a language would not be very suitable for human programming, but could be an interesting alternative for automated programming such as evolutionary learning or probabilistic program inference.   It would seem amenable to a highly efficient implementation, leveraging nice data structures for holding and manipulating sets and lists.
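
As a very rough indication (all type and field names here are hypothetical), the core data types of such a language might look something like this in Python – just enough structure to hold programs built from the primitives above, with no evaluator attached:

from dataclasses import dataclass

@dataclass(frozen=True)
class Act:                 # primitive action
    name: str

@dataclass(frozen=True)
class Perceive:            # primitive perception
    channel: str

@dataclass(frozen=True)
class Seq:                 # ordered list
    items: tuple

@dataclass(frozen=True)
class Group:               # unordered set
    items: frozenset

@dataclass(frozen=True)
class Arrow:               # insertion arrow: insert `source` at `position`
    source: object         # in `target`, optionally removing part of the
    target: object         # source first
    position: int
    removal: str = 'none'  # 'none' | 'beginning' | 'middle' | 'end'

@dataclass(frozen=True)
class Join:                # set join (+)
    left: object
    right: object

@dataclass(frozen=True)
class Product:             # direct product (x)
    left: object
    right: object

# e.g., a program fragment: insert (a b) minus its end into position 1 of (c d)
fragment = Arrow(Seq(('a', 'b')), Seq(('c', 'd')), position=1, removal='end')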