Saturday, September 10, 2016

In What Sense Does Deep Learning Reflect the Laws of Physics?


“Technology Review” is making a fuss about an article by Lin and Tegmark on why deep learning works.   To wit:

Physicists have discovered what makes neural networks so extraordinarily powerful
Nobody understands why deep neural networks are so good at solving complex problems. Now physicists say the secret is buried in the laws of physics

It's a nice article, but as often happens, the conclusion is a bit more limited -- and rather less original -- than the popular media account suggests...

Stripping away the math, the basic idea they propose in their paper is a simple and obvious one: That the physical universe has a certain bias encoded into it, regarding what patterns tend to occur in the universe….   Some mathematically possible patterns are common in our physical universe, others are less common.

As one example, since the laws of physics limit communication between distant points but the universe is spread out, there tend to arise patterns involving multiple variables, many of which are only loosely dependent on each other.

As another example, hierarchical patterns are uncommonly common in our universe — because the laws of physics, at least in the regimes we’re accustomed to, tend to lead to the emergence of hierarchical structures (e.g. think particles building up atoms building up molecules building up compounds building up cells building up organisms building up ecosystems…).

Since the physical universe has certain habitual biases regarding what sorts of patterns tend to occur in it, it follows that a pattern recognition system that is biased to recognize THESE types of patterns is going to be more efficient than one that has different biases.   Conversely, it's going to be inefficient for a pattern recognition system to spend a lot of time searching physical-world data for patterns that are extremely unlikely to occur in our physical universe, given the nature of the laws of physics.

So -- this is a quite valid point, but not at all a new point — for instance I made that same point in this paper a few years ago  (presented at an IEEE conference on Human-Level Intelligence in Singapore, and published in the conference proceedings... and mostly reprinted in my book Engineering General Intelligence as part of the early preliminary material)…

Now my mathematical formalization of this idea was quite different than Lin and Tegmark’s, since I tend to be more abstract-mathy and computer-sciency than physicsy … what I said formally is

MIND-WORLD CORRESPONDENCE PRINCIPLE: For an organism with a reasonably high level of intelligence in a certain world, relative to a certain set of goals, the mind-world path transfer function is a goal-weighted approximate functor

Formalism aside, the basic idea here is this: if you have a system that is supposed to achieve a high degree of goal-achievement in a world with a certain habitual structure, then the best way for this system to do so using limited resources is to internally contain structures that are morphic to the habitual structures in the world.

I explicitly introduced the example of hierarchical structure in the world — and pointed out that intelligent systems trying to achieve goals in a hierarchical world will do best, using limited resources, if they internally have a hierarchical structure (in a way that manifests itself specifically in their goal-seeking behavior).

Deep neural networks are an example of a kind of system that manifests hierarchical structure internally in this way.

Certainly I am not claiming any sort of priority regarding this general conceptual point, though — I am sure others made that same point way before I did, expressing it in different language...

One also shouldn’t overestimate the importance of this sort of point, though.  Lin and Tegmark point out that “properties such as symmetry, locality, compositionality and polynomial log-probability” come out of the laws of physics, and also are easily encoded into the structure of neural networks. This is all true and good … but of course self-organizing systems add a lot of complexity to the picture, so many patterns in the portion and level of the physical universe that is relevant to us do NOT actually display these properties… which is why simply-structured neural networks like deep neural networks are not actually adequate for AGI....

Specifically, we may note that current deep neural networks do best at recognizing patterns in sensory data, which makes sense because sensory data (as opposed to stuff that is more explicitly constructed by mind and society) is more transparently  and directly structured via “physical law.”
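To make the "easily encoded into the structure of neural networks" remark a bit more concrete, here is a toy sketch in plain Python. It is my own illustration, not Lin and Tegmark's construction, and all the function names and the sizes (1000 inputs, a 3-wide kernel) are made up for the example. It counts the weights a single layer needs if it assumes nothing, if it assumes locality, and if it additionally assumes translation symmetry; then it checks that a layer built this way treats a shifted input the same way it treats the original.

```python
def dense_params(n):
    """Weights in a fully connected layer from n inputs to n outputs (no assumptions)."""
    return n * n


def local_params(n, k):
    """Weights when each of n outputs only looks at k neighbouring inputs (locality)."""
    return n * k


def shared_local_params(k):
    """Weights when one k-wide kernel is reused at every position (locality + symmetry)."""
    return k


def circular_conv(signal, kernel):
    """Apply the same kernel at every position of the signal, wrapping at the ends."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, s):
    """Cyclically shift the signal s positions to the right."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


if __name__ == "__main__":
    n, k = 1000, 3  # hypothetical sizes, chosen only for illustration
    print("dense:", dense_params(n))
    print("local:", local_params(n, k))
    print("local + shared:", shared_local_params(k))

    x = [1, 4, 2, 8, 5, 7]
    kernel = [1, -2, 1]
    # Shifting the input and then filtering gives the same result as
    # filtering first and then shifting: the layer respects the symmetry.
    print(circular_conv(shift(x, 2), kernel) == shift(circular_conv(x, kernel), 2))
```

The bias is what buys the efficiency: the symmetric, local layer has far fewer weights to learn, but by construction it can only represent patterns that are themselves local and shift-invariant, which is exactly the limitation noted above.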

It's cool to see the popular media, and more and more scientists from various disciplines, finally paying attention to these deep and important ideas....   But as more attention comes, we have to ward off oversimplification.  Tegmark and Lin are solid thinkers and smart people, and they know it's not so simple as "deep neural nets are the key to intelligence because they reflect aspects of the laws of physics" -- and they  may well even know that diverse others have made very similar points to theirs dozens of times over the preceding decades.  Let's just remember these are subtle matters, and there is still much to be understood -- and any one special class of algorithms and structures, like deep neural networks, is only going to be one modest part of the AGI picture, conceptually or pragmatically.  

8 comments:

Jef Allbright said...

For the same reason, our most coherent of morality will be hierarchical in the orthogonal domains of values and instrumental methods.

marcalpv said...

Basically we are trying to build an interpolating function between input and output variables. So we observe that intermediate variables aid the process. Why should some variables not be a function of other variables?

Tory Wright said...

If memory serves, you once in an interview questioned correlation between intelligence and fitness. Many of your arguments have me considering the probability that normative complexity over time brings about greater intelligence as a logical conclusion. Your blog is always a good read. Thanks for sharing.

Robin de Lange said...

Excellent reply! Will come back to your blog for my research on VR as a learning tool.

marcalpv said...

Here is an alternative to deep learning that treats it as an interpolating function
least-squares-fe-for-ann
