
Saturday, September 05, 2015

The First Earthly Superintelligence Will Be an Incremental Mind-Upload of the Global Brain

In which Dr. Goertzel briefly outlines a new way of thinking about the Global Brain, in terms of the concept of "glocal memory" ... leading up to, at the end, a new suggestion about what the first super-powerful AGI on our planet may look like (to wit: AN INCREMENTAL MIND-UPLOAD OF THE GLOBAL BRAIN)...

Unscientific Prelude...

Recently I was sitting in a late-night dinner meeting in Tsim Sha Tsui, in a private room overlooking the spectacular Hong Kong harbor, with

  • a robot head/face genius
  • a robot body genius
  • an expert on outsourcing manufacturing of complex products to Chinese factories
  • a former toy and entertainment company executive

and discussing the possibility of coming together to make an amazing AI-powered humanoid robot...

After a few glasses of excellent wine, I started thinking about all the trends and patterns that had come together to bring us to that restaurant to discuss that project.   All the trends that had converged to make Guangdong Province the world center of consumer electronics manufacturing.  The factors underlying the current relatively easy availability of mainland Chinese funding for interesting tech projects.   The factors that had brought me to Hong Kong -- Hugo de Garis's position at Xiamen University, Gino Yu's assistance getting funding for OpenCog work at Hong Kong Poly U, the convergence of international bankers in Hong Kong enabling funding of Aidyia Holdings, our AI based hedge fund, etc. etc.

And I started to feel how, for sure, my individual self and my feelings of will and purpose and so forth, were mostly just being swept along by large forces and trends....   Somehow, it seemed, we had all been gathered there via sociocultural forces beyond our control and comprehension, because the universe wanted amazing humanoid robots built, and we were a reasonable team of people to do it....

I fell in love with a girl named Ruiting Lian in Xiamen in 2009, and now being married to her is helping my understanding of the Chinese system in general -- helping me collaborate with Chinese programmers, and do business with Chinese firms.   I met her because Hugo de Garis invited me to Xiamen, where he was because the Chinese university system was more open to his wacky ideas than the US university system. Even the most intimate and personal parts of our lives are driven along by these broader trends, and then help push these broader trends along.

Of course, there is lots of contingency in any particular series of events -- if I hadn't happened to meet Ruiting and get so overwhelmingly charmed by her personally, then I might not have relocated to Hong Kong, and lots of other things might be different....   Perhaps my meeting Ruiting may "cause" a revolution in humanoid robotics to happen years earlier, or in a fundamentally better way, than would have happened otherwise...

But on the other hand, if I hadn't moved to Hong Kong and if I had focused my work these last couple years on (say) theorem-proving rather than robotics, then probably someone else would have done comparable robotics work to what I'm now doing.   The same broader trends would have manifested themselves in some other way, via somebody else's will and life and love and dreams and desires and so forth -- probably ... not definitely, never definitely, but probably...

So the question I have been musing about is -- how can we think about all this in terms of the "global brain"?   (i.e., the set of computing and communication and cognition patterns occurring around the globe, among humans and various electronic devices, which arguably has a kind of coordination and dynamics to it that is vaguely "neural" or "cognitive" in nature...)

Can we say that the humanoid robot design we were cooking up over dinner that night, is a thought produced by the global brain?   In what sense?

Glocal Memory

OK, I'm going to get a little bit more technical now -- impatient readers please bear with me!  I'll get back to the qualitative and wacky stuff at the end...

Let me review the concept of "glocal memory" as I've defined it elsewhere, in the context of human neuroscience and cognition...

In a neuroscience context, "glocal memory"  refers to the hypothesis that, in many cases, memories in the human brain have both

  • a highly localized form (e.g. a memory may correspond to a single neuron or a fairly small set of neurons) 
  • a neurally global form (as a pattern of activity across a large number of neurons occupying a significant fraction of the brain).

This general notion has been around a while (without use of the word "glocal") -- e.g. William Calvin's work from way back when.

"Glocal memory" overcomes the dichotomy between localized memory (in which each memory item is stored in a single location within an overall memory structure) and distributed memory (in which a memory item is stored as an aspect of a multi-component memory system, in such a way that the same set of multiple components stores a large number of memories). In a glocal memory system, most memory items are stored both locally and globally, with the property that eliciting either one of the two records of an item tends to also elicit the other one.

In a 2013 paper, I gave a moderately detailed, fairly speculative model of glocal memory in the human brain.

The classic example of localized memory in the brain is the Jennifer Aniston neuron.   Many neuroscientists' brains were blown by the discovery of individual neurons in the hippocampus that fire specifically when the face (or voice, or name) of the actress Jennifer Aniston is perceived.

We must remember, though, that these localized memories in the hippocampus are not the ONLY kind of memories in the brain.   The cortex contains memories as well, that appear to be stored in a very different way, involving diffuse activation patterns distributed across large numbers of neurons.

"Glocal memory" in the human brain, then, appears to involve dynamic interaction between localized "concept neurons" in hippocampus, and more distributed memory attractors in cortext.

The presence of these concept neurons gives an interesting potential direction for brain-computer interfacing research.   Why not connect nodes in a neural-semantic network -- like the one inside OpenCog -- to a brain's concept neurons?    Connect the "Jennifer Aniston" node in an OpenCog Atom-network, to the "Jennifer Aniston" neurons in a person's brain?

Actually, OpenCog's internal representations are glocal as well -- OpenCog has a distributed-attractor modality for representing "Jennifer Aniston", as well as a localized-node modality.  But connecting localized representations to one another would seem the easiest way to connect OpenCog to a human brain, enabling human-AGI co-cognition.
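To make the localized/distributed distinction concrete, here is a minimal toy sketch in Python -- my own illustrative construction, not actual OpenCog code; all the names and sizes are arbitrary choices -- of a "glocal" store in which each item has both a named local slot and a distributed Hopfield-style attractor, and recalling either form elicits the other:

```python
# Toy "glocal" memory (illustrative only, not OpenCog code): each item is stored
# both locally (a named slot) and globally (as an attractor of a small
# Hopfield-style network); recalling either record tends to elicit the other.
import numpy as np

rng = np.random.default_rng(1)

class GlocalMemory:
    def __init__(self, dim):
        self.dim = dim
        self.W = np.zeros((dim, dim))    # distributed store: Hopfield-style weights
        self.local = {}                  # localized store: name -> prototype pattern

    def store(self, name, pattern):
        p = np.where(pattern >= 0, 1, -1)
        self.local[name] = p                          # local ("concept neuron") record
        self.W += np.outer(p, p) - np.eye(self.dim)   # Hebbian update: global record

    def recall_from_name(self, name):
        # local -> global: the named slot re-activates the distributed pattern
        return self.local[name]

    def recall_from_cue(self, cue, steps=20):
        # global -> local: settle the network on a noisy cue, then find the
        # named slot that best matches the resulting attractor
        s = np.where(cue >= 0, 1, -1)
        for _ in range(steps):
            s = np.where(self.W @ s >= 0, 1, -1)
        best = max(self.local, key=lambda n: float(self.local[n] @ s))
        return best, s

mem = GlocalMemory(dim=64)
patterns = {name: rng.choice([-1, 1], size=64)
            for name in ("jennifer_aniston", "robot_head", "hong_kong")}
for name, p in patterns.items():
    mem.store(name, p)

noisy = patterns["jennifer_aniston"].copy()
noisy[:12] *= -1                                      # corrupt part of the cue
name, settled = mem.recall_from_cue(noisy)
print(name, "overlap with stored pattern:", int(patterns["jennifer_aniston"] @ settled))
```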

The Global Brain

Previously I have described a Global Brain as follows:

The general idea of the Global Brain is that computing and communication technologies may lead to the creation of a kind of "distributed mind" in which humans and AI minds both participate, but that collectively forms a higher level of intelligence and awareness, going beyond the individual intelligences of the people or AI’s involved in it.

What I will mean by a "Global Brain" here is basically the same -- a self-organizing complex adaptive system that

  1. has many of the same cognitive processes as a human mind -- e.g. perception, action, reasoning, learning, memory...
  2. includes humans and some of their computing and communication tools within its own substrate
  3. achieves cognition substantially via the dynamics of patterns that are **emergent** in the communication network of humans and hardware/software, rather than just patterns that are easily observable within individual humans (or the tools of particular individual humans)
  4. includes a significant fraction of people in each region of the Earth

Of course, point 4 is what makes it "global", and that criterion is arguably dispensable....  A Global Brain that leaves out North Korea can still be considered a decent Global Brain.   A Global Brain that leaves out, say, China -- or that is only operational in China -- would have basically the same structures and dynamics and properties as one that spanned the whole world.

To what extent a Global Brain currently exists, or will soon come to exist, is a tricky question on which different experts will give different answers.   Concrete measurement of the degree to which a Global Brain is present is not something we really know how to do right now.   Francis Heylighen and his colleagues at the Global Brain Institute in Brussels are working on it!

The Glocal Brain

I guess you can see where I'm going now.   The glocal brain has already put the idea into your unconscious mind, right?

Perhaps we should view the Global Brain as possessing glocal knowledge representation -- both localized and distributed representations of the same piece of knowledge, acting in a coordinated way.

Localized representations have advantages for symbolic thinking -- they are easier to manipulate and combine in certain ways.   Global distributed representations have advantages for creative thinking -- they are easily morphable and content-addressable, and have analogy and metaphor built right into them.   The human brain/mind makes use of both; why shouldn't the global brain?

But what are the localized representations in the Global Brain's mind?   What are the Global Brain's "concept neurons"?

The Global Brain's "concept neurons" are precisely: The ideas that are explicitly formulated in our own conscious minds, and written down in our books and papers and emails and so forth.

And the Global Brain's distributed cortex-type memories, are the subtle trends and broad patterns that guide us throughout our lives in barely-perceptible ways.   

Of course our individual human actions are not strictly determined by the patterns constituting the Global Brain's distributed memories.   We may not have "free will" in a classical sense (an ill-defined notion), but we have something related called "natural autonomy" -- in a pragmatic sense, we can choose our actions.  But our choices are heavily guided by the broader patterns in which we live -- in a way that, statistically, causes us to behave like vehicles for the Global Brain's concept neurons ... most of the time...

The Global Brain differs from a human brain/mind in many ways -- it lacks an extremely pushy, particular goal system, like we humans get from our hindbrains.   In the terminology of Weaver and Viktoras from the Global Brain Institute, it is more strongly an open-ended intelligence -- a self-organizing complex system ongoingly creating new patterns and exploring and overcoming its own boundaries, rather than a system oriented toward optimizing achievement of certain specific goals.   But the glocality of the Global Brain's memory is one aspect in which it resembles human brains.

In the end human brain/minds are open-ended too -- but we can at least roughly approximate human brain/mind dynamics by thinking about optimization toward achievement of specific goals.   For the Global Brain, this kind of analysis isn't even an OK approximation, it's just wrong.    An open-ended intelligence type model is utterly necessary from the beginning.

What was happening in that dinner in Tsim Sha Tsui, then, was -- the Global Brain was trying to enshrine some patterns from its distributed, global memory store, in its localized memory store.   It was trying to pack some broad, diffuse, distributed patterns into "concept neurons" -- i.e. into the brains and writings and software and hardware creations of the humans sitting around the table, and all their employees and colleagues.

If the Global Brain succeeds in doing this, it will then guide the creation of new and ever more amazing humanoid robots -- which will then guide the creation of yet more subtle, subterranean patterns driving peoples' actions in an unconscious yet exquisitely coordinated way.   This is the feedback between global and local memory, the essence of glocal memory.

The First Super-Smart AGI Will Be a Mind-Upload of the Global Brain

AGI systems capable of aggressively recognizing patterns in the activity of humans and AGIs and narrow AIs and other software and hardware on the planet, may be able to serve as higher-capacity concept-neuron networks than people are -- and may  massively accelerate the ongoing, iterated process of concretizing global memory patterns into localized patterns.  

And this leads on to one more idea I'd like to throw at you before I leave the computer for the evening, and go crawl into bed to watch a movie with Ruiting....   Maybe the first AGI will not be a mind-upload of any human, nor a purely engineered intelligent mind -- maybe the first AGI will be a mind-upload of the Global Brain.

Indeed, once this is said it seems almost trivial and obvious.   Of course it will be.

The Global Brain is far more intelligent than any individual human.  And AGIs are not going to be developed in a box in somebody's basement, or in some top-secret military or corporate lab -- they're going to be out there on the Internet, enhancing everyone's searches and analytics and so forth ... making the Global Brain smarter and smarter.

As AGIs gradually make the Global Brain smarter and smarter, they will also gradually make the Global Brain more and more AGI-centric, with human participants playing less and less of a critical role as the AGIs get smarter.

Before long, then, the Global Brain will morph into its own mind-upload.   Via gradual replacement of the human "concept neurons" with AGI "concept neurons."

(yeah, yeah... of course there are other possibilities.   An isolated AGI in a secret lab could massively self-improve and achieve a secret Singularity and then wreak its good or ill upon the world....  But the odds of this seem rather low to me, in spite of the preoccupation Bostrom, Yudkowsky and others have with this outcome....)

What this incrementally-uploaded Global Brain will then morph into, as it ongoingly learns and self-improves -- and encounters aspects of reality the human-based Global Brain never imagined -- is another question.   We can't know and our current Global Brain can't know, any more than cockroaches or cavemen could predict the course of the Internet.

And. So. It. Goes....

Is a Human Borg-Mind Inevitable?

There's a well-known rule that if an article has a headline in the form of a question, the answer is NO.

This article isn't really an exception ;-)

Is a human borg-mind inevitable?   The answer, I think, is kinda ...

A "borg" mind, as popularized by the classic Star Trek episode, is a group of people all controlled by a single collective will, consciousness and memory.    It's obviously an extreme invented for entertainment purposes.   A more common term is "hive mind", but there are many kinds of hive minds, of which the Borg Collective in Star Trek is a particular variety.

What I do think is very likely is that individual humans get sidelined via the emergence of some sort of mindplex ... a term I introduced over a decade ago to indicate a network of minds that has its own emergent consciousness, will, memory and individuality -- yet also allows the individual minds in the network to have these aspects on their own.

"Mindplex" is a very broad concept and encompasses options that are more "society of individuals"-like alongside some that are more borg-like.

Why do I think the dominance of mindplexes over 2015-style human individuals is very likely?

In short, because of: Brain-computer interfacing (BCI), the inefficiency of current means of communication, and the human love of togetherness and socializing.

What could stop mindplexes from becoming dominant?   Apart from horrible global calamities, the most likely thing to stand in their way would be the very rapid emergence of advanced AGI, along with the capability for humans to upload and fuse with advanced AGI.    If something more advanced than human-based mindplexes emerges before BCI gets to the point of enabling powerful mindplexes, then all bets regarding mindplexes are off....

Let me share a little more of what I've been thinking....

Why Many Businesses Stay Small

Reading Why Information Grows by Cesar Hidalgo, I was pleased with his summary of ideas regarding why companies tend to become less efficient when they expand beyond a certain size.   Basically, following prior research of others, he attributes this to the cost of building links (links between people or companies in this case).

In the business world, building links between individuals in different companies is costly, because it requires lots of negotiation, legal overhead, etc.    Linking between different individuals in the same company is generally cheaper.  

Yet, when a company becomes too big, this is no longer necessarily the case.   In a big company, it's often easier for a department to outsource work to an external contractor, than to deal with another department of the same company.   This may occur because of complex internal politics, or simply due to the bureaucracy that seems to inevitably spring up when a company grows beyond a certain size.

When the cost of building the internal links needed to get something done exceeds the cost of getting the same thing done using external links, then a company may stop growing in size and begin to grow in capability via networking with external entities instead.

Further, one thing we see happening in the tech biz world now is: the cost of linking between different companies, or between companies and external individual contractors, is getting cheaper.   This is the move toward a "gig economy", as it's been called.   Cheaper links between organizations will tend to lead to smaller, leaner organizations.
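Just to make the mechanism vivid, here is a back-of-the-envelope toy calculation (my own illustrative numbers and function names, not anything from Hidalgo's book): if the marginal cost of an internal link grows with organization size while the cost of an external link keeps falling, the size at which "go outside" beats "do it in-house" keeps shrinking.

```python
# Back-of-the-envelope toy (illustrative numbers only): firms stop growing
# roughly where the marginal cost of an internal link exceeds the cost of an
# external (market / gig-economy) link.

def internal_link_cost(firm_size, base=1.0, bureaucracy=0.02):
    # assume coordination overhead grows with the number of people already involved
    return base + bureaucracy * firm_size

def break_even_size(external_link_cost, base=1.0, bureaucracy=0.02):
    # firm size at which forming an internal link costs as much as going outside
    return max(0.0, (external_link_cost - base) / bureaucracy)

for external_cost in (3.0, 2.0, 1.5):    # external links getting cheaper over time
    print(f"external link cost {external_cost:.1f} -> "
          f"growth stalls near size {break_even_size(external_cost):.0f}")
```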

Beyond Economics

One thing that I kept wondering while reading Hidalgo's book, though, is why our society is dominated by organizations that are glued together by ECONOMIC transactions.

I mean, economic interactions are important, but they are not the only kinds of links between people.  There are also emotional and relational links, intellectual links, spiritual links, and so on.   Yet it's organizations based on economic links that are currently dominant.

The obvious conclusion is that organizations based on economic links are currently so powerful, in large part, because economic links are so easy to form.    More easily formed links tend to lead to bigger organizations, as Hidalgo points out.  And bigger organizations, on the whole, have more potential to exert power....

What could change the landscape fundamentally, then, would be if other kinds of links became much easier to form.

Forming links based on, say, friendship or sexual relationship or intellectual interchange or shared goals is currently much more difficult and time-consuming than forming links based on monetary exchange.    So groups founded on other sorts of exchange are going to be smaller and less able to grow rapidly than groups founded based on economic exchange.   Given the current state of things.

I experience this phenomenon quite concretely in my AGI work.   It's possible to pull together great contributors for a science or engineering project without paying anyone -- just by recruiting people with a common goal and vision and building a shared feeling and community among them.   But in many ways this is MUCH MUCH HARDER than simply hiring and paying people.

Of course, a great community of unpaid contributors can have a self-organizing, self-motivated aspect that MOST groups of employee collaborators won't have.   But it's also possible to get great collaboration and enthusiasm among a paid team -- if you hire the right people who gel with each other....   Once a non-monetary link with a project contributor is made, it can have great persistence (or it can evaporate when the person's life situation changes and they suddenly need to spend more time earning an income).   But forming non-monetary links just tends to be a lot slower, whereas hiring a contractor is almost instantaneous these days, with sites like Upwork and Elance.

But what if we had, for instance, brain-computer interfacing technology?

Brain Computer Interfacing and Social Self-Organization

What if we had BCI hooked up to allow different people to directly interface with each other's brains?

[Image caption: Real BCI tech will presumably be a lot less picturesque and cool-looking than this, but hey...]

What if you could drink from the firehose of somebody else's mind? -- directly suck in their thoughts or feelings?   This isn't possible yet, but first steps are being taken in the lab already.   Would this sort of multidimensional exchange, manifested more fully, make it easier for networks of people to establish other sorts of mutually valuable relationships beyond "merely" economic ones?   I would tend to think so....

And networks of people that cohere together based on deeper forms of exchange -- intellectual, emotional and spiritual -- are likely to be much more effective than networks cohering based on economic exchange.   Encoding information about needs, desires and motivations in economic terms is terribly inefficient, really.

Paying an employee to align their goals with one's own is of meaningful yet erratic effectiveness.   Spelling out one's needs and desires to a subcontractor, in a requirements specification coupled with a legal contract, is always a terrible oversimplification of one's actual needs.

How much better to have a collaborator who really gets one's goals at the deep level, or a subcontracting organization that understands one's requirements at a deep and intuitive level.   And these things happen sometimes.  But what if they could happen systematically?

For this reason I think BCI will be the death of corporations -- they will simply pale in effectiveness compared to networks of people that self-organize based on deeper kinds of exchange than the economic.   But the implications are much broader than this.   BCI may also lead -- perhaps quite rapidly -- to the obsolescence of individuals as we know them.

Between Self and Borg, Mindplex

So much of modern culture is focused on exalting the joy and moral value of "coming together."   Lovers who feel and act as one; parents who give their all for their children; work teams that act in harmony (e.g. agile software teams), thus achieving much more than the sum of the parts.   BCI could enhance all these things -- lovers could really be in each other's minds, minimizing misunderstandings; work teams could share thoughts directly, avoiding all sorts of communication bottlenecks....

Most of the use we get out of Internet and computing tech these days, is oriented toward communication.  With Facebook, SMS, video-chat and all the rest, we ensconce ourselves in interaction with others as richly and constantly as we can.    If BCI were rolled out, it would immediately be applied to various forms of brain-to-brain social networking.   Sufficient use of this kind of technology will cause brains to adapt physiologically to BCI-powered neurosocial networking.

So  -- Will this make us a borg?  Not exactly.  But it will make us part of something new, a new kind of mindplex, something between present-day notions of individual and society.

Incipit Homo Mindplexicus

A true undifferentiated borg mind is unlikely to be optimal as a problem-solving system, for the same reasons that island models work in genetic algorithms (and why OpenCog's evolutionary program learning component, MOSES, works by evolving distinct "demes" of programs).    Given realistic resource constraints, one often gets more innovation by letting different pools of resources evolve somewhat independently.  The overall system can then choose the best (by its own explicit or implicit criteria) of what the various somewhat silo'd off subsystems have created or discovered.
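For readers who haven't met island models before, here is a tiny toy version in Python (my own illustration, not MOSES code; the problem, population sizes and rates are arbitrary choices): several sub-populations evolve mostly in isolation on a trivial problem, with occasional migration of good individuals between them.

```python
# Toy island-model genetic algorithm (illustrative only, not MOSES): several
# sub-populations ("islands" / "demes") evolve mostly independently, with
# occasional migration -- which tends to preserve more diversity than one big
# undifferentiated population would.
import random

TARGET_LEN = 40                      # toy problem: maximize the number of 1-bits

def fitness(genome):
    return sum(genome)

def mutate(genome, rate=0.02):
    return [1 - g if random.random() < rate else g for g in genome]

def evolve_island(pop, generations=20):
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: len(pop) // 2]                 # truncation selection
        children = [mutate(random.choice(parents)) for _ in range(len(pop) - len(parents))]
        pop = parents + children
    return pop

def island_model(n_islands=4, pop_size=30, epochs=10):
    islands = [[[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for _ in range(epochs):
        islands = [evolve_island(pop) for pop in islands]
        # migration: each island's best genome replaces a random member of the next island
        for i, pop in enumerate(islands):
            best = max(pop, key=fitness)
            neighbour = islands[(i + 1) % n_islands]
            neighbour[random.randrange(len(neighbour))] = best[:]
    return max((max(pop, key=fitness) for pop in islands), key=fitness)

best = island_model()
print("best fitness:", fitness(best), "out of", TARGET_LEN)
```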

So one fairly likely-looking possibility is that, after the emergence of powerful BCI: Instead of individuals looking out for their own personal good, and banding together into organizations based crudely on economic exchange -- we will have networks of tightly bound group-minds, interacting based on directly exchanging goals, values and ideas ... and periodically re-shuffling or merging within a broader network of mindplex-like emergent intelligent patterns.

One big question, though, is how this will interrelate with advances in AGI.   The same tech that will let us network our minds together will let us execute Google search queries and access calculators and general software programs from within our minds.   The same tech will also let us share thoughts with any AGI software that exists at a given point in time.

A Few Plausible Scenarios

For sake of having an interesting discussion, let's assume a positive post-Singularity world where humans have options and choices (see my chapter Toward a Human-Friendly Post-Singularity World in The End of the Beginning for a more detailed discussion of this sort of world; free PDF version here).

Once AGIs are much more cognitively powerful than humans, then any human mindplexes that exist will, just like human minds, need to decide how far they want to fuse with these AGIs.    Full-on fusion with AGIs will likely reduce the human component of any individual or mindplex to insignificance relative to the more powerful AGI component.

So various scenarios are possible:

  • Advanced AGI comes before advanced BCI.   Then the only people who fuse into mindplexes, rather than fusing with AGIs, are ones who value humanity but not individuality.
  • Advanced BCI comes before advanced AGI.   Then human mindplexes will form, and various whole and partial mindplexes will make their own decisions about fusing with AGIs
  • Advanced AGI and advanced BCI come about at around the same time.   Then things really get complexicated!

Yadda yadda ... interesting times ...

Friday, September 04, 2015

Growing Psychic Mini-Brains

I was mildly distressed recently to find that our Ethiopian company iCog Labs -- recently written up nicely in Techonomy -- may not be the weirdest tech company in Addis Ababa.   

Scientific Revolution Earth, also in Addis, is centered around novel designs for Zero Point Energy -- which, in the spectrum of out-there science, may beat iCog's thrust toward Artificial General Intelligence, humanoid robotics and radical life extension...

Never one to be outdone, I started thinking about what way-out-there projects we could try at iCog to recapture the Ethiopian Mad Science crown!! ....

Probably many of you have seen recent news about biologists growing "mini-brains" in the lab....   This is really cool and would seem to open up all sorts of possibilities.   Among them, obviously: GROWING PSYCHIC MINI-BRAINS!!

Last year Damien Broderick and I co-authored a book summarizing some of the empirical evidence for "paranormal" phenomena such as ESP, precognition, psychokinesis and so forth.    I won't re-tread that information about the evidence for psi here, you can look it up if you're curious!  I gave some other links on the topic here as well.

What I want to wonder about here is: Given that psi is presumably mediated by some neural process, could one grow a mini-brain specifically to be a psychic (e.g. precognitive) hunk of neural matter?

I mean: If particular (perhaps quantum) properties of neural structure/dynamics somehow play a key role in psi, it might be possible to create neural matter that leverages these properties better than the human brain does...

If one could induce some sort of Hebbian reinforcement learning (long-term potentiation, etc.), perhaps one could guide the self-shaping of a biological neural network by rewarding it for correctly predicting the future....  Then one might be able to create a precognitive hunk of neurons, even without understanding exactly how the hunk of neurons organized itself to become precognitive (and then one could study it to see how it worked).
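In purely computational terms (and with nothing remotely psychic about it), the training scheme being suggested is basically reward-modulated Hebbian learning.  Here is a minimal in-silico sketch, where the network size, input stream, and learning rates are all arbitrary illustrative choices of mine:

```python
# Minimal in-silico sketch of reward-modulated Hebbian learning (illustrative
# only): a small random recurrent network is rewarded for predicting the next
# value of an input stream.  The readout learns by a simple delta rule; the
# recurrent weights get a Hebbian update modulated by reward-minus-baseline.
import numpy as np

rng = np.random.default_rng(2)
N = 50
W = rng.normal(0, 0.1, size=(N, N))      # recurrent weights (plastic)
w_in = rng.normal(0, 1.0, size=N)        # input weights (fixed)
readout = np.zeros(N)                    # prediction readout (plastic)
lr_w, lr_out = 1e-4, 1e-3

stream = np.sin(np.arange(5000) * 0.1)   # a predictable stimulus stream
x_prev = np.zeros(N)
baseline = 0.0
errors = []
for t in range(len(stream) - 1):
    x = np.tanh(W @ x_prev + w_in * stream[t])
    prediction = readout @ x
    error = stream[t + 1] - prediction
    reward = -error ** 2                              # higher reward = better prediction
    baseline = 0.99 * baseline + 0.01 * reward        # running reward baseline
    W += lr_w * (reward - baseline) * np.outer(x, x_prev)   # reward-modulated Hebbian step
    readout += lr_out * error * x                     # delta rule on the readout
    x_prev = x
    errors.append(abs(error))

print("mean |error|, first 500 steps:", round(float(np.mean(errors[:500])), 3))
print("mean |error|, last 500 steps: ", round(float(np.mean(errors[-500:])), 3))
```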

I bounced this idea off some psi-researcher friends and they were somewhat positive, but suspected it would work best if the mini-brain were attached to some sort of body.

Pumping blood thru the mini-brain via an artificial heart would add some extra rhythmicity, which is important according to some views of the underpinnings of psi.

More adventurously, one could put the mini-brain in a simple robot body with a low-fi camera eye and some wheels ... and a motivational system that rewards it based on some simple stimulus pattern, say finding the color red in its environment...

Then make the mini-brain experience a reward when it sees, say, the color red -- and put it in situations where seeing the color red is much more likely if it does a bit of precognition...

This weird cyborgic organism would have a strong selective pressure for psychic "first sight" -- just as James Carpenter suggests many animals do...

Yyyeaahhh -- Now that's weird science, folks!

Alas I don't currently have time to pursue this fascinating speculative direction, nor wealth to fund someone else to do it.   But it's interesting to think about!   

If you give it a try, let me know what you find ;-)

Friday, July 10, 2015

Life Is Complexicated

I grew up, intellectually, drinking the Complexity Kool-Aid from a big fat self-organizing firehose.

Back in the 1980s, I ate up the rhetoric, and the fascinating research papers, emanating from the Santa Fe Institute and its ilk.  The core idea was extremely compelling — out of very simple rules, large-scale self-organizing dynamics can give rise to extraordinarily complex and subtle phenomena … stuff like stars, ecosystems, people,… the whole physical universe, maybe?

Fast forward a few decades and how does the “complexity” paradigm feel now?

(for simplicity, in the rest of this blog post I will use the word “complexity” to refer to Santa Fe Institute style, self-organizing-systems-ish “complexity”, rather than other meanings of the word)

Artificial life hasn’t panned out all that spectacularly — it has led to lots of cool insights and funky demos, but in the end attempts to get really richly-behaving life-forms or ecosystems to self-organize out of simple rules in the computer haven’t gone that well.

In AI, simplistic “complexity” oriented approaches — e.g. large, recurrent neural networks self-organizing via Hebbian learning or other local rules; or genetic programming systems — haven’t panned out insanely well either.  Again, research results have been obtained and a lot has been learned, but more impressive progress has been made via taking simple elements and connecting them together in highly structured ways to carry out specific kinds of learning tasks (e.g. the currently super-popular deep learning networks).

What about modeling economies, physical or chemical systems, ecosystems, etc.?   “Complex systems” style computer simulation models have provided insightful qualitative models of real systems here and there.  To some extent, the early message of the Santa Fe Institute and the other early complexity pioneers has simply diffused itself throughout science and become part of standard practice.   These days “everybody knows” that one very important way to understand complex real-world phenomena is to set up computer simulation models capturing the key interactions between the underlying elements, and run the simulations with various parameter values and look at the results.

Universal Laws of Complexity?

But back in the 80s the dream of complexity science seemed to go well beyond basic pragmatic lessons about running computer simulations, and high level discussions of properties shared by various systems in various contexts.  Back in the 80s, there was lots of talk about “universal laws of complex systems” — about using simulations of large numbers of very simple elements to understand the rules of complexity and emergence, in ways that would give us concrete lessons about real-world systems.

The great Stephen Wolfram, in 1985, foresaw cellular automaton models as a route to understanding “universal laws of complexity”   … decades later his book “A New Kind of Science” pushed in the same direction, but ultimately not all that compellingly.

I myself, with my occasional (er) grandiose tendencies, was a huge fan of this vision of universal laws of complex systems.   I even tried to lay some out in detail, on pages 67-70 of my 2001 book “Creating Internet Intelligence”.

And one still sees mention of this idea here and there, e.g. in 2007:

“The properties of a complex system are multiply realisable since they satisfy universal laws—that is, they have universal properties that are independent of the microscopic details of the system.”

But, truth be told, the “laws” of complexity found so far are just not all that law-like.  The grand complexity-science vision has panned out a little, but more complicatedly than anticipated.   Certainly broad self-organization/emergence based phenomena common to a huge variety of real-world complex systems have been identified.  Phase transitions, small world networks, strange attractors, self-organized criticality and so forth are now, simply, part of the language of science.   But these are more like “common phenomena, existing alongside all the other known phenomena characterizing systems, and manifesting themselves in various systems in various subtle and particular ways” — not remotely as law-like as, say, the “laws” of physics or chemistry.

(“Law” of course is a metaphor in all these cases, but the point is that the observational patterns referred to as physical or chemical “laws” are just a lot more solidly demonstrated and broadly applicable than any of the known properties of complex systems…)

Why So Complicated?

So why has the success of complexity science been so, well, complicated?

Some would say it’s because the core ideas of complexity, emergence, self-organization and so forth just aren’t the right ones to be looking at.

But I don’t think it’s that.  These are critical, important ideas.

Rather, I think the correct message is a subtler one: Real-world systems aren’t just complex, in the Santa Fe Institute sense of displaying emergent properties and behaviors that self-organize from the large-scale simple interactions of many simple elements.

Rather, real-world systems are what I’ll -- a bit goofily, I acknowledge -- call “complexicated”.

That is: They are complex (in the Santa Fe Institute sense) AND complicated (in the sense of just having lots of different parts that are architected or evolved to have specific structures and properties, which play specific roles in the whole system).

A modern car or factory is a complicated system - with many specialized parts, each carefully designed to play their own role.

Conway’s Game of Life (a popular, interesting cellular automaton model), or a giant Hebbian recurrent neural net, is a complex system in the SFI sense — it has emergent high-level properties that can ultimately be traced back to zillions of simple interactions between the simple parts.  But doing the tracing-back in detail would be insanely complicated and computationally resource-intensive.
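(For readers who haven't played with it: Conway's rules fit in a few lines of code -- all the richness is in the emergent behavior, not in the specification.  A minimal numpy version, purely for illustration, with arbitrary grid size and initial density:)

```python
# Conway's Game of Life in a few lines: the rules are trivially simple, while
# the emergent behavior (gliders, oscillators, ...) is famously rich.
import numpy as np

def life_step(grid):
    # count live neighbours by summing the 8 shifted copies of the grid (toroidal wrap)
    neighbours = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    # a cell is live next step if it has exactly 3 live neighbours,
    # or 2 live neighbours and is already live
    return ((neighbours == 3) | ((neighbours == 2) & (grid == 1))).astype(int)

rng = np.random.default_rng(0)
grid = (rng.random((40, 40)) < 0.25).astype(int)
for _ in range(100):
    grid = life_step(grid)
print("live cells after 100 steps:", int(grid.sum()))
```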

On the other hand, a human body, or a modern economy, or the Internet, combines both of these aspects.   These are complicated systems, with many specialized parts, each carefully created to play their own role — yet key aspects of the roles that these parts play, involve their entrainment in complex emergent dynamics that can ultimately be traced back to zillions of simple interactions between simple parts (but doing the tracing-back in detail would be insanely complicated and computationally resource-intensive).

These kinds of “complexicated” systems lack the elegance of a well-designed car or factory, and they also lack the elegance of Conway’s Game of Life or a Hopfield formal neural network.  They are messy in a lot of different ways.  They have lots of specialized parts all working together, AND they have complex holistic dynamics that are hard to predict from looking at the parts, but that are critical to the practical operation of the parts.

Why does our world consist so much of this sort of perversely complexicated system, instead of nice elegant well-organized systems, or simplistic SFI-style “complex systems” models?   Because when dealing with severe resource constraints, evolutionary processes are going to make use of Any Means Necessary (well, any  means they can find within the searching they have resources to do).  Both self-organizing emergence and well-organized factory-style organization are effective ways of making big systems do difficult things.   If they can be gotten to work together in the same system, sometimes that’s even better.

The simple, uncomplicated self-organizing systems that the SFI-style "complexity science" likes to study, are not generally capable of giving rise to interesting phenomena given realistic amounts of resources.  That's a bit inelegant, but it's a cost of living in a universe that imposes such severe resource constraints on its residents.  To get interesting complex-self-organization-ish phenomena in reality, one generally needs to interweave some complicatedness with one's complexity.  Which means that one obtains systems whose behavior is a mixture of universal complex-systems properties, and highly specific properties resulting from complicated particulars.   Which is either ugly and messy or weirdly beautiful or completely plainly normal and real, depending on one's perspective!

Life and Mind are Complexicated

The all-knowing Urban Dictionary defines “complexicated” as

Something so complex, it's not enough to say it's complicated.

Girl 1: So how are things going with you and that new guy you're seeing?

Girl 2: I don't know, things are really complexicated with us. I'm not sure where things are going.

which isn’t exactly the meaning I’m using here, but I figure it’s reasonably in synch aesthetically.

Of course, followers of my AI work will have already anticipated my closing comment here.  The OpenCog AGI design I’ve co-created and am currently working on, combines SFI-style complexity with complicatedness in various subtle ways, some of which can be frustrating to work with at times.

I have spent a fair bit of time trying to figure out how to make a fundamentally simpler AGI design with the smell of “hey, this could work too” — but I haven’t succeeded at that, and instead have kept pressing ahead with OpenCog, along with some great colleagues.   If the line of thinking in this blog post is correct, then the search for a “fundamentally simpler” design may be misguided.   Getting rid of either the complexity or the complicatedness may not be possible.

Or in short, OpenCog is complexicated ... human minds and bodies are complexicated ...  the economy is complexicated ... the Global Brain is complexicated ... Life is complexicated.


-- And hey, even if the word doesn't have legs (or has complexicated legs -- yeesh, that sounds like an unfortunate disease!!), the underlying concept is important! ;-)

Some Interesting Comments

I posted a link to this post on the Global Brain and AGI email lists and got some interesting responses, e.g.

From Weaver (David Weinbaum):

Yes, I resonate with Ben's observations. It seems that real complexity defies universality as a principle.

Whenever we can describe a phenomenon so that its local particularities are easily decoupled from universal patterns e.g. describing a classic mechanical system in terms of the Hamiltonian and its initial conditions, this is not a complex phenomenon. I would also add to the list of complexicated systems those systems where statistical descriptions do not contribute a lot to their understanding.
Things that seem to be characteristic to complexicated systems are:

  1. Heterogeneity of the elements at various scales.
  2. Diversity of properties and functions.
  3. Degeneracy - every property and function has multiple ways of realization.
  4. 2^3. Very different structures realizing very similar functions while very similar structures may realize radically different functions. (I call it transductive instability, a concept I am working on developing). This seems to be a major key to the evolution of complex systems. 
  5. Variable range correlations - Local interactions may have global effects and vice versa, global patterns may affect local interactions. In other words, it is often hard or entirely impossible to clearly delineate distinct scales within such systems. 
  6. Contingency - certain behaviors are contingent and unpredictable.    

At least some of these are examined in some depth in two papers written by me and Viktoras Veitas that use the theory of individuation to tackle complexity of the complexicated kind in the context of intelligence and cognition:

From Francis Heylighen:

Ben makes a number of correct observations here, about truly complex systems (which he calls "complexicated") being more than ordered patterns emerging out of simple, homogeneous agents and rules. In practice, evolution works by developing specialized modules, which are relatively complex systems highly adapted to a particular function or niche, and then fitting these modules together so as to combine them recursively into higher-order systems. This leads to a hierarchical "architecture of complexity", as envisaged by Herbert Simon, where you find complexity at all levels, not only at the top level.

The picture of Simon is still too simple, because the modules are in general not neatly separable, and because sometimes you have distributed patterns of feedback and coordination that exploit the local capabilities of the modules. But I agree with Ben that the old "Santa Fe" vision of deterministic "laws of complexity" that specify how simple rules produce emergent patterns is equally unrealistic. The combination of the two, as Ben seems to propose, is likely to be more fruitful.

My own preferred metaphor for a complex adaptive system is an ecosystem, which consists of an immense variety of complex organisms, from bacteria to bears and trees, and assemblies of such organisms, interacting non-linearly with each other and with the somewhat simpler physical and chemical processes of climate, resource flows, erosion, etc... The components of such a system have co-evolved to be both largely autonomous, and mutually dependent via intricate networks, producing a truly "complexicated" whole.

From Russell Wallace:

Good article! Or, put another way:

  1. The Santa Fe school implicitly optimizes for smallness of source code versus aesthetic interestingness of results.
  2. Biology optimises for ease of creation by evolution versus performance.
  3. Technology optimises for ease of creation by human engineers versus performance.

Looked at that way, it makes sense that 1 isn't a good model for 2 or 3.

Tuesday, June 16, 2015

Why Occam’s Razor Works

(Sketch of a Possible Explanation Why Occam’s Razor Works...)

(Though motivated by deep questions in philosophy, this is a speculative math-y blog post; non-technically oriented readers beware…)

How can, or should, an intelligent mind make sense of the firehose of complex, messy data that its sensors feed it?    Minds recognize patterns, but generally there are many many patterns in the data coming into a mind, and figuring out which data to pay attention to is a significant problem.   Some major aspects of this problem are: Figuring out which of the patterns that have occurred in one context are more likely to occur in other similar contexts, and figuring out which of the patterns that have occurred in the past are more likely to occur in the future. 

One informal principle that seems broadly useful for solving this “pattern selection” problem is “Occam's Razor.”   This principle is commonly taken as a key ingredient in the scientific method – it plays a key role in many philosophies of science, including the “social computational probabilist” philosophy I’ve presented here and in The Hidden Pattern.

Occam’s Razor has been around a while (well before its namesake, William of Ockham) and has been posited in multiple forms, e.g.:

“Nature operates in the shortest way possible” -- Aristotle, BC 384-322

“We consider it a good principle to explain the phenomena by the simplest hypothesis possible.”  -- Ptolemy, c. AD 90 -  168

“Plurality must never be posited without necessity” -- William of Ockham, c. 1287-1347

“Entities must not be multiplied beyond necessity” -- John Punch, 1639

“Everything should be as simple as it can be, but not simpler” --  Albert Einstein (paraphrased by Roger Sessions)

The modern form of the Razor, as used in discussions of scientific methodology and philosophy of science, could be phrased something like:

Given two explanations that explain the same data, the simpler one should be preferred

or as Sklar (1977) phrased it,

In choosing a scientific hypothesis in light of evidential data, we must frequently add to the data some methodological principle of simplicity in order to select out as ''preferred'' one of the many different possible hypotheses, all compatible with the specified data.

This principle is often taken for granted, and has a certain intuitive ring of truth to it -- but why should we actually accept it?   What makes it true? 

Arguments for Occam’s Razor

Perhaps the most compelling argument for Occam's Razor is the theory of Solomonoff Induction, generalized more recently by Marcus Hutter into a theory of Universal AI.   This theory shows, very roughly speaking, that the shortest computer program computing a set of data is the best explanation for that data, where "best" is defined in terms of accurate prediction of missing or future parts of the dataset.  This is very elegant, but the catch is that effectively it only applies to fairly large datasets, because it relies heavily on the fact that, in the limit of large datasets, the shortest program explaining the dataset in one programming language is approximately the same length as the shortest program explaining that same dataset in any other programming language.   It assumes one is in a regime where the computational cost of simulating one programming language (or computer) using another is a trivial matter to be brushed aside.
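For reference, the standard formulation being gestured at here is the Solomonoff prior over strings, together with the invariance theorem that makes the choice of reference machine matter only up to an additive constant:

```latex
% Solomonoff's universal prior, relative to a universal prefix machine U,
% and the invariance theorem (standard statements, included for reference):
M_U(x) \;=\; \sum_{p \,:\, U(p)\ \text{begins with}\ x} 2^{-\ell(p)},
\qquad
\bigl| K_U(x) - K_V(x) \bigr| \;\le\; c_{U,V} \quad \text{for all } x,
% where K_U is Kolmogorov complexity relative to U and c_{U,V} is a constant
% independent of x -- negligible for large datasets, but not for small ones.
```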

There are other approaches to justifying Occam's Razor as well.   The Akaike Information Criterion (AIC) formalizes the balance between simplicity and goodness-of-fit that is required to achieve extrapolation without overfitting.  However,  the derivations underlying the AIC and its competitor, the Bayesian Information Criterion (BIC), hold only for large datasets.  The AICc version works for small datasets but relies on special distributional assumptions.
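For concreteness, the standard forms of these criteria (with k the number of model parameters, n the number of data points, and \hat{L} the maximized likelihood) are:

```latex
\mathrm{AIC}  \;=\; 2k - 2\ln\hat{L}, \qquad
\mathrm{BIC}  \;=\; k\ln n - 2\ln\hat{L}, \qquad
\mathrm{AICc} \;=\; \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}.
```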

There is also Kelly's (2007, 2010) interesting argument that the shortest hypothesis is, under certain assumptions, the one that will require one to change one's mind least often upon exposure to new data.   This is interesting but somewhat begs the question of why it's so bad to change one's mind when exposed to new data.  Kelly's proofs also seem to get bogged down in various technical intricacies and conditions, which  may however be ultimately resolvable in a clean way.

Here I present a new  argument for Occam's Razor, which appears to work for small as well as large datasets, and which is based on the statistical notion of subsampling.   At present the argument is just a sketch, yet to be filled out into a formal proof.  In essence, what is done is to construct a particular computational model, based on the properties of a given dataset, and argue that using Occam's Razor relative to this sort of computational model leads to good explanations for any dataset, large or small.   As the size of the dataset increases, the explanatory advantage gained by choosing a dataset-guided computational model for using Occam's Razor decreases, and the choice of computational model becomes increasingly arbitrary.

Sketch of a New Argument why Occam’s Razor Works

I’ve been thinking for a while about a new, somewhat different argument for why Occam’s Razor makes sense and is a good idea.   I haven’t found time to write up a formal proof, so I’m just going to sketch my “proof idea” here.   Eventually maybe I’ll find time to try to turn this into a real proof, which may well yield to some gotchas, restrictive conditions or new discoveries….  Or maybe some other brave and noble soul will take the bait and try to make a real proof based on my idea…
Ok, so -- the crux of the argument I’m thinking of is as follows.

Consider, for simplicity, a binary classification problem, involving a data-set D living within a data universe U, and a "ground truth" mapping F: U --> {0,1}.    This is not the only context my proposed argument applies to, but it’s a simple context for explaining the basic idea.

Then, consider two sets of functions S1 and S2, both learned via application of some learning algorithm L to study D (and not the rest of F).   Suppose:

  • The functions in S1 have accuracy a across D, and have size s1
  • The functions in S2 have accuracy a across D, and have size s2 > s1

Then the proposition I make is: On average, functions in S1 will give a higher accuracy on F than functions in S2 (**).
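One way to write (**) a bit more explicitly (my notation, just restating the setup above), letting acc_F(f) denote the accuracy of f over all of U:

```latex
% Restating (**) in symbols (my notation for the setup above):
\mathrm{acc}_F(f) \;=\; \Pr_{x \in U}\bigl[\, f(x) = F(x) \,\bigr],
\qquad
\mathbb{E}_{f \in S_1}\bigl[\mathrm{acc}_F(f)\bigr]
\;>\;
\mathbb{E}_{f \in S_2}\bigl[\mathrm{acc}_F(f)\bigr].
```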

Why would (**) be true?

I believe it can probably be demonstrated via induction on the size of D.

Suppose (**) holds for D^* that are smaller than D.   Then, suppose we apply crossvalidation to assess the value of functions in S1 and S2; that is, we run a series of experiments in which we partition D into 2 subsets (D1, D2) and then apply L to learn classification functions on D1, and test these functions on D2.   These experiments will yield many functions that don't belong to either S1 or S2, and also some that do belong to these sets.   According to the inductive hypothesis: on average the functions L finds (on the validation folds) belonging to S1 will have greater accuracy across D as compared to those that L finds (on the validation folds) belonging to S2 (***).

But then from (***) and the basic theory of crossvalidation (which says that hypotheses doing better on the test portions of validation folds, tend to do better out of sample), we derive (**).

The question then becomes how to start the inductive hypothesis off.   What is the base case?

Well, one possibility is for the base case to be the situation where D contains two elements d0 and d1, with F(d0) = 0 and F(d1) = 1.   To understand this case, suppose that the data elements d (in D) each have k internal features.   Based on comparing d0 and d1, there is no way for L to identify dependencies between the different internal features.  Thus, the models most likely to give good accuracy on F are single-feature models.  But these are smaller than any other models.  Thus in this (somewhat degenerate) base case, smaller models are better.

I think this extreme case reflects a more general truth: When D is very small, models that ignore dependencies are more likely to give high accuracy, because it’s generally hard to identify dependencies based on small datasets.
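That intuition is easy to poke at numerically.  Here is a small toy experiment (my own illustrative setup, with an arbitrary ground-truth rule and arbitrary model choices, not part of any proof): fit a dependency-free model and a dependency-modeling one on many tiny training samples, and compare their average accuracy on a large held-out sample standing in for F.

```python
# Toy experiment (illustrative only): on very small training sets, a model with
# no interaction terms tends to extrapolate better, on average, than one that
# tries to model dependencies; as n grows, the interaction model catches up.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

def ground_truth(X):
    # stand-in for "F": a rule with a mild interaction between the two features
    return ((X[:, 0] + X[:, 1] + 0.5 * X[:, 0] * X[:, 1]) > 0).astype(int)

def run_trial(n_train):
    X_train = rng.normal(size=(n_train, 2))
    y_train = ground_truth(X_train)
    if len(set(y_train)) < 2:                      # degenerate tiny sample; skip it
        return None
    X_test = rng.normal(size=(5000, 2))            # large sample standing in for U
    y_test = ground_truth(X_test)
    simple = LogisticRegression()                  # ignores dependencies
    complex_ = make_pipeline(PolynomialFeatures(3),
                             LogisticRegression(max_iter=1000))  # models interactions
    simple.fit(X_train, y_train)
    complex_.fit(X_train, y_train)
    return simple.score(X_test, y_test), complex_.score(X_test, y_test)

for n in (4, 8, 16, 200):
    results = [r for r in (run_trial(n) for _ in range(200)) if r is not None]
    s, c = np.mean(results, axis=0)
    print(f"n_train={n:4d}   no-dependency model: {s:.3f}   interaction model: {c:.3f}")
```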

So there you go – Bob’s your uncle!

The crux of the argument is:

  • Simpler models are better for extrapolating from datasets of size N, because simple models are better for extrapolating from size N/k, and crossvalidation theory says that models working better on data subsets are better at working on the whole dataset
  • Simpler models are better for extrapolating from very small datasets because it’s not possible to meaningfully extrapolate dependencies between variables based on very small datasets, and models that treat variables as independent and don’t try to model dependencies, are intrinsically simpler

Dependency on the Expressive Language

The above is admittedly a sketchy argument at this point, and more rigorous analysis may expose some holes.   But, provisionally taking the basic argument for granted, it’s worth asking what the above argument says about the language in which models are expressed.  

The main constraint seems to come from the base case: we need an expressive language in which modeling a dataset in a way that ignores dependencies, is generally more concise than modeling a dataset in a way that takes dependencies into account.   There may also be aspects of the “crossvalidation theory” invoked vaguely above, that depend in some way on the specifics of the expressive language.

While vague and still speculative, this seems promising to me, e.g. as compared to the Solomonoff induction based foundation for Occam’s Razor.   In the Solomonoff approach, the judgment of “what is simple” displays a strong dependence on the underlying Universal Turing Machine, which becomes irrelevant only for large N.  But a lot of everyday-life pattern recognition seems to be best considered in the “small N” context.    A lot of human pattern recognition does seem to depend on what “expressive language” the human mind/brain is using to represent patterns.  On the other hand, in the present approach, the dependency on the expressive language seems much weaker.   “What is simple” seems to be mostly: what is judged simple by an expressive language in which models that ignore dependencies are simpler than those that incorporate them….

So What?

What’s the point of this kind of argumentation?  (apart from the copious entertainment value it supplies to those of us with odd senses of aesthetics and humour, that is... ;D )

The point is that Occam’s Razor, valuable as it is, is a vague, hand-wavy sort of principle – but given its very central importance in the philosophy of mind and of science, it would be very nice to have a more precise version!    

Among other goals, a more precise version of the Razor could provide useful guidance to AI systems in analyzing data and AGI systems in thinking about their experiences.


A Few Quasi-Random References

Burnham, K. P. and Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. Springer.

Hutter, Marcus (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Springer.

Kelly, Kevin T. and Mayo-Wilson, Conor (2010). Ockham Efficiency Theorem for Stochastic Empirical Methods. Journal of Philosophical Logic 39.

Kelly, Kevin T. (2007). Ockham's Razor, Empirical Complexity, and Truth-finding Efficiency. Theoretical Computer Science.

Sklar, L. (1977). Space, Time, and Spacetime. Berkeley: University of California Press.