Monday, August 27, 2012

Finding the "Right" Computational Model to Support Occam's Razor

(This is a rather abstract science-y post, beware....  If you don't like theoretical computer science, you'll be bored and/or baffled....  But if you do -- or if you like being bored and/or baffled, read on!  These are some technical speculations that I believe may have interesting conceptual/philosophical implications....)

One question I've been thinking about a lot lately is: What is the "right" computational model?

Occam's Razor is a powerful tool in AI, both in machine learning / data mining applications, and in a more ambitious AGI context.   But the formulation of Occam's Razor, in a practical computer science context, contains a certain arbitrariness.

Occam's Razor says to prefer the simpler hypothesis; or, more subtly, to weight hypotheses, in some sense, in proportion to their simplicity.

But what's "simple"?

An old friend of mine, Mariela Ivanova, once said: "Simplicity is so complex, I couldn't understand it."

Sometimes folks define "simplicity of X" as "length of the shortest expression of X in a specified programming language P" -- or something like that.   But of course this is too crude, so there are variations, such as "runtime of the program giving the shortest expression of X in language P"; or various weighted combinations of length and runtime, as Juergen Schmidhuber and colleagues have explored in their work on "frontier search."

One can also look at a "total computational cost" simplicity measure such as "the minimum amount of space and time resources needed to compute X" on a specific machine M.
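To make these competing measures concrete, here is a small Haskell sketch of my own (purely illustrative, not taken from any particular paper or system): toy candidate programs are described only by their length, runtime, and space usage, and are ranked under a pure length score, a Levin-style length-plus-log-runtime score, and a crude total-cost score. The Candidate type, the weightings, and the numbers are all assumptions made up for the example.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- A toy "candidate program" is described only by its measured resources
-- on some fixed machine M.  These numbers are invented for the example.
data Candidate = Candidate
  { progLength :: Int  -- program length, in bits
  , progTime   :: Int  -- runtime, in machine steps
  , progSpace  :: Int  -- memory used, in cells
  } deriving (Show)

lengthScore, timeScore, spaceScore :: Candidate -> Double
lengthScore = fromIntegral . progLength
timeScore   = fromIntegral . progTime
spaceScore  = fromIntegral . progSpace

-- Levin-style weighting: program length plus log2 of runtime.
levinScore :: Candidate -> Double
levinScore c = lengthScore c + logBase 2 (timeScore c)

-- A crude "total computational cost": time plus space, equally weighted.
-- The equal weighting is itself an arbitrary choice -- which is exactly
-- the kind of arbitrariness being pointed at here.
totalCostScore :: Candidate -> Double
totalCostScore c = timeScore c + spaceScore c

-- Prefer the candidate scoring lowest under whichever measure you pick.
simplest :: (Candidate -> Double) -> [Candidate] -> Candidate
simplest score = minimumBy (comparing score)

main :: IO ()
main = do
  let hyps = [ Candidate 120 50000 4000  -- short but slow
             , Candidate 300   800  200  -- long but fast and small
             , Candidate 150  4000  900 ]
  mapM_ (print . (`simplest` hyps)) [lengthScore, levinScore, totalCostScore]
```

Different choices of score can pick different "simplest" hypotheses from the same list -- which is the whole issue.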

The program-length simplicity metric is related to the Solomonoff-Levin measure, which assigns each possible output a probability that decreases exponentially with the length of the shortest program computing it.  Variant simplicity metrics lead to other measures on program space.
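In symbols -- this is the standard formulation, added here for concreteness rather than taken from the post -- the Solomonoff-Levin measure of an output x, relative to a universal machine U, is

```latex
m_U(x) \;=\; \sum_{p \,:\, U(p) = x} 2^{-\ell(p)} \;\approx\; 2^{-K_U(x)}
```

where \ell(p) is the length of program p in bits and K_U(x) is the length of the shortest program producing x; the approximation holds up to a multiplicative constant, because the shortest program dominates the sum.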

All this is very nice and elegant, to write about and to prove theorems about.  But it all depends on a seemingly arbitrary assumption: what is the programming language, or machine, involved?

There's some nice old theory -- the invariance theorem of Kolmogorov complexity -- saying that, in a sense, for the case of "shortest program length," this choice doesn't matter.  Any universal Turing machine can simulate any other one, given a constant overhead.
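Stated concretely (again, standard material included for reference): for any two universal machines U and V there is a constant c_{U,V}, independent of the data, such that

```latex
\left| K_U(x) - K_V(x) \right| \;\le\; c_{U,V} \qquad \text{for all } x
```

The constant is roughly the length of an interpreter for one machine written on the other.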

But in practice, this constant may be very large, and it sometimes does matter.

And for other simplicity measures, such as "total computational cost", no comparable theories are known.

To complete the story of the computer science basis for Occam's Razor, it seems what we need is a theory of what is the "right" programming language or machine; what is the "right" computational model.

Of course, this comes down to deciding what "right" means.

What I'd like to find are some nice theorems of the form: IF you want a programming language / computational model satisfying simple, elegant criteria A-B-C, THEN you must use a programming language / computational model essentially equivalent to P*.

If we had a good theory of this nature, then the choice of which programming language / computational model to use would come down to choosing criteria A-B-C.

I looked around a bit and couldn't find any really great theoretical results of this nature.   But it's possible they exist and I just didn't figure out the right terms to search for.

My intuition is that one can formulate some simple, elegant criteria that mathematically imply the right computational model is some sort of parallel graph rewriting.

In other words, I suspect there will be some simple, elegant criteria that are equivalent to the use of something vaguely like core Haskell.
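To give readers who haven't met rewriting the flavor of what I mean, here is a deliberately tiny Haskell sketch of my own (nothing like a real parallel graph-reduction engine): an expression is a structure of nodes, computation is the repeated application of local rewrite rules, and independent redexes could in principle be reduced in parallel.

```haskell
-- Terms are trees here for brevity; a real graph rewriter would share
-- subterms, so a shared subexpression gets reduced only once.
data Term
  = Lit Int
  | Add Term Term
  | Mul Term Term
  deriving (Show, Eq)

-- One sweep of local rewriting: reduce a redex where we find one,
-- otherwise rewrite the children.  The two children are independent,
-- so a parallel implementation could rewrite them simultaneously.
step :: Term -> Term
step (Add (Lit a) (Lit b)) = Lit (a + b)
step (Mul (Lit a) (Lit b)) = Lit (a * b)
step (Add x y)             = Add (step x) (step y)
step (Mul x y)             = Mul (step x) (step y)
step t                     = t

-- Rewrite to normal form: keep sweeping until nothing changes.
normalize :: Term -> Term
normalize t = let t' = step t in if t' == t then t else normalize t'

main :: IO ()
main = print (normalize (Mul (Add (Lit 1) (Lit 2)) (Add (Lit 3) (Lit 4))))
-- prints: Lit 21
```

In a graph (rather than tree) model, a term that mentions W twice need not cost twice the resources of computing W once -- the sort of resource property that criteria like the one discussed next might be meant to capture.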

What kind of criteria am I thinking of?  I know I'm getting a little fuzzy at this point, but I'm thinking about stuff like: "Computing W and computing f(W) should take the same amount of space and time resources," for cases where it seems intuitively obvious that they should.
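One hypothetical way to write down such a criterion (my formalization, purely to fix ideas): pick a class F of transformations that "obviously" shouldn't cost anything, and a cost function cost_M measuring the space and time resources machine M needs; then require

```latex
\operatorname{cost}_M\bigl(f(W)\bigr) \;=\; \operatorname{cost}_M(W) + O(1)
\qquad \text{for all } f \in F \text{ and all inputs } W
```

The substantive work, of course, is in pinning down which transformations belong in F.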

Might it be possible, by imposing a few natural criteria of this nature, to derive results implying that the only way to fulfill the criteria is to use some sort of simple parallel graph rewriting based computational model?

I strongly suspect this is possible, but I've been thinking about this a while, and (a familiar story for me lately!) haven't found time to spend the needed hours/days/weeks to try to work out any details.

There is a math literature on abstract rewrite systems, as well as on the practical use of parallel graph rewriting systems to do computation.   These may provide raw material to be used to prove theorems of the sort I'm alluding to.  Or maybe some theorems equivalent to what I'm dreaming about exist somewhere, and I just didn't find them yet!

So, let's suppose someone proves theorems of the nature I'm suggesting.  What would that get us?

It would tell us, in an elegant and intuitively compelling way, what computational model to assume when applying Occam's Razor.

Of course, someone could always say "That math is nice, but I see no reason to accept your criteria.  Sure they're simple and elegant, but I happen to prefer a computational model fulfilling different criteria."   That's inevitable.

But: a community sharing a common intuition regarding which mathematical criteria are elegant and simple would have a strong mutual intuitive reason to assume a common computational model, and thus to apply the Occam's Razor heuristic in the same way.

6 comments:

  1. Anonymous 4:13 PM

    You should read this reddit thread, it'll solve all your AGI problems.

    http://en.reddit.com/r/artificial/comments/yvpi1/what_are_some_of_the_biggest_problems_we_face_in/

  2. Of course, someone could always say "That math is nice, but I see no reason to accept your criteria. Sure they're simple and elegant, but I happen to prefer a computational model fulfilling different criteria." That's inevitable.

    It seems to me that if there are no universally optimal criteria, what you have to do is ground these a priori arbitrary choices in real-world performance. It's the match with the environment that makes criteria "right".

  3. I'm not an expert in computer science. But as far as Occam's Razor is concerned, I've always felt annoyed by its inherent claim of being self-evident or universally true. For example, one might argue that the best theory in science is not the simplest one, but rather the kind of science which is most accessible to expand, develop further and build upon. In this sense, Einstein might be the greatest genius in science ever, because his theory of general relativity is so outstanding and hitherto unsurpassed. On the other hand, one might criticize him for being selfish and narcissistic, since he hasn't laid any groundwork for scientists to come after him.

  4. Anonymous 3:39 PM

    I am more or less an idiot, please keep that in mind in connection with this question.
    A problem which is hard/time-intensive on a Turing machine might be very tractable on a quantum Turing machine, yes?
    Then you have a simple computation on a complex machine, or a machine whose power derives from its inherent complexity -- and a complex/intractable computation on a simple machine.
    Could this suggest something about the nature of simplicity and its relation to complexity?
    Or only prove I haven't a clue wtf I'm saying?
    Go ahead. I can take it.

  5. To me, it would seem that “simplicity” would have to be correlated with semantic value given that semantic value quantifies the contribution an entity makes to the construct it is a part of. If you define a system of semantic values on a set of computational elements then Occam’s Razor dictates the greatest semantic value for the least number of elements, i.e. the biggest bang for the buck. Like poetry, it says the most with the least . . . E = mc²

    Of course, if one accepts these criteria then Occam’s Razor invariably dictates fuzzy systems, because the references in fuzzy systems are to multisets or gradients as opposed to rigid elements. In fact, it would seem that fuzzy logic would be the best approach to your valuation problem: one defines a system of semantic values on a set of computational elements and then defines a rule-based fuzzy decision-making system which assigns degrees of membership to the semantic values in the valuation system. In this way one could generalize decision making to all computational systems. If someone doesn’t like your particular criteria, they can simply enter their own goals and constraints into the decision maker and presto, a result based on their own criteria. NASA developed fuzzy decision-making software way back in 1995 called DECISION MAKER. Currently they use something called START Methodology:

    "START employs a systematized approach to formulating and assessing mission architectures and optimal portfolios of capabilities and technologies. It allows decision-makers to see explicitly the information and assumptions that go into the analysis, to conduct "what-if" experiments rapidly with different scenarios and assumptions, and to develop an objective foundation for their decisions."

    Considering most Silicon-based Intelligence systems use fuzzy logic computational systems, this also represents a rather interesting case of self-referential recursion . . .

    Some suggestive links:

    http://start.jpl.nasa.gov/methodology/methodology.cfm

    http://interfaces.journal.informs.org/content/33/3/40.abstract

    https://webspace.utexas.edu/deverj/personal/papers/sv.pdf

    http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19950010854_1995110854.pdf


