Tuesday, September 20, 2011

A New Approach to Computational Language Learning

I've been thinking about a new approach to computational language learning for a while, and finally found time to write it down -- see the 2-page document here.

Pursued on its own, this is a "narrow AI" approach, but it's also designed to be pursued in an AGI context, and integrated into an AGI system like OpenCog.

In very broad terms, these ideas are consistent with the integrative NLP approach I described in this 2008 conference paper. But the application of evolutionary learning is a new idea, which should allow a more learning-oriented integrative approach than the conference paper alluded to.

Refining and implementing these ideas would be a lot of work, probably the equivalent of a PhD thesis for a very good student.

Those with a pure "experiential learning" bent will not like the suggested approach much, because it involves making use of existing linguistic resources alongside experiential knowledge. However, there's no doubt that existing statistical and rule-based computational linguistics have made a lot of progress, in spite of not having achieved human-level linguistic performance. I think the outlined approach would be able to leverage this progress in a way that works for AGI and integrates well with experiential learning.

I also think it would be possible for an AGI system (e.g. OpenCog, or many other approaches) to learn language purely from perceptual experience. However, the possibility of such an approach doesn't imply its optimality in practice, given the hardware, software and knowledge resources available to us right now.

6 comments:

  1. Ben, I'm always happy to see someone thinking about how to model natural language computationally. But decades of attempts to computationally "learn" natural language grammar have failed. Before committing resources in another attempt, I hope you'll at least consider the idea that language is a complex system which cannot be "learned" more compactly than itself.

    If this is true it will be quite easy to simulate language computationally, e.g. using interactions between vectors of word associations. But if we try to "learn" or abstract those word associations we will always lose information, and fail.

    For more on this idea see my website: http://www.chaoticlanguage.com, or some publications http://independent.academia.edu/RobFreeman

  2. Ventura, 7:10 PM

    After reading "Big Brain: Origins and Future of Intelligence" (Lynch & Granger), I was convinced that the structure of memory has a major role in language and its mechanisms are probably not so different from those of perception.

  3. Looks like an interesting idea. Of course the choice of representation for the lexicon will be very important.

    ....

    Here is another idea for language learning: one theory about the origin of language (due to VS Ramachandran, I think) asserts that language arose through synesthesia -- cross-sensory mappings such as those that appear in the Bouba-Kiki phenomenon.

    These mappings, of course, are simply analogies. And I do think that analogies are one of the keys to language learning, regardless of whether they are synesthetic in origin or not. Suppose, for instance, that a computer has the following sentence in its corpus of prior knowledge of examples of good English:

    KATHY SAID SHE WOULD GO TO THE STORE.

    And then suppose that it is asked to determine whether the following sentence makes sense:

    BILL SAID HE WOULD GO TO THE STORE.

    If it *knows* that `Kathy' is typically a `she' and that `Bill' is typically a `he', upon forming the analogy Kathy:she :: Bill:he, it could infer that the sentence is indeed correct and makes sense; actually, this would be contingent on knowing that that particular kind of analogy results in word swaps Kathy <--> Bill and she <--> he that maintain semantic and grammatical integrity.

    (Note: The example just given doesn't really take into account context, `emotion', `attention', `memory', etc. I'll get to that next comment... )

    So, analogies could (should) be useful for language learning. ``Great... now how does the computer figure out the analogies to begin with?,'' you might wonder. Here is an idea for this... first some intro: basically you want to put words into categories (e.g. `hand' would be placed in the category `body part') and ordered pairs of words into relational categories (e.g. (arm, body) would be placed into the category `A is a part of B'). Think of the words as corresponding to vertices in a graph; ordered pairs of words correspond to directed edges; word categories correspond to a vertex coloring; and word-pair relational categories correspond to an edge coloring. We allow vertices and edges to have multiple colors, which corresponds to words and word-pairs appearing in multiple categories.

    Now, one can then read off elementary analogies A:B :: C:D by checking that the edge A --> B has the same color as C --> D in the graph (or one of the colors anyhow... since, as we said, edges can have multiple colors); for more distant analogies, however, one wants to look for pairs of paths with the same colors in sequence, but involving different vertices: e.g., say you have the analogy night:sunny :: day:dark. You can relate these two pairs (night, sunny) and (day, dark) through the `opposites' category and then the `state of being' category; that is, the edge `night --> day' has the same color as `day --> night'... and then `day --> sunny' has the same color as `night --> dark'.

    And now here is how I propose learning the word-pair categories and word categories that ultimately give you the power of analogy-detection -- i.e. the colorings of edges and vertices: starting with your corpus of sentences, scan through all the words that appear in all of them, and put one vertex for each word. And then you simply choose the edge and vertex colors so as to MAXIMIZE the number of analogies you have among all pairs of sentences. Now obviously if you chose all edges and vertices to have the same color you would maximize this count immediately; so you also need additional constraints, e.g. that you use at least X colors and each color has to be used at least Y times... and you should probably also weight analogies according to how distant they are (e.g. length k path analogies might get a score of 1/2^k). It may be possible to optimize here using a dynamic-programming-type method; it doesn't feel to me like an NP-complete problem... which is good.

    to be continued...

  4. part 2:

    Ok, that's all fine and good... but how could you incorporate context and multi-sensory data temporally bound to those sentences? This is where you could use an idea of Minsky (at least I seem to recall it was due to him originally): when a certain emotional state is activated, you can think of it as a situation where certain mental resources are turned on, while others are turned off; and, in particular, associated to each emotional state you can have a DIFFERENT set of word categories and word-pair categories. As the machine reads a sentence, its emotional state is constantly changing (hopefully smoothly), and this gives it a kind of memory and an awareness of context... clearly this will also affect the perception of analogies. In addition to having the emotional state depend on the text being read, as well as its previous emotional state, it could also depend on what sensory data is being received.

  5. That makes complete sense! It sounds like a great book. Thanks for sharing.

  6. Great post. I was checking constantly this blog and I am impressed! Extremely helpful information particularly the ultimate section : ) I take care of such info much. I was looking for this particular info for a very lengthy time. Thanks and good luck.

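
The graph-coloring idea from comment 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EDGE_COLORS` table, the relation names (`opposite`, `state`, `part_of`), and the functions `color_paths` and `analogy_score` are all hypothetical and hand-written here, not taken from the comment or from any library. The sketch covers analogy detection for a given edge coloring, with the suggested 1/2^k weighting for length-k paths; it does not attempt to learn the coloring, and it does not enforce the comment's requirement that the two paths use different vertices.

```python
# Hypothetical hand-labelled edge coloring: (source, target) -> set of relation names.
# In the commenter's proposal these colors would be learned, not written by hand.
EDGE_COLORS = {
    ("night", "day"): {"opposite"},
    ("day", "night"): {"opposite"},
    ("day", "sunny"): {"state"},
    ("night", "dark"): {"state"},
    ("arm", "body"): {"part_of"},
    ("hand", "arm"): {"part_of"},
}


def color_paths(start, end, max_len):
    """Yield the color sequence of every simple path start -> end with at most max_len edges."""

    def walk(node, visited, colors):
        if node == end and colors:
            yield tuple(colors)
            return
        if len(colors) == max_len:
            return
        for (src, dst), edge_colors in EDGE_COLORS.items():
            if src == node and dst not in visited:
                # An edge may carry several colors; each one gives a distinct sequence.
                for color in sorted(edge_colors):
                    yield from walk(dst, visited | {dst}, colors + [color])

    yield from walk(start, {start}, [])


def analogy_score(a, b, c, d, max_len=3):
    """Score the analogy a:b :: c:d as 1/2**k, where k is the length of the
    shortest color sequence shared by a path a -> b and a path c -> d.
    Returns 0.0 when no color sequence is shared."""
    shared = set(color_paths(a, b, max_len)) & set(color_paths(c, d, max_len))
    if not shared:
        return 0.0
    k = min(len(seq) for seq in shared)
    return 1.0 / 2 ** k


if __name__ == "__main__":
    # Distant analogy from the comment: opposite, then state.
    print(analogy_score("night", "sunny", "day", "dark"))
    # Elementary analogy: both edges carry the same single color.
    print(analogy_score("arm", "body", "hand", "arm"))
```

Summing `analogy_score` over candidate word pairs would give one possible version of the objective the comment proposes to maximize, subject to the constraints on the number of colors and how often each is used.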