The Multiverse According to Ben: What does Google’s tensorflow mean for AI?

Google’s release of their tensorflow machine learning library has attracted a lot of attention recently. Like everyone else in the field I’ve felt moved to take a look.

(Microsoft's recent release of an open source distributed machine learning toolkit is also interesting. But that would be another story; here I'll restrict myself to tensorflow...)

tensorflow as a Deep Machine Learning Toolkit

Folks familiar with tools for deep learning based machine vision will quickly see that the tensorflow neural net library is fairly similar in concept to the Theano/pylearn2 library from Yoshua Bengio’s team at U. Montreal. Its functionality is similar to Theano/pylearn2 and also to other modern deep ML toolkits like Caffe. However, it looks like it may combine the strengths of the different existing toolkits in a novel way: an elegant, simple-to-use architecture like Theano/pylearn2, combined with rapid execution like one gets with Caffe.

Tensorflow is an infrastructure and toolkit, intended so that one can build and run specific deep learning algorithms within it. The specific algorithms released with the toolkit initially are well-known and fairly limited. For instance, they give a 2D convolutional neural net but not a 3D one (though Facebook open-sourced a 3D CNN not long ago).

The currently released version of tensorflow runs on one machine only (though making efficient use of multiple processors). But it seems they may release a distributed version some time fairly soon.

tensorflow as a Dataflow Framework

As well as a toolkit for implementing distributed deep learning algorithms, tensorflow is also — underneath — a fairly general framework for “dataflow”, for passing knowledge around among graphs. However, looked at as a dataflow architecture it has some fairly strict limitations, which emerge directly from its purpose as an infrastructure for current deep learning neural net algorithms.

For one thing, tensorflow seems optimized for passing around pretty large chunks of data. So if one wanted to use it to spread activation around in a network, one wouldn't make an Operation per neuron; rather, one would make an "activation-spreading" Operation and have it act on a connection matrix or similar....
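To make the "one Operation over a connection matrix" idea concrete, here is a minimal sketch in plain NumPy rather than tensorflow itself. The function name `spread_activation`, the `retention` parameter, and the particular update rule are all assumptions made for illustration; they are not taken from tensorflow or OpenCog.

```python
import numpy as np

def spread_activation(weights, activation, retention=0.5):
    """Apply one activation-spreading step to every node at once.

    weights[i, j] is the (assumed) connection strength from node j to node i.
    Each node keeps a fraction of its own activation and adds what flows in.
    """
    incoming = weights @ activation  # one matrix-vector product, no per-neuron loop
    return np.clip(retention * activation + incoming, 0.0, 1.0)

# Three nodes in a chain: 0 -> 1 -> 2, each link with weight 0.5.
weights = np.array([
    [0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.5, 0.0],
])
activation = np.array([1.0, 0.0, 0.0])

step1 = spread_activation(weights, activation)
step2 = spread_activation(weights, step1)
print(step1, step2)
```

The whole network is updated by a single operation on one large array, which is the granularity a tensor-passing framework seems to favor.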

Furthermore, tensorflow’s execution model seems to be fundamentally *synchronous*. Even when run across multiple machines in distributed mode using Senders and Receivers, the basic mathematical operation of the network is synchronous. This is fine for most current deep learning algorithms, which are constructed of nodes that are assumed to pass information around among each other in a specific and synchronized way. The control mechanisms tensorflow provides (e.g. for and while constructs) are flowchart-like rather than adaptive-network-like, and remain within the synchronized execution paradigm, so far as I can tell.

This is a marked contrast to ROS, which my team at OpenCog and Hanson Robotics is currently using for robotics work. In ROS one wraps up different functions in ROS nodes, which interact with each other autonomously and asynchronously. It’s also a contrast to the BriCA framework for AGI and brain emulation produced recently by the Japanese Whole Brain Architecture Initiative. BriCA’s nodes pass around vectors rather than tensors, but since a tensor is basically a multidimensional stack of vectors, this amounts to the same thing. BriCA’s nodes interact asynchronously via a simple but elegant mechanism. This reflects the fact that BriCA was engineered as a framework for neural net based AGI, whereas tensorflow was engineered as a framework for a valuable but relatively narrow class of deep learning based data processing algorithms.

That is: conceptually, it seems that tensorflow is made for executing precisely-orchestrated multi-node algorithms (potentially in a distributed way), in which interaction among nodes happens in a specifically synchronized and predetermined way based on a particular architecture; whereas BriCA can also be applied to more open-ended designs in which different nodes (components) react to each other's outputs on the fly, and not everything happens within an overall architecture in which the dynamic relations between the behaviors of the components are thought out in advance. Philosophically this relates to the more "open-ended" nature of AGI systems.
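The contrast can be sketched in a few lines of Python. This is only an illustration of the two styles, not the API of tensorflow, ROS, or BriCA; every name here (`run_synchronous`, `node`, `run_asynchronous`) is hypothetical. The first function runs its stages once in a fixed order; the second sets up independent nodes that each wake up whenever a message happens to arrive.

```python
import asyncio

def run_synchronous(x):
    """Fixed pipeline: every stage runs once, in a predetermined order."""
    doubled = x * 2
    return doubled + 1

async def node(name, inbox, outbox, fn, log):
    """Autonomous node: reacts to each message as it arrives."""
    while True:
        msg = await inbox.get()
        if msg is None:  # sentinel: shut down and tell the next node to stop
            if outbox is not None:
                await outbox.put(None)
            return
        result = fn(msg)
        log.append((name, result))
        if outbox is not None:
            await outbox.put(result)

async def run_asynchronous(inputs):
    """Wire two nodes together with queues and let them run concurrently."""
    a_in, b_in = asyncio.Queue(), asyncio.Queue()
    log = []
    tasks = [
        asyncio.create_task(node("double", a_in, b_in, lambda v: v * 2, log)),
        asyncio.create_task(node("plus_one", b_in, None, lambda v: v + 1, log)),
    ]
    for v in inputs:
        await a_in.put(v)
    await a_in.put(None)
    await asyncio.gather(*tasks)
    return [result for name, result in log if name == "plus_one"]

print(run_synchronous(3))
print(asyncio.run(run_asynchronous([1, 2, 3])))
```

Both versions compute the same function here; the difference is that in the second, no global schedule says when each node fires, so new nodes could be attached to the queues without redesigning an overall flowchart.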

tensorflow and OpenCog?

My current view on the currently popular deep learning architectures for data processing (whose implementation and tweaking and application tensorflow is intended to ease) is that they are strong for perceptual pattern recognition, but do not constitute general-purpose cognitive architectures for general intelligence.

Contrasting tensorflow and OpenCog (which is worse by far than contrasting apples and oranges, but so be it…), one observation we can make is that an OpenCog Atom is a persistent store of information, whereas a TensorFlow graph is a collection of Operations (each translating input into output). So, on the face of it, TensorFlow is best for (certain sorts of) procedural knowledge, whereas Atomspace is best for declarative knowledge.... It seems the "declarative knowledge" in a TensorFlow graph is pretty much contained in the numerical tensors that the Operations pass around...

In OpenCog’s MOSES component, small LISP-like programs called “Combo trees” are used to represent certain sorts of procedural knowledge; these are then mapped into the Atomspace for declarative analysis. But deep learning neural nets are most suitable for representing different sorts of procedural knowledge than Combo trees — e.g. procedural knowledge used for low-level perception and action. (The distinction between procedural and sensorimotor knowledge blurs a bit here, but that would be a topic for another blog post….)

I had been thinking about integrating deep learning based perception into OpenCog using Theano / pylearn2 as an underlying engine — making OpenCog Atoms that executed small neural networks on GPU, and using the OpenCog Atomspace to glue together these small neural networks (via the Atoms that refer to them) into an overall architecture. See particulars here and here.

Now I am wondering whether we should do this using tensorflow instead, or as well….

In terms of OpenCog/tensorflow integration, the most straightforward thing would be to implement

TensorNode ... with subtypes as appropriate
GroundedSchemaNodes that wrap up TensorFlow "Operations"

This would allow us to basically embed TensorFlow graphs inside the Atomspace...

Deep learning operations like convolution are represented as opaque operations in tensorflow, and would also be opaque operations (wrapped inside GSNs) in OpenCog....

The purported advantage over Theano would be that TensorFlow is supposed to be faster (we'll test), whereas Theano has an elegant interface but is slower than Caffe ...

Wrapping Operations inside GSN would add a level of indirection/inefficiency, but if the Operations are expensive things like running convolutions on images or multiplying big matrices, this doesn't matter much...
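As a rough picture of what such wrapping might look like, here is a small self-contained Python sketch. It does not use the real OpenCog or TensorFlow APIs: the `TensorNode` and `GroundedSchemaNode` classes below are hypothetical stand-ins that borrow the names from the list above, and plain Python lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TensorNode:
    """Hypothetical Atom holding a tensor (here just a flat list of floats)."""
    name: str
    value: List[float]

@dataclass
class GroundedSchemaNode:
    """Hypothetical Atom wrapping an opaque operation on tensors."""
    name: str
    operation: Callable[..., List[float]]

    def execute(self, *args: TensorNode) -> TensorNode:
        # The extra hop: unwrap the Atoms, call the operation, re-wrap the result.
        result = self.operation(*(arg.value for arg in args))
        return TensorNode(f"{self.name}-out", result)

def elementwise_add(a, b):
    """Stand-in for an expensive opaque operation such as a convolution."""
    return [x + y for x, y in zip(a, b)]

add_gsn = GroundedSchemaNode("add", elementwise_add)
out = add_gsn.execute(TensorNode("a", [1.0, 2.0]), TensorNode("b", [3.0, 4.0]))
print(out)
```

The indirection is just the unwrap and re-wrap around each call, which is a fixed overhead; it becomes negligible when the wrapped operation itself does a lot of work.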

Anyway, we will evaluate and see what makes sense! …

Rambling Reflections on the Open-Source Ecosystem

The AI / proto-AGI landscape is certainly becoming interesting and complex these days. It seems that AI went in just a few years from being obscure and marginalized (outside of science fiction) to being big-time corporate. Which is exciting in terms of the R&D progress it will likely lead to, yet frustrating to those of us who aren’t thrilled with the domination of the world socioeconomy by megacorporations.

But then we also see a major trend of big companies sharing significant aspects of their AI code with the world at large via open-source releases like Facebook’s conv3D code and Google’s tensorflow, and so many others. They are doing this for multiple reasons: one is that it keeps their research staff happy (most researchers want to feel they’re contributing to the scientific community at large rather than just to one company); and another is that other researchers, learning from and improving on the code they have released, will create new innovations they can use. The interplay between the free-and-open R&D world and the corporate-and-proprietary R&D world becomes subtler and subtler.

Supposing we integrate tensorflow into OpenCog and it yields interesting results… Google could then choose to use OpenCog themselves and integrate it into their own systems. Hopefully if they did so, they would push some of their OpenCog improvements into the open-source ecosystem as well. Precisely where this sort of thing will lead business-wise is not entirely clear, given the shifting nature of current tech business models, but it’s already clear that companies like Google don’t derive the bulk of their business advantage from proprietary algorithms or code, but rather from the social dynamics associated with their products and their brand.

If open-source AI code were somehow coupled with a shift in the dynamics of online interaction, to something more peer-to-peer and less big-media and big-company and advertising dominated, THEN we would have a more dramatic shift, with interesting implications for everybody’s business model. But that’s another topic that would lead us far afield from tensorflow. For the time being, it seems that the open-source ecosystem is playing a fairly core role in the complex unfolding of AI algorithms, architectures and applications among various intellectual/socioeconomic actors … and funky stuff like tensorflow is emerging as a result.

Comments:

Erf said...: Hi,
The link to BrICA isn't correct.; 7:04 AM
Bill Lauritzen said...: Thanks for this summary, Ben.; 12:26 AM