I am an invited speaker at an NSF-sponsored special symposium, From Brains to Machines, to be held in San Jose, CA. The symposium is part of IJCNN 2011, the flagship conference of the International Neural Network Society. The official title of my talk is *How to work towards a mathematical understanding of the brain*, but that is really just the most academic-sounding title I could come up with. What I actually want to talk about is *How to build a brain without solving it*. I hope I get to show some of our recent work at Vicarious Systems.

You can download the full paper here.

I have been away and inactive on the web for a while, and I wanted to give an update about my whereabouts and future plans.

**Vicarious Systems**

The first update is that I started a new company – Vicarious Systems. Some of you know this already from my website. My cofounder and I raised a venture round, and we have been heads-down working since then. I am very excited about the new brain-inspired A.I. algorithms we are building. We will provide more details when we formally launch the company sometime in early 2011. You can sign up on our website, www.vicariousinc.com, to be notified when we launch.

This also means that I am no longer at Numenta. I still remain very interested in Numenta’s success and wish them all the best.

**Lab visits**

I have also been active giving talks at labs and universities. I visited the Los Alamos National Laboratory in New Mexico and had very interesting conversations with Garrett Kenyon, Louis Bettencourt, and others. I gave a talk at the Redwood Center on the connection between retrograde signaling in neurons and sequence learning. In November I took off for India and visited the Indian Institute of Science in Bangalore. My recent talks all revolved around the theme of ‘how to reverse engineer the brain’, with specifics on combining insights from machine learning and engineering with neuroscience. One of the talks was titled “How to build a brain without solving it”. I have been consolidating my thoughts on this and will be writing more about it in the future.

**Blog and website moving**

One of the reasons I haven’t been posting frequently is the amount of comment spam my blog was attracting. So, instead of running my own WordPress and wiki installations, I decided to move my blog and website to Posterous. The new website will be centered on the blog, which will be directly linked to www.dileepgeorge.com when the conversion is complete. My next few posts will go to both my WordPress installation and my Posterous site. (Thanks to Posterous for making this very easy.) I will move over to Posterous completely by mid-February. Please update your feed address if you are accessing my blog via RSS.

Wish you all a Happy New Year.

If we live in a world where, whenever we see a cat, we see a dog right after, we will come to say that cats and dogs are the same thing! I am sure that Nuo’s and Jim’s experiments have left a lot of monkeys similarly confused.

**Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal Cortex**, Nuo Li and James J. DiCarlo, McGovern Institute for Brain Research, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology

Let us say we have a video camera whose output is fed to a very smart algorithm that is trying to learn spatial concepts from passively observing the video stream. We want the algorithm to learn concepts like ‘translating to the right’. To teach the algorithm, we show it videos with different objects undergoing translations. Let us consider an abstraction of this setting to see why inductive learning of a concept like translation can be very expensive in general.

Suppose we are given vector pairs (x1, x1’), (x2, x2’), (x3, x3’), and so on. We are told that the two points in each pair are related by the same transformation *f*. That is, *x1’ = f(x1)*, *x2’ = f(x2)*, etc. However, we do not know what the transformation *f* is. The learning problem is to figure out the unknown transformation *f* from the observed vector pairs. If the algorithm makes no assumptions about the nature of this unknown function, then it is working in an infinitely large hypothesis space with no structured way of exploring it, and learning *f* will not be possible in a reasonable amount of time.
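To make the contrast concrete, here is a minimal sketch (my own illustration, with made-up data) of how a restricted hypothesis space collapses the problem: if we assume *f* is a pure translation, *f(x) = x + t*, then “learning” *f* from the observed pairs reduces to averaging the differences *x’ − x*.

```python
import numpy as np

rng = np.random.default_rng(0)
t_true = np.array([3.0, -1.0])        # the unknown translation

# Observed pairs (x_i, x_i') with x_i' = f(x_i) = x_i + t_true
xs = rng.normal(size=(100, 2))
xs_prime = xs + t_true

# Under the assumption "f is a translation", learning f is trivial:
t_hat = (xs_prime - xs).mean(axis=0)

def f_learned(x):
    return x + t_hat
```

Without the translation assumption, every mapping consistent with the observed pairs remains a candidate, and there are infinitely many of them; with it, a handful of pairs pins *f* down exactly.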

So, at the very least, the algorithm has to make assumptions about the space of transformations. The motor system could supply those assumptions to the perceptual learning algorithm, restricting its hypothesis space. However, I think the motor system plays a more important role in perceptual learning than just providing a set of assumptions. In my next post on this subject I will describe some deep insights Henri Poincaré had on the role of actions in perception.

Consider an organism named *couchy*. Couchy lives anchored to a rock — it does not move. It has a visual sensor to sense the environment, but it has no self-generated motion. Nor does it have any facility to make actions that are directed in space (for example, shooting a barb to the left or right). Couchy senses the environment through its visual sensor. Couchy has formed invariant representations for objects in the world through the temporal proximity mechanism we discussed earlier. When dangerous objects are recognized, couchy emits a sharp sound that propagates in an isotropic manner.

Couchy is like a couch potato, watching the world through a television screen, never making a move and not even moving its eyes. Hence the name.

Now the question: If all couchy does is observe videos of objects moving in the world without ever making a self-generated spatially directed action, will it develop a sense of space? That is, will couchy have an idea of left and right, up and down? Will couchy develop a sense of geometry in its lifetime?

We can replace couchy’s brain with any learning algorithm and ask the same question. Will the learning algorithm develop a sense of geometry if it observes the world passively through videos? Couchy has only a lifetime to learn this, so the question really is whether an algorithm can learn a sense of geometry in this setting in a reasonable amount of time. Anything that requires exponential time is clearly of no interest.

The role of actions in perception/cognition/language is a very important topic and I will be writing more about it. This thought experiment is a good start.

When I joined the Redwood Neuroscience Institute (RNI) back in 2003, I was still a novice in the neuroscience literature. I had implemented a simple version of HMAX/Neocognitron and re-implemented Rajesh Rao’s hierarchical Kalman filter paper by then, but I still did not know enough about other vision research. Having no knowledge can be an advantage sometimes — it enables original thinking. One day I came up with a thought experiment on learning invariant representations.

This thought experiment is given in more detail in my thesis, but I will paraphrase it here. A cat walking towards its bowl of milk has to know that it is the same bowl of milk, even though the images on the cat’s retina vary dramatically from instant to instant. Nobody supervised the cat to teach it that all those images are actually the same milk bowl — it had to learn this in an unsupervised fashion. Thinking more about it, I concluded that there is no way the cat could do it without using temporal proximity.

It turned out that using time as a supervisor is actually not a new idea. Slow Feature Analysis (SFA) is one method for implementing it. This paper from Laurenz Wiskott showed how a feature hierarchy can be constructed using unsupervised learning driven by temporal proximity. Geoff Hinton proposed the idea quite a long time ago. In fact, before slow feature analysis came along, there was VisNet from Edmund Rolls’ lab, which implemented this idea in a hierarchy. Even before that, Foldiak showed that the idea could work in principle in simple systems. Later I found that it can be traced all the way back to early philosophers. So much for my original idea!

The SFA paper showed that one can use temporal slowness to extract invariant representations. As you go up the hierarchy, you can have responses that remain invariant while maintaining selectivity between objects. However, the technique used in SFA is not scalable. It relies on specifying a set of non-linear function expansions of the input space and then applying a linear technique to find the slowly varying features. The non-linear expansion space that you specify need not contain the invariant functions you are looking for, and if you try to increase the repertoire of invariant functions, you run into the curse of dimensionality.
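To make the slowness objective concrete, here is a minimal linear-SFA sketch (my own toy implementation, not Wiskott’s code): whiten the signal so every projection has unit variance, then pick the directions whose temporal derivative has the smallest variance. The fixed quadratic expansion at the end is the step whose size blows up with input dimension.

```python
import numpy as np

def linear_sfa(x, n_components=1):
    """Linear Slow Feature Analysis.

    x: (T, d) time series. Returns the n_components slowest features:
    unit-variance linear projections whose temporal derivative has
    minimal variance.
    """
    x = x - x.mean(axis=0)
    # Whiten so every projected direction has unit variance
    cov = np.cov(x, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    whiten = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
    z = x @ whiten
    # Among whitened directions, find those that change slowest
    dcov = np.cov(np.diff(z, axis=0), rowvar=False)
    dvals, dvecs = np.linalg.eigh(dcov)   # ascending: slowest first
    return z @ dvecs[:, :n_components]

def quadratic_expand(x):
    """The fixed nonlinear expansion SFA relies on: raw inputs plus all
    pairwise products. Its width grows quadratically with dimension,
    which is where the curse of dimensionality bites."""
    T, d = x.shape
    quads = [x[:, i] * x[:, j] for i in range(d) for j in range(i, d)]
    return np.concatenate([x, np.stack(quads, axis=1)], axis=1)
```

On a linear mixture of a slow and a fast sinusoid, `linear_sfa` recovers the slow source up to sign and scale; the hypothesis-space problem appears only once the invariance you want is outside the span of the chosen expansion.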

I couldn’t help smiling when I read the quip about new entrants to the field having high confidence-to-knowledge ratios. Having entered neuroscience from an electrical engineering background, I am sure that I fell on the high side of the confidence-to-knowledge ratio. This is the way it should be. Why would anyone enter a field if they already knew everything about it, or if they were not confident about making a difference in it? As Larry muses in the paper, the important thing is that the ratio gets adjusted through an increase in knowledge and not a decrease in confidence.

The very next paragraph in the paper discusses the difference between “word-models” and mathematical models. This is a very instructive paragraph, so I will quote it below.

“What has a theoretical component brought to the field of neuroscience? Neuroscience has always had models (how would it be possible to contemplate experimental results in such complex systems without a model in one’s head?), but prior to the invasion of the theorists, these were often word models. There are several advantages of expressing a model in equations rather than words. Equations force a model to be precise, complete, and self-consistent, and they allow its full implications to be worked out. It is not difficult to find word models in the conclusions sections of older neuroscience papers that sound reasonable but, when expressed as mathematical models, turn out to be inconsistent and unworkable. Mathematical formulation of a model forces it to be self-consistent and, although self-consistency is not necessarily truth, self-inconsistency is certainly falsehood.”

I have direct experience with this. Many of my models started with a description of the kind: *neuron A fires, followed by neuron B, which strengthens the connection between them, and then the winner uses soft inhibition to suppress the neurons around it.* Although such descriptions seemed sensible to begin with, I was never able to get these models to work in real life. I know that there are many scientists who have a knack for making such models work, but I never had any luck with them. I was always able to get *interesting* behavior out of those networks, but seldom the *desired* behavior. That is when I started paying attention to the equivalence between many complicated network models and simpler mathematical formulations.

For example, I had once built a temporal learning network model with learning rules that I thought were very ingenious. Later we understood that the model was equivalent to a special case of a Hidden Markov Model. Although the special case was interesting in itself, our understanding improved greatly and the learning rules became much simpler once we realized the mapping between the network model and this mathematical model. This example taught me that it is not enough to just specify the learning rules in a mathematical form — it is important to understand, in mathematical form, the actual computation being performed.
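To illustrate what “the actual computation in mathematical form” looks like for a Hidden Markov Model, here is the standard forward recursion (a generic textbook sketch with made-up numbers, not the model from that work): once you know your network is computing this, the learning rules can target these quantities directly.

```python
import numpy as np

# Toy HMM: 2 hidden states, 2 observation symbols (illustrative numbers)
A = np.array([[0.9, 0.1],      # state transition probabilities
              [0.2, 0.8]])
B = np.array([[0.7, 0.3],      # observation probabilities per state
              [0.1, 0.9]])
pi = np.array([0.5, 0.5])      # initial state distribution

def forward(obs):
    """Forward algorithm: P(observation sequence) under the HMM."""
    alpha = pi * B[:, obs[0]]            # joint of state and first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # propagate, then condition
    return alpha.sum()
```

The recurrent propagate-and-condition step is exactly the kind of computation a layer of recurrently connected neurons can implement, which is what made the network-to-HMM mapping clarifying.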

It is a fairly high-level paper — the models in it are not concrete enough to be implemented or validated. That doesn’t diminish the paper’s importance, though. However, because of its high-level, all-encompassing nature, I sometimes have a tough time answering questions like “How does your model differ from the model Lee and Mumford presented?”. It is tough to differ from the models in this paper. You can only add details (very important ones) that are mostly consistent with it.

A notable omission is that the paper doesn’t grapple with the question of invariant representations, an important property of the visual cortex. The word “invariance” appears only once in the paper. One of my early papers tried to bring the idea of invariant representations into this framework.

**Hierarchical Bayesian inference in the visual cortex**

Tai Sing Lee and David Mumford

Traditional views of visual processing suggest that early visual neurons in areas V1 and V2 are static spatiotemporal filters that extract local features from a visual scene. The extracted information is then channeled through a feedforward chain of modules in successively higher visual areas for further analysis. Recent electrophysiological recordings from early visual neurons in awake behaving monkeys reveal that there are many levels of complexity in the information processing of the early visual cortex, as seen in the long-latency responses of its neurons. These new findings suggest that activity in the early visual cortex is tightly coupled and highly interactive with the rest of the visual system. They lead us to propose a new theoretical setting based on the mathematical framework of hierarchical Bayesian inference for reasoning about the visual system. In this framework, the recurrent feedforward/feedback loops in the cortex serve to integrate top-down contextual priors and bottom-up observations so as to implement concurrent probabilistic inference along the visual hierarchy. We suggest that the algorithms of particle filtering and Bayesian belief propagation might model these interactive cortical computations. We review some recent neurophysiological evidence that supports the plausibility of these ideas.