What is most amazing about the mammalian visual system is its invariance property — that it can recognize objects despite wide variations in their appearance. Learning these invariant representations in the right way is one of the central challenges in vision and neuroscience research.
When I joined the Redwood Neuroscience Institute (RNI) back in 2003, I was still a novice in the neuroscience literature. By then I had implemented a simple version of HMAX/Neocognitron and re-implemented Rajesh Rao’s hierarchical Kalman filter paper, but I still did not know much about other vision research. Having little knowledge can be an advantage sometimes — it enables original thinking. One day I came up with a thought experiment on learning invariant representations.
This thought experiment is given in more detail in my thesis, but I will paraphrase it here. A cat walking towards its bowl of milk has to know that it is the same bowl of milk even though the images on the cat’s retina vary dramatically from instant to instant. Nobody supervised the cat to teach it that all those images are actually the same milk bowl — it had to learn this in an unsupervised fashion. Thinking more about it, I concluded that there was no way the cat could do it without using temporal proximity.
It turned out that using time as a supervisor is actually not a new idea. Slow Feature Analysis (SFA) is one method for implementing it. The SFA paper from Laurenz Wiskott showed how a feature hierarchy can be constructed using unsupervised learning driven by temporal proximity. Geoff Hinton had proposed this idea quite a long time ago. In fact, before slow feature analysis came along, there was VisNet from Edmund Rolls’s lab, which implemented this idea in a hierarchy. Even before that, Foldiak showed that the idea could work in principle in simple systems. Later I found that it can be traced back even to early philosophers. So much for my original idea!
The SFA paper showed that one can use temporal slowness to extract invariant representations. As you go up the hierarchy, you can obtain responses that remain invariant to transformations of an object while maintaining selectivity between objects. However, the technique used in SFA is not scalable. It relies on specifying a fixed set of non-linear expansions of the input space and then applying a linear technique to find the slowly varying features. The expansion space that you specify need not contain the invariant functions you are looking for, and if you try to enlarge the repertoire of candidate functions, you run into the curse of dimensionality.
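To make the recipe concrete, here is a minimal sketch of that two-step procedure — a fixed quadratic expansion followed by a linear search for the slowest-varying unit-variance projection — written with NumPy. The toy signal is the classic demonstration from the SFA literature, where the slow signal sin(t) is only recoverable through a quadratic combination of the inputs; the function names and signal are my own illustration, not code from the paper.

```python
import numpy as np

def quadratic_expand(x):
    """Fixed non-linear expansion: append all degree-2 monomials of the inputs."""
    T, d = x.shape
    quads = [x[:, i] * x[:, j] for i in range(d) for j in range(i, d)]
    return np.column_stack([x] + quads)

def slow_features(x, n_features=1):
    """Linear SFA: find the directions of minimal temporal variation
    among all unit-variance projections of the (expanded) signal."""
    x = x - x.mean(axis=0)
    # Whiten so that every projection has unit variance.
    eigval, eigvec = np.linalg.eigh(np.cov(x, rowvar=False))
    keep = eigval > 1e-8 * eigval.max()          # drop near-degenerate directions
    whiten = eigvec[:, keep] / np.sqrt(eigval[keep])
    z = x @ whiten
    # Among whitened directions, the slowest ones minimize the variance
    # of the temporal difference signal.
    dval, dvec = np.linalg.eigh(np.cov(np.diff(z, axis=0), rowvar=False))
    return z @ dvec[:, :n_features]              # eigh sorts ascending: slowest first

# Toy demo: sin(t) is hidden in a quadratic mixture, since x1 - x2**2 = sin(t).
t = np.linspace(0, 2 * np.pi, 5000)
x1 = np.sin(t) + np.cos(11 * t) ** 2
x2 = np.cos(11 * t)
slow = slow_features(quadratic_expand(np.column_stack([x1, x2])))
```

Note how the non-linear part is entirely hand-specified: if sin(t) were hidden in, say, a cubic mixture, this quadratic expansion could never recover it, and expanding to higher degrees blows up the number of monomials — exactly the scalability problem described above.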