Friday, May 01, 2009

Infomax


From "Modeling the Mind: From Circuits to Systems" by Suzanna Becker, section 1.2, "Sensory Coding":
"Several classes of computational models have been influential in guiding current thinking about self-organization in sensory systems. These models share the general feature of modeling the brain as a communication channel and applying concepts from information theory. The underlying assumption of these models is that the goal of sensory coding is to map the high-dimensional sensory signal into another (usually lower-dimensional) code that is somehow optimal with respect to information content. Four information-theoretic coding principles will be considered here: 1) Linsker's Infomax principle, 2) Barlow's redundancy reduction principle, 3) Becker and Hinton's Imax principle, and 4) Rissanen's minimum description length (MDL) principle. Each of these principles has been used to derive models of learning and has inspired further research into related models at multiple stages of information processing.
...
The Infomax principle has been highly influential in the study of neural coding, going well beyond Linsker's pioneering work in the linear case. One of the major developments in this field is Bell and Sejnowski's Infomax-based Independent Component Analysis (ICA) algorithm, which applies to nonlinear mappings with equal numbers of inputs and outputs (Bell and Sejnowski, 1995).
...
The principle of preserving information may be a good description of the very earliest stages of sensory coding, but it is unlikely that this one principle will capture all levels of processing in the brain. Clearly, one can trivially preserve all the information in the input simply by copying the input to the next level up. Thus, the idea only makes sense in the context of additional processing constraints. Implicit in Linsker's work was the constraint of dimension reduction. However, in the neocortex, there is no evidence of a progressive reduction in the number of neurons at successively higher levels of processing."
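Becker mentions Bell and Sejnowski's Infomax-based ICA. The following is a minimal pure-Python sketch of the idea for two sources, assuming Laplacian (super-Gaussian) sources, a hypothetical 2x2 mixing matrix `A`, and the natural-gradient form of the logistic Infomax update, `dW = lr * (I + E[(1 - 2y) u^T]) W`. That form is a later refinement (due to Amari and colleagues) that avoids the matrix inverse in Bell and Sejnowski's original rule; this is an illustration, not their code. The helper names `mat_mul` and `infomax_ica` are hypothetical.

```python
import math
import random

def mat_mul(a, b):
    """Multiply two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def infomax_ica(x, lr=0.1, iters=400):
    """Learn a 2x2 unmixing matrix W with the logistic Infomax rule in
    natural-gradient form: dW = lr * (I + E[(1 - 2y) u^T]) W,
    where u = W x and y = 1 / (1 + exp(-u)) elementwise."""
    n = len(x)
    w = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(iters):
        g = [[0.0, 0.0], [0.0, 0.0]]  # running sum of (1 - 2y) u^T
        for x0, x1 in x:
            u = (w[0][0] * x0 + w[0][1] * x1, w[1][0] * x0 + w[1][1] * x1)
            # 1 - 2 * sigmoid(u) == -tanh(u / 2), which avoids exp overflow.
            phi = (-math.tanh(u[0] / 2.0), -math.tanh(u[1] / 2.0))
            for i in range(2):
                for j in range(2):
                    g[i][j] += phi[i] * u[j]
        m = [[(1.0 if i == j else 0.0) + g[i][j] / n for j in range(2)]
             for i in range(2)]
        step = mat_mul(m, w)
        w = [[w[i][j] + lr * step[i][j] for j in range(2)] for i in range(2)]
    return w

rng = random.Random(0)
# Two independent, super-Gaussian (Laplacian) sources.
sources = [tuple(rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(2))
           for _ in range(1000)]
A = [[1.0, 0.6], [0.4, 1.0]]  # hypothetical mixing matrix
mixed = [(A[0][0] * s0 + A[0][1] * s1, A[1][0] * s0 + A[1][1] * s1)
         for s0, s1 in sources]
# Center the mixtures (the rule assumes zero-mean signals).
mean0 = sum(p[0] for p in mixed) / len(mixed)
mean1 = sum(p[1] for p in mixed) / len(mixed)
mixed = [(p0 - mean0, p1 - mean1) for p0, p1 in mixed]

W = infomax_ica(mixed)
P = mat_mul(W, A)  # close to a scaled permutation matrix if separation worked
```

If separation succeeds, each row of `P` has one dominant entry: the outputs recover the sources up to scale and order, which is the usual ICA ambiguity. Note that this setting has as many outputs as inputs, so it involves no dimension reduction.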




Transcript of presentation by Ralph Linsker (IBM T. J. Watson Research Center)
(video, slides)
Linsker's presentation slot starts at the 50-minute mark in the video (approx. 40% of the way through). (Linsker is an excellent speaker, but his slides leave much to be desired!)


Slide 1. The search for organizing principles of brain function.

"My working belief is that one needs multiple organizing principles at multiple levels of the brain, ranging from the synapse up to hierarchies of areas within the neocortex and different areas apart from the neocortex. And my working belief is that the number of such high-level organizing principles one might need is more than one but less than ten. I'm going to talk about a couple of aspects of what these potential organizing principles might be, my special interest being at the level between the cell and cortical maps."
...

Slide 2. Self-organization
It's striking to me sometimes how long it takes certain ideas to be put together, to be combined from different disciplines.
Turing had a wonderful paper, not the one for which he's most famous, but a seminal paper on morphogenesis in biology, from 1952.
Hebb's idea that you've heard about dates from 1949.
An early puzzle in neuroscience arose from the work of Hubel and Wiesel, experimental work starting in 1960, which showed that in cats, and later in monkeys, one finds a layer of cells in which each cell responds selectively, preferentially, to a local edge at some particular orientation, and that as you move across the cortex, you find a pattern of the different preferred orientations.
[see image at the top of this post] The puzzle was, "How does this come about?" They even found in monkeys that this pattern is present at birth, so in that case it does not develop as a result of exposure to structured visual stimuli.

What I found, in my introduction to this area of self-organization in neural systems, was that if you combine the Hebb rule with short connections (known to exist in the retina) and simply some random electrical activity that is at least locally correlated (so if one retinal ganglion cell is active, or seeing a bright spot, it's likely that a neighbouring cell is seeing a bright spot as well, a portion of the same patch), those ingredients alone can lead to orientation-selective cells and also to their patterning within a cortical layer. By the way, that locally correlated prenatal electrical activity was not known to exist at the time, but it was found experimentally a few years later.
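The "locally correlated random activity" ingredient can be made concrete with a small simulation. This is a sketch under stated assumptions, not Linsker's input model: a one-dimensional ring of hypothetical "retinal" cells whose activity is white noise blurred by a Gaussian kernel of width `sigma` (in cells). The helper names `correlated_activity` and `correlation_at_distance` are hypothetical. Nearby cells come out strongly correlated, and cells far apart are essentially uncorrelated.

```python
import math
import random

def correlated_activity(n_cells, sigma, rng):
    """One frame of random activity on a ring of n_cells units: white noise
    blurred by a Gaussian kernel of width sigma, so nearby cells are
    correlated and distant cells are not."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_cells)]
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-d * d / (2 * sigma * sigma)) for d in offsets]
    return [sum(k * noise[(i + d) % n_cells] for k, d in zip(kernel, offsets))
            for i in range(n_cells)]

def correlation_at_distance(frames, d):
    """Pearson correlation between cells d apart, pooled over positions and frames."""
    n = len(frames[0])
    xs = [f[i] for f in frames for i in range(n)]
    ys = [f[(i + d) % n] for f in frames for i in range(n)]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

rng = random.Random(1)
frames = [correlated_activity(60, 2.0, rng) for _ in range(400)]
for d in (1, 3, 10):
    print(d, round(correlation_at_distance(frames, d), 2))
```

For this blur the expected correlation falls off roughly as `exp(-d**2 / (4 * sigma**2))`, so the printed values should decrease steadily with distance `d`.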

Slide 3: Self-organization in cortical models
What you see [in the color image at the top of this post] is a pattern that I generated that, at the time, troubled me because it looked more complex than Hubel and Wiesel's pattern of orientation domains. The color coding reflects the preferred orientation of the cell, represented here by one of 5 colors. At the time, Hubel and Wiesel's patterns only reflected a coarser-grained resolution (are you closer to a horizontal preference or closer to a vertical preference, for example), and those patterns looked rather like fingerprints: meandering stripes.

When I came up with this I was troubled at first, but it turned out that experimental work by Blasdel and Salama, published after this came out, revealed that the complexity of this pattern is, in fact, what's found in cortex, including the singularities where most or all of the colors meet at a point, which are now commonly called "pinwheels". Now, that's a static view of the end of the optimization process using my model. Here's a movie, thanks to Sirotia et al., that uses a more elaborate version of this model for self-organization, based on the same principles.
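The pinwheel geometry can be illustrated with a toy orientation map. This is an idealized map with a single singularity at the origin, not the output of Linsker's model and not the experimental data: the preferred orientation is taken to be half the polar angle, so it rotates through 180 degrees on one circuit around the center. Orientations are quantized into five color bins, echoing the five-color coding described above. The names `orientation_deg`, `color_bin`, and `colors_on_circle` are hypothetical.

```python
import math

N_COLORS = 5  # five orientation bins, as in the five-color coding

def orientation_deg(x, y):
    """Preferred orientation (0 to 180 deg) of a toy map with one ideal
    pinwheel at the origin: half the polar angle of the point (x, y)."""
    return (math.degrees(math.atan2(y, x)) / 2.0) % 180.0

def color_bin(theta):
    """Quantize an orientation into one of N_COLORS equal-width bins.
    The final % guards against float rounding producing exactly 180.0."""
    return int(theta // (180.0 / N_COLORS)) % N_COLORS

def colors_on_circle(cx, cy, radius, samples=360):
    """Set of color bins met on a small circle of the given radius around (cx, cy)."""
    points = ((cx + radius * math.cos(2 * math.pi * k / samples),
               cy + radius * math.sin(2 * math.pi * k / samples))
              for k in range(samples))
    return {color_bin(orientation_deg(x, y)) for x, y in points}

print(len(colors_on_circle(0.0, 0.0, 0.01)))  # circle around the singularity
print(len(colors_on_circle(0.0, 10.0, 0.5)))  # circle away from it
```

However small the circle around the singularity, it passes through every color; a circle of the same kind placed away from it stays within a single color. That is the sense in which all the colors "meet at a point".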

(Movie)

Starting from random orientation preferences (or lack of preference), you evolve, using simple Hebbian-type learning rules, to this kind of resulting pattern.

Slide 4: Some higher-level properties that can result from Hebbian Learning
Now, that's well and good. What are some higher-level properties that can result from Hebbian learning? To put it another way, if the Hebb rule for synapses is a good algorithm, what is it good for? What computational tasks is it good for? I'll illustrate with a couple of analogies that are really much stronger than metaphor; there's a solid mathematical basis for them. But just to make the point quickly:
If you apply a Hebb rule to the synapses impinging on a given output cell, the cell can be regarded as a committee in which each member has a voting strength that's initially random, but he gets more votes each time his preference agrees with the final output of the committee.

What this does is induce the committee to form a consensus on a subset of issues that frequently come before it for consideration. So this committee becomes, for example, an orientation-selective analyzing committee. On most questions that come to it (most local input patterns), it will have no strong opinion. Where there's an oriented edge, it will have a strong opinion, perhaps positive or negative.
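The committee picture can be sketched with a single linear output cell. In this sketch each weight is a member's voting strength, the cell's output `y` is the committee's verdict, and a weight grows when its input agrees in sign with `y` (the Hebb term `y * x`). Plain Hebbian growth is unbounded, so this sketch assumes Oja's rule, a standard normalized Hebbian rule, to keep the weight vector bounded; that is a swapped-in choice, not necessarily the constraint used in Linsker's models. The two hypothetical inputs share a common cause, and the weights converge to the dominant direction of input correlation, the "issue that frequently comes before" the committee.

```python
import math
import random

def oja_learn(samples, eta=0.001, w0=(0.3, -0.1)):
    """Hebbian learning for one linear output cell, stabilized with Oja's rule:
    dw = eta * y * (x - y * w), with y = w . x.
    The y * x term is the plain Hebb rule; the -y^2 * w term keeps |w| near 1."""
    w = list(w0)  # initial "voting strengths" (arbitrary, not aligned with the data)
    for x in samples:
        y = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

rng = random.Random(2)
samples = []
for _ in range(20000):
    shared = rng.gauss(0.0, 1.0)  # common cause seen by both inputs
    samples.append((shared + rng.gauss(0.0, 1.0), shared + rng.gauss(0.0, 1.0)))

w = oja_learn(samples)
print(w, math.hypot(*w))  # learned weights and their length
```

The inputs here have covariance `[[2, 1], [1, 2]]`, whose principal eigenvector is `(1, 1) / sqrt(2)`. The learned weights should approach that direction (up to sign), with length close to 1.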

So Hebb's rule induces a committee consensus, and I extended this to an entire layer of cells that can interact in a competitive and cooperative manner through lateral connections, and proposed what I call the "infomax" principle, which says:

  • Create a layer of cells connecting inputs to outputs.

  • The cells can compute any of a wide class of functions, subject to certain biological constraints.

  • Let it develop in such a way that its outputs convey maximum Shannon information on average about its inputs, subject to those biological constraints and costs.

  • The costs can be of different types: the types of processing allowed (how powerful the processors are as computers), limited wire length, energy costs, and so forth.



  • It's an optimal encoding principle. Again, as a brief metaphor, imagine an organization, now of human beings, where no person is told explicitly what their job is, and in fact no one is told what the goal of the entire organization is. All they're told is, "You're going to receive masses of data each day, and your job is to write a one-page summary that captures as much Shannon information as possible about that input."

    What each person will do, within the limits of their ability, is find regularities, find patterns within that data, so that they can capture it more concisely. And that, in essence, is what Infomax does.

  • It's been used in various ways, extensively for models of neural learning and development.

  • It leads to qualitative and quantitative agreement with experiment, especially in the first few stages of early visual processing.

  • It's the basis of Bell and Sejnowski's Independent Component Analysis method, which can reconstruct N statistically independent sources, given at least N linear combinations of them.


  • ...
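The Infomax principle can be made concrete in the simplest setting: one linear output `y = w . x + n`, with Gaussian input `x` (covariance `C`) and independent Gaussian output noise `n` (variance `noise_var`). There the information the output conveys about the input is `I = 0.5 * ln(1 + w^T C w / noise_var)`. Without a constraint, `I` grows without bound as `|w|` grows, so this sketch assumes a unit-norm constraint on `w` as a stand-in for the "biological constraints and costs". The covariance and noise values below are hypothetical. Under that constraint, maximizing `I` means maximizing output variance, a standard result for this linear Gaussian case, so the best direction is the principal eigenvector of `C`.

```python
import math

def gaussian_channel_info(w, cov, noise_var):
    """Shannon information (nats) that y = w . x + n conveys about x, for
    Gaussian x with covariance cov and independent Gaussian noise n:
    I = 0.5 * ln(1 + w^T C w / noise_var)."""
    signal_var = sum(w[i] * cov[i][j] * w[j] for i in range(2) for j in range(2))
    return 0.5 * math.log(1.0 + signal_var / noise_var)

cov = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical input covariance
noise_var = 0.5                 # hypothetical output noise variance

# Scan unit-norm weight vectors w = (cos t, sin t) over whole degrees;
# w and -w carry the same information, so 0..179 degrees suffices.
best_info, best_theta = max(
    (gaussian_channel_info((math.cos(math.radians(t)), math.sin(math.radians(t))),
                           cov, noise_var), t)
    for t in range(180))
print(best_theta, best_info)
```

Here `w^T C w = 2 + sin(2t)`, so the scan peaks at 45 degrees, the direction `(1, 1) / sqrt(2)`, with `I = 0.5 * ln 7` nats.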


    More...
    Journal of Vision article: Cone selectivity derived from the responses of the retinal cone mosaic to natural scenes
    Andrei Cimponeriu (Georgetown Institute for Cognitive and Computational Sciences, Georgetown University Medical Center): Modeling the Development of Ocular Dominance and Orientation Preference Maps in The Primary Visual Cortex with The Elastic Net
