The ups and downs of Hebb synapses
Canadian Psychology, Feb 2003 by Geoffrey Hinton
Abstract
Modelers have come up with many different learning rules for neural networks. When a teacher specifies the correct output, error-driven rules work better than pure Hebb rules in which the changes in synapse strength depend on the correlation between pre- and postsynaptic activities. But for unsupervised learning, Hebb rules can be very effective if they are combined with suitable normalization or "unlearning" terms to prevent the synapses from growing without bound. Hebb rules that use rates of change of activity instead of activity itself are useful for discovering perceptual invariants and may also provide a way of implementing error-driven learning.
It would be truly wonderful if randomly connected neural networks could turn themselves into useful computing devices by using some simple rule to modify the strengths of synapses. This was the hope that lay behind the original Hebb learning rule and it is the vision that has driven neural network modelers for half a century. Initially, researchers tried simulating various rules to see what would happen. After a decade or two of messing around, researchers realized that there was a much better way to explore the space of possible learning rules: First write down an objective function (a quantitative definition of how well the network is performing) and then use elementary calculus to derive a learning rule that will improve the objective function. For the last few decades, the big theoretical advances in learning rules for neural networks have been associated with new optimization methods and new ideas about what objective function should be optimized.
If we think of a neural network as a device for converting input vectors into output vectors, it is obvious that one sensible objective is to minimize some measure of the difference between the output the network actually produces and the output it ought to produce.
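As a minimal worked example of that recipe (the notation here is assumed, not taken from the article), consider a single layer of linear output units with activities y_i, desired outputs t_i, input activities x_j, weights w_ij, training cases c, and a small learning rate ε. Taking the summed squared error as the objective and differentiating gives:

```latex
\begin{aligned}
E &= \tfrac{1}{2}\sum_{c}\sum_{i}\bigl(t_{i,c}-y_{i,c}\bigr)^{2},
  \qquad y_{i,c}=\sum_{j} w_{ij}\,x_{j,c} \\
\frac{\partial E}{\partial w_{ij}} &= \sum_{c}\frac{\partial E}{\partial y_{i,c}}\,x_{j,c}
  = -\sum_{c}\bigl(t_{i,c}-y_{i,c}\bigr)\,x_{j,c} \\
\Delta w_{ij} &= -\varepsilon\,\frac{\partial E}{\partial w_{ij}}
  = \varepsilon\sum_{c}\bigl(t_{i,c}-y_{i,c}\bigr)\,x_{j,c}
\end{aligned}
```

The last line is the Widrow-Hoff (delta) rule, summed over cases; applying it after each case gives the online form. Each weight changes in proportion to the presynaptic activity times the negative error derivative of the postsynaptic unit. A pure Hebb rule with the desired output clamped on the postsynaptic unit would instead use Δw_ij = ε Σ_c t_{i,c} x_{j,c}, which ignores what the network already produces.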
This approach led to effective "error-driven" learning rules such as the Widrow-Hoff rule (Widrow & Hoff, 1960) and the perceptron convergence procedure (Rosenblatt, 1961), and it was later generalized to multilayer networks by using backpropagation of the errors to get training signals for intermediate "hidden" layers (Rumelhart, Hinton, & Williams, 1986). Within the neural network community, the "Hebbian" approach of using the product of pre- and postsynaptic activities to drive learning was seen as inferior to error-driven methods that use the product of the presynaptic activity and the postsynaptic activity derivative: the rate at which the objective function changes as the postsynaptic activity is changed. Even when the task was merely to associate random input vectors with random output vectors, it was shown that an error-driven rule worked much better than a Hebbian rule.
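The comparison on random associations can be sketched with a linear associator. This is a minimal illustration under assumptions of its own, not the original experiment: random ±1 input and output vectors, a Hebbian weight matrix built from target-times-input products (scaled by the number of inputs), and the Widrow-Hoff rule run for a fixed number of passes. The sizes, learning rate, epoch count, and helper names (`hebbian_weights`, `delta_rule_weights`) are hypothetical.

```python
import random

def random_pm1(n, rng):
    """Random vector of +1/-1 entries."""
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def matvec(W, x):
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def mse(W, xs, ys):
    """Mean squared error over all patterns and output units."""
    total, count = 0.0, 0
    for x, y in zip(xs, ys):
        out = matvec(W, x)
        total += sum((o - t) ** 2 for o, t in zip(out, y))
        count += len(y)
    return total / count

def hebbian_weights(xs, ys):
    """Pure Hebb (hypothetical helper): weight change = target output * input, accumulated once."""
    n_in, n_out = len(xs[0]), len(ys[0])
    W = [[0.0] * n_in for _ in range(n_out)]
    for x, y in zip(xs, ys):
        for i in range(n_out):
            for j in range(n_in):
                # Scaled by n_in so a single stored pair would be recalled exactly.
                W[i][j] += y[i] * x[j] / n_in
    return W

def delta_rule_weights(xs, ys, lr=0.01, epochs=300):
    """Widrow-Hoff / LMS (hypothetical helper): weight change = (target - output) * input."""
    n_in, n_out = len(xs[0]), len(ys[0])
    W = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            out = matvec(W, x)
            for i in range(n_out):
                err = y[i] - out[i]          # minus dE/d(output) for squared error
                for j in range(n_in):
                    W[i][j] += lr * err * x[j]
    return W

rng = random.Random(0)
N_IN, N_OUT, N_PAIRS = 32, 8, 8              # assumed sizes for the demo
xs = [random_pm1(N_IN, rng) for _ in range(N_PAIRS)]
ys = [random_pm1(N_OUT, rng) for _ in range(N_PAIRS)]

hebb_mse = mse(hebbian_weights(xs, ys), xs, ys)
delta_mse = mse(delta_rule_weights(xs, ys), xs, ys)
print(f"Hebbian MSE: {hebb_mse:.4f}   delta-rule MSE: {delta_mse:.6f}")
```

With fewer pattern pairs than input units, the random inputs are almost surely linearly independent, so the delta rule can drive the error toward zero, whereas the Hebbian matrix keeps a crosstalk term contributed by every other stored pair.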
Unfortunately, error-driven learning has some serious drawbacks. It requires a teacher to specify the right answer, and it is hard to see how neurons could implement the backpropagation required by multilayer versions. It is possible to get the teaching signal from the data itself by trying to predict the next term in a temporal sequence (Elman, 1991) or by trying to reconstruct the input data at the output (Hinton, 1989), but it is also possible to use quite different objective functions for learning. Some of these alternative objective functions lead to learning rules that are far more Hebbian in flavour.
A common objective in processing high-dimensional data is to reduce the dimensionality without losing the ability to reconstruct the raw data from the reduced representation. If we measure the accuracy of the reconstruction by the squared error, the optimal strategy is to extract the principal components - the dominant directions of variation in the data. Oja (1982) showed how to extract the first principal component using Hebbian learning to maximize the squared output of a neuron combined with normalization of the synapse strengths to prevent them growing without bound. Sanger (1989) showed that lateral inhibition between neurons can be used to make them extract several different principal components.
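Both rules can be sketched in a few lines. The version below is a minimal sketch, not either author's original formulation: it implements Sanger's generalized Hebbian algorithm for linear units, which for the first unit reduces to Oja's rule, Δw = ε y (x - y w), where the -y²w term acts as the normalization. Each later unit learns from the input minus the reconstructions produced by the units before it, one way of realizing the inhibitory interaction between neurons. The 2-D synthetic data, learning rate, seeds, and the function name `sanger_gha` are assumptions chosen for the demo.

```python
import math
import random

def sanger_gha(samples, n_units, lr=0.002, seed=1):
    """Sanger's generalized Hebbian algorithm for linear units (hypothetical helper name).
    Unit i: dw_i = lr * y_i * (x - sum_{k<=i} y_k * w_k).  For the first unit this is Oja's rule."""
    rng = random.Random(seed)
    dim = len(samples[0])
    W = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_units)]
    for x in samples:
        ys = [sum(wj * xj for wj, xj in zip(w, x)) for w in W]   # linear outputs, old weights
        residual = list(x)
        for i, w in enumerate(W):
            # Remove this unit's reconstruction y_i * w_i (earlier units' were removed already).
            residual = [r - ys[i] * wj for r, wj in zip(residual, w)]
            # Hebbian term y_i * x, minus decay (own) and inhibition (earlier units) terms.
            W[i] = [wj + lr * ys[i] * rj for wj, rj in zip(w, residual)]
    return W

# Synthetic zero-mean 2-D data with assumed principal directions u (std 2) and v (std 1).
rng = random.Random(0)
theta = math.radians(30)
u = (math.cos(theta), math.sin(theta))
v = (-math.sin(theta), math.cos(theta))
data = []
for _ in range(20000):
    a, b = rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)
    data.append([a * u[0] + b * v[0], a * u[1] + b * v[1]])

W = sanger_gha(data, n_units=2)
for w, d in zip(W, (u, v)):
    norm = math.hypot(*w)
    alignment = abs(w[0] * d[0] + w[1] * d[1]) / norm   # sign of each component is arbitrary
    print(f"|w| = {norm:.3f}, |cos| to principal direction = {alignment:.4f}")
```

The weight vectors should end up close to unit length and aligned (up to sign) with the first and second principal directions, without any explicit eigenvector computation.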
Another objective that might have appealed to Hebb is to create a set of attractor states in a nonlinear network. Leading researchers (Marr, Palm, & Poggio, 1978) speculated that it would be very hard to analyze and manipulate the dynamical behaviour of networks of binary threshold neurons with recurrent interconnections, but in 1982, Hopfield pointed out that if the connections were symmetrical the network would settle down into states that were local minima of a simple "energy function." Moreover, new minima could be created by simple Hebbian learning. So the activity dynamics of a network with fixed weights could implement the retrieval of a memory from a corrupted or incomplete version of the memory, and Hebbian learning could be used to store new memories.
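The storage and retrieval steps can be sketched with ±1 units. This is a minimal illustration, assuming an energy without thresholds, E = -1/2 Σ_ij w_ij s_i s_j, a Hebbian outer-product storage rule with zero self-connections, and asynchronous updates in random order; the network size, number of patterns, corruption level, and helper names are arbitrary choices for the demo.

```python
import random

def store(patterns):
    """Hebbian storage: w_ij = sum over patterns of p_i * p_j, with no self-connections."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def energy(W, s):
    """Energy E = -1/2 * sum_ij w_ij s_i s_j (thresholds omitted in this sketch)."""
    n = len(s)
    return -0.5 * sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def recall(W, s, rng, max_sweeps=20):
    """Asynchronous binary threshold updates; returns the final state and the energy trace."""
    s = list(s)
    energies = [energy(W, s)]
    for _ in range(max_sweeps):
        changed = False
        for i in rng.sample(range(len(s)), len(s)):      # random update order
            h = sum(W[i][j] * s[j] for j in range(len(s)))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
                energies.append(energy(W, s))
        if not changed:                                   # reached a fixed point
            break
    return s, energies

rng = random.Random(0)
N = 100
patterns = [[rng.choice((-1, 1)) for _ in range(N)] for _ in range(3)]
W = store(patterns)

probe = list(patterns[0])
for i in rng.sample(range(N), 10):                        # corrupt 10% of the bits
    probe[i] = -probe[i]

state, trace = recall(W, probe, rng)
print("recovered:", state == patterns[0], "energy:", trace[0], "->", trace[-1])
```

Because the weights are symmetric and self-connections are zero, each asynchronous update can only lower the energy or leave it unchanged, which is why settling ends in a local minimum; with few stored patterns relative to the number of units, the minimum reached from a mildly corrupted probe is the stored memory.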
Hopfield networks introduced an extra level of complexity by using one objective function (the energy) to determine the fast dynamics of the neural activities and a quite different objective function (the proximity of the energy minima to the vectors that need to be stored) to determine the slow dynamics of the synapse strengths. Hinton and Sejnowski (1986) realized that Hopfield nets could be generalized by adding noise to the activity dynamics, so that instead of simply settling to a point attractor, the "Boltzmann machine" would wander around among its various possible activity states, spending most of its time in low energy states but occasionally visiting higher energy ones. If the network is divided into a set of "visible" units that represent the sensory input and a set of hidden units whose states represent an interpretation of the sensory input, the stochastic dynamics can be interpreted as a way of sampling various possible interpretations of the sensory data. An interpretation that has energy E will get sampled with a probability proportional to exp(-E), which corresponds exactly to correct Bayesian inference if the probability of an interpretation is proportional to exp(-E).
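The claim that the noisy dynamics sample interpretations with probability proportional to exp(-E) can be checked numerically on a tiny network. In this sketch the weights, biases, and split into two visible and two hidden units are made up for illustration, units are binary (0/1), and the stochastic update is Gibbs sampling: each hidden unit turns on with probability given by the logistic function of its energy gap. With the visible units clamped, the sampled frequencies of the four hidden configurations should approach exp(-E) normalized over those configurations.

```python
import itertools
import math
import random

# Hypothetical 4-unit Boltzmann machine: units 0-1 visible, units 2-3 hidden.
# Symmetric weights with zero diagonal; all values are arbitrary demo choices.
W = [[0.0, 0.5, 1.0, -0.8],
     [0.5, 0.0, -0.6, 1.2],
     [1.0, -0.6, 0.0, 0.7],
     [-0.8, 1.2, 0.7, 0.0]]
B = [0.1, -0.2, -0.5, 0.3]   # biases
HIDDEN = [2, 3]

def energy(s):
    """E(s) = -sum_{i<j} w_ij s_i s_j - sum_i b_i s_i for binary (0/1) states."""
    n = len(s)
    pair = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pair - sum(b * si for b, si in zip(B, s))

def gibbs_hidden(v, n_sweeps, rng):
    """Clamp the visible units to v and resample each hidden unit from its conditional:
    p(s_i = 1 | rest) = sigmoid(gap), where gap = E(s_i = 0) - E(s_i = 1)."""
    s = list(v) + [0] * len(HIDDEN)
    counts = {}
    for _ in range(n_sweeps):
        for i in HIDDEN:
            gap = B[i] + sum(W[i][j] * s[j] for j in range(len(s)) if j != i)
            s[i] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-gap)) else 0
        key = tuple(s[i] for i in HIDDEN)
        counts[key] = counts.get(key, 0) + 1
    return {h: c / n_sweeps for h, c in counts.items()}

def exact_posterior(v):
    """p(h | v) proportional to exp(-E(v, h)), by enumerating every hidden configuration."""
    weights = {h: math.exp(-energy(list(v) + list(h)))
               for h in itertools.product((0, 1), repeat=len(HIDDEN))}
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

rng = random.Random(0)
v = (1, 0)                                    # clamped "sensory input"
empirical = gibbs_hidden(v, 100000, rng)
exact = exact_posterior(v)
for h in sorted(exact):
    print(h, f"exact={exact[h]:.3f}", f"sampled={empirical.get(h, 0.0):.3f}")
```

The logistic update is what ties the local dynamics to the global distribution: the probability of turning a unit on is exp(-E_on) / (exp(-E_on) + exp(-E_off)), so repeated updates leave the exp(-E) distribution unchanged.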