Neural Networks:  Biological Models and Applications

 

 

To appear in:  International Encyclopedia of the Social & Behavioral Sciences

 

 

 

 

 

Frank H. Guenther

Department of Cognitive and Neural Systems

Boston University

677 Beacon Street

Boston, MA 02215

 

 

 

 

4.3.263 Neural Networks:  Biological Models and Applications

 

Beginning with the seminal work of McCulloch and Pitts in the 1940’s, artificial neural network (or connectionist) modeling has involved the pursuit of increasingly accurate characterizations of the electrophysiological properties of individual neurons and networks of interconnected neurons.  This line of research has branched into descriptions of the nervous system at many different grains of analysis, including complex computer models of the properties of individual neurons, models of simple invertebrate nervous systems involving a small number of neurons, and more abstract treatments of networks involving thousands or millions of neurons in the nervous systems of humans and other vertebrates.  At various points in the history of neural network research, successful models have moved beyond the domain of biological modeling into a variety of engineering and medical applications. 

 


Modeling the Computational Properties of Neurons

 

 

Although the idea that the brain is the seat of the mind and the controller of behavior is many centuries old, research into the computational properties of networks of interconnected neurons was largely absent until the 1940’s.  McCulloch and Pitts (1943) initiated the field of neural network research by investigating networks of interconnected neurons, with each neuron treated as a simple binary logic computing element, and demonstrating that networks of these simple neurons could be used to compute complex logical functions.  In this model, the axon of a neuron carries the cell’s binary output signal.  This axon forms a set of synaptic connections to the dendrites, or inputs, of other neurons.  The axonal signal corresponds roughly to the voltage level, or membrane potential, of the neuron.  The total input to a neuron is the sum of its synaptic inputs, and if this sum exceeds a certain threshold, the McCulloch-Pitts neuron produces an output of 1.  Otherwise the cell’s output is 0.  This binary conception of a neuron’s output was based on observations of “all or nothing” spikes, or action potentials, in the membrane potentials of biological neurons.
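The threshold rule described above can be sketched in a few lines of Python.  This is an illustrative sketch rather than the original 1943 formulation: the function names, the unit weights, and the use of a negative weight to stand for inhibition are assumptions made for the example.

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Binary threshold unit: output 1 if the summed synaptic input
    exceeds the threshold, otherwise output 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


def logical_and(a, b):
    # Both inputs must be active for the sum (2) to exceed 1.5.
    return mcculloch_pitts([a, b], [1, 1], threshold=1.5)


def logical_or(a, b):
    # A single active input (sum 1) already exceeds 0.5.
    return mcculloch_pitts([a, b], [1, 1], threshold=0.5)


def logical_xor(a, b):
    # XOR needs a small network of units: (a OR b) AND NOT (a AND b).
    # The negative weight is an assumed stand-in for an inhibitory synapse.
    return mcculloch_pitts([logical_or(a, b), logical_and(a, b)],
                           [1, -1], threshold=0.5)
```

Combining a few such units, as in `logical_xor`, illustrates how networks of binary neurons can compute logical functions that a single unit cannot.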

 

The McCulloch-Pitts model of the neuron was formulated when our physiological understanding of neurons was rather limited.  Although simple binary neurons are still used in some neural network models, many later models treat a neuron’s output as a continuous, rather than binary, function of the cell’s inputs.  This change was motivated by neurophysiological observations that the frequency of neuron spiking, rather than the presence or absence of an individual spike, is the more relevant quantity for measuring the strength of signaling between neurons.  Perhaps the most widely used formulation of a neuron’s output in current neural networks is the following equation:

$$y_j = f\left(\sum_i y_i z_{ij} - \theta_j\right)$$

where $y_j$ is the output (spiking rate) of a neuron labeled $j$, $z_{ij}$ is the strength of the connection (synapse) from neuron $i$ to neuron $j$, $\theta_j$ is the firing threshold for neuron $j$, and the output function $f(x)$ typically satisfies $f(x)=0$ for $x<0$ and increases with $x$ for $x>0$.  In words, if a neuron’s total input ($\sum_i y_i z_{ij}$) is below the firing threshold, then the neuron’s output is zero, and if the input exceeds the firing threshold, then the output is a positive value related to the difference between the total input and the threshold.  Different models use different forms of $f(x)$, with common choices including threshold linear and sigmoidal output functions.
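As a minimal sketch of this continuous-output neuron, the Python below computes an output from the presynaptic rates, the connection strengths, and a firing threshold.  The names `theta`, `threshold_linear`, and `rectified_sigmoid`, and the particular sigmoid chosen (one that is exactly zero for negative arguments), are assumptions for illustration; models in the literature use a variety of forms.

```python
def threshold_linear(x):
    """Zero below threshold, linear above it."""
    return max(0.0, x)


def rectified_sigmoid(x, half_point=1.0):
    """A saturating (sigmoidal) output that is 0 for x <= 0.
    The form x^2 / (k^2 + x^2) is one assumed choice among many."""
    if x <= 0.0:
        return 0.0
    return x * x / (half_point * half_point + x * x)


def neuron_output(pre_rates, strengths, theta, f=threshold_linear):
    """Output y_j = f(sum_i y_i * z_ij - theta_j) for one neuron j."""
    total_input = sum(y_i * z_ij for y_i, z_ij in zip(pre_rates, strengths))
    return f(total_input - theta)
```

For example, with presynaptic rates `[1.0, 0.5]` and strengths `[0.4, 0.2]` the total input is 0.5, so the neuron is silent for a threshold of 1.0 and active for a threshold of 0.2.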


A significantly more sophisticated account of neuron dynamics was formulated by Hodgkin and Huxley (1952), who won the Nobel prize for their experimental and modeling work elucidating the relationship between the ionic currents flowing in and out of a neuron and the membrane potential of the neuron.  The behavior of networks of Hodgkin-Huxley-like neurons, or shunting neural networks, has been studied in some detail (e.g., Grossberg 1980), and these networks have formed the basis of a number of models of biological nervous systems.
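To give a feel for shunting dynamics, the sketch below integrates one commonly used form of the shunting membrane equation, dx/dt = -A x + (B - x) E - (x + C) I, with simple Euler steps.  The equation form, the parameter names A, B, and C, and the numerical values are assumptions chosen for illustration and are not taken from the text above.

```python
def simulate_shunting(excitation, inhibition, A=1.0, B=1.0, C=0.5,
                      dt=0.001, steps=20000, x0=0.0):
    """Euler integration of dx/dt = -A*x + (B - x)*E - (x + C)*I.

    A is a passive decay rate, B an upper bound on activity, and -C a
    lower bound (all assumed, in arbitrary units).  Inputs multiply the
    distance to each bound, which keeps the activity within [-C, B].
    """
    x = x0
    for _ in range(steps):
        dx = -A * x + (B - x) * excitation - (x + C) * inhibition
        x += dt * dx
    return x


def shunting_steady_state(excitation, inhibition, A=1.0, B=1.0, C=0.5):
    """Closed-form equilibrium obtained by setting dx/dt = 0."""
    return (B * excitation - C * inhibition) / (A + excitation + inhibition)
```

Because the inputs act multiplicatively, even a very large excitatory input drives the activity toward, but never past, the upper bound B.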

 

The models described above are point models of a neuron in that they treat the electrical properties of the neuron as uniform across the neuron’s membrane; in other words, they treat neurons as if they did not have a spatial extent.  However, the membrane potential of a real neuron varies as a function of position on the membrane.  For example, the membrane potential at a distal dendrite of a neuron can differ substantially from the membrane potential at the cell body or along the axon.  Sophisticated compartmental models of neurons, which treat the neuron as a collection of interconnected electrical sub-circuits, or compartments, have been developed in recent years, and a number of software tools for simulating the properties of these neurons are readily available.  Each sub-circuit in a compartmental model corresponds to a different portion of a real neuron, such as a single dendrite, and large-scale computer simulations are used to simulate the membrane potential of a single neuron.  Although compartmental models provide a more accurate description of single neuron dynamics than the point models used in most neural networks, the complexity of this type of model has prevented its use in networks containing more than a handful of compartmental model neurons.  (See Arbib 1995 for more information on single neuron models and neuron simulators.)
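The compartmental idea can be illustrated with a deliberately tiny sketch: a passive neuron reduced to two coupled compartments, a soma and a dendrite, with current injected only into the dendrite.  Everything here (two compartments, purely passive leak, the conductance values, arbitrary units) is an assumption for illustration; realistic compartmental models use many compartments and active ionic currents.

```python
def simulate_two_compartments(dendrite_current, g_leak=1.0, g_couple=1.0,
                              e_leak=-65.0, capacitance=1.0,
                              dt=0.01, steps=5000):
    """Euler integration of a passive soma/dendrite pair.

    Each compartment leaks toward e_leak and exchanges current with its
    neighbour through a coupling conductance.  Returns (v_soma, v_dendrite).
    """
    v_soma = e_leak
    v_dend = e_leak
    for _ in range(steps):
        # Compute both derivatives from the current state before updating.
        dv_soma = (-g_leak * (v_soma - e_leak)
                   + g_couple * (v_dend - v_soma)) / capacitance
        dv_dend = (-g_leak * (v_dend - e_leak)
                   + g_couple * (v_soma - v_dend)
                   + dendrite_current) / capacitance
        v_soma += dt * dv_soma
        v_dend += dt * dv_dend
    return v_soma, v_dend
```

With these assumed values, the dendrite settles at a more depolarized potential than the soma, which is the kind of position-dependent difference that a point model cannot represent.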

Learning in Neural Networks

 

The connections between cells in an artificial neural network correspond to synapses in the nervous system.  In the earliest neural network models, the strengths of these connections, which determine how much the pre-synaptic cells can influence the activity of the post-synaptic cells, were kept constant.  However, much of the utility of neural networks comes from the fact that they are capable of modifying their computational properties by changing the strengths of synapses between cells, thus allowing the network to adapt to environmental conditions (for biological neural networks) or to the demands of a particular engineering application (for artificial neural networks).  A major challenge for computational neuroscientists has been to develop useful algorithms for changing the weights in a neural network in order to improve its performance based on a set of training samples.

Training a neural network typically consists of the presentation of a set of input patterns alone, or the presentation of input/output pattern pairs, to the network.  During the presentation of each pattern or input/output pair, the weights of the synapses in the network are modified according to an equation that is often referred to as a learning law.  In a supervised learning network, training consists of repeated presentation to the network of input/output pairs that represent the desired behavior of the network.  The difference between the network’s output and the training output represents the performance error and is used to determine how the weights will be modified.  In a self-organizing network, the weights are changed based on a set of input patterns alone, and the network typically learns to represent certain aspects of the statistical distribution of the training inputs.  For example, a self-organizing network trained with an input data set that includes three natural clusters of data points might learn to identify the data points as members of three distinct categories.

 

A variety of learning laws have been developed for both supervised and self-organizing neural networks.  Most of these learning laws fall into one of two classes.  The origins of the first class, associative or Hebbian learning laws, can be traced to a simple conjecture penned by the cognitive psychologist Donald Hebb (1949):  “When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A’s efficiency, as one of the cells firing B, is increased.”  With this statement, Hebb gave birth to the concept of Hebbian learning, in which the strength of a synapse is increased if both the pre- and post-synaptic cells are active at the same time.  Remarkably, Hebb’s conjecture, which was made before the development of experimental techniques for collecting neurophysiological data concerning synaptic changes, has proven to capture one of the most commonly observed aspects of synaptic change in biological nervous systems, and variations of the Hebbian learning law are used in many current neural network models.  It should be noted, however, that these learning laws are considerably simplified approximations to the complex and varied properties of real synapses (e.g., see Regulation of Synaptic Efficacy; Long-term Potentiation; Long-term Depression).

In a neural network using Hebbian learning, a synapse’s strength depends only on the pre- and post-synaptic cell activations, rather than on a measure of the network’s performance error.  Hebbian learning laws are thus well-suited for self-organizing neural networks (e.g., see Artificial Neural Networks:  Associative and Self-organizing).
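A minimal sketch of a Hebbian update, assuming the simplest product form in which each weight grows in proportion to the coincident pre- and post-synaptic activity, might look as follows.  The function name, the learning rate, and the absence of any decay or normalization term are assumptions; practical variants usually add some mechanism to keep the weights bounded.

```python
def hebbian_update(z, pre, post, rate=0.1):
    """Return new weights with z[i][j] increased by rate * pre[i] * post[j].

    z[i][j] is the strength of the synapse from pre-synaptic cell i to
    post-synaptic cell j.  Only local activity is used; no error signal
    or desired output enters the rule.
    """
    return [[z[i][j] + rate * pre[i] * post[j] for j in range(len(post))]
            for i in range(len(pre))]
```

In this sketch only synapses whose pre- and post-synaptic cells are both active change; a synapse from a silent cell is left as it was.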

 

A second common class of neural network learning laws is the class of error minimization learning laws.  These learning laws are commonly employed in supervised learning situations, where an error signal can be computed by comparing the network’s output to a desired output provided in the training set.  Whereas Hebbian learning laws arose from psychological and neurophysiological analyses, error minimization learning laws arose from mathematical analyses aimed at minimizing the network’s performance error, usually through a technique known as gradient descent.  The network’s performance error can be represented as a surface in the space of the synaptic weights.  Valleys on this error surface correspond to synaptic weight choices that lead to low error values.  Ideally, one would choose the synaptic weights that correspond to the global minimum of the error surface.  However, the entire error surface cannot be “seen” by the network during a training trial; only the local topography of the error surface is known.  Gradient descent learning laws change the synaptic weights so as to move in the direction of steepest descent on the local error surface.
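The idea of descending an error surface using only local information can be sketched as follows.  The two-weight quadratic error function, the finite-difference estimate of the local slope, and the step size are all assumptions chosen to keep the example small; they are not drawn from any particular network.

```python
def example_error(w):
    """An assumed bowl-shaped error surface over two weights, with its
    single minimum at w = (1.0, -0.5)."""
    return (w[0] - 1.0) ** 2 + 2.0 * (w[1] + 0.5) ** 2


def local_gradient(error_fn, w, eps=1e-6):
    """Estimate the slope of the error surface near w by probing a small
    neighbourhood, i.e. using only the local topography."""
    grad = []
    for k in range(len(w)):
        up = list(w)
        down = list(w)
        up[k] += eps
        down[k] -= eps
        grad.append((error_fn(up) - error_fn(down)) / (2.0 * eps))
    return grad


def gradient_descent(error_fn, w, rate=0.1, steps=200):
    """Repeatedly step each weight against its local error gradient."""
    for _ in range(steps):
        grad = local_gradient(error_fn, w)
        w = [wk - rate * gk for wk, gk in zip(w, grad)]
    return w
```

On a surface with a single valley, as here, the procedure reaches the global minimum; on surfaces with several valleys it can settle in whichever one lies downhill from the starting weights.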

 

One of the first gradient descent learning laws was developed by Widrow and Hoff (1960) for a simple one-layer neural network, the ADALINE model, which has found considerable success in technological applications such as adaptive noise suppression in computer modems. However, one-layer networks have limited computational capabilities.  Gradient descent learning has since been generalized to networks with three or more layers of neurons, as in the commonly employed backpropagation learning algorithm, which was first derived by Paul Werbos in his 1974 Harvard Ph.D. dissertation and later independently rediscovered and popularized by Rumelhart et al. (1986).
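The Widrow-Hoff rule for a one-layer linear unit can be sketched as below: after each training pair, every weight is changed in proportion to the output error times the corresponding input.  The function name, learning rate, number of passes, and the explicit bias term are assumptions for illustration.

```python
def train_adaline(samples, n_inputs, rate=0.1, epochs=500):
    """Train one linear unit with the Widrow-Hoff (least mean squares) rule.

    samples is a list of (inputs, target) pairs.  Returns (weights, bias).
    """
    w = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = sum(wi * xi for wi, xi in zip(w, x)) + bias
            error = target - output
            # Each weight moves down the gradient of the squared error.
            w = [wi + rate * error * xi for wi, xi in zip(w, x)]
            bias += rate * error
    return w, bias


# Assumed toy data generated by the linear relation t = 2*x1 - x2 + 0.5.
LINEAR_SAMPLES = [
    ([0.0, 0.0], 0.5),
    ([1.0, 0.0], 2.5),
    ([0.0, 1.0], -0.5),
    ([1.0, 1.0], 1.5),
]
```

Because the toy targets are an exact linear function of the inputs, a single layer suffices here; targets that are not linearly related to the inputs are where one-layer networks reach their limits.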

Common Neural Network Architectures

 

The cells in a neural network can be connected to each other in a number of different ways.  A feedforward network is one in which the output of a cell does not affect the cell’s input in any way.  Probably the most common artificial neural network architecture is the three-layer feedforward network, first described by Rosenblatt (1958).  In this architecture, an input pattern is represented by cells in the first layer.  These cells project through modifiable synapses to the second layer, which is often referred to as the hidden layer since it is not directly connected to the input or output of the network.  The hidden layer cells in turn project through modifiable synapses to the output layer.  The backpropagation learning algorithm is a common choice for supervised learning in three-layer feedforward networks.
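A compact sketch of a three-layer feedforward network trained with backpropagation is given below.  The class name, the logistic output function, the bias weights, the learning rate, and the exclusive-or training set are assumptions chosen for illustration; this is one textbook-style arrangement, not a reproduction of any specific published implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ThreeLayerNet:
    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One row of weights per receiving cell; the last entry is a bias.
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        """Return (hidden activities, output activities) for input list x."""
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0])))
             for row in self.w_hidden]
        y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0])))
             for row in self.w_out]
        return h, y

    def train_step(self, x, target, rate=0.5):
        """One backpropagation update for a single input/output pair."""
        h, y = self.forward(x)
        # Output-layer error terms: error times slope of the sigmoid.
        d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, y)]
        # Hidden-layer error terms: output errors passed back through the
        # (not yet updated) output weights.
        d_hid = [h[j] * (1.0 - h[j])
                 * sum(d_out[k] * self.w_out[k][j] for k in range(len(y)))
                 for j in range(len(h))]
        for k, row in enumerate(self.w_out):
            for j, v in enumerate(h + [1.0]):
                row[j] += rate * d_out[k] * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] += rate * d_hid[j] * v


def total_error(net, samples):
    """Summed squared error of the network over a list of (x, target) pairs."""
    return sum((t - o) ** 2
               for x, target in samples
               for t, o in zip(target, net.forward(x)[1]))


# Exclusive-or: a mapping that a one-layer network cannot represent.
XOR_SAMPLES = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
               ([1.0, 0.0], [1.0]), ([1.0, 1.0], [0.0])]
```

A typical use is to call `train_step` repeatedly over `XOR_SAMPLES` and monitor `total_error`; how quickly and how far the error falls depends on the assumed seed, rate, and number of hidden cells.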

 

Recurrent or feedback networks can exhibit much more complex behavior than feedforward networks, including sustained oscillations or chaotic cycles of cell activities over time.  In their most general form, the cell outputs of one layer in a recurrent network not only project to cells in the next layer, but they can also project to cells in the same layer or previous layers.  Variants of the backpropagation algorithm have been developed for training multi-layer recurrent networks (see Arbib 1995 for more information on supervised learning in recurrent neural networks).
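The contrast between feedforward and feedback connectivity can be seen in a toy example with two linear cells updated in discrete time steps.  The linear cells (which may take negative values), the specific weights, and the function name are assumptions for illustration only.

```python
def run_network(weights, state, steps):
    """Iterate x(t+1) = W x(t) and return the list of successive states.

    weights[j][i] is the connection strength from cell i to cell j.
    """
    history = [list(state)]
    for _ in range(steps):
        state = [sum(w * s for w, s in zip(row, state)) for row in weights]
        history.append(state)
    return history


# Feedback loop: cell 0 excites cell 1, and cell 1 inhibits cell 0.
RECURRENT_WEIGHTS = [[0, -1],
                     [1, 0]]

# Feedforward chain: cell 0 excites cell 1, with no connection back.
FEEDFORWARD_WEIGHTS = [[0, 0],
                       [1, 0]]
```

Starting from the state `[1, 0]`, the feedback loop cycles through four states indefinitely, a simple sustained oscillation, whereas in the feedforward chain the activity passes through once and then dies out.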

 

One important and heavily studied recurrent network architecture is the self-organizing map, which was first formulated to account for neurophysiological observations concerning cell properties in primary visual cortex (von der Malsburg, 1973; see also Artificial Neural Networks:  Associative and Self-organizing; Topographic Maps in the Brain).  The basic principle of a self-organizing map is as follows.  Cells in the input layer, sometimes referred to as a sub-cortical layer, project in a feedforward fashion to the cells in the second, or cortical, layer, through pathways that have modifiable synapses.  Cells in the cortical layer are recurrently interconnected, and they compete with each other through inhibitory connections so that only a small number of the cortical layer cells are active at the same time.  These “competition winners” are typically the cells that have the most total input projecting to them from the sub-cortical layer, or the cells that lie near cells with a large amount of input.  The connections between the sub-cortical cells and the active cortical cells are then modified via an associative learning law so that these cells are even more likely to become active the next time the same input pattern is applied at the sub-cortical layer.  The net effect over many training samples is that cells that are near to each other in the cortical layer respond to similar input patterns (a property referred to as topographic organization), and more cortical cells respond to input patterns that are frequently applied to the network during training than to rarely encountered input patterns.  Although originally formulated as recurrent networks in which the cells in the cortical layer project to each other in a recurrent fashion, simplified feedforward versions of the self-organizing map architecture that approximate stable behavior of the recurrent system have been developed and thoroughly studied (e.g., Kohonen, 1984).
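A single training step of a simplified feedforward self-organizing map can be sketched as below, for cortical cells arranged along a line.  Several simplifications are assumed: the winner is taken to be the cell whose weight vector lies closest to the input (a stand-in for "most total input"), the inhibitory competition is replaced by an explicit search for that winner, and the winner and its immediate neighbours all learn at the same rate.

```python
import random


def find_winner(weights, x):
    """Index of the cortical cell whose weight vector is closest to x."""
    def squared_distance(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda j: squared_distance(weights[j]))


def som_step(weights, x, rate=0.2, radius=1):
    """Move the winner and its neighbours toward the input, in place.

    weights[j] is the weight vector of cortical cell j; cells whose index
    is within `radius` of the winner are treated as its neighbours.
    """
    winner = find_winner(weights, x)
    for j in range(len(weights)):
        if abs(j - winner) <= radius:
            weights[j] = [wi + rate * (xi - wi)
                          for wi, xi in zip(weights[j], x)]
    return winner


def train_som(n_cells, inputs, epochs=50, seed=0):
    """Train a one-dimensional map from random initial weights."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_cells)]
    for _ in range(epochs):
        for x in inputs:
            som_step(weights, x)
    return weights
```

Because neighbours of the winner are pulled toward the same input, nearby cells tend to end up with similar weight vectors, which is the source of the topographic organization described above.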

 

A related neural network architecture that has also been used to explain a number of observations from cortical neurophysiology is the adaptive resonance theory (ART) architecture (Grossberg, 1980).  In this network, an additional set of recurrent connections project from the cortical layer back down to the sub-cortical layer.  The top-down projections emanating from a cortical cell embody the sub-cortical pattern that the network has learned to expect when that cortical cell is activated.  These learned expectations can be used to correct coding errors before learning has taken place in the bottom-up pathways, thereby providing a more stable cortical representation.  In addition to their use in biological modeling, neural network systems based on the ART model have been applied to a number of pattern recognition problem domains (Carpenter and Grossberg, 1991).

 

Specialized Neural Models of Biological Systems

The neural networks described so far are “general-purpose” models in that the same architecture is used to attack a variety of biological modeling or engineering problems.  In addition to these models, many specialized models of particular neural circuits have been proposed.  Among the earliest were models of cerebellum function, proposed by researchers such as Marr and Albus beginning in the late 1960’s.  The cerebellum has a very regular and well-characterized anatomical structure, and cerebellar physiology has been heavily studied in recent decades (e.g., see Cerebellum; Long Term Depression (Cerebellum)).  Different cells and synapses in the cerebellum have different properties, and neural models of the cerebellum typically incorporate these differences. Relatively primitive invertebrate neural circuits, such as heartbeat oscillators, have also been the focus of numerous biologically specialized neural network models, as have vertebrate circuits such as the superior colliculus, hippocampus, basal ganglia, and various regions of cortex.

Another type of specialized biological model approximates the function of entire behavioral systems involving large-scale networks of the human brain. Individual cells in these models often correspond to relatively large brain regions, rather than to single neurons or distinct populations of neurons. These models often combine aspects of different neural network architectures or learning laws. The DIVA model of speech production (Guenther 1995), for example, combines several aspects of earlier neural network models into an architecture that learns to control movements of a computer-simulated vocal tract. The model has been shown to provide a unified account for a wide range of experimental observations concerning human speech that were previously studied independently.  Other models of this type address various aspects of human cognition, movement control, vision, audition, language, and memory.

Pattern Recognition Applications

Neural networks are capable of learning complicated non-linear relationships from sets of training examples.  This property makes them well suited to pattern recognition problems involving the detection of complicated trends in high-dimensional data sets.  One such problem domain is the detection of medical abnormalities from physiological measures.  Neural networks have been applied to problems such as the detection of cardiac abnormalities from electrocardiograms and breast cancer from mammograms, and some neural network diagnostic systems have proven capable of exceeding the diagnostic abilities of expert physicians.  Supervised learning networks have been applied to a number of other pattern recognition problems, including visual object recognition, speech recognition, handwritten character recognition, stock market trend detection, and scent detection (e.g., Carpenter and Grossberg, 1991).

 

For further reading on neural networks and their biological bases, see Anderson and Rosenfeld (1988), Arbib (1995), and Kandel et al. (2000).

 

Bibliography

 

Anderson J A, Rosenfeld E (eds.) 1988 Neurocomputing:  Foundations of Research.  MIT Press, Cambridge, MA

Arbib M A (ed.) 1995 The Handbook of Brain Theory and Neural Networks.  MIT Press, Cambridge, MA

Carpenter G A, Grossberg S (eds.) 1992 Neural Networks for Vision and Image Processing.  MIT Press, Cambridge, MA

Carpenter G A, Grossberg S (eds.) 1991 Pattern Recognition by Self-Organizing Neural Networks. MIT Press, Cambridge, MA


Grossberg S 1980 How does a brain build a cognitive code?  Psychol. Rev. 87: 1-51

Guenther F H 1995 Speech sound acquisition, coarticulation, and speaking rate effects in a neural network model of speech production.  Psychol. Rev. 102: 594-621

Hebb D O 1949 The Organization of Behavior.  Wiley, New York

Hodgkin A L, Huxley A F 1952 A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. (Lond.) 117: 500-44

Kandel E R, Schwartz J H, Jessell T M (eds.) 2000 Principles of Neural Science, Fourth Edition.  McGraw-Hill, New York

Kohonen T 1984 Self-organization and Associative Memory. Springer-Verlag, New York

Malsburg C von der 1973 Self-organization of orientation sensitive cells in the striate cortex.  Kybernetik 14: 85-100

McCulloch W S, Pitts W 1943 A logical calculus of the ideas immanent in nervous activity.  Bull. Math. Biophys. 5: 115-33

Rosenblatt F 1958 The perceptron: A probabilistic model for information storage and organization in the brain.  Psychol. Rev. 65: 386-408

Rumelhart D E, McClelland J L, PDP Research Group 1986 Parallel Distributed Processing: Explorations in the Microstructure of Cognition.  MIT Press, Cambridge, MA

Widrow B, Hoff M E 1960 Adaptive switching circuits. 1960 IRE WESCON Convention Record, IRE, New York. Reprinted in Anderson J A, Rosenfeld E (eds.) 1988 Neurocomputing:  Foundations of Research. MIT Press, Cambridge, MA

Frank H. Guenther, Boston University