What does "impulse" mean in neural networks?

Table of Contents

Preface

1. What are neural networks?

1.1. Basic ideas

1.2. Nerve cells and neurological findings

1.3. The abstract neuron model

1.4. Early experiments with simulated neural networks

1.5. The criticism of Minsky and Papert

1.6. The further development to the breakthrough

1.7. Two examples of network learning

1.8. State of research

Preface

The research area of neural networks, also called connectionist systems, has been known for some 30 years, but has only gained importance in the course of recent years and is now regarded as an important area of computer science. Great progress is expected in this field over the next few years. For a long time, applications of neural networks were confined to space research and the armaments industry, e.g. the control of weapon systems or the evaluation of aerial images for enemy reconnaissance. Later, civil areas were added, such as surface analyses, quality control of materials, and robot control for dangerous tasks. For some years now, attempts have been made to solve medical diagnostic problems with the help of methods of modern artificial intelligence (AI for short). Most efforts have been made in the field of expert systems, which attempt to make diagnoses by evaluating rule-based expert knowledge; this method, however, could be applied only to a limited extent where symptoms are not clear-cut. The other important route of AI is the use of neural networks, which, consisting of neurons and their connections to one another in a manner similar to the human brain, can store and utilize learned knowledge. One example of an application within medicine is the development of retinal implants which, coupled with neural networks, should enable blind patients to see again. The present work is intended as part of this series of medical applications of neural networks.

1. What are neural networks?
1.1. Basic ideas

These explanations are intended as a brief introduction to the subject of "neural networks"; they are by no means complete. For more detailed information, please refer to the references given.

The dream of "Artificial Intelligence" (AI), a subfield of computer science, has always been one of its main driving forces. Even before the fascinating ideas of J. v. Neumann, A. Turing and K. Zuse in the 1950s (which established the architecture and programming of today's computers), the roots of "neuroinformatics" can be found: the neuron model of W. McCulloch and W. Pitts /2/ as a component of a threshold logic. The first hopeful experiments with "simulated neural networks" (SNN) for pattern recognition and learning were presented by F. Rosenblatt, B. Widrow and K. Steinbuch around 1960. In 1969, however, M. Minsky and S. Papert proved fundamental limitations of these networks in their influential book "Perceptrons" and thus de facto stopped research in the field. In order to achieve AI, the majority of researchers at the time relied on symbol processing and programming (in LISP).

Despite considerable successes (e.g. "expert systems"), an AI crisis emerged in the mid-1980s. "Hard" problems (e.g. pattern recognition) often reached the limits of what was feasible, requiring long computing times with millions of processing steps. If one compares the working speed of modern computers (nanoseconds) with that of nerve cells (milliseconds), our reaction time of about half a second for recognizing an image shows that the biological system needs only about 100 processing steps. This achievement rests on the massive parallel processing of our nervous system. Because of foreseeable physical limits of VLSI technology (speed of light, quantum effects in thin conductors), there is now agreement that performance increases of the required order of magnitude can only be achieved through parallel processing. The simulation of the structure of nerve assemblies was rediscovered as a promising approach.
Some researchers were not deterred by the devastating judgment of Minsky and Papert from continuing to plough this field. In the meantime it has turned out that the criticism does not apply to more complex networks. The research results published by D. Rumelhart and J. McClelland in 1986 under the catchphrase "PDP" (Parallel Distributed Processing) /2, p. 32/, covering associative memories with error tolerance, pattern recognition and pattern completion, self-organization, and learning by training on examples (all "hard" problems of AI), impressed researchers worldwide and have since led to a "paradigm shift". The synthesis of symbol processing (e.g. for sequential problem-solving strategies with heuristic rules in knowledge-based systems) and the "simulation of neural networks" (for sensor technology, pattern recognition and learning) promises a way out of the existing AI crisis.

1.2. Nerve cells and neurological findings

Before we talk about the simulation of nerve networks, we should recall how the biological system works: the human brain consists of at least 10 billion neurons, the building blocks of the nervous system. A nerve cell can be viewed as a processor unit that receives signals from many other neurons via so-called synapses on its dendrites (Fig. 1).


Fig. 1: Scheme of a nerve cell
A distinction is made between excitatory (strengthening) and inhibitory (weakening) synapses, whose signals are "calculated" in the nerve cell. If the result is below an internal threshold, nothing happens. Only when the threshold is exceeded does the nerve cell "fire" ("all or nothing" rule), i.e. it sends an electrical impulse of approx. 100 mV / 1 ms via the axon to its synapses, which in turn are linked to other neurons. The flow of information alternates between chemical and electrical realization: at the synapses, the electrical impulse causes so-called neurotransmitters to be released, which reach the receptors in the dendrites of the subsequent neurons via the synaptic clefts. Not just a single pulse is generated, but a pulse train whose frequency is proportional to the strength of the stimulus. According to the hypothesis of D. Hebb (1949), learning takes place through the modification of synaptic strengths: a connection is strengthened when there is a correlation between the incoming signal and the activity of the target cell ("facilitation"). An important neurobiological insight was gained in 1950 by K. Lashley /2, p. 28/. He trained rats to run through a maze by the shortest possible route; he then damaged arbitrary brain areas of these animals, yet they still found their way through the labyrinth. He concluded from this that memory information is not localized in particular places but is represented in a distributed manner. In 1970, J. Conel showed, using histological sections of the cerebral cortex of deceased infants, that the first three months after birth are the decisive period in which the connections of the nerve network organize themselves. Animal experiments confirmed the dependence on stimuli from the environment.
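Hebb's hypothesis can be stated in one line: a weight grows in proportion to the product of pre-synaptic input and post-synaptic activity. A minimal sketch, in which the learning rate and the toy activity values are illustrative assumptions rather than anything from the text:

```python
# A minimal sketch of Hebb's 1949 hypothesis: a synaptic weight is
# strengthened when pre-synaptic input x and post-synaptic activity y
# are correlated. The learning rate eta and the values are assumptions.

def hebb_update(w, x, y, eta=0.1):
    """One Hebbian step: the weight grows with the product x * y."""
    return w + eta * x * y

w = 0.0
for _ in range(5):                         # repeated correlated activity ...
    w = hebb_update(w, x=1.0, y=1.0)
strengthened = w                           # ... strengthens the connection
unchanged = hebb_update(w, x=0.0, y=1.0)   # no input, so no change
```

Note that this rule only ever strengthens connections; weakening (as at inhibitory synapses) requires extensions not modelled here.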

1.3. The abstract neuron model

Based on the findings of their time, W. McCulloch and W. Pitts designed a neuron model in 1943 that can be used universally as the elementary processor of a threshold logic (Fig. 2):


Figure 2: McCulloch and Pitts (1943) neuron model
The input signals Xi ∈ {0,1} are weighted with Wi (positive or negative) according to the synaptic strength, summed, and compared with the internal threshold value. Only when the threshold value is exceeded does the output change from 0 to 1. It is easy to show (Fig. 3) that, depending on the threshold value and/or the weights, OR, AND and NAND gates can be realized, so that complex logical operations become possible.

Fig. 3: Example for AND and OR links
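The gate constructions of Fig. 3 can be sketched directly. The particular weight and threshold values below, and the convention that the unit fires when the sum reaches the threshold, are assumptions for illustration:

```python
def mcp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: output 1 iff the weighted input sum
    reaches the internal threshold, else 0 ('all or nothing')."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s >= threshold else 0

# AND gate: both inputs are needed to reach the threshold of 2
AND = lambda a, b: mcp_neuron([a, b], [1, 1], threshold=2)
# OR gate: a single active input already suffices
OR = lambda a, b: mcp_neuron([a, b], [1, 1], threshold=1)
# NAND gate: inhibitory (negative) weights invert the logic
NAND = lambda a, b: mcp_neuron([a, b], [-1, -1], threshold=-1)
```

Composing such gates yields arbitrary Boolean functions, which is exactly the "threshold logic" claim of the model.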
1.4. Early experiments with simulated neural networks

As a learning classifier, the "Perceptron" /2, chapter 7.2/ by F. Rosenblatt (1958) was able to recognize patterns, generalize certain properties (e.g. horizontal/vertical), and was robust against input noise and interference. It consisted of 3 layers: the "retina" (capturing the binary input field) was connected via fixed weights to the "association layer", which in turn was connected to the output via adaptive weights (Fig. 4). The aim of training in the learning phase was to activate the associated output unit for a given input pattern.


Figure 4: F.Rosenblatt (1958) PERCEPTRON network
In addition, there were inhibitory connections between the output and association layers, so that the first active output suppressed all other possible output units ("winner-take-all" principle). In 1960, B. Widrow and M. Hoff specified a simplified perceptron, the "Adaline" (adaptive linear neuron, Fig. 5), whose learning algorithm ("delta rule") adjusts the weights so that the error between actual and desired output is minimized.

Figure 5: B. Widrow (1960) ADALINE network
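The delta rule can be sketched in a few lines: after each example, every weight moves a small step in the direction that reduces the difference between desired and actual (linear) output. The learning rate, epoch count, and the AND training data with bipolar targets are illustrative assumptions:

```python
# Sketch of the Widrow-Hoff "delta rule" for an Adaline-style unit.

def train_adaline(samples, eta=0.1, epochs=200):
    w = [0.0, 0.0, 0.0]                      # [bias weight, w1, w2]
    for _ in range(epochs):
        for (x1, x2), d in samples:
            x = [1.0, x1, x2]                # constant 1 feeds the bias weight
            y = sum(wi * xi for wi, xi in zip(w, x))     # linear output
            delta = d - y                    # the "delta" of the delta rule
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    return w

# Train on the linearly separable AND function with targets -1/+1
data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
w = train_adaline(data)
classify = lambda x1, x2: 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else -1
```

Unlike the perceptron rule, the error is computed on the linear output before thresholding, which is why learning converges smoothly toward the least-squares solution.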
Learning converges faster than with the Perceptron, but only in a limited problem domain. The "Madaline" (Fig. 6) was intended to extend this domain by evaluating several Adalines connected in parallel through a majority circuit.

Fig. 6: B. Widrow (1960) MADALINE network
In 1961, K. Steinbuch proposed the "learning matrix" (Fig. 7), which can distinguish between several classes. In the learning phase, a set of properties together with a set of associated meanings creates "conditioned links" (by changing resistance values) at the crosspoints. In the recall phase, the learning matrix can be used in both directions: when properties are entered, the associated meaning is determined (by forming extreme values); when a meaning is entered, the corresponding properties can be read out.

Figure 7: K. Steinbuch (1961) learning matrix
1.5. The criticism of Minsky and Papert

With the rapid development of communications engineering, control engineering, and systems theory, these SNN experiments contributed to the recognition of a common goal in cybernetics. In 1969, during this romantic phase, the book "Perceptrons" by M. Minsky and S. Papert was published. The authors showed that F. Rosenblatt's "convergence theorem" is meaningless for practical purposes: perceptrons (and related circuits) can only learn and classify "linearly separable" patterns (cf. Fig. 8) and thus do "nothing interesting". Even if multi-level perceptrons could solve "non-linearly separable" problems (e.g. XOR), there would be no possibility of correctly setting their weights due to a combinatorial explosion.


Figure 8: Linear separability
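The Minsky/Papert point can be demonstrated empirically: the perceptron rule reaches zero errors on a linearly separable function (AND) but never on XOR, because no separating line exists. The learning rate and epoch count below are illustrative assumptions:

```python
# Train a single threshold unit with the perceptron rule and count the
# misclassifications that remain afterwards.

def perceptron_errors(samples, eta=0.1, epochs=1000):
    w = [0.0, 0.0, 0.0]                      # [bias/threshold weight, w1, w2]
    for _ in range(epochs):
        for (x1, x2), d in samples:
            x = [1.0, x1, x2]
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
    return sum(
        (1 if sum(wi * xi for wi, xi in zip(w, [1.0, x1, x2])) > 0 else 0) != d
        for (x1, x2), d in samples
    )

AND_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
and_errors = perceptron_errors(AND_data)   # 0: separable, fully learned
xor_errors = perceptron_errors(XOR_data)   # > 0: not linearly separable
```

No amount of further training changes the XOR result; only additional layers (section 1.7) do.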
1.6. The further development to the breakthrough

As a result of the negative judgment by Minsky and Papert, research in the field of SNN practically came to a standstill (through the withdrawal of financial support). The majority tried to achieve AI by programming with symbol-processing languages (e.g. LISP). Fortunately, some researchers were not impressed by this and continued to work. The gradually published results gave rise to new hopes. Some examples: T. Kohonen (1977) experimented with associative memories which (in contrast to conventional addressing) associate a desired output pattern (e.g. a passport photo) via a "key" pattern. The access key may also be incomplete (e.g. containing only the eye area) and/or noisy (Fig. 9).


Figure 9: T.Kohonen (1977) Associative memory
In 1982, J. Hopfield described an associative memory as a network with internal feedback (Fig. 10). Its processor elements have "soft" threshold transitions which can also assume values between 0 and 1.

Figure 10: Hopfield network with internal feedback
He interpreted the assignment of an input key to the associated output value as a collective settling of the system into a nearby equilibrium state of minimal potential energy. In the (n-dimensional) "energy landscape", each valley corresponds to a learned (n-tuple) pattern. In 1984, G. Hinton and T. Sejnowski extended this "thermodynamic model" to the "Boltzmann machine", with which they solved, for example, the combinatorial "travelling salesman problem". To avoid the search getting stuck in a local minimum, they used a sigmoid curve (Fig. 11) as the threshold function, in which the parameter T is interpreted as "temperature". The search for a solution ("simulated annealing") begins at high temperatures (strong shaking of the energy landscape, so that local energy barriers can be overcome) and ends at T = 0.

Fig. 11: "Simulated annealing"
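The temperature-dependent sigmoid of Fig. 11 is easy to sketch. The particular net-input and temperature values are illustrative assumptions:

```python
import math

# Sigmoid threshold function with "temperature" T, as in the Boltzmann
# machine: at high T a stochastic unit fires almost at random; as T
# approaches 0 it freezes into the hard 0/1 threshold behaviour.

def p_fire(net_input, T):
    """Probability that a stochastic unit outputs 1 for this net input."""
    return 1.0 / (1.0 + math.exp(-net_input / T))

hot = p_fire(1.0, T=10.0)    # close to 0.5: barriers can be crossed
cold = p_fire(1.0, T=0.01)   # close to 1.0: deterministic threshold
```

An annealing schedule simply evaluates this function while lowering T step by step, which is the "cooling" of simulated annealing.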
Thus there is a certain probability of finding the optimal solution. K. Fukushima presented his "Neocognitron" in 1983: a hierarchically organized network links the outputs of trained "feature detectors" in higher-level layers and was thus able to correctly recognize handwritten characters (Fig. 12) despite deformation, positional shift, and noise.

Fig. 12: K. Fukushima (1983) NEOCOGNITRON: example of correctly recognized handwriting
In 1986, T. Sejnowski and C. Rosenberg achieved another success with "NETtalk". They taught their SNN to generate correct phoneme strings from written English text via a voice output unit. At first the network babbled like an infant; after each additional training cycle the pronunciation became clearer, until after about 50 runs only a few errors occurred. It then read unfamiliar text quite intelligibly and made the same mistakes as a beginner in English; the network had thus learned the pronunciation rules hidden in the examples. This performance is particularly impressive compared with the commercial program "DECtalk", whose algorithmic speech output took man-years of development effort, whereas the NETtalk project was implemented in a few weeks. In 1986, D. Rumelhart, J. McClelland and the "PDP Group" (to which G. Hinton and T. Sejnowski also belonged) published their research reports. The books became bestsellers, and the "backpropagation" learning strategy developed by D. Rumelhart, G. Hinton and R. Williams became the most popular for multilayer networks. The decisive factor is the processor elements within (one or more) "hidden" layers, whose weights can nevertheless be trained by the backpropagation strategy (which Minsky and Papert had considered impossible). The results are convincing; SNN have achieved their breakthrough. A paradigm shift is now taking place within AI, and SNN promise a way out of the existing crisis. Further basic research is being carried out intensively worldwide, appropriate funding is being made available, and the first commercial SNN applications are proving themselves in everyday life (e.g. as a [plastic] explosives detector for screening flight luggage).

1.7. Two examples of network learning

The minimal "backpropagation" network for solving the "XOR problem" consists of 3 layers: an input layer (for buffering the input values), a hidden layer with one processor element (PE), and an output layer with one PE. The PEs here work like the McCulloch-Pitts neuron (see Fig. 2). The numerical values at the connections indicate the input weights of the PEs. The additional input #1 (with the constant value 1) is noteworthy: the PE thresholds are trained via its assigned weights (see W0, Fig. 8). Before the start of training, all weights are assigned small random values (e.g. -0.2 .. +0.2). The XOR truth table with input values and (desired) output value is available for training. The learning strategy picks an input value pair and propagates the information forwards, taking into account the current weights. A result appears at the output that will probably differ from the desired one. The error is determined, and it is then traced backwards (from the output to the input) which weights had which influence on the result; each weight is then increased or decreased by a small amount in the direction that reduces the error. In the course of the training cycle, input value pairs are presented many times, propagated forwards, and corrected backwards, until all weights have been adjusted such that the desired result is achieved for every pattern. Since all possible patterns were used for training here, this network is redundancy-free and can therefore be fault-tolerant only to a limited extent.
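The procedure just described can be sketched in code. Note the assumptions: instead of the text's minimal net (one hidden PE plus direct input-output connections), this sketch uses three hidden PEs with no shortcut connections, and the learning rate, epoch count, and random seed are chosen for illustration:

```python
import math, random

# Backpropagation sketch for XOR: random start weights, forward pass,
# error at the output, backward correction of every weight.

random.seed(1)
sig = lambda x: 1.0 / (1.0 + math.exp(-x))

n_hid = 3
# each hidden PE has weights for x1, x2 and the constant "#1" bias input
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(n_hid)]
# output PE: one weight per hidden PE plus a bias weight
w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hid + 1)]

def forward(x1, x2):
    h = [sig(w[0] * x1 + w[1] * x2 + w[2]) for w in w_hid]
    y = sig(sum(wo * hi for wo, hi in zip(w_out, h)) + w_out[-1])
    return h, y

xor_table = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
eta = 0.5
for _ in range(20000):
    for (x1, x2), d in xor_table:
        h, y = forward(x1, x2)
        dy = (d - y) * y * (1 - y)              # output error signal
        for i, hi in enumerate(h):
            dh = dy * w_out[i] * hi * (1 - hi)  # hidden error signal
            w_hid[i][0] += eta * dh * x1
            w_hid[i][1] += eta * dh * x2
            w_hid[i][2] += eta * dh             # bias ("threshold") weight
            w_out[i] += eta * dy * hi
        w_out[-1] += eta * dy                   # output bias

predict = lambda x1, x2: round(forward(x1, x2)[1])
```

After training, `predict` reproduces the XOR truth table; with an unlucky random start, more epochs or a different seed may be needed.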
We now want to investigate a "1-out-of-n encoder" (Fig. 13) that has no direct connections from the input to the output. How many PEs do we need at least in the hidden layer?


Figure 13: 1-out-of-N encoder network
The result is that ld n (i.e. log2 n) PEs are required for coding n states; a minimal binary coding develops in them. Such a network can be used for data compression. Even more remarkable: in the hidden layer, internal states can be read off that could be used to explain the network's behaviour. During learning, self-organization (in its simplest form) can generally also be observed: if weights converge to zero, the corresponding connection has no effect and can be removed. Problems can also arise during training:
(a) As a result of "linear dependency" (different patterns use common weight paths), certain patterns may be impossible to learn.
(b) The order of the training examples can be decisive for the learning success.
(c) If a few patterns cannot be learned, one can try to include them in the training several times.

After the training phase has been successfully completed, the "knowledge" (possibly not explicitly known) is stored in the network, distributed over the weights. The SNN now masters the examples without errors, can generally generalize (interpolate and extrapolate), and is fault-tolerant. Depending on requirements, this network could then be implemented either in hardware (as a filter) or as a software module and integrated into applications.
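The "ld n" claim can be made concrete without training: n one-hot states fit through ceil(log2 n) hidden units if the hidden layer carries a binary code. Here the code is written by hand rather than learned, purely to show why log2 n units suffice; n = 4 is an illustrative assumption:

```python
import math

n = 4
n_hidden = math.ceil(math.log2(n))          # "ld n" in the text: 2 for n = 4

def encode(one_hot):
    """Hidden activations: binary code of the active input's index."""
    i = one_hot.index(1)
    return [(i >> b) & 1 for b in range(n_hidden)]

def decode(hidden):
    """Output layer: reconstruct the one-hot pattern from the code."""
    i = sum(bit << b for b, bit in enumerate(hidden))
    return [1 if j == i else 0 for j in range(n)]

ok = all(
    decode(encode([1 if j == i else 0 for j in range(n)]))
    == [1 if j == i else 0 for j in range(n)]
    for i in range(n)
)   # every pattern passes losslessly through the 2-unit bottleneck
```

A trained encoder network discovers an equivalent (possibly permuted) binary code by self-organization, which is exactly what can be read off in the hidden layer.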

1.8. State of research

Research on SNN is developing rapidly. There are a number of different approaches; describing them would go beyond the scope of this introduction. Figure 14 compiles important keywords for orientation. For theoretical familiarization, we recommend the literature given, from which further sources can be accessed.


Fig. 14: Final summary