In AI what is the dropout technique

Neural Networks¶

Artificial neural networks are information-processing systems whose structure and functionality are reminiscent of the nervous system, and in particular the brain, of humans and animals. A neural network typically consists of a large number of simple units that work in parallel, the so-called neurons. These neurons send information to each other in the form of activation signals via weighted connections. A neural network is arranged in several layers: the input layer, the output layer, and the “hidden layers” in between.

Let's consider an example on property prices:

A neural network can be illustrated using the example of estimating house prices. We have a number of inputs, e.g. characteristics such as the number of bedrooms, the presence of a swimming pool (yes or no), whether there is a garden, and the size of the living space. The inputs are connected to the hidden layers in between, and the hidden layers can in turn be connected to further layers. In the end this produces the output, which in our example is the estimated house price. Each hidden unit is built from a different combination of inputs, resulting in combinations of combinations, and so on. The function behind this is very complex, which is why we speak of a “black box” in this context: each individual calculation step could only be traced with enormous effort.

Classic neural networks work very well in the setting outlined here. But what about the processing of images? This is where a conventional, fully connected neural network reaches its limits. For an image with 7 million pixels, for example, we would have an enormous number of inputs and a correspondingly enormous number of weights to learn. This can hardly be handled even with huge computing clusters. Another method must be used for this [3,4,7]:
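The house-price example above can be sketched as a single forward pass through a tiny fully connected network. All weights, biases, and input values below are made-up illustrative numbers, not a trained model:

```python
# Minimal sketch of one forward pass through a tiny fully connected
# network for house-price estimation. Weights and inputs are invented
# for illustration only; a real network would learn them from data.

def relu(x):
    return max(0.0, x)

def forward(features, hidden_weights, hidden_biases, out_weights, out_bias):
    # Each hidden unit combines *all* inputs with its own set of weights.
    hidden = [
        relu(sum(w * f for w, f in zip(ws, features)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    # The output unit combines the hidden activations into one price.
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias

# Inputs: bedrooms, pool (0/1), garden (0/1), living space in m^2.
features = [3, 1, 1, 120]
hidden_weights = [[10.0, 5.0, 2.0, 0.5], [1.0, 20.0, 10.0, 0.1]]
hidden_biases = [0.0, 0.0]
out_weights = [1000.0, 500.0]
out_bias = 50000.0

price = forward(features, hidden_weights, hidden_biases, out_weights, out_bias)
print(price)  # prints 169500.0
```

This also shows why the network is a “black box”: even with only two hidden units, the output is already a nested combination of all inputs.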

Convolutional Neural Networks (CNN) / Deep Learning¶

Convolutional Neural Networks (CNNs) extract localized features from input images by sliding filters over small regions of the image. The input to a convolutional layer is an m x m x r image, where m is the height and width of the image and r is the number of channels. For example, an RGB image has r = 3 channels. This data is passed through several layers and repeatedly filtered and subsampled [8,10].
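The basic operation of a convolutional layer can be sketched in a few lines: a small filter (kernel) slides over a single-channel image and computes a weighted sum at each position. The image and kernel values below are illustrative; in a CNN the kernel values are learned during training:

```python
# Sketch of one 3x3 convolution filter sliding over a grayscale
# (single-channel) image. Kernel values are illustrative; a CNN
# would learn them during training.

def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[i + u][j + v] * kernel[u][v]
                for u in range(kh)
                for v in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# A 4x4 image with a vertical edge, and a simple edge-detection kernel.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
print(convolve2d(image, kernel))  # [[27, 27], [27, 27]]
```

Because the same small kernel is reused at every position, the layer has far fewer parameters than a fully connected layer over all pixels.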

The last layer outputs a score for each image class that represents the probability that the input belongs to that class. Some of the layers and parameters used are briefly explained below [9].


The batch size defines how many images are processed per weight update [9].

Activation functions¶

In this work, the activation functions sigmoid and ReLU are used via their TensorFlow implementations.

The sigmoid function looks like this: σ(x) = 1 / (1 + e^(-x))

The ReLU (Rectified Linear Unit) function, f(x) = max(0, x), is the preferred activation function in CNNs today.

The sigmoid function only covers the range (0, 1), while ReLU has the range [0, ∞). The sigmoid function can therefore be used to model probabilities, whereas ReLU can represent all non-negative real numbers. The main advantage of the ReLU function is that, when training CNNs, it does not suffer from the “vanishing gradient” problem that can occur with the sigmoid function. Furthermore, it has been found that CNNs can be trained more efficiently using ReLU [1,5,6].
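The vanishing-gradient point can be made concrete by comparing the derivatives of the two functions for a large input. This is a small illustrative sketch, not the TensorFlow implementation:

```python
import math

# Sketch comparing the gradients of sigmoid and ReLU, illustrating why
# sigmoid gradients can "vanish" for large inputs while ReLU's do not.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # maximum value is 0.25, reached at x = 0

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # constant 1 for all positive inputs

print(sigmoid_grad(10.0))  # tiny: the gradient has almost vanished
print(relu_grad(10.0))     # 1.0
```

When many layers are chained, these per-layer gradients multiply, so sigmoid's near-zero gradients shrink the training signal exponentially with depth, while ReLU passes it through unchanged for positive activations.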


Max pooling is a sample-based discretization process. The aim is to downsample an input representation (image, hidden-layer output matrix, etc.), reducing its dimensionality and allowing assumptions to be made about the features contained in the sub-regions. Max pooling reduces the number of parameters to be learned in subsequent layers, and thus also the computational effort.
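The common 2x2 variant can be sketched directly: each non-overlapping 2x2 block of the input is reduced to its maximum value, quartering the number of values passed on:

```python
# Sketch of 2x2 max pooling with stride 2: each 2x2 block of the
# input is replaced by its maximum, so a 4x4 map becomes 2x2.

def max_pool_2x2(matrix):
    return [
        [
            max(
                matrix[i][j], matrix[i][j + 1],
                matrix[i + 1][j], matrix[i + 1][j + 1],
            )
            for j in range(0, len(matrix[0]), 2)
        ]
        for i in range(0, len(matrix), 2)
    ]

feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [1, 2, 0, 1],
    [3, 4, 5, 6],
]
print(max_pool_2x2(feature_map))  # [[6, 4], [4, 6]]
```

Keeping only the maximum of each region makes the representation somewhat robust to small shifts of a feature within that region.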


Flatten and Dense¶

The classifier is the final step in a CNN. This is the dense layer, a common classifier for neural networks, which follows the last pooling layer. In a dense layer, every node is connected to every node in the previous layer. Like any classifier, it needs individual features, i.e. a feature vector. To obtain one, the multidimensional output of the convolutions must be converted into a one-dimensional vector. This process is called “flattening” [12].
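Flattening itself is a purely mechanical reshaping step, sketched here on two invented 2x2 feature maps:

```python
# Sketch of "flattening": the multidimensional convolution/pooling
# output is unrolled into a one-dimensional feature vector that the
# dense layer can consume.

def flatten(feature_maps):
    # feature_maps: list of 2D maps -> one flat list of values
    return [
        value
        for fmap in feature_maps
        for row in fmap
        for value in row
    ]

maps = [
    [[1, 2], [3, 4]],  # first 2x2 feature map
    [[5, 6], [7, 8]],  # second 2x2 feature map
]
print(flatten(maps))  # [1, 2, 3, 4, 5, 6, 7, 8]
```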


Overfitting can be a huge problem in CNNs with a large number of parameters. Dropout is a technique to counteract this. The idea is as follows: during training, units together with their input and output connections are randomly removed from the network (“dropped out”). This prevents the units from “co-adapting” too closely to each other. If possible, the units should differ from one another so that their individual characteristics come to light.

Using dropout means creating “thinned-out” samples of the network. A thinned-out network consists of all units that survived the dropout. A neural network with n units can thus be viewed as a collection of 2^n possible thinned-out networks [12].
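The training-time mechanism can be sketched in a few lines. This is the “inverted dropout” variant (the one used by common frameworks, including TensorFlow's dropout layer), where surviving activations are scaled by 1/keep_prob so the expected layer output stays the same; the keep probability and activations below are illustrative:

```python
import random

# Sketch of inverted dropout during training: each activation is kept
# with probability keep_prob and scaled by 1/keep_prob; the rest are
# zeroed out ("dropped"). keep_prob = 0.5 is an illustrative choice.

def dropout(activations, keep_prob=0.5, rng=random):
    return [
        a / keep_prob if rng.random() < keep_prob else 0.0
        for a in activations
    ]

random.seed(0)  # fixed seed so the sketch is reproducible
layer_output = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8]
print(dropout(layer_output))  # roughly half the units are zeroed out
```

Each training step thus samples one of the 2^n thinned-out networks; at test time dropout is disabled and the full network is used, which (thanks to the 1/keep_prob scaling) approximates averaging over all sampled sub-networks.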