Comprehend Dropout: Deep Learning by doing toy examples
Dropout is one of the main regularization techniques in deep neural networks. This story helps you deeply understand what Dropout is and how it works.
In deep learning, and especially in object detection, overfitting can easily happen. Overfitting means the model is complex enough to fit the training set very well while failing to generalize to the test set. In object detection, failing can mean that the model reports detections on what is only noise in a test image.
In object detection, it is common to train with a pretrained backbone or to continue training a pretrained model. In these setups, the validation loss often rises above the training loss after a few epochs. In such cases, adding a Dropout layer can help.
In PyTorch, we can add a Dropout layer simply by:
from torch import nn
dropout = nn.Dropout(p=0.2)
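As a quick illustration of what this layer does when it is called, here is a minimal sketch. The input tensor of ones and the fixed seed are arbitrary choices made for the demonstration. In training mode, nn.Dropout zeroes each element with probability p and scales the survivors by 1 / (1 - p); in evaluation mode it returns its input unchanged.

```python
import torch
from torch import nn

torch.manual_seed(0)  # only to make the random mask repeatable

dropout = nn.Dropout(p=0.2)
x = torch.ones(10)  # arbitrary demo input

# Training mode: each element is zeroed with probability p = 0.2,
# and the surviving elements are scaled by 1 / (1 - p) = 1.25.
dropout.train()
y_train = dropout(x)

# Evaluation mode: the layer acts as the identity.
dropout.eval()
y_eval = dropout(x)

print(y_train)  # a mix of 0.0 and 1.25 values
print(y_eval)   # all ones
```

The rescaling keeps the expected value of each activation the same in training and evaluation, which is why no extra correction is needed at inference time.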
But what happens under the hood?
The Dropout Regularization Scheme
The Dropout technique creates a sub-network from the original network by randomly keeping some neurons in the hidden layers and dropping the rest. The selection is made by resampling the nodes (only the nodes in the hidden layers), which defines a binary mask for each layer: a 1 keeps the node and a 0 drops it.
Dropout is not applied to bias nodes! Dropout is a regularization technique, and the idea is to reduce overfitting caused by the weights. Bias nodes do not receive any input, so dropping them out does not help to improve the predictions.
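The mask sampling described above can be sketched in a few lines of NumPy. The layer sizes, keep probability, activations, weights, and bias below are all hypothetical values chosen for illustration, and this sketch applies the plain 0/1 mask without the 1 / (1 - p) rescaling that nn.Dropout performs.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for repeatability

keep_prob = 0.8        # hypothetical: each hidden node is kept with probability 0.8
hidden_sizes = [3, 3]  # hypothetical sizes of two hidden layers

# One binary mask per hidden layer: 1 keeps a node, 0 drops it.
masks = [(rng.random(n) < keep_prob).astype(float) for n in hidden_sizes]

# Hypothetical activations of the first hidden layer and the
# weights and bias of the layer that follows it.
h = np.array([0.5, 1.0, 2.0])
W = np.ones((2, 3))
b = np.array([0.1, -0.1])

# The mask multiplies the hidden activations only;
# the bias is added unmasked.
z = W @ (h * masks[0]) + b

print(masks[0])
print(z)
```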
Consider the following fully connected network, wherein the activation function is the ReLU. (To see how the network is related to matrices, watch the above GIF.)
In this example, the goal is to predict for x = (1 , 1) using the following dropout masks.
To compute the prediction, we should calculate the values in the hidden layer.
The first hidden layer:
where g⁽¹⁾ is the ReLU activation function, then
Note: The first dropout mask, μ⁰, is one for every node, so it has no effect on the result.
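For reference, the layer-by-layer computation can be written in one general form. This is a sketch under an assumed convention, suggested by the note above: the mask μ with index l - 1 multiplies the input to layer l element-wise, and the bias is left unmasked. The symbols W⁽ˡ⁾, b⁽ˡ⁾, and h⁽ˡ⁾ for the weights, biases, and activations of layer l are notation introduced here, not taken from the original figures.

```latex
h^{(0)} = x, \qquad
h^{(l)} = g^{(l)}\!\left( W^{(l)} \left( h^{(l-1)} \odot \mu^{l-1} \right) + b^{(l)} \right),
\quad l = 1, 2, 3
```

With μ⁰ equal to one for every node, the first layer reduces to h⁽¹⁾ = g⁽¹⁾(W⁽¹⁾x + b⁽¹⁾), which is the usual forward pass without dropout.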
The second hidden layer:
where g⁽²⁾ is the ReLU activation function, then
The third hidden layer:
where g⁽³⁾ is the ReLU activation function, then
The output layer:
Compute the prediction if the dropout masks are:
The final answer is:
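The same kind of computation can be sketched end to end in code. All weights, biases, and masks below are made-up placeholders rather than the values of this example, so the printed number is not the answer to the exercise. The function name forward_with_masks, the linear output layer, and the layer widths are also assumptions made for the sketch.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward_with_masks(x, weights, biases, masks):
    """Forward pass where masks[l] multiplies the input to layer l + 1."""
    h = x
    last = len(weights) - 1
    for l, (W, b, mu) in enumerate(zip(weights, biases, masks)):
        z = W @ (h * mu) + b             # the bias is never masked
        h = z if l == last else relu(z)  # linear output layer (assumption)
    return h

x = np.array([1.0, 1.0])

# Hypothetical parameters: three hidden layers of width 2, one output.
weights = [
    np.array([[1.0, 1.0], [1.0, -1.0]]),
    np.array([[1.0, 2.0], [-1.0, 1.0]]),
    np.array([[1.0, 1.0], [2.0, -1.0]]),
    np.array([[3.0, -1.0]]),
]
biases = [
    np.array([0.0, 1.0]),
    np.array([0.0, 3.0]),
    np.array([0.0, 0.0]),
    np.array([0.5]),
]

# Hypothetical masks: the first one is all ones, as in the note above.
masks = [
    np.array([1.0, 1.0]),
    np.array([1.0, 0.0]),
    np.array([0.0, 1.0]),
    np.array([1.0, 1.0]),
]

y = forward_with_masks(x, weights, biases, masks)
y_full = forward_with_masks(x, weights, biases, [np.ones(2)] * 4)

print(y)       # [3.5]
print(y_full)  # [12.5]
```

Comparing y with y_full shows how much the prediction of the sub-network can differ from that of the full network for the same input.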
This story helps you understand the dropout technique. I plan to add more toy examples like this one for Machine Learning and Deep Learning, so stay tuned if you would like to read more 😊.