A better Dropout! Implementing DropBlock in PyTorch
An interactive version of this article can be found here
DropBlock is available on glasses in my computer vision library!
Today we are going to implement DropBlock in PyTorch! DropBlock, introduced by Ghiasi et al., is a regularization technique specifically crafted for images that empirically works better than Dropout. But why is Dropout not sufficient?
The problem with Dropout on images
Dropout is a regularization technique that randomly drops (sets to zero) parts of the input before passing it to the next layer. If you are not familiar with it, I recommend these lecture notes from Stanford (jump to the dropout section). If we want to use it in PyTorch, we can import it directly from the library. Let’s see an example!
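A minimal sketch of what such an example might look like, applying `nn.Dropout` to a toy image-shaped tensor (the tensor values here are placeholders, not the article's actual image):

```python
import torch
from torch import nn

torch.manual_seed(0)

# A toy "image" batch: (batch, channels, height, width)
x = torch.ones(1, 3, 8, 8)

# Dropout zeroes each element independently with probability p
# and scales the survivors by 1 / (1 - p) to keep the expected value.
drop = nn.Dropout(p=0.5)
drop.train()  # dropout is only active in training mode

out = drop(x)
print(out[0, 0])  # random pixels are zero, the rest are scaled up
```

Note that a freshly constructed module is already in training mode; calling `.eval()` would turn dropout into a no-op.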
As you can see, random pixels of the input were dropped!
This technique works well on 1D data, but with 2D data, we can do better.
The main issue is that we are dropping independent pixels, and this is not effective at removing semantic information, because nearby activations contain closely related information. This is fairly intuitive: even if we zero out one element, its neighbors can still carry important information.
Let’s explore what happens with the feature map. In the following code, we first get a baby yoda image, then create a pretrained resnet18 using glasses. We then feed in the image and grab the feature map from the second layer. Finally, we show the activations of the first channel with and without dropout.
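A self-contained sketch of this step, with assumptions stated up front: the article uses a pretrained resnet18 from `glasses` and a real image, but here a tiny stand-in CNN and a random tensor keep the example runnable without external downloads. The hook-based feature extraction is the part being illustrated:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in for a pretrained resnet18: two conv "layers" (hypothetical model)
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # "layer 1"
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # "layer 2"
    nn.ReLU(),
)

x = torch.randn(1, 3, 64, 64)  # stand-in for the input image

# Grab the feature map after the second conv with a forward hook.
features = {}
model[2].register_forward_hook(lambda m, i, o: features.update(layer2=o.detach()))

with torch.no_grad():
    model(x)

fmap = features["layer2"]               # the raw feature map, (1, 32, 16, 16)
fmap_dropped = nn.Dropout(p=0.5)(fmap)  # the same map after dropout
```

Plotting the first channel of `fmap` next to `fmap_dropped[0, 0]` reproduces the side-by-side comparison the article describes.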
On the left, we have the feature map’s activations; on the right, the activations of the same feature map after dropout. They look very similar: notice how in each region, even if some units are zero, the neighbors’ activations are still firing. This means information will still be propagated to the next layer, and that’s not ideal.
DropBlock solves this problem by dropping contiguous regions from a feature map; the following figure shows the main idea.
DropBlock works as follows. We can start by defining a DropBlock layer with the correct parameters: block_size is the size of each region we are going to drop from the input, and p is the keep_prob, like in Dropout.
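A sketch of that layer definition (the class name `DropBlock2d` is our own choice, not necessarily the article's):

```python
import torch
from torch import nn

class DropBlock2d(nn.Module):
    """Sketch of a DropBlock layer: just the parameters for now."""

    def __init__(self, block_size: int = 7, p: float = 0.5):
        super().__init__()
        self.block_size = block_size  # side of each square region to drop
        self.p = p                    # keep probability (keep_prob in the paper)
```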
So far so good. Now the tricky part: we need to compute gamma, which controls the number of features to drop. If we wanted to keep every activation with probability p, we could sample from a Bernoulli distribution with mean 1 - p, like in Dropout. The problem is that here we are setting block_size ** 2 units to zero at a time.
Gamma is computed as

gamma = ((1 - p) / block_size²) * (feat_size² / (feat_size - block_size + 1)²)

The left-hand side of the multiplication is the number of units that will be set to zero, normalized by the block area, while the right-hand side accounts for the valid region: the pixels where a block’s center can land without the block spilling outside the feature map.
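This translates directly into a small helper (the function name is our own):

```python
def calculate_gamma(p: float, block_size: int, feat_size: int) -> float:
    """Gamma following the DropBlock paper:
    (1 - p) / block_size**2  *  feat_size**2 / (feat_size - block_size + 1)**2
    """
    drop_rate = (1 - p) / block_size ** 2                       # per-block drop rate
    valid = feat_size ** 2 / (feat_size - block_size + 1) ** 2  # valid-region correction
    return drop_rate * valid
```

Sanity check: with block_size = 1 the correction term is 1 and gamma reduces to 1 - p, exactly the Dropout case.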
The next step is to sample a mask $M$ with the same size as the input from a Bernoulli distribution with mean gamma; in PyTorch this is as easy as
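A sketch of that sampling step, using a random tensor as a stand-in for the real input:

```python
import torch

torch.manual_seed(42)

gamma = 0.05
x = torch.randn(1, 3, 32, 32)  # stand-in input feature map

# Each entry of the mask is 1 with probability gamma, 0 otherwise.
mask = torch.bernoulli(torch.full_like(x, gamma))
```

Each 1 in `mask` marks the center of a region that will be dropped.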
Next, we need to expand these sampled units into regions of size block_size. We can use max pooling with kernel_size equal to block_size and a one-pixel stride. Remember that the mask is binary (only 0s and 1s), so whenever max pool sees a 1 inside its kernel it outputs a 1; using a stride of 1 ensures that a region of size block_size x block_size is created in the output wherever at least one unit in the input was set to 1. Since we want to zero those regions out, we then need to invert the mask. In PyTorch
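A sketch of the max-pool expansion and inversion (variable names are our own):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

block_size = 3
x = torch.randn(1, 1, 8, 8)
mask = torch.bernoulli(torch.full_like(x, 0.1))  # 1s mark block centers

# kernel_size=block_size with stride=1 grows every sampled 1 into a
# block_size x block_size square; padding=block_size // 2 preserves the size.
expanded = F.max_pool2d(mask, kernel_size=block_size, stride=1,
                        padding=block_size // 2)

# Invert: 1 means keep, 0 means drop.
keep_mask = 1 - expanded
```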
Then we normalize the output: we multiply by the total number of units in the mask divided by the number of kept (non-zero) units, so the magnitude of the activations is preserved on average.
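Putting all the steps together, here is a hedged end-to-end sketch of the layer (method and class names are our own; this follows the steps above, not necessarily the article's exact code):

```python
import torch
from torch import nn
import torch.nn.functional as F

class DropBlock2d(nn.Module):
    """End-to-end sketch of DropBlock: gamma, Bernoulli mask,
    max-pool expansion, inversion, and normalization."""

    def __init__(self, block_size: int = 7, p: float = 0.9):
        super().__init__()
        self.block_size = block_size
        self.p = p  # keep probability

    def calculate_gamma(self, x: torch.Tensor) -> float:
        feat_size = x.shape[-1]
        return ((1 - self.p) / self.block_size ** 2) * (
            feat_size ** 2 / (feat_size - self.block_size + 1) ** 2
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training:
            return x  # like Dropout, DropBlock is a no-op at inference time
        gamma = self.calculate_gamma(x)
        # 1s mark the centers of the regions that will be dropped
        mask = torch.bernoulli(torch.full_like(x, gamma))
        # grow each center into a block_size x block_size square, then invert
        mask = 1 - F.max_pool2d(mask, kernel_size=self.block_size,
                                stride=1, padding=self.block_size // 2)
        # rescale so the expected magnitude of the activations is preserved
        return x * mask * (mask.numel() / mask.sum())
```

In `.eval()` mode the layer returns the input untouched, matching Dropout's train/eval behavior.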
Let’s test it with baby yoda; for simplicity, we are going to show the dropped units in the first channel.
Looking good, let’s see a feature map from a pretrained model (like before)
We successfully zeroed out contiguous regions, not only individual units.
By the way, DropBlock is equal to Dropout when block_size = 1, and to Dropout2d (aka SpatialDropout) when block_size covers the full feature map.
Now we know how to implement DropBlock in PyTorch, a cool regularization technique. The paper reports several empirical results: the authors take a vanilla ResNet-50 and iteratively add different regularization techniques, as shown in the following table.
As you can see, `ResNet-50 + DropBlock` achieves about +1% accuracy compared to SpatialDropout (the classic `Dropout2d` in PyTorch).
The paper contains more studies with different DropBlock hyperparameters; if you are interested, have a look 🙂
Thank you for reading!