A better Dropout! Implementing DropBlock in PyTorch




An interactive version of this article can be found here

DropBlock is available in glasses, my computer vision library!


Today we are going to implement DropBlock in PyTorch! DropBlock, introduced by Ghiasi et al., is a regularization technique specifically crafted for images that empirically works better than Dropout. But why is Dropout not sufficient?

The problem with Dropout on images

Dropout is a regularization technique that randomly drops (sets to zero) parts of the input before passing it to the next layer. If you are not familiar with it, I recommend these lecture notes from Stanford (jump to the dropout section). If we want to use it in PyTorch, we can directly import it from the library. Let’s see an example!
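Here is a minimal sketch of such an example. The 8x8 tensor of ones is just a dummy stand-in for an image, chosen so the dropped units are easy to spot.

```python
import torch
from torch import nn

torch.manual_seed(0)  # fixed seed so the example is repeatable

x = torch.ones(1, 1, 8, 8)   # dummy single-channel 8x8 "image"
dropout = nn.Dropout(p=0.5)  # in PyTorch, p is the probability of dropping a unit
dropout.train()              # dropout is only active in training mode

y = dropout(x)
# each entry is either 0 (dropped) or 2 (kept and rescaled by 1 / (1 - p))
print(y)
```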

Image by the Author

As you can see, random pixels of the input were dropped!

This technique works well on 1D data, but with 2D data, we can do better.

The main issue is that we are dropping independent pixels, which is not effective in removing semantic information because nearby activations contain closely related information. I think this is fairly intuitive: even if we zero out one element, its neighbors can still carry important information.
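To make this intuition a bit more concrete, here is a small sketch on synthetic data (not a real feature map). It draws an element-wise keep mask with a 0.5 keep probability and measures how many dropped units still have at least one surviving unit in their 3x3 neighbourhood. The map size and the neighbourhood size are arbitrary choices for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# keep[i, j] = 1 where the unit survives, 0 where element-wise dropout removed it
keep = torch.bernoulli(torch.full((1, 1, 32, 32), 0.5))

# A 3x3 max pool with stride 1 tells us whether any unit in the neighbourhood
# of (i, j) survived. Padded positions never count as survivors.
neighbour_kept = F.max_pool2d(keep, kernel_size=3, stride=1, padding=1)

dropped = keep == 0
fraction = neighbour_kept[dropped].mean().item()
print(f"dropped units with at least one surviving neighbour: {fraction:.1%}")
```

Almost every dropped unit still has a live neighbour, which is exactly why independent dropping removes so little information from an image-like input.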

Let’s explore what happens with the feature map. To do so, we first get a baby yoda image, then we create a pretrained resnet18 using glasses. Then we feed the image into the model and get the feature map from the second layer. Finally, we show the activation of the first channel with and without Dropout.

Image by the Author

On the left, we have the feature map’s activations; on the right, the activations of the same feature map after dropout. They look very similar: notice how in each region, even if some units are zero, the neighboring activations are still firing. This means information will still be propagated to the next layer, which is not ideal.


DropBlock solves this problem by dropping contiguous regions from a feature map; the following figure shows the main idea.

Image by Ghiasi et al.

DropBlock works as follows:

Image by Ghiasi et al.


We can start by defining a DropBlock layer with the correct parameters

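block_size is the side length of each square region we are going to drop from the input, and p is the keep probability (keep_prob in the paper). Note that this is the opposite of PyTorch's nn.Dropout, where p is the probability of dropping a unit.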
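A minimal sketch of such a layer could look like this. The default value of p is only an assumption for illustration; the layer does nothing yet beyond storing its parameters.

```python
from torch import nn


class DropBlock(nn.Module):
    def __init__(self, block_size: int, p: float = 0.5):
        super().__init__()
        self.block_size = block_size  # side length of each square region to drop
        self.p = p                    # keep probability (keep_prob in the paper)


layer = DropBlock(block_size=3, p=0.9)
print(layer.block_size, layer.p)
```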

So far so good. Now the tricky part: we need to compute gamma, which controls how many features to drop. If we want to keep every activation with probability p, we can sample from a Bernoulli distribution with mean 1 - p, like in Dropout. The problem is that each sampled point sets block_size ** 2 units to zero, so sampling with mean 1 - p would drop far too many units.

Gamma is computed using

Image by Ghiasi et al. (eq 1 in the paper)
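For reference, eq. 1 of the paper can be written out as follows, using p for the keep probability. The name feat_size comes from the paper and denotes the side length of the (square) feature map.

```latex
\gamma = \frac{1 - p}{\mathit{block\_size}^2} \cdot \frac{\mathit{feat\_size}^2}{(\mathit{feat\_size} - \mathit{block\_size} + 1)^2}
```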

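The left-hand factor of the multiplication divides the drop probability 1 - p by the block_size ** 2 units that each sampled point sets to zero. The right-hand factor accounts for the valid region: the positions where a block can be centered so that it fits entirely inside the feature map.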
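As a sketch, the formula translates to a few lines of plain Python. The function name and the feat_size argument are my own choices here, and the feature map is assumed to be square.

```python
def calculate_gamma(p: float, block_size: int, feat_size: int) -> float:
    """Eq. 1 of the paper, with p as the keep probability."""
    # drop probability spread over the block_size ** 2 units each seed removes
    per_unit = (1 - p) / block_size ** 2
    # ratio of all positions to the valid positions where a whole block fits
    valid_ratio = feat_size ** 2 / (feat_size - block_size + 1) ** 2
    return per_unit * valid_ratio


gamma = calculate_gamma(p=0.9, block_size=3, feat_size=8)
print(f"{gamma:.4f}")  # 0.0198
```

Notice how small gamma is compared to 1 - p = 0.1: each sampled point will later grow into a 3x3 block, so far fewer points need to be sampled.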


The next step is to sample a mask M with the same size as the input from a Bernoulli distribution with mean gamma. In PyTorch, torch.bernoulli does exactly this.
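A minimal sketch of the sampling step, with a dummy input and a placeholder value for gamma:

```python
import torch

torch.manual_seed(0)

x = torch.ones(1, 1, 8, 8)  # dummy input
gamma = 0.1                 # placeholder; use the gamma computed for your layer

# each entry of mask is 1 with probability gamma and 0 otherwise
mask = torch.bernoulli(torch.full_like(x, gamma))
print(mask.shape, mask.sum().item())
```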

Next, we need to zero out regions of size block_size. We can use max pooling with kernel_size equal to block_size and a stride of one pixel to create them. Remember that mask is a binary mask (only 0s and 1s), so when the max pool sees a 1 inside its kernel window it outputs a 1. With a stride of 1, we ensure that a region of size block_size x block_size is created in the output wherever at least one unit in the input was set to 1. Since we want to zero these regions out, we need to invert the result.
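The following sketch shows this on a hand-written mask with a single sampled point, so the resulting block is easy to see. It assumes an odd block_size, for which a padding of block_size // 2 keeps the spatial size unchanged.

```python
import torch
import torch.nn.functional as F

block_size = 3  # odd, so padding of block_size // 2 preserves the spatial size

# a hand-written mask with a single sampled point at row 3, column 4
mask = torch.zeros(1, 1, 8, 8)
mask[0, 0, 3, 4] = 1.0

# with stride 1, every position whose window contains the sampled point becomes 1
dropped = F.max_pool2d(mask, kernel_size=block_size, stride=1, padding=block_size // 2)

# invert: 0 where we drop, 1 where we keep
mask_block = 1 - dropped
print(mask_block[0, 0])
```

The printed mask is all ones except for a 3x3 square of zeros centered on the sampled point.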

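Then we normalize the output: we multiply it by the total number of units in the mask divided by the number of units kept, so the overall scale of the activations is preserved.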
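Putting the steps together, a full layer could be sketched as below. A few details are my own assumptions rather than something stated above: the feature map is taken to be square, the mask is cropped so that an even block_size does not change the output shape, and the denominator is clamped to avoid a division by zero in the unlikely case that everything is dropped.

```python
import torch
import torch.nn.functional as F
from torch import Tensor, nn


class DropBlock(nn.Module):
    def __init__(self, block_size: int, p: float = 0.5):
        super().__init__()
        self.block_size = block_size
        self.p = p  # keep probability

    def calculate_gamma(self, x: Tensor) -> float:
        feat_size = x.shape[-1]  # assumes a square feature map
        per_unit = (1 - self.p) / self.block_size ** 2
        valid_ratio = feat_size ** 2 / (feat_size - self.block_size + 1) ** 2
        return per_unit * valid_ratio

    def forward(self, x: Tensor) -> Tensor:
        if not self.training:
            return x  # like Dropout, do nothing at inference time
        gamma = self.calculate_gamma(x)
        mask = torch.bernoulli(torch.full_like(x, gamma))
        mask_block = 1 - F.max_pool2d(
            mask, kernel_size=self.block_size, stride=1, padding=self.block_size // 2
        )
        # an even block_size yields one extra row/column; crop back to x's shape
        mask_block = mask_block[..., : x.shape[-2], : x.shape[-1]]
        # rescale by total / kept so the average activation stays roughly constant
        scale = mask_block.numel() / mask_block.sum().clamp(min=1.0)
        return x * mask_block * scale


torch.manual_seed(0)
layer = DropBlock(block_size=3, p=0.9)
x = torch.ones(2, 4, 16, 16)

layer.train()
y = layer(x)  # blocks of zeros, remaining units scaled up

layer.eval()
z = layer(x)  # unchanged at inference time
print(y.shape, (y == 0).float().mean().item())
```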


Let’s test it with baby yoda. For simplicity, we are going to show the dropped units in the first channel.

Image by the Author

Looking good! Let’s see a feature map from a pretrained model (like before).

Image by the Author

We successfully zeroed out contiguous regions and not only individual units.

By the way, DropBlock behaves like Dropout when block_size = 1 and like Dropout2d (aka SpatialDropout) when block_size covers the full feature map.
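Both special cases can be read off eq. 1 of the paper by substituting the block size. This is a sketch under the paper's formulation, where block centers are only sampled inside the valid region; feat_size denotes the side length of the feature map.

```latex
\mathit{block\_size} = 1:
\quad \gamma = \frac{1 - p}{1^2} \cdot \frac{\mathit{feat\_size}^2}{\mathit{feat\_size}^2} = 1 - p

\mathit{block\_size} = \mathit{feat\_size}:
\quad \gamma = \frac{1 - p}{\mathit{feat\_size}^2} \cdot \frac{\mathit{feat\_size}^2}{1^2} = 1 - p
```

In the first case every unit is dropped independently with probability 1 - p, as in Dropout. In the second case the valid region is a single position, so the whole feature map of a channel is dropped with probability 1 - p, as in SpatialDropout.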


Now we know how to implement DropBlock in PyTorch, a cool regularization technique. The paper shows different empirical results. The authors use a vanilla resnet50 and iteratively add different regularization techniques, as shown in the following table.

As you can see, `ResNet-50 + DropBlock` achieves about +1% compared to SpatialDropout (the classic `Dropout2d` in PyTorch).

The paper contains more studies with different DropBlock hyperparameters. If you are interested, have a look 🙂

Thank you for reading!



