Explaining how I reached the top ranks of the new Data-Centric competition



2.5 Using the “label book” pictures to train the model

The organizers from DeepLearning.ai provided a set of 52 pictures, absent from the “train” and “validation” folders, to evaluate the model’s performance once the ResNet50 training was over.

It was a good way to get a sense of how the model would perform on the final, hidden dataset. However, thanks to the scores displayed on the leaderboard, I had also “guessed” that the final evaluation on the hidden dataset included 2,420 pictures (see the corresponding notebook here). So 52 pictures were not very representative anyway!

So I simply included these pictures in my training folder! The more, the merrier 😁

2.6 Evaluating the impact of the augmentation techniques

As you might know, it is quite common to apply augmentation techniques to a picture dataset to help deep learning models identify the features that allow them to properly infer the classes.

I decided to consider a few of them:

  • Horizontal and Vertical Symmetries
  • Clockwise and Anti-clockwise rotations (10° and 20°)
  • Horizontal and Vertical Translations
  • Cropping the white areas in the pictures
  • Adding synthetic “salt and pepper” noise
  • Transferring noise from some pictures to others

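As a rough illustration of the simplest items in this list, here is a minimal NumPy sketch of symmetries and translations. It is not the code used in the competition: the function names `flip` and `translate`, the grayscale `uint8` layout, and the white fill value of 255 are assumptions made for the example. Rotations are left out because they are easier to do with an imaging library such as PIL.

```python
import numpy as np


def flip(image, axis):
    """Mirror a picture: axis=1 is a horizontal symmetry, axis=0 a vertical one."""
    return np.flip(image, axis=axis)


def translate(image, dx, dy, fill=255):
    """Shift a picture by dx columns and dy rows, padding the gap with `fill`.

    Assumption: the background is white (255), so the uncovered area is
    filled with white pixels.
    """
    height, width = image.shape[:2]
    shifted = np.full_like(image, fill)
    if abs(dx) >= width or abs(dy) >= height:
        return shifted  # the picture was pushed entirely out of the frame
    # Source and destination windows that stay inside the picture.
    src_rows = slice(max(0, -dy), min(height, height - dy))
    dst_rows = slice(max(0, dy), min(height, height + dy))
    src_cols = slice(max(0, -dx), min(width, width - dx))
    dst_cols = slice(max(0, dx), min(width, width + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted
```

A positive `dx` moves the content to the right and a positive `dy` moves it down; both functions return a new array and leave the input untouched.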
2.7 Implementing the custom functions

The first transformations are quite simple and easily implemented with PIL, OpenCV, or even “packaged solutions” such as ImgAug. I thought it would be more interesting to share some tips regarding some of the custom functions I designed 😀

2.7.1 Squared Cropping Function

The cropping operation is an interesting one! As the picture will, ultimately, be converted to a 32×32 picture, it might be better to zoom in on the area where the number is located.

However, if the area around the number is not square, the result can be distorted when converted to 32×32 (as shown below). I redesigned the function so that the cropped output always has a square shape, which avoids this distortion effect:

Original and Cropped Possible Outputs — Image by Author
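One possible way to guarantee a square output is sketched below. This is an assumption about how such a function could work, not the author's implementation: it takes the bounding box of the non-white pixels and pastes it, centered, onto a white square canvas. The name `square_crop` and the `white_threshold` and `fill` parameters are hypothetical.

```python
import numpy as np


def square_crop(image, white_threshold=250, fill=255):
    """Crop a grayscale picture around its non-white pixels, as a square.

    Assumptions: `image` is a 2-D uint8 array with a light background, and
    any pixel darker than `white_threshold` belongs to the number.
    """
    rows, cols = np.nonzero(image < white_threshold)
    if rows.size == 0:
        return image  # nothing but white: leave the picture untouched
    # Tight bounding box around the drawing.
    crop = image[rows.min():rows.max() + 1, cols.min():cols.max() + 1]
    crop_h, crop_w = crop.shape
    # Pad the shorter dimension with white so the result is square.
    side = max(crop_h, crop_w)
    canvas = np.full((side, side), fill, dtype=image.dtype)
    top = (side - crop_h) // 2
    left = (side - crop_w) // 2
    canvas[top:top + crop_h, left:left + crop_w] = crop
    return canvas
```

Because the output is already square, a later resize to 32×32 scales both dimensions by the same factor and keeps the proportions of the number.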

2.7.2 “Salt and Pepper” Function

As the background is probably not always plain white on the final evaluation dataset, I tried to augment pictures by adding a synthetic background.

I used a “salt and pepper” function which basically inserts “0” and “1” values at random positions in the NumPy arrays describing the pictures:

Original and “Seasoned” picture — Image by Author
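A minimal sketch of salt and pepper noise follows. It assumes `uint8` pictures, so the defaults are 0 (pepper) and 255 (salt); for arrays scaled to the 0 to 1 range, pass `salt=1`. The function name, the `amount` parameter, and its default value are illustrative choices, not values taken from the article.

```python
import numpy as np


def salt_and_pepper(image, amount=0.05, pepper=0, salt=255, rng=None):
    """Set a random fraction `amount` of the pixels to pure black or white.

    Roughly half of the affected pixels become `pepper` and the other
    half become `salt`. The input array is not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.copy()
    # One uniform draw per pixel decides whether, and how, it is altered.
    draws = rng.random(image.shape[:2])
    noisy[draws < amount / 2] = pepper
    noisy[draws > 1 - amount / 2] = salt
    return noisy
```

Passing a seeded generator, for example `rng=np.random.default_rng(0)`, makes the augmentation reproducible.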

2.7.3 Background Noise Transfer Function

I was not fully happy with the results of the “Salt and Pepper” function, as the noise was always homogeneous, so I came up with another way to add noise to the pictures.

I recycled some of the pictures that I had originally considered unreadable and turned them into “noisy background” bases. There were also some pictures with a “heavy background” from which I removed the number (as shown below) to get more samples.

This gave me a “noisy backgrounds bank” of 10 pictures that I added randomly to some pictures after applying horizontal or vertical symmetries:

Noisy Background Bank — Image by Author
Example of a background noise transfer — Image by Author
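The article does not say how the backgrounds were merged with the pictures, so the sketch below is only one plausible reading. It assumes dark strokes on a light background, pictures and backgrounds of identical shape, and a symmetry applied to the background rather than to the number; the blend keeps the darker of the two pixels. The name `transfer_background` is hypothetical.

```python
import numpy as np


def transfer_background(image, backgrounds, rng=None):
    """Overlay a randomly chosen, randomly mirrored background on a picture.

    Assumptions: every array in `backgrounds` has the same shape and dtype
    as `image`, and darker pixels carry the information (ink or noise).
    """
    rng = np.random.default_rng() if rng is None else rng
    background = backgrounds[int(rng.integers(len(backgrounds)))]
    # Mirror along a random axis (0: vertical, 1: horizontal symmetry)
    # so the same background is not always reused in the same orientation.
    background = np.flip(background, axis=int(rng.integers(2)))
    # Keep the darker pixel: the number stays visible on top of the noise.
    return np.minimum(image, background)
```

With a bank of 10 backgrounds and two symmetries, this gives up to 20 distinct noise patterns per picture.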

2.8 Choosing the best augmentations

As the dataset could not exceed 10,000 pictures, I had to know which transformations had the highest impact.

I decided to benchmark them by comparing a baseline (a cleaned dataset with no transformation) with the individual performance of each augmentation technique (summary below):

Augmentations’ impact, based on the loss or accuracy — Image by Author

We can observe that rotations, translations, and cropping had a significantly larger impact than the others, so I decided to focus on those.
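The benchmarking logic itself can be sketched in a few lines. Everything here is an assumption for illustration: `evaluate` stands for a hypothetical callable that trains the model on the cleaned dataset plus one augmentation (or none, for the baseline) and returns a validation accuracy, and `rank_augmentations` is not a function from the article.

```python
def rank_augmentations(augmentations, evaluate):
    """Return (name, gain over baseline) pairs, best augmentation first.

    `augmentations` maps a name to a transformation function.
    `evaluate(fn)` is a hypothetical callable returning the validation
    accuracy obtained with that transformation; `evaluate(None)` gives
    the baseline score of the cleaned, untransformed dataset.
    """
    baseline = evaluate(None)
    gains = {name: evaluate(fn) - baseline for name, fn in augmentations.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)
```

Testing one augmentation at a time keeps the comparison simple, but it does not capture interactions between augmentations, so the combined gain may differ from the sum of the individual ones.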

