2.5 Using the “label book” pictures to train the model
The organizers from DeepLearning.ai provided a set of 52 pictures, absent from the “train” and “validation” folders, to evaluate our model’s performance once the ResNet50 training was over.
It was a good way to get a sense of how the model would perform on the final, hidden dataset, but I had also “guessed”, thanks to the scores displayed on the leaderboard, that the final evaluation on the hidden dataset involved 2,420 pictures (see the corresponding notebook here). So 52 pictures were not very representative anyway!
So I simply included these pictures in my training folder! The more, the merrier 😁
2.6 Evaluating the impact of the augmentation techniques
As you might know, it is quite common to apply augmentation techniques to an image dataset to help deep learning models identify the features that allow them to properly infer the classes.
I decided to consider a few of them:
- Horizontal and Vertical Symmetries
- Clockwise and Anti-clockwise rotations (10° and 20°)
- Horizontal and Vertical Translations
- Cropping the white areas in the pictures
- Adding synthetic “salt and pepper” noise
- Transferring the noise of some pictures to others
2.7 Implementing the custom functions
The first functions are quite simple and easily implemented with PIL, OpenCV, or even “packaged solutions” such as ImgAug. I thought it would be more interesting to share some tips regarding some of the custom functions I had designed 😀
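For the simple transformations, a minimal PIL sketch could look like the following. This is an illustration, not the author’s actual code: the angles and shift values are taken from the list above, and `fillcolor=255` assumes grayscale pictures with a white background.

```python
from PIL import Image, ImageOps

def basic_augmentations(img):
    """Return simple augmented variants of a grayscale PIL image:
    symmetries, small rotations, and small translations."""
    return [
        ImageOps.mirror(img),            # horizontal symmetry
        ImageOps.flip(img),              # vertical symmetry
        img.rotate(10, fillcolor=255),   # anti-clockwise 10 degrees
        img.rotate(-10, fillcolor=255),  # clockwise 10 degrees
        img.rotate(20, fillcolor=255),   # anti-clockwise 20 degrees
        img.rotate(-20, fillcolor=255),  # clockwise 20 degrees
        # affine translations: shift the content by 3 pixels
        img.transform(img.size, Image.AFFINE, (1, 0, 3, 0, 1, 0), fillcolor=255),
        img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, 3), fillcolor=255),
    ]
```

Each variant keeps the original size, so the whole batch can go straight back into the training folder.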
2.7.1 Squared Cropping Function
The cropping operation is an interesting one! As the picture will, ultimately, be converted to a 32×32 picture, it might be better to zoom in on the area where the number is located.
However, if the number does not have a “squared” shape, the result can be distorted when converted to 32×32 (as shown below). I redesigned the function so that the cropped output always has a square shape, which avoids this distortion effect:
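A possible NumPy implementation of such a square crop is sketched below. It is my reconstruction, not the original function: it assumes grayscale arrays with a white background, finds the bounding box of the “ink” pixels, then grows the shorter side so the box becomes a square before cropping.

```python
import numpy as np

def square_crop(arr, threshold=200):
    """Crop the white margins of a grayscale image (NumPy array) while
    keeping a square output, so resizing to 32x32 does not distort it."""
    mask = arr < threshold            # pixels darker than the background
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    if not rows.any():                # blank picture: nothing to crop
        return arr
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    height = bottom - top + 1
    width = right - left + 1
    side = max(height, width)
    # grow the shorter dimension symmetrically to obtain a square box
    top = max(0, top - (side - height) // 2)
    left = max(0, left - (side - width) // 2)
    return arr[top:top + side, left:left + side]
```

Near the picture borders the crop may end up slightly smaller than `side`; padding with white pixels would be one way to handle that edge case.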
2.7.2 “Salt and Pepper” Function
As the background is probably not always plain white on the final evaluation dataset, I tried to augment pictures by adding a synthetic background.
I used a “salt & pepper” function which, basically, inserts “0” and “1” values at random positions in the NumPy arrays describing the pictures:
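A minimal sketch of such a function, assuming 8-bit grayscale arrays (so “salt” is 255 rather than 1; on arrays normalized to [0, 1] it would be 1):

```python
import numpy as np

def salt_and_pepper(arr, amount=0.05, rng=None):
    """Randomly set a fraction `amount` of the pixels to black (pepper)
    or white (salt), half of the budget for each."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = arr.copy()
    mask = rng.random(arr.shape)          # one uniform draw per pixel
    noisy[mask < amount / 2] = 0          # "pepper"
    noisy[mask > 1 - amount / 2] = 255    # "salt"
    return noisy
```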
2.7.3 Background Noise Transfer Function
I was not fully happy with the results of the “Salt and Pepper” function, as the noise was always homogeneous, so I devised another way to add noise to the pictures.
I recycled some of the pictures that I had originally considered unreadable and turned them into “noisy background” templates. There were also some pictures with a “heavy background” from which I removed the number (as shown below) to get more samples.
This gave me a “noisy backgrounds bank” of 10 pictures that I randomly blended into other pictures after applying horizontal or vertical symmetries:
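One way to sketch this transfer, assuming digit and background are grayscale arrays of the same shape: keep the darker of the two values at each pixel, so the (dark) digit strokes survive while the background noise is imported. The random flips stretch the 10-picture bank a little further. The exact blending used in the original may differ.

```python
import numpy as np

def transfer_background(digit, background, rng=None):
    """Blend a 'noisy background' picture into a digit picture by keeping,
    pixel-wise, the darker of the two values."""
    if rng is None:
        rng = np.random.default_rng()
    bg = background.copy()
    # random symmetries so the small background bank yields more variety
    if rng.random() < 0.5:
        bg = np.fliplr(bg)
    if rng.random() < 0.5:
        bg = np.flipud(bg)
    return np.minimum(digit, bg)
```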
2.8 Choosing the best augmentations
As the number of pictures allowed could not exceed 10,000 elements, I had to know which transformations provided the highest impact.
I decided to benchmark them by comparing a baseline (a cleaned dataset with no transformation) with the individual performance of each augmentation technique (summary below):
We can observe that the rotations, translations, and cropping brought a significant improvement compared to the others, so I decided to focus on those.
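The benchmarking loop itself can be summarized by a small skeleton like the one below. Here `train_fn` is a hypothetical stand-in for the actual train-and-score pipeline (ResNet50 training plus validation accuracy), which is not shown in the article:

```python
def benchmark(train_fn, base_images, augmentations):
    """Train once per augmentation (plus a no-augmentation baseline)
    and return the score of each run, keyed by augmentation name."""
    scores = {"baseline": train_fn(list(base_images))}
    for name, aug in augmentations.items():
        # each run uses the clean pictures plus one augmented copy of each
        augmented = list(base_images) + [aug(img) for img in base_images]
        scores[name] = train_fn(augmented)
    return scores
```

Comparing each entry of `scores` against `scores["baseline"]` then tells which transformations are worth keeping within the 10,000-picture budget.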