Level Up Your Computer Vision Game: Harnessing the Potential of Image Augmentation


Part 1: Image Annotation — Labelme

Labelme is an open-source graphical image annotation tool for labeling objects in images with shapes such as polygons and bounding boxes (rectangles). Its intuitive interface lets users annotate objects of interest interactively, which has made it popular among researchers and developers working on computer vision tasks.

Image By Author: Image Annotation

Here, we capture a couple of images of Cola and Pepsi cans and annotate them by drawing a bounding box around each can. The annotations are saved in JSON format, which we will later use for augmentation.

The annotation steps are:

  1. Launch Labelme.
  2. Open the directory containing the images you want to annotate.
  3. Draw a bounding box around each region of interest (ROI) and assign it a class label.
  4. Save the annotation.

For each image, Labelme saves a JSON annotation file that contains every shape's label and coordinates; a rectangle is stored as two opposite corner points. The project directory is organized as follows:

├── raw
└── aug/
    ├── images
    └── labels

The raw directory holds the original images along with their corresponding JSON files. Our goal is to augment these images and store the augmented images and their labels in the separate aug/images and aug/labels folders.

To streamline the workflow, we build a data dictionary that groups each image's pixel data, file name, and labels. We can then work from this dictionary instead of repeatedly reading images and JSON files from disk.

Image By Author: Generate Data Dictionary

With the data dictionary in place, we can move on to augmentation. For this, we use Albumentations, a popular Python library; see its official documentation for detailed information and usage instructions.

Part 2: Bounding Box Augmentation — Albumentations

Albumentations is a popular open-source Python library that provides a comprehensive suite of image augmentation techniques for computer vision tasks. It is designed to facilitate fast and efficient data augmentation, empowering researchers and practitioners to enhance their training datasets and improve the performance of computer vision models.

Image Source: Albumentations

With Albumentations, you can apply a wide range of transformations to images, including geometric transformations (e.g., rotations, translations, scaling), color manipulations (e.g., brightness, contrast, saturation adjustments), noise addition, blurring, and much more. These transformations can be combined into powerful augmentation pipelines, allowing for complex and diverse data augmentation strategies.

A visualized version of the augmentation pipeline: Albumentations

Moreover, Albumentations works directly on images stored as NumPy arrays and supports the bounding box formats most commonly used in computer vision: Pascal VOC, COCO, and YOLO, plus its own normalized "albumentations" format. This flexibility makes it easy to integrate with different annotation tools and dataset structures, simplifying data preparation.

Image By Author: Different formats
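The formats differ only in how the four numbers are defined. Pascal VOC uses [x_min, y_min, x_max, y_max] in pixels, COCO uses [x_min, y_min, width, height] in pixels, YOLO uses [x_center, y_center, width, height] normalized by the image size, and the albumentations format is Pascal VOC normalized by the image size. The helper functions below are hypothetical and only illustrate these conventions; Albumentations performs such conversions internally once you declare the format in BboxParams.

```python
def voc_to_coco(box):
    """Pascal VOC [x_min, y_min, x_max, y_max] (pixels) -> COCO [x_min, y_min, width, height] (pixels)."""
    x_min, y_min, x_max, y_max = box
    return [x_min, y_min, x_max - x_min, y_max - y_min]


def voc_to_yolo(box, img_w, img_h):
    """Pascal VOC (pixels) -> YOLO [x_center, y_center, width, height], normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    return [
        (x_min + x_max) / 2 / img_w,
        (y_min + y_max) / 2 / img_h,
        (x_max - x_min) / img_w,
        (y_max - y_min) / img_h,
    ]


def voc_to_albumentations(box, img_w, img_h):
    """Pascal VOC (pixels) -> albumentations [x_min, y_min, x_max, y_max], normalized to [0, 1]."""
    x_min, y_min, x_max, y_max = box
    return [x_min / img_w, y_min / img_h, x_max / img_w, y_max / img_h]


# A box covering the central half of a 400x200 image
box = [100, 50, 300, 150]
print(voc_to_coco(box))                      # [100, 50, 200, 100]
print(voc_to_yolo(box, 400, 200))            # [0.5, 0.5, 0.5, 0.5]
print(voc_to_albumentations(box, 400, 200))  # [0.25, 0.25, 0.75, 0.75]
```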

A typical bounding box augmentation workflow has the following steps:

  1. Import the required libraries.
  2. Define an augmentation pipeline.
  3. Read images and bounding boxes from the disk.
  4. Pass an image and bounding boxes to the augmentation pipeline and receive augmented images and boxes.

Without any more delay, let’s dive right in and start creating our augmented dataset ⚡

Step 1: Get the coordinates and class labels from the JSON file

Let’s write a function that extracts the class labels and bounding box coordinates from a Labelme annotation. It returns them as two separate lists so that we can pass them directly to the augmentation pipeline.

Image By Author: Getting labels and bounding Box
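A minimal sketch of such a function, assuming the standard Labelme layout in which each entry of "shapes" has "label", "points", and "shape_type" keys. The name get_bboxes_and_labels is hypothetical. Because Labelme stores a rectangle as two opposite corners in whatever order the box was dragged, the sketch sorts the corners into Pascal VOC order.

```python
def get_bboxes_and_labels(annotation):
    """Split a Labelme annotation dict into Pascal VOC boxes and class labels (hypothetical helper)."""
    bboxes, labels = [], []
    for shape in annotation.get("shapes", []):
        if shape.get("shape_type") != "rectangle":
            continue  # skip polygons, points, and other shape types
        (x1, y1), (x2, y2) = shape["points"]
        # Normalize corner order to [x_min, y_min, x_max, y_max]
        bboxes.append([min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)])
        labels.append(shape["label"])
    return bboxes, labels
```

Keeping boxes and labels in two parallel lists matches how Albumentations expects them: boxes under bboxes and labels under the field named in label_fields.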

Step 2: Define an Augmentation Pipeline

In Albumentations, a pipeline is created using the Compose class, which allows you to chain together various transformations and define their parameters. Each transformation is represented by a class from the Albumentations library, such as Rotate, Flip, ShiftScaleRotate, or Blur, among others.

Let’s break down the pipeline used in this article to understand its parts and how they work together.

A.Compose creates the pipeline; it takes the list of augmentation transformations as its first argument.

Here are the transformations included in the pipeline:

  1. A.RandomCrop: Randomly crops a 400 × 400 pixel region from the image.
  2. A.HorizontalFlip: Mirrors the image horizontally with a probability of 0.5.
  3. A.MotionBlur: Applies motion blur with a probability of 0.5, simulating camera or object movement.
  4. A.Blur: Blurs the image with a probability of 0.5, reducing its sharpness.
  5. A.RandomBrightnessContrast: Randomly adjusts brightness and contrast with a probability of 0.2.
  6. A.ShiftScaleRotate: Randomly shifts, scales, and rotates the image with a probability of 0.5.
  7. A.RGBShift: Randomly shifts the R, G, and B channel values with a probability of 0.3, adding color variation; the shift limit is 30 for each channel.

The pipeline also includes bbox_params, which controls how bounding boxes are handled. It declares the boxes to be in Pascal VOC format ([x_min, y_min, x_max, y_max] in pixels) and sets min_area to 20000: if a box's area falls below 20,000 pixels after augmentation, Albumentations drops it. Finally, the 'class_labels' label field tells Albumentations which argument carries the class labels, so each label stays paired with its box and is dropped together with it.

Step 3: Transformation & Verification

With the pipeline defined and the data loaded, let’s pick an image and apply the transformation to it. We can then plot the transformed coordinates to check that the boxes still enclose the cans.

Image By Author: BBox Augmentation
Image By Author: Augmented Image

To validate the accuracy of the transformed boxes, let’s plot the bounding boxes on the image. We will use a red box to represent cola and a blue box to represent Pepsi.

Image By Author: Verification of Bounding Boxes
Image By Author: Transformed Bounding Box

Voilà! We have created our first image augmentation pipeline and verified that the boxes follow the transformed cans. Great job!

What lies ahead? So far we have applied a single transformation to a single image. The real value of augmentation comes from generating many different variations of each image, so let's scale the process up.

Step 4: Scale it Up 🚀

We have reached the final stage. We started with a modest set of three images and their labels. Because the pipeline is random, applying it 100 times to each of the three originals produces 100 different variants per image, for a total of 300 augmented images. Let's build them!

Image By Author: Scaling it Up

Let’s verify the augmented files:

Image By Author: Files Verification
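A quick programmatic check can complement the visual one. This sketch assumes the .jpg/.json naming from the scaling step, where each augmented image and its label file share a file stem; the helper verify_augmented is hypothetical.

```python
from pathlib import Path


def verify_augmented(out_dir, expected=None):
    """Check that every image in out_dir/images has a label in out_dir/labels and vice versa.

    Returns the number of image/label pairs; raises ValueError on a mismatch.
    """
    out_dir = Path(out_dir)
    image_stems = {p.stem for p in (out_dir / "images").glob("*.jpg")}
    label_stems = {p.stem for p in (out_dir / "labels").glob("*.json")}
    if image_stems != label_stems:
        raise ValueError(
            f"Unpaired files: images without labels={sorted(image_stems - label_stems)}, "
            f"labels without images={sorted(label_stems - image_stems)}"
        )
    if expected is not None and len(image_stems) != expected:
        raise ValueError(f"Expected {expected} pairs, found {len(image_stems)}")
    return len(image_stems)
```

For the dataset in this article, verify_augmented("aug", expected=300) should return 300 if every augmented image was saved with its label file.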

And that concludes our blog on augmenting images together with their bounding boxes. You are now equipped to take on object detection challenges, which we will explore further in an upcoming blog. You can also apply transformations to keypoints or use image transformations for classification problems. The possibilities are vast, so unleash your creativity: select your images, start augmenting, and enrich your own dataset. Happy augmenting!

I hope you enjoyed this article! You can follow me, Afaque Umer, for more articles like this.

I will keep bringing more machine learning and data science concepts and breaking down fancy-sounding terms into simpler ones.

Thanks for reading 🙏Keep learning 🧠 Keep Sharing 🤝 Stay Awesome 🤘

