Analyzing Computer Vision Model Performance Like a Pro



Photo by Rohan Makhecha on Unsplash

My name is Manpreet and I am a Deep Learning/Computer Vision Research Engineer. I have extensive experience working with various deep learning architectures for computer vision tasks such as classification, object detection, tracking, and segmentation. Over the years I have developed my own code for analyzing the performance of these models. However, the initial data exploration phase for images and the post-training performance analysis can be slow when done by saving images with text overlaid on them into different folders. I recently learned about an incredible tool called FiftyOne by Voxel51 and I am in love with it. I can’t recommend it enough for use in your work or research. In this blog post, I will explain how to use the tool for image classification.


FiftyOne is an open-source tool that provides a powerful graphical interface for dataset labeling and computer vision model analysis. “Improving data quality and understanding your model’s failure modes are the most impactful ways to boost the performance of your model.”[1] Having a standardized tool tremendously expedites and streamlines the data and model quality analysis process. That it is open source is the cherry on top. The official documentation of the tool is beautifully written and is available here:

FiftyOne 0.16.5 documentation

The tool can be run both as a standalone application and from within your Jupyter notebooks.

Source: By the author

FiftyOne Installation [1]

FiftyOne is installed with pip. Create a conda environment named fo, activate it, and then install the fiftyone library into it using pip. [Note: On MacBooks with an M1 chip you need to set up the MongoDB backend manually, since the bundled DB installation doesn’t work out of the box. You can read about it here:].

If everything went well, you should be able to import the package in Python. Next, we will look at the two core classes of fiftyone.
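As a minimal sketch, the setup might look like the following. The shell steps are shown as comments, and the Python version in them is an assumption rather than a requirement stated anywhere in this post. The helper `is_installed` is a hypothetical name; it only checks that a package can be found, without importing it.

```python
# Shell steps (run in a terminal, not in Python).
# The Python version is an assumption; adjust to what you use:
#   conda create -n fo python=3.9
#   conda activate fo
#   pip install fiftyone

import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether the active environment can see the fiftyone package
print("fiftyone importable:", is_installed("fiftyone"))
```

If this prints `False`, the most likely cause is that the fo environment is not the active one.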

FiftyOne Dataset and Samples [2]

1. Dataset: This class is the heart of fiftyone and has powerful functionality for representing data and manipulating it through both the Python library and the fiftyone UI. You can load, modify, visualize, and evaluate data along with labels (classifications, detections, etc.) [2]. Even if you have unlabelled data, the initial exploratory phase can be completed in the fiftyone app. It also integrates with CVAT and other labelling platforms. A dataset is an ordered collection of Sample objects, each of which is assigned a unique ID for retrieval. A dataset is created by name; for example, you can instantiate an empty dataset named “emotion-dataset”.

2. Sample: A Dataset consists of Sample objects, each of which stores the information related to one data sample. The only mandatory field of a sample is filepath. Beyond that, you can add as many keyword fields as you like, so long as the data type of each field is consistent across all the samples.

For example, a sample created with a filepath and a ground_truth keyword argument has exactly those two fields. Note that the type of ground_truth must be consistent across the entire dataset: if it holds string class names in one sample, it must hold strings in every sample, and the same applies if you use integers.
Adding a sample to the dataset is pretty easy.
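The three steps above (create a dataset, create a sample, add it) can be sketched as follows. The function name `build_emotion_dataset` and the file paths in the usage comment are hypothetical; `fo.Dataset`, `fo.Sample`, and `add_sample` are the FiftyOne calls being illustrated.

```python
def build_emotion_dataset(image_labels):
    """Create "emotion-dataset" and fill it from (filepath, label) pairs."""
    import fiftyone as fo  # deferred so this file loads without FiftyOne

    # Raises an error if a dataset with this name already exists
    dataset = fo.Dataset("emotion-dataset")

    for filepath, label in image_labels:
        # filepath is mandatory; ground_truth is an extra keyword field
        # that holds a string in every sample
        sample = fo.Sample(filepath=filepath, ground_truth=label)
        dataset.add_sample(sample)

    return dataset


# Hypothetical usage; the paths and labels are placeholders:
# dataset = build_emotion_dataset([
#     ("/data/happy_01.png", "happy"),
#     ("/data/sad_01.png", "sad"),
# ])
# print(dataset)
```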


For this tutorial, I will download two classes [Wine, Bagel] from the Open Images v6 dataset using the fiftyone library. According to the Open Images website, the fiftyone library is the recommended way to download the data.

The following code will download the images for the Wine and Bagel classes from the validation split and register them under the open-image-v6-demo name. We have specified label_types as classification since for this tutorial we will not be utilizing the detection annotations.
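A sketch of that download through the FiftyOne dataset zoo is below. The wrapper name `load_wine_bagel_dataset` is hypothetical, and the zoo identifier `"open-images-v6"` and the `"classifications"` label type are my reading of the FiftyOne zoo, so verify them against the documentation for your installed version.

```python
def load_wine_bagel_dataset():
    """Download the Wine and Bagel validation images from Open Images v6."""
    import fiftyone.zoo as foz  # deferred so this file loads without FiftyOne

    return foz.load_zoo_dataset(
        "open-images-v6",
        split="validation",
        label_types=["classifications"],  # skip detection annotations
        classes=["Wine", "Bagel"],
        dataset_name="open-image-v6-demo",
    )


# Hypothetical usage (downloads data on the first call):
# dataset = load_wine_bagel_dataset()
# print(dataset)
```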

At this point, we have a dataset that has the positive_labels and negative_labels fields populated, along with some others.

Source: By the author

But for our example evaluation, we need to create a ground_truth field that contains fo.Classification objects. To do this, we first filter the dataset for samples whose positive_labels field contains Wine and add a Wine ground_truth to each of them, then repeat the same steps for Bagel. It is important to note that you need to call the save method on every sample for the change to be reflected in the database. If a new field is created for one sample, all other samples get that field populated with a default value of None. In this way, we create Wine and Bagel ground_truth labels for our dataset.
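One way this could be written is sketched below. Both function names are hypothetical, and matching on `positive_labels.classifications.label` assumes that positive_labels is a list-of-classifications field that is populated on every sample, which is how I understand the Open Images loader to store it. A sample that carries both classes ends up with whichever label is assigned last.

```python
def label_samples(samples, make_label):
    """Attach a ground_truth label to each sample and persist it."""
    count = 0
    for sample in samples:
        sample["ground_truth"] = make_label()
        sample.save()  # without save() the change never reaches the database
        count += 1
    return count


def assign_ground_truth(dataset, class_names=("Wine", "Bagel")):
    """Add a ground_truth Classification for each class of interest."""
    import fiftyone as fo  # deferred so this file loads without FiftyOne
    from fiftyone import ViewField as F

    for class_name in class_names:
        # Keep only samples whose positive labels include this class
        has_class = F("positive_labels.classifications.label").contains(
            [class_name]
        )
        view = dataset.match(has_class)
        label_samples(view, lambda: fo.Classification(label=class_name))
```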

You can launch the fiftyone app at this point and start to look at the downloaded dataset. The following code will launch a session and let you look at the data.
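A minimal sketch of launching the app is shown here; `open_app` is a hypothetical wrapper around `fo.launch_app`.

```python
def open_app(dataset):
    """Launch the FiftyOne App on a dataset and return the session handle."""
    import fiftyone as fo  # deferred so this file loads without FiftyOne

    session = fo.launch_app(dataset)
    return session


# Hypothetical usage:
# session = open_app(dataset)
# session.wait()  # in a plain script, blocks until the App is closed
```

In a Jupyter notebook the app is rendered in the cell output, so the `session.wait()` call is only needed in scripts.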

There are several elements of the UI which you should familiarise yourself with. A reading of this page should help you do that:

Source: By the author

You can quickly scroll through the dataset and analyze if the labels make sense. If there are any erroneous samples, you can just select them by hovering on the image and selecting the checkbox (or opening the image and then selecting it).

Source: By the author

All the selected images can be tagged so that you can filter them out later. Alternatively, you can access the selected images through the session.selected property, which gives you the unique IDs of all the selected samples. These IDs can then be used to manipulate those samples.
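As a sketch, the selected IDs can be turned into a view and optionally tagged. The function name `selected_samples_view` and the tag value in the usage comment are hypothetical.

```python
def selected_samples_view(session, dataset, tag=None):
    """Return a view of the samples currently selected in the App."""
    selected_ids = session.selected  # list of sample IDs
    view = dataset.select(selected_ids)

    if tag is not None:
        # Tag the selection so it can be filtered in the App later
        view.tag_samples(tag)

    return view


# Hypothetical usage:
# bad_labels = selected_samples_view(session, dataset, tag="wrong-label")
# print(len(bad_labels))
```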

Source: By the author

Now we need to use a model to add predictions to our dataset. Imagenet has three classes called “red wine”, “wine bottle” and “bagel” which can be used for the samples we downloaded from the open images dataset. We will use a pre-trained model to perform prediction on our dataset and then do the evaluation. I selected the densenet169 (PyTorch) model pre-trained on the imagenet dataset.
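A sketch of this step using the FiftyOne model zoo follows. The zoo model name `"densenet169-imagenet-torch"` is an assumption on my part, as is the wrapper name `add_predictions`; check the list of available zoo models for your version before relying on it.

```python
def add_predictions(dataset, model_name="densenet169-imagenet-torch"):
    """Run a zoo model over the dataset and store its outputs."""
    import fiftyone.zoo as foz  # deferred so this file loads without FiftyOne

    # model_name is an assumed zoo identifier, not confirmed by this post
    model = foz.load_zoo_model(model_name)

    # store_logits=True keeps the raw class scores alongside the top label
    dataset.apply_model(model, label_field="predictions", store_logits=True)
    return model
```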

This code will add a predictions label field with classification results to the samples. However, it will assign the argmax class label and confidence to the sample from the 1000 imagenet classes. For our use case, we want only the ones relevant to the Wine and Bagel categories. We store the logits info in the predictions field too by specifying store_logits=True. Next, we find the relevant class indices for our classes of interest.

Now we iterate over the dataset and assign the correct ground_truth values based on the data available in the positive_labels field generated by the Open Images downloader. We compute wine_confidence by adding the “red wine” and “wine bottle” softmax values, and bagel_confidence as the “bagel” softmax value. The prediction label is assigned based on which confidence is greater.
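The confidence arithmetic can be sketched in plain Python. The function names are hypothetical, and the class indices are passed in as parameters because they depend on the label list of the model you use.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def wine_or_bagel(logits, wine_indices, bagel_index):
    """Collapse full-class logits into a Wine/Bagel label and confidence."""
    probs = softmax(logits)
    wine_confidence = sum(probs[i] for i in wine_indices)
    bagel_confidence = probs[bagel_index]
    if wine_confidence > bagel_confidence:
        return "Wine", wine_confidence
    return "Bagel", bagel_confidence


# Toy 4-class example: classes 0 and 1 stand in for the two wine classes,
# class 2 stands in for bagel
label, confidence = wine_or_bagel([2.0, 1.0, 0.0, 0.0], (0, 1), 2)
print(label, round(confidence, 3))
```

Inside the loop over samples, the logits would come from the stored predictions, and the returned pair could be written back as a new `fo.Classification(label=..., confidence=...)` followed by `sample.save()`.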

Now we have everything ready in the dataset to do our evaluation.

FiftyOne for Performance Evaluation

Once the ground_truth and predictions have been registered for all the samples of interest in the dataset, evaluation is pretty fast. For classification, we can look at the classification report and the confusion matrix for doing the analysis.

We filter the dataset to select only ground_truths that have a Wine or Bagel label. Then using this filtered view, we run the evaluate_classifications method. We specify the predictions and ground_truth field names and the eval key. This will compute the sklearn style classification report as well as a confusion matrix.

To view the classification report we can simply use the print_report method.
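The filtering, evaluation, and report steps might be sketched like this. The function name `evaluate_wine_vs_bagel` and the eval key `"eval"` are hypothetical choices; the field names follow the ones used earlier in this post.

```python
def evaluate_wine_vs_bagel(dataset):
    """Evaluate predictions against ground truth for Wine and Bagel samples."""
    from fiftyone import ViewField as F  # deferred import

    # Keep only samples whose ground truth is one of the two classes
    view = dataset.match(F("ground_truth.label").is_in(["Wine", "Bagel"]))

    results = view.evaluate_classifications(
        "predictions",
        gt_field="ground_truth",
        eval_key="eval",  # hypothetical key name
    )

    results.print_report()  # sklearn-style precision/recall/f1 table
    return view, results
```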

Source: By the author

I always look at the classification report to instantly get a deep understanding of how the model is performing at a class level. Accuracy values can be misleading in datasets with heavy class imbalance. However, precision, recall, and f1-score values show a more realistic picture of the performance. Looking at this report you can instantly pick up which classes are performing well and which are not.
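To see why accuracy alone can mislead, here is a small worked example with made-up counts: 95 Wine images, 5 Bagel images, and a model that predicts Wine for everything. The function name `binary_metrics` is hypothetical, and the metrics are computed for the Bagel class.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1


# Bagel as the positive class: all 5 bagels are missed, all 95 wines are right
accuracy, precision, recall, f1 = binary_metrics(tp=0, fp=0, fn=5, tn=95)
print(accuracy, precision, recall, f1)
```

Accuracy comes out at 95% even though not a single Bagel image is found, while precision, recall, and f1 for Bagel are all zero.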

Next, the confusion matrix is another powerful tool for analysing the performance of a classifier. It is created from the ClassificationResults object returned by the evaluation: the plot_confusion_matrix method produces a spectacular interactive heatmap. This plot can then be attached to the session to allow for an interactive experience, so that creating the confusion matrix and attaching it to the session takes only a few lines of code.
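A sketch of creating the plot and attaching it is below; `show_confusion_matrix` is a hypothetical wrapper. As far as I know, the interactive plots are meant to be used from a Jupyter notebook, so treat that as an assumption about your environment.

```python
def show_confusion_matrix(results, session):
    """Plot the confusion matrix and link it to the App session."""
    plot = results.plot_confusion_matrix()
    plot.show()

    # Attaching makes cell selections in the plot filter the App view
    session.plots.attach(plot)
    return plot
```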

You can hover over each cell in the matrix to see the total count, ground truth label, and predicted label.

Source: By the author

You can also select an individual cell or groups of cells to dynamically filter the samples in the fiftyone UI to show only the examples belonging to certain cells in the confusion matrix. This makes analysing false positives, false negatives, and misclassification a piece of cake!

For example, if we want to see bagels that were misclassified as wine we just click on the top right cell.

Source: By the author

The above figure shows the filtered view after clicking on the top right cell in the confusion matrix. It is so convenient to find hard samples or spot erroneous annotations using the fiftyone tool.

Accessing these samples is pretty straightforward. I clicked on one of the images and observed that the confidence shown on the UI is 0.00. On hovering over the label in the detailed floating window it showed confidence of 0.002.

Source: By the author

However, if we want to look at all the samples in view or certain selected samples in detail programmatically, we can do so with ease.
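For instance, the samples currently shown in the app can be read back through the session. The function name `summarize_current_view` is hypothetical, and the field names are the ones used throughout this post.

```python
def summarize_current_view(session):
    """Collect path, labels, and confidence for the samples shown in the App."""
    # session.view is None when no filter is applied in the App
    view = session.view if session.view is not None else session.dataset

    rows = []
    for sample in view:
        prediction = sample["predictions"]
        rows.append(
            (
                sample.filepath,
                sample["ground_truth"].label,
                prediction.label,
                prediction.confidence,  # full precision, not rounded
            )
        )
    return rows


# Hypothetical usage after clicking a cell in the confusion matrix:
# for row in summarize_current_view(session):
#     print(row)
```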

These samples can be used for finding trends and patterns in the wrong predictions and can be used for sourcing new training data to address these issues.


To summarise, we looked at the fiftyone library, an open-source power tool for analyzing model performance. We learned about Dataset, Sample, and the fiftyone app. Next, we created a dataset from the Open Images dataset [3] and calculated predictions using a densenet model pre-trained on the ImageNet dataset [4]. We also looked at how to create the classification report and the confusion matrix. The confusion matrix plot from fiftyone is interactive: we learned how to attach it to a fiftyone session and interactively filter the samples to analyze the wrong predictions.

I have barely scratched the surface of this tool. There are several other features available and I may cover some of them in a future post. In the meantime, you are welcome to read their documentation.

Finally, thank you for reading the article! I hope you found it useful and learned something new. Follow for content on deep learning, machine learning, data science, computer vision, and computer science. You can connect with me on LinkedIn here:






