Fluorescent Neuronal Cells dataset — part I



Photo by National Cancer Institute on Unsplash

One of the key factors for the success of computer vision in the last decade has been the availability of vast amounts of labeled data.

In fact, the ability to collect images with modern devices such as smartphones and webcams has generated an unprecedented quantity of data depicting a wide range of daily-life settings. Along with this increased availability, the familiarity of the scenes portrayed made it possible to rely on untrained operators to annotate the images at relatively low cost. In turn, this has allowed researchers and practitioners to address learning tasks such as face recognition, keypoint detection, and autonomous driving with supervised learning techniques.

Unfortunately, the same does not hold for more niche applications, where data are scarcer and labeling requires domain expertise. In fields like biology and the life sciences, for example, the state of the art still lags behind more mainstream applications because curated datasets of adequate size are lacking.

In the first article of this series, we will present the Fluorescent Neuronal Cells dataset: a collection of 283 high-resolution pictures (1600×1200 pixels) of mouse brain slices and the corresponding ground-truth masks.

Figure 1. Dataset – Image by the author


The Fluorescent Neuronal Cells dataset (Morelli et al., 2021) is a collection of 283 high-resolution pictures (1600×1200 pixels) of mouse brain slices, and it is freely available for download.

The images were acquired with fluorescence microscopy to study the mechanisms responsible for torpor in rodents (Hitrec et al., 2019). In practice, after the mice were subjected to controlled experimental conditions, a marker was injected into their brains to “highlight” some neuronal cells of interest. As a result, these structures appear as yellowish spots of variable brightness and saturation over a composite, generally darker background (Figure 1, top row).

Although the number of pictures is limited, especially compared to the massive datasets typically involved in computer vision applications, their high resolution allows dividing them into smaller patches. Combined with typical augmentation pipelines, this can increase the total amount of data by hundreds of times, providing far more examples to learn from.
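As a rough illustration of the patching idea, the sketch below splits one picture into non-overlapping tiles with NumPy. The function name `split_into_patches` and the 400×400 patch size are assumptions made for this example, not part of the dataset or of the authors' pipeline, and a zero-filled array stands in for a real image.

```python
import numpy as np


def split_into_patches(image, patch_h, patch_w):
    """Split an (H, W, ...) array into non-overlapping patches.

    Rows or columns that do not fill a whole patch are discarded.
    """
    h, w = image.shape[:2]
    patches = []
    for top in range(0, h - patch_h + 1, patch_h):
        for left in range(0, w - patch_w + 1, patch_w):
            patches.append(image[top:top + patch_h, left:left + patch_w])
    return patches


# Placeholder standing in for one 1600x1200 RGB picture (rows x columns x channels).
image = np.zeros((1200, 1600, 3), dtype=np.uint8)

# Hypothetical patch size; any size that suits the model can be used instead.
patches = split_into_patches(image, patch_h=400, patch_w=400)
print(len(patches), patches[0].shape)
```

With this patch size, each picture yields 12 tiles. Overlapping or randomly placed crops, followed by flips and rotations, would multiply that number further.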

Ground-truth masks

Alongside the images, the data contain the corresponding ground-truth annotations for semantic segmentation, i.e. binary masks where each pixel is labeled as 255 (white) if it belongs to a cell, and 0 (black) otherwise (see Figure 1, bottom row).

To reduce the labeling effort, a semi-automatic procedure was used, combining adaptive thresholding with manual segmentation by domain experts.

In particular, the majority of the images (252 pictures) were first annotated automatically through adaptive thresholding. This means that a luminosity cutoff was chosen for each image depending on its distribution of pixel intensities, and all pixels above that threshold were labeled as cells.

Figure 2. Adaptive thresholding — Image by the author

These drafts were then refined manually to remove false positives and add the cells that the threshold had missed (false negatives).
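The automatic step can be sketched as follows. The exact rule used to pick the cutoff is not described here, so this example uses Otsu's method as a stand-in: it derives one threshold per image from the histogram of pixel intensities. The function names and the synthetic test image are assumptions made for illustration only.

```python
import numpy as np


def otsu_threshold(gray):
    """Pick a cutoff from the intensity histogram of an 8-bit grayscale image.

    Otsu's method: choose the threshold that maximizes the variance
    between the "dark" and "bright" classes.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # fraction of pixels <= t
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / denom
    between[denom <= 0] = 0.0                 # thresholds with an empty class
    return int(np.argmax(between))


def draft_mask(gray):
    """Label pixels above the per-image cutoff as 255 (cell), others as 0."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)


# Synthetic example: dark background with one bright square "cell".
gray = np.full((100, 100), 20, dtype=np.uint8)
gray[40:60, 40:60] = 200
mask = draft_mask(gray)
print(otsu_threshold(gray), int((mask == 255).sum()))
```

A draft produced this way follows the dataset's 255/0 mask convention, but on real pictures it would still need the manual refinement described above.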

The remaining 31 images were instead segmented entirely by hand by domain experts. This set deliberately includes particularly relevant examples, with the intent of collecting highly accurate annotations for the most challenging observations.

Learning Tasks

The Fluorescent Neuronal Cells dataset can be leveraged to study different learning tasks.

Given the binary ground-truth masks, the most natural approach is to use the labels as-is in a semantic segmentation setting. Alternatively, the data can easily be extended to object detection by drawing bounding boxes around the segmented objects. One can also focus on object counting by considering only the total number of cells in each image, thus treating the problem as regression.
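To make the last two options concrete, here is a minimal sketch that derives bounding boxes and a cell count from a binary mask by grouping connected white pixels. The function name `cell_bounding_boxes`, the choice of 4-connectivity, and the toy mask are assumptions for this example. Touching cells would be merged into a single object, so the count is only an approximation in crowded regions.

```python
from collections import deque

import numpy as np


def cell_bounding_boxes(mask):
    """Return one (top, left, bottom, right) box per connected white blob.

    Uses 4-connectivity; bottom and right are inclusive pixel indices.
    """
    fg = np.asarray(mask) > 0
    h, w = fg.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if not fg[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill over the blob that contains (r, c).
            queue = deque([(r, c)])
            seen[r, c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy 8x8 mask with two separate "cells", using the 255/0 convention.
mask = np.zeros((8, 8), dtype=np.uint8)
mask[1:3, 1:4] = 255
mask[5:7, 4:6] = 255

boxes = cell_bounding_boxes(mask)
print(len(boxes), boxes)
```

The length of the returned list is the regression target for counting, and the boxes can serve as detection labels. On full-resolution masks a compiled routine such as `scipy.ndimage.label` would likely be much faster than this pure-Python loop.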

In this first article, we introduced the Fluorescent Neuronal Cells dataset: a curated set of fluorescence microscopy images and the corresponding ground-truth masks.

We covered its origins and data format, and briefly mentioned some learning problems that can be studied with it.

In the next article, we will go through the data in more detail, highlighting some of their peculiar traits and the challenges they pose.

