Learning Semantics-Enriched Representation via Self-discovery, Self-Classification, and Self-Restoration: A Summary

One of the primary problems with applying machine learning and deep learning models to medical imaging tasks is the scarcity of training data. Producing and labelling medical images manually is costly and time-consuming, because highly trained experts are needed to interpret and label the images correctly. To counter data scarcity in computer vision, transfer learning techniques such as pretraining and fine-tuning are commonly used: a model is first trained on data from another domain (usually one where plenty of training data is available), and this pre-trained model is then fine-tuned on the target domain, which has only a few labelled examples. When a lot of unlabelled data is available, self-supervised learning techniques are typically used instead; they derive learning signals from the unlabelled images themselves to pre-train the model. To know more about how self-supervised learning differs from other training paradigms, please refer to this article by Louis Bouchard.

An important step in any self-supervised learning algorithm is to determine the learning signals, that is, the properties of the data that can be exploited for model training. When the data is an image, techniques such as colorization[1,2], jigsaw[3,4], rotation[5,6] and many others are used to pre-train the model from unlabelled data. Colorization techniques try to predict the colors of an image from its grayscale counterpart. Jigsaw techniques split an image into tiles, shuffle or damage them, and train the network to recover the original image. Rotation techniques rotate an image and train the network to predict the rotation that was applied.

Although these self-supervised techniques work well with natural images, they are not optimal for pretraining on medical datasets. Medical images contain repeating anatomical patterns, which can also be exploited as a learning signal to pre-train the model. This paper[8] introduces a self-supervised pretraining method that exploits these repeating patterns to learn pre-trained models better suited to various medical imaging tasks. Figure 1 shows an example of recurrent patterns in medical images.

Figure 1. Recurrent patterns in medical images. Source link.

The self-supervised technique introduced in this paper[8] exploits recurrent anatomical patterns in three steps: self-discovery of anatomical patterns across similar patients, self-classification of the discovered patterns, and self-restoration of transformed patterns. The model in its entirety is called Semantic Genesis. Using only the self-restoration module, without self-discovery and self-classification, corresponds to an earlier paper from the same research group, referred to as Models Genesis[7].

The self-classification module helps the model learn the semantics of the image, and the self-restoration module helps it learn visual properties of the data such as appearance, texture and geometry. We will go over each of these steps next.

Self Discovery: The goal of this step is to identify the repeating anatomical patterns in the unlabelled images. It consists of three steps:

  1. Train an autoencoder with unlabelled images. To know more about autoencoders please refer to this comprehensive article written by Matthew Stewart. The latent representation learned by the autoencoder is used as an identifier for each image, which means that the following steps compare images by their latent representations rather than by their raw pixels.
  2. Randomly select a reference image, and then find the k nearest images to the reference in the latent space (distance is measured on the latent representations, not on the original images). Note: k is a hyperparameter, and the values used in the paper are discussed in the Experiments section.
  3. Choose n random points and crop a patch at each of these points in all the similar images. Assign a pseudo label to each patch. These patches contain the recurrent patterns shared by the similar images found in step 2. The number of points, and thus of pseudo labels (C), is another hyperparameter, and the values used in the paper are mentioned in the Experiments section.

At the end of the self-discovery process, we have a collection of patches with pseudo labels assigned, each possibly capturing a useful anatomical pattern. Figure 2 shows the complete self-discovery process.
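The three self-discovery steps can be sketched in a few lines of NumPy. This is only a minimal illustration under assumptions that do not come from the paper: the latent vectors are taken as already computed by some autoencoder, Euclidean distance is used in the latent space, the images are 2D, and patches cropped at the same coordinate in every similar image share one pseudo label. The function name `discover_patches` and all of its parameters are hypothetical.

```python
import numpy as np

def discover_patches(images, latents, k, num_coords, patch_size, rng):
    """Hypothetical sketch of self-discovery on 2D images.

    images:  (N, H, W) array of unlabelled images
    latents: (N, D) array of latent vectors (assumed precomputed by an autoencoder)
    """
    n, h, w = images.shape

    # Step 2: pick a random reference and find its k nearest images in latent space.
    ref = int(rng.integers(n))
    dists = np.linalg.norm(latents - latents[ref], axis=1)
    order = np.argsort(dists)
    nearest = order[order != ref][:k]
    group = np.concatenate(([ref], nearest))

    # Step 3: choose random coordinates, fixed across all images in the group.
    ys = rng.integers(0, h - patch_size + 1, size=num_coords)
    xs = rng.integers(0, w - patch_size + 1, size=num_coords)

    patches, labels = [], []
    for label, (y, x) in enumerate(zip(ys, xs)):
        for idx in group:
            patches.append(images[idx, y:y + patch_size, x:x + patch_size])
            labels.append(label)  # assumed: one pseudo label per coordinate
    return np.stack(patches), np.array(labels)
```

Calling `discover_patches(images, latents, k=5, num_coords=4, patch_size=8, rng=np.random.default_rng(0))` would return 24 patches (the reference plus 5 neighbours, times 4 coordinates) with pseudo labels 0 to 3.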

Figure 2. Source link.

Self Classification: This step uses the pseudo-labelled patches obtained from the self-discovery step to train a multi-class classifier that predicts the pseudo label of each patch. The classifier consists of an encoder network followed by a fully connected layer. The encoder is shared with the self-restoration step discussed next. The idea is that by training the classifier to predict the correct pseudo labels of the recurrent anatomical patterns found during self-discovery, the learned weights of the model come to encode these semantic structures.

Self Restoration: This step first modifies the image with certain transformations (discussed later), and then tries to reconstruct the original image from the transformed image using an encoder-decoder network. Training the model to reconstruct the original image helps it learn various visual representations.

The encoder is the same one used in the self-classification step. The self-classification and self-restoration networks are trained together in a multi-task learning setup. Figure 3 shows the self-classification and self-restoration modules.

Figure 3. Self Classification and Self Restoration Module. Note that the encoder is common for both the modules and the transformation is done only for self restoration. Source link.

The visual properties learned by the model depend on the type of transformation applied to the image before restoration. The paper discusses four types of transformations: non-linear, local pixel shuffling, out-painting and in-painting.

Learning appearance via non-linear transformations: The paper uses a Bezier curve (video explanation) as the non-linear transformation. The curve defines a monotonic intensity mapping, so every pixel value is assigned a unique new value. Restoring the original image teaches the network about organ appearance, since the intensity values in medical images give insights into the organ structures.
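To make the non-linear transformation concrete, here is a minimal NumPy sketch of an intensity mapping built from a cubic Bezier curve. It is an illustration rather than the paper's implementation: the fixed end points (0, 0) and (1, 1), the two random control points, the sorting of both coordinates to force a non-decreasing mapping, and the assumption that intensities are already normalised to [0, 1] are all choices made here. The name `bezier_intensity_transform` is hypothetical.

```python
import numpy as np

def bezier_intensity_transform(image, rng, num_samples=1000):
    """Map intensities in [0, 1] through a random, non-decreasing Bezier curve."""
    p0, p3 = np.array([0.0, 0.0]), np.array([1.0, 1.0])   # fixed end points
    p1, p2 = rng.random(2), rng.random(2)                  # random control points

    # Sample the cubic Bezier curve B(t) for t in [0, 1].
    t = np.linspace(0.0, 1.0, num_samples)[:, None]
    curve = ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
             + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

    # Sorting both coordinates gives a lookup table that never decreases.
    xs = np.sort(curve[:, 0])
    ys = np.sort(curve[:, 1])

    # Look up every pixel intensity on the curve by linear interpolation.
    return np.interp(image, xs, ys)
```

Because the mapping never decreases, the relative ordering of intensities is preserved while their absolute values change, which is what the restoration task has to undo.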

Learning local boundaries and texture via local pixel shuffling: Local pixel shuffling shuffles the order of the pixels inside a randomly chosen window of a patch to obtain a transformed patch. The window is kept small enough that the global content of the image is unchanged. Restoring from this transformation teaches the network the local boundaries and texture of the image.
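A small NumPy sketch may help picture local pixel shuffling. It is written under assumptions made for illustration only: the patch is 2D, the shuffle is repeated over several small windows, and the parameters `num_windows` and `max_window` (along with the function name) are hypothetical rather than values from the paper.

```python
import numpy as np

def local_pixel_shuffle(patch, rng, num_windows=100, max_window=4):
    """Shuffle pixels inside small random windows of a 2D patch."""
    out = patch.copy()
    h, w = patch.shape
    for _ in range(num_windows):
        # Pick a small window; small sizes keep the global content intact.
        wh = rng.integers(1, max_window + 1)
        ww = rng.integers(1, max_window + 1)
        y = rng.integers(0, h - wh + 1)
        x = rng.integers(0, w - ww + 1)

        # Permute the pixels inside the window and write them back.
        window = out[y:y + wh, x:x + ww].ravel()
        out[y:y + wh, x:x + ww] = rng.permutation(window).reshape(wh, ww)
    return out
```

The transformed patch contains exactly the same pixel values as the original; only their positions inside each small window change.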

Learning context via out-painting and in-painting: In both out-painting and in-painting, a single window of complex shape is obtained by superimposing windows of different sizes and aspect ratios on top of each other.

Out-painting: Assigns random values to the pixels outside the window, while retaining the original intensities of the pixels within. Restoring from out-painting teaches the network global geometry and spatial layout.

In-painting: Retains the original intensities outside the window and replaces the intensity values of the pixels inside it. Restoring an in-painted image teaches the network the local continuity of organs.
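The two painting transformations can be sketched with a shared mask. The following is a simplified illustration, not the paper's code: the window is approximated as the union of a few random boxes (which may end up disconnected here), the replaced pixels are filled with uniform random values, and the image is assumed to be 2D and at least 4 pixels per side. All function and parameter names are hypothetical.

```python
import numpy as np

def random_window_mask(shape, rng, num_boxes=3):
    """Build a complex-shaped window as the union of random boxes."""
    h, w = shape
    mask = np.zeros(shape, dtype=bool)
    for _ in range(num_boxes):
        bh = rng.integers(h // 4, h // 2 + 1)   # box height
        bw = rng.integers(w // 4, w // 2 + 1)   # box width
        y = rng.integers(0, h - bh + 1)
        x = rng.integers(0, w - bw + 1)
        mask[y:y + bh, x:x + bw] = True
    return mask

def in_paint(image, mask, rng):
    """Keep pixels outside the window, replace the ones inside."""
    out = image.copy()
    out[mask] = rng.random(int(mask.sum()))      # assumed: random fill
    return out

def out_paint(image, mask, rng):
    """Keep pixels inside the window, replace the ones outside."""
    out = image.copy()
    out[~mask] = rng.random(int((~mask).sum()))  # assumed: random fill
    return out
```

The same mask drives both functions, which makes the symmetry between the two tasks explicit: in-painting hides the window, out-painting hides everything else.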

Figure 4 shows the visualization of each of these transformations applied to a CT image.

Figure 4. Transformations done on 3D CT images. Source link.

Training: The entire model, comprising the self-classification and self-restoration modules, is trained jointly in the multi-task learning paradigm. This means that the loss function used to train the model is a weighted sum of the self-classification loss (categorical cross-entropy) and the self-restoration loss (reconstruction loss). The weights of the individual loss terms are hyperparameters set empirically.
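The weighted sum can be written out explicitly. The sketch below is a minimal NumPy version that assumes mean squared error as the reconstruction loss (the summary only says "reconstruction loss") and uses hypothetical names `w_cls` and `w_rec` for the two loss weights.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean categorical cross-entropy from raw logits and integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def reconstruction_loss(restored, original):
    """Mean squared error between restored and original images (assumed form)."""
    return np.mean((restored - original) ** 2)

def multitask_loss(logits, labels, restored, original, w_cls=1.0, w_rec=1.0):
    """Weighted sum of the self-classification and self-restoration losses."""
    return (w_cls * cross_entropy(logits, labels)
            + w_rec * reconstruction_loss(restored, original))
```

Because both terms are computed from outputs that pass through the shared encoder, minimising this single scalar updates the encoder with gradients from both tasks at once.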

Fine tuning and model reuse: After training the model with self-discovery, self-classification and self-restoration, different components of the model can be reused and fine-tuned for the target task. For image classification tasks, the encoder is reused. For image segmentation tasks, both the encoder and the decoder are reused.


Experiments

The model is trained on two different datasets, depending on the target image modality. Publicly available CT scans are used for 3D image modalities and X-ray images are used for 2D image modalities.

Training Datasets: LUNA 2016[9] (Creative Commons Attribution 4.0 International License), consisting of 623 CT scans, and ChestX-ray14[10] (CC0: Public Domain), consisting of 75,708 X-ray images, are used for training the Semantic Genesis model.

Hyperparameters:

  • For self-discovery, the top k most similar patients are selected. k is empirically set to 200/1000 for the 3D/2D cases (the 3D training set has only 623 CT scans, so the larger value can only apply to the 2D case).
  • C (the number of pseudo labels) is set to 44/100 for 3D/2D images to cover the entire image while avoiding overlap.

Baselines: Across all experiments, the models are evaluated on six publicly available medical imaging applications spanning classification and segmentation. Figure 5 shows the different tasks used for evaluating the models.

Figure 5. Datasets used for evaluation. Source link.

Evaluation/Fine-tuning Datasets: LUNA-2016[9] (Creative Commons Attribution 4.0 International License), LIDC-IDRI[16] (Creative Commons Attribution 3.0 Unported License), LiTS-2017[17] (Attribution-NonCommercial-NoDerivatives 4.0 International), BraTS2018[18], ChestX-ray14[10] (CC0: Public Domain), SIIM-ACR-2019[19]

Pretrained 3D models for 3D transfer learning: NiftyNet[11], MedicalNet[12], Models Genesis[7], Inflated 3D[13].

Pretrained self-supervised learning models: Image in-painting[14], patch shuffling[15], Models Genesis[7].


1. Adding self-classification and self-restoration to existing self-supervised learning approaches: Figure 6 compares the results of adding semantics (self-restoration + self-classification) on top of the existing self-supervised learning methods Inpainting[14], Patch Shuffling[15] and Models Genesis[7]. Note: Models Genesis is a paper by the same research group, which uses only the self-restoration module without the self-discovery and self-classification modules.

The experiments are performed across three different domains (NCC: lung nodule classification on CT images, LCS: liver segmentation on CT images, BMS: brain tumor segmentation on MRI images). Adding the semantics on top of existing self-supervised learning techniques results in improvements across all three domains.

Figure 6. Source link.

2. Comparing Semantic Genesis 3D with pretrained 3D models: This experiment compares Semantic Genesis with other pretrained (supervised and self-supervised) 3D models. The results (Figure 7) are evaluated on the 4 of the 6 tasks that involve 3D images (CT and MRI images).

Figure 7. Source link.

3. Comparison of the self-classification and self-restoration modules: Self-restoration and self-classification are each compared separately with the combined Semantic Genesis method. The results (Figure 7) lead to two important conclusions. First, the combination of self-restoration and self-classification outperforms the individual components in three of the four tasks. Second, self-classification performs better in some tasks and self-restoration in others, which suggests that they learn complementary features and that combining them lets the model learn more than either does alone.

4. Semantic Genesis 3D in comparison to 2D slice-based approaches: Tasks in 3D imaging modalities are often reformulated and solved in 2D. This experiment compares Semantic Genesis 3D with 2D slice-based approaches. The results are evaluated on two 3D imaging tasks (NCC: lung nodule classification on CT images, NCS: lung nodule segmentation on CT images). The results (first two results in Figure 8) show that Semantic Genesis 3D outperforms the 2D slice-based approaches.

Figure 8. Source link.

5. Comparison of Semantic Genesis 2D with other pretrained 2D models: The comparison is done on two 2D X-ray tasks (PXS: pneumothorax segmentation on X-ray images, DXC: chest disease classification on X-ray images) and two 3D medical imaging tasks (NCC and NCS). The results (Figure 8) show that Semantic Genesis outperforms the other models in PXS and performs on par with ImageNet pretraining in NCC and NCS.


This paper provides a model and training algorithm for learning better representations and better pretrained models for medical imaging, which can be fine-tuned on different medical imaging tasks to counter the data scarcity problem. The model is designed to exploit the recurrent anatomical patterns in medical images in a self-supervised training paradigm. I feel the idea and the results are very promising, and the method can be used for pretraining in medical classification/segmentation tasks, although it is more complex and time-consuming to implement than using publicly available pretrained ImageNet weights.

You can find the official GitHub implementation of the paper at https://github.com/fhaghighi/SemanticGenesis.

I hope you find this article helpful and insightful. You can find other paper summaries I have written here and here.

Please follow my profile to get notified of my future articles.


  1. Larsson, G., Maire, M., Shakhnarovich, G.: Learning representations for automatic colorization. In: European Conference on Computer Vision. pp. 577–593. Springer (2016)
  2. Larsson, G., Maire, M., Shakhnarovich, G.: Colorization as a proxy task for visual understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6874–6883 (2017)
  3. Kim, D., Cho, D., Yoo, D., Kweon, I.S.: Learning image representations by completing damaged jigsaw puzzles. In: 2018 IEEE Winter Conference on Applications of Computer Vision (WACV). pp. 793–802. IEEE (2018)
  4. Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles. In: European Conference on Computer Vision. pp. 69–84. Springer (2016)
  5. Feng, Z., Xu, C., Tao, D.: Self-supervised representation learning by rotation feature decoupling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 10364–10374 (2019)
  6. Gidaris, S., Singh, P., Komodakis, N.: Unsupervised representation learning by predicting image rotations. arXiv preprint arXiv:1803.07728 (2018)
  7. Zhou, Z., Sodha, V., Rahman Siddiquee, M.M., Feng, R., Tajbakhsh, N., Gotway, M.B., Liang, J.: Models genesis: Generic autodidactic models for 3d medical image analysis. In: Medical Image Computing and Computer Assisted Intervention (MICCAI 2019). pp. 384–393. Springer International Publishing, Cham (2019)
  8. Haghighi, F., Hosseinzadeh Taher, M.R., Zhou, Z., Gotway, M.B., Liang, J.: Learning semantics-enriched representation via self-discovery, self-classification, and self-restoration. In: Medical Image Computing and Computer Assisted Intervention (MICCAI 2020). pp. 137–147. Springer International Publishing, Cham (2020)
  9. Setio, A.A.A., Traverso, A., De Bel, T., Berens, M.S., van den Bogaard, C., Cerello, P., Chen, H., Dou, Q., Fantacci, M.E., Geurts, B., et al.: Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the luna16 challenge. Medical image analysis 42, 1–13 (2017)
  10. Wang, X., Peng, Y., Lu, L., Lu, Z., Bagheri, M., Summers, R.M.: Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2097–2106 (2017)
  11. Gibson, E., Li, W., Sudre, C., Fidon, L., Shakir, D.I., Wang, G., Eaton-Rosen, Z., Gray, R., Doel, T., Hu, Y., et al.: Niftynet: a deep-learning platform for medical imaging. Computer methods and programs in biomedicine 158, 113–122 (2018)
  12. Chen, S., Ma, K., Zheng, Y.: Med3d: Transfer learning for 3d medical image analysis. arXiv preprint arXiv:1904.00625 (2019)
  13. Carreira, J., Zisserman, A.: Quo vadis, action recognition? a new model and the kinetics dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 6299–6308 (2017)
  14. Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: Feature learning by inpainting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2536–2544 (2016)
  15. Chen, L., Bentley, P., Mori, K., Misawa, K., Fujiwara, M., Rueckert, D.: Self-supervised learning for medical image analysis using image context restoration. Medical image analysis 58, 101539 (2019)
  16. Armato III, S.G., McLennan, G., Bidaut, L., McNitt-Gray, M.F., Meyer, C.R., Reeves, A.P., Zhao, B., Aberle, D.R., Henschke, C.I., Hoffman, E.A., et al.: The lung image database consortium (lidc) and image database resource initiative (idri): a completed reference database of lung nodules on ct scans. Medical physics 38(2), 915–931 (2011)
  17. Bilic, P., Christ, P.F., Vorontsov, E., Chlebus, G., Chen, H., Dou, Q., Fu, C.W., Han, X., Heng, P.A., Hesser, J., et al.: The liver tumor segmentation benchmark (lits). arXiv preprint arXiv:1901.04056 (2019)
  18. Bakas, S., Reyes, M., Jakab, A., Bauer, S., Rempfler, M., Crimi, A., Shinohara, R.T., Berger, C., Ha, S.M., Rozycki, M., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the brats challenge. arXiv preprint arXiv:1811.02629 (2018)
  19. Siim-acr pneumothorax segmentation (2019), https://www.kaggle.com/c/siim-acr-pneumothorax-segmentation/

