Build a Semantic Segmentation Model With One Line Of Code

Image by MartinThom on Wikimedia Commons (edited)

When you tackle a new image segmentation task, you usually test several models to find the one that works best for you. This can take a long time, because changing the encoder or the architecture means rewriting a lot of boilerplate code at each attempt. If you have this issue, today is your lucky day. Let me show you the power of the Segmentation Models library with a little spoiler:

The following code shows how to create an FPN model with a ResNet-50 encoder pre-trained on ImageNet:

model = sm.FPN("resnet50", encoder_weights="imagenet", classes=10)

There are two versions of the library, one for TensorFlow and one for PyTorch.

To install the latest version of Segmentation Models from PyPI:

For TensorFlow:
pip install segmentation-models
For PyTorch:
pip install segmentation-models-pytorch

Building Blocks

You have the choice between different architectures with different encoders.
Each architecture has a related class, and each encoder has a related string. In order to build a model, you instantiate an object from the selected architecture class and choose the encoder by passing its string name.
I’m going to show you this in the following section.

Note: I list all the architectures in the format official_name — class_name and the encoders in the format official_name x₁/x₂/xₙ — stringX, where x₁…xₙ are the possible choices for that encoder (for example, there is ResNet 18, 34, 50, 101 and 152) and X is your choice.

These are all the architectures and the most common encoders:


  • U-Net¹ — Unet
  • U-Net++² — UnetPlusPlus*
  • DeepLabV3³ — DeepLabV3*
  • DeepLabV3+⁴ — DeepLabV3Plus*
  • MANet⁵ — MAnet*
  • Linknet⁶ — Linknet
  • FPN⁷ — FPN
  • PSPNet⁸ — PSPNet
  • PAN⁹ — PAN*


  • VGG¹⁰ 11*/13*/16/19 — vggX
  • ResNet¹¹ 18/34/50/101/152 — resnetX
  • ResNeXt¹² 50/101 — TensorFlow: resnextX, PyTorch: resnextX_32x4d
  • SENet¹³ 154 — senet154
  • SE-ResNet¹³ 18**/34**/50/101/152 — TensorFlow: seresnetX, PyTorch: se_resnetX
  • SE-ResNeXt¹³ 50/101/152* — TensorFlow: seresnextX, PyTorch: se_resnextX_32x4d
  • DenseNet¹⁴ 121/161*/169/201 — densenetX
  • EfficientNet¹⁵ b0/b1/b2/b3/b4/b5/b6/b7 — TensorFlow: efficientnetX, PyTorch: efficientnet-X

* Not available in TensorFlow version
** Not available in PyTorch version
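To make the naming convention concrete, here is a small sketch that turns an encoder family and a variant into the string each version expects. The helper encoder_name and the table ENCODER_PATTERNS are hypothetical names of mine, not part of either library, and the table only covers a few of the families listed above.

```python
# Hypothetical lookup table (not part of either library): "{x}" is the variant you pick.
ENCODER_PATTERNS = {
    "resnet": {"tensorflow": "resnet{x}", "pytorch": "resnet{x}"},
    "resnext": {"tensorflow": "resnext{x}", "pytorch": "resnext{x}_32x4d"},
    "se-resnet": {"tensorflow": "seresnet{x}", "pytorch": "se_resnet{x}"},
    "densenet": {"tensorflow": "densenet{x}", "pytorch": "densenet{x}"},
    "efficientnet": {"tensorflow": "efficientnet{x}", "pytorch": "efficientnet-{x}"},
}


def encoder_name(family, variant, framework):
    """Return the encoder string for a family/variant pair in the given framework."""
    pattern = ENCODER_PATTERNS[family.lower()][framework.lower()]
    return pattern.format(x=variant)


print(encoder_name("ResNet", 50, "pytorch"))           # resnet50
print(encoder_name("SE-ResNet", 50, "tensorflow"))     # seresnet50
print(encoder_name("EfficientNet", "b3", "pytorch"))   # efficientnet-b3
```

The helper does not check whether a given variant actually exists in a version, so the starred entries in the list still need to be checked by hand.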

A comprehensive list of the encoders you can use is available in the documentation of both the TensorFlow and the PyTorch versions.

Build a Model

Once you have selected the architecture and encoder, building the model is pretty simple.

You need to instantiate an object from the architecture class (you can find it in the list above), and you pass the name of the chosen encoder.

The code below shows all the parameters needed to create a model in both the TensorFlow and PyTorch versions.
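Here is a minimal sketch of what such a call can look like in each version, using the FPN and ResNet-50 combination from the introduction. The keyword names are the ones I understand the two packages to accept (backbone_name and input_shape in TensorFlow, encoder_name and in_channels in PyTorch). The wrapper functions build_tf_model and build_torch_model are hypothetical names of mine, and the imports sit inside them so that you only need the version you installed. Setting SM_FRAMEWORK to tf.keras is an assumption about your setup.

```python
import os


def build_tf_model(encoder="resnet50", classes=10):
    """FPN with the TensorFlow version (needs `pip install segmentation-models`)."""
    # Assumption: use the Keras bundled with TensorFlow unless already configured.
    os.environ.setdefault("SM_FRAMEWORK", "tf.keras")
    import segmentation_models as sm

    return sm.FPN(
        backbone_name=encoder,        # encoder string from the list above
        input_shape=(None, None, 3),  # height, width, channels
        classes=classes,              # number of output classes
        activation="softmax",         # activation applied to the output
        encoder_weights="imagenet",   # pre-trained weights, or None
    )


def build_torch_model(encoder="resnet50", classes=10):
    """FPN with the PyTorch version (needs `pip install segmentation-models-pytorch`)."""
    import segmentation_models_pytorch as smp

    return smp.FPN(
        encoder_name=encoder,         # encoder string from the list above
        encoder_weights="imagenet",   # pre-trained weights, or None
        in_channels=3,                # number of input channels
        classes=classes,              # number of output classes
        activation=None,              # None returns raw logits
    )
```

Calling build_torch_model() or build_tf_model() downloads the ImageNet weights the first time, so the first call needs an internet connection.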


In the TensorFlow version, if you want to train on non-RGB data (that is, with a number of input channels other than 3) while using ImageNet pre-trained weights, you have to add an extra convolution layer that maps your N channels to 3 channels. The complete procedure is described in the library's documentation.
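As a rough sketch of that idea, the function below places a 1x1 convolution in front of a pre-trained U-Net so that N input channels are mapped to the 3 channels the encoder expects. The function name build_n_channel_unet is hypothetical, the choice of U-Net with a ResNet-34 encoder is only for illustration, and I assume the library is set up to use tf.keras.

```python
import os


def build_n_channel_unet(n_channels, encoder="resnet34"):
    """Wrap a pre-trained U-Net so it accepts inputs with `n_channels` channels."""
    # Assumption: use the Keras bundled with TensorFlow unless already configured.
    os.environ.setdefault("SM_FRAMEWORK", "tf.keras")
    import segmentation_models as sm
    from tensorflow.keras.layers import Conv2D, Input
    from tensorflow.keras.models import Model

    # Standard 3-channel model with ImageNet weights in the encoder.
    base_model = sm.Unet(backbone_name=encoder, encoder_weights="imagenet")

    inputs = Input(shape=(None, None, n_channels))
    mapped = Conv2D(3, (1, 1))(inputs)  # trainable 1x1 conv: N channels -> 3
    outputs = base_model(mapped)
    return Model(inputs, outputs, name=base_model.name)
```

The extra layer starts from random weights, so it is learned together with the rest of the network during training.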


In the PyTorch version, if you want to train on non-RGB data (that is, with a number of input channels other than 3) while using ImageNet pre-trained weights, the first layer is initialized by reusing the weights of the pre-trained first convolution. If you are interested in how they are initialized, the procedure is described in the library's documentation.
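To give an intuition for this reuse, here is a toy sketch in plain Python, based on my reading of the PyTorch package's documentation. For a single input channel the pre-trained weights are summed over the input channels. Otherwise the pre-trained channels are repeated cyclically and rescaled by 3 / new_in_channels. The function adapt_first_conv is a hypothetical illustration, not the library's code, and it leaves out the spatial kernel dimensions.

```python
def adapt_first_conv(pretrained, new_in_channels):
    """Adapt first-layer weights to a new number of input channels.

    `pretrained` holds one list per output filter, with one number per input
    channel; the spatial kernel dimensions are omitted to keep the sketch small.
    """
    old_in_channels = len(pretrained[0])
    if new_in_channels == 1:
        # Single channel: sum the pre-trained weights over the input channels.
        return [[sum(row)] for row in pretrained]
    # Otherwise reuse the pre-trained channels cyclically, then rescale.
    scale = old_in_channels / new_in_channels
    return [
        [row[i % old_in_channels] * scale for i in range(new_in_channels)]
        for row in pretrained
    ]


rgb_filter = [[1.0, 2.0, 3.0]]  # one filter with a weight for R, G and B
print(adapt_first_conv(rgb_filter, 1))  # [[6.0]]
print(adapt_first_conv(rgb_filter, 6))  # [[0.5, 1.0, 1.5, 0.5, 1.0, 1.5]]
```

In practice you do not need to do any of this yourself: passing in_channels with the number of channels of your data when you create the model should be enough.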

Once you have constructed the model, you can start the training as usual.

Last Words

The library also offers implementations of many loss functions, such as Jaccard Loss, Dice Loss, Dice Cross-Entropy Loss and Focal Loss, and of metrics such as IOUScore, F1Score, F2Score, Precision and Recall. The losses and metrics of both the TensorFlow and the PyTorch versions are listed in their documentation. I didn’t speak about them because I wanted to focus on how to make a model.

I think it’s an excellent library for testing different models because, as you saw, it’s very easy to build one and apply changes.
I hope you will find it useful and begin to use it!

[1] O. Ronneberger, P. Fischer and T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation (2015)

[2] Z. Zhou, Md. M. R. Siddiquee, N. Tajbakhsh and J. Liang, UNet++: A Nested U-Net Architecture for Medical Image Segmentation (2018)

[3] L. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking Atrous Convolution for Semantic Image Segmentation (2017)

[4] L. Chen, Y. Zhu, G. Papandreou, F. Schroff, H. Adam, Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (2018)

[5] R. Li, S. Zheng, C. Duan, C. Zhang, J. Su, P.M. Atkinson, Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images (2020)

[6] A. Chaurasia, E. Culurciello, LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation (2017)

[7] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature Pyramid Networks for Object Detection (2017)

[8] H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, Pyramid Scene Parsing Network (2016)

[9] H. Li, P. Xiong, J. An, L. Wang, Pyramid Attention Network for Semantic Segmentation (2018)

[10] K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition (2014)

[11] K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition (2015)

[12] S. Xie, R. Girshick, P. Dollár, Z. Tu, K. He, Aggregated Residual Transformations for Deep Neural Networks (2016)

[13] J. Hu, L. Shen, S. Albanie, G. Sun, E. Wu, Squeeze-and-Excitation Networks (2017)

[14] G. Huang, Z. Liu, L. van der Maaten, K. Q. Weinberger, Densely Connected Convolutional Networks (2016)

[15] M. Tan, Q. V. Le, EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (2019)

