Detect the Defects in Steel with Convolutional Neural Networks (CNNs) and Transfer Learning



Detect defects in steel with the help of machine learning and data science, providing insights to the engineers and manufacturers of steel.

Steel is used in a wide range of applications such as railways, buildings, roads, appliances and other infrastructure. Structures such as stadiums and bridges are built largely from steel. Looking at the many ways in which steel is used, we find that it is quite ubiquitous.

Photo by Luca Upper on Unsplash

Note: The owner of the dataset is PAO Severstal.

During the manufacture of steel, however, defects often go unnoticed and unidentified. As a result, the quality of the final product is compromised: we end up with steel that has defects and does not serve our applications of building bridges and railways well. In addition, identifying defects manually is a time-consuming and difficult process for humans.

With the aid of convolutional neural networks, nonetheless, it is possible to detect and accurately identify defects in steel, along with their localization, which is very handy for engineers and manufacturers. This would significantly reduce the time needed to produce good-quality steel. Localization would also help engineers spot the design errors and material problems that cause a defect. In this article we mostly consider identification of steel defects, without localization.

In this article, I highlight my GitHub project on steel defect detection and walk through the steps taken to predict the defects, reducing the time and effort required of engineers and manufacturers in the production of steel.

Importing the libraries

It is now time to see how convolutional neural networks work and how transfer learning can significantly improve our accuracy. Below are the libraries we are going to use in this project.

These are some basic libraries that must be imported to perform convolutional neural network computation.

Numpy is used for converting lists to arrays, numerical computation and so on.

Seaborn helps us create plots interactively.

Pandas is used to read and write data, which in our case is stored in “.csv” files.

cv2 (OpenCV) is an efficient library for reading images and converting them to arrays of pixel values for convolution.

Matplotlib is used for plotting visualizations so that a user can understand what the model is doing.

TensorFlow is used for building, training and running the networks on the images in this project.

Specifying train and test path

We also specify the paths of the files for the training and test sets; that is, we locate the directories that contain our images of steel. After specifying these paths, we read the images as arrays so that computation can be performed on them.

Transfer Learning

Transfer learning significantly reduces our training time and also improves the accuracy of the model. In transfer learning, a model that has already been trained on a different dataset is reused for our application. Rather than training a model from scratch, which requires a lot of time, we use pre-trained models with the weights learned from the previous dataset so that they yield good results for our task of classifying steel defects. To learn more about transfer learning, I suggest checking out this blog by Machine Learning Mastery.

A Gentle Introduction to Transfer Learning for Deep Learning

VGG19 Network

In the coding cells above, the headmodel variable holds the VGG19 model, with pre-trained weights taken from the ImageNet dataset. We use VGG19 as our headmodel, or starting model. After adding this architecture, we take the output of its layers and perform average pooling in two dimensions. The result is then flattened into a 1D array that is used for training. We make all the layers of the headmodel (VGG19) non-trainable, which means their weights are not modified when we train on our steel dataset.
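A minimal sketch of this construction is shown below. Two assumptions to note: the article loads `weights="imagenet"`, but `weights=None` is used here so the sketch runs offline, and the four-class output layer reflects the four defect categories of the Severstal dataset.

```python
import tensorflow as tf

# the article uses weights="imagenet"; weights=None keeps this sketch offline
headmodel = tf.keras.applications.VGG19(
    weights=None, include_top=False, input_shape=(224, 224, 3))
headmodel.trainable = False  # freeze the pre-trained layers

x = tf.keras.layers.AveragePooling2D(pool_size=(4, 4))(headmodel.output)
x = tf.keras.layers.Flatten()(x)  # collapse to a 1D feature vector
out = tf.keras.layers.Dense(4, activation="softmax")(x)  # 4 defect classes

model = tf.keras.Model(inputs=headmodel.input, outputs=out)
```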

Compiling the model

It is now time to compile the model, selecting the right loss, optimizer and metrics. We choose categorical_crossentropy because this is a multi-class classification problem. The Adam optimizer is selected as it is good at reducing the loss without much noise. The metric displayed is accuracy.
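The compile call looks like the sketch below; a tiny stand-in model is built here so the example is self-contained, but the loss, optimizer and metric are compiled exactly as described.

```python
import tensorflow as tf

# tiny stand-in model; the article compiles its transfer-learning model the same way
model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])

model.compile(loss="categorical_crossentropy",  # multi-class classification
              optimizer="adam",
              metrics=["accuracy"])
```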

Defining checkpoint

Model callbacks are actions triggered at particular points while training a model. We use a checkpoint variable that stores the best model based on the cross-validation accuracy, as seen above. When fitting the model, the checkpoint variable is passed so that the callback is invoked.

We fit the final model and store the training history in the fitted_model variable so that it can be used for the plots and for evaluating the performance of the model. We train with a small number of epochs (10) while printing the cross-validation accuracy.
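The checkpoint-and-fit pattern can be sketched as follows. The filename `best_model.h5`, the tiny stand-in model, the synthetic data and the two epochs are placeholders chosen so the example runs quickly; the article trains for 10 epochs on the real images.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# save only the best model, judged by validation (cross-validation) accuracy
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_accuracy", save_best_only=True)

# synthetic data stands in for the steel images
X = np.random.rand(40, 8, 8, 3)
y = tf.keras.utils.to_categorical(np.random.randint(0, 4, 40), num_classes=4)

fitted_model = model.fit(X, y, epochs=2, validation_split=0.2,
                         callbacks=[checkpoint], verbose=0)
```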

Evaluating the model

One of the best ways to evaluate the performance of deep learning and convolutional neural network (CNN) models is to plot their training error and cross-validation error. To understand whether the model is overfitting or underfitting, it is useful to see how the error decreases or increases as the number of epochs grows. For more information about what overfitting and underfitting are, feel free to refer to the article I have written below.
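These curves come straight out of the history returned by `fit`. A minimal sketch, using a tiny stand-in model and synthetic data so that it is self-contained:

```python
import matplotlib
matplotlib.use("Agg")  # render plots without a display
import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf

# tiny stand-in model and synthetic data, just to produce a training history
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
X = np.random.rand(50, 8, 8, 3)
y = tf.keras.utils.to_categorical(np.random.randint(0, 4, 50), num_classes=4)
fitted_model = model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)

# training error vs. cross-validation error over the epochs
plt.plot(fitted_model.history["loss"], label="training loss")
plt.plot(fitted_model.history["val_loss"], label="validation loss")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_curves.png")
```

If the validation curve climbs while the training curve keeps falling, the model is overfitting; if both stay high, it is underfitting.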

What is Overfitting and Underfitting in Machine Learning? | by Suhas Maddali | Medium

Let us also explore other networks to pick the best network for our task of predicting the defects in steel.

EfficientNetB0 Network

It is now time to use EfficientNet to see how well the model performs and whether it reduces the cross-validation error while increasing the accuracy. Note that there are multiple variants of EfficientNet; we use EfficientNetB0 as a starting point. We keep the architecture of the network unchanged and again use the model pre-trained on ImageNet data, but this time we train the last few layers of the network for our task of detecting defects in steel. The steps taken for the VGG19 model are applied again: we fit the model, store the best model in a ‘.h5’ file and evaluate it. Let us also go over other networks that improve the accuracy on the cross-validation data.
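The EfficientNetB0 variant of the head model can be sketched as below. As before, `weights=None` replaces the article's `weights="imagenet"` so the sketch runs offline, and "the last few layers" is interpreted here as the final ten, which is an assumption.

```python
import tensorflow as tf

# the article uses weights="imagenet"; weights=None keeps this sketch offline
base = tf.keras.applications.EfficientNetB0(
    weights=None, include_top=False, input_shape=(224, 224, 3))

# freeze everything except the last few layers, which are fine-tuned on steel
for layer in base.layers[:-10]:
    layer.trainable = False

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
out = tf.keras.layers.Dense(4, activation="softmax")(x)  # 4 defect classes
model = tf.keras.Model(base.input, out)
```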

Xception Network

Similarly, with the aid of the Xception network pre-trained on ImageNet, we look at whether there is a decrease in the cross-validation error or an increase in the accuracy. For Xception to perform best, the input must have the shape (299, 299, 3). Therefore, the input shape is specified when initializing the network.

The same optimizer and loss are specified for the Xception network, with accuracy as the metric for the multi-class classification problem.
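The Xception setup differs from VGG19 mainly in the input shape. A sketch, again with `weights=None` in place of the article's `weights="imagenet"`:

```python
import tensorflow as tf

# Xception expects 299x299x3 inputs; weights=None keeps this sketch offline
base = tf.keras.applications.Xception(
    weights=None, include_top=False, input_shape=(299, 299, 3))
base.trainable = False

x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
out = tf.keras.layers.Dense(4, activation="softmax")(x)  # 4 defect classes
model = tf.keras.Model(base.input, out)

# same loss, optimizer and metric as before
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
```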

We fit the model and again store the results in the fitted_model variable, which is later used for plotting. Looking at the plots after training helps us understand whether the model has high variance or high bias. Based on the plots, we decide whether to increase the number of epochs or the number of training examples.

InceptionV3 Network

It is now time to initialize the InceptionV3 network for our task of predicting the defects in steel. InceptionV3 is a network developed by Google. It starts with a small number of channels; as we propagate deeper into the network, the number of channels increases while the convolution operations reduce the width and height of the activations. If this sounds like a lot, take a look at the website below to get a good understanding of the InceptionV3 network.
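This channel growth is easy to see by inspecting the network itself. The sketch below builds InceptionV3 with `weights=None` (the article loads ImageNet weights) and compares the first convolution with the final feature map.

```python
import tensorflow as tf

base = tf.keras.applications.InceptionV3(
    weights=None, include_top=False, input_shape=(299, 299, 3))

# the first convolution has only 32 channels...
first_conv = next(l for l in base.layers
                  if isinstance(l, tf.keras.layers.Conv2D))
print(first_conv.filters)   # 32

# ...while the final activation is spatially tiny but 2048 channels deep
print(base.output_shape)    # (None, 8, 8, 2048)
```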

Advanced Guide to Inception v3 | Cloud TPU | Google Cloud


After training and testing the models, I found that the InceptionV3 network performs the best compared to the other networks. However, this depends on the problem or task considered. On other tasks, such as ImageNet classification, EfficientNet performs best while also significantly reducing the training time. Different models perform well on different sets of tasks, so it is important to compare all of them before deploying the best one in production. I hope you found this article helpful. Below are the details where you can contact me or take a look at my work.

GitHub: suhasmaddali (Suhas Maddali)

LinkedIn: Suhas Maddali, Northeastern University, Data Science | LinkedIn

Medium: Suhas Maddali — Medium


