Deploy MNIST Trained Model as a Web Service

I cover the training, the service, and the client implementation. The service receives an image of a hand-written digit between 0 and 9 (in tensor format) and guesses which number the image represents.

Image by the author.

In one of my articles on deep learning, I show how to implement a simple image recognition system. The program does everything in a single file: dataset loading, model definition, training, and evaluation.

In this post, I’ll walk you through how to save the model and load it from a service implemented using Flask (a Python web framework).

I’ll also show how to build a simple client to invoke the service.

The code is available on GitHub. The repo contains the training, service, and client code.


Training and Saving The Model

I covered the training phase in my previous post. I used PyTorch.

It’s a simple neural network with one input layer, two hidden layers and one output layer.
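As a sketch, a network with that shape might look like the following (the class name and the hidden-layer sizes of 128 and 64 are my assumptions, not necessarily the article’s exact code):

```python
import torch
from torch import nn

class MNISTNet(nn.Module):
    """Simple feed-forward net: 784 inputs -> two hidden layers -> 10 outputs."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(28 * 28, 128),  # input layer -> first hidden layer
            nn.ReLU(),
            nn.Linear(128, 64),       # second hidden layer
            nn.ReLU(),
            nn.Linear(64, 10),        # output layer: one node per digit 0-9
        )

    def forward(self, x):
        return self.layers(x)
```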

The example I use is the “Hello World” of image recognition. It’s great for beginners starting with deep learning.

After training, I save the model’s learned weights/parameters. Saving only the learned parameters is also referred to as “saving for inference,” and it’s the recommended approach.

Although it’s easier (less code) to save the entire model, it’s not advised, because the saved data is bound to the specific classes and directory structure in use when the model was saved.

Below is how I save the model: torch.save(model.state_dict(), "model.pth")
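A minimal sketch of both sides of that workflow (the file name "model.pth" comes from the article; the placeholder architecture is my assumption):

```python
import torch
from torch import nn

# Placeholder architecture; in practice, use the same class used for training.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

# Save only the learned parameters ("saving for inference"):
torch.save(model.state_dict(), "model.pth")

# To load, recreate the architecture first, then restore the weights:
restored = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
restored.load_state_dict(torch.load("model.pth"))
restored.eval()  # switch to inference mode before serving predictions
```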


For developing the service, I use a Python web framework called Flask. The implementation contains only one endpoint (/guess).

As a reference, I used this article by Tanuj Jain. Tanuj also teaches how to deploy in a cloud virtual machine (VM).

When the service script is executed, the program loads the previously saved model. The model is loaded only once, at startup.

Request and response

The service receives a JSON payload containing the hand-written digit (tensor representation). It parses the JSON string into an array and then converts the array to a tensor.

Next, it flattens the tensor to size 784 (28 times 28) and passes it to the model. The model output (10 nodes) contains a score for each digit from 0 to 9.

torch.argmax returns the index of the output node with the maximum value, which corresponds to the guessed digit. Lastly, the service returns that digit as a string.
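Putting those steps together, a minimal version of such a service might look like this (the /guess endpoint and port 8888 come from the article; the payload key "image" and the placeholder architecture are my assumptions):

```python
import torch
from torch import nn
from flask import Flask, request

app = Flask(__name__)

# Placeholder architecture; load_state_dict would restore the trained weights.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
# model.load_state_dict(torch.load("model.pth"))  # uncomment with a saved file
model.eval()  # the model is loaded only once, at startup

@app.route("/guess", methods=["POST"])
def guess():
    # Parse the JSON payload into a tensor (the key name is an assumption).
    pixels = request.get_json()["image"]
    x = torch.tensor(pixels, dtype=torch.float32).flatten().reshape(1, 784)
    with torch.no_grad():
        output = model(x)                # 10 nodes, one score per digit
    digit = torch.argmax(output).item()  # index of the highest-scoring node
    return str(digit)                    # respond with the digit as a string

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8888)
```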
Figure 1. Service running. Image by the author.

The service runs on port 8888, but it could be any non-privileged port. Ports below 1024 (such as the standard port 80) require elevated privileges, so choosing one of them may cause a “Permission denied” error.

Stopping the service

If you need to stop the service, the method I use is to look for the process by a keyword and kill it:

ps -ef | grep <keyword>
kill -9 <pid>


The client is simple. It’s a program that loads the MNIST data set and passes some hand-written digit (tensor representation) to the service. You choose which number to send.

Before invoking the service, the code converts the tensor to a list and wraps it in JSON. Also, before calling the service, it shows the digit as an image to compare it with the service’s response.
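A sketch of the client’s request logic (the payload key "image" and the service URL are my assumptions; the network call is commented out so the sketch runs on its own):

```python
import json

import torch

def build_payload(digit_tensor):
    """Convert a 28x28 digit tensor to the JSON body the service expects."""
    return json.dumps({"image": digit_tensor.flatten().tolist()})

# In the real client, the tensor comes from the MNIST data set; a zero tensor
# stands in for a chosen digit here.
sample = torch.zeros(28, 28)
payload = build_payload(sample)

# Invoking the service (hypothetical URL):
# import requests
# response = requests.post("http://localhost:8888/guess",
#                          data=payload,
#                          headers={"Content-Type": "application/json"})
# print(response.text)  # the digit guessed by the service
```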
Figure 2. Response from service. Image by the author.

Final Thoughts

I hope you enjoyed this tutorial. The example shown here is a starting point for you to evolve into something more interesting. For instance, imagine writing a number on paper, pointing it at your camera, and having the service invoked.

The most critical steps are learning to save the model, load it, and serve it as a web service. Explaining how to deploy to different cloud vendors comes next; I plan to write a tutorial on it.

That’s it for now. Thanks for reading.

