Comprehensive Guide to Deploying Any ML Model as APIs With Python And AWS Lambda


Photo by Suzy Hazelwood


People say that the best thing about Python is the abundance of its open-source libraries. I beg to differ. Having many alternatives for doing a single task might give you a lot of flexibility, but they might just end up creating a giant mess that leaves programmers confused and angry.

Right now, this is the case with Machine Learning Operations (MLOps). The field has become so essential in recent years that the Python community has risen to the occasion splendidly — by creating as many tools and libraries as possible for performing MLOps disciplines. Now, it is challenging for a regular user to sift through them all, compare the options, and settle on a final set of tools for solving their business problems.

Even if they do, there is an entirely separate issue of putting them together, making them communicate, and reaching a ready, presentable machine learning solution.

Take deploying models as APIs as an example (this article’s topic). Which set of tools would you choose to accomplish the job? For a moment, let’s forget all the starting fuss like data versioning, model selection, and feature engineering and say you have a ready model.

How do you convert the model into an HTTPS address so users can send requests? There are many options, like Flask, Django, and FastAPI. Even after choosing a framework, getting it online isn’t easy. Do you use AWS, Heroku, GCP, or one of dozens of other platforms that practically claim to offer the “only best way” to do it?

To save you the trouble of answering all these questions, in this article, I will present a solution that should be pretty straightforward and hassle-free compared to other alternatives. You will only need to know scikit-learn and xgboost; the rest I will teach.

In the end, we will build this API:

Image by author

Users can send requests to its classify endpoint to get predictions, and you can embed this API anywhere with a single link.

The stack we will use

Before we jump in, let me outline the list of technologies we will use and explain the reasons I chose them.

  • Pandas, NumPy — no explanation needed.
  • Sklearn and XGBoost — model training. They are chosen only for illustration; the instructions will work for models from any other framework.
  • BentoML — packaging the trained model into a local API (explained in detail later).
  • bentoctl — a CLI tool that lets you build Docker images and save them in a cloud-compatible format with a single command.
  • Terraform — another CLI tool for managing cloud infrastructure. It takes care of creating resources and functions and pushing containers to them without making you visit half a million console pages.
  • AWS — the most popular, reliable, and flexible cloud platform (you can confirm via Google Trends).
  • AWS Lambda — runs code on the serverless cloud — highly scalable and low-maintenance.

So, without further ado, let’s jump in.

What is BentoML and its purpose?

To maximize the business impact of machine learning, the hand-off between data scientists and engineers from model training to deployment should be fast and iterative. However, data scientists often don’t have the skills to package trained models and push them to the engineers, while the engineers find it hard to work with models from dozens of different ML frameworks.

BentoML was created to solve these issues and make the hand-off to production deployment as easy and fast as possible. In the coming sections, you will see how BentoML simplifies tedious MLOps operations. For example, you can:

  • Save any model from any framework into a unified format to ease collaboration.
  • Create an HTTP API endpoint with a single Python function for faster iteration.
  • Containerize everything the model needs using Docker with a single CLI command.

Read the docs here to learn more about it, or continue reading.

Dataset preparation and model training

The crux of the article is about model deployment, so I want to concentrate all your attention on that area only. For that purpose, I assume you are reading this article with your best-trained model already in hand and want to deploy it as soon as possible.

To simulate that here, we will create a synthetic dataset, train an XGBoost model, and move forward as though you have already completed the previous steps of the MLOps life cycle (data cleaning, exploration, feature engineering, model experimentation, hyperparameter tuning) and found the model that performs best on your dataset.

We create a simple dataset with seven features and 10k samples with a binary classification target.
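A minimal sketch of this step (the make_classification settings and the data.csv filename are my own stand-ins; the article only specifies seven features, 10k samples, and a binary target):

```python
from sklearn.datasets import make_classification
import pandas as pd

# 10k samples, 7 features, binary target -- matching the article's setup.
X, y = make_classification(
    n_samples=10_000, n_features=7, n_informative=5, random_state=42
)
df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(7)])
df["target"] = y
df.to_csv("data.csv", index=False)  # hypothetical filename
```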

We load it back, train a vanilla XGBoost classifier, and pretend that it is our best-tuned model.

Saving trained models to BentoML format

Saving a trained model into a BentoML-compatible format is done by calling the framework-specific save command:

The returned object is an instance of BentoML Model class with a label called a tag.

The tag consists of two parts — a name given by the user and a version string to differentiate between models saved at different times. Even if an identical model is saved, BentoML will create a new directory and a version string.

BentoML supports almost all essential ML frameworks:

  • Classic: Sklearn, XGBoost, CatBoost, LightGBM
  • Deep learning: TensorFlow, PyTorch, PyTorch Lightning, Keras, Transformers
  • Others: ONNX, MLflow, statsmodels, spaCy, h2o, Gluon, etc.

Each of the frameworks has a corresponding bentoml.<framework>.save_model command.

When a model is saved, it goes into a local directory called the BentoML model store. From the last output (the path parameter), we saw that my model store resides in /home/bexgboost/bentoml/models. You can see the list of all your models by calling the bentoml models list command in the terminal:

You can also see models from my other projects.

Note: in BentoML docs and this article, the names “model” and “tag” are used interchangeably to refer to saved models in the model store.

save_model has other parameters for passing extra information about the model, from metadata to additional user-defined objects (e.g., the feature importances of your model as a separate object):

Above, we are saving our XGBoost model with additional data: the author (me), feature importance scores (which you can retrieve with booster.get_score), and a dummy metric.

Sharing models

Models in the BentoML model store can be shared as standalone archives using the bentoml models export command:

When you don’t know the exact version string of your tag, you can use the “:latest” suffix to choose the most recent. With the above command, we are exporting the classifier into a .bentomodel archive to the models directory. When a teammate sends you a .bentomodel archive, you can use the import command to send it to your local BentoML model store:

Retrieving saved models

There are a few ways of loading saved models from the model store into your environment. The simplest one is the load_model function. Like save_model, load_model is also framework-specific:

The function will load the model in the same format it was before it was saved, meaning you can use its native methods like predict:

To load the model as a BentoML Model object, you can use the models.get command, which IS NOT framework-specific:

tag = bentoml.models.get("xgb_custom:latest")

The reason you might want to load the model in this format is that now, you can access its add-ons like metadata and labels:

The final and most important way of retrieving models is by loading them as runners:

Runners are special objects of BentoML that are optimized to use system resources in the most efficient way possible based on their framework. Runners are the core components of the APIs we will build in the next section.

Now, we are ready to start building the API!

Organize into scripts

Up until now, we have been using notebooks. We need to switch to Python scripts to build an API service. Let’s organize the code of the previous sections. In a new file, create a function that generates and saves the synthetic data from the “Dataset preparation” section:

The full script can be found here.

In another file, create a function that trains our XGBoost classifier and saves it to the BentoML model store:

The full script can be found here.

For completeness, run both scripts in the correct order to generate the dataset and save a new model to the model store.

Creating an API service script

Now, it is time to create the local API. For that, we will only need a simple script that starts like the below:

After loading our model with models.get as a runner, we create an object called svc. It will be an instance of a BentoML Service object. Service is a high-level class that abstractly represents our API.

We add a single endpoint called classify to the service object by creating a function with the same name:

Let’s understand the above snippet line-by-line.

First, we import a new class called NumpyNdarray from bentoml.io, BentoML’s input/output module. To standardize inputs and outputs, BentoML offers several classes like NumpyNdarray, such as Text, File, Image, and PandasDataFrame.

Adding these classes to the input and output arguments of the svc.api decorator ensures that correct datatypes are passed to our API endpoint. In our case, we are making sure that the data passed to our classify function is always a NumPy array. If we were working with image models, our input class could be a File or Image class, while the output would be NumpyNdarray again.

Inside the function, we use our runner’s predict.run method to get a prediction for the input. Under the hood, it calls the predict method of the XGBoost booster object.

Here is what the script looks like in the end:

That’s it! By using bentoml serve, we can create a local debug server. Assuming the service code lives in a script named service.py with a service object named svc, the command is:

$ bentoml serve service:svc --reload

Important: the two parts of the service:svc target change based on your script name and the name of the service object. If you had a service object named api in a script called app.py, the command would be bentoml serve app:api --reload. The --reload flag ensures that BentoML detects changes made to your script without needing to restart the server.

Here is a sample output of the command:

Image by author

The GIF shows that the API is live locally at http://0.0.0.0:3000, BentoML’s default address:

GIF by author

By using Swagger UI, BentoML shows you interactive documentation of our API. We can already send requests to it to get predictions:

Yay! Our API works — it classifies the given sample as the first class.

Building a Bento

Now, we package our API into a Bento.

Bento is a term introduced by BentoML for an archive that contains everything needed to make our API work, in a unified format. Inside a Bento are the instructions for building a Docker image and the dependencies of our API and model.

We start building the Bento by creating a bentofile.yaml file (the name should be the same) in the same directory level as our scripts. It will look like below:
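A minimal sketch of the file (the service entry assumes the script is named service.py with a service object svc; pin versions from your own requirements.txt):

```yaml
service: "service:svc"  # <script name>:<service object>
include:
  - "*.py"              # all scripts the Bento needs
python:
  packages:             # dependencies, ideally version-pinned
    - xgboost
    - scikit-learn
    - pandas
```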

The include field is used to list all the scripts needed to run our Bento. In our case, we are just adding all Python scripts. Next, there is the packages field, which should list dependencies and their versions.

In large projects, it is hard to keep track of all the dependencies and their versions, so you can use the pipreqs library to create a requirements.txt file for your project:

$ pip install pipreqs
$ pipreqs --force . # --force overrides existing req file

Then, you can copy the dependencies to the bentofile.yaml.

After you have the bentofile.yaml ready, we call the build command and specify the path to our bentofile.yaml, which is the project root (.) in our case:

$ bentoml build .

If successful, you will see an output like the below:

Image by author

You can also check the Bento by running bentoml list in the CLI:

As you can see, three Bentos are on my system, two of which belong to the current project. Having a ready Bento means we are ready to deploy it online.

Setting up AWS credentials

Since we are deploying our API as an AWS Lambda function, we must set up our AWS credentials. The best way to do that is via the CLI.

First, go to your AWS IAM console and find your credentials. They are usually available via the “Manage access keys” button:

Image by author

Click on it, and you will arrive here:

Image by author

Create a new set of keys or download an existing one. Then, go to your terminal and run the commands below with your own keys (these are the standard environment variables the AWS SDK looks for):

For Linux or macOS:

$ export AWS_ACCESS_KEY_ID="your-access-key-id"
$ export AWS_SECRET_ACCESS_KEY="your-secret-access-key"

For Windows (Command Prompt):

> set AWS_ACCESS_KEY_ID=your-access-key-id
> set AWS_SECRET_ACCESS_KEY=your-secret-access-key

This will set up your AWS credentials only for the current terminal session.

Deploying the Bento to AWS Lambda

To deploy our Bento, we will use BentoML’s native CLI package called bentoctl. Install it with the boto3 package (for AWS dependencies) and install the aws-lambda operator. Then, call bentoctl init:

$ pip install bentoctl boto3
$ bentoctl operator install aws-lambda
$ bentoctl init

The command will ask you a few questions. You only have to provide a name for the deployment — xgb_classifier. You can leave the rest as defaults.

The deployment instructions here follow the bentoctl docs.

GIF by author

Three files will be created: deployment_config.yaml, bentoctl.tfvars, and main.tf. They will look like below:

Image by author
GIF by author

Now, we will install another tool called Terraform. Terraform simplifies creating and managing cloud resources right from the CLI. It will use the above three files, mainly the main.tf file, which lists all the details of our upcoming AWS Lambda function.

You can find Terraform install instructions from here for Linux. For other systems, follow the instructions here. To check if the installation was successful, use:

$ terraform -h

If it prints a list of commands, the installation is successful.

After that, we build an AWS Lambda-compatible Docker image with bentoctl build command:

$ bentoctl build -b xgb_classifier:latest -f deployment_config.yaml

We add the name and the version of our Bento and specify the build specs via the deployment_config.yaml file. The process should look like the below:

GIF by author

If you receive a botocore.exceptions.ClientError, your AWS credentials are not set up properly. Go back to that step and try again.

If the image is built successfully, you should see an output like the below:

Image by author

Finally, we use Terraform to push the image to an AWS Lambda function. Use the following commands:

$ terraform init
$ terraform apply -var-file=bentoctl.tfvars -auto-approve
GIF by author

In the end, you will see the following message, which shows the URL of our deployed Lambda API:

Image by author

It is going to take a while to show the UI, but once done, the API docs will be visible when you click the link:

At first launch, the link will show an internal server error message. Ignore it.

Image by author

But it will be immediately available for sending requests and getting predictions:


Phew! That was a lot of steps, given that I promised a straightforward process at the beginning. But since we now have a ready ML solution in our hands, it was all worth it.

Deploying models as APIs is one of the best ways to let users interact with your model. Take famous services like DALL-E or GPT-3: both are massive models exposed through a simple API and put inside a website. Using the techniques in this article, you can create the same kind of products, though they probably won’t be billion-parameter models at first.

Thank you for reading!

