Comprehensive Guide to Deploying Any ML Model as APIs With Python And AWS Lambda
People say that the best thing about Python is the abundance of its open-source libraries. I beg to differ. Having many alternatives for doing a single task might give you a lot of flexibility, but they might just end up creating a giant mess that leaves programmers confused and angry.
Right now, it is the case regarding Machine Learning Operations (MLOps). It has become such an essential field in recent years that the Python community has risen to the occasion splendidly — by creating as many tools and libraries as possible for performing MLOps disciplines. Now, it is challenging for a regular user to sift through all the information, compare and come up with a final set of tools for solving their business problems.
Even if they do, there is an entirely separate issue of putting them together, making them communicate, and reaching a ready, presentable machine learning solution.
Take deploying models as APIs as an example (this article’s topic). Which set of tools would you choose to accomplish the job? For a moment, let’s forget all the starting fuss like data versioning, model selection, and feature engineering and say you have a ready model.
How do you convert the model into an HTTPS address so users can send requests? There are so many options like Flask, Django, FastAPI, etc. Even after choosing the framework, getting it online isn’t easy. Do you use AWS, Heroku, GCP, or dozens of others who practically claim they offer the “only best way” to do it?
To save you the trouble of answering all these questions, in this article I will present a solution that is pretty straightforward and hassle-free compared to the alternatives. You will only need to know XGBoost; the rest I will teach.
In the end, we will build this API:
Users can send requests to its classify endpoint to get predictions, and you can embed this API anywhere with a single link.
The stack we will use
Before we jump in, let me outline the list of technologies we will use and explain the reasons I chose them.
- Pandas, NumPy — no explanation needed.
- Sklearn and XGBoost — model training. They are chosen only for illustration purposes; the instructions will work with models from any other framework.
- BentoML — packaging the trained model into a local API (will be explained in detail later)
- bentoctl — a CLI tool that lets you build Docker images and save them in a cloud-compatible format with a single command
- Terraform — another CLI tool for managing cloud infrastructure. It takes care of creating resources and functions and pushing containers to them without needing to visit half a million pages.
- AWS — the most popular, reliable, and flexible cloud platform (you can confirm via Google Trends).
- AWS Lambda — run code on the serverless cloud — highly scalable and comfortable.
So, without further ado, let’s jump in.
What is BentoML and its purpose?
To maximize the business impact of machine learning, the hand-off between data scientists and engineers from model training to deployment should be fast and iterative. However, data scientists often don’t have the skills to package trained models and push them to the engineers, while the engineers find it hard to work with models from dozens of different ML frameworks.
BentoML was created to solve these issues and make the hand-off to production deployment as easy/fast as possible. In the coming sections, you will see how BentoML makes it easy to perform tedious MLOps operations. The examples are:
- Saving any model from any framework into a unified format to ease collaboration.
- Creating an HTTP API endpoint with a single Python function for faster iteration.
- Containerizing everything the model needs into a Docker image with a single CLI command.
Read the docs here to learn more about it, or continue reading.
Dataset preparation and model training
The crux of the article is about model deployment, so I want to concentrate all your attention on that area only. For that purpose, I assume you are reading this article with your best-trained model already in hand and want to deploy it as soon as possible.
To simulate that here, we will create a synthetic dataset, train an XGBoost model, and move forward as though you have done all the previous steps of the MLOps life cycle like data cleaning, exploration, feature engineering, model experimentation, hyperparameter tuning, and found the model that performs best on your dataset.
We create a simple dataset with seven features and 10k samples with a binary classification target.
We load it back, train a vanilla XGBoost classifier, and pretend that it is our best-tuned model.
Saving trained models to BentoML format
Saving a trained model into a BentoML-compatible format is done by calling the framework-specific save_model function.
The returned object is an instance of the BentoML Model class, with a label called a tag. The tag consists of two parts: a name given by the user and a version string to differentiate between models saved at different times. Even if an identical model is saved again, BentoML will create a new directory and a new version string.
BentoML supports almost all essential ML frameworks:
- Classic: Sklearn, XGBoost, CatBoost, LightGBM
- Deep learning: TensorFlow, PyTorch, PyTorch Lightning, Keras, Transformers
- Others: ONNX, MLFlow, fast.ai, statsmodels, spaCy, h2o, Gluon, etc.
Each of these frameworks has a corresponding save_model function.
When a model is saved, it goes into a local directory called the BentoML model store. From the last output (the path parameter), we saw that my model store resides in /home/bexgboost/bentoml/models. You can see the list of all your models by calling the bentoml models list command in the terminal:
You can also see models from my other projects.
Note: in BentoML docs and this article, the names “model” and “tag” are used interchangeably to refer to saved models in the model store.
save_model has other parameters for passing extra information about the model, from metadata to additional user-defined objects (e.g. feature importances of your model as a separate object):
Above, we save our XGBoost model with additional data: the author, the feature importance scores (which you can retrieve with booster.get_score), and a dummy metric.
Models in the BentoML model store can be shared as standalone archives using the bentoml models export command:
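A sketch of the export command; the target directory name models/ is an assumption.

```shell
# Export the most recent version of the model to the models/ directory
mkdir -p models
bentoml models export xgb_custom:latest ./models
```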
When you don’t know the exact version string of your tag, you can use the “:latest” suffix to choose the most recent one. With the above command, we export the classifier into a .bentomodel archive inside the models directory. When a teammate sends you a .bentomodel archive, you can use the bentoml models import command to load it into your local BentoML model store:
Retrieving saved models
There are a few ways of loading saved models from the model store into your environment. The simplest one is the load_model function. Like save_model, load_model is also framework-specific:
The function will load the model in the same format it had before it was saved, meaning you can use its native methods like predict.
To load the model as a BentoML Model object, you can use the models.get command, which is NOT framework-specific:
tag = bentoml.models.get("xgb_custom:latest")
The reason you might want to load the model in this format is that now, you can access its add-ons like metadata and labels:
The final and most important way of retrieving models is by loading them as runners:
Runners are special objects of BentoML that are optimized to use system resources in the most efficient way possible based on their framework. Runners are the core components of the APIs we will build in the next section.
Now, we are ready to start building the API!
Organize into scripts
Up until now, we have been using notebooks. To build an API service, we need to switch to Python scripts, so let’s organize the code of the previous sections. In a generate_data.py file, create a function that saves the synthetic data from the “Dataset Preparation” section:
The generate_data.py script can be found here.
In a train.py file, create a function that trains our XGBoost classifier and saves it to the BentoML model store:
The train.py script can be found here.
For completeness, run both scripts in the correct order to generate the dataset and save a new model to the model store.
Creating an API service script
Now, it is time to create the local API. For that, we will only need a simple script that starts like the below:
After loading our model with models.get and converting it to a runner, we create an object called svc. It is an instance of the BentoML Service class, a high-level class that abstractly represents our API.
We add a single endpoint called classify to the service object by creating a function with the same name:
Let’s understand the above snippet line-by-line.
First, we import a new class called NumpyNdarray from bentoml.io, BentoML’s Input/Output module. To standardize inputs and outputs, BentoML offers several such classes, like NumpyNdarray. Passing these classes to the input and output arguments of the svc.api decorator ensures that the correct datatypes are passed to our API endpoint. In our case, we make sure that the data passed to our classify function is always a NumPy array. If we were working with image models, our input class could be the Image class, while the output would be a different class suited to the model’s predictions.
Inside the function, we use the predict.run method of our runner to get a prediction on the input. Under the hood, the method calls the predict method of the XGBoost booster object.
Here is what the script looks like in the end:
That’s it! By using bentoml serve, we can create a local debug server:
$ bentoml serve service.py:svc --reload
The service.py:svc part of the above command changes based on your script name and the name of the service object. If you had a service object named api in a script called api.py, the command would be bentoml serve api.py:api --reload. The --reload flag ensures that BentoML detects changes made to your script without needing to restart the server.
Here is a sample output of the command:
The GIF shows that the API is live locally on http://127.0.0.1:3000:
Using Swagger UI, BentoML shows interactive documentation for our API. We can already send requests to it to get predictions:
Yay! Our API works — it classifies the given sample as the first class.
Building a Bento
Now, we package our API into a Bento.
Bento is a term introduced by BentoML that refers to an archive containing everything needed to make our API work, in a unified format. Inside a Bento, there are instructions for building a Docker image and the dependencies of our API and model.
We start building the Bento by creating a bentofile.yaml file (the name must be exactly this) at the same directory level as our scripts. It will look like below:
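A sketch of what it might contain for this project; the entry point follows the service script above, and the package list mirrors the stack described earlier.

```yaml
# bentofile.yaml
service: "service.py:svc"   # entry point: script name + service object
include:
  - "*.py"                  # all Python scripts of the project
python:
  packages:                 # dependencies, ideally pinned to versions
    - xgboost
    - scikit-learn
    - pandas
    - numpy
```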
The include field lists all the scripts needed to run our Bento; in our case, we add all Python scripts. Next, there is the packages field, which should list the dependencies and their versions.
In large projects, it is hard to keep track of all the dependencies and their versions, so you can use the pipreqs library to create a requirements.txt file for your project:
$ pip install pipreqs
$ pipreqs --force . # --force overrides existing req file
Then, you can copy the dependencies into the packages field of bentofile.yaml.
After you have the bentofile.yaml ready, we call the bentoml build command and specify the path to the directory containing it, which is the project root (.) in our case:
$ bentoml build .
If successful, you will see an output like the below:
You can also check the Bento by running bentoml list in the CLI:
As you can see, three Bentos are on my system, two of which belong to the current project. Having a ready Bento means we are ready to deploy it online.
Setting up AWS credentials
Since we are deploying our API as an AWS Lambda function, we must set up our AWS credentials. The best way to do that is via the CLI.
First, go to your AWS IAM console and find your credentials. They are usually available via the “Manage access keys” button:
Click on it, and you will arrive here:
Create a new set of keys or download an existing one. Then, go to your terminal and run the below commands:
For Linux or macOS:
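The usual way is to export the standard AWS environment variables; replace the placeholder values with your own keys, and note the region value is just an example.

```shell
# Replace the placeholders with the keys from your IAM console
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
export AWS_DEFAULT_REGION="us-east-1"  # pick the region closest to you
```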
This will set up your AWS credentials ONLY FOR THE CURRENT shell session.
Deploying the Bento to AWS Lambda
To deploy our Bento, we will use BentoML’s native CLI package called bentoctl. Install it together with the boto3 package (for AWS dependencies), then install the aws-lambda operator and call bentoctl init:
$ pip install bentoctl boto3
$ bentoctl operator install aws-lambda
$ bentoctl init
The command will ask you a few questions. You only have to provide a name for the deployment (xgb_classifier); you can leave the rest as defaults.
The deployment instructions here follow the docs of the aws-lambda operator. Three files will be created: deployment_config.yaml, bentoctl.tfvars, and main.tf. They will look like below:
Now, we will install another tool called terraform. Terraform simplifies creating and managing cloud resources right from the CLI. It will use the three files above, mainly main.tf, which lists all the details of our upcoming AWS Lambda function. After installing Terraform, check that it works:
$ terraform -h
If it prints a list of commands, the installation is successful.
After that, we build an AWS Lambda-compatible Docker image with the bentoctl build command:
$ bentoctl build -b xgb_classifier:latest -f deployment_config.yaml
We pass the name and version of our Bento and specify the build specs via the deployment_config.yaml file. The process should look like the below:
If you receive a botocore.exceptions.ClientError, your AWS credentials are not set up properly. Go back to that step and try again.
If the image is built successfully, you should see an output like the below:
Finally, we use Terraform to push the image to an AWS Lambda function. Use the following commands:
$ terraform init
$ terraform apply -var-file=bentoctl.tfvars -auto-approve
In the end, you will see the following message, which shows the URL of our deployed Lambda API:
It is going to take a while to show the UI, but once done, the API docs will be visible when you click the link:
At first launch, the link will show an internal server error message. Ignore it.
But it will be immediately available for sending requests and getting predictions:
Phew! That was a lot of steps, given that I said at the beginning that the article would be straightforward. But since we now have a ready ML solution in our hands, it was all worth it.
Deploying models as APIs is one of the best ways to let users interact with your model. Take the example of famous services like DALL·E or GPT-3. Both are massive models exposed through a simple API and put inside a website. Using the techniques in this article, you can create the same kind of products, though they probably won’t be billion-parameter models at first.
Thank you for reading!