Pull your ML model out of your Server: The Database Solution
In the previous article, we saw one excellent reason you’d want to use tools like Streamlit and Gradio to deploy fast and deploy many versions of your Machine Learning (ML) application.
We saw the advantages of the model-in-server architecture and why you’d definitely want to go down this road when you’re prototyping. This is the easiest way to get quick feedback from a private circle of trusted testers and evaluate the market viability of your product idea.
However, we concluded that when it’s time to move into production, you need to rethink your design and pull your ML model out of your application server. Several issues, like differing programming languages, diverse scaling needs, and separate update cycles, make the model-in-server architecture a poor fit for production.
So what can we do? If we revisit the simple web application architecture diagram, we see that we have three options:
- Place your model in your database
- Pull your model into its own inference server
- Place your model on the edge (i.e., closer to your client)
In this story, we’ll examine how to make the first option happen, when you’d want to move towards this solution, and the advantages and disadvantages of this approach.
Move your ML Model into the Database
Moving your ML model into your database is kind of a hack. You don’t run any code in your database system. Instead, you periodically run your model offline on new data, save the results locally, and follow a standard ETL (Extract, Transform, Load) procedure to store the predictions in the database.
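To make the flow concrete, here is a minimal sketch of such a batch job. It assumes SQLite as the database and uses a made-up `predict_score` function in place of a real trained model; the table name, column names, and feature layout are all hypothetical.

```python
import csv
import sqlite3


def predict_score(features):
    """Hypothetical stand-in for a trained model: returns a score in [0, 1]."""
    visits, purchases = features
    return round(min(1.0, 0.1 * visits + 0.2 * purchases), 3)


def run_batch_job(new_data, csv_path, conn):
    """Score new rows offline, save them locally, then load them into the database."""
    # Run the model on every new row (no database involved yet).
    rows = [(user_id, predict_score(features)) for user_id, features in new_data.items()]

    # Save the results locally as a CSV file.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["user_id", "score"])
        writer.writerows(rows)

    # Load step: upsert, so each run overwrites the previous predictions.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS predictions (user_id TEXT PRIMARY KEY, score REAL)"
    )
    with open(csv_path, newline="") as f:
        conn.executemany(
            "INSERT OR REPLACE INTO predictions (user_id, score) VALUES (:user_id, :score)",
            csv.DictReader(f),
        )
    conn.commit()
    return len(rows)
```

You would typically trigger a job like this from a scheduler (a cron entry, for example), which is why the web application never has to know the model exists.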
This is the simplest design you can think of when placing a model into production. You don’t have to maintain the infrastructure your model runs on, you don’t really care about its performance, and you don’t lose sleep over every request’s latency.
So, it seems like the perfect solution, but, as you can imagine, it’s not a silver bullet. This approach is suitable for specific use cases:
- Recommender systems
- Marketing automation
If your recommender system makes only one prediction for each user daily, the design we’re talking about is a perfect fit. Imagine a client coming into a store. Upon their entrance, they receive a recommendation on their smartphone informing them which products match their preferences.
In this use case, you can run your model daily, store the predictions for each user in your database, and be done with it. However, this design doesn’t suit every recommender system. The model-in-database architecture won’t cut it if you need to make real-time predictions in a dynamic environment. Having said that, many recommender systems work like this in production today. So, don’t think twice if your case is a good fit.
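On the serving side, the application server only ever reads from the database. The sketch below assumes SQLite and a hypothetical `daily_recommendations` table that the daily run refreshes; the fallback list for users without a stored prediction is my own assumption, not something this design requires.

```python
import sqlite3


def store_daily_recommendations(conn, recommendations):
    """Daily batch step: overwrite each user's precomputed recommendations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_recommendations "
        "(user_id TEXT PRIMARY KEY, product_ids TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_recommendations VALUES (?, ?)",
        [(user_id, ",".join(products)) for user_id, products in recommendations.items()],
    )
    conn.commit()


def get_recommendations(conn, user_id, fallback=("bestseller-1", "bestseller-2")):
    """Request-time step: a plain primary-key lookup, no model involved."""
    row = conn.execute(
        "SELECT product_ids FROM daily_recommendations WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    if row is None:
        # New users, or users the last batch run did not cover.
        return list(fallback)
    return row[0].split(",")
```

When the customer walks into the store, the app calls something like `get_recommendations(conn, "alice")` and gets back whatever the last daily run stored for that user.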
For marketing automation, such as customer segmentation, moving your model into the database could work even better. For example, consider the case where you want to run some kind of unsupervised learning algorithm to identify groups of users and later target them with promotional campaigns. You can run the algorithm once and store the group identifier for each user in the database. That’s it!
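Here is a rough sketch of that segmentation flow. To keep it dependency-free, it uses a tiny one-dimensional k-means written by hand as a stand-in for whatever clustering library you would actually use; the single "monthly spend" feature and the `user_segments` table are assumptions made for illustration.

```python
import sqlite3


def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means; a stand-in for a real clustering library."""
    # Deterministic start: spread the initial centroids across the sorted values.
    ordered = sorted(values)
    centroids = [ordered[i * (len(ordered) - 1) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def store_segments(conn, spend_by_user, k=2):
    """Run the clustering once and store a group identifier per user."""
    user_ids = list(spend_by_user)
    labels = kmeans_1d([spend_by_user[u] for u in user_ids], k)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_segments (user_id TEXT PRIMARY KEY, segment_id INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO user_segments VALUES (?, ?)", zip(user_ids, labels)
    )
    conn.commit()


def users_in_segment(conn, segment_id):
    """What a campaign tool would run later: a plain query, no model needed."""
    rows = conn.execute(
        "SELECT user_id FROM user_segments WHERE segment_id = ? ORDER BY user_id",
        (segment_id,),
    )
    return [row[0] for row in rows]
```

Once the segment identifiers are in the table, targeting a campaign is just a `WHERE segment_id = ...` filter.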
Why should you do it?
So, what are the pros of this approach? First and foremost, it’s simple to implement. We saw that you don’t need to care about things like infrastructure or performance. You could even run your model on your laptop, store the predictions in a CSV file, and then load that CSV into your database.
It also scales easily. You rely on the decades of innovation and engineering put into database systems to make them scale to millions of requests. For the same reason, you (usually) get low latency for your users.
Why shouldn’t you do it?
What about the disadvantages? The first disadvantage is that this approach does not scale to complex input types. What if your users want to get information about an image or a piece of text? You can’t precompute a prediction for an input you have never seen, so there’s simply no way around this problem.
Second, your model only sees the data that was available when it was last trained and run, and some of that data may no longer be relevant by the time a prediction is served. Thus, your users don’t always get the most up-to-date predictions, and the stored results go stale between runs.
The model-in-database approach is one pathway you can take to pull your ML model out of your web server. We saw what types of use cases this approach covers and what its pros and cons are. My advice is that if your case is a candidate for this design, start here. It’s so easy to implement that there is no justification for over-engineering things.
The next article will cover when you should pull your model into its own inference server and how to do it. This is arguably the most common architecture design covering almost every use case.
About the Author
My name is Dimitris Poulopoulos, and I’m a machine learning engineer working for Arrikto. I have designed and implemented AI and software solutions for major clients such as the European Commission, Eurostat, IMF, the European Central Bank, OECD, and IKEA.
Opinions expressed are solely my own and do not express the views or opinions of my employer.