4 Machine Learning System Architectures


Taking Machine Learning Into Production

Photo by Anders Jildén on Unsplash

I’m a big advocate for learning by doing, and it just so happens that doing is probably the best way to learn machine learning. If you’re a Machine Learning Engineer (or possibly a Data Scientist), you may never quite feel fulfilled when a project ends at the model evaluation phase of the Machine Learning Workflow, as your typical Kaggle competition would. And no, I have nothing against Kaggle; I think it’s a great platform to improve your modeling skills.

The next step is to put the model into production, a topic generally left out of most courses on Machine Learning.

Disclaimer: This article was written using notes from Deployment of Machine Learning models Course (Udemy)

Formats To Serve ML Models

Once a Machine Learning model has been trained, the next step is to decide how to persist it. For example, we can serialize the model object with pickle; see the code below.

import pickle

import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# loading the data set
dataset = load_breast_cancer(as_frame=True)
df = pd.DataFrame(data=dataset.data)
df["target"] = dataset.target

# separating into X and y
X = df.iloc[:, :-1]
y = df.iloc[:, -1]

# splitting into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.75, shuffle=True, random_state=24
)

# training a logistic regression model
# (max_iter raised so the solver converges on this dataset)
lr = LogisticRegression(C=0.01, max_iter=10_000)
lr.fit(X_train, y_train)

# serializing the model to bytes
lr_dump = pickle.dumps(lr)

# deserializing the model and making predictions on new instances
clf = pickle.loads(lr_dump)
y_preds = clf.predict(X_test)
print(f"Test accuracy: {accuracy_score(y_test, y_preds):.3f}")

Note: when writing code to take a Machine Learning model into production, an engineer would modularize the above code into separate training and inference scripts to abide by software engineering best practices.

The end of the training script is defined by the point at which the model is dumped into a pickle file, whereas the inference script begins once the model has been loaded to make predictions on new instances.
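
For instance, a rough sketch of that split might look like this (the file names and the model_path argument are illustrative, not from the course):

# train.py: the training script ends where the model is persisted
import pickle

from sklearn.linear_model import LogisticRegression

def train_and_persist(X_train, y_train, model_path="model.pkl"):
    model = LogisticRegression(C=0.01, max_iter=10_000)
    model.fit(X_train, y_train)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

# inference.py: the inference script begins by loading the persisted model
def load_and_predict(new_instances, model_path="model.pkl"):
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    return model.predict(new_instances)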

Other methods of serving a model include:

  • MLflow, which provides a common serialization format that integrates with various machine learning frameworks in Python
  • Language-agnostic exchange formats (e.g., ONNX, PFA, and PMML)

It’s always good to be aware of these other options, since there are downsides to the popular pickle (or joblib) format: for example, a pickled model is tightly coupled to the Python and library versions used during training.
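
As a minimal sketch of the MLflow option, continuing from the lr model trained above (the save path here is an illustrative choice):

import mlflow.sklearn

# persist the trained model in MLflow's serialization format
mlflow.sklearn.save_model(lr, path="property_model")

# later, in the serving code, load it back and predict
model = mlflow.sklearn.load_model("property_model")
y_preds = model.predict(X_test)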

#1 Model Embedded in Application

Example Embedded Architecture; Image By Author
  • Pre-Trained: Yes
  • On-the-fly Predictions: Yes

In this scenario, the trained model is embedded in the application as a dependency. For example, the model can be installed into the application via pip, or pulled in at build time from file storage (e.g., AWS S3).

For example, suppose we have a Flask application for predicting the value of a property. The application would serve an HTML page used to collect details about a property whose estimated value a user would like to know. The Flask application would take those details as inputs, forward them to the model for a prediction, and return the result to the client.
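
A minimal sketch of that embedded setup, assuming the trained model was pickled to a model.pkl file and that the form collects a few hypothetical property fields:

import pickle

from flask import Flask, request

app = Flask(__name__)

# the model ships with the application and is loaded once at start-up
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # collect the property details submitted via the HTML form
    features = [[
        float(request.form["square_feet"]),
        float(request.form["bedrooms"]),
        float(request.form["bathrooms"]),
    ]]
    estimate = model.predict(features)[0]
    return {"estimated_value": float(estimate)}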

In the example above, predictions are returned to the user’s browser; however, we can vary this method to embed the model on a mobile device instead.

This approach is much simpler than the others, but there is a simplicity-flexibility trade-off. For instance, to update the model, the entire application has to be redeployed (on a mobile device, a new version would need to be released).

#2 Dedicated Model API

Example Dedicated ML Model API
  • Pre-Trained: Yes
  • On-the-Fly Predictions: Yes

In this architecture, the trained machine learning model becomes a dependency of a separate Machine Learning API service. Extending the property-valuation Flask example above: when the form is submitted to the Flask application server, that server makes another call, possibly via REST, gRPC, SOAP, or messaging (e.g., RabbitMQ), to a separate microservice dedicated to machine learning and exclusively responsible for returning the prediction.
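
A sketch of the caller’s side over REST; the service URL, route, and response field are hypothetical:

import requests

MODEL_API_URL = "http://ml-service:5001/v1/predict"  # hypothetical ML microservice endpoint

def get_property_estimate(property_details: dict) -> float:
    # the main application delegates inference to the dedicated model service
    response = requests.post(MODEL_API_URL, json=property_details, timeout=2.0)
    response.raise_for_status()
    return response.json()["estimated_value"]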

In contrast with the embedded model approach, this method trades simplicity for flexibility. Since a separate service must be maintained, the architecture is more complex, but model deployments are now independent of the main application deployments. Additionally, the model microservice and the main server can be scaled separately to handle higher volumes of traffic, and the microservice can potentially serve other applications.

#3 Model Published as Data

Example of Streaming Deployment; Image by Author
  • Pre-Trained: Yes
  • On-the-Fly Predictions: Yes

In this architecture, our training process publishes a trained model to a streaming platform (e.g., Apache Kafka), and the application consumes it at runtime rather than at build time, subscribing to any model updates.

The recurring simplicity-flexibility trade-off appears here once again. Maintaining the infrastructure this architecture requires demands much more engineering sophistication; however, ML models can be updated without redeploying any application, because the model is ingested at runtime.

To extend our property-valuation example, the application would consume model updates from a dedicated topic on the designated streaming service (e.g., Apache Kafka).
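
A minimal consumer-side sketch, assuming the kafka-python package, a local broker, and a hypothetical topic name:

import pickle

from kafka import KafkaConsumer

# subscribe to the topic the training pipeline publishes new models to
consumer = KafkaConsumer(
    "property-model-updates",
    bootstrap_servers=["localhost:9092"],
)

model = None
for message in consumer:
    # each message carries a freshly trained, serialized model
    model = pickle.loads(message.value)
    print("Swapped in a new model version at runtime")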

#4 Offline Predictions

Offline Architecture Example; Image by Author
  • Pre-trained: Yes
  • On-the-fly Predictions: No

This is the only asynchronous approach we will explore. Predictions are triggered and run asynchronously, either by the application or as a scheduled job. The predictions are collected and stored, and the application serves them from that store via a user interface.
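
A minimal sketch of such a scheduled job; the file paths are hypothetical, and the input CSV is assumed to contain exactly the model’s feature columns:

import pickle

import pandas as pd

def run_batch_scoring(
    model_path="model.pkl",
    input_path="new_properties.csv",
    output_path="predictions.csv",
):
    # a nightly cron job could call this function
    with open(model_path, "rb") as f:
        model = pickle.load(f)

    new_instances = pd.read_csv(input_path)
    # store the predictions so they can be inspected before being served
    new_instances["predicted_value"] = model.predict(new_instances)
    new_instances.to_csv(output_path, index=False)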

Much of the industry has moved away from this architecture, but it is more forgiving in the sense that predictions can be inspected before being returned to a user. This reduces the risk of the ML system surfacing errors, since predictions are not made on the fly.

With regard to the simplicity-flexibility trade-off, this system, too, compromises simplicity for more flexibility.

Wrap Up

Sometimes, merely doing some analysis, building multiple models, and evaluating them can get quite boring. If that is the case for you, then learning to put machine learning models into production could be the next step, and it’s a formidable skill to have in your toolbox. To emphasize: there is no such thing as a “best” system architecture for model deployment, only the set of trade-offs that best meets your system’s requirements.

Thank You for Reading!

If you enjoyed this article, connect with me by subscribing to my FREE weekly newsletter. Never miss a post I make about Artificial Intelligence, Data Science, and Freelancing.
