Feature engineering is the process of transforming data into features that better represent the underlying problem. This can be tedious and time-consuming. Automating this process can save teams a lot of time.
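To make this concrete, here is a minimal sketch of manual feature engineering — the kind of step AutoML automates. The records and the derived `price_per_m2` feature are hypothetical, purely for illustration:

```python
# Hypothetical raw records: feature engineering turns raw columns
# into a more informative derived feature (price per square meter).
raw = [
    {"price": 300_000, "area_m2": 100},
    {"price": 450_000, "area_m2": 90},
]

features = [
    {**row, "price_per_m2": row["price"] / row["area_m2"]}
    for row in raw
]

print(features)
```

Doing this by hand for dozens of candidate features, over and over, is exactly the tedium automated feature engineering removes.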
With Model Selection, we choose the best option from a wide range of possible models. Model performance is not the only criterion. The time it takes to train and how easy the model is to explain and understand are crucial factors as well. Model selection does not only mean finding the right type of model to use. It can also include an architecture search to find the specific model structure that best fits your problem.
I recommend the Google Research article about model search if you want to learn more about these important topics.
Finding the right hyperparameters can highly impact the performance of the algorithm. We can do that based on intuition or experience, but this requires repeating the process until we are satisfied with the results. Automated Hyperparameter Tuning uses methods such as gradient descent or evolutionary algorithms to search for the best hyperparameters.
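The simplest automated strategy is random search, which the fancier methods above refine. Here is a minimal, self-contained sketch; `validation_score` is a hypothetical stand-in for "train a model with these hyperparameters and return its validation score", and in practice you would use a library such as scikit-learn or Optuna:

```python
import random

# Hypothetical objective: stands in for "train a model with these
# hyperparameters and return its validation score". It peaks at
# learning_rate=0.1 and max_depth=5.
def validation_score(learning_rate, max_depth):
    return 1.0 - abs(learning_rate - 0.1) - 0.01 * abs(max_depth - 5)

random.seed(42)
search_space = {
    "learning_rate": lambda: random.uniform(0.001, 0.5),
    "max_depth": lambda: random.randint(1, 10),
}

best_params, best_score = None, float("-inf")
for _ in range(50):  # 50 random trials
    params = {name: sample() for name, sample in search_space.items()}
    score = validation_score(**params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params, best_score)
```

An AutoML product runs a loop like this (with far smarter sampling) behind the scenes, so you never have to hand-pick values yourself.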
We obviously need to train the model. AutoML products should take care of provisioning the right infrastructure and scaling it appropriately in order to train your models in the most efficient way. Once training is completed, there are additional steps: evaluation, to understand how well the model performs on a specific dataset, and explainability, to be able to explain how the data relates to the model and the predictions it makes.
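Evaluation ultimately boils down to comparing predictions against ground truth. AutoML products compute these metrics for you, but as a quick reminder of what is happening underneath, here is a toy sketch with made-up labels:

```python
# Toy ground-truth labels and model predictions (1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were found

print(accuracy, precision, recall)
```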
Finally, we want to use our model in production, which typically involves providing an API endpoint to get predictions, while also taking care of availability and scalability. As soon as we move the model into production, we need to keep track of it. By using monitoring, we can understand when it is the right time to re-train the model.
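In its simplest form, monitoring compares the live data against what the model saw at training time. The sketch below flags re-training when the mean of recent prediction scores drifts too far from a training-time baseline; the baseline value and threshold are hypothetical, and real systems track many more statistics:

```python
from statistics import mean

# Baseline statistic recorded at training time (hypothetical value).
TRAINING_MEAN_SCORE = 0.62
DRIFT_THRESHOLD = 0.15  # tolerated absolute drift before re-training

def needs_retraining(recent_scores):
    """Return True when recent prediction scores drift past the threshold."""
    return abs(mean(recent_scores) - TRAINING_MEAN_SCORE) > DRIFT_THRESHOLD

print(needs_retraining([0.60, 0.65, 0.58]))  # close to the baseline
print(needs_retraining([0.20, 0.25, 0.18]))  # heavy drift
```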
As you can see, AutoML requires a significant number of steps. All these highly research-driven and complex parts should be hidden from the AutoML user. It should be simple to use.
AutoML: Where and Why
Automated Machine Learning can be applied to a wide range of machine learning use cases like Text, Images and Tabular data.
No matter which use case you choose, machine learning consists of many manual steps like data preprocessing, feature engineering and carefully selecting the suitable model and parameters. These are error-prone and tedious tasks. Using AutoML introduces reproducibility to our machine learning products and significantly reduces the risk of failure.
The most obvious reason why companies opt for AutoML is a lack of manpower or limited machine learning expertise. While this is mostly the case, these are definitely not the only reasons to take the AutoML route. Companies of all sizes and levels of experience are using AutoML today.
We are seeing great success stories, with proofs of concept for idea validation completed in just a couple of hours. You can also build a baseline model to compare against your hand-crafted model. AutoML makes sense in production environments as well, as it takes away a significant number of steps from your ML workflow. You can quickly launch new products with good quality while minimizing maintenance and implementation efforts.
Using products like Google AutoML helps you create and maintain state-of-the-art models. Google takes care of continuously optimizing its products and introducing new capabilities, such as its latest model architectures and other advanced features.
“But We Know Everything About Machine Learning”
This is a statement I hear from time to time. There is still a lot of skepticism about automated machine learning and its benefits. AutoML is not your competitor, and it will certainly not take away your job. Think of it as automation to free up your time to perform more important tasks.
I like to think of AutoML as just another tool in the toolbox.
This tool should be the first one you use when approaching a new machine learning case. You can take your data and train a model in a couple of hours. It might be the one you’re using in production, but it doesn’t have to be.
AutoML Services and Frameworks
Various AutoML services and frameworks are available today. Their features and capabilities differ a lot, and that's something you need to be aware of. Some of the services focus heavily on the UI, while others are more accessible programmatically.
The first distinction we need to make is between cloud AutoML services, like the products from Google and AWS, and open-source frameworks. Secondly, we need to distinguish between the implementation efforts required to use these AutoML tools. For example, Google AutoML Tables does not require a single line of code to train a model and get a ready-to-use prediction API endpoint. This is not the case with most frameworks, where implementation work is involved.
Google has a comprehensive set of easy-to-use AutoML products. It has a well-thought-out UI, powered by state-of-the-art neural architecture search and transfer learning, that gives excellent results for multiple use cases.
This is not an exhaustive list by any means. There are certainly more capable options out there, and new ones will enter the picture for sure.
One significant advantage of all these cloud providers is that you don't have to take care of any infrastructure — whereas when using open-source frameworks, you need to make sure you run them in an appropriately scaled environment.
Setting the Right Expectations
What we want to achieve with AutoML is competitive quality. AutoML can outperform handcrafted models, but don't always expect top quality (at least not for now). I often see AutoML outperforming handcrafted models, but this is not always the case. It is something that correlates with your available budget and the experience of the people on your team.
Being competitive doesn’t only relate to the model’s performance. How quickly you can put new models into production is even more important.
You can still opt to develop your hand-crafted machine learning model. But is it really necessary? Why not invest a couple of hundred dollars and get a ready-to-use model?
With that in mind, let us do some demos.
Automated Machine Learning Using Google AutoML Tables
In this section, we cover the training of an AutoML model to identify the Higgs boson, also known as the god particle. No further knowledge of the Higgs boson is required to follow these steps, but I recommend checking out the official CERN website if you're generally interested.
The data we are using is tabular data with 30 feature columns. You can get the dataset directly from Kaggle.
Fun fact: The Higgs boson got the nickname god particle because the physicist Leon Lederman referred to it as the "goddamn particle" due to the difficulty of detecting it. Lederman meant it as a joke, so better call it the Higgs boson.
Google AutoML Tables can potentially save you a lot of work. In the following process, we go through the steps for tabular data, but the steps are the same for image, text and video. There are some settings that are unique to each type of data, but Google will guide you through everything.
The best Kaggle submissions used Gradient Boosting and achieved an accuracy level of 84%. In the demo with Google AutoML Tables, we can see that the final model is a Gradient Boosting Decision Tree, and we achieved similar results.
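For comparison, the hand-crafted approach those submissions used can be sketched with an off-the-shelf gradient boosting classifier. The code below trains on synthetic data with 30 feature columns as a stand-in for the real Kaggle dataset, so the accuracy it prints is illustrative only, not the 84% result:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Higgs dataset: 30 feature columns,
# binary target (signal vs. background).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=15, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # mean accuracy on the test split
print(f"test accuracy: {accuracy:.3f}")
```

The point of AutoML Tables is that it arrives at a comparable model without you writing, tuning, or maintaining any of this.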
Google AutoML: Best Practices
Take your time and read the documentation. It helps you understand what AutoML expects in order to deliver the best results. Following the guidelines can improve your model quality significantly.
I strongly recommend building a baseline model for comparison with your subsequent training runs. The baseline model should be trained on your raw data without any further feature engineering. If you have a baseline model and you then train additional models with your own domain knowledge applied, you can compare them and see how your optimizations improved the model.
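In practice this comparison can be as simple as tracking one validation metric per run. The run names and accuracy numbers below are hypothetical:

```python
# Hypothetical validation accuracies from successive AutoML training runs.
runs = {
    "baseline_raw_data": 0.81,    # raw data, no feature engineering
    "domain_features_v1": 0.84,   # domain knowledge applied
    "domain_features_v2": 0.83,   # a further (less successful) iteration
}

baseline = runs["baseline_raw_data"]
for name, score in runs.items():
    print(f"{name}: {score:.2f} ({score - baseline:+.2f} vs. baseline)")
```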
Avoid already applied transformations. Google is doing some feature engineering for you, so take extra caution with your own feature engineering.
You can find me on Twitter @HeyerSascha and LinkedIn if you have any further questions on AutoML and other related issues.