Hyperparameter Optimization For Beginners

The Task Data Scientists Hate To Love

Many Data Scientists ignore hyperparameters. Hyperparameter tuning is a highly experimental activity, and that uncertainty can cause real discomfort for any normal human being, something we naturally try to avoid.

“Hyperparameter tuning relies more on experimental results than theory, and thus the best method to determine the optimal settings is to try many different combinations […].” — Will Koehrsen, Hyperparameter tuning the

Unfortunately, it has to be done. We don't walk into a store, pick a pair of trainers off the shelf, and buy them. We first select a shoe we believe will solve our problem, whether that's a wardrobe malfunction or the fact that, for whatever reason, we've lost all our trainers. Next, we tune the hyperparameters, such as the size of the shoe and the color we want, before we make the purchase.

If we are willing to do this in the real world, it shouldn’t be skipped in Data Science.

Understanding Hyperparameters

Hyperparameter optimization is the problem of selecting the optimal set of hyperparameters for a learning algorithm. Finding the right combination of hyperparameters maximizes the model's performance, meaning our learning algorithm makes better decisions when given unseen instances.

Hyperparameters control the learning process. Unlike ordinary model parameters, they are chosen before the learning algorithm is trained.

Formally, model hyperparameters are parameters that the model cannot estimate from the data, so they must be set beforehand in order to estimate the model's parameters. Model parameters, in contrast, are estimated by the learning algorithm from the provided data.
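
To make the distinction concrete, here is a minimal sketch using scikit-learn's LogisticRegression on the Iris dataset (both are illustrative choices, not from the original). The regularization strength C is a hyperparameter we pick before training; the coefficients in coef_ are parameters the model estimates from the data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us before training.
model = LogisticRegression(C=0.5, max_iter=1000)

# Parameters: estimated by the learning algorithm from the data.
model.fit(X, y)

print(model.get_params()["C"])  # 0.5, training does not change it
print(model.coef_.shape)        # (3, 4): one weight per class and feature
```
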

Approaches

There are a number of approaches to performing hyperparameter optimization efficiently; see Hyperparameter Optimization on Wikipedia for a full breakdown. Mind Foundry conducted a survey on Twitter to gauge how practitioners on the platform feel about them.

Survey Conducted by Mind Foundry on Twitter

Let’s learn more about each of them and how to perform them in Python.

Bayesian Optimization

Wikipedia describes Bayesian Optimization as “a global optimization method for noisy black-box functions. Applied to hyperparameter optimization, Bayesian optimization builds a probabilistic model of the function mapping from hyperparameter values to the objective evaluated on a validation set. By iteratively evaluating a promising hyperparameter configuration based on the current model, and then updating it, Bayesian optimization aims to gather observations revealing as much information as possible about this function and, in particular, the location of the optimum. It tries to balance exploration (hyperparameters for which the outcome is most uncertain) and exploitation (hyperparameters expected close to the optimum). In practice, Bayesian optimization has been shown to obtain better results in fewer evaluations compared to grid search and random search, due to the ability to reason about the quality of experiments before they are run.” [Source: Wikipedia].

Scikit-Optimize (skopt) is an optimization library that includes a Bayesian optimization implementation. I'd recommend using it rather than implementing your own, although building your own is worthwhile if you want to dive deeper into how Bayesian optimization works.

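
To see roughly what a library like skopt does under the hood, here is a minimal from-scratch sketch of the loop described in the Wikipedia quote above, assuming scikit-learn and SciPy are installed. A Gaussian process serves as the probabilistic model of validation error, and the next configuration is the one that maximizes expected improvement, which balances exploration and exploitation. The tuned model (an SVC on Iris), the single hyperparameter log10(C), its bounds, and the iteration counts are all illustrative assumptions. For real work, skopt's gp_minimize runs the same kind of loop with far more care.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import load_iris
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)


def objective(log_c):
    """Validation error (1 - mean 3-fold CV accuracy) of an SVC with C = 10**log_c."""
    return 1.0 - cross_val_score(SVC(C=10.0 ** log_c), X, y, cv=3).mean()


def expected_improvement(candidates, gp, best_y, xi=0.01):
    """Expected improvement over best_y (we are minimizing) at each candidate."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)   # avoid division by zero
    improvement = best_y - mu - xi    # positive where a lower error is predicted
    z = improvement / sigma
    # First term rewards a low predicted error (exploitation),
    # second term rewards high uncertainty (exploration).
    return improvement * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(bounds=(-3.0, 3.0), n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a few random configurations.
    xs = list(rng.uniform(bounds[0], bounds[1], size=n_init))
    ys = [objective(x) for x in xs]
    candidates = np.linspace(bounds[0], bounds[1], 500).reshape(-1, 1)

    for _ in range(n_iter):
        # Refit the surrogate model to every observation so far.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                      normalize_y=True, random_state=seed)
        gp.fit(np.array(xs).reshape(-1, 1), np.array(ys))
        # Evaluate the most promising candidate, then update the model.
        ei = expected_improvement(candidates, gp, min(ys))
        x_next = float(candidates[np.argmax(ei), 0])
        xs.append(x_next)
        ys.append(objective(x_next))

    best = int(np.argmin(ys))
    return float(xs[best]), float(ys[best])


best_log_c, best_err = bayes_opt()
print(f"best C = {10 ** best_log_c:.3g}, CV error = {best_err:.3f}")
```

Searching over log10(C) rather than C is a common choice because useful values of C span several orders of magnitude. Scoring a fixed set of 500 candidate points keeps the acquisition step simple; libraries optimize the acquisition function more carefully.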

Grid Search

Grid search was the first technique I learned for hyperparameter optimization. It exhaustively searches a manually specified subset of a learning algorithm's hyperparameter space, training and evaluating the model on every combination of the listed values. Because of this, grid search must be guided by a performance metric, typically measured by cross-validation on the training set or on a held-out validation set.
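
The exhaustive loop is short enough to sketch by hand, assuming scikit-learn: enumerate every combination with itertools.product, score each on a held-out validation split, and keep the best. The SVC, the Iris data, and the grid values are illustrative choices.

```python
from itertools import product

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# The manually specified subset of the hyperparameter space.
grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

best_score, best_params, n_trials = -1.0, None, 0
for values in product(*grid.values()):  # every combination: 3 x 2 = 6
    params = dict(zip(grid.keys(), values))
    model = SVC(**params).fit(X_train, y_train)
    score = model.score(X_val, y_val)   # validation accuracy guides the search
    n_trials += 1
    if score > best_score:
        best_score, best_params = score, params

print(n_trials, best_params, round(best_score, 3))
```
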

Instead of implementing grid search from scratch, it's highly recommended that you use the scikit-learn implementation, GridSearchCV.

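
A minimal sketch with GridSearchCV, assuming an SVC on the Iris dataset (an illustrative choice). The grid lists 4 values of C and 2 kernels, so 4 × 2 = 8 combinations are each evaluated with 5-fold cross-validation, and accuracy is the metric that picks the winner.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"]}

# Every combination in param_grid is trained and scored with 5-fold CV.
search = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
print(len(search.cv_results_["params"]))  # 8 combinations tried
```

After fitting, search.best_estimator_ holds a model refit on all of the data with the best combination, since refit=True by default.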

Random Search

Instead of exhaustively enumerating all of the combinations you list, as grid search does, random search samples combinations at random. When only a small number of hyperparameters affect the final model's performance, random search can outperform grid search. Despite its low rating in the survey above, random search is still an important technique to have in your toolkit.
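
A quick way to see why: suppose two hypothetical hyperparameters, both scaled to [0, 1], and a budget of 9 trials. A 3 × 3 grid only ever tries 3 distinct values of each hyperparameter, while 9 random trials try 9 distinct values of each. If only the first hyperparameter really matters, random search has explored it three times as finely for the same cost. A minimal NumPy sketch:

```python
import itertools

import numpy as np

budget = 9  # same number of model trainings for both strategies

# Grid search: 3 values per hyperparameter -> 3 x 3 = 9 trials.
values = np.linspace(0.0, 1.0, 3)
grid_trials = np.array(list(itertools.product(values, values)))

# Random search: 9 points sampled uniformly from the same square.
rng = np.random.default_rng(0)
random_trials = rng.uniform(0.0, 1.0, size=(budget, 2))

# How many distinct values of the first hyperparameter did each strategy try?
distinct_grid = len(np.unique(grid_trials[:, 0]))
distinct_random = len(np.unique(random_trials[:, 0]))
print(distinct_grid, distinct_random)  # 3 9
```
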

Like grid search, random search has a scikit-learn implementation, RandomizedSearchCV, which is a better choice than writing your own.

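
A minimal sketch with RandomizedSearchCV, again assuming an SVC on Iris. Instead of listing values, we pass distributions: scipy.stats.loguniform samples C and gamma on a log scale, and n_iter=20 fixes the budget of sampled combinations. The ranges are illustrative assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Distributions to sample from, rather than a fixed grid of values.
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-3, 1e0),
    "kernel": ["rbf"],
}

# n_iter random combinations, each scored with 5-fold CV.
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20,
                            cv=5, random_state=0)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Because the budget is set by n_iter rather than by the size of a grid, adding another hyperparameter does not multiply the number of fits the way it would with GridSearchCV.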

Final Thoughts

In my experience, a good understanding of the learning algorithm you're using, and of how its hyperparameters affect its behavior, helps when performing hyperparameter optimization. It's one of the most important tasks, yet I feel hyperparameter tuning doesn't get the recognition it deserves (or perhaps I just don't see it discussed enough). Nevertheless, it's an extremely important part of your project and should never be overlooked.

Thanks for Reading!

If you enjoyed this article, connect with me by subscribing to my FREE weekly newsletter. Never miss a post I make about Artificial Intelligence, Data Science, and Freelancing.
