How to evaluate and compare Huggingface NLP models with Sagemaker Experiments

Photo by Jason Dent on Unsplash

What is this about?

As I’m writing this, the model library on Huggingface consists of 11,256 models, and by the time you’re reading this, that number will only have grown. With so many models to choose from, it is no wonder that many people feel overwhelmed and no longer know which model to choose for their NLP tasks.

It would be great to have a convenient way to try out different models on the same task and compare them against each other on a variety of metrics. Sagemaker Experiments does exactly that: it lets you organize, track, compare, and evaluate NLP models with little effort. In this article we will pit two NLP models against each other and compare their performance.

All the code is available in this GitHub repository.


