What is Hugging Face?
Hugging Face provides many pre-trained models that are trained on billions of tokens of text and cover a wide variety of NLP tasks. One pre-trained model you may already know is BERT. There is also a lighter version of BERT called DistilBERT.
All of these pre-trained models are based on the transformer architecture. Some models, like BERT, use only the encoder part of the transformer, while others use only the decoder, or both the encoder and decoder.
You can import any of these pre-trained models with a few simple lines of code and fine-tune them on your custom dataset.
You can check out the Hugging Face Hub, which contains many more pre-trained models for a variety of NLP tasks.
Let’s look at the simple Pipeline object in transformers:
Pipeline performs all the pre-processing and post-processing steps on your input text data. Pre-processing steps include converting text into numerical values (tokenization), and post-processing steps include turning the model’s raw outputs into, say, the sentiment of a text; these steps vary depending on the task of the pre-trained model.
You can install the library with !pip install transformers like many other libraries, but this installs a lighter version only. You can also install a fuller version with !pip install transformers[sentencepiece]. In either case you need a deep learning framework such as PyTorch or TensorFlow, or you can simply work in Google Colaboratory.
Let’s look at some of the tasks that the pipeline object can perform.
1. Sentiment Analysis
You can see how simple it is to perform sentiment analysis with a pipeline: we just import pipeline from the transformers library, create a classifier object, and pass our text to it. It downloads the default pre-trained model for sentiment analysis and outputs the sentiment of our text in dictionary format.
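A minimal sketch of that flow (the input sentence is my own example; the default checkpoint is downloaded on first use):

```python
from transformers import pipeline

# Create a sentiment-analysis pipeline; with no model argument,
# the library downloads its default pre-trained checkpoint.
classifier = pipeline("sentiment-analysis")

# The result is a list with one dictionary per input text.
result = classifier("I love using the transformers library!")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```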
Here’s a list of pipelines that are available in the transformers library
- feature-extraction (get the vector representation of a text)
- ner (named entity recognition)
In this article, we will look at some of these pipelines.
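For instance, the feature-extraction pipeline mentioned above can be sketched like this (the input sentence is an arbitrary example):

```python
from transformers import pipeline

# feature-extraction returns the model's hidden states, i.e. one
# embedding vector per input token, as nested Python lists of floats.
extractor = pipeline("feature-extraction")

features = extractor("Hugging Face pipelines are easy to use.")
print(type(features))  # nested lists of floats, one vector per token
```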
2. Text Generation
The main idea here is that if you provide some incomplete text, the model will auto-complete it by generating the remaining text.
You can also specify the length of the generated text with the max_length argument.
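A short sketch, assuming the default text-generation checkpoint (GPT-2 at the time of writing) and an arbitrary prompt:

```python
from transformers import pipeline

# Create a text-generation pipeline with the task's default model.
generator = pipeline("text-generation")

# max_length caps the total length (prompt plus generated tokens).
outputs = generator("In this course, we will teach you how to", max_length=30)
print(outputs[0]["generated_text"])
```

The generated text starts with the prompt itself, followed by the model’s continuation.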
3. Named Entity Recognition
Named entity recognition (NER) is a task where the model has to find which parts of the input text correspond to entities such as persons, locations, or organizations.
We pass the option grouped_entities=True in the pipeline creation function to tell the pipeline to group together the parts of the sentence that correspond to the same entity (e.g. “New Delhi”).
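A sketch of this, using an invented example sentence:

```python
from transformers import pipeline

# grouped_entities=True merges sub-word pieces that belong to the
# same entity (e.g. "New" and "Delhi") into a single entry.
ner = pipeline("ner", grouped_entities=True)

entities = ner("My name is Ravi and I live in New Delhi.")
for entity in entities:
    # Each entry has an entity_group (PER, LOC, ORG, ...), the matched
    # word, and a confidence score.
    print(entity["entity_group"], entity["word"], entity["score"])
```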
4. Summarization
Summarization is the task of reducing a text to a shorter text while keeping all (or most) of the important aspects of the original.
Here I have passed a review of Salman Khan’s movie Radhe from one of the movie-review websites.
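The review text itself is not reproduced here, so this sketch uses a short stand-in passage in its place:

```python
from transformers import pipeline

# Create a summarization pipeline with the task's default model.
summarizer = pipeline("summarization")

# Stand-in text; any long passage (such as the movie review) works here.
article = (
    "Radhe is a 2021 Hindi-language action film starring Salman Khan. "
    "The film follows a police officer on a mission against a drug "
    "mafia operating in the city, and it was released in theatres and "
    "on a streaming platform at the same time."
)

# min_length and max_length bound the length of the summary in tokens.
summary = summarizer(article, max_length=40, min_length=10)
print(summary[0]["summary_text"])
```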
All of these pipelines select a default model from the Hub for a given task, but we can also choose a different model using the model argument of pipeline.
Let’s see an example with text generation.
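A sketch using distilgpt2, a smaller GPT-2 variant from the Hub, as the non-default model; the prompt is arbitrary:

```python
from transformers import pipeline

# Pass a specific Hub model instead of the task's default.
generator = pipeline("text-generation", model="distilgpt2")

outputs = generator(
    "In this course, we will teach you how to",
    max_length=30,
    do_sample=True,          # sampling is required for multiple sequences
    num_return_sequences=2,  # generate two alternative completions
)
for out in outputs:
    print(out["generated_text"])
```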
5. Language Translation
Translation is the task of converting text from one language into another language, such as French.
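A sketch for English-to-French translation; Helsinki-NLP/opus-mt-en-fr is one such checkpoint on the Hub (the Helsinki-NLP models follow the naming pattern opus-mt-&lt;src&gt;-&lt;tgt&gt;), and the sentence is an arbitrary example:

```python
from transformers import pipeline

# Choose an English-to-French translation model from the Hub.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

result = translator("This course is produced by Hugging Face.")
print(result[0]["translation_text"])
```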
I covered only a few of the tasks that the pipeline can perform, but there are many more.