Prompting: The new era of Natural Language Processing


Recent progress in Natural Language Processing (NLP) has shown promising results in automatic text generation. Even people involved in developing Machine Learning algorithms are amazed when they look at the text generated by large pre-trained language models (PLMs) like GPT-3 or PaLM. The quality of these models could be a direct result of being trained on 780 billion tokens, of having around 540 billion parameters, of a better understanding of network architectures for NLP (i.e. transformers, adapters, etc.), or of all of the above. However, the question that naturally arises is whether we can leverage the complex knowledge captured by PLMs to solve tasks in other domains such as sentiment classification, entity detection, machine translation, and text classification, among others. For instance, at Rappi we are interested in automatically detecting whether a dish is vegan by only reading the description/ingredients of the product. Notice that here we are not focusing on generating new text with the PLMs, but on using the semantics, morphology and syntax learnt by the language model to extend its knowledge to a downstream task (a different task the PLM was not trained for). The good news is that we certainly can use a PLM to improve predictions on downstream tasks. But the answer, contrary to what you might be thinking, is not fine-tuning. The contemporary answer is: prompting.

A few years ago, scientists and developers would have agreed that the best way to solve a downstream task leveraging a PLM is through fine-tuning. This usually means taking a fixed PLM, introducing a set of new parameters (e.g. adding a couple of new layers on top of the model) and fine-tuning the whole model. Even though this approach has been very successful for some applications (discriminative fine-tuning), it brings a couple of challenges that are not trivial to solve. First, the gap between the pre-training objective and the downstream task objective can be substantial in some scenarios. For example, when predicting whether a credit card user is prone to default by learning from customer service chats, there is a large gap between the objective the PLM sees in the pre-training stage and the one it sees in the fine-tuning stage. This implies that a bigger set of new parameters will be necessary to successfully adapt the PLM's knowledge to the financial default metric. Often, the larger the gap between objectives, the more new parameters are needed to accurately fit the downstream task. But, as you might already know, adding plenty of parameters to deep learning algorithms can degrade the generalisation power of our model, particularly when the data available for training is limited. The issue is that downstream tasks are usually very specific and the available data is inherently limited; otherwise we would have trained a big model from the beginning. Returning to the credit card default example, banks usually do not have many users with both customer service chats and recorded default behaviour. Moreover, if you think deeply about the running example, it could be very challenging for the model to understand that we are interested in finding traces of users' credit behaviour in the customer service chat. The reason is that the model has not been given the context of the question we are interested in answering. So, why not help the model by giving it a context?
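To make the classic recipe concrete, here is a minimal, self-contained sketch of fine-tuning with new parameters on top of a frozen model. The "frozen PLM" below is a toy hand-crafted feature extractor standing in for a real encoder, and the new parameters are a small logistic-regression head trained by gradient descent; all names (`frozen_features`, `train_head`) are illustrative, not from any real library.

```python
import math

# Toy sketch of the fine-tuning recipe: a frozen "PLM" turns text into
# features, and only a small set of NEW parameters (a logistic head) is
# trained on the downstream task. Real setups would use a transformer
# encoder instead of these two hand-crafted features.

def frozen_features(text: str) -> list:
    """Stand-in for a frozen PLM encoder: two keyword features."""
    has_pay = 1.0 if "pay" in text else 0.0
    has_bad = 1.0 if ("cannot" in text or "lost" in text) else 0.0
    return [has_pay, has_bad]

def train_head(texts, labels, lr=0.5, epochs=300):
    """Full-batch gradient descent on a logistic-regression head."""
    X = [frozen_features(t) for t in texts]
    w, b = [0.0, 0.0], 0.0
    n = len(texts)
    for _ in range(epochs):
        grads_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(X, labels):
            p = 1.0 / (1.0 + math.exp(-(w[0]*x[0] + w[1]*x[1] + b)))
            for j in range(2):
                grads_w[j] += (p - y) * x[j] / n   # d(log-loss)/dw_j
            grad_b += (p - y) / n
        w = [w[j] - lr * grads_w[j] for j in range(2)]
        b -= lr * grad_b
    return w, b

texts = ["will pay on time", "payment scheduled soon",
         "cannot pay this month", "lost my job recently"]
labels = [1, 1, 0, 0]                       # 1 = able to pay
w, b = train_head(texts, labels)
preds = [int(w[0]*x[0] + w[1]*x[1] + b > 0)
         for x in (frozen_features(t) for t in texts)]
print(preds)
```

Note that everything the head can learn has to come from this tiny labeled set, which is exactly the limitation the paragraph above describes.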
That is when prompt-based learning comes to light!

The goal of prompt-based learning is to make the downstream task easier for the language model by leveraging natural-language prompts as the context of the downstream task. This is done by introducing a textual prompt so that the downstream task looks very similar to what the PLM saw during training. For instance, suppose we have the following customer service chat as input: “I lost my job last month. Could you increase the number of instalments of my debt?” We could append at the end the text “Is the user able to pay their debt? _______”. It is natural to think the model will complete the sentence with “No” with a higher probability than “Yes”. Notice that after inserting the textual prompt (“Is the user able to pay their debt? _______”) the downstream task has been reformulated as a masked language modelling problem, which is exactly what the PLM was trained for! As a result, in the prompt-based learning paradigm a text prompt is inserted into the input so that the downstream task looks similar to what the PLM solved during training. It looks very easy, right? Yes and no: by this point you are probably already wondering how one can design an appropriate prompt for a given downstream task. Well, that is the core of prompt-based learning.
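The cloze reformulation above can be sketched in a few lines. The “masked language model” here is a toy keyword scorer standing in for a real PLM (with the Hugging Face transformers library you would use a fill-mask pipeline instead); the cue list and function names are illustrative assumptions.

```python
# Minimal sketch of cloze-style prompting: append a prompt, let a
# "masked LM" score the candidate fillers, and read the answer off the
# most probable one. toy_masked_lm is a keyword-based stand-in for a
# real PLM such as BERT.

NEGATIVE_CUES = ("lost my job", "cannot pay", "overdue")

def toy_masked_lm(text: str) -> dict:
    """Return a probability for each candidate filling of [MASK]."""
    p_no = 0.9 if any(cue in text.lower() for cue in NEGATIVE_CUES) else 0.2
    return {"Yes": 1.0 - p_no, "No": p_no}

def classify_with_prompt(chat: str) -> str:
    # Reformulate classification as masked language modelling.
    prompt = chat + " Is the user able to pay their debt? [MASK]."
    scores = toy_masked_lm(prompt)
    return max(scores, key=scores.get)   # most probable verbalizer

chat = ("I lost my job last month. Could you increase the number "
        "of instalments of my debt?")
print(classify_with_prompt(chat))
```

The words “Yes”/“No” that map mask fillings back to class labels are usually called verbalizers in the prompting literature.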

There are many strategies you can use to make the most of prompting, from manual prompting to automated prompting. In manual prompting a human designs an intuitive prompt to probe the PLM, while in automated prompting the prompt is found automatically by an algorithm. Manual templates are usually very easy to design, but they require some degree of expertise from the designer. In addition, they do not always achieve good performance, particularly for complex downstream tasks. Consequently, researchers have recently focused on automated prompting. In this field you can find two main strategies: discrete prompts (also called hard prompts) and continuous prompts (also called soft prompts). The former automatically searches for prompts in the discrete space of natural language, while the latter performs the search directly in the embedding space of the language model. You can find several proposed methods for each approach, but some of the best-known works on discrete prompting are Prompt Scoring, Prompt Generation, Prompt Mining and Prompt Paraphrasing. Regarding continuous prompting, some of the most relevant works include Prefix Tuning, P-tuning and Prompt Ensembling.
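The simplest flavour of discrete prompt search can be sketched as follows: score a handful of candidate templates on a small labeled set and keep the one with the best accuracy. This is a hedged illustration of the general idea, not any specific method from the papers above; the scoring model is again a toy stand-in for a PLM, and all names are invented for the example.

```python
# Illustrative discrete (hard) prompt search: evaluate candidate
# templates on a tiny labeled dev set and pick the best performer.

def toy_lm_yes_prob(text: str) -> float:
    """Toy 'PLM': probability that the blank is filled with 'Yes'."""
    bad = any(c in text.lower() for c in ("lost", "cannot", "overdue"))
    informative = "debt" in text.lower()   # prompt mentions the task
    if not informative:
        return 0.5                         # uninformative prompt: coin flip
    return 0.1 if bad else 0.9

TEMPLATES = [
    "{chat} Is the user able to pay their debt? [MASK]",
    "{chat} Nice weather today? [MASK]",   # intentionally bad template
]

dev_set = [("I lost my job last month.", "No"),
           ("My salary arrived, I will settle everything today.", "Yes")]

def accuracy(template: str) -> float:
    correct = 0
    for chat, label in dev_set:
        p_yes = toy_lm_yes_prob(template.format(chat=chat))
        pred = "Yes" if p_yes >= 0.5 else "No"
        correct += (pred == label)
    return correct / len(dev_set)

best = max(TEMPLATES, key=accuracy)
print(best)
```

Real methods replace this brute-force loop with smarter search, e.g. mining templates from corpora or generating them with another language model, but the selection criterion (performance on held-out data) is the same.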

In summary, prompt-based learning is a powerful framework that can adapt pre-trained language models to perform well on a wide range of downstream tasks. The framework is particularly impressive because it enables few-shot learning (and even zero-shot learning in some scenarios), in some cases without any additional fine-tuning, which is especially important in domains where data is limited or expensive to acquire. Furthermore, it has shown strong flexibility and robustness in adapting a single PLM to solve several downstream tasks of a very different nature when an appropriate prompt is used. Even though this has been only a brief introduction to prompt-based learning, remember that prompting is an active research area (it started roughly in 2020!) and many new approaches are emerging in the field. Stay tuned to the top NLP conferences (ACL, EMNLP, NeurIPS), as new prompt-based learning strategies have been one of the trending topics over the last year.

