Smart Paraphrasing Using Constrained Beam Search in NLP




Paraphrase to retain particular keywords for SEO and copywriting


Hugging Face recently introduced constrained beam search for guiding text generation in the Transformers library.

With constrained beam search, you can specify words that must be included in the decoded output text.

This has a few interesting use-cases in copywriting and SEO.

Use case 1: SEO (Search Engine Optimization)

For example, if you are writing an article on “meme marketing”, you want to make sure that the phrase “meme marketing” appears several times throughout the blog or article so that it ranks higher in Google searches (SEO) and establishes authority on the topic.

This concept is called “keyword stuffing”, and of course it needs to be used in moderation so the text doesn’t look spammy. In practice, it means using the phrase “meme marketing” without changing it, even when we are paraphrasing. In such scenarios, there is a need to rewrite (paraphrase) old content while keeping the phrase “meme marketing” intact.

This is a perfect use case for constrained beam search, where you want to paraphrase while keeping a phrase or keyword intact in the paraphrased version.

Usually, SEO experts identify long-tail keywords, that is, niche phrases with good search volume but fewer relevant results (e.g., “best running shoes for flat feet”), and write articles on them. This helps them rank faster in Google searches and get their website on the first page. Sometimes they want to paraphrase a sentence they have written while keeping the phrase “best running shoes for flat feet” intact.
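As a side note, it is easy to check mechanically whether a paraphrase kept the keyphrase intact. The helper below is my own illustrative sketch, not part of any library:

```python
import re

def retains_keyphrase(paraphrase: str, keyphrase: str) -> bool:
    """True if the exact keyphrase survives paraphrasing (case-insensitive,
    matched on word boundaries so 'improved' does not count as 'improve')."""
    pattern = r"\b" + re.escape(keyphrase) + r"\b"
    return re.search(pattern, paraphrase, flags=re.IGNORECASE) is not None

print(retains_keyphrase(
    "Our guide to the best running shoes for flat feet in 2022.",
    "best running shoes for flat feet"))   # True
print(retains_keyphrase(
    "Our guide to great running shoes for people with flat feet.",
    "best running shoes for flat feet"))   # False
```

A check like this can be run over candidate paraphrases before publishing, or used to verify that a constrained decoder really did honor the constraint.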

Use case 2: Copywriting

Copywriting is writing content for any marketing material, be it blogs, websites, flyers, etc.

Effective copywriting means understanding human psychology and writing copy that drives the customer toward an end goal, like driving signups or selling a product.

Let’s take a copywriting tip from Marketingexamples.com, where it is advised to prefer simple language over superfluous wording.

For example, if we have the landing page copy “Supercharge your front-end skills by building real projects”, it is advised that buzzwords like “supercharge”, “unleash”, “exceed”, and “empower” be replaced with simpler ones to sound more natural and real.

So in our case, we would ideally paraphrase the sentence “Supercharge your front-end skills by building real projects” but use the word “improve” to sound more natural.

Again, this is a perfect use case for constrained beam search, where we can give “improve” as the force word.

Image by Author

Input and Output

The input to our paraphraser model will be:

Supercharge your data science skills by building real world projects.

The paraphraser output with beam search (unconstrained):

1. By implementing real world projects, you can improve your data science skills. 
2. By implementing real world projects, you can boost your data science skills.
3. By implementing real world projects, you can enhance your data science skills.

If we use the force word “improve”, the paraphraser output with constrained beam search is:

1. By implementing real world projects, you can improve your data science skills. 
2. By executing real world projects, you can improve your data science skills.
3. Build real world projects to improve your data science skills.

You can see that “improve” is generated in every sentence as we constrained our beam search on it.
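To build intuition for why the constraint guarantees this, here is a tiny self-contained sketch of constrained beam search over a toy next-word model. The vocabulary, scores, and `beam_search` function are illustrative assumptions of mine, not the Transformers implementation, but the key idea is the same: beams are kept in separate banks by constraint progress, so a hypothesis containing the forced word is never pruned just because unconstrained beams score higher.

```python
import heapq

# Toy next-word model: log-probability of each continuation given the
# previous word. The numbers are made up for illustration.
MODEL = {
    "<s>":      {"you": -0.2, "build": -0.4},
    "you":      {"can": -0.1},
    "can":      {"improve": -0.9, "boost": -0.3, "enhance": -0.5},
    "build":    {"projects": -0.2},
    "improve":  {"skills": -0.1},
    "boost":    {"skills": -0.1},
    "enhance":  {"skills": -0.1},
    "projects": {"</s>": -0.1},
    "skills":   {"</s>": -0.1},
}

def beam_search(num_beams=2, max_steps=6, force_word=None):
    """Beam search over MODEL. With force_word set, candidates are banked
    into 'constraint met' and 'constraint not met' groups, and the best of
    each bank survives pruning."""
    beams = [(0.0, ["<s>"])]
    finished = []
    for _ in range(max_steps):
        candidates = []
        for score, seq in beams:
            for word, logp in MODEL.get(seq[-1], {}).items():
                candidates.append((score + logp, seq + [word]))
        if force_word is None:
            beams = heapq.nlargest(num_beams, candidates)
        else:
            met = [c for c in candidates if force_word in c[1]]
            unmet = [c for c in candidates if force_word not in c[1]]
            beams = heapq.nlargest(num_beams, met) + heapq.nlargest(num_beams, unmet)
        ongoing = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                # A finished beam only counts if it satisfies the constraint.
                if force_word is None or force_word in seq:
                    finished.append((score, seq))
            else:
                ongoing.append((score, seq))
        beams = ongoing
    best_score, best_seq = max(finished)
    return " ".join(best_seq[1:-1])

print(beam_search())                      # -> build projects
print(beam_search(force_word="improve"))  # -> you can improve skills
```

The real implementation in Transformers generalizes this banking idea to token-level constraints with partial progress, but the effect is the same: the highest-scoring hypothesis that satisfies the constraint is returned.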

Project

Now let’s get to the coding part of the project! Here we will use the copywriting example we discussed above and paraphrase our original sentence using constrained beam search.

You can find the colab notebook here.

1. Install necessary libraries in Google Colab
!pip install -q sentencepiece
!pip install -q transformers==4.18.0

2. Download the paraphraser model that we trained for our SaaS app Questgen and open-sourced. Load the model in GPU memory.

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("ramsrigouthamg/t5-large-paraphraser-diverse-high-quality")
tokenizer = AutoTokenizer.from_pretrained("ramsrigouthamg/t5-large-paraphraser-diverse-high-quality")

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("device ", device)
model = model.to(device)

Now let’s run beam search and constrained beam search with the input sentence “Supercharge your data science skills by building real world projects.” and force_word “improve”.

# Copywriting example - use a given word in the paraphrase
context = "Supercharge your data science skills by building real world projects."
force_words = ["improve"]

text = "paraphrase: " + context + " </s>"
input_ids = tokenizer(text, max_length=128, padding=True, return_tensors="pt").input_ids
input_ids = input_ids.to(device)

# Beam search
outputs = model.generate(
    input_ids,
    num_beams=10,
    num_return_sequences=3,
    max_length=128,
    early_stopping=True,
    no_repeat_ngram_size=1,
    remove_invalid_values=True,
)

print("\nNormal beam search\n")
print("Original: ", context)
for beam_output in outputs:
    sent = tokenizer.decode(beam_output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(sent)

# Constrained beam search
force_words_ids = tokenizer(force_words, add_special_tokens=False).input_ids
outputs = model.generate(
    input_ids,
    force_words_ids=force_words_ids,
    max_length=128,
    early_stopping=True,
    num_beams=10,
    num_return_sequences=3,
    no_repeat_ngram_size=1,
    remove_invalid_values=True,
)

print("\nConstrained beam search\n")
print("Original: ", context)
for beam_output in outputs:
    sent = tokenizer.decode(beam_output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(sent)

And here is the output:

Normal beam search
Original: Supercharge your data science skills by building real world projects.
paraphrasedoutput: By implementing real world projects, you can improve your data science skills.
paraphrasedoutput: By implementing real world projects, you can boost your data science skills.
paraphrasedoutput: By implementing real world projects, you can enhance your data science skills.
Constrained beam search
Original: Supercharge your data science skills by building real world projects.
paraphrasedoutput: By implementing real world projects, you can improve your data science skills.
paraphrasedoutput: By executing real world projects, you can improve your data science skills.
paraphrasedoutput: Build real world projects to improve your data science skills.

You can see that the output from a regular beam search contains “improve” in only one of the generated paraphrases, whereas it is present in all three sentences generated by the constrained beam search.

Conclusion:

Conditional text generation, where we want to direct the output toward certain keywords or topics, has been an active research area within language model text generation. With the introduction of constrained beam search in Hugging Face Transformers, we have moved a step closer to that goal.

In many text generation applications like translation, image captioning, and paraphrasing, we often need to use or preserve a certain keyword or keyphrase in the generated output text.

Whether it is SEO, where we need to keep a certain keyphrase intact while paraphrasing, or copywriting, where we need to force a certain keyword to be present to suit the tone of the copy, we have seen several real-world use cases of constrained beam search in this tutorial.
