In this blog post, I summarize the paper Multi-Sentence Compression: Finding Shortest Paths in Word Graphs as I understand it. Please feel free to share your thoughts in the comments!
Sentence compression is the task of shortening a long sentence by deleting redundant words. This paper proposes a novel unsupervised multi-sentence compression technique: the goal is to represent a set of related sentences by a single sentence that preserves the important parts of the main content and is grammatically correct at the same time.
One possible use case for Sentence Compression (SC) is Extractive Text Summarization, where a typical flow is to rank sentences with some scoring strategy and then select the top-k sentences as the representative summary. Here, one could apply SC to merge related and redundant sentences for a better user experience. I have also summarized a paper that uses a similar technique for multi-document summarization; you can check it out here.
Also, the technique proposed in the paper is easily adaptable to any language of interest with a minimal toolset: the only requirements are a POS tagger and a stopword list for the desired language.
The author proposes a graph-based compression technique in which words from the set of related sentences become nodes in a graph, and edges between those nodes are determined by the adjacent positioning of words within a sentence. Let's walk through the exact steps in more detail.
A word graph is a directed graph in which an edge from word A to word B represents an adjacency relation between the two words in some sentence. For example, consider the 4 sentences below:
The wife of a former U.S. president Bill Clinton Hillary Clinton visited China last Monday.
Hillary Clinton wanted to visit China last month but postponed her plans till Monday last week.
Hillary Clinton paid a visit to the People Republic of China on Monday.
Last week the Secretary of State Ms. Clinton visited Chinese officials.
First, we prepend a “Start” token and append an “End” token to every sentence. The idea is to keep track of the starting and ending positions, which is useful later when sampling shortest paths.
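As a tiny illustration of this step (the marker strings are my own choice, not prescribed by the paper):

```python
# Wrap a tokenized sentence with explicit start/end markers so that every
# candidate compression can later be read off as a start-to-end path.
def add_markers(tokens):
    return ["<start>"] + tokens + ["<end>"]

print(add_markers("Hillary Clinton visited China".split()))
```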
Once the data formatting is done, the first step is to map the 1st sentence onto a graph where every word is a node and directed edges are established between adjacent words. When we process the subsequent sentences, the choice between mapping a duplicate word onto an existing node and representing it as a separate node altogether is made according to the rules below:
- A word is considered a duplicate and is mapped onto an existing node only when it has the same lowercased word form and the same part of speech, and no other word from the current sentence has already been mapped onto that node. In all other cases, a new node is spawned for the word.
Also, there are 3 possible node types: stopwords, non-ambiguous words (for which no candidate node exists in the graph) and ambiguous words. We have already covered non-ambiguous words (point 1 above). For stopwords and ambiguous words, where more than one candidate mapping is possible, we check the immediate context (the preceding and following words in the sentence and the neighbouring nodes in the graph) and select the candidate with the larger context overlap. One of the edge weight schemes the paper defines is the word co-occurrence count. After following all these steps, you will have a graph that looks similar to fig. 1.
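The construction above can be sketched in a few lines of Python. Note this is a deliberate simplification of my own: it always merges words with the same lowercased form and POS, and it skips the paper's stopword and ambiguity-resolution logic.

```python
from collections import defaultdict

def build_word_graph(tagged_sentences):
    """Build a directed word graph from sentences given as lists of
    (word, POS) tuples. A node is a (lowercased word, POS) pair, and an
    edge weight counts how often node A immediately precedes node B."""
    edges = defaultdict(int)  # (node_a, node_b) -> co-occurrence count
    for sent in tagged_sentences:
        nodes = ([("<start>", "START")]
                 + [(w.lower(), pos) for w, pos in sent]
                 + [("<end>", "END")])
        for a, b in zip(nodes, nodes[1:]):
            edges[(a, b)] += 1
    return edges

sents = [
    [("Hillary", "NNP"), ("Clinton", "NNP"), ("visited", "VBD"), ("China", "NNP")],
    [("Hillary", "NNP"), ("Clinton", "NNP"), ("paid", "VBD"), ("a", "DT"),
     ("visit", "NN"), ("to", "TO"), ("China", "NNP")],
]
edges = build_word_graph(sents)
# "Hillary" precedes "Clinton" in both sentences, so that edge has weight 2.
print(edges[(("hillary", "NNP"), ("clinton", "NNP"))])
```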
Shortest Path as Compression
The author treats the shortest path of a certain predefined minimum length over the word graph as the likely word sequence representing the compressed information, i.e., the summary. The central idea is to pass through the nodes that represent the most important concepts. To make this compatible with shortest-path algorithms, the edge weights are inverted (a high co-occurrence count becomes a low weight), and we search for the shortest path from the “Start” token to the “End” token.
The score of a path is the sum of its edge weights. Once all paths have been scored, we take the top-k paths and keep only those that are at least 8 words long and contain a verb. After this filter, the path with the minimum total weight is selected as the final summary.
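Here is a self-contained sketch of this selection step, under my own simplifying assumptions: instead of a proper k-shortest-paths algorithm, it enumerates all simple start-to-end paths by DFS (fine for toy graphs), inverts co-occurrence counts into edge costs, and applies the length and verb filters before picking the cheapest path.

```python
from collections import defaultdict

def best_compression(edges, min_len=8):
    """Return the lowest-cost start-to-end word sequence that has at least
    `min_len` tokens and contains a verb. `edges` maps (node_a, node_b)
    pairs to co-occurrence counts; nodes are (lowercased word, POS) tuples."""
    adj = defaultdict(list)
    for (a, b), count in edges.items():
        adj[a].append((b, 1.0 / count))  # invert counts: frequent edges become cheap
    start, end = ("<start>", "START"), ("<end>", "END")
    best_cost, best_path = float("inf"), None

    def dfs(node, path, cost):
        nonlocal best_cost, best_path
        if node == end:
            words = path[1:-1]  # drop the start/end markers
            if (len(words) >= min_len
                    and any(pos.startswith("VB") for _, pos in words)
                    and cost < best_cost):
                best_cost, best_path = cost, [w for w, _ in words]
            return
        for nxt, weight in adj[node]:
            if nxt not in path:  # restrict to simple (loop-free) paths
                dfs(nxt, path + [nxt], cost + weight)

    dfs(start, [start], 0.0)
    return best_path

# Toy graph from two overlapping sentences; the shared transitions get
# count 2 and therefore become the cheapest edges after inversion.
sents = [
    [("Hillary", "NNP"), ("Clinton", "NNP"), ("visited", "VBD"),
     ("China", "NNP"), ("last", "JJ"), ("Monday", "NNP")],
    [("Hillary", "NNP"), ("Clinton", "NNP"), ("visited", "VBD"),
     ("China", "NNP"), ("on", "IN"), ("Monday", "NNP")],
]
edges = defaultdict(int)
for sent in sents:
    nodes = ([("<start>", "START")]
             + [(w.lower(), p) for w, p in sent]
             + [("<end>", "END")])
    for a, b in zip(nodes, nodes[1:]):
        edges[(a, b)] += 1

# The paper uses min_len=8; min_len=4 just makes this toy example pass the filter.
print(best_compression(edges, min_len=4))
```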
So that's the overall idea of the paper. The paper also discusses other edge weight strategies; I encourage you to read that section if you are interested.
If you wish, you can also check out other research paper summaries that I have written.
Feel free to read the entire paper and say “Hi” to the authors and appreciate their contribution.
Paper Title: Multi-Sentence Compression: Finding Shortest Paths in Word Graphs
Author: Katja Filippova
Also, in case you enjoyed reading this article, you can choose to buy me a “chai” on https://www.buymeacoffee.com/TechvizCoffee — because I don’t actually drink coffee 🙂 Thank you very much! It’s totally optional and voluntary 🙂