A Review on ‘Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank’



This paper discusses different methods of compositionality for words and n-grams, and how to predict binary or fine-grained (five-class) sentiment for words and phrases in a bottom-up manner using a tree-based representation. The authors introduce a dataset, the ‘Stanford Sentiment Treebank’, for sentence and phrase representation, and a model, the ‘Recursive Neural Tensor Network’ (RNTN), for predicting fine-grained sentiment labels on it.


The ‘Stanford Sentiment Treebank’ consists of 11,855 sentences, each parsed with the ‘Stanford Parser’, resulting in 215,154 unique phrases that were labeled using Amazon Mechanical Turk. Each phrase receives one of five labels (negative, somewhat negative, neutral, somewhat positive or positive). The main motivation for creating this dataset was to overcome the inability of bag-of-words models to take word order into account, which matters when classifying hard examples such as negation. For the dataset and visual representation, visit here


The idea behind the paper is based on the following concepts:

a) Tree based representation:

A sentence is broken down into words, with each word being a leaf node of a parse tree; every internal node covers the phrase formed by the words beneath it. The main idea is to capture sentiment over spans of words rather than over single words. For example, consider the sentence ‘I dislike rain but I love winter’. In the tree, the parent of the word ‘dislike’ gets a negative label, but further along the sentence the sentiment turns positive because of the word ‘love’. Because word order is preserved in this representation, predictions can be made more accurately.

Tree based representation of a sentence along with sentiment for each node
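To make the tree idea concrete, here is a minimal sketch of a labeled binary tree for the example sentence. The `Node` class, the bracketing of the sentence, and the integer labels (0 = negative up to 4 = positive, with 2 = neutral) are all assumptions made for illustration; they are not taken from the dataset.

```python
class Node:
    """A binary tree node: a leaf holds a word, an internal node holds two children."""

    def __init__(self, label, word=None, left=None, right=None):
        self.label = label    # hand-assigned sentiment label, 0..4 (illustrative)
        self.word = word      # set only for leaves
        self.left = left
        self.right = right


def leaf(word, label):
    return Node(label, word=word)


def join(left, right, label):
    return Node(label, left=left, right=right)


def phrases(node):
    """Return (text, [(phrase, label), ...]), listing children before parents."""
    if node.word is not None:
        return node.word, [(node.word, node.label)]
    left_text, left_items = phrases(node.left)
    right_text, right_items = phrases(node.right)
    text = left_text + " " + right_text
    return text, left_items + right_items + [(text, node.label)]


# Hypothetical bracketing and labels for 'I dislike rain but I love winter'.
first = join(leaf("I", 2), join(leaf("dislike", 1), leaf("rain", 2), 1), 1)
second = join(leaf("I", 2), join(leaf("love", 3), leaf("winter", 2), 3), 3)
root = join(first, join(leaf("but", 2), second, 3), 3)

text, items = phrases(root)
for phrase, label in items:
    print(label, phrase)
```

Every node, not only the root, carries its own label, which is what lets a model be trained and evaluated on phrases of any length.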

b) Compositionality function:

Compositionality, in simple terms, means that the meaning of a group of words taken as a whole is a function of the meanings of the individual words. The paper uses this idea to compute vector representations of phrases, which then serve as features for sentiment classification. This makes sense intuitively, because words considered together can mean something different from the same words considered in isolation.

Compositionality function to calculate features for parent nodes in a recursive manner.

c) Recursive nature of model:

The models used for this task are applied recursively. First, the leaf nodes (words) are represented as vectors. These vectors are passed to the compositionality function of their parent in a bottom-up manner, producing a vector for each parent node, and the vector at every node is also used as the feature for classifying that node. The word vectors are parameters that are updated during training, together with the weights of the compositionality function. Finally, each node's vector is given to a softmax classifier, which outputs a probability for each label.
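The bottom-up pass described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the tree is a nested tuple, `toy_compose` is a stand-in composition function (not one of the models from the paper), and the word vectors and classifier weights `W_s` are arbitrary numbers rather than trained values.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(tree, word_vectors, compose, W_s, predictions):
    """Compute the vector of `tree` bottom-up and classify every node on the way."""
    if isinstance(tree, str):                      # leaf: look up the word vector
        vector, phrase = word_vectors[tree], tree
    else:                                          # internal node: children first
        left_vec, left_phrase = forward(tree[0], word_vectors, compose, W_s, predictions)
        right_vec, right_phrase = forward(tree[1], word_vectors, compose, W_s, predictions)
        vector = compose(left_vec, right_vec)
        phrase = left_phrase + " " + right_phrase
    predictions.append((phrase, softmax(matvec(W_s, vector))))
    return vector, phrase


def toy_compose(left, right):
    """Placeholder composition (assumption): tanh of the element-wise sum."""
    return [math.tanh(l + r) for l, r in zip(left, right)]


# Arbitrary 2-dimensional word vectors and a 5 x 2 classifier matrix (five labels).
word_vectors = {"not": [-1.0, 0.5], "very": [0.2, 0.1], "good": [1.0, 0.3]}
W_s = [[-1.0, 0.0], [-0.5, 0.0], [0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]

predictions = []
root_vector, sentence = forward(("not", ("very", "good")), word_vectors,
                                toy_compose, W_s, predictions)
for phrase, probs in predictions:
    print(phrase, [round(p, 3) for p in probs])
```

Passing `compose` as an argument keeps the recursion independent of the particular composition function, so the same traversal works for any of the models discussed in this review.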


The authors propose the ‘Recursive Neural Tensor Network’ for this task. The main motivation for this model comes from two earlier models in this field, (a) and (b) below; (c) is the proposed model:

a) Recursive Neural Network (RNN) :

Since the order of computation is recursive (a parent’s vector depends on the vectors of its children), an RNN is a natural fit. A single learnable weight matrix, shared across all nodes, is applied to the concatenation of the two child vectors, and tanh is used as the element-wise non-linearity to produce the parent vector.
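As a rough sketch of this composition step, assuming the usual form p = tanh(W [b; c]) with the bias omitted, the parent vector can be computed as follows. The helper names and the toy numbers are illustrative only.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def rnn_compose(W, left, right):
    """Parent vector p = tanh(W [left; right]) for a d x 2d weight matrix W."""
    stacked = left + right                 # concatenate the two child vectors
    return [math.tanh(z) for z in matvec(W, stacked)]


# Toy example with d = 2; the numbers are arbitrary.
W = [[0.5, -0.5, 0.25, 0.0],
     [0.0, 1.0, -1.0, 0.5]]
b = [1.0, 0.0]
c = [0.0, 2.0]
p = rnn_compose(W, b, c)
print(p)
```

Note that the parent has the same dimension d as each child, which is what allows the same function to be applied again one level up the tree.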

b) Matrix-Vector RNN (MV-RNN) :

In this form of RNN, each word or phrase is represented by two entities: a) a matrix and b) a vector.

When computing the parent, the matrix of one child is multiplied by the vector of the other child and vice versa; the two resulting vectors are then combined in the same way as in the plain RNN.
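A minimal sketch of this step, assuming the form p = tanh(W [Cb; Bc]) for the parent vector and P = W_M [B; C] for the parent matrix, where (b, B) and (c, C) are the vector and matrix of the left and right child. The function names and the toy values are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in A]


def mvrnn_compose(W, W_M, left, right):
    """Return the parent's (vector, matrix) from the children's (vector, matrix) pairs."""
    (b, B), (c, C) = left, right
    stacked = matvec(C, b) + matvec(B, c)   # each child's matrix modifies the other's vector
    p = [math.tanh(z) for z in matvec(W, stacked)]
    P = matmul(W_M, B + C)                  # B + C stacks the rows: a 2d x d matrix
    return p, P


# Toy example with d = 2; identity child matrices leave the vectors unchanged.
identity = [[1.0, 0.0], [0.0, 1.0]]
W = [[0.5, -0.5, 0.25, 0.0],
     [0.0, 1.0, -1.0, 0.5]]
W_M = [[1.0, 0.0, 1.0, 0.0],
       [0.0, 1.0, 0.0, 1.0]]
p, P = mvrnn_compose(W, W_M, ([1.0, 0.0], identity), ([0.0, 2.0], identity))
print(p, P)
```

Because every word needs its own d x d matrix in addition to its vector, the parameter count grows with the vocabulary, which is the cost discussed under (c).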

c) Recursive Neural Tensor Network (RNTN):

The main motivation behind the RNTN is the shortcomings of the RNN and the MV-RNN. In the RNN, the child vectors interact only implicitly, through a learned weight matrix followed by tanh; a multiplicative composition would let them interact more directly. In the MV-RNN, every word and phrase carries its own matrix, so the number of parameters becomes very large and depends on the size of the vocabulary. The authors therefore look for a single, more powerful composition function with a fixed number of parameters that avoids both shortcomings. Since the tensor in the RNTN has multiple slices, each slice can capture a different type of composition. If the tensor is set to 0, the model reduces to the standard RNN, so the output is related to the input through the weight matrix alone. The authors also note that the alternative of making the composition function more powerful by adding an extra layer was hard to optimize. The RNTN is able to learn the structure of negation for both positive and negative phrases.
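The tensor-based composition can be sketched as follows, assuming the form p = tanh(x^T V[k] x + W x) with x = [b; c], where V consists of d slices of size 2d x 2d. The helper names and numbers are illustrative, and the example uses d = 1 to keep the arithmetic easy to follow.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def rntn_compose(V, W, left, right):
    """Parent vector: one bilinear term per tensor slice plus the plain RNN term."""
    x = left + right                                   # concatenated children [b; c]
    bilinear = [sum(xi * vi for xi, vi in zip(x, matvec(slice_k, x)))
                for slice_k in V]                      # x^T V[k] x for each slice k
    linear = matvec(W, x)                              # W x, as in the plain RNN
    return [math.tanh(q + l) for q, l in zip(bilinear, linear)]


# Toy example with d = 1: one 2 x 2 slice and a 1 x 2 weight matrix.
V = [[[0.0, 1.0],
      [0.0, 0.0]]]          # this slice multiplies the two children together
W = [[0.5, 0.25]]
b, c = [1.0], [2.0]

p = rntn_compose(V, W, b, c)

# With the tensor set to zero, only the plain RNN term remains.
V_zero = [[[0.0, 0.0], [0.0, 0.0]]]
p_rnn = rntn_compose(V_zero, W, b, c)
print(p, p_rnn)
```

The number of parameters here depends only on d, not on the vocabulary size, in contrast to the per-word matrices of the MV-RNN.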


Thus, the RNTN together with the Sentiment Treebank makes it possible to capture fine-grained sentiment over a sequence of words. The RNTN also performs better on contrastive conjunctions (sentences of the form ‘X but Y’) than the MV-RNN and biNB (bigram Naive Bayes).


Reference: Socher, Richard; Perelygin, Alex; Wu, Jean; Chuang, Jason; Manning, Christopher D.; Ng, Andrew Y.; Potts, Christopher. ‘Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank’. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.

Hurray !! This comes to the end of the article. Hope this has been a good read 😀.

