What is a Recurrent Neural Network and Implementation of SimpleRNN, GRU, and LSTM Models in Keras…



Implementation of SimpleRNN, GRU, and LSTM models in Keras and TensorFlow for an NLP project: sentiment analysis

Recurrent neural networks (RNNs) are among the state-of-the-art deep learning algorithms for sequential data. They are used in many high-profile applications, including Google’s voice search and Apple’s Siri. The reason they became so popular is their internal memory: RNNs were the first deep learning architecture that could remember previous inputs while processing the current one. With the invention of LSTM and GRU units, RNNs became even more powerful. In this article, we will work through a Natural Language Processing project using all three of these types of RNNs.

We will discuss how RNNs work and why they are so effective for sequential problems, and then walk through practical implementations of a SimpleRNN, a GRU, and an LSTM for a sentiment analysis task.

I will briefly recap how a simple recurrent neural network works as a refresher and then dive into the implementation. If you want, feel free to skip this part and go directly to the TensorFlow implementation.

What is a Recurrent Neural Network?

Because recurrent neural networks have internal memory, they can remember important information from earlier parts of the input. That is why they can be more effective than regular neural networks, and why they are the preferred choice for sequential data such as text, time series, financial data, weather measurements, and much more.

How Do Recurrent Neural Networks Work?

To really understand how a recurrent neural network works and why it is special, we need to compare it with a regular feed-forward neural network.

Here is a picture demonstration of a regular feed-forward neural network:

Image by Author

In a feed-forward neural network, information moves from one layer to the next. We calculate the hidden layer using the information from the input layer. If there are several hidden layers, only the information from the previous hidden layer is used to calculate the next one, and when the output layer is calculated, only the information from the layer immediately before it is used. So, by the time we compute the output layer, the network has effectively forgotten the input layer and all the earlier layers.

But when we are working with text, time-series data, or any other sequential data, it is important to remember what came earlier as well.

In an RNN, information cycles through an internal loop. So, when an RNN calculates an output, it considers the current input and also what it saw at the previous steps, because it has a short-term memory. Here is a picture demonstration:

Image by Author

As this picture shows, information is cycled through the layer, so at each step the network has information from the recent past. Here is the unrolled version of the RNN structure, which gives a better idea of how it works:

Image by Author

Here x0, x1, and x2 denote the inputs, h0, h1, and h2 are the hidden states, and y0, y1, and y2 are the outputs.

As shown in the picture above, each timestep takes information from the previous hidden state and also from the current input; the same layer is reused at every step.
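To make this concrete, here is a minimal NumPy sketch of the recurrence a simple RNN computes. The shapes and random weights are made up purely for illustration and are unrelated to the Keras models later in this article.

import numpy as np

# h_t = tanh(x_t @ W_x + h_{t-1} @ W_h + b): the same weights are reused at every timestep.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(4, 3))     # input-to-hidden weights (4 input features, 3 hidden units)
W_h = rng.normal(size=(3, 3))     # hidden-to-hidden (recurrent) weights
b = np.zeros(3)

inputs = rng.normal(size=(5, 4))  # a toy sequence of 5 timesteps
h = np.zeros(3)                   # initial hidden state
for x_t in inputs:
    h = np.tanh(x_t @ W_x + h @ W_h + b)
print(h)                          # final hidden state: what a SimpleRNN layer would return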

This is very important because the previous hidden state may contain crucial information about what is coming next. For example, consider this sentence:

The sky is blue

If I only know the word “is”, I cannot guess what is coming next. But if I know the two consecutive words “sky is”, then I might be able to predict the word “blue”.

But this is also a limitation of recurrent neural networks: a simple RNN only has a short-term memory, and short-term memory is not always enough to figure out what is coming next. For example,

She is Chinese and her language is …

Here, only remembering the previous two or three words does not give us the context to know what the language is. We have to go all the way back to the word “Chinese”. Only then will we be able to predict the name of the language.

That is where Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers help. Both are more advanced versions of the simple RNN. Explaining their mechanisms in detail is out of the scope of this article; my focus here is to show how to implement them in TensorFlow.

Dataset

I will use the IMDB dataset, which comes with TensorFlow Datasets. It is a large movie review dataset with text data and binary labels, already split into 25,000 training examples and 25,000 test examples. Learn more about this dataset here. It is a very good dataset for practicing natural language processing tasks. Each row contains a review text, and the label is either 0 or 1, representing negative or positive sentiment.

Let’s dive into the project.

Here are the imports:

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds

I am loading the IMDB dataset along with its info object:

imdb, info = tfds.load("imdb_reviews",
                       with_info=True, as_supervised=True)

Set training and test data in separate variables:

train_data, test_data = imdb['train'], imdb['test']

Data Preprocessing

Having all the texts in one list and the labels in a separate list will be helpful. So, the training sentences and labels and the testing sentences and labels are extracted into lists here:

training_sentences = []
training_labels = []
testing_sentences = []
testing_labels = []

for s, l in train_data:
    training_sentences.append(str(s.numpy()))
    training_labels.append(l.numpy())

for s, l in test_data:
    testing_sentences.append(str(s.numpy()))
    testing_labels.append(l.numpy())

Converting the labels to NumPy arrays:

training_labels_final = np.array(training_labels)
testing_labels_final = np.array(testing_labels)

Here I am setting some important parameters for the model. I will explain what they mean right after:

vocab_size = 10000
embedding_dim = 16
max_length = 120
trunc_type = 'post'
oov_tok = "<OOV>"

Here, vocab_size is 10000. That means only the 10,000 most frequent unique words will be used for this model; if the IMDB dataset contains more words than that, the less frequent ones will not be used to train the model. So, usually, we choose this number carefully. Please feel free to try a different vocab_size.

The next parameter is embedding_dim. It is the size of the vector used to represent each word. Since embedding_dim is 16 here, a vector of size 16 will represent each word. You can also try a different number here.

A maximum length of 120 tokens will be used for each piece of text; this is what the max_length parameter represents. If a text is longer than that, it will be truncated.

The next parameter, trunc_type, is set to 'post'. That means a long text will be truncated at the end.

Any word that is not in the vocabulary will be represented by the oov_tok token.
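To see how these parameters interact, here is a small toy example, separate from the IMDB pipeline below; the sentences are made up and the exact integer indices are only illustrative.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

toy = Tokenizer(num_words=100, oov_token="<OOV>")
toy.fit_on_texts(["the sky is blue", "the grass is green"])

# "red" was never seen during fitting, so it maps to the <OOV> index (1)
seqs = toy.texts_to_sequences(["the sky is red"])
print(seqs)                                              # something like [[2, 4, 3, 1]]

# Pad to a fixed length; truncating='post' would cut sequences longer than maxlen at the end
print(pad_sequences(seqs, maxlen=6, truncating='post'))  # zeros are added at the front by default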

In NLP projects, data preprocessing starts with tokenizing the texts:

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
tokenizer = Tokenizer(num_words = vocab_size, oov_token=oov_tok)
tokenizer.fit_on_texts(training_sentences)
word_index = tokenizer.word_index
word_index

Here is part of the output for word_index:

{'<OOV>': 1,
'the': 2,
'and': 3,
'a': 4,
'of': 5,
'to': 6,
'is': 7,
'br': 8,
'in': 9,
'it': 10,
'i': 11,
'this': 12,
'that': 13,
'was': 14,
'as': 15,
'for': 16,
'with': 17,
'movie': 18,

So, we have a unique integer value for each word. Next, we encode our sentences as sequences of these integers instead of words, and pad the sequences that are shorter than our max_length of 120 tokens. That way every text is represented by a vector of the same size.

sequences = tokenizer.texts_to_sequences(training_sentences)
padded = pad_sequences(sequences, maxlen=max_length,
                       truncating=trunc_type)

testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_length)

Data preprocessing is done.
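Before building the models, a quick sanity check (not in the original code) can confirm that everything has the expected shape:

# Each padded array should be (number of reviews, max_length)
print(padded.shape)                 # expected: (25000, 120)
print(testing_padded.shape)         # expected: (25000, 120)
print(training_labels_final.shape)  # expected: (25000,)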

Model Development

This is the fun part.

Simple RNN

The first model will be a simple Recurrent Neural Network model.

In this model, the first layer is the embedding layer, where each sentence is represented as a max_length by embedding_dim matrix. The next layer is a SimpleRNN layer, followed by the dense layers. Here is the model:

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              input_length=max_length),
    tf.keras.layers.SimpleRNN(32),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.summary()

Output:

Model: "sequential_8"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_8 (Embedding) (None, 120, 16) 160000
_________________________________________________________________
simple_rnn (SimpleRNN) (None, 32) 1568
_________________________________________________________________
dense_16 (Dense) (None, 10) 330
_________________________________________________________________
dense_17 (Dense) (None, 1) 11
=================================================================
Total params: 161,909
Trainable params: 161,909
Non-trainable params: 0
_________________________

Look at the output shape of each layer. The first layer’s output shape is (120, 16): remember, our max_length for each sentence was 120 and the embedding dimension was 16. Please feel free to change these numbers and check the results.

In the second layer, we used 32 units in the SimpleRNN layer, so its output shape is also 32.
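If it helps, the parameter counts in the summary can be reproduced by hand. The quick side calculation below (not part of the original code) follows the standard formulas for these Keras layers:

# Reproducing the parameter counts from the summary above (side calculation only).
embedding_params = 10000 * 16            # vocab_size * embedding_dim = 160,000
simple_rnn_params = 32 * (16 + 32) + 32  # units * (input_dim + units) + biases = 1,568
dense_10_params = 32 * 10 + 10           # weights + biases = 330
dense_1_params = 10 * 1 + 1              # weights + biases = 11
print(embedding_params + simple_rnn_params + dense_10_params + dense_1_params)  # 161,909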

Here we compile the model with the binary_crossentropy loss function, the adam optimizer, and accuracy as the evaluation metric:

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

I will train the model for 30 epochs.

num_epochs = 30
history = model.fit(padded, training_labels_final, epochs=num_epochs,
                    validation_data=(testing_padded, testing_labels_final))

Output:

Epoch 1/30
782/782 [==============================] - 17s 20ms/step - loss: 0.6881 - accuracy: 0.5256 - val_loss: 0.6479 - val_accuracy: 0.6355
Epoch 2/30
782/782 [==============================] - 15s 19ms/step - loss: 0.5035 - accuracy: 0.7632 - val_loss: 0.4880 - val_accuracy: 0.7865
Epoch 3/30
782/782 [==============================] - 15s 20ms/step - loss: 0.2917 - accuracy: 0.8804 - val_loss: 0.5812 - val_accuracy: 0.7457
Epoch 4/30
782/782 [==============================] - 16s 20ms/step - loss: 0.1393 - accuracy: 0.9489 - val_loss: 0.6386 - val_accuracy: 0.7952
Epoch 5/30
782/782 [==============================] - 15s 19ms/step - loss: 0.0655 - accuracy: 0.9768 - val_loss: 0.9400 - val_accuracy: 0.7277
Epoch 6/30
782/782 [==============================] - 16s 20ms/step - loss: 0.0360 - accuracy: 0.9880 - val_loss: 0.9493 - val_accuracy: 0.7912
Epoch 7/30
782/782 [==============================] - 15s 20ms/step - loss: 0.0273 - accuracy: 0.9900 - val_loss: 1.1033 - val_accuracy: 0.7491
Epoch 8/30
782/782 [==============================] - 16s 20ms/step - loss: 0.0993 - accuracy: 0.9639 - val_loss: 1.1237 - val_accuracy: 0.5752
Epoch 9/30
782/782 [==============================] - 15s 19ms/step - loss: 0.2071 - accuracy: 0.9136 - val_loss: 1.0613 - val_accuracy: 0.6309
Epoch 10/30
782/782 [==============================] - 15s 20ms/step - loss: 0.0267 - accuracy: 0.9928 - val_loss: 1.4416 - val_accuracy: 0.6720
Epoch 11/30
782/782 [==============================] - 16s 20ms/step - loss: 0.0031 - accuracy: 0.9996 - val_loss: 1.6674 - val_accuracy: 0.6721
Epoch 12/30
782/782 [==============================] - 15s 20ms/step - loss: 5.8072e-04 - accuracy: 1.0000 - val_loss: 1.8338 - val_accuracy: 0.6714
Epoch 13/30
782/782 [==============================] - 15s 19ms/step - loss: 2.5399e-04 - accuracy: 1.0000 - val_loss: 1.8619 - val_accuracy: 0.6824
Epoch 14/30
782/782 [==============================] - 15s 20ms/step - loss: 1.4048e-04 - accuracy: 1.0000 - val_loss: 1.8995 - val_accuracy: 0.6927
Epoch 15/30
782/782 [==============================] - 15s 20ms/step - loss: 8.4974e-05 - accuracy: 1.0000 - val_loss: 1.9867 - val_accuracy: 0.6934
Epoch 16/30
782/782 [==============================] - 15s 20ms/step - loss: 5.2411e-05 - accuracy: 1.0000 - val_loss: 2.0710 - val_accuracy: 0.6940
Epoch 17/30
782/782 [==============================] - 17s 22ms/step - loss: 3.2760e-05 - accuracy: 1.0000 - val_loss: 2.1278 - val_accuracy: 0.6980
Epoch 18/30
782/782 [==============================] - 16s 20ms/step - loss: 2.0648e-05 - accuracy: 1.0000 - val_loss: 2.2035 - val_accuracy: 0.6988
Epoch 19/30
782/782 [==============================] - 15s 19ms/step - loss: 1.3099e-05 - accuracy: 1.0000 - val_loss: 2.2611 - val_accuracy: 0.7031
Epoch 20/30
782/782 [==============================] - 15s 20ms/step - loss: 8.3039e-06 - accuracy: 1.0000 - val_loss: 2.3340 - val_accuracy: 0.7038
Epoch 21/30
782/782 [==============================] - 16s 20ms/step - loss: 5.2835e-06 - accuracy: 1.0000 - val_loss: 2.4453 - val_accuracy: 0.7003
Epoch 22/30
782/782 [==============================] - 16s 20ms/step - loss: 3.3794e-06 - accuracy: 1.0000 - val_loss: 2.4580 - val_accuracy: 0.7083
Epoch 23/30
782/782 [==============================] - 20s 26ms/step - loss: 2.1589e-06 - accuracy: 1.0000 - val_loss: 2.5184 - val_accuracy: 0.7112
Epoch 24/30
782/782 [==============================] - 18s 23ms/step - loss: 1.3891e-06 - accuracy: 1.0000 - val_loss: 2.6400 - val_accuracy: 0.7055
Epoch 25/30
782/782 [==============================] - 20s 25ms/step - loss: 8.9716e-07 - accuracy: 1.0000 - val_loss: 2.6727 - val_accuracy: 0.7107
Epoch 26/30
782/782 [==============================] - 20s 25ms/step - loss: 5.7747e-07 - accuracy: 1.0000 - val_loss: 2.7517 - val_accuracy: 0.7105
Epoch 27/30
782/782 [==============================] - 17s 22ms/step - loss: 3.7458e-07 - accuracy: 1.0000 - val_loss: 2.7854 - val_accuracy: 0.7159
Epoch 28/30
782/782 [==============================] - 18s 23ms/step - loss: 2.4265e-07 - accuracy: 1.0000 - val_loss: 2.8592 - val_accuracy: 0.7158
Epoch 29/30
782/782 [==============================] - 17s 22ms/step - loss: 1.5808e-07 - accuracy: 1.0000 - val_loss: 2.9216 - val_accuracy: 0.7172
Epoch 30/30
782/782 [==============================] - 17s 22ms/step - loss: 1.0345e-07 - accuracy: 1.0000 - val_loss: 2.9910 - val_accuracy: 0.7174

After 30 epochs, the training accuracy reaches 1.00, or 100%. Perfect, right? But the validation accuracy is only 71.74%. That is not terrible, but it points to a huge overfitting problem.

It will be good to see how accuracies and losses changed with each epoch.

import matplotlib.pyplot as plt

def plot_graphs(history, string):
    plt.plot(history.history[string])
    plt.plot(history.history['val_' + string])
    plt.xlabel("Epochs")
    plt.ylabel(string)
    plt.legend([string, 'val_' + string])
    plt.show()

plot_graphs(history, 'accuracy')
plot_graphs(history, 'loss')
Image by Author

Validation accuracy oscillated a lot in the beginning and then settled around 71.74%. On the other hand, the training accuracy went up steadily to 100%, except for a dip.

But the loss curve for validation looks pretty bad. It kept going up. We want the loss to go down.

Gated Recurrent Unit (GRU)

The GRU is an improved version of the simple RNN and is usually more effective than a SimpleRNN layer. It has two gates: a reset gate and an update gate. For this demonstration, I will replace the SimpleRNN layer with a bidirectional GRU layer with the same number of units.

Here is the model:

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              input_length=max_length),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.summary()

Output:

Model: "sequential_6"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_6 (Embedding) (None, 120, 16) 160000
_________________________________________________________________
bidirectional_6 (Bidirection (None, 64) 9600
_________________________________________________________________
dense_12 (Dense) (None, 10) 650
_________________________________________________________________
dense_13 (Dense) (None, 1) 11
=================================================================
Total params: 170,261
Trainable params: 170,261
Non-trainable params: 0

As you can see, the output shapes of all the layers are exactly the same as in the previous model, except for the GRU layer. Because it is bidirectional, the 32 units produce an output of size 64 here.
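For the curious, the 9,600 parameters of the bidirectional GRU layer can also be reproduced by hand. The sketch below assumes TensorFlow 2’s default GRU configuration (reset_after=True), which uses two bias vectors per gate; it is a side calculation, not part of the original code.

# Side calculation for the bidirectional GRU parameter count (assumes reset_after=True).
units, input_dim = 32, 16
per_direction = 3 * (input_dim * units + units * units + 2 * units)  # 3 * 1,600 = 4,800
print(2 * per_direction)  # bidirectional doubles it: 9,600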

I will compile the model with exactly the same parameters as before and train it for the same number of epochs.

model.compile(loss="binary_crossentropy",
optimizer='adam',
metrics=['accuracy'])
history=model.fit(padded, training_labels_final, epochs=num_epochs,
validation_data = (testing_padded,testing_labels_final))

Output:

Epoch 1/30
782/782 [==============================] - 50s 60ms/step - loss: 0.5664 - accuracy: 0.6990 - val_loss: 0.4313 - val_accuracy: 0.8025
Epoch 2/30
782/782 [==============================] - 43s 56ms/step - loss: 0.3638 - accuracy: 0.8440 - val_loss: 0.3667 - val_accuracy: 0.8394
Epoch 3/30
782/782 [==============================] - 44s 56ms/step - loss: 0.2852 - accuracy: 0.8882 - val_loss: 0.3695 - val_accuracy: 0.8420
Epoch 4/30
782/782 [==============================] - 43s 55ms/step - loss: 0.2330 - accuracy: 0.9120 - val_loss: 0.3979 - val_accuracy: 0.8381
Epoch 5/30
782/782 [==============================] - 42s 54ms/step - loss: 0.1942 - accuracy: 0.9323 - val_loss: 0.4386 - val_accuracy: 0.8334
Epoch 6/30
782/782 [==============================] - 44s 56ms/step - loss: 0.1573 - accuracy: 0.9472 - val_loss: 0.4546 - val_accuracy: 0.8290
Epoch 7/30
782/782 [==============================] - 44s 57ms/step - loss: 0.1223 - accuracy: 0.9612 - val_loss: 0.5259 - val_accuracy: 0.8244
Epoch 8/30
782/782 [==============================] - 44s 56ms/step - loss: 0.0897 - accuracy: 0.9729 - val_loss: 0.6248 - val_accuracy: 0.8234
Epoch 9/30
782/782 [==============================] - 44s 57ms/step - loss: 0.0690 - accuracy: 0.9788 - val_loss: 0.6511 - val_accuracy: 0.8169
Epoch 10/30
782/782 [==============================] - 50s 64ms/step - loss: 0.0514 - accuracy: 0.9847 - val_loss: 0.7230 - val_accuracy: 0.8223
Epoch 11/30
782/782 [==============================] - 45s 57ms/step - loss: 0.0402 - accuracy: 0.9882 - val_loss: 0.8357 - val_accuracy: 0.8167
Epoch 12/30
782/782 [==============================] - 52s 67ms/step - loss: 0.0323 - accuracy: 0.9902 - val_loss: 0.9256 - val_accuracy: 0.8140
Epoch 13/30
782/782 [==============================] - 49s 63ms/step - loss: 0.0286 - accuracy: 0.9915 - val_loss: 0.9685 - val_accuracy: 0.8184
Epoch 14/30
782/782 [==============================] - 45s 58ms/step - loss: 0.0232 - accuracy: 0.9930 - val_loss: 0.8898 - val_accuracy: 0.8146
Epoch 15/30
782/782 [==============================] - 44s 57ms/step - loss: 0.0209 - accuracy: 0.9927 - val_loss: 1.0375 - val_accuracy: 0.8144
Epoch 16/30
782/782 [==============================] - 43s 55ms/step - loss: 0.0179 - accuracy: 0.9944 - val_loss: 1.0408 - val_accuracy: 0.8131
Epoch 17/30
782/782 [==============================] - 45s 57ms/step - loss: 0.0131 - accuracy: 0.9960 - val_loss: 1.0855 - val_accuracy: 0.8143
Epoch 18/30
782/782 [==============================] - 43s 55ms/step - loss: 0.0122 - accuracy: 0.9964 - val_loss: 1.1825 - val_accuracy: 0.8105
Epoch 19/30
782/782 [==============================] - 45s 57ms/step - loss: 0.0091 - accuracy: 0.9972 - val_loss: 1.3037 - val_accuracy: 0.8097
Epoch 20/30
782/782 [==============================] - 44s 56ms/step - loss: 0.0089 - accuracy: 0.9972 - val_loss: 1.2140 - val_accuracy: 0.8125
Epoch 21/30
782/782 [==============================] - 45s 58ms/step - loss: 0.0076 - accuracy: 0.9976 - val_loss: 1.2321 - val_accuracy: 0.8136
Epoch 22/30
782/782 [==============================] - 45s 58ms/step - loss: 0.0119 - accuracy: 0.9962 - val_loss: 1.1726 - val_accuracy: 0.8072
Epoch 23/30
782/782 [==============================] - 46s 59ms/step - loss: 0.0093 - accuracy: 0.9969 - val_loss: 1.2273 - val_accuracy: 0.8029
Epoch 24/30
782/782 [==============================] - 45s 57ms/step - loss: 0.0065 - accuracy: 0.9978 - val_loss: 1.3390 - val_accuracy: 0.8118
Epoch 25/30
782/782 [==============================] - 50s 64ms/step - loss: 0.0053 - accuracy: 0.9984 - val_loss: 1.2323 - val_accuracy: 0.8088
Epoch 26/30
782/782 [==============================] - 44s 56ms/step - loss: 0.0081 - accuracy: 0.9973 - val_loss: 1.2998 - val_accuracy: 0.8123
Epoch 27/30
782/782 [==============================] - 44s 57ms/step - loss: 0.0043 - accuracy: 0.9986 - val_loss: 1.3976 - val_accuracy: 0.8098
Epoch 28/30
782/782 [==============================] - 41s 53ms/step - loss: 0.0039 - accuracy: 0.9989 - val_loss: 1.6791 - val_accuracy: 0.8043
Epoch 29/30
782/782 [==============================] - 42s 54ms/step - loss: 0.0037 - accuracy: 0.9987 - val_loss: 1.4269 - val_accuracy: 0.8101
Epoch 30/30
782/782 [==============================] - 41s 53ms/step - loss: 0.0059 - accuracy: 0.9982 - val_loss: 1.4012 - val_accuracy: 0.8150

As you can see, the final training accuracy after 30 epochs is 0.9982, or 99.82%, and the validation accuracy is 81.50%. The training accuracy is slightly lower than with the SimpleRNN model, but the validation accuracy went up significantly.

There is still an overfitting issue, but in NLP projects some overfitting is normal: no matter how large the word bank in your training set is, the validation data may still contain new words.

A plot of accuracy and loss per epoch gives a better picture of how they changed over training. I will use the plot_graphs function from the previous section:

plot_graphs(history, 'accuracy')
plot_graphs(history, 'loss')
Image by Author

The training accuracy gradually improved and went over 99%. The validation accuracy peaked at around 84% early on and then stayed fairly stable, a little above 80%.

As you can see, the validation loss is still going up. But if you look at the y-axis scale, the SimpleRNN validation loss climbed to about 3.0, while for this GRU model it stays below roughly 1.7. So, overall, the validation loss is far lower.

Long Short-Term Memory (LSTM)

I wanted to show the implementation of an LSTM model as well. The main difference between an LSTM and a GRU is that an LSTM has three gates (input, output, and forget) whereas a GRU has two, as mentioned before.

Here I will simply replace the GRU layer from the previous model with an LSTM layer. Here is the model:

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim,
                              input_length=max_length),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

Output:

Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_5 (Embedding) (None, 120, 16) 160000
_________________________________________________________________
bidirectional_5 (Bidirection (None, 64) 12544
_________________________________________________________________
dense_10 (Dense) (None, 10) 650
_________________________________________________________________
dense_11 (Dense) (None, 1) 11
=================================================================
Total params: 173,205
Trainable params: 173,205
Non-trainable params: 0

The model summary looks very close to the GRU model in the previous section. The output shapes are exactly the same, but the parameter count of the LSTM layer is different because, as mentioned before, an LSTM has three gates instead of two.
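The 12,544 parameters of the bidirectional LSTM layer can be checked the same way; the factor of 4 comes from the three gates plus the candidate cell state. Again, this is a side calculation, not part of the original code.

# Side calculation for the bidirectional LSTM parameter count.
units, input_dim = 32, 16
per_direction = 4 * (input_dim * units + units * units + units)  # 4 * 1,568 = 6,272
print(2 * per_direction)  # bidirectional doubles it: 12,544

With that confirmed, I will train the model for 30 epochs again: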

num_epochs = 30
history = model.fit(padded, training_labels_final, epochs=num_epochs,
                    validation_data=(testing_padded, testing_labels_final))

Output:

Epoch 1/30
782/782 [==============================] - 46s 54ms/step - loss: 0.4740 - accuracy: 0.7665 - val_loss: 0.3732 - val_accuracy: 0.8413
Epoch 2/30
782/782 [==============================] - 38s 48ms/step - loss: 0.3010 - accuracy: 0.8789 - val_loss: 0.3687 - val_accuracy: 0.8397
Epoch 3/30
782/782 [==============================] - 37s 48ms/step - loss: 0.2380 - accuracy: 0.9079 - val_loss: 0.4850 - val_accuracy: 0.8311
Epoch 4/30
782/782 [==============================] - 38s 48ms/step - loss: 0.1890 - accuracy: 0.9306 - val_loss: 0.4697 - val_accuracy: 0.8355
Epoch 5/30
782/782 [==============================] - 41s 53ms/step - loss: 0.1453 - accuracy: 0.9494 - val_loss: 0.4581 - val_accuracy: 0.8305
Epoch 6/30
782/782 [==============================] - 42s 54ms/step - loss: 0.1118 - accuracy: 0.9622 - val_loss: 0.7176 - val_accuracy: 0.8212
Epoch 7/30
782/782 [==============================] - 41s 53ms/step - loss: 0.0952 - accuracy: 0.9676 - val_loss: 0.7107 - val_accuracy: 0.8230
Epoch 8/30
782/782 [==============================] - 41s 52ms/step - loss: 0.0769 - accuracy: 0.9753 - val_loss: 0.6845 - val_accuracy: 0.8205
Epoch 9/30
782/782 [==============================] - 39s 49ms/step - loss: 0.0567 - accuracy: 0.9824 - val_loss: 0.9333 - val_accuracy: 0.8163
Epoch 10/30
782/782 [==============================] - 42s 54ms/step - loss: 0.0564 - accuracy: 0.9826 - val_loss: 0.8228 - val_accuracy: 0.8171
Epoch 11/30
782/782 [==============================] - 42s 54ms/step - loss: 0.0466 - accuracy: 0.9862 - val_loss: 0.9426 - val_accuracy: 0.8177
Epoch 12/30
782/782 [==============================] - 44s 56ms/step - loss: 0.0445 - accuracy: 0.9861 - val_loss: 0.9138 - val_accuracy: 0.8144
Epoch 13/30
782/782 [==============================] - 44s 57ms/step - loss: 0.0343 - accuracy: 0.9897 - val_loss: 0.9876 - val_accuracy: 0.8149
Epoch 14/30
782/782 [==============================] - 42s 53ms/step - loss: 0.0282 - accuracy: 0.9919 - val_loss: 1.0017 - val_accuracy: 0.8152
Epoch 15/30
782/782 [==============================] - 41s 52ms/step - loss: 0.0251 - accuracy: 0.9932 - val_loss: 1.0724 - val_accuracy: 0.8158
Epoch 16/30
782/782 [==============================] - 41s 53ms/step - loss: 0.0267 - accuracy: 0.9912 - val_loss: 1.0648 - val_accuracy: 0.8117
Epoch 17/30
782/782 [==============================] - 43s 55ms/step - loss: 0.0258 - accuracy: 0.9922 - val_loss: 0.9267 - val_accuracy: 0.8109
Epoch 18/30
782/782 [==============================] - 40s 51ms/step - loss: 0.0211 - accuracy: 0.9936 - val_loss: 1.0909 - val_accuracy: 0.8104
Epoch 19/30
782/782 [==============================] - 38s 49ms/step - loss: 0.0114 - accuracy: 0.9967 - val_loss: 1.1444 - val_accuracy: 0.8134
Epoch 20/30
782/782 [==============================] - 38s 48ms/step - loss: 0.0154 - accuracy: 0.9954 - val_loss: 1.1040 - val_accuracy: 0.8124
Epoch 21/30
782/782 [==============================] - 38s 49ms/step - loss: 0.0196 - accuracy: 0.9941 - val_loss: 1.2061 - val_accuracy: 0.8128
Epoch 22/30
782/782 [==============================] - 41s 53ms/step - loss: 0.0093 - accuracy: 0.9971 - val_loss: 1.3365 - val_accuracy: 0.8032
Epoch 23/30
782/782 [==============================] - 42s 54ms/step - loss: 0.0155 - accuracy: 0.9953 - val_loss: 1.2835 - val_accuracy: 0.8123
Epoch 24/30
782/782 [==============================] - 43s 55ms/step - loss: 0.0168 - accuracy: 0.9948 - val_loss: 1.2476 - val_accuracy: 0.8160
Epoch 25/30
782/782 [==============================] - 38s 48ms/step - loss: 0.0108 - accuracy: 0.9967 - val_loss: 1.0810 - val_accuracy: 0.8091
Epoch 26/30
782/782 [==============================] - 39s 50ms/step - loss: 0.0132 - accuracy: 0.9960 - val_loss: 1.3154 - val_accuracy: 0.8118
Epoch 27/30
782/782 [==============================] - 42s 54ms/step - loss: 0.0056 - accuracy: 0.9985 - val_loss: 1.4012 - val_accuracy: 0.8106
Epoch 28/30
782/782 [==============================] - 40s 51ms/step - loss: 0.0046 - accuracy: 0.9986 - val_loss: 1.4809 - val_accuracy: 0.8134
Epoch 29/30
782/782 [==============================] - 40s 51ms/step - loss: 0.0074 - accuracy: 0.9977 - val_loss: 1.4389 - val_accuracy: 0.8104
Epoch 30/30
782/782 [==============================] - 41s 52ms/step - loss: 0.0126 - accuracy: 0.9958 - val_loss: 1.2202 - val_accuracy: 0.8124

As you can see, the training and validation accuracies are pretty close to those of the GRU model. Here is the plot:

plot_graphs(history, 'accuracy')
plot_graphs(history, 'loss')
Image by Author

The validation loss looks slightly better this time as well.
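If you want a single summary number for the test set, Keras’s evaluate method can be called on the padded test data. This quick check was not part of the original walkthrough:

# Final test loss and accuracy for the most recently trained model
loss, accuracy = model.evaluate(testing_padded, testing_labels_final, verbose=0)
print(f"Test loss: {loss:.4f}, test accuracy: {accuracy:.4f}")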

Conclusion

In this article, I wanted to explain what a recurrent neural network is and why it works better than a regular neural network for sequential data, and then demonstrate the implementation of SimpleRNN, GRU, and LSTM models on the same dataset for a natural language processing task, specifically a sentiment analysis project.

In this article, the accuracy results look similar for the GRU and LSTM models, but that is not a general conclusion. For a different project you may see different results, and even here, changing parameters such as the number of dense neurons, the number of GRU or LSTM units, or the number of layers may change the results significantly.
