Understanding the basics of LSTM-units





Long Short-Term Memory (LSTM) is one of the most successful recurrent neural network architectures in real-world applications, thanks to its use of gates that decide which long- and short-term information to keep in memory and which to discard.

It was introduced by Hochreiter & Schmidhuber (1997) in the paper "Long short-term memory" and later refined by Gers et al. (2000), who added a forget gate that lets the cell learn to reset its own state. This allows an LSTM to process continuous input streams that are not split in advance into subsequences with clearly marked ends. An LSTM unit (figure 2) differs from a standard RNN unit (figure 1) in several ways.

Figure 1: Recurrent hidden unit. Based on graphic from (Goodfellow et al. 2016, p. 378)


Figure 2: Long Short-Term Memory unit. Based on graphic from (Goodfellow et al. 2016, p. 409)

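LSTM mitigates the problem a standard RNN has with long-term dependencies by keeping dedicated long-term and short-term states, the cell state c and the hidden state h, both of which are passed into and out of the unit at every time step.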
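To make the long-term dependency point concrete, compare how a gradient flows back k time steps. In a vanilla RNN (written here with assumed weight names W, U, b) it is a product of k Jacobians that all contain the same matrix W, so it tends to shrink or grow geometrically. In an LSTM, the cell state is updated additively, c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t, where f_t is the forget gate. Along the direct path through the cell state only (ignoring the paths through the gates and h, which is a simplifying assumption), the gradient is just an elementwise product of forget-gate values, which stays close to 1 while the gates stay open.

```latex
\begin{aligned}
&\text{RNN:}  && h_t = \tanh(W h_{t-1} + U x_t + b), \\
&             && \frac{\partial h_t}{\partial h_{t-k}} = J_t J_{t-1} \cdots J_{t-k+1},
                 \quad J_j = \operatorname{diag}\!\left(1 - h_j \odot h_j\right) W \\[4pt]
&\text{LSTM:} && c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
&             && \left.\frac{\partial c_t}{\partial c_{t-k}}\right|_{\text{direct path}}
                 = \operatorname{diag}\!\left(f_t \odot f_{t-1} \odot \cdots \odot f_{t-k+1}\right)
\end{aligned}
```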

Figure 2 shows a graphical representation of an LSTM unit. The LSTM unit has several neural network layers inside (dark gray boxes), each labelled with its activation function (σ or tanh). Arrows represent the connections, and the pointwise operations (addition, multiplication) are shown with their mathematical signs. An LSTM cell has several outputs: the hidden state h goes to the layer above it, and both h and the cell state c go to the same LSTM unit at the next time step.
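The data flow in figure 2 can be sketched in NumPy as follows. This is a minimal sketch assuming the common LSTM formulation without peephole connections; the function names `lstm_cell_step` and `run_lstm`, the stacking of all four layers' weights into single matrices `W`, `U`, `b`, and the gate order inside them are illustrative assumptions, not taken from the original figures. Each `sigmoid` or `np.tanh` call corresponds to one dark gray box, and `*` and `+` are the pointwise operations.

```python
import numpy as np


def sigmoid(x):
    # Logistic function used by the three gate layers (the σ boxes).
    return 1.0 / (1.0 + np.exp(-x))


def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell (no peephole connections).

    Assumed shapes: W (4n, n_input), U (4n, n), b (4n,), where n is the hidden size.
    Rows are stacked in the (assumed) order: forget, input, candidate, output.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b        # pre-activations of all four layers at once
    f = sigmoid(z[0:n])                 # forget gate: how much of c_prev to keep
    i = sigmoid(z[n:2 * n])             # input gate: how much new content to write
    c_tilde = np.tanh(z[2 * n:3 * n])   # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])         # output gate: how much of the cell to expose
    c_t = f * c_prev + i * c_tilde      # pointwise multiply and add
    h_t = o * np.tanh(c_t)              # new hidden state
    return h_t, c_t


def run_lstm(xs, W, U, b, n_hidden):
    # Unroll over time: (h, c) go to the next time step, h also goes to the layer above.
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    outputs = []
    for x_t in xs:
        h, c = lstm_cell_step(x_t, h, c, W, U, b)
        outputs.append(h)
    return np.stack(outputs), h, c


rng = np.random.default_rng(0)
n_input, n_hidden, T = 3, 4, 5
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_input))
U = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)
xs = rng.normal(size=(T, n_input))

hs, h_last, c_last = run_lstm(xs, W, U, b, n_hidden)
print(hs.shape)  # (5, 4)
```

Stacking the four layers into one matrix multiply is a common design choice because it computes all gate pre-activations in a single call; keeping four separate weight matrices would be mathematically equivalent.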

The job of an LSTM unit is to decide what information to remember and what to forget. One can view an LSTM as a sequence of steps within the cell; the steps below explain what happens inside the LSTM unit in figure 2.

Screenshot: the steps inside the LSTM unit, written out in mathematical notation (too notation-heavy to write as text on Medium).
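For readers who want the steps as text, a common way of writing one LSTM time step (without peephole connections) is given here. It is an assumption that this matches the screenshot; the notation follows common usage rather than Goodfellow et al. (2016), and W_*, U_*, b_* are assumed names for the input weights, recurrent weights and biases of each layer. The first three lines are the σ and tanh layers of figure 2, and the last three combine them with pointwise operations.

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{forget gate} \\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{input gate} \\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{candidate cell state} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{new cell state} \\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(c_t) && \text{new hidden state}
\end{aligned}
```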

Even though LSTM networks are very successful in time-series applications, they can still suffer from the vanishing and exploding gradient problems explained in LINK TO RNN ARTICLE. These problems must therefore still be addressed when building an LSTM network.
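One widely used remedy for exploding gradients is to clip the gradients by their global norm before each parameter update (it does not help with vanishing gradients). The following is a minimal NumPy sketch; `clip_by_global_norm` is a hypothetical helper name, and `max_norm` is a hyperparameter you choose. Deep learning frameworks ship their own version, for example PyTorch's `torch.nn.utils.clip_grad_norm_`.

```python
import numpy as np


def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm.

    Returns the (possibly rescaled) gradients and the norm measured before clipping.
    """
    total_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]  # same direction, smaller length
    return grads, total_norm


# Illustrative gradients for two parameter arrays (made-up values).
grads = [np.array([3.0, 4.0]), np.array([[12.0]])]
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
print(norm)  # 13.0
```

Scaling all gradients by one shared factor, rather than clipping each element separately, keeps the direction of the update unchanged and only shortens it.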


Gers, F. A., Schmidhuber, J. & Cummins, F. (2000), 'Learning to forget: Continual prediction with LSTM', Neural Computation 12(10), 2451-2471.

Goodfellow, I., Bengio, Y. & Courville, A. (2016), Deep Learning, MIT Press.

Hochreiter, S. & Schmidhuber, J. (1997), 'Long short-term memory', Neural Computation 9(8), 1735-1780.

