Long Short-Term Memory Models
Long Short-Term Memory (LSTM) models are a variant of recurrent neural networks (RNNs). They are built from cells with multiple channels that carry information from previous steps forward so it can be used in later steps.
The LSTM cells that are used in the predictive models shown here follow the same structure. At first glance, they look complicated but can be interpreted intuitively with a few observations.
There are two main paths through an LSTM cell, which can be roughly interpreted as short-term and long-term memory. The short-term memory is the bottom path through the cell, and the long-term memory is the top path.
For short-term memory, notice how the input is directed into this path. As one would expect with short-term memory, the model interacts with the new data, Xt, and the output from the previous step. This data then has several possible paths through the cell.
The long-term memory is the path along the top of the cell. Its input is the long-term memory output from the previous step. As it passes through the cell, the long-term memory is updated by several different channels that branch off the short-term path.
The channels that connect the short-term path to the long-term path are most clearly distinguished by the activation function each one uses and by how their intermediate outputs are combined.
It is crucial to notice that a sigmoid function is used for some paths and the hyperbolic tangent function is used for others. This choice is meaningful. The critical observation is that the output of a sigmoid function lies between 0 and 1, whereas the output of a hyperbolic tangent function lies between -1 and 1.
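To make the two output ranges concrete, here is a minimal sketch in plain Python. The helper name `sigmoid` and the sample inputs are illustrative choices, not something taken from the original article.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Compare both activations on a few sample inputs (illustrative values).
for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  tanh={math.tanh(x):+.4f}")
```

The sigmoid output never goes negative, so it is suited to acting as a "how much" dial, while the hyperbolic tangent can also express a direction through its sign.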
Now, why is there a difference?
In the first channel between the short-term and long-term memory paths, the activation is a sigmoid function, so the output is between 0 and 1. This path dictates how much of the previous long-term memory should be kept, given the current input and the previous output. If this output is 0, then the product of this output and the previous long-term memory is 0, and that memory is forgotten. On the other hand, if this output is close to 1, then the previous long-term memory persists.
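The effect of this first channel can be sketched as an element-wise product. In the snippet below, the gate values are assumed to have already come out of a sigmoid, and the function name `apply_forget_gate` and the numbers are hypothetical, chosen only for illustration.

```python
def apply_forget_gate(prev_long_term, forget_gate):
    """Scale each long-term memory entry by its gate value in [0, 1]."""
    return [f * c for f, c in zip(forget_gate, prev_long_term)]


prev_long_term = [2.0, -1.5, 0.8]  # illustrative previous long-term memory
forget_gate = [0.0, 1.0, 0.5]      # 0 = forget, 1 = keep, 0.5 = keep half

kept = apply_forget_gate(prev_long_term, forget_gate)
print(kept)
```

The first entry is erased, the second passes through unchanged, and the third is halved.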
The second and third channels use sigmoid and hyperbolic tangent activations, respectively. This combination acts as a two-step process: the sigmoid output determines whether the current information is relevant, and the -1 to 1 output of the hyperbolic tangent determines whether it contributes positively or negatively. The two outputs are multiplied together, and the result, which is the contribution of the current time step, is added to the long-term memory.
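A small sketch of this two-step process follows, assuming the standard LSTM formulation in which the two channel outputs are multiplied before being added to the long-term memory. The function name `long_term_update` and the pre-activation values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def long_term_update(pre_gate: float, pre_candidate: float) -> float:
    """Contribution of the current step to the long-term memory."""
    relevance = sigmoid(pre_gate)         # is the new information relevant? (0 to 1)
    candidate = math.tanh(pre_candidate)  # positive or negative contribution (-1 to 1)
    return relevance * candidate


print(long_term_update(4.0, 2.0))    # relevant, positive contribution
print(long_term_update(4.0, -2.0))   # relevant, negative contribution
print(long_term_update(-4.0, 2.0))   # judged irrelevant, contribution near zero
```

Because the relevance term can only shrink the candidate toward zero, the sign of the update always comes from the hyperbolic tangent channel.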
The final path combines the current time step with the updated long-term memory. The output of this path is the output of the cell, and it also becomes the short-term memory input to the following LSTM cell.
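Putting the paths together, one step through a cell might look like the sketch below. It assumes the standard LSTM equations, uses scalars instead of vectors to stay readable, and the weight values are hypothetical placeholders rather than trained parameters.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, w):
    """One scalar LSTM step.

    x_t:    new input at this time step
    h_prev: short-term memory (previous output)
    c_prev: long-term memory from the previous step
    w:      dict mapping channel name -> (input weight, recurrent weight, bias)
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # first channel: how much old memory to keep
    i = sigmoid(pre("input"))         # second channel: is the new data relevant?
    g = math.tanh(pre("candidate"))   # third channel: signed contribution
    o = sigmoid(pre("output"))        # final path: how much memory to expose

    c_t = f * c_prev + i * g          # updated long-term memory
    h_t = o * math.tanh(c_t)          # cell output, passed on as short-term memory
    return h_t, c_t


# Hypothetical weights, identical for every channel to keep the example short.
weights = {name: (0.5, 0.5, 0.0)
           for name in ("forget", "input", "candidate", "output")}

h, c = 0.0, 0.0
for x_t in [1.0, 0.5, -0.3]:
    h, c = lstm_step(x_t, h, c, weights)
    print(f"x={x_t:+.1f}  short-term={h:+.4f}  long-term={c:+.4f}")
```

Both values returned from one step are fed into the next, which is how the cell carries short-term and long-term information along a sequence.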
Setup for an LSTM Model
model = Sequential()
model.add(