Understanding the basics of RNNs





A recurrent neural network (RNN) is recurrent in the way it processes sequential data: the same operation a is applied at every point of the input sequence x, and at each point t in the sequence the unit returns an output ŷ.

The figure below shows a simplified RNN with one hidden layer, folded (left) and unfolded (right) along the time dimension. Here x_t is the input at time t, U are the weights of the input-to-hidden connections, W are the weights of the hidden-to-hidden connection between a_t and a_(t+1), and V are the weights of the hidden-to-output connection from a to ŷ.

Recurrent hidden unit. Based on graphic from (Goodfellow et al. 2016, p. 378)
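As a worked form of what the figure describes, the update at each time step can be sketched as follows. The tanh activation and the bias names b and c are assumptions for illustration; the article only states that a bias exists, and an output nonlinearity such as softmax may additionally be applied to ŷ depending on the task.

```latex
\begin{align}
a_t &= \tanh\left(b + W a_{t-1} + U x_t\right) \\
\hat{y}_t &= c + V a_t
\end{align}
```

The same U, W, V, b and c appear at every t, which is what the unfolded view of the figure expresses.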

The intuition behind an RNN can be read from the figure, which illustrates how the hidden unit at time t receives x_t and W * a_(t-1) as input. The network can thus pass information from the past, t-1, to the present, t, and on again to time t+1, which effectively acts as the memory of the network. The parameters, that is the weights U, V and W and the bias (not shown in the figure), are shared across time and do not change with the time point of the input x. Sharing parameters across time lets the network handle input sequences of arbitrary length without the number of parameters growing with the sequence length. Backpropagation in an RNN is called Backpropagation Through Time (BPTT) and works just as for a traditional neural network, applied to the unfolded graph: the errors are computed on the outputs and the gradients are propagated backward through the time steps toward the inputs. Because the parameters are shared, the gradient for each parameter is the sum of its contributions from all time steps.
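The forward pass described above can be sketched in plain Python. This is a minimal illustration, not the article's implementation: the tanh activation, the bias names b and c, the helper names matvec and rnn_forward, and all the toy weight values are assumptions chosen for the example.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def rnn_forward(xs, U, W, V, b, c, a0):
    """Run a single-hidden-layer RNN over the sequence xs.

    The same U, W, V, b and c are reused at every time step.
    tanh as the hidden activation is an assumption for this sketch.
    """
    a = a0
    hidden_states, outputs = [], []
    for x_t in xs:
        # a_t = tanh(U x_t + W a_(t-1) + b)
        pre = [ux + wa + b_i
               for ux, wa, b_i in zip(matvec(U, x_t), matvec(W, a), b)]
        a = [math.tanh(p) for p in pre]
        # y_hat_t = V a_t + c (no output nonlinearity in this sketch)
        y_hat = [va + c_i for va, c_i in zip(matvec(V, a), c)]
        hidden_states.append(a)
        outputs.append(y_hat)
    return hidden_states, outputs


# Toy parameters (hypothetical values): 2 inputs, 2 hidden units, 1 output.
U = [[0.5, -0.3], [0.1, 0.8]]
W = [[0.2, 0.0], [0.0, 0.2]]
V = [[1.0, -1.0]]
b = [0.0, 0.0]
c = [0.0]
a0 = [0.0, 0.0]

# The same parameters handle a short and a long sequence alike.
short_seq = [[1.0, 0.0]] * 3
long_seq = [[1.0, 0.0]] * 50
short_hidden, short_out = rnn_forward(short_seq, U, W, V, b, c, a0)
long_hidden, long_out = rnn_forward(long_seq, U, W, V, b, c, a0)
print(len(short_out), len(long_out))
```

Note that the loop body is identical for both sequences; only the number of iterations changes, which is the practical meaning of sharing parameters across time.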

The idea behind recurrent units is also what underlies popular units such as the LSTM and the GRU.

Any neural network struggles when the computational graph becomes too deep. This happens in traditional neural networks when the number of layers is very large, and in RNNs when the input sequence is very long. The difficulty for deep RNNs comes from the repeated multiplication by the same parameters: when the weights are smaller than one in magnitude the propagated values tend to vanish, and when they are larger than one they tend to explode.
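The effect of repeated multiplication can be illustrated with a single scalar weight. This is a deliberately simplified sketch: a real RNN multiplies by a weight matrix and by activation derivatives, and the function name repeated_product and the chosen values are assumptions for the example.

```python
def repeated_product(w, steps):
    """Multiply 1.0 by the same weight w once per time step."""
    value = 1.0
    for _ in range(steps):
        value *= w
    return value


# A weight slightly below one shrinks toward zero over 100 steps,
# a weight of exactly one stays put, and a weight slightly above one blows up.
for w in (0.9, 1.0, 1.1):
    print(w, repeated_product(w, 100))
```

Even a weight only 10% away from one changes the result by several orders of magnitude after 100 steps, which is why long sequences are hard for a plain RNN.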


