To kick things off, we need to understand what a Recurrent Neural Network is. Put simply, a Recurrent Neural Network (RNN) is a type of artificial neural network designed to develop an understanding of sequential data, such as time-series data. It predicts new outcomes based on a new input as well as previously predicted outcomes. Ultimately, when applied to stock market data, an RNN makes informed predictions based on previously predicted stock prices along with new market data, processed in timesteps (the length of time covered by each sample); in our case, daily intervals.
A feed-forward network takes an input and directs it through a series of weights and layers (these drive the learning) until an output is produced. An RNN performs the same feed-forward action (pushing data from left to right), but when an output is generated, it is also kept in a memory state, or state layer. This memory state is then activated when new input is introduced and is fed through the network at each timestep.
Now that you understand some of the context, let’s start coding!
We’ll get our data using the Yahoo-Finance library. Also, let’s import some other libraries:
Then, we’ll fetch our data using the get_data() function from the yahoo_finance library. While we’re at it, let’s rename some columns and round the prices to 2 decimal places:
Let’s explore our data first by visualizing the closing price, summary statistics and locate any presence of null values. We’ll use Amazon’s historical prices for this demonstration between 2017–01–01 to 2021–08–23:
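The exploration step described above might look like this; `df` below is a tiny stand-in frame so the sketch is self-contained, whereas the article uses the full AMZN download:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")            # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Stand-in for the AMZN frame from the previous step.
df = pd.DataFrame({"Close": [753.67, 1445.09, 2041.13, 2724.11, 3731.41]})

# Visualize the closing price.
df["Close"].plot(figsize=(12, 6), title="AMZN closing price, 2017-01-01 to 2021-08-23")
plt.ylabel("Price (USD)")
plt.show()

print(df["Close"].describe())    # summary statistics: count, mean, min/max, quartiles
print(df.isnull().sum())         # locate any null values per column
```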
It can be observed that Amazon saw strong bullish tendencies and momentum between 2018 and 2019 and from 2020 to late 2021, with some stagnation between 2019 and 2020 and during 2021. Other interesting analysis shows that Amazon had an all-time high of $3731.41 and an all-time low of $753.67 over this period. Also note that the middle 50% of all trades closed between $1445.09 and $2724.11. Thankfully, we don’t have any null values, which saves us time in our data pre-processing.
Here are the steps in building our model:
- Pre-Process data
- Scaling Data
- Create training and testing data
- Model Building
- Training Model
- Plot diagnostics, loss functions and determine RMSE
- Visualize results: Out-Of-Sample Forecasting approach
Pre-Process and Scaling our Data
Pre-processing our data ensures that we remove any ambiguities, null values and extreme outliers. It also lets us manipulate the data into a format compatible with our LSTM model. In this case, the following function filters out the “close” price in our dataset and scales it to a feature range of 0 to 1. We do this so that all our closing prices are constrained between specific boundaries, in this case 0 and 1, which helps our model learn from the data. The function returns the fitted scaler and the scaled data.
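A minimal version of that function, assuming the `Close` column naming from earlier (the function name is an assumption):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def scale_close_prices(df):
    """Filter out the 'Close' column and scale it into the range [0, 1]."""
    close = df[["Close"]].values                 # shape (n_samples, 1)
    scaler = MinMaxScaler(feature_range=(0, 1))
    scaled = scaler.fit_transform(close)         # every value now lies in [0, 1]
    return scaler, scaled                        # keep the scaler to unscale later
```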
Training and Test Sets
In order for our LSTM model to capture the dynamics and relationships in our data effectively, we’ll need to create two subsets from the whole dataset. Why, you ask? To prevent overfitting our model. Don’t worry, I’ll explain this later. What’s most important is the purpose of these two datasets:
- Train Set: The Training set makes up a larger percentage of our closing prices as it is used to teach and train our model. This helps our model better generalize the overall trend.
- Test Set: The percentage left over after taking the training set; it helps determine whether our model was trained effectively and correctly.
These two functions work in similar ways: each takes the scaled data, a timestep (the number of past observations in each input window, used to keep track of the sequential data by indexing over a particular interval) and training_data_length (the length of 80% of the total closing-price history). Since the training set holds 80% of the data, the test set logically holds the remaining 20%.
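A sketch of those two window-building functions; the function names follow the article, while the toy data, the 10-step window and the 80/20 split line are illustrative assumptions:

```python
import numpy as np

def create_training(scaled_data, timestep, training_data_length):
    """Sliding windows over the first 80% of the scaled closes."""
    x_train, y_train = [], []
    for i in range(timestep, training_data_length):
        x_train.append(scaled_data[i - timestep:i, 0])  # previous `timestep` closes
        y_train.append(scaled_data[i, 0])               # the close that follows them
    return np.array(x_train), np.array(y_train)

def create_test(scaled_data, timestep, training_data_length):
    """Sliding windows over the remaining 20%."""
    x_test, y_test = [], []
    for i in range(training_data_length, len(scaled_data)):
        x_test.append(scaled_data[i - timestep:i, 0])
        y_test.append(scaled_data[i, 0])
    return np.array(x_test), np.array(y_test)

scaled = np.linspace(0, 1, 100).reshape(-1, 1)   # stand-in for the scaled closes
training_data_length = int(len(scaled) * 0.8)    # 80% train, 20% test
x_train, y_train = create_training(scaled, 10, training_data_length)
x_test, y_test = create_test(scaled, 10, training_data_length)
```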
Now for the fun part…
Any network can have a number of inputs, outputs, layers, weights and hidden units/nodes. Each of these layers uses an activation function which helps convert weighted sums (learned combinations) from one node to another, resulting in an output. Keep in mind, however, that there are limits on how well networks can perform; too many or too few of each of these features can make a very big difference. This is a discipline in itself and requires extensive knowledge of mathematics and computer science. But again, this article isn’t a research paper, just a taster.
We can also generate a summary of our model using model.summary():
- Params: The number of trainable parameters (weights and biases) the network adjusts during learning.
- dropout: Refers to dropout layers. As data is fed through the network, these randomly deactivate certain weights and nodes, which helps prevent overfitting and avoids relying on unnecessary weights.
- lstm: relates to the type of neural network used, in this case an LSTM.
- dense: Also known as a fully connected (hidden) layer; it transfers and adjusts weights between layers (shaping how the close prices are learned) and produces the final decision.
Because an LSTM model only accepts input in a specific format, a 3D matrix, we need to reshape our training and test sets:
Using the create_training() and create_test() functions previously, we’ll assign them to two variables each, x_train and y_train as well as x_test and y_test. Lastly, reshape them into a 3D matrix using np.reshape().
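The reshape step might look like this; the stand-in arrays below use a 60-day timestep, which is an assumption (the article doesn't state its window length), and the target 3D layout is Keras's (samples, timesteps, features) with a single feature, the close:

```python
import numpy as np

# Stand-ins with the 2D shapes produced by the windowing step.
x_train = np.zeros((900, 60))
x_test = np.zeros((200, 60))

# Reshape into the 3D matrix an LSTM expects: (samples, timesteps, features).
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))
x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], 1))
```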
To construct our model, we’ll use the summary above as the basis of our programmed LSTM model:
This portion of our code develops the model by initializing a neural network with Sequential(). The number of units sets how many hidden units each LSTM layer has (and therefore how many trainable parameters the model ends up with). The first LSTM layer receives an input shape to feed into the model, as well as return_sequences=True to keep the sequence in memory for the next layer. Again, Dropout layers are added to reduce the chances of overfitting. Finally, we add a Dense layer to produce an output, in this case our predicted closing price.
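The description above can be sketched as a stacked LSTM; the unit counts, dropout rate and 60-step input shape are plausible assumptions, not the article's exact values:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

timestep = 60                              # assumed window length
model = Sequential([
    LSTM(50, return_sequences=True,        # pass the full sequence to the next LSTM
         input_shape=(timestep, 1)),
    Dropout(0.2),                          # randomly silence units to curb overfitting
    LSTM(50, return_sequences=False),      # final LSTM returns a single vector
    Dropout(0.2),
    Dense(25),                             # hidden dense layer
    Dense(1),                              # the predicted closing price
])
model.summary()
```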
Now that we have our model developed, we can begin compiling and fitting the model, or in other words train and run our model:
The code above trains our model using our training sets x_train and y_train. The “epochs” value is the number of complete passes the model makes through our training dataset. The “validation_split” is the portion of our training set held out for evaluating the loss and other metrics. If you’re wondering what “loss” means, it essentially measures the model’s error at each epoch.
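That compile-and-fit step might look like the following; the tiny model and random data here are stand-ins so the sketch runs on its own, and the optimizer, loss, batch size and (deliberately small) epoch count are common choices rather than the article's exact settings:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Stand-in model and data; in the article, `model`, `x_train` and `y_train`
# come from the earlier steps.
x_train = np.random.rand(64, 60, 1)
y_train = np.random.rand(64)
model = Sequential([LSTM(8, input_shape=(60, 1)), Dense(1)])

model.compile(optimizer="adam", loss="mean_squared_error")
history = model.fit(x_train, y_train,
                    epochs=2,              # complete passes over the training set
                    batch_size=32,
                    validation_split=0.1,  # 10% of training data held out for validation
                    verbose=0)
```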
Now that our model is trained, we can test it using our x_test set, which is 20% of our entire close-price history:
Now, remember how we scaled our data? We utilize the .inverse_transform() function which converts all these scaled values back to their original and appropriate values (in this case, the price).
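The unscaling step works like this; the hard-coded scaler range and the dummy prediction array below are stand-ins (with the real model, the scaled array would come from `model.predict(x_test)`):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Stand-in for the scaler fitted earlier on the closing prices.
scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(np.array([[753.67], [3731.41]]))           # AMZN low/high from the article

# Stand-in for the model's scaled predictions on x_test.
scaled_predictions = np.array([[0.0], [0.5], [1.0]])

# inverse_transform() maps scaled values back to their original price scale.
predictions = scaler.inverse_transform(scaled_predictions)
```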
Evaluation and Diagnostic Plots:
An important step in the Data Science Development Cycle is to evaluate our models to check whether they performed correctly:
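The evaluation step boils down to computing the root-mean-squared error between predicted and actual prices (the loss-curve plot from `history.history["loss"]` / `history.history["val_loss"]` would accompany it); here is a minimal self-contained version:

```python
import numpy as np

def rmse(actual, predicted):
    """Square the errors, average them, then take the square root."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))
```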
These are the results:
Evaluating the performance of our model, it returns an RMSE of 24.23, which is acceptable considering we haven’t applied any optimization or parameter-tuning methods. What is RMSE, you may ask? It stands for “Root Mean Squared Error”: the square root of the average of the squared differences between the predicted and actual prices. RMSE is scale-dependent and there is no universal threshold; the general rule of thumb is the lower, the better.
As for the loss functions, they plateau and converge towards zero, stabilizing with only a small gap between them. This is a good sign that our model neither overfit nor underfit. Overfitting typically shows up when the validation loss, after converging towards a minimum, suddenly rises again, or when a large gap opens between the training and validation losses. Underfitting shows up when both losses remain high, converging towards a minimum only at a much later epoch, if at all.
That was a mouthful huh? Let me summarize it into two key points:
- Overfitting means that our model has learned the training data too well and doesn’t generalize to the overall trend.
- Underfitting means that our model wasn’t exposed to or trained on sufficient data to learn the trend at all.
Visualize Results and Predict Next Closing Prices:
We can now visualize our forecasts of the predicted price using an Out-of-Sample Forecast. This means that we won’t be plotting the predicted against the actual prices, since because … well … it’s a prediction:
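The core of an out-of-sample forecast is feeding each prediction back in as input for the next step. A minimal sketch of that loop, where `predict_next` stands in for `model.predict` on a `(1, timestep, 1)` array (function names and structure are assumptions, since the article's cell isn't shown):

```python
import numpy as np

def forecast(window, predict_next, horizon):
    """Iteratively predict `horizon` future scaled closes from a seed window."""
    window = list(window)
    out = []
    for _ in range(horizon):
        nxt = predict_next(np.array(window, dtype=float).reshape(1, -1, 1))
        out.append(nxt)
        window = window[1:] + [nxt]   # drop the oldest value, append the prediction
    return out
```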
As you can see, the model predicted the orange forecasts in the plot above. Pretty neat huh?
Now to test our model on predicting the next day’s closing price. We’ll again use Amazon’s historical share prices, attempt to predict the closing price, then compare it to the actual closing price. Let’s use data from 2017 up to 2021–08–24:
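The next-day prediction pipeline might look like this. `FakeModel` is a stand-in so the sketch runs on its own (it naively echoes the latest scaled close); with the real trained model you would feed the last 60 actual AMZN closes instead, and the 60-day window is again an assumption:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

class FakeModel:
    def predict(self, x):
        return x[:, -1, :]             # naively "predict" the most recent scaled close

closes = np.linspace(1800.0, 3300.0, 200).reshape(-1, 1)  # stand-in price history
scaler = MinMaxScaler(feature_range=(0, 1)).fit(closes)

last_60_scaled = scaler.transform(closes[-60:])           # scale the last 60 closes
x_next = last_60_scaled.reshape(1, 60, 1)                 # (samples, timesteps, features)

model = FakeModel()
next_close = scaler.inverse_transform(model.predict(x_next))  # back to dollars
print(f"Predicted next close: ${next_close[0, 0]:.2f}")
```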
Using this code, we predict that the closing price from the 24th carrying over to the 25th will be:
The code above now returns the actual closing price on the 25th of August:
Yikes! $62 off. Now you might be thinking, “Isn’t this supposed to be a perfect outcome?” Not quite. The truth is that LSTMs, and artificial neural networks in general, require a significant amount of time to develop, improve, evaluate and optimize to the best of their capabilities. Perhaps if we delved deeper, we could perform optimization and tuning to improve our predictions, but I’m guessing you’ve got better things to do, and so do I (kind of).