Now that we have cleaned/trimmed our variables and added our calculated fields, we will need two data structuring/processing steps before implementing our machine learning algorithm.
We first divide our dataset into a training set and a testing set, then set up the target values for the training dataset.
Here is an example of how to structure the data:
Everything is explained beautifully within this article about the importance of this step for an AI and ML model:
Now that we understand the importance of this step and the difference between a training and a test dataset, we will create a split variable to replicate this structure.
We will use 80% of the data to train and the remaining 20% to test. To do so, we will create a split parameter that divides the data frame in an 80–20 ratio.
The proportion to be divided is entirely up to you and the task you face. It is not essential that 80% of the data be for training and the rest for testing.
For instance, if you have a dataset with over 10,000 rows, a 95–5 split may be preferable, because 5% still leaves at least 500 rows for testing.
You can change the ratio as you see fit, but it is advisable to keep at least 70% of the data for training to get good results. In this case, I would recommend 80%, because it leaves enough data to train the model and enough data to test it.
So, let’s create a “split” variable that captures 80% of the dataset.
For now, this variable just holds 80% of the total number of rows, as an integer. We will use it later to determine the last row of the training set.
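As a minimal sketch of this step, assuming the cleaned data lives in a pandas data frame called `df` (the toy data and the name `split_percentage` are assumptions for illustration, not the original code):

```python
import pandas as pd

# Hypothetical stand-in for the cleaned data frame built in the previous steps.
df = pd.DataFrame({"Close": [100.0 + i for i in range(10)]})

# Share of rows kept for training (assumed name, 80% as recommended above).
split_percentage = 0.8

# Number of training rows as an integer; also the position of the first test row.
split = int(split_percentage * len(df))

print(split)
```

With 10 rows, `split` is 8: rows 0 to 7 will be used for training and rows 8 and 9 for testing.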
Now it is time to define our output signal (also called target).
Define output signal
The output signal will be based on the percentage return and split into three categories:
- Bear period: Output signal = -1
- Range period: Output signal = 0
- Bull period: Output signal = 1
So let’s execute our lines of code:
And here we go: our output value is now assigned to a newly created column called `Signal`.
We will end this first step by creating the features and the target values.
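One possible way to build the three categories is sketched below. The thresholds are an assumption: here we take the 34% and 66% quantiles of the returns in the training portion, so the cut-offs never look at test data. The sample returns are made up, and your own thresholds may differ.

```python
import pandas as pd

# Hypothetical percentage returns; in this series they live in the `Ret` column.
df = pd.DataFrame(
    {"Ret": [-0.03, -0.01, 0.0, 0.002, 0.01, 0.02, -0.02, 0.03, 0.001, -0.004]}
)

split = int(0.8 * len(df))

# Assumed thresholds: quantiles computed on the training rows only.
train_ret = df["Ret"].iloc[:split]
lower = train_ret.quantile(0.34)
upper = train_ret.quantile(0.66)

df["Signal"] = 0                          # range period by default
df.loc[df["Ret"] > upper, "Signal"] = 1   # bull period
df.loc[df["Ret"] < lower, "Signal"] = -1  # bear period

print(df)
```

Every row whose return falls between the two thresholds keeps the default value 0.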
Drop unnecessary variables
We will drop the columns `Close`, `Signal`, `Time`, `High`, `Low`, `Volume`, and `Ret` from the feature set, since the algorithm will not be trained on them. Next, we assign `Signal` to `y`, the output variable the model will predict on the test data.
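A minimal sketch of this step follows. The feature columns `SMA_10` and `RSI` are hypothetical placeholders for the calculated fields, and all values are made up. Note that `drop` returns a new data frame here, so `Signal` is still available in `df` when we assign it to `y`.

```python
import pandas as pd

# Toy data frame with the columns named above plus two hypothetical features.
df = pd.DataFrame(
    {
        "Time": pd.date_range("2021-01-01", periods=5, freq="D"),
        "Close": [100.0, 101.0, 100.5, 102.0, 103.0],
        "High": [101.0, 102.0, 101.5, 103.0, 104.0],
        "Low": [99.0, 100.0, 99.5, 101.0, 102.0],
        "Volume": [1000, 1200, 900, 1500, 1100],
        "Ret": [0.0, 0.01, -0.005, 0.015, 0.01],
        "Signal": [0, 1, -1, 1, 1],
        "SMA_10": [100.0, 100.2, 100.3, 100.6, 101.0],  # hypothetical feature
        "RSI": [50.0, 55.0, 48.0, 60.0, 62.0],          # hypothetical feature
    }
)

# Features: everything except the columns the model should not be trained on.
X = df.drop(columns=["Close", "Signal", "Time", "High", "Low", "Volume", "Ret"])

# Target: the output signal (-1, 0 or 1).
y = df["Signal"]

print(X.columns.tolist())
```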
At this point, we have split our dataset into two distinct parts: a training set, which contains the trend of the share and the final output, and a test set, on which we will evaluate our model.
The dataset is now ready to be used to train the model behind our live trading.
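The two parts can be obtained by slicing the features and the target at the split row, as in the sketch below. The names `X_train`, `X_test`, `y_train`, and `y_test` are assumptions, and the data is made up. The slice is chronological rather than shuffled, so the test rows always come after the training rows.

```python
import pandas as pd

# Hypothetical features and target with 10 rows, ordered by time.
X = pd.DataFrame(
    {
        "SMA_10": [100.0 + 0.1 * i for i in range(10)],
        "RSI": [50.0 + i for i in range(10)],
    }
)
y = pd.Series([0, 1, -1, 1, 1, 0, -1, 0, 1, -1], name="Signal")

split = int(0.8 * len(X))

# First 80% of the rows for training, remaining 20% for testing.
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

print(X_train.shape, X_test.shape)
```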
Now that we have our training set, it is time to find our best parameters. In the next part, we are going to step up again in terms of algorithm deployment.
How to create a machine learning model from A to Z will be discussed in the third and last part, which will be published on Wednesday, 22nd September 2021. So, remember to subscribe on Medium or YouTube to get the update.
I love you all. I hope you like this second part; the third part will be even more exciting.
In the following article …
In the third part, we will develop our machine learning model and run our AI robot for trading!
Below, you can get a sample of the final results:
You can access the full video tutorial below, with further explanations if you want to develop it at home: