Building a Chess AI that Learns from Experience

Original Source Here

Building a Chess AI that Learns from Experience

Photo by Rahul Pabolu on Unsplash

Applying machine learning techniques in rule-defined environments like chess has been the stable for mastering strategy games like chess or go. These techniques usually involve using some sort of data to train the model to make the best moves.

As a challenge, I would like to try creating an engine that does not require any data at all. Even though it would slow down the training process of the model, it would solve a critical problem that data-based models face.

A model is only as good as its data. In this context, a chess model trained on data can only play as well as the source of the data. By removing data from the equation, the theoretical ceiling of the algorithm would be removed, which would give the model (theoretically) infinite potential.


How do I propose to do this? This would be done by linking two techniques that I have used before. The model will consist of two parts: An overarching genetic algorithm and a Monte-Carlo search tree for each agent in the genetic algorithm.

Genetic Algorithm:

Photo by National Cancer Institute on Unsplash

The genetic algorithm would consist of a defined number of agents. The agents would play against each other, and the winner would gain a certain number of points.

The top 20% of players would be retained, while the data of the rest of the players would be wiped. A crossover of the weights between the top 20% of players would happen, resulting in a fresh set of players. This should hopefully result in better and better players.

The reason why a genetic algorithm is used for this project is twofold: the genetic algorithm can be optimized without data and also has a very flexible fitness function. This fitness function can be changed to fine-tune the results of the model and it can also involve abstract non-numerical operations.

Monte-Carlo Search Tree:

Photo by Richard Loader on Unsplash

The Monte-Carlo search tree is used to make each move: The engine randomly plays a set of moves from a given position. The final position is then evaluated by a neural network, that returns a single value. This process is repeated a certain number of times. Aggregating all the evaluations for each of the iterations will result in an analysis of the position. This process is repeated for each of the legal moves from the original position.

The move that has the highest evaluation score would then be played. Theoretically, the moves would be optimal, as the evaluation model improves.

There is a tradeoff that is made when using a Monte-Carlo tree, a tradeoff between flexibility and speed. Theoretically, it would be possible to use a single deep neural network that could evaluate the board directly. However, this model would be much more difficult to optimize, as it would need to “learn” many concepts on its own, which is especially slow for genetic algorithms.

The Monte-Carlo search tree is a neural network with a built-in concept of depth, allowing for training to be faster.


The code that I provide are only snippets and are bound to create errors. The full code for this project can be found on this link on my Github.

1. Fitness Function:

This was the fitness function used to evaluate the fitness of each agent.

Initially, I wanted each agent to play every other agent but found that this would just take too long. Some quick calculations made me conclude that with that setup I would need over 400 hours to get decent results.

A key change to the original algorithm is in the evaluation function. Model(input) is 106x faster than model.predict(input) and saves over 9 hours per generation.

By getting each agent to only play one other agent, it would reduce the time taken for each generation (p+1)/2 times (p being the number of players).

For the sake of simplicity, I just multiplied the agent’s current fitness by a certain value to change their fitness. It would be interesting and possibly produce better results if I decided to make an Elo system. However, I decided against it as implementing it would slow down the algorithm.

2. Evaluation Network

The evaluation network is the brain behind every agent. The code on the left features two model architectures:

The complex_eval defines a complex convolutional model, commonly used in the encoding section of a Pix2Pix GAN. However, the complexity makes it difficult to deploy, as it takes too much time to make predictions.

The simple_eval is a basic proof of concept model that takes about 0.004 seconds to make a prediction. The low number of weights would also result in faster convergence.

The input_shape of the models are (8,8,12) as the chess board is 8 by 8, and there are 12 possible pieces (6 pieces for each side)

3. Genetic Algorithm

The brain of the genetic algorithm is the fitness function and the heart is the crossover and mutation function.

The crossover function is easy to understand but a little bit more difficult to implement.

Two random parent agents, from the top 20% of the population are chosen. The genes of these two parent agents are merged to form a child agent. Here is a step-by-step explanation of

  1. Their weights are flattened so that values can be changed.
  2. A random intersection point is found. This point is where the genetic information of one parent ends, and where the genetic information of one parent begins.
  3. The genes of the parents are joined and a new child agent then holds the weight created by this operation.

This hopefully allows for the good traits of good parents to be passed on to their offspring.

The mutation process is quite similar: The weights of a random agent are flattened, and one of their weights is changed to a random value. Mutations in genetic algorithms are implemented so that favorable traits that were not randomly generated in the first population will eventually form.


This project has been really interesting: the cost for no theoretical limit of the algorithm is paid by the time taken to train the algorithm successfully.

The implementation of my idea is by no means perfect, and is just a framework for me to work on in the future.

Here is a list of things that can be done to improve/change it:

1. Change the fitness function:

Even though a Monte-Carlo search tree is the fastest movement, the most realistic search algorithm is finding which move ends up with a line that has the lowest average evaluation for the opponent.

2. Change the evaluation model:

Although it seems easy, it must still be fast enough to allow the execution times to exist in the realms of possibility

3. Use parallelism:

For a strong enough CPU, parallelism (running scripts simultaneously) could speed up the process by a large factor. If this problem is solved, the limitations on the project would be removed, allowing for much better results.


Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot

%d bloggers like this: