
AlphaZero in Connect 4

An asynchronous implementation of the AlphaZero algorithm based on the AlphaZero paper.

AlphaZero is an algorithm that trains a reinforcement learning agent through self-play. The training examples are game states; the 'ground truth' labels are the value of each state (the expected game outcome) and its policy (a probability distribution over actions).
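Concretely, each training example pairs a board state with the MCTS visit distribution π and the eventual outcome z, and the network is trained on a combined value and policy loss. The snippet below is a minimal TensorFlow 2 illustration of that loss, not this repo's exact code:

```python
import tensorflow as tf

def alphazero_loss(z, v, pi, p_logits):
    """Combined AlphaZero loss: value MSE + policy cross-entropy.

    z        -- game outcomes in [-1, 1], shape (batch,)
    v        -- value head predictions, shape (batch,)
    pi       -- MCTS visit distributions over the 7 columns, shape (batch, 7)
    p_logits -- policy head logits, shape (batch, 7)
    """
    value_loss = tf.reduce_mean(tf.square(z - v))
    policy_loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=pi, logits=p_logits))
    # The AlphaZero paper also adds an L2 weight-regularization term,
    # typically handled via layer kwargs or the optimizer.
    return value_loss + policy_loss
```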

AlphaZero uses a modified version of Monte Carlo Tree Search (MCTS): rather than performing a rollout when the search reaches a leaf node, it uses the trained network to predict the value of that state.
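A rough sketch of that leaf step is below. The `Node` class and the `net.predict` / `state.is_legal` names are illustrative stand-ins, not this repo's API: selection follows the PUCT rule, and a single network call replaces the classic random rollout.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the network's policy head
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a)
        self.children = {}      # action -> Node

def puct_score(parent, child, c_puct=1.5):
    # Q(s, a) + U(s, a): exploitation plus prior-weighted exploration.
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return q + u

def expand_and_evaluate(node, state, net):
    # One network call replaces the random rollout of classic MCTS.
    policy, value = net.predict(state)   # priors over columns, scalar value
    for action, prior in enumerate(policy):
        if state.is_legal(action):
            node.children[action] = Node(prior)
    return value                         # backed up along the search path
```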

Training

Training was done with an asynchronous, multiprocessing approach, demonstrated here.
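The rough shape of such a setup, assuming a queue-based design (illustrative only; `play_one_game` is a dummy stand-in for full MCTS self-play, and in practice the trainer would sample batches and fit the network):

```python
import multiprocessing as mp
import random

def play_one_game():
    """Stand-in for an MCTS self-play game; returns dummy (state, pi, z) tuples."""
    return [("state", "pi", random.choice([-1, 0, 1])) for _ in range(8)]

def self_play_worker(example_queue):
    for _ in range(10):                  # in practice: loop indefinitely
        for ex in play_one_game():
            example_queue.put(ex)
    example_queue.put(None)              # signal this worker is finished

def trainer(example_queue, n_workers):
    memory, done = [], 0
    while done < n_workers:
        ex = example_queue.get()
        if ex is None:
            done += 1
            continue
        memory.append(ex)                # in practice: train on sampled batches
    print(f"collected {len(memory)} examples")

if __name__ == "__main__":
    q = mp.Queue()
    workers = [mp.Process(target=self_play_worker, args=(q,)) for _ in range(4)]
    for w in workers:
        w.start()
    trainer(q, n_workers=len(workers))
    for w in workers:
        w.join()
```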

The agent was trained for one week. It learned to consistently defeat the one-step-look-ahead agent very early in training (at around 3,000 epochs).

I then tested the agent against myself. While it was difficult to beat, it was not unbeatable; since Connect 4 is a solved game, this agent should theoretically be able to converge to an optimal policy. I then increased the memory buffer size and resumed training. Future updates will be reported.

Codebase

The AlphaZero folder contains all of the backend code for this implementation.

The training configuration, the ResNet (built with TensorFlow 2), the memory object, and the game object can be found here.
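For orientation, an AlphaZero-style ResNet for Connect 4 generally looks like the sketch below: a convolutional trunk of residual blocks feeding separate policy and value heads. The layer sizes and block count here are assumptions, not this repo's actual configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # Two 3x3 convolutions with batch norm and a skip connection.
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(layers.Add()([x, y]))

def build_network(n_blocks=4):
    board = layers.Input(shape=(6, 7, 2))   # one plane per player's pieces
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(board)
    for _ in range(n_blocks):
        x = residual_block(x)
    flat = layers.Flatten()(x)
    policy = layers.Dense(7, activation="softmax", name="policy")(flat)  # one move per column
    value = layers.Dense(1, activation="tanh", name="value")(flat)       # outcome in [-1, 1]
    return tf.keras.Model(board, [policy, value])
```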

MCTS-related functions can be found here.

The Pit object for evaluating the agent against a one-step-look-ahead agent can be found here.
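For context, a one-step-look-ahead baseline fits in a few lines. The sketch below is an assumption about how such a baseline behaves (take an immediate win, block an immediate loss, otherwise play randomly), not the repo's Pit code:

```python
import random

ROWS, COLS = 6, 7   # board is a 6x7 grid of 0 (empty), 1, or -1

def legal_moves(board):
    return [c for c in range(COLS) if board[0][c] == 0]

def drop(board, col, player):
    # Return a copy of the board with `player`'s piece dropped into `col`.
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):
        if new[r][col] == 0:
            new[r][col] = player
            break
    return new

def is_win(board, p):
    # Check every length-4 line (horizontal, vertical, both diagonals).
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS and board[rr][cc] == p
                       for rr, cc in cells):
                    return True
    return False

def one_step_lookahead(board, player):
    moves = legal_moves(board)
    for m in moves:                       # 1. take an immediate win
        if is_win(drop(board, m, player), player):
            return m
    for m in moves:                       # 2. block the opponent's immediate win
        if is_win(drop(board, m, -player), -player):
            return m
    return random.choice(moves)           # 3. otherwise play randomly

# Example: choose a move for player 1 on an empty board.
empty = [[0] * COLS for _ in range(ROWS)]
print(one_step_lookahead(empty, 1))
```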

