Q-Learning with a Flappy Bird simulator.
FlappyAI is an AI that uses simple Q-Learning, trained on a custom Flappy Bird simulator. It was created mainly for learning purposes. If you are looking for something more efficient for Q-Learning and do not mind a steep learning curve, try TensorFlow instead.
FlappyAI contains these main parts:
- FlappyGame - Flappy Bird simulator
- Trainer - Q-Learning agent
- ModelInterface - Interface for the model used by the agent
- main.py - Part where everything connects together
FlappyAI is written in Python. The required libraries include:
- pygame - Required for the graphics
- Standard Python 2 libraries
To use the agent with your own simulator, create a new interface class that inherits from ModelInterface:
from qlearning import Trainer, ModelInterface
class MyInterface(ModelInterface):
    # Implement the methods required by ModelInterface here so that they interact with your simulator.
    pass
agent = Trainer(MyInterface())  # creates a new agent with your interface
agent.train()  # starts the training
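The README does not list which methods ModelInterface expects, so the sketch below only illustrates the subclassing pattern with a self-contained stand-in. `SimulatorInterface`, its methods (`get_state`, `get_actions`, `perform`, `is_over`) and the toy `CounterGame` are all hypothetical; check ModelInterface in the qlearning module for the real methods to override.

```python
class SimulatorInterface(object):
    """Hypothetical stand-in for ModelInterface (method names are illustrative)."""

    def get_state(self):
        """Return the current (discretized) state."""
        raise NotImplementedError

    def get_actions(self):
        """Return the list of actions available to the agent."""
        raise NotImplementedError

    def perform(self, action):
        """Apply an action to the simulator and return the reward earned."""
        raise NotImplementedError

    def is_over(self):
        """Return True when the episode has ended."""
        raise NotImplementedError


class CounterGame(SimulatorInterface):
    """Toy simulator: move a counter from 0 to a target with steps of -1 or +1."""

    def __init__(self, target=3):
        self.position = 0
        self.target = target

    def get_state(self):
        return self.position

    def get_actions(self):
        return [-1, 1]

    def perform(self, action):
        self.position += action
        # Reward reaching the target; small penalty for every other step.
        return 1.0 if self.position == self.target else -0.1

    def is_over(self):
        return self.position == self.target


# Drive the toy simulator through the interface, always stepping towards the target.
game = CounterGame()
total_reward = 0.0
while not game.is_over():
    total_reward += game.perform(1)
print(game.get_state(), total_reward)
```

Because an agent written against such an interface only calls its methods, the same agent code can in principle drive FlappyGame or any other simulator.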
Type
python main.py -h
for help. FlappyAI has three modes:
- interactive - No AI involved; only human input
- train - Start the training process
- test - Let the AI play the game
FlappyAI automatically looks for qtable.p in the current directory and creates a new one if it cannot find it. This file stores the trained Q-table (the agent's KNOWLEDGE!). The Q-table file included in the repository is already trained; if you want to retrain the agent, simply remove or rename the file.
In training mode, press Ctrl+C to stop training and save the Q-table to the file.
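The README does not document the file format. The `.p` extension suggests Python's pickle, so the sketch below assumes the Q-table is a plain dict saved with the standard `pickle` module. It shows the load-or-create behaviour and saving when training is stopped with Ctrl+C, which Python raises as `KeyboardInterrupt`. `load_qtable`, `save_qtable`, `train` and `dummy_step` are hypothetical names, not FlappyAI's own functions.

```python
import os
import pickle
import tempfile


def load_qtable(path):
    """Load the pickled Q-table at `path`, or start a new empty one if it is missing."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def save_qtable(qtable, path):
    """Pickle the Q-table to `path`, overwriting any previous file."""
    with open(path, "wb") as f:
        pickle.dump(qtable, f)


def train(qtable, path, train_step, max_steps=None):
    """Call train_step until Ctrl+C (KeyboardInterrupt) or max_steps, then save."""
    steps = 0
    try:
        while max_steps is None or steps < max_steps:
            train_step(qtable)
            steps += 1
    except KeyboardInterrupt:
        print("Training interrupted; saving Q-table to " + path)
    save_qtable(qtable, path)
    return steps


def dummy_step(qtable):
    """Hypothetical stand-in for one training update: just counts calls."""
    qtable["updates"] = qtable.get("updates", 0) + 1


# Use a temporary directory so a trained qtable.p is never overwritten.
path = os.path.join(tempfile.mkdtemp(), "qtable.p")
qtable = load_qtable(path)            # file does not exist yet, so this is {}
steps = train(qtable, path, dummy_step, max_steps=5)
reloaded = load_qtable(path)          # read back what train() saved
```

Calling `train` without `max_steps` loops until Ctrl+C, and the table is still saved afterwards because the interrupt is caught before `save_qtable` runs.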
Assuming a game size of 640x480, the AI considers the following parameters:
- Horizontal distance between the bird and the nearest pipe, divided by 10 (i.e. 0 to 300/10, and 300+)
- Vertical distance between the bird and the nearest pipe, divided by 10 (-480/10 to 480/10)
- Bird velocity, divided by 5 (-10/5 to 20/5)
Resulting in a total of 43456 Q-table entries (32 × 97 × 7 = 21728 states, each with two actions: flap or do nothing).
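The README does not show how these parameters are computed, so the following is one plausible discretization under stated assumptions: floor division, values clamped to the listed ranges, horizontal distances above 300 px sharing one extra bucket, and two actions (flap or do nothing). `discretize` and `clamp` are hypothetical helpers. The script also counts how many distinct states and (state, action) pairs these buckets produce.

```python
# Hypothetical discretization: the bucket sizes (distances / 10, velocity / 5)
# follow the README, but the rounding and clamping rules are assumptions.


def clamp(value, low, high):
    return max(low, min(high, value))


def discretize(dx, dy, velocity):
    """Map raw pixel distances and velocity to a discrete (h, v, vel) state."""
    if dx > 300:
        h = 31                                   # shared "300+" bucket
    else:
        h = int(max(dx, 0) // 10)                # 0..30
    v = int(clamp(dy, -480, 480) // 10)          # -48..48
    vel = int(clamp(velocity, -10, 20) // 5)     # -2..4
    return (h, v, vel)


# Count the distinct values each component can take (components are independent).
h_values = set(discretize(dx, 0, 0)[0] for dx in range(0, 400))
v_values = set(discretize(0, dy, 0)[1] for dy in range(-480, 481))
vel_values = set(discretize(0, 0, s)[2] for s in range(-10, 21))

ACTIONS = 2  # assumed: flap or do nothing
n_states = len(h_values) * len(v_values) * len(vel_values)
print(n_states, n_states * ACTIONS)  # 21728 states, 43456 (state, action) pairs
```

Floor division keeps negative distances in consistent 10 px buckets (for example, -37 px maps to -4, not -3); truncating with `int(dy / 10)` instead would merge -9..9 px into a single bucket 0 that is nearly twice as wide as the others.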
The 2D space is discretized into 10x10 tiles, so the agent must act at one tenth of the game's frame rate (e.g. when the game runs at 60 fps, the agent runs at 6 fps).
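A minimal sketch of running the agent at a tenth of the game's frame rate: the agent is consulted only on every 10th frame. The README does not say what happens on the frames in between, so the sketch assumes the bird simply does not flap. `AGENT_PERIOD`, `run_frames`, `always_flap` and the action names are hypothetical.

```python
GAME_FPS = 60
AGENT_PERIOD = 10                      # one agent decision every 10 game frames
AGENT_FPS = GAME_FPS // AGENT_PERIOD   # 60 // 10 = 6 decisions per second


def run_frames(n_frames, choose_action):
    """Return the action applied on each of n_frames game frames.

    The agent is asked only on every AGENT_PERIOD-th frame; on the other
    frames the bird does nothing (an assumption).
    """
    applied = []
    for frame in range(n_frames):
        if frame % AGENT_PERIOD == 0:
            applied.append(choose_action(frame))
        else:
            applied.append("noop")
    return applied


decision_frames = []


def always_flap(frame):
    """Toy policy that records when it was consulted and always flaps."""
    decision_frames.append(frame)
    return "flap"


applied = run_frames(GAME_FPS, always_flap)   # one second of game time
print(AGENT_FPS, decision_frames)
```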
Q-Learning is a simple reinforcement learning algorithm that has three parameters:
- α - Learning rate
- γ - Discount factor
- ε - Exploration probability (only in ε-greedy Q-Learning)
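ε only matters when choosing actions. A minimal sketch of ε-greedy selection, assuming the Q-table is a dict keyed by (state, action) with missing entries read as 0; `choose_action` is a hypothetical name, not necessarily how Trainer does it.

```python
import random


def choose_action(qtable, state, actions, epsilon, rng=random):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit.

    Exploring picks a uniformly random action; exploiting picks the action
    with the highest Q-value (ties go to the first such action in `actions`).
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: qtable.get((state, a), 0.0))


state = (12, -4, 1)
actions = ["flap", "noop"]
qtable = {(state, "flap"): 0.5, (state, "noop"): 1.2}

greedy = choose_action(qtable, state, actions, epsilon=0.0)  # always exploits
rng = random.Random(0)
explored = set(choose_action(qtable, state, actions, 1.0, rng) for _ in range(100))
print(greedy, sorted(explored))
```

Passing an explicit `random.Random` instance makes exploration reproducible, which is handy when comparing training runs.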
Definitions:
- Let S be the set of all states and A be the set of all actions.
- We also define Q: S x A -> R, where R is the set of all real numbers.
- Then the algorithm is as follows:
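- Q(s_t, a_t) <- Q(s_t, a_t) + α (r_t + γ max_a Q(s_(t+1), a) - Q(s_t, a_t)), where r_t is the reward received for taking action a_t in state s_t, and max_a takes the maximum over all actions a in A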
- Repeat this update at every step until Q converges (or until the updates become smaller than a chosen tolerance).
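The update rule above translates almost line by line into Python. The sketch assumes a dict-based Q-table with missing entries read as 0; `q_update` and its parameter names are hypothetical, and the α and γ values in the usage are arbitrary.

```python
def q_update(qtable, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one Q-learning update in place and return the new Q(s_t, a_t)."""
    old = qtable.get((state, action), 0.0)
    # max over all actions a of Q(s_(t+1), a)
    best_next = max(qtable.get((next_state, a), 0.0) for a in actions)
    # Move Q(s_t, a_t) a fraction alpha towards the target r_t + gamma * best_next.
    new = old + alpha * (reward + gamma * best_next - old)
    qtable[(state, action)] = new
    return new


actions = ["flap", "noop"]
qtable = {("s1", "flap"): 2.0, ("s1", "noop"): 4.0}

# One update: 0 + 0.5 * (1.0 + 0.9 * 4.0 - 0) = 2.3
first = q_update(qtable, "s0", "flap", 1.0, "s1", actions, alpha=0.5, gamma=0.9)

# Repeating the same transition makes Q("s0", "flap") converge to the target 4.6.
for _ in range(50):
    q_update(qtable, "s0", "flap", 1.0, "s1", actions, alpha=0.5, gamma=0.9)
print(round(first, 6), round(qtable[("s0", "flap")], 6))
```

When the bird crashes there is no meaningful next state; a common convention (not stated in this README) is to use just r_t as the target in that case.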