AI-Snake Competition



1. Presentation

1.1. Problem definition

1.1.1. Origin

The game originates from the classic Nokia mobile phone game Snake.

1.1.2. Rules and Objective

  • Maximise Score / Max_Score
    • Eat as many apples as possible
    • Until the game ends or is won (max apples = rows * cols - snake.init_len)
  • Minimise Steps / Score
    • Reach that score in the minimum number of time-steps
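
The win condition follows directly from the board geometry. A minimal sketch, assuming hypothetical `rows`, `cols`, and `init_len` parameters (names are illustrative, not the repository's API):

```python
def max_score(rows: int, cols: int, init_len: int) -> int:
    """Maximum number of apples: each apple adds one body part,
    so the score is capped by the number of initially free cells."""
    return rows * cols - init_len

def is_win(score: int, rows: int, cols: int, init_len: int) -> bool:
    """The game is won when the snake fills the whole board."""
    return score >= max_score(rows, cols, init_len)
```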

1.1.3. The problem as a graph

The game can be framed as a shortest-path problem in a time-dependent graph: the snake's body moves, so the set of blocked cells changes at every time-step.

Partly stochastic: the next apple's location is unknown and random.

1.1.4. The problem as a Markov Decision Process (MDP)

State space: all combinations of snake body configuration and apple position.
Action space: the four move directions (up, down, left, right).
Transitions: Deterministic (vs. probabilistic): P(s, a, Succ(s, a)) = 1
Rewards:
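
A minimal sketch of this MDP view, with illustrative names (`State`, `succ`) that are not the repository's API; the random apple respawn and collision handling are omitted for brevity:

```python
from typing import NamedTuple

class State(NamedTuple):
    snake: tuple   # body cells, head first, e.g. ((r, c), ...)
    apple: tuple   # apple cell (r, c)

ACTIONS = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}

def succ(state: State, action: str) -> State:
    """Deterministic successor: P(s, a, Succ(s, a)) = 1.
    (Game-over handling and the random apple respawn are omitted.)"""
    dr, dc = ACTIONS[action]
    head = (state.snake[0][0] + dr, state.snake[0][1] + dc)
    grew = head == state.apple                      # eating an apple grows the snake
    body = (head,) + (state.snake if grew else state.snake[:-1])
    return State(snake=body, apple=state.apple)
```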

1.2. The Snake Competition

1.2.1. _

todo

1.2.2. Game variations

The game can be modified for the competition:

Parameter                | Nokia Snake      | Competition
-------------------------|------------------|-------------------
Board size               | _ rows x 15 cols | 30 rows x 30 cols
Snake starting length    | 3 body parts     | 1 body part
Snake starting position  | Top center       | Random
Snake starting direction | Top-down         | n/a
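
These variations could be captured in a single configuration object. A hypothetical sketch (field names are illustrative, not the repository's code):

```python
from dataclasses import dataclass

@dataclass
class GameConfig:
    rows: int
    cols: int
    snake_init_len: int
    snake_start: str   # "top-center" or "random"

# Competition settings from the table above
# (the original Nokia row count is left unspecified there):
COMPETITION = GameConfig(rows=30, cols=30, snake_init_len=1, snake_start="random")
```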

1.2.3. Performance indicators

  1. The Game Completion Score (GCS)
    • Average score over the maximum score (Score = number of apples eaten)
    • Objective: Maximise (up to 100%)
  2. The Game Over Rate (GOR)
    • Number of game-overs / total games played
    • Objective: Minimise
  3. The Performance Rate (PR)
    • Average (number of steps / score)
    • Objective: Minimise
    • Maximum: n^2 (worst case, on the order of n^2 steps per apple on an n x n board)
  4. The Performance Rate at 0% and at 1% (PR0 and PR1)
    • PR0: the Performance Rate with a 0% Game Over Rate (all games won)
    • PR1: the Performance Rate with a Game Over Rate < 1%
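
A sketch of how these indicators could be computed, assuming each episode is recorded as a (score, steps, game_over) tuple; the record format and names are assumptions, not the repository's code:

```python
def indicators(episodes, max_score):
    """episodes: list of (score, steps, game_over) tuples."""
    n = len(episodes)
    gcs = sum(score for score, _, _ in episodes) / (n * max_score)  # avg score / max score
    gor = sum(1 for _, _, over in episodes if over) / n             # game-overs / games played
    scored = [(score, steps) for score, steps, _ in episodes if score > 0]
    pr = sum(steps / score for score, steps in scored) / len(scored)  # avg steps per apple
    return gcs, gor, pr
```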

2. Possible AI approaches

  • A. Operational Research
    • A1. Search
      Pathfinding algorithms, such as Dijkstra and A* (see the BFS sketch after this list)
      To research: Pathfinding in time-dependent graphs
    • A2. Optimization
      Method: Linear Programming
    • A3. Genetic algorithm
      NEAT (NeuroEvolution of Augmenting Topologies) evolving neural networks
  • B. Machine Learning
    • B1. Supervised Learning / Deep Learning
      Methods: Neural Networks: Deep (DNN), Convolutional (CNN)
    • B2. Reinforcement Learning (RL)
      Q-value iteration
      Deep Q-Network (DQN)
      Further: Advantage Actor-Critic (A2C), Proximal Policy Optimization (PPO), Monte Carlo Tree Search
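
For the A1 search approach, a minimal BFS sketch: shortest path from the snake's head to the apple, treating body cells as walls. This illustrates the idea rather than the repository's implementation, and it ignores that the tail vacates cells over time (the time-dependence noted in 1.1.3):

```python
from collections import deque

def bfs_path(head, apple, body, rows, cols):
    """Shortest path from head to apple; body is a set of blocked cells.
    Returns the list of cells to step through, or None if unreachable."""
    frontier = deque([head])
    came_from = {head: None}
    while frontier:
        cell = frontier.popleft()
        if cell == apple:                      # reconstruct the path backwards
            path = []
            while cell != head:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and nxt not in body and nxt not in came_from):
                came_from[nxt] = cell
                frontier.append(nxt)
    return None
```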

3. Current Development

Game features:

  • General
    • Basic game logic
    • Human agent (Keyboard)
    • Create GUI using pygame
    • Refactor code using OOP
    • Separate Engine and Agent logic
    • Refactor Session/Engine class to the OpenAI Gym Env() interface (see the sketch after this list)
    • Refactor Agent
  • History
    • Record Episodes
  • Replay and Tests
    • Save env states and solutions
    • [ ]
  • Benchmarking
    • [ ]
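
A skeleton of what the Gym-style Env() mentioned above could look like, using the classic gym API (reset() returns an observation; step() returns observation, reward, done, info). The grid encoding and reward values are placeholders, not the repository's actual code:

```python
import random
import numpy as np
import gym
from gym import spaces

class SnakeEnv(gym.Env):
    MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, rows=30, cols=30):
        self.rows, self.cols = rows, cols
        self.action_space = spaces.Discrete(4)
        self.observation_space = spaces.Box(low=0, high=2, shape=(rows, cols), dtype=np.uint8)

    def reset(self):
        self.snake = [(random.randrange(self.rows), random.randrange(self.cols))]
        self._place_apple()
        return self._obs()

    def step(self, action):
        dr, dc = self.MOVES[action]
        head = (self.snake[0][0] + dr, self.snake[0][1] + dc)
        if not (0 <= head[0] < self.rows and 0 <= head[1] < self.cols) or head in self.snake:
            return self._obs(), -1.0, True, {}      # wall or self collision: game over
        self.snake.insert(0, head)
        if head == self.apple:
            self._place_apple()                     # grow and respawn the apple
            return self._obs(), 1.0, False, {}
        self.snake.pop()                            # normal move: tail follows
        return self._obs(), 0.0, False, {}

    def _place_apple(self):
        free = [(r, c) for r in range(self.rows) for c in range(self.cols)
                if (r, c) not in self.snake]
        self.apple = random.choice(free)

    def _obs(self):
        grid = np.zeros((self.rows, self.cols), dtype=np.uint8)
        for r, c in self.snake:
            grid[r, c] = 1                          # body cells
        grid[self.apple] = 2                        # apple cell
        return grid
```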

A1. Pathfinding Algorithms:

  • Add Greedy algorithm / agent (see the sketch after this list)
  • Add A* algo / agent
  • Add Modes: Play and Benchmark
  • Logging history
  • Create snapshot of game configuration/situation/..
  • Replay of snapshots
  • Rewind steps (manual)
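
A minimal sketch of the greedy agent idea: pick the legal move that most reduces the Manhattan distance to the apple. Names and signature are illustrative, not the repository's code:

```python
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}

def greedy_move(head, apple, body, rows, cols):
    """Return the move that most reduces Manhattan distance to the apple,
    among moves that stay on the board and avoid the body; None if boxed in."""
    def legal(cell):
        r, c = cell
        return 0 <= r < rows and 0 <= c < cols and cell not in body
    candidates = [
        (abs(apple[0] - (head[0] + dr)) + abs(apple[1] - (head[1] + dc)), name)
        for name, (dr, dc) in MOVES.items()
        if legal((head[0] + dr, head[1] + dc))
    ]
    return min(candidates)[1] if candidates else None
```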

A3. Evolution/Genetic algorithm evolving NNs:

  • [ ]

B1. Supervised Learning / Deep Learning:

  • Generate training data

B2. Reinforcement Learning:

  1. Q-Learning
  • [ ]
  2. Deep Q-Learning
  • [ ]
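
For reference, the standard tabular Q-learning update, sketched with placeholder hyperparameters and an opaque state encoding (this is the textbook algorithm, not code from this repository):

```python
import random
from collections import defaultdict

Q = defaultdict(float)                   # Q[(state, action)] -> estimated value
ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1   # placeholder hyperparameters
ACTIONS = range(4)                       # up, down, left, right

def choose(state):
    """Epsilon-greedy action selection."""
    if random.random() < EPSILON:
        return random.choice(list(ACTIONS))
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, done):
    """One Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    target = reward if done else reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```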

