UesugiErii/tf2-PPO-atari: PPO implemented in TensorFlow 2 to play Atari games

WARNING

This code does not support environments with continuous action spaces

How to use

train

python3 run.py

test

Move the Pong or Breakout weights to ./logs/weight

Modify index to select the weights you want to use

python3 test_and_show.py

Environment

tensorflow-gpu (2.0.0a0)
numpy (1.16.4)
gym (0.13.1)

Speed

Hardware: GTX 1070 Ti (1607 MHz), Ryzen 7 1700 (OC 3.7 GHz), DDR4 2933 MHz (14-17-17-35)

Network	Env	Learning updates per second	Steps per second (all cores)
CNN	breakout	2.186	2238
RNN	breakout	2.525	2586
CNN	pong	2.519	2579
RNN	pong	2.718	2783

Program structure

environment.py creates the environment the agent acts in. You can also customize the environment, for example by preprocessing the state to speed up convergence.
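As an illustration of the kind of state preprocessing mentioned above, here is a minimal NumPy sketch that shrinks a raw Atari frame before it reaches the network. The function name `preprocess_frame` and the crop bounds are assumptions made for this example (the bounds roughly suit Pong); the repository's own wrappers may do this differently.

```python
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Turn a 210x160x3 RGB Atari frame into an 80x80 grayscale image in [0, 1]."""
    # Crop away the score bar and the bottom border. These row bounds are an
    # assumption that suits Pong; other games may need different values.
    cropped = frame[34:194]              # shape (160, 160, 3)
    # Keep every second pixel in both directions to halve the resolution.
    small = cropped[::2, ::2]            # shape (80, 80, 3)
    # Average the color channels to get a single grayscale channel.
    gray = small.mean(axis=2)            # shape (80, 80)
    # Scale from [0, 255] to [0, 1] so the network sees small input values.
    return (gray / 255.0).astype(np.float32)
```

A smaller, single-channel input reduces the work done per step, which is one way such processing can speed up training.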

agent.py contains the agent, which acts as an intermediary that communicates with the brain.

brain.py collects data from the agents, processes it (for example one-hot encoding the actions and computing adv and realv), and passes it to the model.
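The processing step can be sketched roughly as follows. This example assumes that realv means the discounted return used as the value target and that adv means the advantage (realv minus the critic's prediction); the helper names and the toy rollout are hypothetical and not taken from brain.py.

```python
import numpy as np

def one_hot(actions, num_actions):
    """Encode integer actions of shape (T,) as a (T, num_actions) float matrix."""
    encoded = np.zeros((len(actions), num_actions), dtype=np.float32)
    encoded[np.arange(len(actions)), actions] = 1.0
    return encoded

def discounted_returns(rewards, dones, bootstrap_value, gamma=0.99):
    """Compute R_t = r_t + gamma * R_{t+1}, cutting the sum at episode ends."""
    returns = np.zeros(len(rewards), dtype=np.float32)
    running = bootstrap_value  # value estimate of the state after the last step
    for t in reversed(range(len(rewards))):
        # A finished episode (dones[t] == 1) contributes no future reward.
        running = rewards[t] + gamma * running * (1.0 - dones[t])
        returns[t] = running
    return returns

# Toy rollout of three steps; all numbers are made up for the demo.
actions = np.array([0, 2, 1])
rewards = np.array([0.0, 0.0, 1.0])
dones = np.array([0.0, 0.0, 1.0])
values = np.array([0.5, 0.6, 0.7])  # critic predictions

actions_one_hot = one_hot(actions, num_actions=3)
realv = discounted_returns(rewards, dones, bootstrap_value=0.0, gamma=0.5)
adv = realv - values
```

With gamma set to 0.5 the single reward at the last step is halved at each earlier step, which makes the result easy to check by hand.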

model.py calculates the gradients and records experiment data at the same time.

config.py contains all hyperparameters.

communication.py creates one master and many children, which let the agents exchange data with the brain.

test_and_show.py loads the weights and runs the game, so you can use it to watch the AI play Atari games.

Algorithm

Proximal Policy Optimization Algorithms

Original paper: https://arxiv.org/abs/1707.06347
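For reference, the clipped surrogate objective from the paper can be sketched in NumPy as below. This is a standalone illustration of the formula, not the code in model.py; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are choices made for this example.

```python
import numpy as np

def ppo_clip_loss(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over the batch."""
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    ratio = np.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantage
    # Limit how far the ratio may move away from 1 in either direction.
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Take the more pessimistic of the two terms; negate it because
    # optimizers minimize while the objective is meant to be maximized.
    return -np.mean(np.minimum(unclipped, clipped))
```

When the new and old policies agree, the ratio is 1 and the loss reduces to the negative mean advantage. The clipping only takes effect once the new policy has moved more than `clip_eps` away from the old one in the direction the advantage favors.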

There are some differences between my code and the paper

GAE paper: https://arxiv.org/abs/1506.02438
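Generalized Advantage Estimation from that paper can be sketched as a single backward pass over a rollout. The function name `gae_advantages`, its argument layout, and the default `gamma` and `lam` values are assumptions made for this example and may not match the values in config.py.

```python
import numpy as np

def gae_advantages(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one rollout of length T."""
    advantages = np.zeros(len(rewards), dtype=np.float32)
    next_value = last_value  # value estimate of the state after the last step
    running = 0.0            # discounted sum of the TD errors seen so far
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        # One-step TD error: r_t + gamma * V(s_{t+1}) - V(s_t).
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Accumulate TD errors with decay gamma * lam, resetting at episode ends.
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
        next_value = values[t]
    return advantages
```

Setting `lam=0` gives the one-step TD error, and `lam=1` gives the discounted return minus the value estimate. Adding `values` back to the advantages yields a target for the value function.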

Why do this?

1. It is the initial version of my graduation project and serves as a starting template

2. Baselines is too complicated for me to fully understand

3. Stable Baselines is too tightly integrated to be used for my graduation project

So I wrote this project to be easy to understand

logs dir explanation

dir	env	optimizer	GAE	Network
20190904-021336	breakout	RMSprop	False	CNN
20190905-000303	breakout	Adam	False	CNN
20190913-154820	pong	Adam	False	CNN
20190914-175108	pong	Adam	True	CNN
20190918-071729	pong	Adam	True	RNN

