tensorflow-policy-gradient

Still under construction...

Dependencies

Python 2.7
TensorFlow >= 0.8.0
NumPy >= 1.10.0
OpenAI Gym
matplotlib

Quick try

Run

python gym_experiment.py

to train a softmax policy (without a bias term) using the vanilla policy gradient on the CartPole task. The return per episode rises noisily, not monotonically, until it reaches the maximum of 200.
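For readers who want to see the idea before running the script, here is a minimal NumPy sketch of a bias-free linear softmax policy and a vanilla (REINFORCE-style) policy gradient estimate. It is not the code in gym_experiment.py, which uses TensorFlow. The function names, the weight shape (n_features, n_actions), and the discount factor gamma are assumptions made for illustration.

```python
import numpy as np

# Illustrative sketch only: names, shapes and gamma are assumptions,
# not taken from gym_experiment.py.

def softmax_policy(theta, state):
    """Action probabilities for a linear softmax policy with no bias term."""
    logits = state @ theta            # shape: (n_actions,)
    logits = logits - logits.max()    # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1} for each step of an episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def policy_gradient(theta, states, actions, rewards, gamma=0.99):
    """Single-episode estimate: sum_t G_t * grad_theta log pi(a_t | s_t)."""
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, discounted_returns(rewards, gamma)):
        probs = softmax_policy(theta, s)
        one_hot = np.zeros_like(probs)
        one_hot[a] = 1.0
        # For a linear softmax, grad log pi(a|s) = outer(s, one_hot(a) - probs).
        grad += g * np.outer(s, one_hot - probs)
    return grad
```

A training loop would collect one episode of states, actions and rewards, then take a gradient ascent step such as theta += learning_rate * policy_gradient(theta, states, actions, rewards). For CartPole, theta would have shape (4, 2), since the task has four observation features and two actions.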
