PPO-Algorithm

I implemented three versions of the PPO algorithm as proposed in John Schulman et al., 'Proximal Policy Optimization Algorithms' (https://arxiv.org/abs/1707.06347):

  • PPO without clipping or penalty (shown in red in the plots below)
  • PPO with clipped objective (orange)
  • PPO with adaptive Kullback-Leibler penalty (blue)
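For reference, a minimal sketch of the three surrogate objectives from the paper is shown below. The function and variable names (`ppo_objectives`, `adapt_beta`, `ratio`, `adv`, `kl`) are my own for illustration and need not match the code in this repository; the sketch assumes the probability ratio, advantage estimates, and per-state KL divergences are already available as PyTorch tensors.

```python
import torch

def ppo_objectives(ratio, adv, kl, eps=0.2, beta=1.0):
    """Surrogate objectives (to be maximized) for the three PPO variants.

    ratio: pi_new(a|s) / pi_old(a|s)
    adv:   advantage estimates
    kl:    KL(pi_old || pi_new) per state
    eps:   clip range, beta: KL penalty coefficient
    """
    # 1) Plain surrogate, no clipping or penalty
    vanilla = (ratio * adv).mean()

    # 2) Clipped surrogate: pessimistic (element-wise) minimum of the
    #    unclipped and clipped terms
    clipped = torch.min(ratio * adv,
                        torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()

    # 3) KL-penalized surrogate; beta is adapted between policy updates
    kl_penalized = (ratio * adv).mean() - beta * kl.mean()

    return vanilla, clipped, kl_penalized


def adapt_beta(beta, kl_mean, kl_target=0.01):
    """Adaptive KL coefficient update as described by Schulman et al. (2017)."""
    if kl_mean < kl_target / 1.5:
        beta /= 2.0
    elif kl_mean > kl_target * 1.5:
        beta *= 2.0
    return beta
```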

We test these three versions on the 'CartPole-v1' environment.
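For context, CartPole-v1 can be created with OpenAI Gym as sketched below; the exact setup in this repository may differ, and newer Gym/Gymnasium versions use slightly different `reset`/`step` signatures.

```python
import gym

# CartPole-v1: balance a pole on a cart; the episode reward is capped at 500.
env = gym.make("CartPole-v1")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, done, info = env.step(action)
    total_reward += reward
```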

We see that PPO with the adaptive KL penalty outperforms the other two algorithms in this example. However, the second plot shows that this algorithm also takes the longest to run, although it still performs best when the reward is measured relative to training time.
PPO with the adaptive KL penalty also outperforms the other variants during testing.

Note that the first two plots are smoothed.

Reward per episode:

[plot]

Reward relative to training time:

[plot]

Reward per test episode:

[plot]
