PPO-Algorithm

I implemented three versions of the PPO algorithm as proposed in John Schulman et al., 'Proximal Policy Optimization Algorithms' (https://arxiv.org/abs/1707.06347):

  • PPO without clipping or penalty (shown in red in the plots below)
  • PPO with clipped objective (orange)
  • PPO with adaptive Kullback-Leibler penalty (blue)
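For reference, a minimal sketch of the three surrogate objectives from the paper is shown below. The function and variable names (`ppo_objectives`, `adapt_beta`, `ratio`, `adv`, `kl`) are my own for illustration and need not match the code in this repository; the sketch assumes the probability ratio, advantage estimates, and per-state KL divergences are already available as PyTorch tensors.

```python
import torch

def ppo_objectives(ratio, adv, kl, eps=0.2, beta=1.0):
    """Surrogate objectives (to be maximized) for the three PPO variants.

    ratio: pi_new(a|s) / pi_old(a|s)
    adv:   advantage estimates
    kl:    KL(pi_old || pi_new) per state
    eps:   clip range, beta: KL penalty coefficient
    """
    # 1) Plain surrogate, no clipping or penalty
    vanilla = (ratio * adv).mean()

    # 2) Clipped surrogate: pessimistic (element-wise) minimum of the
    #    unclipped and clipped terms
    clipped = torch.min(ratio * adv,
                        torch.clamp(ratio, 1 - eps, 1 + eps) * adv).mean()

    # 3) KL-penalized surrogate; beta is adapted between policy updates
    kl_penalized = (ratio * adv).mean() - beta * kl.mean()

    return vanilla, clipped, kl_penalized


def adapt_beta(beta, kl_mean, kl_target=0.01):
    """Adaptive KL coefficient update as described by Schulman et al. (2017)."""
    if kl_mean < kl_target / 1.5:
        beta /= 2.0
    elif kl_mean > kl_target * 1.5:
        beta *= 2.0
    return beta
```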

We test these three versions on the 'CartPole-v1' environment.
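For context, CartPole-v1 can be created with OpenAI Gym as sketched below; the exact setup in this repository may differ, and newer Gym/Gymnasium versions use slightly different `reset`/`step` signatures.

```python
import gym

# CartPole-v1: balance a pole on a cart; the episode reward is capped at 500.
env = gym.make("CartPole-v1")
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, done, info = env.step(action)
    total_reward += reward
```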

We see that PPO with the adaptive KL penalty outperforms the other two algorithms in this example. However, the second plot shows that this algorithm also takes the longest to run, although it still performs best when the reward is measured relative to training time.
PPO with the adaptive KL penalty also outperforms the other variants during testing.

Note that the first two plots are smoothed.

Reward per episode:

[plot]

Reward relative to training time:

[plot]

Reward per test episode:

[plot]
