
[Question] Performance on Hopper-v3 does not converge with PPO #376

cx441000319 opened this issue Apr 30, 2023 · 5 comments
❓ Question

Hi there,

I ran into such an issue when I trained an agent using PPO in Hopper-v3. Here is the performance for 5 seeds by running: python3 scripts/all_plots.py -a ppo --env Hopper-v3 -f logs/downloaded

[plot: episode reward over training for the 5 PPO seeds on Hopper-v3]

The training commands are of the form: python train.py --algo ppo --env Hopper-v3 --seed 500X

The seeds range from 5000 to 5004 and the default hyperparameters are used. The return quickly converges to about 1K, then drops sharply to under 100, then climbs back to 1K, and so on.

I only encountered this issue with Hopper-v3 (A2C suffers from it as well); other environments work fine.

Is there anything I did wrong? Any help is appreciated!


cx441000319 added the question (Further information is requested) label on Apr 30, 2023
araffin added the more information needed (Please fill the issue template completely) label on Apr 30, 2023
cx441000319 (Author) commented:

Sorry, what more information is needed?

qgallouedec (Collaborator) commented Apr 30, 2023

This is a fairly common result with PPO; here is one thread among many that discusses it:

https://www.reddit.com/r/reinforcementlearning/comments/bqh01v/having_trouble_with_ppo_rewards_crashing/

You can try decreasing the clipping parameter and stopping the experiment early.
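
A minimal sketch of what those two suggestions could look like with SB3 directly (outside the zoo), assuming stable-baselines3 2.x and a working Hopper-v3 env; the reward threshold and evaluation frequency below are illustrative, not tuned values:

```python
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnRewardThreshold

# Separate environment used only for periodic evaluation
eval_env = gym.make("Hopper-v3")

# "Early stopping": halt training once the best mean evaluation reward crosses a threshold
stop_on_reward = StopTrainingOnRewardThreshold(reward_threshold=2500, verbose=1)
eval_callback = EvalCallback(eval_env, callback_on_new_best=stop_on_reward, eval_freq=10_000)

# Smaller clip range than the tuned value of 0.2, to limit destructive policy updates
model = PPO("MlpPolicy", "Hopper-v3", clip_range=0.1, verbose=1)
model.learn(total_timesteps=1_000_000, callback=eval_callback)
```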

cx441000319 (Author) commented Apr 30, 2023


Thank you for your suggestions. I will try them.

Based on the experimental results I got, there are two more points worth noting:

  1. PPO works well for Walker2d-v3 and HalfCheetah-v3 (by working well I mean I can reach performance comparable to the benchmark with the default hyperparameters).
  2. A2C also fails on Hopper-v3, as shown below:

[plot: A2C episode reward over training on Hopper-v3]

In that case, can I regard this as an issue specific to Hopper-v3 rather than an issue with PPO?

araffin (Member) commented May 1, 2023

Sorry, what more information is needed?

The hyperparameters used and your system/lib information (os, gym version, mujoco version, sb3 version, ...)

cx441000319 (Author) commented May 5, 2023

Sorry for my late reply.

Hyperparameters:
Hopper-v3:
  normalize: "dict(norm_obs=True, norm_reward=False)"
  n_envs: 1
  policy: 'MlpPolicy'
  n_timesteps: !!float 1e6
  batch_size: 32
  n_steps: 512
  gamma: 0.999
  learning_rate: 9.80828e-05
  ent_coef: 0.00229519
  clip_range: 0.2
  n_epochs: 5
  gae_lambda: 0.99
  max_grad_norm: 0.7
  vf_coef: 0.835671
  policy_kwargs: "dict(
                   log_std_init=-2,
                   ortho_init=False,
                   activation_fn=nn.ReLU,
                   net_arch=dict(pi=[256, 256], vf=[256, 256])
                 )"

os: Ubuntu 20.04 LTS
gym: 0.26.2
mujoco_py: 2.1.2.14
sb3: 2.0.0a5
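
For reference, a rough sketch of how that zoo config maps onto a plain SB3 call, assuming stable-baselines3 2.0.x and a working Hopper-v3 env (the zoo additionally handles evaluation, checkpointing, and saving the VecNormalize statistics, which is omitted here):

```python
import torch.nn as nn
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# normalize: dict(norm_obs=True, norm_reward=False), n_envs: 1
env = VecNormalize(make_vec_env("Hopper-v3", n_envs=1), norm_obs=True, norm_reward=False)

model = PPO(
    "MlpPolicy",
    env,
    batch_size=32,
    n_steps=512,
    gamma=0.999,
    learning_rate=9.80828e-05,
    ent_coef=0.00229519,
    clip_range=0.2,
    n_epochs=5,
    gae_lambda=0.99,
    max_grad_norm=0.7,
    vf_coef=0.835671,
    policy_kwargs=dict(
        log_std_init=-2,
        ortho_init=False,
        activation_fn=nn.ReLU,
        net_arch=dict(pi=[256, 256], vf=[256, 256]),
    ),
    verbose=1,
)
model.learn(total_timesteps=int(1e6))
```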
