benchmark.md agents hyperparameters #38

Closed
Scitator opened this issue Sep 9, 2019 · 7 comments
Labels: question (Further information is requested)

Comments

Scitator commented Sep 9, 2019

Hi,

Thanks for the amazing lib, an open-source RL benchmark is really valuable nowadays.
Nevertheless, I am wondering where I can find the hyperparameters used for the benchmarked agents? Like the network architecture, optimizer parameters and other important RL settings ;)

araffin commented Sep 9, 2019

Hello,

For each trained agent, there is a config.yml file that contains the hyperparameters (any value not specified there means the stable-baselines default was used).

Ex for TD3 on HalfCheetahBulletEnv-v0

Note: this file was not present in early versions of the rl zoo; in that case, you need to look at the yaml files instead.

Ex for A2C on atari games
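
For illustration, here is a minimal sketch of how such a config.yml could be inspected from Python (an assumption-laden example: the path follows the trained_agents/<algo>/<env>/config.yml pattern mentioned later in this thread, and a plain key/value YAML layout is assumed):

```python
import yaml  # PyYAML

# Hypothetical example path, following the trained_agents/<algo>/<env>/ layout.
config_path = "trained_agents/td3/HalfCheetahBulletEnv-v0/config.yml"

with open(config_path) as f:
    # For plain key/value files safe_load is enough; files dumped as an
    # OrderedDict (older zoo versions) may require a non-safe loader instead.
    hyperparams = yaml.safe_load(f)

for name, value in sorted(hyperparams.items()):
    print("{}: {}".format(name, value))
```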

Please note that this is not a proper benchmark, in the sense that the reported values correspond to only one seed. It is more meant to check the (maximal) performance of each algorithm, find potential bugs, and make pretrained agents available to people.

araffin added the question label Sep 9, 2019
Scitator commented Sep 9, 2019

Okay, I see.
Then could you please share the hyperparameters for:
dqn–MsPacmanNoFrameskip-v4
dqn–EnduroNoFrameskip-v4
ddpg-BipedalWalker-v2
sac-BipedalWalker-v2
sac-BipedalWalkerHardcore-v2
?
Currently, I am benchmarking different architectures and would like to reproduce some open-source results (benchmarks from papers are good, but I trust open-source solutions more).

araffin commented Sep 10, 2019

Then could you please share the hyperparameters for:

dqn–MsPacmanNoFrameskip-v4
dqn–EnduroNoFrameskip-v4

Those are present in hyperparams/dqn.yml (the atari key)

ddpg-BipedalWalker-v2
sac-BipedalWalker-v2
sac-BipedalWalkerHardcore-v2

There are config files for each one of those in the corresponding folder.

Note: SACCustomPolicy corresponds to the policy described in the original paper ([256, 256] with ReLU)
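
For context, a policy like that can be declared in stable-baselines along these lines (a sketch, not necessarily the exact SACCustomPolicy definition used in the zoo; the layers, act_fun and feature_extraction keyword names are those of the SAC feed-forward policy):

```python
import tensorflow as tf
from stable_baselines.sac.policies import FeedForwardPolicy


class SACCustomPolicy(FeedForwardPolicy):
    """SAC MLP policy with two hidden layers of 256 units and ReLU activations,
    matching the architecture of the original SAC paper."""

    def __init__(self, *args, **kwargs):
        super(SACCustomPolicy, self).__init__(*args,
                                              layers=[256, 256],
                                              act_fun=tf.nn.relu,
                                              feature_extraction="mlp",
                                              **kwargs)
```

In the zoo, such a custom policy is registered under a string name so that the yaml hyperparameter files can refer to it.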

Scitator commented Sep 10, 2019

So, could you please confirm, that I got all hyperparameters right?

atari-dqn (MsPacmanNoFrameskip and EnduroNoFrameskip)

- nature cnn extractor ([32, 64, 64], relu)
- Adam optimizer with 1e-4 learning rate
- initial buffer size - 10k observations
- buffer size - 10k observations
- batch size - 32 observations
- hard target net update every 1k batches
- exploration: e-greedy annealed from 1.0 to 0.01 over the first 10% of the total number of steps in the environment

ddpg (BipedalWalker-v2 and BipedalWalkerHardcore-v2)

- mlp with [64, 64] hiddens and relu
- Adam optimizer with 1e-4 learning rate for actor and 1e-4 for critic
- initial buffer size - ?
- buffer size - 10k observations
- batch size - 256 observations
- soft target update each batch with tau=0.001
- exploration: adaptive parameter noise with target std=0.287

sac (BipedalWalker-v2 and BipedalWalkerHardcore-v2)

- mlp with [256, 256] hiddens and relu
- Adam optimizer with 3e-4 learning rate for both actor and critic
- initial buffer size - 1000 observations
- buffer size - 10k observations
- batch size - 64 observations
- soft target update each batch with tau=0.005
- exploration: ?

overall

- for benchmarking purposes, 150k steps were taken in the environment
(with the frame skip of 4 that is used, this means 600k environment steps)
- all benchmarks were done with n-step=1 q-learning
- and with a single-thread run

Thanks!

araffin commented Sep 10, 2019

for benchmarking purposes 150k steps were taken in the environment

The benchmark is done only at the end of training. The number of training timesteps is also in the config file; for Atari, it is the standard 10M steps (so 40M steps in the real env because of the frame skip); for the others, check the config files.

  • all benchmarks were done with n-step=1 q-learning

yes

  • and with single thread run

yes

atari-dqn (MsPacmanNoFrameskip and EnduroNoFrameskip)

Looks good; note that this is a prioritized double dueling DQN.
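
To make the mapping explicit, the setup described above would translate roughly into the following call (a sketch assuming stable-baselines' DQN keyword names and the standard DeepMind Atari wrappers; the zoo itself builds the environment differently and reads these values from hyperparams/dqn.yml):

```python
from stable_baselines import DQN
from stable_baselines.common.atari_wrappers import make_atari, wrap_deepmind

# Hypothetical reconstruction of the Atari DQN settings discussed above.
env = wrap_deepmind(make_atari("MsPacmanNoFrameskip-v4"), frame_stack=True)

model = DQN(
    "CnnPolicy",                      # nature CNN extractor, dueling enabled by default
    env,
    learning_rate=1e-4,
    buffer_size=10000,
    learning_starts=10000,            # fill the buffer before learning starts
    batch_size=32,
    target_network_update_freq=1000,  # hard target network update
    exploration_fraction=0.1,         # epsilon annealed from 1.0 ...
    exploration_final_eps=0.01,       # ... down to 0.01 over 10% of training
    double_q=True,
    prioritized_replay=True,
    verbose=1,
)
model.learn(total_timesteps=int(10e6))  # 10M agent steps (~40M frames with frame skip 4)
```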

ddpg (BipedalWalker-v2 and BipedalWalkerHardcore-v2)

Those do not look like the ones found in https://github.com/araffin/rl-baselines-zoo/blob/master/trained_agents/ddpg/BipedalWalker-v2/config.yml

Yes, you will need several seeds to get a good one with DDPG. Also, I did not manage to make it work with the Hardcore version yet.

sac (BipedalWalker-v2 and BipedalWalkerHardcore-v2)

For SAC, the learning rate was linearly annealed (it helps to avoid a catastrophic drop in performance).
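
That kind of annealing can be written as a small helper returning a callable of the remaining training progress, roughly as below (a sketch of the idea, not necessarily the exact helper used in the zoo; stable-baselines accepts a callable learning rate evaluated with the fraction of training remaining, going from 1 to 0):

```python
def linear_schedule(initial_value):
    """Return a schedule that decays linearly from initial_value to 0."""
    def schedule(progress_remaining):
        # progress_remaining goes from 1 (start of training) to 0 (end of training)
        return progress_remaining * initial_value
    return schedule

# e.g. a learning rate annealed from 3e-4 to 0 over the course of training:
# model = SAC("MlpPolicy", env, learning_rate=linear_schedule(3e-4))
```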

  • buffer size - 10k observations

it is 10e6 in the config file...
https://github.com/araffin/rl-baselines-zoo/blob/master/trained_agents/sac/BipedalWalker-v2/config.yml

  • exploration: ?

It is done by SAC automatically using the stochastic policy.
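
In other words, no external noise process is configured: exploration comes from sampling the stochastic policy itself, and at evaluation time one typically asks for the deterministic action instead. A short sketch of that distinction using the stable-baselines predict API (environment id and timestep budget are placeholders):

```python
from stable_baselines import SAC

# Exploration during learn() comes from sampling the stochastic (squashed Gaussian) policy.
model = SAC("MlpPolicy", "BipedalWalker-v2", verbose=1)
model.learn(total_timesteps=1000)

obs = model.env.reset()
stochastic_action, _ = model.predict(obs, deterministic=False)   # exploratory action
deterministic_action, _ = model.predict(obs, deterministic=True)  # greedy action for evaluation
```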

Scitator commented Sep 10, 2019

Thanks for the reply, now it looks much more realistic :).
Nevertheless, what does n_timesteps in the benchmark.md table mean?
The number of steps during evaluation?

araffin commented Sep 10, 2019

number of steps during evaluation?

Yes. I could fix either the number of episodes or the number of steps; I chose the latter.
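
In other words, the evaluation runs the trained agent for a fixed number of environment steps and averages over the completed episodes, roughly like this (a sketch, not the zoo's actual evaluation script; a single non-vectorized env is assumed, and the default budget mirrors the ~150k steps mentioned above):

```python
import numpy as np

def evaluate(model, env, n_timesteps=150000):
    """Run a trained model for a fixed number of steps and report episode reward stats."""
    episode_rewards, current_reward = [], 0.0
    obs = env.reset()
    for _ in range(n_timesteps):
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, done, _info = env.step(action)
        current_reward += reward
        if done:
            episode_rewards.append(current_reward)
            current_reward = 0.0
            obs = env.reset()
    return np.mean(episode_rewards), np.std(episode_rewards), len(episode_rewards)
```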
