SB2 vs SB3 - Performance difference #1124
Hello,
There might be a performance drop because PyTorch uses eager evaluation, but probably not that much. Are you sure it is due to
I already use SubprocVecEnv, I edited my post to add the code that setups the env
Good point, might explain at least part of it
I tried setting OMP_NUM_THREADS to lower values, but it doesn't make much of a difference since I use the GPU. It only makes a difference if I force PyTorch to use the CPU, as expected.
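For anyone reproducing this, here is a minimal sketch of the two common ways to cap PyTorch's CPU threads (the exact values are illustrative, not necessarily what was used here):

```python
# OMP_NUM_THREADS only takes effect if set before torch is imported.
import os
os.environ["OMP_NUM_THREADS"] = "1"

import torch
# Equivalent runtime control for intra-op parallelism.
torch.set_num_threads(1)

print(torch.get_num_threads())  # -> 1
```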
Related: #90 and #122 (comment) (there I provide a colab notebook for comparison)
Did you double-check that the hyperparameters were equivalent? EDIT: the 1.4x difference seems to match the results I got with the colab notebooks
The only parameter I am not sure about is batch_size; I experimented with different values (256, 512, 1024) and the performance is still lower.
See conversion for batch size: https://stable-baselines3.readthedocs.io/en/master/guide/migration.html#ppo
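Following the migration guide's conversion, the SB2 PPO2 `nminibatches` parameter maps to SB3's `batch_size` as the rollout size divided by the number of minibatches. A quick check with this issue's settings (assuming 8 parallel envs and the SB2 default of 4 minibatches):

```python
# Sketch of the SB2 -> SB3 minibatch-size conversion from the migration guide.
n_steps = 128        # rollout length per env, as used in this issue
n_envs = 8           # assumption: 8 parallel envs
nminibatches = 4     # SB2 PPO2 default

# SB3 batch_size = total rollout size / number of minibatches
batch_size = (n_steps * n_envs) // nminibatches
print(batch_size)  # -> 256, matching the batch_size used in this issue
```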
yes...
Reading https://pytorch.org/blog/accelerating-pytorch-vision-models-with-channels-last-on-cpu/, we should probably try the channels-last memory format; it is just a few lines of code to change (https://pytorch.org/blog/tensor-memory-format-matters/) and the shape of the tensors stays the same.
@MatPoliquin all you need to do is apparently:

```python
x = x.to(memory_format=torch.channels_last)
model = model.to(memory_format=torch.channels_last)
```

I would be happy to receive your feedback if you give it a try ;)
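A minimal, self-contained check of that suggestion (illustrative only — the model and input sizes are made up, not SB3's actual policy network): converting both the model and the input to channels-last changes the memory layout but leaves the tensor shapes identical.

```python
# Sketch: applying channels-last to a toy CNN layer and its input.
import torch
import torch.nn as nn

model = nn.Conv2d(3, 16, kernel_size=3).to(memory_format=torch.channels_last)
x = torch.randn(8, 3, 84, 84).to(memory_format=torch.channels_last)

out = model(x)
print(x.is_contiguous(memory_format=torch.channels_last))  # True
print(out.shape)  # same NCHW shape as with the default layout
```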
So these changes should be made in on_policy_algorithm.py? I modified the code below, but I'm not quite sure if it's correct (line 102):
You are using the experimental branch, right?
Yes, and you need to modify the rollout buffer. I did some quick tests with the RL Zoo (defaults to 8 envs); here is what I can recommend:
For instance, with the default command, I get around 800 FPS. With subprocesses, I get 1100 FPS. You could also try adding support for CNNs in the experimental SBX: araffin/sbx#6 and araffin/sbx#4 (SBX PPO is ~2x faster than SB3 PPO, but it has fewer features).
So, I tested that but it didn't help much. What gave me an 8% speed boost was to set
Small update on that: I now have an experimental SB3 + Jax = SBX version here: https://github.com/araffin/sbx. With the proper hyperparameters, SAC can run 20x faster than its PyTorch equivalent =): https://twitter.com/araffin2/status/1590714601754497024
❓ Question
EDIT: After doing some more digging I updated the post title and added more details with a newer version of SB3 (1.6.2)
I am using the OpenAI gym-retro env to train on games and migrated from SB2 to SB3 1.6.2. I noticed the training speed dropped significantly, from 1300 FPS to 900 FPS.
Using Nvidia Nsight, I profiled both versions (you can find the reports in the Google Drive link below; you need Nsight to view them):
https://drive.google.com/drive/folders/1Lqxf-qKXTj__Hp8WUXgNHejZaJGy8oct?usp=sharing
Here are the parameters I use for PPO with SB3 (with SB2 I just use the default parameters provided by SB):
```python
PPO(policy=args.nn, env=env, verbose=1, n_steps=128, n_epochs=4,
    batch_size=256, learning_rate=2.5e-4, clip_range=0.2, vf_coef=0.5,
    ent_coef=0.01, max_grad_norm=0.5, clip_range_vf=None)
```
My specs:
Code I use to wrap the retro env (same for both SB2 and SB3 cases):
Checklist