
Stable-Baselines3 v2.2.1: Support for options at reset, bug fixes and better error messages

Released by @araffin on 17 Nov, 23:35 (commit e3dea4b)

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (the RL Zoo depends on SB3 and SB3-Contrib):

pip install rl_zoo3 --upgrade

Note

Stable-Baselines3 (SB3) v2.2.0 was yanked after a breaking change was found in GH#1751.
Please use SB3 v2.2.1 and not v2.2.0.

Breaking Changes:

  • Switched to ruff for sorting imports (isort is no longer needed); black and ruff now require a minimum version
  • Replaced x is False checks with not x, which means that callbacks that wrongly return None (instead of a boolean) will now cause training to stop (@iwishiwasaneagle)
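
To see why a callback returning None now stops training, compare the two checks in plain Python. The helpers should_stop_old() and should_stop_new() are hypothetical stand-ins for the check inside the training loop, not SB3 code:

```python
def should_stop_old(callback_return):
    """Hypothetical old-style check: only an explicit False stops training."""
    return callback_return is False


def should_stop_new(callback_return):
    """Hypothetical new-style check: any falsy value, including None, stops training."""
    return not callback_return


# None used to be treated as "continue"; with the new check it means "stop".
for value in (True, False, None):
    print(value, should_stop_old(value), should_stop_new(value))
```

In practice, a custom callback's _on_step() should always return an explicit boolean (True to keep training).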

New Features:

  • Improved the env_checker error message for envs wrongly detected as GoalEnv (because compute_reward() is defined)
  • Improved error message when mixing Gym API with VecEnv API (see GH#1694)
  • Added support for setting options at reset with VecEnv via the set_options() method. As with seeds, options are reset at the end of an episode (@ReHoss)
  • Added rollout_buffer_class and rollout_buffer_kwargs arguments to on-policy algorithms (A2C and PPO)
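
The one-shot behavior of set_options() (stored options are cleared once a reset has used them, like seeds) can be modeled with a toy vectorized env. ToyVecEnv and its internals are hypothetical and only mimic the behavior described above; with SB3 itself you would call vec_env.set_options({...}) before vec_env.reset().

```python
from typing import Any, Dict, List, Optional, Union


class ToyVecEnv:
    """Hypothetical stand-in that mimics one-shot reset options (not SB3's implementation)."""

    def __init__(self, num_envs: int) -> None:
        self.num_envs = num_envs
        self._options: List[Dict[str, Any]] = [{} for _ in range(num_envs)]
        # Log of the options each reset() call passed to the sub-envs.
        self.reset_calls: List[List[Dict[str, Any]]] = []

    def set_options(self, options: Optional[Union[List[Dict[str, Any]], Dict[str, Any]]] = None) -> None:
        if options is None:
            options = {}
        # A single dict is broadcast to every sub-environment.
        if isinstance(options, dict):
            self._options = [dict(options) for _ in range(self.num_envs)]
        else:
            self._options = [dict(opt) for opt in options]

    def reset(self) -> None:
        # A real sub-env would receive its options here, e.g. env.reset(options=...).
        self.reset_calls.append(self._options)
        # Options are cleared after use, so the next reset() runs without them.
        self._options = [{} for _ in range(self.num_envs)]


vec_env = ToyVecEnv(num_envs=2)
vec_env.set_options({"difficulty": "hard"})
vec_env.reset()  # both sub-envs get {"difficulty": "hard"}
vec_env.reset()  # options were cleared, so both get {}
print(vec_env.reset_calls)
```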

Bug Fixes:

  • Prevents using squash_output without use_sde in ActorCriticPolicy (@PatrickHelm)
  • Performs unscaling of actions in collect_rollouts in OnPolicyAlgorithm (@PatrickHelm)
  • Moves VectorizedActionNoise into _setup_learn() in OffPolicyAlgorithm (@PatrickHelm)
  • Prevents an out-of-bounds error on Windows if no seed is passed (@PatrickHelm)
  • Calls callback.update_locals() before callback.on_rollout_end() in OnPolicyAlgorithm (@PatrickHelm)
  • Fixed replay buffer device after loading in OffPolicyAlgorithm (@PatrickHelm)
  • Fixed render_mode not being properly loaded when using VecNormalize.load()
  • Fixed success reward dtype in SimpleMultiObsEnv (@NixGD)
  • Fixed check_env for Sequence observation space (@corentinlger)
  • Prevents instantiating BitFlippingEnv with conflicting observation spaces (@kylesayrs)
  • Fixed ResourceWarning when loading and saving models (files were not closed). Note that only files opened from a path are closed automatically;
    the behavior stays the same for tempfiles (they need to be closed manually).
    The behavior is now also consistent when loading/saving the replay buffer
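
When squash_output is used, the policy produces actions in [-1, 1] that must be mapped back to the Box bounds [low, high] before being sent to the environment. The standalone NumPy functions below are written for this example (their names echo the policy's scale_action()/unscale_action() methods) and show that affine map and its inverse:

```python
import numpy as np


def scale_action(action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map an action from [low, high] to [-1, 1]."""
    return 2.0 * ((action - low) / (high - low)) - 1.0


def unscale_action(scaled_action: np.ndarray, low: np.ndarray, high: np.ndarray) -> np.ndarray:
    """Map an action from [-1, 1] back to [low, high]."""
    return low + 0.5 * (scaled_action + 1.0) * (high - low)


low = np.array([-2.0, 0.0])
high = np.array([2.0, 10.0])
squashed = np.array([0.5, -1.0])  # e.g. a tanh-squashed policy output
env_action = unscale_action(squashed, low, high)
print(env_action)  # [1. 0.]
```

Scaling env_action again with scale_action() recovers the squashed values, so the two functions are exact inverses on the interior of the bounds.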

SB3-Contrib

  • Added set_options for AsyncEval
  • Added rollout_buffer_class and rollout_buffer_kwargs arguments to TRPO

RL Zoo

  • Removed the gym dependency; the package is still required for some pretrained agents.
  • Added --eval-env-kwargs to train.py (@Quentin18)
  • Added ppo_lstm to hyperparams_opt.py (@technocrat13)
  • Upgraded to pybullet_envs_gymnasium>=0.4.0
  • Removed old hacks (for instance, limiting off-policy algorithms to one env at test time)
  • Updated Docker image; removed support for X server
  • Replaced deprecated optuna.suggest_uniform(...) with optuna.suggest_float(..., low=..., high=...)

SBX (SB3 + Jax)

  • Added DDPG and TD3 algorithms

Others:

  • Fixed stable_baselines3/common/callbacks.py type hints
  • Fixed stable_baselines3/common/utils.py type hints
  • Fixed stable_baselines3/common/vec_env/vec_transpose.py type hints
  • Fixed stable_baselines3/common/vec_env/vec_video_recorder.py type hints
  • Fixed stable_baselines3/common/save_util.py type hints
  • Updated docker images to Ubuntu Jammy using micromamba 1.5
  • Fixed stable_baselines3/common/buffers.py type hints
  • Fixed stable_baselines3/her/her_replay_buffer.py type hints
  • Buffers do not call an additional .copy() when storing new transitions
  • Fixed ActorCriticPolicy.extract_features() signature by adding an optional features_extractor argument
  • Updated dependencies (accept newer Shimmy/Sphinx versions and remove sphinx_autodoc_typehints)
  • Fixed stable_baselines3/common/off_policy_algorithm.py type hints
  • Fixed stable_baselines3/common/distributions.py type hints
  • Fixed stable_baselines3/common/vec_env/vec_normalize.py type hints
  • Fixed stable_baselines3/common/vec_env/__init__.py type hints
  • Switched to PyTorch 2.1.0 in the CI (fixes type annotations)
  • Fixed stable_baselines3/common/policies.py type hints
  • Switched to mypy only for checking types
  • Added tests to check consistency when saving/loading files
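
Assigning into a preallocated NumPy array already copies the data into the array's own memory, which is why an extra .copy() on each stored transition is redundant. A small sketch with a hypothetical fixed-size observation buffer (not SB3's buffer code):

```python
import numpy as np

buffer_size, obs_dim = 4, 3
# Hypothetical preallocated storage, as a replay or rollout buffer would use.
observations = np.zeros((buffer_size, obs_dim), dtype=np.float32)

obs = np.array([1.0, 2.0, 3.0], dtype=np.float32)
# Slice assignment copies the values into the buffer; no .copy() is needed.
observations[0] = obs

# Mutating the source afterwards does not affect what was stored.
obs[:] = -1.0
print(observations[0])  # [1. 2. 3.]
print(np.shares_memory(observations, obs))  # False
```

An explicit copy only matters when a reference is kept instead, for example when appending the caller's array to a Python list.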

Documentation:

  • Updated RL Tips and Tricks (included recommendations for evaluation, added links to DroQ, ARS and SBX).
  • Fixed various typos and grammar mistakes

Full changelog: v2.1.0...v2.2.1