Stable-Baselines3 v1.8.0: Multi-env HerReplayBuffer, Open RL Benchmark, Improved env checker

@araffin araffin released this 08 Apr 16:17
· 104 commits to master since this release
84f5511

Warning

Stable-Baselines3 (SB3) v1.8.0 will be the last release to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
You can find a migration guide here.
If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (rl zoo depends on SB3 and SB3 contrib):

pip install rl_zoo3 --upgrade

Breaking Changes:

  • Removed shared layers in mlp_extractor (@AlexPasqua)
  • Refactored StackedObservations (it now handles dict obs, StackedDictObservations was removed)
  • You must now explicitly pass a features_extractor parameter when calling extract_features()
  • Dropped offline sampling for HerReplayBuffer
  • As HerReplayBuffer was refactored to support multiprocessing, previously saved replay buffers are incompatible with this new version
  • HerReplayBuffer doesn't require a max_episode_length anymore
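
For the removal of shared layers in mlp_extractor, here is a minimal sketch of how an older net_arch specification might be rewritten. It assumes the old format was a list of shared layer sizes optionally followed by a dict(pi=..., vf=...), and that the new format is a plain dict(pi=..., vf=...). The helper name migrate_net_arch is hypothetical and is not part of SB3. Copying the shared sizes into both branches keeps the layer widths, but the two branches no longer share weights, so results may differ from the old architecture.

```python
from typing import Dict, List, Union

# Assumed old-style spec: shared layer sizes, then an optional dict of branches.
OldNetArch = List[Union[int, Dict[str, List[int]]]]


def migrate_net_arch(old: OldNetArch) -> Dict[str, List[int]]:
    """Hypothetical helper (not an SB3 function): build dict(pi=..., vf=...)."""
    shared: List[int] = []
    pi_layers: List[int] = []
    vf_layers: List[int] = []
    for item in old:
        if isinstance(item, dict):
            # The dict holds the policy (pi) and value (vf) branch sizes.
            pi_layers = list(item.get("pi", []))
            vf_layers = list(item.get("vf", []))
            break
        shared.append(int(item))
    # Formerly shared sizes are copied into each branch (weights are separate).
    return {"pi": shared + pi_layers, "vf": shared + vf_layers}


new_arch = migrate_net_arch([128, dict(pi=[64], vf=[64, 64])])
print(new_arch)  # {'pi': [128, 64], 'vf': [128, 64, 64]}
```

The resulting dictionary would then be passed as policy_kwargs=dict(net_arch=new_arch) when creating the model.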

New Features:

  • Added repeat_action_probability argument in AtariWrapper.
  • Only use NoopResetEnv and MaxAndSkipEnv when needed in AtariWrapper
  • Added support for dict/tuple observation spaces in VecCheckNan; the check is now active in the env_checker() (@DavyMorgan)
  • Added multiprocessing support for HerReplayBuffer
  • HerReplayBuffer now supports all datatypes supported by ReplayBuffer
  • Provide more helpful failure messages when validating the observation_space of custom gym environments using check_env (@FieteO)
  • Added stats_window_size argument to control smoothing in rollout logging (@jonasreiher)
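
To illustrate what a stats_window_size setting controls, here is a small standard-library sketch of smoothing over the most recent episodes. It assumes the logged value is a mean over a bounded window of recent episode returns. The function rolling_means is illustrative only and is not SB3 code.

```python
from collections import deque
from statistics import mean
from typing import List, Sequence


def rolling_means(episode_returns: Sequence[float], stats_window_size: int) -> List[float]:
    """Illustrative only: mean of the last `stats_window_size` episode returns."""
    window = deque(maxlen=stats_window_size)  # oldest entries drop out automatically
    logged = []
    for ret in episode_returns:
        window.append(ret)
        logged.append(mean(window))
    return logged


returns = [0.0, 10.0, 20.0, 30.0]
smooth = rolling_means(returns, stats_window_size=2)
raw = rolling_means(returns, stats_window_size=1)
print(smooth)  # [0.0, 5.0, 15.0, 25.0]
print(raw)     # [0.0, 10.0, 20.0, 30.0]
```

A larger window gives smoother but slower-reacting curves, while a window of 1 reports each episode as-is.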

SB3-Contrib

  • Added warning about potential crashes caused by check_env in the MaskablePPO docs (@AlexPasqua)
  • Fixed sb3_contrib/qrdqn/*.py type hints
  • Removed shared layers in mlp_extractor (@AlexPasqua)

RL Zoo

  • Open RL Benchmark
  • Upgraded to new HerReplayBuffer implementation that supports multiple envs
  • Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeouts
  • Tuned hyperparameters for RecurrentPPO on Swimmer
  • Documentation is now built using Sphinx and hosted on Read the Docs
  • Removed use_auth_token for push to hub util
  • Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see openai/gym#1304)
  • Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)
  • Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
  • Switched to ruff and pyproject.toml
  • Removed online_sampling and max_episode_length arguments when using HerReplayBuffer
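
The Optuna change listed above can be sketched as follows, assuming the call is made on a trial object as in Optuna's API. To keep the snippet runnable without Optuna installed, FakeTrial is a hypothetical stand-in that only mimics suggest_float(name, low, high, log=...). The parameter name and bounds are illustrative and are not taken from the RL Zoo.

```python
import math
import random


class FakeTrial:
    """Hypothetical stand-in for an Optuna trial (not the real class)."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)

    def suggest_float(self, name: str, low: float, high: float, *, log: bool = False) -> float:
        if log:
            # Sample uniformly in log space, then map back.
            return math.exp(self._rng.uniform(math.log(low), math.log(high)))
        return self._rng.uniform(low, high)


def sample_learning_rate(trial) -> float:
    # Before: trial.suggest_loguniform("learning_rate", 1e-5, 1.0)
    # After:  the same range, sampled on a log scale via log=True
    return trial.suggest_float("learning_rate", 1e-5, 1.0, log=True)


lr = sample_learning_rate(FakeTrial(seed=0))
print(lr)
```

With a real Optuna study, sample_learning_rate would receive the trial passed to the objective function instead of FakeTrial.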

Bug Fixes:

  • Fixed Atari wrapper that missed the reset condition (@luizapozzobon)
  • Added the dtype argument (defaults to float32) to the noise for consistency with gym actions (@sidney-tio)
  • Fixed PPO train/n_updates metric not accounting for early stopping (@adamfrly)
  • Fixed loading of normalized image-based environments
  • Fixed DictRolloutBuffer.add with multidimensional action space (@younik)

Deprecations:

Others:

  • Fixed tests/test_tensorboard.py type hint
  • Fixed tests/test_vec_normalize.py type hint
  • Fixed stable_baselines3/common/monitor.py type hint
  • Added tests for StackedObservations
  • Removed Gitlab CI file
  • Moved from setup.cfg to pyproject.toml configuration file
  • Switched from flake8 to ruff
  • Upgraded AutoROM to latest version
  • Fixed stable_baselines3/dqn/*.py type hints
  • Added extra_no_roms option for package installation without Atari Roms

Documentation:

  • Renamed load_parameters to set_parameters (@DavyMorgan)
  • Clarified documentation about subproc multiprocessing for A2C (@Bonifatius94)
  • Fixed typo in A2C docstring (@AlexPasqua)
  • Renamed timesteps to episodes for log_interval description (@theSquaredError)
  • Removed note about gif creation for Atari games (@harveybellini)
  • Added information about default network architecture
  • Updated information about Gymnasium support