Stable-Baselines3 v1.8.0: Multi-env HerReplayBuffer, Open RL Benchmark, Improved env checker
Warning
Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
Starting with v2.0.0, Gymnasium will be the default backend (though SB3 will have compatibility layers for Gym envs).
You can find a migration guide here.
If you want to try the SB3 v2.0 alpha version, you can take a look at PR #1327.
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Removed shared layers in `mlp_extractor` (@AlexPasqua)
- Refactored `StackedObservations` (it now handles dict obs, `StackedDictObservations` was removed)
- You must now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Dropped offline sampling for `HerReplayBuffer`
- As `HerReplayBuffer` was refactored to support multiprocessing, previous replay buffers are incompatible with this new version
- `HerReplayBuffer` doesn't require a `max_episode_length` anymore
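For configs written against earlier releases, the HerReplayBuffer changes listed above mean that some keyword arguments no longer apply. The following is a minimal sketch, not part of SB3: `migrate_her_kwargs` and `REMOVED_HER_KWARGS` are hypothetical names, and it assumes the old config passed `online_sampling` and `max_episode_length` in a plain dict of replay buffer kwargs. The other keys in the example dict are purely illustrative.

```python
from typing import Any, Dict

# Keyword arguments assumed to have been passed to the older HerReplayBuffer
# and no longer accepted (hypothetical constant, not part of SB3).
REMOVED_HER_KWARGS = ("online_sampling", "max_episode_length")


def migrate_her_kwargs(old_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of old_kwargs without the removed arguments."""
    if old_kwargs.get("online_sampling") is False:
        # Offline sampling was dropped, so this setting cannot be preserved.
        raise ValueError("offline sampling (online_sampling=False) is no longer supported")
    return {k: v for k, v in old_kwargs.items() if k not in REMOVED_HER_KWARGS}


# Illustrative legacy config; the values are made up for this example.
legacy = {
    "n_sampled_goal": 4,
    "goal_selection_strategy": "future",
    "online_sampling": True,
    "max_episode_length": 100,
}
migrated = migrate_her_kwargs(legacy)
print(migrated)
```

Raising on `online_sampling=False` is a design choice of this sketch: silently dropping the key would change sampling behavior without telling the user.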
New Features:
- Added `repeat_action_probability` argument in `AtariWrapper`
- Only use `NoopResetEnv` and `MaxAndSkipEnv` when needed in `AtariWrapper`
- Added support for dict/tuple observation spaces for `VecCheckNan`, the check is now active in the `env_checker()` (@DavyMorgan)
- Added multiprocessing support for `HerReplayBuffer`
- `HerReplayBuffer` now supports all datatypes supported by `ReplayBuffer`
- Provide more helpful failure messages when validating the `observation_space` of custom gym environments using `check_env` (@FieteO)
- Added `stats_window_size` argument to control smoothing in rollout logging (@jonasreiher)
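To illustrate what a `stats_window_size` setting controls, here is a minimal sketch of windowed averaging. It assumes the rollout statistics are means over the most recent `stats_window_size` episodes; `rolling_means` is a hypothetical helper written for this example and is not SB3 code.

```python
from collections import deque
from statistics import mean
from typing import Deque, List


def rolling_means(episode_returns: List[float], stats_window_size: int) -> List[float]:
    """Mean of the last `stats_window_size` returns, computed after each episode."""
    window: Deque[float] = deque(maxlen=stats_window_size)
    means = []
    for ret in episode_returns:
        window.append(ret)  # the oldest entry is dropped once the window is full
        means.append(mean(window))
    return means


returns = [0.0, 10.0, 20.0, 30.0]  # illustrative episode returns
small_window = rolling_means(returns, stats_window_size=2)
large_window = rolling_means(returns, stats_window_size=4)
print(small_window)  # [0.0, 5.0, 15.0, 25.0]
print(large_window)  # [0.0, 5.0, 10.0, 15.0]
```

A smaller window tracks recent performance more closely, while a larger one gives a smoother but more delayed curve.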
SB3-Contrib
- Added warning about potential crashes caused by `check_env` in the `MaskablePPO` docs (@AlexPasqua)
- Fixed `sb3_contrib/qrdqn/*.py` type hints
- Removed shared layers in `mlp_extractor` (@AlexPasqua)
RL Zoo
- Open RL Benchmark
- Upgraded to new HerReplayBuffer implementation that supports multiple envs
- Removed TimeFeatureWrapper for Panda and Fetch envs, as the new replay buffer should handle timeout.
- Tuned hyperparameters for RecurrentPPO on Swimmer
- Documentation is now built using Sphinx and hosted on Read the Docs
- Removed use_auth_token for push to hub util
- Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum since they were not part of the mujoco refactoring (see openai/gym#1304)
- Fixed gym-minigrid policy (from MlpPolicy to MultiInputPolicy)
- Replaced deprecated optuna.suggest_loguniform(...) by optuna.suggest_float(..., log=True)
- Switched to ruff and pyproject.toml
- Removed online_sampling and max_episode_length arguments when using HerReplayBuffer
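The `log=True` option mentioned in the Optuna item above requests sampling on a logarithmic scale, which is common for hyperparameters such as learning rates. The sketch below shows the underlying idea using only the standard library; `sample_log_uniform` is a hypothetical helper for illustration, not Optuna's implementation, and the bounds are made up.

```python
import math
import random


def sample_log_uniform(rng: random.Random, low: float, high: float) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    if not 0 < low <= high:
        raise ValueError("log-uniform sampling requires 0 < low <= high")
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [sample_log_uniform(rng, 1e-5, 1e-1) for _ in range(1000)]

# On a log scale, about half of the samples fall below the geometric mean (1e-3).
fraction_below = sum(s < 1e-3 for s in samples) / len(samples)
print(f"fraction below 1e-3: {fraction_below:.2f}")
```

With plain uniform sampling over the same range, almost all draws would land above 1e-3, which is why a log scale is usually preferred when a parameter spans several orders of magnitude.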
Bug Fixes:
- Fixed Atari wrapper that missed the reset condition (@luizapozzobon)
- Added the argument `dtype` (defaults to `float32`) to the noise for consistency with gym action (@sidney-tio)
- Fixed PPO train/n_updates metric not accounting for early stopping (@adamfrly)
- Fixed loading of normalized image-based environments
- Fixed `DictRolloutBuffer.add` with multidimensional action space (@younik)
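The PPO `train/n_updates` fix above concerns counting only the epochs that actually ran when KL-based early stopping cuts training short. The following is a simplified sketch of that idea, not the SB3 implementation: `count_updates` is a hypothetical helper, the per-epoch KL values are invented, and the `1.5 * target_kl` threshold is an assumption of this example.

```python
from typing import List, Optional


def count_updates(kl_per_epoch: List[float], target_kl: Optional[float]) -> int:
    """Count the epochs actually run, stopping once the KL estimate is too large."""
    n_updates = 0
    for approx_kl in kl_per_epoch:
        n_updates += 1  # this epoch was (at least partly) run
        # Assumed stopping rule for this sketch: KL above 1.5 * target_kl.
        if target_kl is not None and approx_kl > 1.5 * target_kl:
            break  # early stopping: the remaining epochs are skipped
    return n_updates


kl_values = [0.005, 0.012, 0.040, 0.050]  # illustrative values, one per epoch
with_early_stop = count_updates(kl_values, target_kl=0.02)
without_early_stop = count_updates(kl_values, target_kl=None)
print(with_early_stop)     # 3
print(without_early_stop)  # 4
```

Adding the configured number of epochs unconditionally would report 4 in both cases, which overstates the number of updates when training stops early.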
Deprecations:
Others:
- Fixed `tests/test_tensorboard.py` type hint
- Fixed `tests/test_vec_normalize.py` type hint
- Fixed `stable_baselines3/common/monitor.py` type hint
- Added tests for StackedObservations
- Removed Gitlab CI file
- Moved from `setup.cfg` to `pyproject.toml` configuration file
- Switched from `flake8` to `ruff`
- Upgraded AutoROM to latest version
- Fixed `stable_baselines3/dqn/*.py` type hints
- Added `extra_no_roms` option for package installation without Atari Roms
Documentation:
- Renamed `load_parameters` to `set_parameters` (@DavyMorgan)
- Clarified documentation about subproc multiprocessing for A2C (@Bonifatius94)
- Fixed typo in `A2C` docstring (@AlexPasqua)
- Renamed timesteps to episodes for `log_interval` description (@theSquaredError)
- Removed note about gif creation for Atari games (@harveybellini)
- Added information about default network architecture
- Updated information about Gymnasium support