Stable-Baselines3 v2.2.1: Support for options at reset, bug fixes and better error messages

To upgrade:

pip install stable_baselines3 sb3_contrib --upgrade

or simply (RL Zoo depends on SB3 and SB3-Contrib):

pip install rl_zoo3 --upgrade

Note

Stable-Baselines3 (SB3) v2.2.0 was yanked after a breaking change was found in GH#1751.
Please use SB3 v2.2.1 and not v2.2.0.

Breaking Changes:

Switched to ruff for sorting imports (isort is no longer needed); black and ruff now require a minimum version
Dropped x is False in favor of not x, which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)
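
To illustrate why the switch from x is False to not x matters, here is a minimal sketch in plain Python. It is not SB3 code: run_training, buggy_callback and correct_callback are hypothetical names, and the loop only mimics how a training loop might react to a callback's return value.

```python
def run_training(callback, max_steps=5, strict_identity=False):
    """Toy loop (not SB3 code) that stops when the callback asks for it.

    strict_identity=True mimics the old `x is False` check,
    strict_identity=False mimics the new `not x` check.
    """
    steps_done = 0
    for _ in range(max_steps):
        steps_done += 1
        result = callback()
        should_stop = (result is False) if strict_identity else (not result)
        if should_stop:
            break
    return steps_done


def buggy_callback():
    # Forgot to `return True`, so Python implicitly returns None.
    pass


def correct_callback():
    return True


old_behavior = run_training(buggy_callback, strict_identity=True)
new_behavior = run_training(buggy_callback, strict_identity=False)
correct = run_training(correct_callback)
print(old_behavior, new_behavior, correct)
```

With the old check, None is not identical to False, so the buggy callback goes unnoticed. With the new check, None is falsy and the loop stops after the first step. Callbacks should therefore always return an explicit boolean.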

New Features:

Improved error message of the env_checker for environments wrongly detected as GoalEnv (compute_reward() is defined)
Improved error message when mixing Gym API with VecEnv API (see GH#1694)
Added support for setting options at reset with VecEnv via the set_options() method. As with seeds, options are reset at the end of an episode (@ReHoss)
Added rollout_buffer_class and rollout_buffer_kwargs arguments to on-policy algorithms (A2C and PPO)

Bug Fixes:

Prevents using squash_output and not use_sde in ActorCriticPolicy (@PatrickHelm)
Performs unscaling of actions in collect_rollouts() in OnPolicyAlgorithm (@PatrickHelm)
Moves VectorizedActionNoise into _setup_learn() in OffPolicyAlgorithm (@PatrickHelm)
Prevents out-of-bounds error on Windows if no seed is passed (@PatrickHelm)
Calls callback.update_locals() before callback.on_rollout_end() in OnPolicyAlgorithm (@PatrickHelm)
Fixed replay buffer device after loading in OffPolicyAlgorithm (@PatrickHelm)
Fixed render_mode which was not properly loaded when using VecNormalize.load()
Fixed success reward dtype in SimpleMultiObsEnv (@NixGD)
Fixed check_env for Sequence observation space (@corentinlger)
Prevents instantiating BitFlippingEnv with conflicting observation spaces (@kylesayrs)
Fixed ResourceWarning when loading and saving models (files were not closed). Note that only paths are closed automatically; the behavior stays the same for tempfiles (they need to be closed manually). The behavior is now consistent when loading/saving replay buffers
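
The "paths are closed automatically, file objects are not" rule from the ResourceWarning fix can be sketched in plain Python. This is an illustration of the pattern only, not SB3's implementation: open_for_write and save_object are hypothetical helpers.

```python
import os
import pickle
import tempfile
from contextlib import contextmanager


@contextmanager
def open_for_write(target):
    """Hypothetical helper: open (and later close) paths, pass file objects through."""
    if isinstance(target, (str, os.PathLike)):
        # We opened the file ourselves, so we are responsible for closing it.
        with open(target, "wb") as file_handler:
            yield file_handler
    else:
        # The caller owns the file object and must close it manually.
        yield target


def save_object(obj, target):
    with open_for_write(target) as file_handler:
        pickle.dump(obj, file_handler)


# Case 1: a path. The underlying file is closed when saving finishes.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "buffer.pkl")
    save_object({"a": 1}, path)
    with open(path, "rb") as f:
        loaded_from_path = pickle.load(f)

# Case 2: a temporary file object. It stays open until the caller closes it.
tmp_file = tempfile.TemporaryFile()
save_object({"a": 1}, tmp_file)
still_open = not tmp_file.closed
tmp_file.seek(0)
loaded_from_tmp = pickle.load(tmp_file)
tmp_file.close()
print(loaded_from_path, still_open, loaded_from_tmp)
```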

RL Zoo:

Removed gym dependency; the package is still required for some pretrained agents.
Added --eval-env-kwargs to train.py (@Quentin18)
Added ppo_lstm to hyperparams_opt.py (@technocrat13)
Upgraded to pybullet_envs_gymnasium>=0.4.0
Removed old hacks (for instance limiting off-policy algorithms to one env at test time)
Updated docker image, removed support for X server
Replaced deprecated optuna.suggest_uniform(...) with optuna.suggest_float(..., low=..., high=...)

Others:

Fixed stable_baselines3/common/callbacks.py type hints
Fixed stable_baselines3/common/utils.py type hints
Fixed stable_baselines3/common/vec_env/vec_transpose.py type hints
Fixed stable_baselines3/common/vec_env/vec_video_recorder.py type hints
Fixed stable_baselines3/common/save_util.py type hints
Updated docker images to Ubuntu Jammy using micromamba 1.5
Fixed stable_baselines3/common/buffers.py type hints
Fixed stable_baselines3/her/her_replay_buffer.py type hints
Buffers no longer call an additional .copy() when storing new transitions
Fixed ActorCriticPolicy.extract_features() signature by adding an optional features_extractor argument
Updated dependencies (accept newer Shimmy/Sphinx versions and remove sphinx_autodoc_typehints)
Fixed stable_baselines3/common/off_policy_algorithm.py type hints
Fixed stable_baselines3/common/distributions.py type hints
Fixed stable_baselines3/common/vec_env/vec_normalize.py type hints
Fixed stable_baselines3/common/vec_env/__init__.py type hints
Switched to PyTorch 2.1.0 in the CI (fixes type annotations)
Fixed stable_baselines3/common/policies.py type hints
Switched to mypy only for checking types
Added tests to check consistency when saving/loading files

Documentation:

Updated RL Tips and Tricks (included a recommendation for evaluation, added links to DroQ, ARS and SBX).
Fixed various typos and grammar mistakes