Stable-Baselines3 v1.7.0: non-shared features extractor, bug fixes and quality of life improvements

Released by @araffin on 10 Jan 16:52 (commit 6b8905a)

SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo

To upgrade:

pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade

or simply (RL Zoo depends on SB3 and SB3 Contrib):

pip install rl_zoo3 --upgrade

Warning
Shared layers in MLP policy (mlp_extractor) are now deprecated for PPO, A2C and TRPO.
This feature will be removed in SB3 v1.8.0, after which net_arch=[64, 64]
will create separate networks with the same architecture, to be consistent with the off-policy algorithms.
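
For example, to opt in to the upcoming behavior already in v1.7.0, you can request separate actor and critic networks explicitly. A minimal sketch, using the list-of-dict net_arch format accepted by the on-policy policies in this release (environment id chosen for illustration):

from stable_baselines3 import PPO

# No shared layers: the policy (pi) and value (vf) networks each get
# two hidden layers of 64 units. In SB3 v1.8.0, net_arch=[64, 64]
# will create the same separate-network architecture by default.
model = PPO(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(net_arch=[dict(pi=[64, 64], vf=[64, 64])]),
)
model.learn(total_timesteps=1_000)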

Note
A2C and PPO models saved with SB3 < 1.7.0 will show a warning about
missing keys in the state dict when loaded with SB3 >= 1.7.0.
To suppress the warning, simply save the model again (see the sketch below).
You can find more info in issue #1233.
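
For instance (a minimal sketch; the file name is hypothetical):

from stable_baselines3 import PPO

# Loading a model saved with SB3 < 1.7.0 prints the missing-keys warning once;
# saving it again writes the updated state dict, so later loads stay silent.
model = PPO.load("ppo_cartpole_pre_1_7")  # hypothetical file name
model.save("ppo_cartpole_pre_1_7")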

Breaking Changes:

  • Removed the deprecated create_eval_env, eval_env, eval_log_path, n_eval_episodes and eval_freq parameters;
    please use an EvalCallback instead (see the sketch after this list)
  • Removed deprecated sde_net_arch parameter
  • Removed ret attributes in VecNormalize, please use returns instead
  • VecNormalize now updates the observation space when normalizing images
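
As a replacement for the removed evaluation parameters, periodic evaluation can be configured with an EvalCallback. A minimal sketch (environment id, frequencies and paths are illustrative):

import gym

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

# Evaluate on a separate environment every 1000 steps and keep the best model.
eval_env = gym.make("CartPole-v1")
eval_callback = EvalCallback(
    eval_env,
    eval_freq=1_000,
    n_eval_episodes=5,
    best_model_save_path="./logs/best_model/",
    log_path="./logs/eval/",
)

model = PPO("MlpPolicy", "CartPole-v1")
model.learn(total_timesteps=5_000, callback=eval_callback)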

New Features:

  • Introduced mypy type checking
  • Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua); see the sketch after this list
  • Added with_bias argument to create_mlp
  • Added support for multidimensional spaces.MultiBinary observations
  • Features extractors now properly support unnormalized image-like observations (3D tensor)
    when passing normalize_images=False
  • Added normalized_image parameter to NatureCNN and CombinedExtractor
  • Added support for Python 3.10
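
The non-shared features extractor is requested through the policy keyword arguments. A minimal sketch, assuming the option is exposed as share_features_extractor, mirroring the off-policy policies (environment id chosen for illustration):

from stable_baselines3 import A2C

# Give the actor and the critic their own features extractor
# instead of sharing a single one.
model = A2C(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(share_features_extractor=False),
)
model.learn(total_timesteps=1_000)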

SB3-Contrib

  • Fixed a bug in RecurrentPPO where the LSTM states were incorrectly reshaped for n_lstm_layers > 1 (thanks @kolbytn)
  • Fixed RuntimeError: rnn: hx is not contiguous while predicting terminal values for RecurrentPPO when n_lstm_layers > 1

RL Zoo

  • Added support for using a Python file for configuration
  • Added monitor_kwargs parameter

Bug Fixes:

  • Fixed ProgressBarCallback under-reporting (@dominicgkerr)
  • Fixed return type of evaluate_actions in ActorCriticPolicy to reflect that entropy is an optional tensor (@Rocamonde)
  • Fixed type annotation of policy in BaseAlgorithm and OffPolicyAlgorithm
  • Allowed models trained with Python 3.7 to be loaded with Python 3.8+ without the custom_objects workaround
  • Raised an error when the same gym environment instance is passed as separate environments to a vectorized environment with more than one environment (@Rocamonde)
  • Fixed type annotation of model in evaluate_policy
  • Fixed Self return type using TypeVar
  • Fixed the env checker: the key was not passed when checking images from a Dict observation space
  • Fixed normalize_images, which was not passed to the parent class in some cases
  • Fixed load_from_vector, which was broken with newer PyTorch versions when passing a PyTorch tensor

Deprecations:

  • You should now explicitly pass a features_extractor parameter when calling extract_features() (see the sketch after this list)
  • Deprecated shared layers in MlpExtractor (@AlexPasqua)
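
In a custom policy, the explicit call looks roughly like this. A minimal sketch (the get_features method name is hypothetical; the explicit features_extractor argument is assumed from this deprecation):

import torch as th

from stable_baselines3.common.policies import ActorCriticPolicy

class CustomPolicy(ActorCriticPolicy):
    def get_features(self, obs: th.Tensor) -> th.Tensor:
        # Deprecated (emits a warning): self.extract_features(obs)
        # Preferred: pass the features extractor explicitly.
        return self.extract_features(obs, self.features_extractor)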

Others:

  • Used issue forms instead of issue templates
  • Updated the PR template to associate each PR with its peer in RL-Zoo3 and SB3-Contrib
  • Fixed flake8 config to be compatible with flake8 6+
  • Goal-conditioned environments are now characterized by the availability of the compute_reward method, rather than by inheriting from gym.GoalEnv
  • Replaced CartPole-v0 by CartPole-v1 in tests
  • Fixed tests/test_distributions.py type hints
  • Fixed stable_baselines3/common/type_aliases.py type hints
  • Fixed stable_baselines3/common/torch_layers.py type hints
  • Fixed stable_baselines3/common/env_util.py type hints
  • Fixed stable_baselines3/common/preprocessing.py type hints
  • Fixed stable_baselines3/common/atari_wrappers.py type hints
  • Fixed stable_baselines3/common/vec_env/vec_check_nan.py type hints
  • Exposed modules in __init__.py with the __all__ attribute (@ZikangXiong)
  • Upgraded GitHub CI/setup-python to v4 and checkout to v3
  • Tensors are now constructed directly on the device (~8% speed boost on GPU)
  • Monkey-patched np.bool = bool so gym 0.21 is compatible with NumPy 1.24+
  • Standardized the use of from gym import spaces
  • Modified get_system_info to avoid issues linked to copy-pasting into GitHub issues

Documentation:

  • Updated Hugging Face Integration page (@simoninithomas)
  • Changed env to vec_env when the environment is vectorized
  • Updated custom policy docs to better explain the mlp_extractor's dimensions (@AlexPasqua)
  • Updated custom policy documentation (@athatheo)
  • Improved the TensorBoard callback documentation
  • Clarified the documentation on using image-like inputs
  • Added RLeXplore to the project page (@yuanmingqi)