
SB3 v1.6.1: Bug fix release

@araffin released this on 29 Sep 11:09
21300c9

SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib

Breaking Changes:

  • Switched minimum tensorboard version to 2.9.1

New Features:

  • Added support for logging hyperparameters to tensorboard (@timothe-chaumont)
  • Added checkpointing of the replay buffer and VecNormalize statistics (@anand-bala)
  • Added an option for Monitor to append to an existing file instead of overwriting it (@sidney-tio)
  • The env checker now raises an error when a dict observation space is used and the observation keys don't match the observation space keys
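The new env checker rule boils down to comparing two key sets. The sketch below is a minimal stdlib-only illustration, not SB3's actual check_env code: check_dict_obs_keys is a hypothetical helper, the observation space is modeled as a plain list of key names rather than a gym.spaces.Dict, and raising ValueError is an assumption about the error type.

```python
def check_dict_obs_keys(obs: dict, space_keys) -> None:
    """Raise ValueError if a dict observation's keys differ from the space's keys.

    Hypothetical helper: `space_keys` stands in for the keys of a
    gym.spaces.Dict observation space.
    """
    obs_keys, expected = set(obs), set(space_keys)
    if obs_keys != expected:
        raise ValueError(
            "Observation keys do not match observation space keys: "
            f"missing={sorted(expected - obs_keys)}, "
            f"unexpected={sorted(obs_keys - expected)}"
        )


space_keys = ["image", "vector"]
check_dict_obs_keys({"image": 0, "vector": 1}, space_keys)  # keys match: no error
try:
    check_dict_obs_keys({"img": 0, "vector": 1}, space_keys)
except ValueError as err:
    print(err)  # reports 'image' as missing and 'img' as unexpected
```

Reporting both the missing and the unexpected keys makes a typo such as "img" vs "image" obvious at a glance.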

SB3-Contrib:

  • Fixed the issue of wrongly passing policy arguments when using CnnLstmPolicy or MultiInputLstmPolicy with RecurrentPPO (@mlodel)

Bug Fixes:

  • Fixed an issue where PPO produced NaN when the rollout buffer provided a batch of size 1 (@hughperkins)
  • Fixed an issue where predict did not always return the action as an np.ndarray (@qgallouedec)
  • Fixed a division-by-zero error when computing FPS if very little time had elapsed, on operating systems with low-precision timers
  • Added multidimensional action space support (@qgallouedec)
  • Fixed the verbose parameter not being passed in the EvalCallback constructor (@BurakDmb)
  • Fixed an issue where, when updating the target network in DQN, SAC and TD3, the running_mean and running_var properties of batch norm layers were not updated (@honglu2875)
  • Fixed the incorrect type annotation of the replay_buffer_class argument in the common.OffPolicyAlgorithm initializer, which required an instance instead of a class (@Rocamonde)
  • Fixed loading a saved model with a different number of environments
  • Removed forward() abstract method declaration from common.policies.BaseModel (already defined in torch.nn.Module) to fix type errors in subclasses (@Rocamonde)
  • Fixed the return type of .load() and .learn() methods in BaseAlgorithm so that they now use TypeVar (@Rocamonde)
  • Fixed an issue where keys with different tags but the same name raised an error in common.logger.HumanOutputFormat (@Rocamonde and @AdamGleave)
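The notes do not say where PPO's NaN came from. A plausible source is advantage normalization: the unbiased standard deviation (PyTorch's default for Tensor.std()) of a single sample is undefined, so dividing by it yields NaN. The NumPy sketch below shows the kind of size guard that avoids this. normalize_advantages is a hypothetical helper, and ddof=1 is used only to mimic PyTorch's unbiased default.

```python
import numpy as np


def normalize_advantages(advantages: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize a minibatch of advantages to zero mean and unit std.

    Uses the unbiased std (ddof=1) to mirror torch.Tensor.std()'s default,
    which is NaN for a single element, so size-1 batches are left unchanged.
    """
    if len(advantages) <= 1:
        # Guard: normalizing one sample would divide by a NaN std.
        return advantages
    return (advantages - advantages.mean()) / (advantages.std(ddof=1) + eps)


print(normalize_advantages(np.array([2.0])))       # size 1: returned as-is
print(normalize_advantages(np.array([1.0, 3.0])))  # roughly [-0.707, 0.707]
```

Skipping normalization for a single sample, rather than clamping the std, keeps the gradient signal from that sample intact instead of zeroing it out.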

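The TypeVar-based return types for .learn() and .load() follow the standard "self type" pattern from Python's typing module. The sketch below shows that pattern on stand-in stubs. BaseAlgorithm, PPOLike and SelfAlgorithm are hypothetical simplified names, not SB3's real signatures, which take many more arguments.

```python
from typing import Type, TypeVar

# Bound TypeVar: any subclass of BaseAlgorithm (hypothetical stub below).
SelfAlgorithm = TypeVar("SelfAlgorithm", bound="BaseAlgorithm")


class BaseAlgorithm:
    def learn(self: SelfAlgorithm, total_timesteps: int) -> SelfAlgorithm:
        # A real training loop would run here; returning self enables chaining.
        return self

    @classmethod
    def load(cls: Type[SelfAlgorithm], path: str) -> SelfAlgorithm:
        # Real deserialization from `path` would happen here.
        return cls()


class PPOLike(BaseAlgorithm):
    def subclass_only_method(self) -> str:
        return "ok"


# A type checker infers PPOLike (not BaseAlgorithm) for both calls,
# so subclass-specific methods are accessible without a cast.
model = PPOLike.load("model.zip").learn(total_timesteps=1_000)
print(model.subclass_only_method())
```

With a plain "-> BaseAlgorithm" annotation, the last line would be flagged by a type checker even though it works at runtime.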
Others:

  • Fixed DictReplayBuffer.next_observations typing (@qgallouedec)
  • Added support for device="auto" in buffers and made it the default (@qgallouedec)
  • Updated ResultsWriter (used internally by the Monitor wrapper) to automatically create missing directories when filename is a path (@dominicgkerr)
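Two Monitor-related changes in this release, appending instead of overwriting and creating missing directories, come down to how the results file is opened. The stdlib sketch below is a hypothetical SimpleResultsWriter, not SB3's ResultsWriter. Naming the flag override_existing is an assumption chosen to match the wording of the notes, and the r/l/t columns (episode reward, length, time) are illustrative.

```python
import csv
import os
import tempfile


class SimpleResultsWriter:
    """Minimal CSV episode logger (hypothetical, not SB3's ResultsWriter).

    - Creates missing parent directories when `filename` contains a path.
    - override_existing=True truncates the file; False appends to it.
    """

    fieldnames = ("r", "l", "t")  # episode reward, length, elapsed time

    def __init__(self, filename: str, override_existing: bool = True) -> None:
        directory = os.path.dirname(filename)
        if directory:
            os.makedirs(directory, exist_ok=True)  # no error if it already exists
        # Only write a header if the file will start out empty.
        write_header = (
            override_existing
            or not os.path.exists(filename)
            or os.path.getsize(filename) == 0
        )
        self.file = open(filename, "w" if override_existing else "a", newline="")
        self.writer = csv.DictWriter(self.file, fieldnames=self.fieldnames)
        if write_header:
            self.writer.writeheader()
        self.file.flush()

    def write_row(self, row: dict) -> None:
        self.writer.writerow(row)
        self.file.flush()

    def close(self) -> None:
        self.file.close()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "logs", "run1", "monitor.csv")  # dirs don't exist yet
    writer = SimpleResultsWriter(path)
    writer.write_row({"r": 1.0, "l": 10, "t": 0.5})
    writer.close()
    writer = SimpleResultsWriter(path, override_existing=False)  # append mode
    writer.write_row({"r": 2.0, "l": 12, "t": 1.1})
    writer.close()
    with open(path, newline="") as f:
        print(list(csv.DictReader(f)))  # both episodes, one header
```

Checking the file size before opening matters: opening with "w" truncates the file immediately, so the header decision has to be made first.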

Documentation:

  • Added an example of a callback that logs hyperparameters to tensorboard (@timothe-chaumont)
  • Fixed typo in docstring "nature" -> "Nature" (@Melanol)
  • Added info on split tensorboard logs into (@Melanol)
  • Fixed a typo in the PPO doc (@francescoluciano)
  • Fixed a typo in the install doc (@jlp-ue)
  • Clarified and standardized verbosity documentation
  • Added link to a GitHub issue in the custom policy documentation (@AlexPasqua)
  • Fixed typos (@Akhilez)