Releases: DLR-RM/stable-baselines3
Stable-Baselines3 v1.7.0: non-shared features extractor, bug fixes and quality of life improvements
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (RL Zoo3 depends on SB3 and SB3 Contrib):
pip install rl_zoo3 --upgrade
Warning
Shared layers in the MLP policy (`mlp_extractor`) are now deprecated for PPO, A2C and TRPO.
This feature will be removed in SB3 v1.8.0, and `net_arch=[64, 64]` will then create separate networks with the same architecture, to be consistent with the off-policy algorithms.
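A minimal sketch of how to request separate actor and critic networks explicitly today, assuming the pre-1.8 `net_arch` convention where a trailing dict specifies the policy (`pi`) and value (`vf`) branches:

```python
import gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")

# Separate 64-64 networks for the policy (pi) and the value function (vf),
# i.e. no shared layers in the mlp_extractor.
model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[dict(pi=[64, 64], vf=[64, 64])]),
    verbose=1,
)
model.learn(total_timesteps=10_000)
```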
Note
A2C and PPO models saved with SB3 < 1.7.0 will show a warning about
missing keys in the state dict when loaded with SB3 >= 1.7.0.
To suppress the warning, simply save the model again.
You can find more info in issue #1233
Breaking Changes:
- Removed deprecated `create_eval_env`, `eval_env`, `eval_log_path`, `n_eval_episodes` and `eval_freq` parameters, please use an `EvalCallback` instead (see the sketch after this list)
- Removed deprecated `sde_net_arch` parameter
- Removed `ret` attributes in `VecNormalize`, please use `returns` instead
- `VecNormalize` now updates the observation space when normalizing images
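A minimal sketch of the recommended replacement for the removed evaluation parameters, using `EvalCallback` (the environment id, frequencies and paths below are illustrative):

```python
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback

train_env = gym.make("CartPole-v1")
eval_env = gym.make("CartPole-v1")

# Periodic evaluation replaces the removed eval_env/eval_freq/... parameters.
eval_callback = EvalCallback(
    eval_env,
    eval_freq=1_000,
    n_eval_episodes=5,
    best_model_save_path="./logs/",
    log_path="./logs/",
    deterministic=True,
)

model = PPO("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=20_000, callback=eval_callback)
```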
New Features:
- Introduced mypy type checking
- Added option to have non-shared features extractor between actor and critic in on-policy algorithms (@AlexPasqua) (see the sketch after this list)
- Added `with_bias` argument to `create_mlp`
- Added support for multidimensional `spaces.MultiBinary` observations
- Features extractors now properly support unnormalized image-like observations (3D tensor) when passing `normalize_images=False`
- Added `normalized_image` parameter to `NatureCNN` and `CombinedExtractor`
- Added support for Python 3.10
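A minimal sketch of the non-shared features extractor option for on-policy algorithms, assuming it is exposed through the `share_features_extractor` entry of `policy_kwargs`:

```python
import gym

from stable_baselines3 import A2C

env = gym.make("CartPole-v1")

# Give the actor and the critic their own features extractor
# instead of sharing a single one.
model = A2C(
    "MlpPolicy",
    env,
    policy_kwargs=dict(share_features_extractor=False),
    verbose=1,
)
model.learn(total_timesteps=5_000)
```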
SB3-Contrib
- Fixed a bug in `RecurrentPPO` where the LSTM states were incorrectly reshaped for `n_lstm_layers > 1` (thanks @kolbytn)
- Fixed `RuntimeError: rnn: hx is not contiguous` while predicting terminal values for `RecurrentPPO` when `n_lstm_layers > 1`
RL Zoo
- Added support for Python files for configuration
- Added `monitor_kwargs` parameter
Bug Fixes:
- Fixed `ProgressBarCallback` under-reporting (@dominicgkerr)
- Fixed return type of `evaluate_actions` in `ActorCriticPolicy` to reflect that entropy is an optional tensor (@Rocamonde)
- Fixed type annotation of `policy` in `BaseAlgorithm` and `OffPolicyAlgorithm`
- Allowed model trained with Python 3.7 to be loaded with Python 3.8+ without the `custom_objects` workaround
- Raise an error when the same gym environment instance is passed as separate environments when creating a vectorized environment with more than one environment (@Rocamonde)
- Fixed type annotation of `model` in `evaluate_policy`
- Fixed `Self` return type using `TypeVar`
- Fixed the env checker: the key was not passed when checking images from Dict observation space
- Fixed `normalize_images` which was not passed to parent class in some cases
- Fixed `load_from_vector` that was broken with newer PyTorch versions when passing a PyTorch tensor
Deprecations:
- You should now explicitly pass a `features_extractor` parameter when calling `extract_features()`
- Deprecated shared layers in `MlpExtractor` (@AlexPasqua)
Others:
- Used issue forms instead of issue templates
- Updated the PR template to associate each PR with its peer in RL-Zoo3 and SB3-Contrib
- Fixed flake8 config to be compatible with flake8 6+
- Goal-conditioned environments are now characterized by the availability of the `compute_reward` method, rather than by their inheritance from `gym.GoalEnv`
- Replaced `CartPole-v0` by `CartPole-v1` in tests
- Fixed `tests/test_distributions.py` type hints
- Fixed `stable_baselines3/common/type_aliases.py` type hints
- Fixed `stable_baselines3/common/torch_layers.py` type hints
- Fixed `stable_baselines3/common/env_util.py` type hints
- Fixed `stable_baselines3/common/preprocessing.py` type hints
- Fixed `stable_baselines3/common/atari_wrappers.py` type hints
- Fixed `stable_baselines3/common/vec_env/vec_check_nan.py` type hints
- Exposed modules in `__init__.py` with the `__all__` attribute (@ZikangXiong)
- Upgraded GitHub CI/setup-python to v4 and checkout to v3
- Set tensors construction directly on the device (~8% speed boost on GPU)
- Monkey-patched `np.bool = bool` so gym 0.21 is compatible with NumPy 1.24+
- Standardized the use of `from gym import spaces`
- Modified `get_system_info` to avoid issue linked to copy-pasting on GitHub issue
Documentation:
- Updated Hugging Face Integration page (@simoninithomas)
- Changed `env` to `vec_env` when the environment is vectorized
- Updated custom policy docs to better explain the `mlp_extractor`'s dimensions (@AlexPasqua)
- Updated custom policy documentation (@athatheo)
- Improved tensorboard callback doc
- Clarify doc when using image-like input
- Added RLeXplore to the project page (@yuanmingqi)
SB3 v1.6.2: Progress bar and RL Zoo3 package
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3: https://github.com/DLR-RM/rl-baselines3-zoo
New Features:
- Added `progress_bar` argument in the `learn()` method, displayed using TQDM and rich packages (see the sketch after this list)
- Added progress bar callback
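A minimal usage sketch of the new `progress_bar` argument (requires the TQDM and rich packages to be installed):

```python
import gym

from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=0)

# Display a progress bar (with estimated remaining time) during training.
model.learn(total_timesteps=50_000, progress_bar=True)
```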
RL Zoo3
- The RL Zoo can now be installed as a package (`pip install rl_zoo3`)
Bug Fixes:
- Fixed the issue that `self.num_timesteps` was initialized properly only after the first call to `on_step()` for callbacks
- Set importlib-metadata version to `~=4.13` to be compatible with `gym=0.21`
Deprecations:
- Added deprecation warning if parameters `eval_env`, `eval_freq` or `create_eval_env` are used (see #925) (@tobirohrer)
Others:
- Fixed type hint of the `env_id` parameter in `make_vec_env` and `make_atari_env` (@AlexPasqua)
Documentation:
- Extended docstring of the `wrapper_class` parameter in `make_vec_env` (@AlexPasqua)
SB3 v1.6.1: Bug fix release
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Switched minimum tensorboard version to 2.9.1
New Features:
- Support logging hyperparameters to tensorboard (@timothe-chaumont)
- Added checkpoints for replay buffer and `VecNormalize` statistics (@anand-bala) (see the sketch after this list)
- Added option for `Monitor` to append to existing file instead of overriding (@sidney-tio)
- The env checker now raises an error when using dict observation spaces and observation keys don't match observation space keys
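A minimal sketch of checkpointing the replay buffer and `VecNormalize` statistics alongside the model, assuming the new options are exposed as the `save_replay_buffer` and `save_vecnormalize` arguments of `CheckpointCallback`:

```python
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import CheckpointCallback
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

vec_env = VecNormalize(make_vec_env("Pendulum-v1", n_envs=1))

# Assumption: these two flags also save the replay buffer and the
# VecNormalize statistics at every checkpoint.
checkpoint_callback = CheckpointCallback(
    save_freq=10_000,
    save_path="./checkpoints/",
    name_prefix="sac_pendulum",
    save_replay_buffer=True,
    save_vecnormalize=True,
)

model = SAC("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=50_000, callback=checkpoint_callback)
```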
SB3-Contrib
- Fixed the issue of wrongly passing policy arguments when using `CnnLstmPolicy` or `MultiInputLstmPolicy` with `RecurrentPPO` (@mlodel)
Bug Fixes:
- Fixed issue where `PPO` gives NaN if rollout buffer provides a batch of size 1 (@hughperkins)
- Fixed the issue that `predict` does not always return action as `np.ndarray` (@qgallouedec)
- Fixed division by zero error when computing FPS when a small amount of time has elapsed in operating systems with low-precision timers
- Added multidimensional action space support (@qgallouedec)
- Fixed missing verbose parameter passing in the `EvalCallback` constructor (@BurakDmb)
- Fixed the issue that when updating the target network in DQN, SAC, TD3, the `running_mean` and `running_var` properties of batch norm layers are not updated (@honglu2875)
- Fixed incorrect type annotation of the `replay_buffer_class` argument in `common.OffPolicyAlgorithm` initializer, where an instance instead of a class was required (@Rocamonde)
- Fixed loading saved model with different number of environments
- Removed `forward()` abstract method declaration from `common.policies.BaseModel` (already defined in `torch.nn.Module`) to fix type errors in subclasses (@Rocamonde)
- Fixed the return type of `.load()` and `.learn()` methods in `BaseAlgorithm` so that they now use `TypeVar` (@Rocamonde)
- Fixed an issue where keys with different tags but the same key raised an error in `common.logger.HumanOutputFormat` (@Rocamonde and @AdamGleave)
Others:
- Fixed `DictReplayBuffer.next_observations` typing (@qgallouedec)
- Added support for `device="auto"` in buffers and made it default (@qgallouedec)
- Updated `ResultsWriter` (used internally by the `Monitor` wrapper) to automatically create missing directories when `filename` is a path (@dominicgkerr)
Documentation:
- Added an example of callback that logs hyperparameters to tensorboard. (@timothe-chaumont)
- Fixed typo in docstring "nature" -> "Nature" (@Melanol)
- Added info on split tensorboard logs into (@Melanol)
- Fixed typo in ppo doc (@francescoluciano)
- Fixed typo in install doc (@jlp-ue)
- Clarified and standardized verbosity documentation
- Added link to a GitHub issue in the custom policy documentation (@AlexPasqua)
- Fixed typos (@Akhilez)
SB3 v1.6.0: Recurrent PPO (PPO LSTM), better defaults for learning from pixels with SAC/TD3
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Changed the way policy "aliases" are handled ("MlpPolicy", "CnnPolicy", ...), removing the former `register_policy` helper and `policy_base` parameter and using `policy_aliases` static attributes instead (@Gregwar)
- SB3 now requires PyTorch >= 1.11
- Changed the default network architecture when using `CnnPolicy` or `MultiInputPolicy` with SAC or DDPG/TD3: `share_features_extractor` is now set to False by default and `net_arch=[256, 256]` (instead of `net_arch=[]` as before); see the sketch after this list for how to restore the previous defaults
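A minimal sketch of restoring the previous defaults for an image-based SAC agent via `policy_kwargs` (the environment id is illustrative, and this assumes the SAC policy accepts `share_features_extractor` and `net_arch` as shown):

```python
from stable_baselines3 import SAC

# Restore the pre-1.6.0 behavior: shared features extractor between
# actor and critic, and no extra fully-connected layers after the CNN.
model = SAC(
    "CnnPolicy",
    "CarRacing-v0",
    policy_kwargs=dict(share_features_extractor=True, net_arch=[]),
    buffer_size=100_000,
    verbose=1,
)
```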
SB3-Contrib
- Added Recurrent PPO (PPO LSTM). See Stable-Baselines-Team/stable-baselines3-contrib#53
Bug Fixes:
- Fixed saving and loading large policies greater than 2GB (@jkterry1, @ycheng517)
- Fixed final goal selection strategy that did not sample the final achieved goal (@qgallouedec)
- Fixed a bug with special characters in the tensorboard log name (@quantitative-technologies)
- Fixed a bug in `DummyVecEnv`'s and `SubprocVecEnv`'s seeding function: the None value was unchecked (@ScheiklP)
- Fixed a bug where `EvalCallback` would crash when trying to synchronize `VecNormalize` stats when observation normalization was disabled
- Added a check for unbounded actions
- Fixed issues due to newer version of protobuf (tensorboard) and sphinx
- Fix exception causes all over the codebase (@cool-RR)
- Prohibit simultaneous use of `optimize_memory_usage` and `handle_timeout_termination` due to a bug (@MWeltevrede)
- Fixed a bug in the `kl_divergence` check that would fail when using numpy arrays with the MultiCategorical distribution
Others:
- Upgraded to Python 3.7+ syntax using `pyupgrade`
- Removed redundant double-check for nested observations from `BaseAlgorithm._wrap_env` (@TibiGG)
Documentation:
- Added link to gym doc and gym env checker
- Fix typo in PPO doc (@bcollazo)
- Added link to PPO ICLR blog post
- Added remark about breaking Markov assumption and timeout handling
- Added doc about MLFlow integration via custom logger (@git-thor)
- Updated Huggingface integration doc
- Added copy button for code snippets
- Added doc about EnvPool and Isaac Gym support
SB3 v1.5.0: Bug fixes, early stopping callback
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Switched minimum Gym version to 0.21.0.
New Features:
- Added `StopTrainingOnNoModelImprovement` to callback collection (@caburu) (see the sketch after this list)
- Makes the length of keys and values in `HumanOutputFormat` configurable, depending on desired maximum width of output
- Allow PPO to turn off advantage normalization (see PR #763) (@vwxyzjn)
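A minimal sketch of early stopping with the new callback, assuming it is combined with `EvalCallback` via its `callback_after_eval` argument:

```python
import gym

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback, StopTrainingOnNoModelImprovement

eval_env = gym.make("CartPole-v1")

# Stop training if there is no improvement after 3 consecutive evaluations
# (and only after at least 5 evaluations have been performed).
stop_callback = StopTrainingOnNoModelImprovement(max_no_improvement_evals=3, min_evals=5, verbose=1)
eval_callback = EvalCallback(eval_env, eval_freq=1_000, callback_after_eval=stop_callback, verbose=1)

model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=100_000, callback=eval_callback)
```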
SB3-Contrib
- coming soon: Cross Entropy Method, see Stable-Baselines-Team/stable-baselines3-contrib#62
Bug Fixes:
- Fixed a bug in `VecMonitor`: the monitor did not consider the `info_keywords` during stepping (@ScheiklP)
- Fixed a bug in `HumanOutputFormat`: distinct keys truncated to the same prefix would overwrite each other's value, resulting in only one being output. This now raises an error (this should only affect a small fraction of use cases with very long keys)
- Routing all the `nn.Module` calls through implicit rather than explicit forward as per PyTorch guidelines (@manuel-delverme)
- Fixed a bug in `VecNormalize` where an error occurs when `norm_obs` is set to False for environments with dictionary observations (@buoyancy99)
- Set default `env` argument to `None` in `HerReplayBuffer.sample` (@qgallouedec)
- Fix `batch_size` typing in `DQN` (@qgallouedec)
- Fixed sample normalization in `DictReplayBuffer` (@qgallouedec)
Others:
- Fixed pytest warnings
- Removed parameter `remove_time_limit_termination` in off-policy algorithms since it was dead code (@Gregwar)
Documentation:
- Added doc on Hugging Face integration (@simoninithomas)
- Added furuta pendulum project to project list (@Armandpl)
- Fix indentation 2 spaces to 4 spaces in custom env documentation example (@Gautam-J)
- Update MlpExtractor docstring (@gianlucadecola)
- Added explanation of the logger output
- Update `Directly Accessing The Summary Writer` in tensorboard integration (@xy9485)
Full Changelog: v1.4.0...v1.5.0
SB3 v1.4.0: TRPO, ARS and multi env training for off-policy algorithms
SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
Breaking Changes:
- Dropped Python 3.6 support (as announced in previous release)
- Renamed `mask` argument of the `predict()` method to `episode_start` (used with RNN policies only)
- Local variables `action`, `done` and `reward` were renamed to their plural form for off-policy algorithms (`actions`, `dones`, `rewards`); this may affect custom callbacks
- Removed `episode_reward` field from `RolloutReturn()` type
Warning:
An update to the `HER` algorithm is planned to support multi-env training and remove the max episode length constraint (see PR #704).
This will be a backward incompatible change (models trained with a previous version of `HER` won't work with the new version).
New Features:
- Added `norm_obs_keys` param for the `VecNormalize` wrapper to configure which observation keys to normalize (@kachayev) (see the sketch after this list)
- Added experimental support to train off-policy algorithms with multiple envs (note: `HerReplayBuffer` currently not supported)
- Handle timeout termination properly for on-policy algorithms (when using `TimeLimit`)
- Added `skip` option to `VecTransposeImage` to skip transforming the channel order when the heuristic is wrong
- Added `copy()` and `combine()` methods to `RunningMeanStd`
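A minimal sketch of the new `norm_obs_keys` option for environments with Dict observations (the environment id and key names below are illustrative):

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# Illustrative goal-conditioned environment whose Dict observation space
# contains "observation", "achieved_goal" and "desired_goal" keys.
vec_env = make_vec_env("FetchReach-v1", n_envs=4)

# Only normalize the selected observation keys (e.g. leave the goals untouched).
vec_env = VecNormalize(vec_env, norm_obs=True, norm_obs_keys=["observation", "desired_goal"])
```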
SB3-Contrib
- Added Trust Region Policy Optimization (TRPO) (@cyprienc)
- Added Augmented Random Search (ARS) (@sgillen)
- Coming soon: PPO LSTM, see Stable-Baselines-Team/stable-baselines3-contrib#53
Bug Fixes:
- Fixed a bug where `set_env()` with `VecNormalize` would result in an error with off-policy algorithms (thanks @cleversonahum)
- FPS calculation is now performed based on number of steps performed during last `learn` call, even when `reset_num_timesteps` is set to `False` (@kachayev)
- Fixed evaluation script for recurrent policies (experimental feature in SB3 contrib)
- Fixed a bug where the observation would be incorrectly detected as non-vectorized instead of throwing an error
- The env checker now properly checks and warns about potential issues for continuous action spaces when the boundaries are too small or when the dtype is not float32
- Fixed a bug in `VecFrameStack` with channel-first image envs, where the terminal observation would be wrongly created
Others:
- Added a warning in the env checker when not using `np.float32` for continuous actions
- Improved test coverage and error message when checking shape of observation
- Added `newline="\n"` when opening CSV monitor files so that each line ends with `\r\n` instead of `\r\r\n` on Windows while Linux environments are not affected (@hsuehch)
- Fixed `device` argument inconsistency (@qgallouedec)
Documentation:
- Add drivergym to projects page (@theDebugger811)
- Add highway-env to projects page (@eleurent)
- Add tactile-gym to projects page (@ac-93)
- Fix indentation in the RL tips page (@cove9988)
- Update GAE computation docstring
- Add documentation on exporting to TFLite/Coral
- Added JMLR paper and updated citation
- Added link to RL Tips and Tricks video
- Updated `BaseAlgorithm.load` docstring (@Demetrio92)
- Added a note on `load` behavior in the examples (@Demetrio92)
- Updated SB3 Contrib doc
- Fixed A2C and migration guide guidance on how to set epsilon with RMSpropTFLike (@thomasgubler)
- Fixed custom policy documentation (@IperGiove)
- Added doc on Weights & Biases integration
SB3 v1.3.0: Bug fixes and improvements for the user
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021).
We highly recommend upgrading to Python >= 3.7.
SB3-Contrib changelog: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/releases/tag/v1.3.0
Breaking Changes:
- `sde_net_arch` argument in policies is deprecated and will be removed in a future version
- `_get_latent` (`ActorCriticPolicy`) was removed
- All logging keys now use underscores instead of spaces (@timokau). Concretely this changes `time/total timesteps` to `time/total_timesteps` for off-policy algorithms and the eval callback (on-policy algorithms such as PPO and A2C already used the underscored version), `rollout/exploration rate` to `rollout/exploration_rate` and `rollout/success rate` to `rollout/success_rate`
New Features:
- Added methods `get_distribution` and `predict_values` for `ActorCriticPolicy` for A2C/PPO/TRPO (@cyprienc)
- Added methods `forward_actor` and `forward_critic` for `MlpExtractor`
- Added `sb3.get_system_info()` helper function to gather version information relevant to SB3 (e.g., Python and PyTorch version) (see the sketch after this list)
- Saved models now store system information where the agent was trained, and load functions have a `print_system_info` parameter to help debugging load issues
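A minimal sketch of the new debugging helpers (the saved model path is illustrative):

```python
import stable_baselines3 as sb3
from stable_baselines3 import PPO

# Print and return version information about the current setup
# (OS, Python, PyTorch, SB3, ...).
env_info, env_info_str = sb3.get_system_info()

# When loading a model, also print the system information stored at save time
# to help diagnose version-mismatch issues.
model = PPO.load("ppo_cartpole.zip", print_system_info=True)
```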
Bug Fixes:
- Fixed `dtype` of observations for `SimpleMultiObsEnv`
- Allow `VecNormalize` to wrap discrete-observation environments to normalize reward when observation normalization is disabled
- Fixed a bug where `DQN` would throw an error when using `Discrete` observation and stochastic actions
- Fixed a bug where sub-classed observation spaces could not be used
- Added `force_reset` argument to `load()` and `set_env()` in order to be able to call `learn(reset_num_timesteps=False)` with a new environment
Others:
- Cap gym max version to 0.19 to avoid issues with atari-py and other breaking changes
- Improved error message when using dict observation with the wrong policy
- Improved error message when using `EvalCallback` with two envs not wrapped the same way
- Added additional info about supported Python versions for PyPI in `setup.py`
Documentation:
- Add Rocket League Gym to list of supported projects (@AechPro)
- Added gym-electric-motor to project page (@wkirgsn)
- Added policy-distillation-baselines to project page (@CUN-bjy)
- Added ONNX export instructions (@batu)
- Update read the doc env (fixed `docutils` issue)
- Fix PPO environment name (@IljaAvadiev)
- Fix custom env doc and add env registration example
- Update algorithms from SB3 Contrib
- Use underscores for numeric literals in examples to improve clarity
SB3 v1.2.0: Hotfix for VecNormalize, training/eval mode support
Breaking Changes:
- SB3 now requires PyTorch >= 1.8.1
- `VecNormalize` `ret` attribute was renamed to `returns`
Bug Fixes:
- Hotfix for `VecNormalize` where the observation filter was not updated at reset (thanks @vwxyzjn)
- Fixed model predictions when using batch normalization and dropout layers by calling `train()` and `eval()` (@davidblom603)
- Fixed model training for DQN, TD3 and SAC so that their target nets always remain in evaluation mode (@ayeright)
- Passing `gradient_steps=0` to an off-policy algorithm will result in no gradient steps being taken (vs as many gradient steps as steps done in the environment during the rollout in previous versions)
Others:
- Enabled Python 3.9 in GitHub CI
- Fixed type annotations
- Refactored `predict()` by moving the preprocessing to the `obs_to_tensor()` method
Documentation:
- Updated multiprocessing example
- Added example of `VecEnvWrapper`
- Added a note about logging to tensorboard more often
- Added warning about simplicity of examples and link to RL zoo (@MihaiAnca13)
SB3 v1.1.0: Dictionary observation support, timeout handling and refactored HER buffer
Breaking Changes
- All custom environments (e.g. the `BitFlippingEnv` or `IdentityEnv`) were moved to the `stable_baselines3.common.envs` folder
- Refactored `HER`, which is now the `HerReplayBuffer` class that can be passed to any off-policy algorithm (see the example below)
- Handle timeout termination properly for off-policy algorithms (when using `TimeLimit`)
- Renamed `_last_dones` and `dones` to `_last_episode_starts` and `episode_starts` in `RolloutBuffer`
- Removed `ObsDictWrapper` as `Dict` observation spaces are now supported
from stable_baselines3 import HerReplayBuffer, SAC
from stable_baselines3.common.envs import BitFlippingEnv

# Goal-conditioned toy env with continuous actions (suitable for SAC + HER)
env = BitFlippingEnv(n_bits=10, continuous=True)
her_kwargs = dict(n_sampled_goal=2, goal_selection_strategy="future", online_sampling=True)
# SB3 < 1.1.0
# model = HER("MlpPolicy", env, model_class=SAC, **her_kwargs)
# SB3 >= 1.1.0:
model = SAC("MultiInputPolicy", env, replay_buffer_class=HerReplayBuffer, replay_buffer_kwargs=her_kwargs)
- Updated the KL Divergence estimator in the PPO algorithm to be positive definite and have lower variance (@09tangriro)
- Updated the KL Divergence check in the PPO algorithm to be before the gradient update step rather than after end of epoch (@09tangriro)
- Removed parameter `channels_last` from `is_image_space` as it can be inferred
- The logger object is now an attribute `model.logger` that can be set by the user using `model.set_logger()` (see the sketch after this list)
- Changed the signature of `logger.configure` and `utils.configure_logger`; they now return a `Logger` object
- Removed `Logger.CURRENT` and `Logger.DEFAULT`
- Moved `warn(), debug(), log(), info(), dump()` methods to the `Logger` class
- `.learn()` now throws an import error when the user tries to log to tensorboard but the package is not installed
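A minimal sketch of the new logger API (the log folder and output formats below are illustrative):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.logger import configure

# configure() now returns a Logger object that can be attached to a model.
new_logger = configure("/tmp/sb3_log/", ["stdout", "csv", "tensorboard"])

model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.set_logger(new_logger)
model.learn(total_timesteps=10_000)
```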
New Features
- Added support for single-level `Dict` observation space (@JadenTravnik)
- Added `DictRolloutBuffer` and `DictReplayBuffer` to support dictionary observations (@JadenTravnik)
- Added `StackedObservations` and `StackedDictObservations` that are used within `VecFrameStack`
- Added simple 4x4 room Dict test environments
- `HerReplayBuffer` now supports `VecNormalize` when `online_sampling=False`
- Added `VecMonitor` and `VecExtractDictObs` wrappers to handle gym3-style vectorized environments (@vwxyzjn)
- Ignored the terminal observation if it is not provided by the environment, such as the gym3-style vectorized environments (@vwxyzjn)
- Added `policy_base` as input to the `OnPolicyAlgorithm` for more flexibility (@09tangriro)
- Added support for image observations when using `HER`
- Added `replay_buffer_class` and `replay_buffer_kwargs` arguments to off-policy algorithms
- Added `kl_divergence` helper for `Distribution` classes (@09tangriro)
- Added support for vector environments with `num_envs > 1` (@benblack769)
- Added `wrapper_kwargs` argument to `make_vec_env` (@amy12xx)
Bug Fixes
- Fixed potential issue when calling off-policy algorithms with default arguments multiple times (the size of the replay buffer would be the same)
- Fixed loading of `ent_coef` for `SAC` and `TQC`; it was not optimized anymore (thanks @Atlis)
- Fixed saving of `A2C` and `PPO` policy when using gSDE (thanks @liusida)
- Fixed a bug where no output would be shown even if `verbose>=1` after passing `verbose=0` once
- Fixed observation buffers dtype in `DictReplayBuffer` (@c-rizz)
- Fixed EvalCallback tensorboard logs being logged with the incorrect timestep. They are now written with the timestep at which they were recorded. (@skandermoalla)
Others
- Added `flake8-bugbear` to tests dependencies to find likely bugs
- Updated `env_checker` to reflect support of dict observation spaces
- Added Code of Conduct
- Added tests for GAE and lambda return computation
- Updated distribution entropy test (thanks @09tangriro)
- Added sanity check `batch_size > 1` in PPO to avoid NaN in advantage normalization
Documentation:
- Added gym pybullet drones project (@JacopoPan)
- Added link to SuperSuit in projects (@justinkterry)
- Fixed DQN example (thanks @ltbd78)
- Clarified channel-first/channel-last recommendation
- Update sphinx environment installation instructions (@tom-doerr)
- Clarified pip installation in Zsh (@tom-doerr)
- Clarified return computation for on-policy algorithms (TD(lambda) estimate was used)
- Added example for using `ProcgenEnv`
- Added note about advanced custom policy example for off-policy algorithms
- Fixed DQN unicode checkmarks
- Updated migration guide (@juancroldan)
- Pinned `docutils==0.16` to avoid issue with rtd theme
- Clarified callback `save_freq` definition
- Added doc on how to pass a custom logger
- Removed recurrent policies from `A2C` docs (@bstee615)
Stable-Baselines3 v1.0
First Major Version
Blog post: https://araffin.github.io/post/sb3/
100+ pre-trained models in the zoo: https://github.com/DLR-RM/rl-baselines3-zoo
Breaking Changes:
- Removed `stable_baselines3.common.cmd_util` (already deprecated), please use `env_util` instead
Warning
A refactoring of the `HER` algorithm is planned together with support for dictionary observations (see PR #243 and #351).
This will be a backward incompatible change (models trained with a previous version of `HER` won't work with the new version).
New Features:
- Added support for `custom_objects` when loading models (see the sketch after this list)
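A minimal sketch of loading with `custom_objects` to override objects that cannot be deserialized (the model path and overridden values below are illustrative):

```python
from stable_baselines3 import PPO

# Replace objects stored in the saved model that may fail to load
# (e.g. schedules pickled with a different Python/cloudpickle version).
custom_objects = {
    "learning_rate": 0.0,
    "lr_schedule": lambda _: 0.0,
    "clip_range": lambda _: 0.0,
}
model = PPO.load("ppo_cartpole.zip", custom_objects=custom_objects)
```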
Bug Fixes:
- Fixed a bug with `DQN` predict method when using `deterministic=False` with image space
Documentation:
- Fixed examples
- Added new project using SB3: rl_reach (@PierreExeter)
- Added note about slow-down when switching to PyTorch
- Add a note on continual learning and resetting environment
- Updated RL-Zoo to reflect the fact that it is more than a collection of trained agents
- Added images to illustrate the training loop and custom policies (created with https://excalidraw.com/)
- Updated the custom policy section