Commit ea913a8

Release v2.6.0 (#2109)

1 parent 656de97 commit ea913a8

3 files changed: +17 -4 lines changed

docs/guide/rl_tips.rst

Lines changed: 2 additions & 2 deletions
@@ -233,9 +233,9 @@ If you want to quickly try a random agent on your environment, you can also do:
 **Why should I normalize the action space?**


-Most reinforcement learning algorithms rely on a Gaussian distribution (initially centered at 0 with std 1) for continuous actions.
+Most reinforcement learning algorithms rely on a `Gaussian distribution <https://araffin.github.io/post/sac-massive-sim/>`_ (initially centered at 0 with std 1) for continuous actions.
 So, if you forget to normalize the action space when using a custom environment,
-this can harm learning and can be difficult to debug (cf attached image and `issue #473 <https://github.com/hill-a/stable-baselines/issues/473>`_).
+this can `harm learning <https://araffin.github.io/post/sac-massive-sim/>`_ and can be difficult to debug (cf attached image and `issue #473 <https://github.com/hill-a/stable-baselines/issues/473>`_).

 .. figure:: ../_static/img/mistake.png
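
The paragraph edited above explains why algorithms with a Gaussian policy expect actions in a symmetric, normalized range. As a minimal sketch of the usual fix, assuming a standard Gymnasium setup (the environment id is only an example; for a custom env you would declare a ``Box(-1, 1, ...)`` action space directly):

.. code-block:: python

    import gymnasium as gym
    from gymnasium.wrappers import RescaleAction

    # Pendulum-v1's native action space is Box(-2, 2): rescale it so the
    # agent only ever sees the symmetric [-1, 1] range that a Gaussian
    # distribution centered at 0 with std 1 covers well.
    env = gym.make("Pendulum-v1")
    env = RescaleAction(env, min_action=-1.0, max_action=1.0)
    print(env.action_space)  # Box(-1.0, 1.0, (1,), float32)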

docs/misc/changelog.rst

Lines changed: 14 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,9 +3,10 @@
 Changelog
 ==========

-Release 2.6.0a2 (WIP)
+Release 2.6.0 (2025-03-24)
 --------------------------

+**New ``LogEveryNTimesteps`` callback and ``has_attr`` method, refactored hyperparameter optimization**

 Breaking Changes:
 ^^^^^^^^^^^^^^^^^
@@ -22,12 +23,24 @@ Bug Fixes:

 `SB3-Contrib`_
 ^^^^^^^^^^^^^^
+- Renamed ``_dump_logs()`` to ``dump_logs()``
+- Fixed issues with ``SubprocVecEnv`` and ``MaskablePPO`` by using ``vec_env.has_attr()`` (pickling issues, mask function not present)

 `RL Zoo`_
 ^^^^^^^^^
+- Refactored hyperparameter optimization. The Optuna `Journal storage backend <https://optuna.readthedocs.io/en/stable/reference/generated/optuna.storages.JournalStorage.html>`__ is now supported (recommended default) and you can easily load tuned hyperparameters via the new ``--trial-id`` argument of ``train.py``.
+- Save the exact command line used to launch a training
+- Added support for special vectorized envs (e.g. Brax, IsaacSim) by allowing to override the ``VecEnv`` class used to instantiate the env in the ``ExperimentManager``
+- Allow disabling auto-logging by passing ``--log-interval -2`` (useful when logging things manually)
+- Added Gymnasium v1.1 support
+- Fixed use of the old HF API in ``get_hf_trained_models()``

 `SBX`_ (SB3 + Jax)
 ^^^^^^^^^^^^^^^^^^
+- Updated PPO to support ``net_arch``, and additional fixes
+- Fixed entropy coeff wrongly logged for SAC and derivatives
+- Fixed PPO ``predict()`` for envs that were not normalized (action spaces with limits != [-1, 1])
+- PPO now logs the standard deviation

 Deprecations:
 ^^^^^^^^^^^^^
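
The release headline above introduces the ``LogEveryNTimesteps`` callback. A minimal usage sketch, assuming the constructor takes the logging period as ``n_steps`` (check the v2.6.0 API reference for the exact signature):

.. code-block:: python

    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import LogEveryNTimesteps

    model = PPO("MlpPolicy", "Pendulum-v1")
    # Dump logger output every 1000 environment steps, independent of
    # episode or rollout boundaries.
    model.learn(total_timesteps=10_000, callback=LogEveryNTimesteps(n_steps=1_000))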

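The ``SB3-Contrib`` fix above relies on the new ``vec_env.has_attr()`` to detect the mask function without pickling it back from a ``SubprocVecEnv`` worker. A sketch of the same pattern in user code; the ``action_masks`` name is the attribute ``MaskablePPO`` conventionally looks for and is an assumption here:

.. code-block:: python

    import gymnasium as gym
    from stable_baselines3.common.vec_env import DummyVecEnv

    vec_env = DummyVecEnv([lambda: gym.make("CartPole-v1")])
    # has_attr() only reports whether the attribute exists, so unlike
    # get_attr() nothing has to be serialized back from a worker process.
    if vec_env.has_attr("action_masks"):
        masks = vec_env.env_method("action_masks")
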
stable_baselines3/version.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.6.0a2
+2.6.0
