Commit 429be93

Release v2.3.0 (#1879)
* Release v2.3.0 * Fix typos
1 parent 071226d commit 429be93

2 files changed: +20 -6 lines changed

docs/misc/changelog.rst

Lines changed: 19 additions & 5 deletions
@@ -3,9 +3,12 @@
 Changelog
 ==========
 
-Release 2.3.0a5 (WIP)
+Release 2.3.0 (2024-03-31)
 --------------------------
 
+**New defaults hyperparameters for DDPG, TD3 and DQN**
+
+
 Breaking Changes:
 ^^^^^^^^^^^^^^^^^
 - The defaults hyperparameters of ``TD3`` and ``DDPG`` have been changed to be more consistent with ``SAC``
@@ -19,11 +22,11 @@ Breaking Changes:
 
 .. note::
 
-  Two inconsistencies remains: the default network architecture for ``TD3/DDPG`` is ``[400, 300]`` instead of ``[256, 256]`` for SAC (for backward compatibility reasons, see `report on the influence of the network size <https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-Influence-of-policy-net--Vmlldzo2NDg1Mzk3>`_) and the default learning rate is 1e-3 instead of 3e-4 for SAC (for performance reasons, see `W&B report on the influence of the lr <https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-RL-Zoo-v2-3-0a0-vs-SB3-TD3-RL-Zoo-2-2-1---Vmlldzo2MjUyNTQx>`_)
+  Two inconsistencies remain: the default network architecture for ``TD3/DDPG`` is ``[400, 300]`` instead of ``[256, 256]`` for SAC (for backward compatibility reasons, see `report on the influence of the network size <https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-Influence-of-policy-net--Vmlldzo2NDg1Mzk3>`_) and the default learning rate is 1e-3 instead of 3e-4 for SAC (for performance reasons, see `W&B report on the influence of the lr <https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-RL-Zoo-v2-3-0a0-vs-SB3-TD3-RL-Zoo-2-2-1---Vmlldzo2MjUyNTQx>`_)
 
 
 
-- The default ``leanrning_starts`` parameter of ``DQN`` have been changed to be consistent with the other offpolicy algorithms
+- The default ``learning_starts`` parameter of ``DQN`` have been changed to be consistent with the other offpolicy algorithms
 
 
 .. code-block:: python
@@ -35,8 +38,7 @@ Breaking Changes:
 
 - For safety, ``torch.load()`` is now called with ``weights_only=True`` when loading torch tensors,
   policy ``load()`` still uses ``weights_only=False`` as gymnasium imports are required for it to work
-- When using ``huggingface_sb3``, you will now need to set ``TRUST_REMOTE_CODE=True`` when downloading models from the hub,
-  as ``pickle.load`` is not safe.
+- When using ``huggingface_sb3``, you will now need to set ``TRUST_REMOTE_CODE=True`` when downloading models from the hub, as ``pickle.load`` is not safe.
 
 
 New Features:
@@ -49,9 +51,20 @@ Bug Fixes:
 
 `SB3-Contrib`_
 ^^^^^^^^^^^^^^
+- Added ``rollout_buffer_class`` and ``rollout_buffer_kwargs`` arguments to MaskablePPO
+- Fixed ``train_freq`` type annotation for tqc and qrdqn (@Armandpl)
+- Fixed ``sb3_contrib/common/maskable/*.py`` type annotations
+- Fixed ``sb3_contrib/ppo_mask/ppo_mask.py`` type annotations
+- Fixed ``sb3_contrib/common/vec_env/async_eval.py`` type annotations
+- Add some additional notes about ``MaskablePPO`` (evaluation and multi-process) (@icheered)
+
 
 `RL Zoo`_
 ^^^^^^^^^
+- Updated defaults hyperparameters for TD3/DDPG to be more consistent with SAC
+- Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
+- Added test dependencies to `setup.py` (@power-edge)
+- Simplify dependencies of `requirements.txt` (remove duplicates from `setup.py`)
 
 `SBX`_ (SB3 + Jax)
 ^^^^^^^^^^^^^^^^^^
@@ -60,6 +73,7 @@ Bug Fixes:
 - Fix ``train()`` signature and update type hints
 - Fix replay buffer device at load time
 - Added flatten layer
+- Added ``CrossQ``
 
 Deprecations:
 ^^^^^^^^^^^^^
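
The ``torch.load()`` and ``huggingface_sb3`` entries in the changelog diff above both rest on the same point: loading a pickle can execute code. The sketch below illustrates that using only the Python standard library. It is not how PyTorch implements ``weights_only=True`` and not how SB3 loads models; ``Payload``, ``NoGlobalsUnpickler`` and ``restricted_loads`` are hypothetical names introduced for this illustration.

```python
import io
import pickle


class Payload:
    """Hypothetical object whose pickle stream calls a function when loaded."""

    def __reduce__(self):
        # pickle stores "call sorted([3, 1, 2])" instead of a Payload instance.
        # A harmless builtin stands in for what could be any importable callable.
        return (sorted, ([3, 1, 2],))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any global (class or function)."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Load a pickle that may only contain plain containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


untrusted = pickle.dumps(Payload())

# Plain pickle.loads runs the embedded call and returns its result.
result = pickle.loads(untrusted)

# The restricted loader accepts plain data but rejects the payload.
plain = restricted_loads(pickle.dumps({"weights": [0.1, 0.2]}))
try:
    restricted_loads(untrusted)
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

Here ``result`` holds the return value of the embedded call rather than a ``Payload`` object, which is why files from an untrusted source should not be passed to an unrestricted loader.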

stable_baselines3/version.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.3.0a5
+2.3.0
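
Since this release changes default hyperparameters, code that depends on particular values may prefer to pass them explicitly instead of relying on the installed version's defaults. The following is a minimal sketch, not taken from the commit, that overrides the two points the changelog note says still differ between ``TD3/DDPG`` and ``SAC`` (network architecture and learning rate). ``SAC_LIKE_KWARGS``, ``make_sac_like_td3`` and the ``Pendulum-v1`` environment id are assumptions made for illustration.

```python
# Values quoted in the changelog note: SAC uses [256, 256] and 3e-4,
# while TD3/DDPG keep [400, 300] and 1e-3 by default.
SAC_LIKE_KWARGS = {
    "learning_rate": 3e-4,
    "policy_kwargs": {"net_arch": [256, 256]},
}


def make_sac_like_td3(env_id: str = "Pendulum-v1"):
    """Build a TD3 model that overrides the two remaining differences.

    The environment id is an assumption chosen for illustration.
    """
    # Imported lazily so the settings above can be inspected without SB3 installed.
    from stable_baselines3 import TD3

    return TD3("MlpPolicy", env_id, **SAC_LIKE_KWARGS)
```

Calling ``make_sac_like_td3()`` requires ``stable_baselines3`` and a Gymnasium installation that provides the chosen environment.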
