Changelog
==========

Release 2.3.0 (2024-03-31)
--------------------------

**New default hyperparameters for DDPG, TD3 and DQN**

Breaking Changes:
^^^^^^^^^^^^^^^^^
- The default hyperparameters of ``TD3`` and ``DDPG`` have been changed to be more consistent with ``SAC``

.. note::

  Two inconsistencies remain: the default network architecture for ``TD3/DDPG`` is ``[400, 300]`` instead of ``[256, 256]`` for SAC (for backward compatibility reasons, see `report on the influence of the network size <https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-Influence-of-policy-net--Vmlldzo2NDg1Mzk3>`_) and the default learning rate is 1e-3 instead of 3e-4 for SAC (for performance reasons, see `W&B report on the influence of the lr <https://wandb.ai/openrlbenchmark/sbx/reports/SBX-TD3-RL-Zoo-v2-3-0a0-vs-SB3-TD3-RL-Zoo-2-2-1---Vmlldzo2MjUyNTQx>`_)
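The two remaining divergences can be summarized in a short plain-Python sketch. The values come from the note above; the dict layout and the ``diverging_defaults`` helper are purely illustrative (hypothetical), not part of SB3's API:

```python
# Remaining divergences between TD3/DDPG and SAC defaults in this release.
# Values are taken from the note above; the dict layout and the helper
# below are illustrative only, not part of SB3's API.
TD3_DDPG_DEFAULTS = {
    "net_arch": [400, 300],   # kept for backward-compatibility reasons
    "learning_rate": 1e-3,    # kept for performance reasons
}
SAC_DEFAULTS = {
    "net_arch": [256, 256],
    "learning_rate": 3e-4,
}

def diverging_defaults(a: dict, b: dict) -> list[str]:
    """Return the hyperparameter names on which two default sets disagree."""
    return sorted(k for k in a if a[k] != b.get(k))

print(diverging_defaults(TD3_DDPG_DEFAULTS, SAC_DEFAULTS))  # ['learning_rate', 'net_arch']
```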

- The default ``learning_starts`` parameter of ``DQN`` has been changed to be consistent with the other off-policy algorithms
- For safety, ``torch.load()`` is now called with ``weights_only=True`` when loading torch tensors;
  the policy ``load()`` still uses ``weights_only=False``, as gymnasium imports are required for it to work
- When using ``huggingface_sb3``, you will now need to set ``TRUST_REMOTE_CODE=True`` when downloading models from the hub, as ``pickle.load`` is not safe.
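Both the ``weights_only=True`` default and the ``TRUST_REMOTE_CODE`` opt-in stem from the fact that a plain ``pickle.load`` can execute arbitrary code. A stdlib-only sketch of the general idea behind a restricted loader (illustrative only, not SB3's or PyTorch's actual implementation):

```python
import io
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    """Restricted unpickler: refuse to resolve any global reference,
    so only plain built-in data (no arbitrary code) can be loaded."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def safe_loads(data: bytes):
    """Deserialize bytes while rejecting payloads that reference globals."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()

# Plain data round-trips fine...
print(safe_loads(pickle.dumps({"weights": [1.0, 2.0]})))  # {'weights': [1.0, 2.0]}

# ...but a payload referencing a global (the classic code-execution vector)
# is rejected instead of being resolved and executed.
try:
    safe_loads(pickle.dumps(print))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

PyTorch's ``weights_only=True`` applies a far more elaborate allow-list in this spirit; the sketch only shows why unrestricted unpickling now has to be opted into explicitly.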

New Features:
^^^^^^^^^^^^^

`SB3-Contrib`_
^^^^^^^^^^^^^^
- Added ``rollout_buffer_class`` and ``rollout_buffer_kwargs`` arguments to MaskablePPO
- Fixed ``train_freq`` type annotation for tqc and qrdqn (@Armandpl)
- Fixed ``sb3_contrib/common/maskable/*.py`` type annotations
- Fixed ``sb3_contrib/ppo_mask/ppo_mask.py`` type annotations
- Fixed ``sb3_contrib/common/vec_env/async_eval.py`` type annotations
- Added some additional notes about ``MaskablePPO`` (evaluation and multi-process) (@icheered)

`RL Zoo`_
^^^^^^^^^
- Updated default hyperparameters for TD3/DDPG to be more consistent with SAC
- Upgraded MuJoCo envs hyperparameters to v4 (pre-trained agents need to be updated)
- Added test dependencies to ``setup.py`` (@power-edge)
- Simplified dependencies of ``requirements.txt`` (remove duplicates from ``setup.py``)