
Commit 84f5511

Update changelog and cleanup (#1434)
1 parent 12250eb commit 84f5511

File tree

6 files changed: +23 -13 lines changed


docs/misc/changelog.rst

Lines changed: 19 additions & 3 deletions
@@ -4,9 +4,11 @@ Changelog
 ==========
 
 
-Release 1.8.0a14 (WIP)
+Release 1.8.0 (2023-04-07)
 --------------------------
 
+**Multi-env HerReplayBuffer, Open RL Benchmark, Improved env checker**
+
 .. warning::
 
   Stable-Baselines3 (SB3) v1.8.0 will be the last one to use Gym as a backend.
@@ -31,23 +33,37 @@ New Features:
 - Added support for dict/tuple observations spaces for ``VecCheckNan``, the check is now active in the ``env_checker()`` (@DavyMorgan)
 - Added multiprocessing support for ``HerReplayBuffer``
 - ``HerReplayBuffer`` now supports all datatypes supported by ``ReplayBuffer``
-- Provide more helpful failure messages when validating the ``observation_space`` of custom gym environments using ``check_env``` (@FieteO)
+- Provide more helpful failure messages when validating the ``observation_space`` of custom gym environments using ``check_env`` (@FieteO)
 - Added ``stats_window_size`` argument to control smoothing in rollout logging (@jonasreiher)
 
 
 `SB3-Contrib`_
 ^^^^^^^^^^^^^^
+- Added warning about potential crashes caused by ``check_env`` in the ``MaskablePPO`` docs (@AlexPasqua)
+- Fixed ``sb3_contrib/qrdqn/*.py`` type hints
+- Removed shared layers in ``mlp_extractor`` (@AlexPasqua)
 
 `RL Zoo`_
 ^^^^^^^^^
+- `Open RL Benchmark <https://github.com/openrlbenchmark/openrlbenchmark/issues/7>`_
+- Upgraded to new `HerReplayBuffer` implementation that supports multiple envs
+- Removed `TimeFeatureWrapper` for Panda and Fetch envs, as the new replay buffer should handle timeouts
+- Tuned hyperparameters for RecurrentPPO on Swimmer
+- Documentation is now built using Sphinx and hosted on Read the Docs
+- Removed `use_auth_token` for the push-to-hub util
+- Reverted from v3 to v2 for HumanoidStandup, Reacher, InvertedPendulum and InvertedDoublePendulum, since they were not part of the MuJoCo refactoring (see https://github.com/openai/gym/pull/1304)
+- Fixed `gym-minigrid` policy (from `MlpPolicy` to `MultiInputPolicy`)
+- Replaced deprecated `optuna.suggest_loguniform(...)` by `optuna.suggest_float(..., log=True)`
+- Switched to `ruff` and `pyproject.toml`
+- Removed `online_sampling` and `max_episode_length` arguments when using `HerReplayBuffer`
 
 Bug Fixes:
 ^^^^^^^^^^
 - Fixed Atari wrapper that missed the reset condition (@luizapozzobon)
 - Added the argument ``dtype`` (default to ``float32``) to the noise for consistency with gym action (@sidney-tio)
 - Fixed PPO train/n_updates metric not accounting for early stopping (@adamfrly)
 - Fixed loading of normalized image-based environments
-- Fixed `DictRolloutBuffer.add` with multidimensional action space (@younik)
+- Fixed ``DictRolloutBuffer.add`` with multidimensional action space (@younik)
 
 Deprecations:
 ^^^^^^^^^^^^^
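
The headline feature here is the multi-env ``HerReplayBuffer``. Below is a minimal sketch of how the new buffer is wired up, assuming a goal-conditioned env; the env id is illustrative, and any Dict-observation env with observation/achieved_goal/desired_goal keys works:

from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.env_util import make_vec_env

# Illustrative goal-conditioned env id; requires the corresponding
# robotics package to be installed.
env = make_vec_env("FetchReach-v1", n_envs=4)  # multiple envs now supported

model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    # Note: online_sampling and max_episode_length were removed
    # with the new buffer implementation (see RL Zoo notes above).
    replay_buffer_kwargs=dict(
        n_sampled_goal=4,
        goal_selection_strategy="future",
    ),
)
model.learn(total_timesteps=10_000)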

setup.py

Lines changed: 2 additions & 2 deletions
@@ -90,7 +90,7 @@
 
 extra_packages = extra_no_roms + [  # noqa: RUF005
     # For atari roms,
-    "autorom[accept-rom-license]~=0.5.5",
+    "autorom[accept-rom-license]~=0.6.0",
 ]
 
 
@@ -138,7 +138,7 @@
         # For spelling
         "sphinxcontrib.spelling",
         # Type hints support
-        "sphinx-autodoc-typehints==1.21.1",  # TODO: remove version constraint, see #1290
+        "sphinx-autodoc-typehints",
         # Copy button for code snippets
         "sphinx_copybutton",
     ],
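
For reference, ``~=`` is pip's compatible-release operator, so the bump from ``~=0.5.5`` to ``~=0.6.0`` allows any 0.6.x release while excluding 0.7.0. A quick standalone check with the ``packaging`` library (an illustration, not part of this commit):

from packaging.specifiers import SpecifierSet

spec = SpecifierSet("~=0.6.0")  # equivalent to >=0.6.0, <0.7.0
print("0.6.1" in spec)  # True: newer patch releases are accepted
print("0.7.0" in spec)  # False: the next minor release is excluded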

stable_baselines3/common/distributions.py

Lines changed: 0 additions & 1 deletion
@@ -617,7 +617,6 @@ class TanhBijector:
     """
     Bijective transformation of a probability distribution
     using a squashing function (tanh)
-    TODO: use Pyro instead (https://pyro.ai/)
 
     :param epsilon: small value to avoid NaN due to numerical imprecision.
     """

stable_baselines3/common/policies.py

Lines changed: 0 additions & 5 deletions
@@ -337,11 +337,6 @@ def predict(
         :return: the model's action and the next hidden state
             (used in recurrent policies)
         """
-        # TODO (GH/1): add support for RNN policies
-        # if state is None:
-        #     state = self.initial_state
-        # if episode_start is None:
-        #     episode_start = [False for _ in range(self.n_envs)]
         # Switch to eval mode (this affects batch norm / dropout)
         self.set_training_mode(False)
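
The deleted lines were commented-out scaffolding for recurrent policies; ``predict()`` itself is unchanged and still switches the policy to eval mode before inference. Typical usage, as a minimal sketch:

from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1").learn(total_timesteps=1_000)

env = model.get_env()
obs = env.reset()
# set_training_mode(False) is called internally, so batch norm and dropout
# run in inference mode; state stays None for non-recurrent policies.
action, state = model.predict(obs, deterministic=True)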

stable_baselines3/version.txt

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.8.0a14
+1.8.0

tests/test_her.py

Lines changed: 1 addition & 1 deletion
@@ -260,7 +260,7 @@ def env_fn():
     del model.replay_buffer
 
     with pytest.raises(AttributeError):
-        model.replay_buffer
+        model.replay_buffer  # noqa: B018
 
     # Check that there is no warning
     assert len(recwarn) == 0
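
The added ``# noqa: B018`` silences flake8-bugbear's "useless expression" rule: the bare attribute access is intentional, because evaluating it is exactly what should raise. The pattern in isolation (a standalone illustration, not the test file itself):

import pytest

class Model:
    pass  # deliberately has no replay_buffer attribute

model = Model()

# A bare attribute access normally trips B018 ("useless expression"),
# but here evaluating it is the point: it must raise AttributeError.
with pytest.raises(AttributeError):
    model.replay_buffer  # noqa: B018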
