Added DQN and DDPG, bug fixes and performance matching for Atari games
Pre-release
Breaking Changes:
- `AtariWrapper` and other Atari wrappers were updated to match SB2 ones
- `save_replay_buffer` now receives as argument the file path instead of the folder path (@tirafesi)
- Refactored `Critic` class for `TD3` and `SAC`, it is now called `ContinuousCritic` and has an additional parameter `n_critics`
- `SAC` and `TD3` now accept an arbitrary number of critics (e.g. `policy_kwargs=dict(n_critics=3)`) instead of only 2 previously (see the sketch after this list)
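A minimal sketch of the updated off-policy options, assuming the `Pendulum-v0` environment and an illustrative buffer file name; only `policy_kwargs=dict(n_critics=3)` and the file-path argument to `save_replay_buffer` come from the notes above:

```python
from stable_baselines3 import SAC

# SAC (and TD3) can now use an arbitrary number of critics via policy_kwargs
model = SAC("MlpPolicy", "Pendulum-v0", policy_kwargs=dict(n_critics=3), verbose=1)
model.learn(total_timesteps=1_000)

# save_replay_buffer now expects a file path, not a folder path
model.save_replay_buffer("sac_pendulum_replay_buffer.pkl")  # hypothetical file name
```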
New Features:
- Added `DQN` algorithm (@Artemis-Skade) (see the usage sketch after this list)
- Buffer dtype is now set according to action and observation spaces for `ReplayBuffer`
- Added warning when allocation of a buffer may exceed the available memory of the system when `psutil` is available
- Saving models now automatically creates the necessary folders and raises appropriate warnings (@partiallytyped)
- Refactored opening paths for saving and loading to use strings, pathlib or `io.BufferedIOBase` (@partiallytyped)
- Added `DDPG` algorithm as a special case of `TD3`
- Introduced `BaseModel` abstract parent for `BasePolicy`, which critics inherit from
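A minimal usage sketch for the two new algorithms; the environment names and timestep budgets are illustrative, not part of the release notes:

```python
from stable_baselines3 import DDPG, DQN

# DQN on a discrete-action task
dqn_model = DQN("MlpPolicy", "CartPole-v1", verbose=1)
dqn_model.learn(total_timesteps=10_000)

# DDPG (implemented as a special case of TD3) on a continuous-action task
ddpg_model = DDPG("MlpPolicy", "Pendulum-v0", verbose=1)
ddpg_model.learn(total_timesteps=10_000)
```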
Bug Fixes:
- Fixed a bug in the `close()` method of `SubprocVecEnv`, causing wrappers further down in the wrapper stack to not be closed. (@NeoExtended)
- Fixed the target for updating Q-values in SAC: the entropy term was not conditioned on terminal states
- Use `cloudpickle.load` instead of `pickle.load` in `CloudpickleWrapper`. (@shwang)
- Fixed a bug with orthogonal initialization when `bias=False` in custom policy (@rk37)
- Fixed approximate entropy calculation in PPO and A2C. (@AndyShih12)
- Fixed DQN target network sharing the feature extractor with the main network.
- Fixed storing correct `dones` in on-policy algorithm rollout collection. (@AndyShih12)
- Fixed number of filters in final convolutional layer in NatureCNN to match original implementation.
Others:
- Refactored off-policy algorithms to share the same `.learn()` method
- Split the `collect_rollout()` method for off-policy algorithms
- Added `_on_step()` for off-policy base class
- Optimized replay buffer size by removing the need for a `next_observations` numpy array
- Optimized polyak updates (1.5-1.95x speedup) through in-place operations (@partiallytyped)
- Switched to `black` codestyle and added `make format`, `make check-codestyle` and `commit-checks`
- Ignored errors from newer pytype version
- Added a check when using `gSDE`
- Removed codacy dependency from Dockerfile
- Added `common.sb2_compat.RMSpropTFLike` optimizer, which corresponds more closely to the TensorFlow implementation of RMSprop (see the sketch after this list)
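A sketch of opting into the TF-like optimizer; the full import path `stable_baselines3.common.sb2_compat.rmsprop_tf_like` and the `optimizer_class` entry of `policy_kwargs` are assumptions about the surrounding API, only `common.sb2_compat.RMSpropTFLike` itself is named above:

```python
from stable_baselines3 import A2C
from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike  # assumed import path

# Swap PyTorch's default RMSprop for the TF-like variant through policy_kwargs
model = A2C(
    "MlpPolicy",
    "CartPole-v1",
    policy_kwargs=dict(optimizer_class=RMSpropTFLike),  # assumed policy_kwargs entry
    verbose=1,
)
model.learn(total_timesteps=10_000)
```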
Documentation:
- Updated notebook links
- Fixed a typo in the "Enjoy a Trained Agent" section of the RL Baselines3 Zoo README. (@blurLake)
- Added Unity reacher to the projects page (@koulakis)
- Added PyBullet colab notebook
- Fixed typo in PPO example code (@joeljosephjin)
- Fixed typo in custom policy doc (@RaphaelWag)