Added DQN and DDPG, bug fixes and performance matching for Atari games

Pre-release
Released by @araffin on 03 Aug 20:42 · cceffd5

Breaking Changes:

  • AtariWrapper and other Atari wrappers were updated to match SB2 ones
  • save_replay_buffer now takes the file path as argument instead of the folder path (@tirafesi)
  • Refactored the Critic class for TD3 and SAC; it is now called ContinuousCritic
    and has an additional n_critics parameter
  • SAC and TD3 now accept an arbitrary number of critics (e.g. policy_kwargs=dict(n_critics=3))
    instead of only 2 previously (see the usage sketch after this list)
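
A rough usage sketch of the two API changes above (the environment id and file name are illustrative, not part of the release):

    from stable_baselines3 import SAC

    # SAC/TD3 can now use more than two critics via policy_kwargs
    model = SAC("MlpPolicy", "Pendulum-v0", policy_kwargs=dict(n_critics=3), verbose=1)
    model.learn(total_timesteps=1000)

    # save_replay_buffer now expects a file path, not a folder path
    model.save_replay_buffer("sac_replay_buffer.pkl")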

New Features:

  • Added DQN algorithm (@Artemis-Skade); see the usage sketch after this list
  • Buffer dtype is now set according to action and observation spaces for ReplayBuffer
  • Added a warning (shown when psutil is available) if allocating a buffer may exceed
    the system's available memory
  • Saving models now automatically creates the necessary folders and raises appropriate warnings (@partiallytyped)
  • Refactored path handling for saving and loading to accept strings, pathlib paths or io.BufferedIOBase objects (@partiallytyped)
  • Added DDPG algorithm as a special case of TD3.
  • Introduced BaseModel, an abstract parent class of BasePolicy, from which critics now inherit.
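
A minimal sketch of the new algorithms and path handling (environment ids and the save path are illustrative):

    import pathlib

    from stable_baselines3 import DDPG, DQN

    # DQN works with discrete action spaces
    dqn_model = DQN("MlpPolicy", "CartPole-v1", verbose=1)
    dqn_model.learn(total_timesteps=1000)

    # Saving accepts strings or pathlib paths and creates missing folders automatically
    dqn_model.save(pathlib.Path("logs/dqn_cartpole"))

    # DDPG is implemented as a special case of TD3
    ddpg_model = DDPG("MlpPolicy", "Pendulum-v0", verbose=1)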

Bug Fixes:

  • Fixed a bug in the close() method of SubprocVecEnv, causing wrappers further down in the wrapper stack to not be closed. (@NeoExtended)
  • Fixed the target for updating Q-values in SAC: the entropy term was not conditioned on terminal states
  • Use cloudpickle.load instead of pickle.load in CloudpickleWrapper. (@shwang)
  • Fixed a bug with orthogonal initialization when bias=False in custom policy (@rk37)
  • Fixed approximate entropy calculation in PPO and A2C. (@AndyShih12)
  • Fixed the DQN target network sharing its feature extractor with the main network.
  • Fixed storing correct dones in on-policy algorithm rollout collection. (@AndyShih12)
  • Fixed number of filters in final convolutional layer in NatureCNN to match original implementation.

Others:

  • Refactored off-policy algorithms to share the same .learn() method
  • Split the collect_rollout() method for off-policy algorithms
  • Added _on_step() to the off-policy base class
  • Optimized replay buffer memory usage by removing the need for a separate next_observations numpy array
  • Optimized polyak updates (1.5-1.95x speedup) through in-place operations (@partiallytyped); a simplified sketch appears after this list
  • Switched to the black code style and added make format, make check-codestyle and commit-checks
  • Ignored errors from newer pytype version
  • Added a check when using gSDE
  • Removed codacy dependency from Dockerfile
  • Added the common.sb2_compat.RMSpropTFLike optimizer, which more closely matches the TensorFlow implementation of RMSprop (see the usage sketch after this list).
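
The in-place polyak update roughly follows the pattern below (a simplified sketch, not the library's exact code):

    import torch

    def polyak_update(params, target_params, tau):
        """Soft-update target parameters in place: target <- tau * param + (1 - tau) * target."""
        with torch.no_grad():
            for param, target_param in zip(params, target_params):
                # In-place ops avoid allocating a fresh tensor for every parameter on each update
                target_param.data.mul_(1 - tau)
                target_param.data.add_(tau * param.data)

The RMSpropTFLike optimizer can be selected through policy_kwargs; the module path in the import below reflects common usage and may differ slightly from the released package layout:

    from stable_baselines3 import A2C
    from stable_baselines3.common.sb2_compat.rmsprop_tf_like import RMSpropTFLike

    # Use the TF-like RMSprop instead of PyTorch's default RMSprop
    model = A2C(
        "MlpPolicy",
        "CartPole-v1",
        policy_kwargs=dict(optimizer_class=RMSpropTFLike),
        verbose=1,
    )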

Documentation:

  • Updated notebook links
  • Fixed a typo in the Enjoy a Trained Agent section of the RL Baselines3 Zoo README. (@blurLake)
  • Added Unity reacher to the projects page (@koulakis)
  • Added PyBullet colab notebook
  • Fixed typo in PPO example code (@joeljosephjin)
  • Fixed typo in custom policy doc (@RaphaelWag)