Release v2.9.0 #629

Merged: 7 commits, Dec 19, 2019
7 changes: 6 additions & 1 deletion docs/misc/changelog.rst
@@ -6,9 +6,11 @@ Changelog
For download links, please look at `Github release page <https://github.com/hill-a/stable-baselines/releases>`_.


-Pre-Release 2.9.0a0 (WIP)
+Release 2.9.0 (2019-12-20)
--------------------------

+*Reproducible results, automatic `VecEnv` wrapping, env checker and more usability improvements*

Breaking Changes:
^^^^^^^^^^^^^^^^^
- The `seed` argument has been moved from `learn()` method to model constructor
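A minimal sketch of this breaking change (the model class and environment id are illustrative; per the changelog, the seed now goes to the model constructor):

```python
import gym
from stable_baselines import PPO2

env = gym.make('CartPole-v1')

# Before 2.9.0 the seed was passed to learn():
#   model.learn(total_timesteps=10000, seed=0)
# From 2.9.0 on it is set in the constructor:
model = PPO2('MlpPolicy', env, seed=0)
model.learn(total_timesteps=10000)
```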
@@ -59,6 +61,9 @@ Others:
- Add pull request template
- Replaced redundant code in load_results (@jbulow)
- Minor PEP8 fixes in dqn.py (@justinkterry)
+- Add a message to the assert in `PPO2`
+- Update replay buffer docstring
+- Fix `VecEnv` docstrings

Documentation:
^^^^^^^^^^^^^^
2 changes: 1 addition & 1 deletion setup.py
@@ -146,7 +146,7 @@
license="MIT",
long_description=long_description,
long_description_content_type='text/markdown',
version="2.9.0a0",
version="2.9.0",
)

# python setup.py sdist
2 changes: 1 addition & 1 deletion stable_baselines/__init__.py
@@ -20,4 +20,4 @@
from stable_baselines.trpo_mpi import TRPO
del mpi4py

__version__ = "2.9.0a0"
__version__ = "2.9.0"
3 changes: 2 additions & 1 deletion stable_baselines/common/vec_env/dummy_vec_env.py
@@ -12,7 +12,8 @@ class DummyVecEnv(VecEnv):
multiprocess or multithread outweighs the environment computation time. This can also be used for RL methods that
require a vectorized environment, but that you want a single environment to train with.

-:param env_fns: ([Gym Environment]) the list of environments to vectorize
+:param env_fns: ([callable]) A list of functions that will create the environments
+    (each callable returns a `gym.Env` instance when called).
"""

def __init__(self, env_fns):
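A minimal usage sketch of the clarified signature (the environment id and number of copies are illustrative):

```python
import gym
from stable_baselines.common.vec_env import DummyVecEnv

# `env_fns` is a list of zero-argument callables, each returning a `gym.Env`.
env = DummyVecEnv([lambda: gym.make('CartPole-v1') for _ in range(4)])

obs = env.reset()  # batched observations, one row per wrapped environment
obs, rewards, dones, infos = env.step([env.action_space.sample() for _ in range(4)])
env.close()
```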
3 changes: 2 additions & 1 deletion stable_baselines/common/vec_env/subproc_vec_env.py
@@ -62,7 +62,8 @@ class SubprocVecEnv(VecEnv):
``if __name__ == "__main__":`` block.
For more information, see the multiprocessing documentation.

-:param env_fns: ([Gym Environment]) Environments to run in subprocesses
+:param env_fns: ([callable]) A list of functions that will create the environments
+    (each callable returns a `gym.Env` instance when called).
:param start_method: (str) method used to start the subprocesses.
Must be one of the methods returned by multiprocessing.get_all_start_methods().
Defaults to 'forkserver' on available platforms, and 'spawn' otherwise.
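A minimal sketch of the same callable-based signature with subprocesses; the `make_env` helper is illustrative, and the `if __name__ == "__main__":` guard is the one the docstring above requires:

```python
import gym
from stable_baselines.common.vec_env import SubprocVecEnv

def make_env(env_id, seed):
    # Return the zero-argument callable expected by `env_fns`,
    # not the environment instance itself.
    def _init():
        env = gym.make(env_id)
        env.seed(seed)
        return env
    return _init

if __name__ == "__main__":  # required: the subprocesses re-import this module
    env = SubprocVecEnv([make_env('CartPole-v1', seed=i) for i in range(4)],
                        start_method='spawn')
    obs = env.reset()
    env.close()
```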
8 changes: 4 additions & 4 deletions stable_baselines/deepq/replay_buffer.py
@@ -22,7 +22,7 @@ def __len__(self):

@property
def storage(self):
"""[(np.ndarray, float, float, np.ndarray, bool)]: content of the replay buffer"""
"""[(Union[np.ndarray, int], Union[np.ndarray, int], float, Union[np.ndarray, int], bool)]: content of the replay buffer"""
return self._storage

@property
@@ -52,10 +52,10 @@ def add(self, obs_t, action, reward, obs_tp1, done):
"""
add a new transition to the buffer

-:param obs_t: (Any) the last observation
-:param action: ([float]) the action
+:param obs_t: (Union[np.ndarray, int]) the last observation
+:param action: (Union[np.ndarray, int]) the action
:param reward: (float) the reward of the transition
-:param obs_tp1: (Any) the current observation
+:param obs_tp1: (Union[np.ndarray, int]) the current observation
:param done: (bool) is the episode done
"""
data = (obs_t, action, reward, obs_tp1, done)
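A short sketch of the documented `add()` signature (the buffer size and transition values are illustrative):

```python
import numpy as np
from stable_baselines.deepq.replay_buffer import ReplayBuffer

buffer = ReplayBuffer(size=1000)

# Per the updated docstring, observations and actions may be
# `np.ndarray`s or plain ints (e.g. for discrete spaces).
obs_t = np.zeros(4, dtype=np.float32)
obs_tp1 = np.ones(4, dtype=np.float32)
buffer.add(obs_t, action=1, reward=0.5, obs_tp1=obs_tp1, done=False)

obses_t, actions, rewards, obses_tp1, dones = buffer.sample(batch_size=1)
```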
6 changes: 5 additions & 1 deletion stable_baselines/ppo2/ppo2.py
@@ -324,7 +324,11 @@ def learn(self, total_timesteps, callback=None, log_interval=1, tb_log_name="PPO

n_updates = total_timesteps // self.n_batch
for update in range(1, n_updates + 1):
-assert self.n_batch % self.nminibatches == 0
+assert self.n_batch % self.nminibatches == 0, ("The number of minibatches (`nminibatches`) "
+                                               "is not a factor of the total number of samples "
+                                               "collected per rollout (`n_batch`), "
+                                               "some samples won't be used.")
batch_size = self.n_batch // self.nminibatches
t_start = time.time()
frac = 1.0 - (update - 1.0) / n_updates
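A minimal sketch of the constraint the new message explains (the `n_envs` and `n_steps` values are illustrative):

```python
# `n_batch` is the number of samples collected per rollout:
# the number of environments times the steps taken in each.
n_envs, n_steps = 4, 128
n_batch = n_envs * n_steps  # 512 samples per rollout

nminibatches = 4
assert n_batch % nminibatches == 0  # OK: 512 / 4 = 128 samples per minibatch

# nminibatches = 5 would trip the assert in `learn()` above,
# since 512 % 5 != 0 and some samples could not be used.
```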