-
Notifications
You must be signed in to change notification settings - Fork 159
Open
Description
in `nes_env.py` in `nes_py`, function `step` return 4 params
Returns:
a tuple of:
- state (np.ndarray): next frame as a result of the given action
- reward (float) : amount of reward returned after given action
- done (boolean): whether the episode has ended
- info (dict): contains auxiliary diagnostic information
but in `core.py` in `gym`, function `step` return 5 params
Returns:
observation (object): this will be an element of the environment's :attr:`observation_space`.
This may, for instance, be a numpy array containing the positions and velocities of certain objects.
reward (float): The amount of reward returned as a result of taking the action.
terminated (bool): whether a `terminal state` (as defined under the MDP of the task) is reached.
In this case further step() calls could return undefined results.
truncated (bool): whether a truncation condition outside the scope of the MDP is satisfied.
Typically a timelimit, but could also be used to indicate agent physically going out of bounds.
Can be used to end the episode prematurely before a `terminal state` is reached.
info (dictionary): `info` contains auxiliary diagnostic information (helpful for debugging, learning, and logging).
This might, for instance, contain: metrics that describe the agent's performance state, variables that are
hidden from observations, or individual reward terms that are combined to produce the total reward.
It also can contain information that distinguishes truncation and termination, however this is deprecated in favour
of returning two booleans, and will be removed in a future version.
EtorArza, DannyLuna17, Montinger, taskinc, ArjunArasappan and 1 more
Metadata
Metadata
Assignees
Labels
No labels