@@ -80,7 +80,6 @@ The Parallel API is similar to Gymnasium, with the difference that actions, rewa
    env.close()

.. note::

   We can convert between environments with AEC and Parallel API using
@@ -89,6 +88,35 @@ The Parallel API is similar to Gymnasium, with the difference that actions, rewa
Moreover, we can convert PettingZoo environments in which all agents share the same action and observation spaces to
a vectorized Gymnasium environment that concatenates all the actions, observations and other info using `SuperSuit wrappers <https://github.com/Farama-Foundation/SuperSuit/blob/master/supersuit/vector/vector_constructors.py>`_. This way, we can use ML libraries that work with Gymnasium to train distributed multi-agent systems.
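
A minimal sketch of both conversions, assuming a PettingZoo parallel environment ``penv`` whose agents all share the same spaces; the wrapper names below are the standard PettingZoo/SuperSuit ones and may differ between versions:

.. code-block:: python

    import supersuit
    from pettingzoo.utils.conversions import aec_to_parallel, parallel_to_aec

    # convert between the two PettingZoo APIs (penv is a placeholder parallel environment)
    aec_env = parallel_to_aec(penv)
    parallel_again = aec_to_parallel(aec_env)

    # stack the (homogeneous) agents into a single vectorized environment;
    # base_class="gymnasium" is an assumption that depends on the SuperSuit version
    venv = supersuit.pettingzoo_env_to_vec_env_v1(penv)
    venv = supersuit.concat_vec_envs_v1(venv, num_vec_envs=1, base_class="gymnasium")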

TorchRL
-------

`TorchRL <https://docs.pytorch.org/rl/stable/index.html>`_ is an open-source Reinforcement Learning (RL) library for PyTorch.

TorchRL environments are based on the same Markov Decision Process cycle but expose a different API: both the input and the output of ``environment.step`` are dictionaries from `tensordict <https://docs.pytorch.org/tensordict/stable/index.html>`_ that hold actions, observations, rewards, and so on, in separate keys.

TorchRL environments can be constructed from Gymnasium and PettingZoo environments (among others).
The following is a TorchRL cycle similar to the previous ones.
Note that TorchRL policies also operate on dictionaries of tensors (input and output).

.. code-block:: python

    from torchrl.envs import GymEnv
    from torchrl.envs.utils import step_mdp

    environment = GymEnv("MyEnvironment")
    environment.set_seed(0)
    td = environment.reset()

    for _ in range(1000):
        td = my_torchrl_policy(td)
        td = environment.step(td)

        if td['next', 'terminated'] or td['next', 'truncated']:
            td = environment.reset()
        else:
            # move the 'next' entries to the root so that the policy
            # sees the new observation at the next iteration
            td = step_mdp(td)

    environment.close()

One important difference between PettingZoo and TorchRL environments is that agents can be grouped together. For example, in an environment with 2 green agents and 2 blue agents (where same-colored agents share the same type of actions and observations), the dictionary ``td`` in the example above would have keys like ``("green", "next", "observation")`` and ``("blue", "next", "observation")`` that hold tensors with the observations from *both* agents of the same color.

Navground
---------
@@ -166,6 +194,25 @@ By specifying
- :py:class:`.ControlActionConfig` where the policy outputs a control command
- :py:class:`.ModulationActionConfig` where the policy outputs parameters of an underlying deterministic navigation behavior.

For example, to create a single-agent environment:

.. code-block:: python

    import gymnasium as gym
    from navground import sim
    from navground.learning import DefaultObservationConfig, ControlActionConfig
    from navground.learning.rewards import SocialReward
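
A minimal sketch of how the construction could continue from these imports; the Gymnasium registration id ``"navground"`` and the pre-built ``scenario`` and ``sensor`` objects are assumptions, with the remaining arguments mirroring the multi-agent example below:

.. code-block:: python

    # the "navground" id and the scenario/sensor objects are assumptions
    env = gym.make("navground",
                   scenario=scenario,
                   sensor=sensor,
                   action=ControlActionConfig(),
                   observation=DefaultObservationConfig(),
                   reward=SocialReward(),
                   time_step=0.1,
                   max_episode_steps=600)

    observation, info = env.reset(seed=0)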
@@ -174,15 +221,56 @@ Similarly, :py:class:`.parallel_env.MultiAgentNavgroundEnv` provides a environme
:py:func:`.parallel_env.parallel_env` instantiates an environment where different agents may use different configurations (such as action spaces, rewards, ...), while
:py:func:`.parallel_env.shared_parallel_env` instantiates an environment where all specified agents share the same configuration.

.. code-block:: python

    import gymnasium as gym
    from navground import sim
    from navground.learning.parallel_env import shared_parallel_env
    from navground.learning import DefaultObservationConfig, ControlActionConfig
    from navground.learning.rewards import SocialReward

    penv = shared_parallel_env(scenario=scenario,
                               sensor=sensor,
                               action=ControlActionConfig(),
                               observation=DefaultObservationConfig(),
                               reward=SocialReward(),
                               time_step=0.1,
                               max_episode_steps=600)

The rest of the functionality is very similar to the Gymnasium environment (in fact, they share the same base class), but conforms to the PettingZoo API instead.

For example, stepping the multi-agent environment created above, where all agents share the same configuration, follows the usual Parallel API cycle, as sketched below.
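
A minimal sketch of that cycle; sampling one random action per agent stands in for a trained policy:

.. code-block:: python

    observations, infos = penv.reset(seed=0)

    while penv.agents:
        # one action per live agent, indexed like the observations
        actions = {agent: penv.action_space(agent).sample() for agent in penv.agents}
        observations, rewards, terminations, truncations, infos = penv.step(actions)

    penv.close()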

TorchRL Navground Environment
-----------------------------

Navground and TorchRL both support PettingZoo environments, therefore it is straightforward to create TorchRL environments with navground components:

.. code-block:: python

    from torchrl.envs.libs.pettingzoo import PettingZooWrapper
    from navground.learning.parallel_env import shared_parallel_env
    from navground.learning.wrappers.name_wrapper import NameWrapper

    penv = shared_parallel_env(...)
    env = PettingZooWrapper(NameWrapper(penv),
                            categorical_actions=False,
                            device='cpu',
                            seed=0,
                            return_state=penv.has_state)

:py:class:`.wrappers.name_wrapper.NameWrapper` converts from an environment where agents are indexed by integers to one where they are indexed by strings, which TorchRL requires.

Function :py:func:`.utils.benchmarl.make_env` provides the same functionality.
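
As a quick sanity check of the wrapped environment, standard TorchRL utilities can be used (a sketch; ``check_env_specs`` and ``rollout`` are generic TorchRL calls, not navground-specific):

.. code-block:: python

    from torchrl.envs.utils import check_env_specs

    check_env_specs(env)              # compares the declared specs against an actual reset/step
    td = env.rollout(max_steps=100)   # short random rollout, returned as a nested tensordict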

Train ML policies in navground
==============================

.. note::

-  Have a look at the tutorials to see the interaction between gymnasium and navground in action and how to use it to train a navigation policy using IL or RL.
+  Have a look at the :doc:`tutorials<tutorials/index>` to see the interaction between gymnasium and navground in action and how to use it to train a navigation policy using IL or RL.

Imitation Learning
------------------
@@ -258,6 +346,33 @@ We instantiate the parallel environment using :py:func:`.parallel_env.shared_par
    psac.save("PSAC")


Multi-agent Reinforcement Learning with BenchMARL
-------------------------------------------------

`BenchMARL <https://github.com/facebookresearch/BenchMARL>`_ provides implementations of Multi-Agent Reinforcement Learning (MARL) algorithms that extend beyond the parallel MARL family just described. They can tackle problems that feature heterogeneous agents, which do not share the same policy and therefore cannot be "stacked" together. Moreover, MARL-specific algorithms are designed to reduce the instabilities that arise in multi-agent training, where each agent learns its policy among other agents whose behavior keeps evolving.

We provide utilities that simplify training navigation policies with BenchMARL, for example using the multi-agent version of SAC:

.. code-block:: python

    from navground.learning.parallel_env import shared_parallel_env
    from benchmarl.algorithms import MasacConfig
    from benchmarl.models.mlp import MlpConfig
    from benchmarl.experiment import ExperimentConfig
    from navground.learning.utils.benchmarl import NavgroundExperiment
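
A minimal sketch of how an experiment could be assembled from these imports; the ``get_from_yaml`` helpers and ``run()`` are standard BenchMARL calls, while the constructor arguments of ``NavgroundExperiment`` are assumptions:

.. code-block:: python

    penv = shared_parallel_env(...)  # as in the previous sections

    # default algorithm/model/experiment configurations shipped with BenchMARL
    algorithm_config = MasacConfig.get_from_yaml()
    model_config = MlpConfig.get_from_yaml()
    experiment_config = ExperimentConfig.get_from_yaml()

    # NavgroundExperiment wires the navground parallel environment into a BenchMARL
    # experiment; the keyword names below are hypothetical
    experiment = NavgroundExperiment(env=penv,
                                     algorithm_config=algorithm_config,
                                     model_config=model_config,
                                     config=experiment_config,
                                     seed=0)
    experiment.run()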

… that transforms a TorchRL policy to a PyTorch policy.

In practice, we do not need to perform the configuration manually. Instead, we can load it from a YAML file (exported e.g. using :py:func:`.io.export_policy_as_behavior`), as is common in navground:

.. code-block:: yaml
@@ -350,7 +473,7 @@ Acknowledgement and disclaimer

The work was supported in part by `REXASI-PRO <https://rexasi-pro.spindoxlabs.com>`_ H-EU project, call HORIZON-CL4-2021-HUMAN-01-01, Grant agreement no. 101070028.

docs/source/tutorials/basics/Gymnasium.ipynb (1 addition, 1 deletion)
@@ -506,7 +506,7 @@
"which we can use to define a policy that simply asks the agent to actuate the action that it has already computed. \n",
"\n",
"Let us collect the reward from the navground policy in this way. \n",
-"To understand the scale, in this case, the reward assigned at each step in maximal (=0) when the agents move straight towards the goal at optimal speed. When the entire safety margin is violated, it gets a penality of at most -1, the same value it gets if it stay in-place, while moving in the opposite target direction at optimal speed gets a -2. Therefore, we can expect an average reward between -1 and 0, and possibly near to 0 for a well-performing navigation behavior."
+"To understand the scale, in this case, the reward assigned at each step is maximal (=0) when the agents move straight towards the goal at optimal speed. When the entire safety margin is violated, it gets a penalty of at most -1, the same value it gets if it stays in place, while moving in the opposite target direction at optimal speed gets a -2. Therefore, we can expect an average reward between -1 and 0, and possibly close to 0 for a well-performing navigation behavior."

docs/source/tutorials/crossing/Training-MA.ipynb (1 addition, 1 deletion)
@@ -2825,7 +2825,7 @@
"id": "fad18282-974b-415b-abaf-888baa42461e",
"metadata": {},
"source": [
-"Training in `penv` requires significanlty more steps than in `env` but takes a much shorter time. A similar number of steps per agent are required (about 30K)."
+"Training in `penv` requires significantly more steps than in `env` but takes a much shorter time. A similar number of steps per agent is required (about 30K)."

docs/source/tutorials/pad/Communication/Comm-SAC-Split.ipynb (1 addition, 1 deletion)
@@ -7,7 +7,7 @@
"source": [
"# Distributed policy with comm trained using parallel SAC: split model\n",
"\n",
-"In this notebook, we test a variant of :doc:`Comm-SAC`, where we split the communication and acceleration models in a way that the transmitted communication *does not* depend on the received message but only on the other observations. \n",
+"In this notebook, we test a variant of [the previous notebook](./Comm-SAC.ipynb), where we split the communication and acceleration models in a way that the transmitted communication *does not* depend on the received message but only on the other observations. \n",
"\n",
"The agents exchange 1 float and we share part of the reward."