---

**TorchRL** is an open-source Reinforcement Learning (RL) library for PyTorch.

It provides PyTorch- and **Python-first**, low- and high-level abstractions for RL that are intended to be **efficient**, **modular**, **documented** and properly **tested**.

The code is aimed at supporting research in RL. Most of it is written in Python in a highly modular way, so that researchers can easily swap components, transform them, or write new ones with little effort.

This repo attempts to align with the existing PyTorch ecosystem libraries in that it has a dataset pillar ([torchrl/envs](torchrl/envs)), [transforms](torchrl/envs/transforms), [models](torchrl/modules), data utilities (e.g. collectors and containers), etc.

TorchRL aims at having as few dependencies as possible (Python standard library, numpy and PyTorch). Common environment libraries (e.g. OpenAI gym) are only optional.

On the low-level end, torchrl comes with a set of highly re-usable functionals for [cost functions](torchrl/objectives/costs), [returns](torchrl/objectives/returns) and data processing.

TorchRL aims at (1) high modularity and (2) good runtime performance.

## Features

On the high-level end, TorchRL provides:
- [`TensorDict`](torchrl/data/tensordict/tensordict.py),
  a convenient data structure<sup>(1)</sup> to pass data from
  one object to another without friction.
  `TensorDict` makes it easy to re-use pieces of code across environments, models and
  algorithms. For instance, here's how to code a rollout in TorchRL:
<details>
<summary>Code</summary>

```diff
- obs, done = env.reset()
+ tensordict = env.reset()
policy = TensorDictModule(
    model,
    in_keys=["observation_pixels", "observation_vector"],
    out_keys=["action"],
)
# ... (the rest of the rollout is elided)
```

TensorDict abstracts away the input/output signatures of the modules, envs, collectors, replay buffers and losses of the library, allowing its primitives
to be easily recycled across settings.
Here's another example of an off-policy training loop in TorchRL (assuming that a data collector, a replay buffer, a loss and an optimizer have been instantiated):

```diff
- for i, (obs, next_obs, action, hidden_state, reward, done) in enumerate(collector):
+ for i, tensordict in enumerate(collector):
# ... (loss computation and optimizer step are elided)
    optim.zero_grad()
```
Again, this training loop can be re-used across algorithms as it makes a minimal number of assumptions about the structure of the data.

TensorDict supports multiple tensor operations on its device and shape
(the shape of a TensorDict, or its batch size, is given by the first N dimensions common to all its contained tensors, for an arbitrary N):
```python
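# the original example is elided here; below is an illustrative sketch of the
# kind of device and shape operations TensorDict supports (assumed API):
import torch
from torchrl.data import TensorDict

tensordict = TensorDict(
    {"obs": torch.zeros(3, 4, 84, 84), "action": torch.zeros(3, 4, 1)},
    batch_size=[3, 4],  # the leading dims shared by all entries
)
tensordict = tensordict.to("cuda:0")  # moves every contained tensor
tensordict = tensordict.view(-1)      # reshapes the common batch dims
stacked = torch.stack([tensordict, tensordict], 0)  # stacks TensorDicts
```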
</details>

Check our [TensorDict tutorial](tutorials/tensordict.ipynb) for more information.

- An associated [`TensorDictModule` class](torchrl/modules/tensordict_module/common.py) which is [functorch](https://github.com/pytorch/functorch)-compatible!
<details>
<summary>Code</summary>

```diff
transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)
+ td_module = TensorDictModule(transformer_model, in_keys=["src", "tgt"], out_keys=["out"])
# ... (creation of the input tensordict is elided)
+ td_module(tensordict)
+ out = tensordict["out"]
```

The `TensorDictSequential` class makes it possible to branch sequences of `nn.Module` instances in a highly modular way.
For instance, here is an implementation of a transformer using the encoder and decoder blocks:
```python
# ... (definition of the encoder and decoder blocks is elided)
assert transformer.in_keys == ["src", "src_mask", "tgt"]
assert transformer.out_keys == ["memory", "output"]
```

`TensorDictSequential` makes it possible to isolate subgraphs by querying a set of desired input/output keys:
```python
transformer.select_subsequence(out_keys=["memory"])  # returns the encoder
# ... (the decoder can be isolated in the same way)
```
</details>

The corresponding [tutorial](tutorials/tensordictmodule.ipynb) provides more context about its features.

- a generic [trainer class](torchrl/trainers/trainers.py)<sup>(1)</sup> that
  executes the aforementioned training loop. Through a hooking mechanism,
  it also supports any logging or data transformation operation at any given
  time.
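<details>
<summary>Code</summary>

A minimal sketch of how such a trainer could be wired up, assuming a collector, a loss module and an optimizer as described elsewhere in this README; the constructor arguments and hook name are illustrative assumptions, not the library's verbatim API:

```python
from torchrl.trainers import Trainer

trainer = Trainer(
    collector=collector,      # a data collector, as described below
    total_frames=1_000_000,   # stop after this many collected frames
    loss_module=loss_module,  # e.g. a DQNLoss
    optimizer=optimizer,
)
# hooks can be registered at predefined points of the training loop,
# e.g. to log rewards or post-process each collected batch
trainer.register_op("batch_process", my_batch_hook)  # hypothetical hook
trainer.train()
```
</details>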

- A common [interface for environments](torchrl/envs)
  which supports common libraries (OpenAI gym, DeepMind control suite, etc.)<sup>(1)</sup> and stateless execution (e.g. model-based environments).
  The [batched environments](torchrl/envs/vec_env.py) containers allow parallel execution<sup>(2)</sup>.
  A common PyTorch-first [tensor specification class](torchrl/data/tensor_specs.py) is also provided.
<details>
<summary>Code</summary>

```python
env_make = lambda: GymEnv("Pendulum-v1", from_pixels=True)
env_parallel = ParallelEnv(4, env_make)  # creates 4 envs in parallel
# ... (rollout in the parallel env is elided)
```
</details>

- multiprocess [data collectors](torchrl/collectors/collectors.py)<sup>(2)</sup> that work synchronously or asynchronously.
  Through the use of TensorDict, TorchRL's training loops are made very similar to regular training loops in supervised
  learning (although the "dataloader" -- read data collector -- is modified on-the-fly):
<details>
<summary>Code</summary>

```python
env_make = lambda: GymEnv("Pendulum-v1", from_pixels=True)
collector = MultiaSyncDataCollector(
    [env_make, env_make],
    policy=policy,
    devices=["cuda:0", "cuda:0"],
    total_frames=10000,
    frames_per_batch=50,
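    # ... (remaining constructor arguments are elided)
)
# an illustrative continuation: the collector is iterated over like a
# dataloader, yielding TensorDicts (cf. the training-loop diff above)
for i, tensordict in enumerate(collector):
    ...  # training step
```
</details>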

- efficient<sup>(2)</sup> and generic<sup>(1)</sup> [replay buffers](torchrl/data/replay_buffers/replay_buffers.py) with modularized storage:
<details>
<summary>Code</summary>

```python
storage = LazyMemmapStorage(  # memory-mapped (physical) storage
    cfg.buffer_size,
    # ... (remaining arguments and buffer usage are elided)
```
</details>

- cross-library [environment transforms](torchrl/envs/transforms/transforms.py)<sup>(1)</sup>,
  executed on device and in a vectorized fashion<sup>(2)</sup>,
  which process and prepare the data coming out of the environments to be used by the agent:
<details>
<summary>Code</summary>

```python
env_make = lambda: GymEnv("Pendulum-v1", from_pixels=True)
env_base = ParallelEnv(4, env_make, device="cuda:0")  # creates 4 envs in parallel
env = TransformedEnv(
    env_base,
    Compose(
        ToTensorImage(),
        ObservationNorm(loc=0.5, scale=1.0)),  # executes the transforms once and on device
)
tensordict = env.reset()
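# the data comes back transformed and already on device; an illustrative
# check (the original example continues beyond this point):
assert tensordict.device == torch.device("cuda:0")
```
</details>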

- various [architectures](torchrl/modules/models/) and models (e.g. [actor-critic](torchrl/modules/tensordict_module/actors.py))<sup>(1)</sup>:
<details>
<summary>Code</summary>

```python
# create an nn.Module
common_module = ConvNet(
    # ... (intermediate lines elided)
    out_keys=["hidden"],
)
# Wrap the policy module in NormalParamsWrapper, such that the output
# tensor is split in loc and scale, and scale is mapped onto a positive space
policy_module = NormalParamsWrapper(
    MLP(
        num_cells=[64, 64],
        # ... (remaining model definition is elided)
```
</details>

- exploration [wrappers](torchrl/modules/tensordict_module/exploration.py) and
  [modules](torchrl/modules/models/exploration.py) to easily swap between exploration and exploitation<sup>(1)</sup>:
<details>
<summary>Code</summary>

```python
policy_explore = EGreedyWrapper(policy)
with set_exploration_mode("random"):
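    # within this context manager, the policy samples exploratory actions
    # (an illustrative line; the original example is elided here)
    tensordict = policy_explore(tensordict)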
```
</details>

- A series of efficient [loss modules](https://github.com/facebookresearch/rl/blob/main/torchrl/objectives/costs)
  and highly vectorized
  [functional return and advantage](https://github.com/facebookresearch/rl/blob/main/torchrl/objectives/returns/functional.py)
  computation.

<details>
<summary>Code</summary>

### Loss modules
```python
from torchrl.objectives.costs import DQNLoss
loss_module = DQNLoss(value_network=value_network, gamma=0.99)
tensordict = replay_buffer.sample(batch_size)
loss = loss_module(tensordict)
```

### Advantage computation
```python
from torchrl.objectives.returns.functional import vec_td_lambda_return_estimate
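# an illustrative call; the argument names and their order are assumptions,
# not the verbatim signature -- check the function before use
value_target = vec_td_lambda_return_estimate(gamma, lmbda, next_state_value, reward, done)
```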

</details>

- various [recipes](torchrl/trainers/helpers/models.py) to build models that
  correspond to the environment being deployed:
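<details>
<summary>Code</summary>

A hypothetical sketch of what calling one of these recipes might look like; the helper name and its arguments are assumptions based on the helpers module, not a verbatim API:

```python
from torchrl.trainers.helpers.models import make_dqn_actor

# build a DQN actor whose network matches the environment's observation and
# action specs (env and cfg are assumed to exist; cfg is the experiment config)
actor = make_dqn_actor(proof_environment=env, cfg=cfg, device="cuda:0")
```
</details>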

## Examples, tutorials and demos

A series of [examples](examples/) are provided with an illustrative purpose, and many more to come!

We also provide [tutorials and demos](tutorials) that give a sense of what the
library can do.

## Installation

Create a conda environment where the packages will be installed.

```
conda create --name torch_rl python=3.9
```

```bash
pip3 install ninja  # Makes the build go faster
pip3 install "git+https://github.com/pytorch/functorch.git"
```

If this fails, you can get the latest version of functorch that was marked to be
compatible with the current torch version:
```bash
pip3 install ninja  # Makes the build go faster
PYTORCH_VERSION=`python -c "import torch.version; print(torch.version.git_version)"`
pip3 install "git+https://github.com/pytorch/pytorch.git@$PYTORCH_VERSION#subdirectory=functorch"
```

If building this artifact on macOS M1 doesn't work correctly, or if the message
`(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))` appears at execution time,
try erasing the previously created build artifacts (`torchrl.egg-info/`, `build/`, `torchrl/_torchrl.so`)
or re-clone the library from GitHub, then try

```
ARCHFLAGS="-arch arm64" python setup.py develop
```

You can install the latest release by using
```
pip3 install torchrl
```
This should work on linux and macOS (not M1). For Windows and M1/M2 machines, one
should install the library locally (see below).

To install extra dependencies, call
```
pip3 install "torchrl[atari,dm_control,gym_continuous,rendering,tests,utils]"
```
or a subset of these.

Alternatively, as the library is at an early stage, it may be wise to install
it in develop mode, as this makes it possible to pull the latest changes and
benefit from them immediately.
Start by cloning the repo:
```
git clone https://github.com/facebookresearch/rl
cd /path/to/torchrl/
python setup.py develop
```

If building this artifact on macOS M1 doesn't work correctly, or if the message
`(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))` appears at execution time, then try

```
ARCHFLAGS="-arch arm64" python setup.py develop
```

To run a quick sanity check, leave that directory (e.g. by executing `cd ~/`)
and try to import the library.
```
python -c "import torchrl"
```
This should not return any warning or error.

**Optional dependencies**

The following libraries can be installed depending on the usage one wants to
make of torchrl:
```
# diverse
pip3 install tqdm tensorboard "hydra-core>=1.1" hydra-submitit-launcher
pip3 install moviepy

# deepmind control suite
pip3 install dm_control

# gym, atari games
pip3 install gym[atari] "gym[accept-rom-license]" pygame

pip3 install wandb
```

**Troubleshooting**

If a `ModuleNotFoundError: No module named 'torchrl._torchrl'` error occurs,
it means that the C++ extensions were not installed or not found.
One common reason might be that you are trying to import torchrl from within the
git repo location. Indeed, the following code snippet should return an error if
torchrl has not been installed in `develop` mode:
```
cd ~/path/to/rl/repo
python -c 'from torchrl.envs.libs.gym import GymEnv'
```
If this is the case, consider executing torchrl from another location.

On **macOS**, we recommend installing XCode first.
With Apple Silicon M1 chips, make sure you are using the arm64-built Python
(e.g. [here](https://betterprogramming.pub/how-to-install-pytorch-on-apple-m1-series-512b3ad9bc6)). Running the following lines of code
```
# ... (commands elided)
```

To train an algorithm, it is therefore advised to use the predefined configurations:
```
python examples/ppo/ppo.py --config=examples/ppo/configs/humanoid.txt
```
Note that using the config files requires the [configargparse](https://pypi.org/project/ConfigArgParse/) library.

One can also overwrite the config parameters using flags, e.g.
```
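# an illustrative override; the flag name is an assumption based on common
# configurable parameters, not a verbatim example from the docs
python examples/ppo/ppo.py --config=examples/ppo/configs/humanoid.txt --frames_per_batch=64
```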