.. currentmodule:: torchrl.collectors

torchrl.collectors package
==========================

Using replay buffers that sample trajectories with :class:`~torchrl.collectors.MultiSyncDataCollector`
isn't currently fully supported as the data batches can come from any worker and in most cases consecutive
batches written in the buffer won't come from the same source (thereby interrupting the trajectories).

Running the Collector Asynchronously
------------------------------------

Passing a replay buffer to a collector lets the collector write its batches directly to the buffer, removing the need
to iterate over the collector explicitly.
If you want to run a data collector in the background, simply call :meth:`~torchrl.DataCollectorBase.start`:

    >>> import time
    >>> from torchrl.collectors import SyncDataCollector
    >>>
    >>> collector = SyncDataCollector(..., replay_buffer=rb)  # pass your replay buffer
    >>> collector.start()
    >>> # give the collector a little time to fill the buffer
    >>> time.sleep(10)
    >>> # start training
    >>> for i in range(optim_steps):
    ...     data = rb.sample()  # sample from the replay buffer
    ...     # rest of the training loop

Single-process collectors (:class:`~torchrl.collectors.SyncDataCollector`) run the collection on a separate thread of
the same process, so be mindful of Python's GIL and the usual multithreading restrictions.

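The snippet below expands the example above into a more complete, self-contained sketch of the single-process case.
It is illustrative rather than canonical: the environment, buffer size and frame counts are arbitrary choices, and it
assumes that leaving ``policy=None`` makes the collector fall back to a random policy (check the
:class:`~torchrl.collectors.SyncDataCollector` reference if in doubt).

    >>> import time
    >>> from torchrl.collectors import SyncDataCollector
    >>> from torchrl.data import LazyTensorStorage, ReplayBuffer
    >>> from torchrl.envs import GymEnv
    >>>
    >>> rb = ReplayBuffer(storage=LazyTensorStorage(100_000))
    >>> collector = SyncDataCollector(
    ...     GymEnv("CartPole-v1"),
    ...     policy=None,              # assumption: None falls back to a random policy
    ...     frames_per_batch=64,
    ...     total_frames=-1,          # keep collecting until shutdown
    ...     replay_buffer=rb,         # batches are written straight to the buffer
    ... )
    >>> collector.start()             # collection now runs on a background thread
    >>> time.sleep(2)                 # give the buffer some time to fill
    >>> batch = rb.sample(32)         # train on whatever has been collected so far
    >>> collector.async_shutdown()    # stop the background collection
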
Multiprocessed collectors, on the other hand, let each child process fill the buffer on its own, which truly decouples
data collection from training.

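A sketch along the same lines for the multiprocessed case could look as follows. The worker count, environment
constructors and memmap-backed storage are assumptions made for illustration (a storage that can be shared across
processes seems the natural choice here), not requirements taken from the API reference.

    >>> from torchrl.collectors import MultiSyncDataCollector
    >>> from torchrl.data import LazyMemmapStorage, ReplayBuffer
    >>> from torchrl.envs import GymEnv
    >>>
    >>> env_maker = lambda: GymEnv("CartPole-v1")
    >>> rb = ReplayBuffer(storage=LazyMemmapStorage(1_000_000))  # memmap storage can be shared across processes
    >>> collector = MultiSyncDataCollector(
    ...     [env_maker, env_maker],   # one environment constructor per worker
    ...     policy=None,
    ...     frames_per_batch=256,
    ...     total_frames=-1,
    ...     replay_buffer=rb,
    ... )
    >>> collector.start()             # the child processes fill rb on their own
    >>> # ... sample from rb in the training loop, as above ...
    >>> collector.async_shutdown()
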
Data collectors that have been started with :meth:`~torchrl.DataCollectorBase.start` should be shut down using
:meth:`~torchrl.DataCollectorBase.async_shutdown`.

.. warning:: Running a collector asynchronously decouples the collection from training, which means that training
    performance may be drastically different depending on the hardware, the load and other factors (although it is
    generally expected to provide significant speed-ups). Make sure you understand how this may affect your algorithm
    and whether it is a legitimate thing to do! For example, on-policy algorithms such as PPO should not be run
    asynchronously unless properly benchmarked.

Single node data collectors
---------------------------