Commit 00b2088

Author: Vincent Moens
Update
[ghstack-poisoned]
2 parents 2defe3e + bc0f128 commit 00b2088

1 file changed: +8 -1 lines changed

torchrl/envs/custom/llm.py

Lines changed: 8 additions & 1 deletion
@@ -31,7 +31,14 @@ class LLMEnv(EnvBase):
     integers representing a sequence of tokens.
     The action is also a string or a tensor of integers, which is concatenated to the previous observation to form the
     new observation.
-    Prompts to the language model can be loaded when the environment is ``reset`` if the environment is created via :meth:`~from_dataloader`
+
+    By default, this environment is meant to track history for a prompt. Users can append transforms to tailor
+    this to their use case, such as Chain of Thought (CoT) reasoning or other custom processing.
+
+    Users must append a transform to set the "done" condition, which would trigger the loading of the next prompt.
+
+    Prompts to the language model can be loaded when the environment is ``reset`` if the environment is created via :meth:`~from_dataloader`
+
     Args:
         observation_key (NestedKey, optional): The key in the tensordict where the observation is stored. Defaults to
             ``"observation"``.
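
The behaviour the updated docstring describes (history tracked by concatenating each action to the observation, and a user-supplied "done" condition that triggers loading the next prompt) can be illustrated with a toy sketch. This is not TorchRL code: `ToyPromptEnv`, `max_length_done`, and the `done_fn` parameter are hypothetical stand-ins written in plain Python, and the done condition is a plain callable here rather than an appended transform.

```python
from typing import Callable, Iterable, Optional, Tuple


class ToyPromptEnv:
    """Hypothetical stand-in for the documented behaviour (NOT TorchRL's LLMEnv)."""

    def __init__(
        self,
        prompts: Iterable[str],
        done_fn: Optional[Callable[[str], bool]] = None,
    ) -> None:
        self._prompts = iter(prompts)
        # Without a user-supplied condition the episode never ends, which is
        # why the docstring says a "done" condition must be added by the user.
        self._done_fn = done_fn if done_fn is not None else (lambda obs: False)
        self.observation = ""

    def reset(self) -> str:
        # Load the next prompt, mirroring "prompts ... loaded when the
        # environment is reset".
        self.observation = next(self._prompts)
        return self.observation

    def step(self, action: str) -> Tuple[str, bool]:
        # The action is concatenated to the previous observation.
        self.observation = self.observation + action
        return self.observation, self._done_fn(self.observation)


def max_length_done(limit: int) -> Callable[[str], bool]:
    """Hypothetical done condition: stop once the history reaches `limit` characters."""
    return lambda obs: len(obs) >= limit


env = ToyPromptEnv(["2+2=", "3+3="], done_fn=max_length_done(6))
first = env.reset()
second, done_a = env.step("4")
third, done_b = env.step(".")
# A "done" signal is what triggers loading the next prompt.
next_prompt = env.reset() if done_b else third
```

In the real environment the observation may also be a tensor of token integers rather than a string, as the docstring notes; the sketch uses strings only to keep the concatenation visible.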
