Commit a6f9111

amend
1 parent a171e32 commit a6f9111

2 files changed: +1 -2 lines changed

sota-implementations/grpo/grpo_utils.py

Lines changed: 1 addition & 1 deletion
@@ -548,7 +548,7 @@ def make_env(cfg: DictConfig, devices: list[int] | None = None):
             AddThinkingPrompt(
                 cond=lambda td: td["reward"] <= reward_threshold
                 and td["step_count"] < max_steps,
-                role="user",
+                role="assistant",
                 edit_last_turn=False,
                 zero_reward=False,
                 undo_done=True,
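
The `cond` argument in the hunk above is a predicate over the step data: it is true only while the reward is at or below `reward_threshold` and the step budget `max_steps` has not been reached. A minimal sketch of that predicate follows. It assumes plain dicts in place of the real tensordict `td`, and the numeric values for `reward_threshold` and `max_steps` are hypothetical (the diff does not show them). It does not reproduce what `AddThinkingPrompt` does with the result.

```python
# Illustrative only: plain dicts stand in for the tensordict `td`.
# Both values below are hypothetical; the diff does not show the real config.
reward_threshold = 0.5
max_steps = 3


def cond(td: dict) -> bool:
    """Same boolean expression as the lambda passed to AddThinkingPrompt."""
    return td["reward"] <= reward_threshold and td["step_count"] < max_steps


low_reward_early = {"reward": 0.1, "step_count": 1}
high_reward_early = {"reward": 0.9, "step_count": 1}
low_reward_late = {"reward": 0.1, "step_count": 3}

print(cond(low_reward_early))   # True: low reward and steps remain
print(cond(high_reward_early))  # False: reward is above the threshold
print(cond(low_reward_late))    # False: step budget is exhausted
```

Because `and` binds more loosely than the comparisons, the two-line lambda in the diff needs no extra parentheses beyond the enclosing call.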

torchrl/envs/llm/datasets/ifeval.py

Lines changed: 0 additions & 1 deletion
@@ -8,7 +8,6 @@
 
 import torch
 from tensordict import NonTensorData, NonTensorStack, TensorClass, TensorDict
-from torchrl._utils import logger as torchrl_logger
 from torchrl.data import Composite, NonTensor, Unbounded
 from torchrl.envs import StepCounter
 from torchrl.envs.llm.chat import DatasetChatEnv
