        reward_threshold (float): Reward threshold for triggering the transform.
        max_steps (int): Maximum number of steps allowed.
        zero_reward (bool): Whether to zero out the reward when the prompt is added.
        undo_done (bool): Whether to undo the done flag when the prompt is added.
    """

    # Different prompt strategies for different scenarios
    PROMPT_STRATEGIES = {
        "user_guidance": [
            "I notice your response doesn't follow the required format. Please provide your thinking between <think> and </think> tags, and your final answer between <answer> and </answer> tags.",
            "Your response needs to be structured properly. First think through the problem in <think> tags, then give your answer in <answer> tags.",
            "Please reconsider your response. Remember to use <think> tags for your reasoning and <answer> tags for your final response.",
        ],
        "format_reminder": [
            "Remember to use the correct format: <think>your reasoning</think><answer>your answer</answer>",
            "Please structure your response with <think> and <answer> tags as instructed.",
            "Your response should follow this format: <think>...</think><answer>...</answer>",
        ],
        "quality_hint": [
            "Let me help you improve your response. Think about this more carefully and provide a better answer.",
            "Your response could be better. Take a moment to reconsider and provide a more thoughtful answer.",
            "I think you can do better. Please think through this more carefully.",
        ],
        "thinking": [
            "But wait, let me think about this more carefully...",
            "Actually, let me reconsider this...",
            "Let me think about it step by step...",
            "Wait, I need to double-check my reasoning...",
        ],
        "step_by_step": [
            "Let me break this down step by step and think more carefully...",
            "I should approach this systematically. Let me think through each part...",
            "Let me reconsider this by going through it step by step...",
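The documented parameters and the strategy table could plausibly interact as in the sketch below. This is an illustration only, not the code from this change: the function `maybe_add_prompt`, its default values, and the trimmed strategy table are hypothetical. In particular, the diff does not say in which direction `reward_threshold` triggers; the sketch assumes a prompt is added when the reward falls below the threshold and the step budget is not yet exhausted.

```python
import random

# Hypothetical, trimmed-down copy of the strategy table, for illustration only.
PROMPT_STRATEGIES = {
    "format_reminder": [
        "Please structure your response with <think> and <answer> tags as instructed.",
    ],
    "thinking": [
        "But wait, let me think about this more carefully...",
        "Actually, let me reconsider this...",
    ],
}


def maybe_add_prompt(reward, done, step, *, strategy="thinking",
                     reward_threshold=1.0, max_steps=2,
                     zero_reward=True, undo_done=True, rng=random):
    """Return (prompt, reward, done); prompt is None when nothing is added."""
    # Assumption: trigger only for rewards below the threshold, and only
    # while fewer than max_steps steps have been taken.
    if reward >= reward_threshold or step >= max_steps:
        return None, reward, done
    # Pick one prompt at random from the chosen strategy.
    prompt = rng.choice(PROMPT_STRATEGIES[strategy])
    if zero_reward:
        reward = 0.0   # discard the reward of the rejected response
    if undo_done:
        done = False   # let the episode continue after the prompt
    return prompt, reward, done
```

Called as `maybe_add_prompt(0.2, True, 0)`, the sketch returns one of the "thinking" prompts with the reward zeroed and the done flag cleared; with a reward at or above the threshold it returns `None` and leaves both values unchanged.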