Goal
Increase the speed and accuracy of the browser-use agent.
Issues
- Analyze three different types of prompts.
- Input size (time-to-first-token) affects thinking time.
- Output size (time-to-last-token) affects total response time.
| Prompt | Input Size | Visible Output Fields | Purpose |
|---|---|---|---|
| system_prompt | 5k+ tokens | {"thinking","evaluation_previous_goal", "memory", "next_goal", "action"} | Full reasoning/debug mode |
| system_prompt_no_thinking | 4.5k tokens | {"evaluation_previous_goal", "memory", "next_goal", "action"} | Production-safe with reasoning hidden |
| flash_mode | Tiny (<300 tokens) | {"memory", "action"} | High-speed runtime mode |
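For a sense of why the output sizes differ so much, here is an illustrative shape of one step's output in the two extremes (a sketch; the field names come from the table above, the values are placeholders):

```python
# Illustrative per-step outputs; field names from the table above, values are placeholders.
full_output = {
    "thinking": "...",                  # present only with system_prompt
    "evaluation_previous_goal": "...",
    "memory": "...",
    "next_goal": "...",
    "action": ["..."],                  # action payload elided
}
flash_output = {
    "memory": "...",
    "action": ["..."],                  # same actions, far fewer surrounding tokens
}
```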
Latency
| Mode | Input Tokens | Output Tokens | Time-to-First-Token | Time-to-Last-Token | Total |
|---|---|---|---|---|---|
| system_prompt | ~5000 | ~200 | 🐢 3× slower | 🐢 2× slower | 🐢 3-5× baseline |
| system_prompt_no_thinking | ~4500 | ~100 | ⚙️ 2× slower | ⚙️ 1.5× slower | ⚙️ ~2× baseline |
| flash_mode | ~300 | ~60 | ⚡ ~1× | ⚡ ~1× | ⚡ baseline |
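To see why those token counts produce the reported multipliers, a back-of-envelope model helps: total latency ≈ prefill time (input tokens) + decode time (output tokens). The throughput constants below are illustrative assumptions, not measurements:

```python
# Back-of-envelope latency model; both rates are assumed, illustrative values.
PREFILL_TOK_PER_S = 2000.0  # input-token processing rate (assumption)
DECODE_TOK_PER_S = 50.0     # output-token generation rate (assumption)

def estimated_latency_s(input_tokens: int, output_tokens: int) -> float:
    ttft = input_tokens / PREFILL_TOK_PER_S     # time-to-first-token
    decode = output_tokens / DECODE_TOK_PER_S   # first token -> last token
    return ttft + decode

for mode, inp, out in [
    ("system_prompt", 5000, 200),
    ("system_prompt_no_thinking", 4500, 100),
    ("flash_mode", 300, 60),
]:
    print(f"{mode:26s} ~{estimated_latency_s(inp, out):.2f}s per step")
```

With these assumed rates, flash_mode lands around 1.35 s per step versus ~6.5 s for the full prompt, which is roughly the 5× gap the table describes.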
Original agent input (from the system prompt):
```
<input>
At every step, your input will consist of:
1. <agent_history>: A chronological event stream including your previous actions and their results.
2. <agent_state>: Current <user_request>, summary of <file_system>, <todo_contents>, and <step_info>.
3. <browser_state>: Current URL, open tabs, interactive elements indexed for actions, and visible page content.
4. <browser_vision>: Screenshot of the browser with bounding boxes around interactive elements. If you used screenshot before, this will contain a screenshot.
5. <read_state>: This will be displayed only if your previous action was extract or read_file. This data is only shown in the current step.
</input>
```

Q: When should we use flash_mode?
- Understand the biggest bottleneck: the LLM's response time.
  - Speed: ~5× faster by switching from system_prompt to flash_mode.
  - Accuracy: ??

About my first approach, "reduce each LLM call response time":
- Prompt processing (input tokens) costs scale linearly with input size; this is what drives time-to-first-token.
- Output generation is sequential (token by token), so output size drives time-to-last-token. Both can be measured directly, as in the sketch below.
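Both effects are easy to measure by streaming a response and timestamping the first and last chunks. A minimal sketch with the OpenAI Python SDK (any streaming client works the same way; the model name and prompt are placeholders, not what browser-use calls internally):

```python
import time
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()
prompt = "..."  # placeholder: the agent prompt whose latency you want to profile

start = time.perf_counter()
first_token_at = None
stream = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": prompt}],
    stream=True,
)
for chunk in stream:
    if first_token_at is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_at = time.perf_counter()  # time-to-first-token (driven by input size)
last_token_at = time.perf_counter()  # time-to-last-token (grows with output size)

print(f"TTFT: {first_token_at - start:.2f}s, total: {last_token_at - start:.2f}s")
```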
Where is this coming from?? The agent construction below, from browser-use's speed-optimization example, is what enables flash_mode:
```python
# 4. Create agent with all speed optimizations
agent = Agent(
    task=task,
    llm=llm,
    flash_mode=True,  # Disables thinking in the LLM output for maximum speed
    browser_profile=browser_profile,
    extend_system_message=SPEED_OPTIMIZATION_PROMPT,
)
```

3 types of agent from browser-use
```python
def _load_prompt_template(self) -> None:
    """Load the prompt template from the markdown file."""
    try:
        # Choose the appropriate template based on flash_mode and use_thinking settings
        if self.flash_mode:
            template_filename = 'system_prompt_flash.md'
        elif self.use_thinking:
            template_filename = 'system_prompt.md'
        else:
            template_filename = 'system_prompt_no_thinking.md'
        # ... (snippet truncated: the method goes on to read the selected template file)
```
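Putting it together, the three templates correspond to Agent constructor flags. A quick sketch (assuming `use_thinking` defaults to True, which the elif ordering above implies; other arguments as in the earlier example):

```python
# Sketch: selecting each of the three prompt templates via constructor flags.
agent_full = Agent(task=task, llm=llm)                      # -> system_prompt.md (thinking on)
agent_lean = Agent(task=task, llm=llm, use_thinking=False)  # -> system_prompt_no_thinking.md
agent_flash = Agent(task=task, llm=llm, flash_mode=True)    # -> system_prompt_flash.md
```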