Test-Step 4: Choose the best agent system_prompt #95

@zk1tty

Description

Goal

Increase the speed and accuracy of the browser-use agent.

Issues

  1. Analyze three different types of prompts.
  • Input size (time-to-first-token) affects thinking time.
  • Output size (time-to-last-token) affects total response time.
| Prompt | Size | Visible Output Fields | Purpose |
| --- | --- | --- | --- |
| system_prompt | 5k+ tokens | {"thinking", "evaluation_previous_goal", "memory", "next_goal", "action"} | Full reasoning/debug mode |
| system_prompt_no_thinking | 4.5k tokens | {"evaluation_previous_goal", "memory", "next_goal", "action"} | Production-safe with reasoning hidden |
| flash_mode | Tiny (<300 tokens) | {"memory", "action"} | High-speed runtime mode |

Latency

| Mode | Input Tokens | Output Tokens | Time-to-First-Token | Time-to-Last-Token | Total |
| --- | --- | --- | --- | --- | --- |
| system_prompt | ~5000 | ~200 | 🐢 3× slower | 🐢 2× slower | 🐢 3-5× baseline |
| system_prompt_no_thinking | ~4500 | ~100 | ⚙️ 2× slower | ⚙️ 1.5× slower | ⚙️ ~2× baseline |
| flash_mode | ~300 | ~60 | ⚡ ~1× | ⚡ ~1× | ⚡ baseline |
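As a rough check on the multipliers above, total latency can be modeled as prefill time (linear in input tokens) plus sequential decode time (linear in output tokens). The throughput numbers below are illustrative assumptions, not measured values:

```python
def estimated_latency(input_tokens: int, output_tokens: int,
                      prefill_rate: float = 2000.0,  # tokens/s, assumed
                      decode_rate: float = 50.0) -> float:  # tokens/s, assumed
    """Rough per-call latency model: prefill is parallel and cheap,
    decoding is sequential and dominates total response time."""
    return input_tokens / prefill_rate + output_tokens / decode_rate

flash = estimated_latency(300, 60)          # baseline
no_thinking = estimated_latency(4500, 100)
full = estimated_latency(5000, 200)
print(f"full/flash ≈ {full / flash:.1f}x")  # lands in the 3-5x range from the table
```

With these assumed rates, cutting both the prompt and the visible output fields (flash_mode) recovers most of the speedup from the output side, which is why output size matters more than input size per call.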

Original agent input

```
<input>
At every step, your input will consist of:
1. <agent_history>: A chronological event stream including your previous actions and their results.
2. <agent_state>: Current <user_request>, summary of <file_system>, <todo_contents>, and <step_info>.
3. <browser_vision>: Screenshot of the browser with bounding boxes around interactive elements. If you used screenshot before, this will contain a screenshot.
4. <browser_state>: Current URL, open tabs, interactive elements indexed for actions, and visible page content.
5. <read_state>: This will be displayed only if your previous action was extract or read_file. This data is only shown in the current step.
</input>
```

Q: When should we use flash_mode?

  1. Understand the biggest bottleneck.

    • Speed: ~5× faster going from system_prompt to flash_mode.
    • Accuracy: still unmeasured.
  2. LLM response time.

About my first approach: "reduce each LLM call's response time"

  • Prompt processing (input tokens) costs roughly linearly with prompt size.

  • Output generation is sequential, so output token count dominates total response time.

  • Where is the flash_mode flag coming from?

	# 4. Create agent with all speed optimizations
	# (llm, browser_profile, and SPEED_OPTIMIZATION_PROMPT are defined
	# earlier in the linked example)
	agent = Agent(
		task=task,
		llm=llm,
		flash_mode=True,  # Disables thinking in the LLM output for maximum speed
		browser_profile=browser_profile,
		extend_system_message=SPEED_OPTIMIZATION_PROMPT,
	)

https://github.com/browser-use/browser-use/blob/515f8e735d931149370ea5be33863a9bb6e22770/examples/getting_started/05_fast_agent.py

3 types of agent prompt templates in browser-use

	def _load_prompt_template(self) -> None:
		"""Load the prompt template from the markdown file."""
		try:
			# Choose the appropriate template based on flash_mode and use_thinking settings
			if self.flash_mode:
				template_filename = 'system_prompt_flash.md'
			elif self.use_thinking:
				template_filename = 'system_prompt.md'
			else:
				template_filename = 'system_prompt_no_thinking.md'

at browser_use/agent/prompts.py#L40-L49
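The branching above reduces to a pure function of the two flags. The helper name is hypothetical; the filenames are copied from the snippet:

```python
def select_template(flash_mode: bool, use_thinking: bool) -> str:
    """Mirror the branch order in browser_use/agent/prompts.py:
    flash_mode takes precedence over use_thinking."""
    if flash_mode:
        return 'system_prompt_flash.md'
    if use_thinking:
        return 'system_prompt.md'
    return 'system_prompt_no_thinking.md'
```

One consequence of the branch order: setting `flash_mode=True` makes `use_thinking` irrelevant, so the three modes are mutually exclusive at runtime.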
