
Conversation

@lllllyh01 (Collaborator)

Description

Describe your changes in detail (optional if the linked issue already contains a detailed description of the changes).
Fixes #3219
Currently, ChatAgent only records token usage at the step level. This PR adds finer-grained token usage tracking for each tool call within a step.
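In outline, the per-tool-call accounting described here could look something like the following sketch. The names (`ToolCallRecord`, `StepUsage`, `count_tokens`) are hypothetical illustrations, not the PR's actual code, and the whitespace word count is a crude stand-in for a real tokenizer:

```python
from dataclasses import dataclass, field
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace word count.
    return len(text.split())


@dataclass
class ToolCallRecord:
    """Token usage for a single tool call within a step."""

    tool_name: str
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


@dataclass
class StepUsage:
    """Aggregates per-tool-call records for one agent step."""

    tool_calls: List[ToolCallRecord] = field(default_factory=list)

    def record(
        self, tool_name: str, args_text: str, result_text: str
    ) -> ToolCallRecord:
        # One record per tool call, so usage can be inspected
        # at a finer granularity than the step total.
        rec = ToolCallRecord(
            tool_name=tool_name,
            input_tokens=count_tokens(args_text),
            output_tokens=count_tokens(result_text),
        )
        self.tool_calls.append(rec)
        return rec

    @property
    def total_tokens(self) -> int:
        return sum(r.total_tokens for r in self.tool_calls)
```

The step-level total then falls out as the sum over the per-call records, so the existing step-level reporting stays consistent with the new breakdown.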

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have read the CONTRIBUTION guide (required)
  • I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • I have checked if any dependencies need to be added or updated in pyproject.toml and uv lock
  • I have updated the tests accordingly (required for a bug fix or a new feature)
  • I have updated the documentation if needed:
  • I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

@github-actions github-actions bot added the Review Required PR need to be reviewed label Oct 24, 2025

coderabbitai bot commented Oct 24, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@Wendong-Fan Wendong-Fan left a comment


Thanks @lllllyh01 for the PR; the calculation logic needs to be fixed.

output_tokens = self._count_tokens(output_text)

# Calculate total tokens
total_tokens = profile.base_tokens + input_tokens + output_tokens
Member

It seems the token calculation might be unreliable. Could you clarify the reasoning for setting the base token count to 100?

Collaborator Author

Thanks Wendong. The base_token was originally introduced to represent potential differences in system prompts and schemas between different tools. However, after reviewing this implementation, I think it is overly complex and hard to track. Therefore, I removed base_token in the latest version and now rely on the input and tool response to represent token usage.

self._initialize_default_profiles()
)

def _initialize_default_profiles(self) -> Dict[str, ToolCostProfile]:
Member

I think this part is not necessary and would be hard to maintain.

Collaborator Author

Thank you for pointing it out. I've removed this part in the latest version.

Comment on lines 187 to 188
# Extract text from common result fields
text_fields = ["content", "text", "message", "result", "output"]
Member

Would this be able to extract all the text information?

Collaborator Author

I've switched to token_counter.count_tokens_from_messages for token counting and this part is removed.

Comment on lines 201 to 207
try:
return len(self.token_counter.encode(text))
except Exception as e:
logger.error(f"Error counting tokens: {e}")
pass

return len(text.split())
Member

Why not use token_counter.count_tokens_from_messages?

@lllllyh01 (Collaborator Author) Nov 1, 2025

Thank you for pointing it out. I had overlooked this function before; I've updated the code to use token_counter.count_tokens_from_messages in the latest version.

Besides, I see there is another token-counting method in ChatAgent named _get_token_count. Both use the backend model's token_counter, but their input formats and implementations differ slightly. Could you provide some context on their differences and their respective use cases?
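The two code paths being compared here differ in what they count over: raw text versus structured chat messages, which carry extra framing tokens per message. A toy stand-in class (not camel's real token counter; the per-message overhead of 4 tokens mirrors OpenAI's documented chat framing cost but is only an approximation) illustrates the gap:

```python
from typing import Dict, List


class ToyTokenCounter:
    """Toy counter contrasting raw-text encoding with message-aware counting."""

    def encode(self, text: str) -> List[str]:
        # Stand-in "tokenization": split on whitespace.
        return text.split()

    def count_tokens_from_messages(
        self, messages: List[Dict[str, str]]
    ) -> int:
        # Message-aware counting: content tokens plus framing
        # overhead for the role and separators of each message.
        total = 0
        for msg in messages:
            total += 4  # approximate per-message overhead
            total += len(self.encode(msg.get("content", "")))
        return total


counter = ToyTokenCounter()
raw = len(counter.encode("hello world"))
framed = counter.count_tokens_from_messages(
    [{"role": "user", "content": "hello world"}]
)
```

For identical content, the message-based count is always at least as large as the raw-text count, which is why the two methods give slightly different numbers even when backed by the same tokenizer.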

Comment on lines 26 to 58
class ToolCategory(Enum):
"""Categories of tools based on their typical token consumption."""

# Low-cost tools (typically < 100 tokens)
SEARCH_API = "search_api" # Google Search, Bing Search, etc.
SIMPLE_UTILITY = (
"simple_utility" # Basic file operations, simple calculations
)

# Medium-cost tools (100-1000 tokens)
CODE_EXECUTION = "code_execution" # Python execution, shell commands
DOCUMENT_PROCESSING = "document_processing" # PDF parsing, text analysis
API_CALLS = "api_calls" # REST API calls, webhooks

# High-cost tools (1000+ tokens)
BROWSER_AUTOMATION = (
"browser_automation" # Browser interactions, screenshots
)
MULTIMODAL_PROCESSING = (
"multimodal_processing" # Image analysis, audio processing
)
LLM_CALLS = "llm_calls" # Sub-agent calls, complex reasoning


@dataclass
class ToolCostProfile:
"""Cost profile for a specific tool type."""

category: ToolCategory
base_tokens: int # Base token consumption


class ToolCostInfo(TypedDict):
Member

I think these classes may not be necessary.

Collaborator Author

Indeed, thank you for pointing it out. I've removed this part in the latest version.

@Wendong-Fan Wendong-Fan added Waiting for Update PR has been reviewed, need to be updated based on review comment and removed Review Required PR need to be reviewed labels Oct 25, 2025
@Wendong-Fan Wendong-Fan modified the milestones: Sprint 41, Sprint 40 Oct 27, 2025
logger.error(f"Error counting tokens: {e}")
pass

return len(text.split())
Collaborator

What's the justification for using len(text.split()) as a fallback measure for token counting? Are there any references showing the relation between word count and token count?

@lllllyh01 (Collaborator Author) Nov 1, 2025

Thank you for reviewing. The fallback follows the same approach as ChatAgent._get_token_count.
In the meantime, I see there is another token-counting method, token_counter.count_tokens_from_messages. Both use the backend model's token_counter, but their input formats and implementations differ slightly. Could you provide some context on their differences and their respective use cases?
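For context on the word-count fallback: OpenAI's published rule of thumb is roughly 1 token ≈ 0.75 English words, so a plain len(text.split()) tends to undercount. A hedged sketch of the fallback pattern (the counter object and the 1.3 multiplier are illustrative assumptions, not the PR's code):

```python
import logging

logger = logging.getLogger(__name__)


def count_tokens_with_fallback(token_counter, text: str) -> int:
    """Prefer the backend tokenizer; fall back to an adjusted word count.

    The 1.3 multiplier reflects the common ~0.75 words-per-token rule
    of thumb for English BPE tokenizers; it is an approximation only.
    """
    if token_counter is not None:
        try:
            return len(token_counter.encode(text))
        except Exception as e:
            logger.error(f"Error counting tokens: {e}")
    # Approximate: scale the word count up toward typical token counts.
    return int(len(text.split()) * 1.3)
```

Scaling the fallback keeps the estimate in the same ballpark as the tokenizer path, though for non-English or code-heavy text the ratio can diverge substantially.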


# Record information about this tool call
# Calculate tool cost and token usage
cost_info = self._tool_cost_calculator.estimate_tool_cost(
Collaborator

Since the token cost calculation happens after every tool call, it could be worth including some error handling in case exceptions are raised.

Collaborator Author

Thanks Waleed. I've added error handling in the _calculate_tool_cost method in the latest version.
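The defensive wrapping suggested here could be sketched as follows. The `estimate_tool_cost` interface and the returned dictionary shape are hypothetical; the PR's _calculate_tool_cost may differ. The point is only that a failed cost estimate should never abort the tool call itself:

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def safe_tool_cost(
    calculator: Any,
    tool_name: str,
    args: Dict[str, Any],
    result: Any,
) -> Dict[str, int]:
    """Wrap cost estimation so a failure degrades to zeroed usage."""
    try:
        return calculator.estimate_tool_cost(tool_name, args, result)
    except Exception as e:
        # Log and continue: accounting is best-effort bookkeeping,
        # not part of the tool call's correctness.
        logger.error(f"Token cost estimation failed for {tool_name}: {e}")
        return {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
```

Returning a zeroed record rather than raising keeps downstream aggregation (summing per-call usage into step totals) well-defined even when one estimate fails.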

def _initialize_default_profiles(self) -> Dict[str, ToolCostProfile]:
"""Initialize default cost profiles for common tool types."""
return {
"search_google": ToolCostProfile(
Collaborator

Could you provide the justification for the base_tokens estimates?

Collaborator Author

Thanks Waleed. The base_token was originally introduced to represent potential differences in system prompts and schemas between different tools. However, after reviewing this implementation, I think it is overly complex and hard to track. Therefore, I removed base_token in the latest version and now rely on the input and tool response to represent token usage.
