-
Notifications
You must be signed in to change notification settings - Fork 1.6k
feat: add token usage tracking for tool calls #3327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @lllllyh01 for the PR, the calculation logic needs to be fixed
camel/utils/tool_cost_calculator.py
Outdated
| output_tokens = self._count_tokens(output_text) | ||
|
|
||
| # Calculate total tokens | ||
| total_tokens = profile.base_tokens + input_tokens + output_tokens |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the token calculation might be unreliable. Could you clarify the reasoning for setting the base token count to 100?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Wendong. The base_token was originally introduced to represent potential differences in system prompts and schemas between different tools. However, after reviewing this implementation, I think it might be over complex and hard to track. Therefore, I removed the base_token is the latest version, and rely on input and tool response to represent token usage.
camel/utils/tool_cost_calculator.py
Outdated
| self._initialize_default_profiles() | ||
| ) | ||
|
|
||
| def _initialize_default_profiles(self) -> Dict[str, ToolCostProfile]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this part is not necessary and hard to maintain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing it out. I've removed this part in the latest version.
camel/utils/tool_cost_calculator.py
Outdated
| # Extract text from common result fields | ||
| text_fields = ["content", "text", "message", "result", "output"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be able to extract all text information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've switched to token_counter.count_tokens_from_messages for token counting and this part is removed.
camel/utils/tool_cost_calculator.py
Outdated
| try: | ||
| return len(self.token_counter.encode(text)) | ||
| except Exception as e: | ||
| logger.error(f"Error counting tokens: {e}") | ||
| pass | ||
|
|
||
| return len(text.split()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use token_counter.count_tokens_from_messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing it out. I omitted this function before. I've updated it to token_counter.count_tokens_from_messages in the latest version.
Besides, I see there's another token counting method in ChatAgent named _get_token_count. They both use token_counter of the backend model, but their input formats and implementations are slightly different. Can you provide some contexts about their differences and their respective suitable cases?
camel/utils/tool_cost_calculator.py
Outdated
| class ToolCategory(Enum): | ||
| """Categories of tools based on their typical token consumption.""" | ||
|
|
||
| # Low-cost tools (typically < 100 tokens) | ||
| SEARCH_API = "search_api" # Google Search, Bing Search, etc. | ||
| SIMPLE_UTILITY = ( | ||
| "simple_utility" # Basic file operations, simple calculations | ||
| ) | ||
|
|
||
| # Medium-cost tools (100-1000 tokens) | ||
| CODE_EXECUTION = "code_execution" # Python execution, shell commands | ||
| DOCUMENT_PROCESSING = "document_processing" # PDF parsing, text analysis | ||
| API_CALLS = "api_calls" # REST API calls, webhooks | ||
|
|
||
| # High-cost tools (1000+ tokens) | ||
| BROWSER_AUTOMATION = ( | ||
| "browser_automation" # Browser interactions, screenshots | ||
| ) | ||
| MULTIMODAL_PROCESSING = ( | ||
| "multimodal_processing" # Image analysis, audio processing | ||
| ) | ||
| LLM_CALLS = "llm_calls" # Sub-agent calls, complex reasoning | ||
|
|
||
|
|
||
| @dataclass | ||
| class ToolCostProfile: | ||
| """Cost profile for a specific tool type.""" | ||
|
|
||
| category: ToolCategory | ||
| base_tokens: int # Base token consumption | ||
|
|
||
|
|
||
| class ToolCostInfo(TypedDict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think these classes may not necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, thank you for pointing it out. I've removed this part in the latest version.
camel/utils/tool_cost_calculator.py
Outdated
| logger.error(f"Error counting tokens: {e}") | ||
| pass | ||
|
|
||
| return len(text.split()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the justification for using len(text.split()) as a fallback measure for token counting, are there any references showing the relation between word count and token count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reviewing. It refers ChatAgent._get_token_count.
In the meantime, I see there's another token counting method token_counter.count_tokens_from_messages. They both use token_counter of the backend model, but their input formats and implementations are slightly different. Can you provide some contexts about their differences and their respective suitable cases?
camel/agents/chat_agent.py
Outdated
|
|
||
| # Record information about this tool call | ||
| # Calculate tool cost and token usage | ||
| cost_info = self._tool_cost_calculator.estimate_tool_cost( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the token cost calculation happens after every tool call, could be worth including some error handling in case exceptions are returned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Waleed. I've added error handling in _calculate_tool_cost method in the latest version.
camel/utils/tool_cost_calculator.py
Outdated
| def _initialize_default_profiles(self) -> Dict[str, ToolCostProfile]: | ||
| """Initialize default cost profiles for common tool types.""" | ||
| return { | ||
| "search_google": ToolCostProfile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you provide the justification for the base_token estimates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Waleed. The base_token was originally introduced to represent potential differences in system prompts and schemas between different tools. However, after reviewing this implementation, I think it might be over complex and hard to track. Therefore, I removed the base_token is the latest version, and rely on input and tool response to represent token usage.
a23b7e4 to
80334e3
Compare
Description
Describe your changes in detail (optional if the linked issue already contains a detailed description of the changes).
Fixes #3219Currently,
ChatAgentonly records token usage on step level. This PR adds finer-grained token usage tracking for each tool calls in one step.Checklist
Go over all the following points, and put an
xin all the boxes that apply.Fixes #issue-numberin the PR description (required)pyproject.tomlanduv lockIf you are unsure about any of these, don't hesitate to ask. We are here to help!