
Conversation

@lllllyh01 (Collaborator)

Description

Describe your changes in detail (optional if the linked issue already contains a detailed description of the changes).
Fixes #3219
Currently, ChatAgent only records token usage at the step level. This PR adds finer-grained token usage tracking for each tool call within a step.
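In outline, the per-tool-call accounting described here could look something like the following sketch. The names (`ToolCallRecord`, `StepUsage`, `count_tokens`) are hypothetical illustrations, not the PR's actual code, and the whitespace word count is a crude stand-in for a real tokenizer:

```python
from dataclasses import dataclass, field
from typing import List


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: whitespace word count.
    return len(text.split())


@dataclass
class ToolCallRecord:
    """Token usage for a single tool call within a step."""

    tool_name: str
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


@dataclass
class StepUsage:
    """Aggregates per-tool-call records for one agent step."""

    tool_calls: List[ToolCallRecord] = field(default_factory=list)

    def record(
        self, tool_name: str, args_text: str, result_text: str
    ) -> ToolCallRecord:
        # One record per tool call, so usage can be inspected
        # at a finer granularity than the step total.
        rec = ToolCallRecord(
            tool_name=tool_name,
            input_tokens=count_tokens(args_text),
            output_tokens=count_tokens(result_text),
        )
        self.tool_calls.append(rec)
        return rec

    @property
    def total_tokens(self) -> int:
        return sum(r.total_tokens for r in self.tool_calls)
```

The step-level total then falls out as the sum over the per-call records, so the existing step-level reporting stays consistent with the new breakdown.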

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have read the CONTRIBUTION guide (required)
  • I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • I have checked if any dependencies need to be added or updated in pyproject.toml and uv lock
  • I have updated the tests accordingly (required for a bug fix or a new feature)
  • I have updated the documentation if needed:
  • I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

@github-actions github-actions bot added the Review Required PR need to be reviewed label Oct 24, 2025

coderabbitai bot commented Oct 24, 2025

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@Wendong-Fan Wendong-Fan left a comment


Thanks @lllllyh01 for the PR; the calculation logic needs to be fixed.

output_tokens = self._count_tokens(output_text)

# Calculate total tokens
total_tokens = profile.base_tokens + input_tokens + output_tokens
Member

It seems the token calculation might be unreliable. Could you clarify the reasoning for setting the base token count to 100?

Collaborator Author

Thanks Wendong. The base_token was originally introduced to represent potential differences in system prompts and schemas between different tools. However, after reviewing this implementation, I think it is overly complex and hard to track. Therefore, I removed base_token in the latest version and now rely on the input and tool response to represent token usage.

self._initialize_default_profiles()
)

def _initialize_default_profiles(self) -> Dict[str, ToolCostProfile]:
Member

I think this part is not necessary and would be hard to maintain.

Collaborator Author

Thank you for pointing it out. I've removed this part in the latest version.

Comment on lines 187 to 188
# Extract text from common result fields
text_fields = ["content", "text", "message", "result", "output"]
Member

Would this be able to extract all the text information?

Collaborator Author

I've switched to token_counter.count_tokens_from_messages for token counting and this part is removed.

Comment on lines 201 to 207
try:
return len(self.token_counter.encode(text))
except Exception as e:
logger.error(f"Error counting tokens: {e}")
pass

return len(text.split())
Member

Why not use token_counter.count_tokens_from_messages?

@lllllyh01 (Collaborator Author) Nov 1, 2025

Thank you for pointing it out. I had overlooked this function before; I've updated the code to use token_counter.count_tokens_from_messages in the latest version.

Besides, I see there is another token-counting method in ChatAgent named _get_token_count. Both use the backend model's token_counter, but their input formats and implementations differ slightly. Could you provide some context on their differences and their respective use cases?
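The two code paths being compared here differ in what they count over: raw text versus structured chat messages, which carry extra framing tokens per message. A toy stand-in class (not camel's real token counter; the per-message overhead of 4 tokens mirrors OpenAI's documented chat framing cost but is only an approximation) illustrates the gap:

```python
from typing import Dict, List


class ToyTokenCounter:
    """Toy counter contrasting raw-text encoding with message-aware counting."""

    def encode(self, text: str) -> List[str]:
        # Stand-in "tokenization": split on whitespace.
        return text.split()

    def count_tokens_from_messages(
        self, messages: List[Dict[str, str]]
    ) -> int:
        # Message-aware counting: content tokens plus framing
        # overhead for the role and separators of each message.
        total = 0
        for msg in messages:
            total += 4  # approximate per-message overhead
            total += len(self.encode(msg.get("content", "")))
        return total


counter = ToyTokenCounter()
raw = len(counter.encode("hello world"))
framed = counter.count_tokens_from_messages(
    [{"role": "user", "content": "hello world"}]
)
```

For identical content, the message-based count is always at least as large as the raw-text count, which is why the two methods give slightly different numbers even when backed by the same tokenizer.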

Comment on lines 26 to 58
class ToolCategory(Enum):
"""Categories of tools based on their typical token consumption."""

# Low-cost tools (typically < 100 tokens)
SEARCH_API = "search_api" # Google Search, Bing Search, etc.
SIMPLE_UTILITY = (
"simple_utility" # Basic file operations, simple calculations
)

# Medium-cost tools (100-1000 tokens)
CODE_EXECUTION = "code_execution" # Python execution, shell commands
DOCUMENT_PROCESSING = "document_processing" # PDF parsing, text analysis
API_CALLS = "api_calls" # REST API calls, webhooks

# High-cost tools (1000+ tokens)
BROWSER_AUTOMATION = (
"browser_automation" # Browser interactions, screenshots
)
MULTIMODAL_PROCESSING = (
"multimodal_processing" # Image analysis, audio processing
)
LLM_CALLS = "llm_calls" # Sub-agent calls, complex reasoning


@dataclass
class ToolCostProfile:
"""Cost profile for a specific tool type."""

category: ToolCategory
base_tokens: int # Base token consumption


class ToolCostInfo(TypedDict):
Member

I think these classes may not be necessary.

Collaborator Author

Indeed, thank you for pointing it out. I've removed this part in the latest version.

@Wendong-Fan Wendong-Fan added Waiting for Update PR has been reviewed, need to be updated based on review comment and removed Review Required PR need to be reviewed labels Oct 25, 2025
@Wendong-Fan Wendong-Fan modified the milestones: Sprint 41, Sprint 40 Oct 27, 2025
logger.error(f"Error counting tokens: {e}")
pass

return len(text.split())
Collaborator

What's the justification for using len(text.split()) as a fallback measure for token counting? Are there any references showing the relation between word count and token count?

@lllllyh01 (Collaborator Author) Nov 1, 2025

Thank you for reviewing. The fallback follows the same approach as ChatAgent._get_token_count.
In the meantime, I see there is another token-counting method, token_counter.count_tokens_from_messages. Both use the backend model's token_counter, but their input formats and implementations differ slightly. Could you provide some context on their differences and their respective use cases?
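For context on the word-count fallback: OpenAI's published rule of thumb is roughly 1 token ≈ 0.75 English words, so a plain len(text.split()) tends to undercount. A hedged sketch of the fallback pattern (the counter object and the 1.3 multiplier are illustrative assumptions, not the PR's code):

```python
import logging

logger = logging.getLogger(__name__)


def count_tokens_with_fallback(token_counter, text: str) -> int:
    """Prefer the backend tokenizer; fall back to an adjusted word count.

    The 1.3 multiplier reflects the common ~0.75 words-per-token rule
    of thumb for English BPE tokenizers; it is an approximation only.
    """
    if token_counter is not None:
        try:
            return len(token_counter.encode(text))
        except Exception as e:
            logger.error(f"Error counting tokens: {e}")
    # Approximate: scale the word count up toward typical token counts.
    return int(len(text.split()) * 1.3)
```

Scaling the fallback keeps the estimate in the same ballpark as the tokenizer path, though for non-English or code-heavy text the ratio can diverge substantially.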


# Record information about this tool call
# Calculate tool cost and token usage
cost_info = self._tool_cost_calculator.estimate_tool_cost(
Collaborator

Since the token cost calculation happens after every tool call, it could be worth including some error handling in case exceptions are raised.

Collaborator Author

Thanks Waleed. I've added error handling in the _calculate_tool_cost method in the latest version.
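The defensive wrapping suggested here could be sketched as follows. The `estimate_tool_cost` interface and the returned dictionary shape are hypothetical; the PR's _calculate_tool_cost may differ. The point is only that a failed cost estimate should never abort the tool call itself:

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def safe_tool_cost(
    calculator: Any,
    tool_name: str,
    args: Dict[str, Any],
    result: Any,
) -> Dict[str, int]:
    """Wrap cost estimation so a failure degrades to zeroed usage."""
    try:
        return calculator.estimate_tool_cost(tool_name, args, result)
    except Exception as e:
        # Log and continue: accounting is best-effort bookkeeping,
        # not part of the tool call's correctness.
        logger.error(f"Token cost estimation failed for {tool_name}: {e}")
        return {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
```

Returning a zeroed record rather than raising keeps downstream aggregation (summing per-call usage into step totals) well-defined even when one estimate fails.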

def _initialize_default_profiles(self) -> Dict[str, ToolCostProfile]:
"""Initialize default cost profiles for common tool types."""
return {
"search_google": ToolCostProfile(
Collaborator

Could you provide the justification for the base_tokens estimates?

Collaborator Author

Thanks Waleed. The base_token was originally introduced to represent potential differences in system prompts and schemas between different tools. However, after reviewing this implementation, I think it is overly complex and hard to track. Therefore, I removed base_token in the latest version and now rely on the input and tool response to represent token usage.
