amit-timalsina

What does the PR do?

This PR adds Hermes-style tool calling to the Triton Inference Server OpenAI API frontend. It introduces a new HermesToolParser that parses tool calls wrapped in <tool_call> tags, which enables tool calling for Hermes-compatible models, including Qwen family models such as Qwen 2.5 1.5B Instruct and Qwen 3 8B Instruct.

Key Changes:

  1. New Hermes Tool Call Parser: Implements HermesToolParser class that handles both streaming and non-streaming tool call parsing for Hermes-style format
  2. Streaming Support: Buffers partial tool call tokens so that tool calls split across streamed chunks are still parsed
  3. Error Handling: Falls back to AST parsing when JSON parsing fails
  4. Test Integration: Updates test infrastructure to support hermes parser validation
  5. Module Integration: Properly integrates with existing ToolParserManager registration system

Checklist

  • I have read the Contribution guidelines and signed the Contributor License Agreement
  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • I ran pre-commit locally (pre-commit install, pre-commit run --all)
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging
  • All template sections are filled out.

Commit Type:

  • feat

Related PRs:

None

Where should the reviewer start?

  1. Core Implementation: python/openai/openai_frontend/engine/utils/tool_call_parsers/hermes_tool_call_parser.py - Main parser implementation
  2. Module Integration: python/openai/openai_frontend/engine/utils/tool_call_parsers/__init__.py - Registration and exports
  3. Test Integration: python/openai/tests/conftest.py - Test configuration updates

Test plan:

  1. Pre-commit Validation: All pre-commit hooks pass (isort, black, flake8, codespell, etc.)
  2. Existing Tests: Current OpenAI test suite continues to pass with hermes parser available
  3. Parser Registration: HermesToolParser properly registers with ToolParserManager under "hermes" key
  4. Integration Test: Set the TEST_TOOL_CALL_PARSER=hermes environment variable to run the test suite against the Hermes parser
  5. Tool Call Parsing: Tests validate both streaming and non-streaming tool call parsing functionality
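
For reviewers unfamiliar with the pattern, the registration and environment-variable selection in the test plan can be pictured roughly as below. This is a simplified sketch under stated assumptions, not Triton's actual `ToolParserManager` or `conftest.py`: `_PARSERS`, `register_parser`, `DemoHermesParser`, and `select_parser` are all hypothetical names. Only the `"hermes"` key and the `TEST_TOOL_CALL_PARSER` variable come from this PR.

```python
import os

_PARSERS = {}  # hypothetical registry mapping a parser name to its class


def register_parser(name):
    """Class decorator that records a parser class under the given name."""
    def decorator(cls):
        _PARSERS[name] = cls
        return cls
    return decorator


@register_parser("hermes")
class DemoHermesParser:
    """Placeholder standing in for the real HermesToolParser."""


def select_parser(env=None):
    """Return the parser class named by TEST_TOOL_CALL_PARSER, or None if unset."""
    env = os.environ if env is None else env
    name = env.get("TEST_TOOL_CALL_PARSER")
    if name is None:
        return None
    if name not in _PARSERS:
        raise ValueError(f"unknown tool call parser: {name!r}")
    return _PARSERS[name]
```

Passing a plain dict as `env` keeps the lookup testable without mutating the process environment.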

Caveats:

  • Requires models that support Hermes-style tool calling format with <tool_call> tags
  • Currently tested with Qwen family models (Qwen 2.5 1.5B Instruct, Qwen 3 8B Instruct)
  • Depends on partial-json-parser, which is already listed in the requirements

Background

Hermes is a popular tool calling format used by several model families, particularly the Qwen series. The format uses XML-style tags (<tool_call> and </tool_call>) to wrap JSON tool call definitions. This implementation enables Triton to serve Hermes-compatible models through the OpenAI API with full tool calling support, expanding the range of models that can be used for function calling applications.
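
To make the format concrete, the following sketch extracts complete tool calls from a non-streaming response. It is an illustration rather than the parser added in this PR: `extract_tool_calls` and `parse_payload` are hypothetical names, and using `ast.literal_eval` as the fallback is an assumption about what "AST parsing" might look like.

```python
import ast
import json
import re

# Matches each complete <tool_call>...</tool_call> block, including newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_payload(raw):
    """Parse one payload: strict JSON first, then a Python literal as fallback."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Assumed fallback for output such as {'name': 'f'} with single quotes.
        return ast.literal_eval(raw)


def extract_tool_calls(text):
    """Return (remaining_text, list_of_tool_call_dicts) for a full response."""
    calls = [parse_payload(m.strip()) for m in TOOL_CALL_RE.findall(text)]
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, calls


response = (
    "Let me check.\n"
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</tool_call>"
)
content, calls = extract_tool_calls(response)
```

Here `content` is the text outside the tags and `calls` holds one dict with `name` and `arguments` keys; the function name `get_weather` is made up for the example.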

The implementation follows the established patterns used by existing tool parsers (llama3, mistral) and is adapted from the vLLM project's Hermes parser with Triton-specific modifications.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • Relates to expanding tool calling support for additional model formats in Triton OpenAI API

Files Changed:

  • python/openai/openai_frontend/engine/utils/tool_call_parsers/hermes_tool_call_parser.py (new)
  • python/openai/openai_frontend/engine/utils/tool_call_parsers/__init__.py (modified)
  • python/openai/tests/conftest.py (modified)

Model Compatibility:

  • Qwen 2.5 1.5B Instruct
  • Qwen 3 8B Instruct
  • Other models supporting Hermes-style tool calling format

- Implement HermesToolParser class supporting Hermes-style tool calling format
- Add support for <tool_call> tags in model responses
- Handle both streaming and non-streaming tool call parsing
- Support JSON and AST-based function call parsing for robustness
- Integrate with existing ToolParserManager registration system
- Adapted from vLLM implementation with Triton-specific modifications

This parser enables tool calling functionality for Hermes-compatible models
through the OpenAI API frontend, following the established tool parser
architecture used by other parsers (llama3, mistral, etc.).

- Add HermesToolParser to __init__.py exports for proper module integration
- Update test configuration to support 'hermes' tool call parser option
- Add hermes parser support to infer_test_environment() and infer_test_model_repository()
- Enables testing hermes tool calling functionality via TEST_TOOL_CALL_PARSER=hermes

This allows the hermes parser to be tested alongside existing llama3 and mistral parsers
through the established OpenAI test suite infrastructure.

amit-timalsina changed the title from "Feature/hermes tool parser" to "feat: Add Hermes tool call parser for OpenAI API" on Oct 12, 2025