feat: Add Hermes tool call parser for OpenAI API #8456

amit-timalsina · 2025-10-12T05:14:06Z

What does the PR do?

This PR adds Hermes-style tool calling to the Triton Inference Server OpenAI API frontend. It introduces a new HermesToolParser that parses tool calls wrapped in <tool_call> tags, enabling tool calling for Hermes-compatible models, including Qwen family models such as Qwen 2.5 1.5B Instruct and Qwen 3 8B Instruct.

Key Changes:

New Hermes Tool Call Parser: Implements the HermesToolParser class, which handles both streaming and non-streaming parsing of Hermes-style tool calls
Streaming Support: Parses streamed tokens incrementally, buffering partial tool call tokens
Error Handling: Falls back to AST parsing when JSON parsing fails
Test Integration: Updates the test infrastructure to support validation of the hermes parser
Module Integration: Registers with the existing ToolParserManager registration system

Checklist

Commit Type:

feat

Related PRs:

None

Where should the reviewer start?

Core Implementation: python/openai/openai_frontend/engine/utils/tool_call_parsers/hermes_tool_call_parser.py - Main parser implementation
Module Integration: python/openai/openai_frontend/engine/utils/tool_call_parsers/__init__.py - Registration and exports
Test Integration: python/openai/tests/conftest.py - Test configuration updates

Test plan:

Pre-commit Validation: All pre-commit hooks pass (isort, black, flake8, codespell, etc.)
Existing Tests: Current OpenAI test suite continues to pass with hermes parser available
Parser Registration: HermesToolParser properly registers with ToolParserManager under "hermes" key
Integration Test: Can be tested using TEST_TOOL_CALL_PARSER=hermes environment variable
Tool Call Parsing: Tests validate both streaming and non-streaming tool call parsing functionality

Caveats:

Requires models that support Hermes-style tool calling format with <tool_call> tags
Currently tested with Qwen family models (Qwen 2.5 1.5B Instruct, Qwen 3 8B Instruct)
Depends on existing partial-json-parser dependency already in requirements

Background

Hermes is a popular tool calling format used by several model families, particularly the Qwen series. The format uses XML-style tags (<tool_call> and </tool_call>) to wrap JSON tool call definitions. This implementation enables Triton to serve Hermes-compatible models through the OpenAI API with full tool calling support, expanding the range of models that can be used for function calling applications.
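
To make the format concrete, the sketch below shows one way the non-streaming case could be handled. It is an illustration under stated assumptions, not the implementation in this PR: extract_tool_calls and parse_payload are hypothetical helper names, the call_N ids are placeholders, and each payload is assumed to be an object with "name" and "arguments" keys. The ast.literal_eval fallback mirrors the AST fallback mentioned under Key Changes.

```python
import ast
import json
import re

# Non-greedy match so that several tool calls in one response are found separately.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_payload(raw):
    """Parse one tool call body: JSON first, then a Python-literal fallback."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Covers output such as single-quoted dicts, which are valid Python
        # literals but not valid JSON. Malformed input still raises.
        return ast.literal_eval(raw)


def extract_tool_calls(text):
    """Return (content, calls): text outside the tags and OpenAI-style tool calls."""
    calls = []
    for i, match in enumerate(TOOL_CALL_RE.finditer(text)):
        payload = parse_payload(match.group(1).strip())
        calls.append(
            {
                "id": f"call_{i}",  # placeholder id, for illustration only
                "type": "function",
                "function": {
                    "name": payload["name"],
                    # The OpenAI API carries arguments as a JSON-encoded string.
                    "arguments": json.dumps(payload.get("arguments", {})),
                },
            }
        )
    content = TOOL_CALL_RE.sub("", text).strip()
    return content, calls
```

For example, calling extract_tool_calls on a response that contains some text followed by one <tool_call> block returns the text as content plus a one-element list whose function.arguments field is a JSON string, which is the shape an OpenAI client expects in message.tool_calls.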

The implementation follows the established patterns used by existing tool parsers (llama3, mistral) and is adapted from the vLLM project's Hermes parser with Triton-specific modifications.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Relates to expanding tool calling support for additional model formats in Triton OpenAI API

Files Changed:

python/openai/openai_frontend/engine/utils/tool_call_parsers/hermes_tool_call_parser.py (new)
python/openai/openai_frontend/engine/utils/tool_call_parsers/__init__.py (modified)
python/openai/tests/conftest.py (modified)

Model Compatibility:

Qwen 2.5 1.5B Instruct
Qwen 3 8B Instruct
Other models supporting Hermes-style tool calling format

- Implement HermesToolParser class supporting Hermes-style tool calling format
- Add support for <tool_call> tags in model responses
- Handle both streaming and non-streaming tool call parsing
- Support JSON and AST-based function call parsing for robustness
- Integrate with existing ToolParserManager registration system
- Adapted from vLLM implementation with Triton-specific modifications

This parser enables tool calling functionality for Hermes-compatible models through the OpenAI API frontend, following the established tool parser architecture used by other parsers (llama, mistral, etc.).

- Add HermesToolParser to __init__.py exports for proper module integration
- Update test configuration to support 'hermes' tool call parser option
- Add hermes parser support to infer_test_environment() and infer_test_model_repository()
- Enables testing hermes tool calling functionality via TEST_TOOL_CALL_PARSER=hermes

This allows the hermes parser to be tested alongside existing llama3 and mistral parsers through the established OpenAI test suite infrastructure.

amit-timalsina added 2 commits October 12, 2025 10:49

amit-timalsina changed the title ~~Feature/hermes tool parser~~ feat: Add Hermes tool call parser for OpenAI API Oct 12, 2025
