feat: Add Hermes tool call parser for OpenAI API #8456
What does the PR do?
This PR adds support for Hermes-style tool calling to the Triton Inference Server OpenAI API frontend. It introduces a new `HermesToolParser` that parses tool calls in the Hermes format, which wraps each call in `<tool_call>` tags. This enables tool calling for Hermes-compatible models, including Qwen family models such as Qwen 2.5 1.5B Instruct and Qwen 3 8B Instruct.

Key Changes:
- `HermesToolParser` class that handles both streaming and non-streaming tool call parsing for the Hermes-style format

Checklist
- Title format: `<commit_type>: <Title>`
- Pre-commit: `pre-commit install, pre-commit run --all`

Commit Type:
Related PRs:
None
Where should the reviewer start?
- `python/openai/openai_frontend/engine/utils/tool_call_parsers/hermes_tool_call_parser.py`: main parser implementation
- `python/openai/openai_frontend/engine/utils/tool_call_parsers/__init__.py`: registration and exports
- `python/openai/tests/conftest.py`: test configuration updates

Test plan:
- Run the tests with the `TEST_TOOL_CALL_PARSER=hermes` environment variable set

Caveats:
- Tool calls are only detected when the model wraps them in `<tool_call>` tags
- Uses the `partial-json-parser` dependency, which is already in the requirements

Background
Hermes is a tool calling format used by several model families, particularly the Qwen series. The format wraps each JSON tool call definition in XML-style `<tool_call>` and `</tool_call>` tags. This implementation enables Triton to serve Hermes-compatible models through the OpenAI API with tool calling in both streaming and non-streaming modes, expanding the range of models that can be used for function calling applications.

The implementation follows the established patterns used by the existing tool parsers (llama3, mistral) and is adapted from the vLLM project's Hermes parser, with Triton-specific modifications.
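To make the format concrete, here is a minimal, stdlib-only sketch of how Hermes output could be turned into OpenAI-style tool calls. It is not the PR's `HermesToolParser` (which, per the caveats, relies on `partial-json-parser`); the names `extract_tool_calls` and `StreamingToolCallBuffer` are hypothetical, and the `call_` id prefix is an assumption. The output dicts mirror the OpenAI `tool_calls` shape (`id`, `type`, `function.name`, and `function.arguments` as a JSON-encoded string). The streaming part is deliberately simplified: it holds text back until a whole `<tool_call>` block has arrived instead of streaming partial arguments.

```python
import json
import re
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# One complete <tool_call>...</tool_call> block; DOTALL lets the JSON span lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


@dataclass
class ExtractedToolCalls:
    content: Optional[str]  # text outside the tags, or None if nothing remains
    tool_calls: List[dict] = field(default_factory=list)


def extract_tool_calls(model_output: str) -> ExtractedToolCalls:
    """Non-streaming (hypothetical helper): parse every complete block."""
    calls = []
    for raw in TOOL_CALL_RE.findall(model_output):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON rather than failing the request
        if not isinstance(payload, dict) or "name" not in payload:
            continue
        calls.append({
            "id": f"call_{uuid.uuid4().hex[:24]}",  # assumed id format
            "type": "function",
            "function": {
                "name": payload["name"],
                # OpenAI-style responses carry arguments as a JSON string
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    # Tag blocks (even malformed ones) are stripped from the visible content.
    text = TOOL_CALL_RE.sub("", model_output).strip()
    return ExtractedToolCalls(content=text or None, tool_calls=calls)


class StreamingToolCallBuffer:
    """Simplified streaming (hypothetical): emit a tool call once it closes."""

    START, END = "<tool_call>", "</tool_call>"

    def __init__(self) -> None:
        self._buf = ""

    def feed(self, delta: str) -> Tuple[str, List[dict]]:
        """Return (plain text safe to emit now, tool calls completed so far)."""
        self._buf += delta
        text_out, calls = "", []
        while True:
            start = self._buf.find(self.START)
            if start == -1:
                # Hold back a tail that might be the start of "<tool_call>".
                keep = self._partial_tag_len(self._buf)
                text_out += self._buf[: len(self._buf) - keep]
                self._buf = self._buf[len(self._buf) - keep:]
                break
            text_out += self._buf[:start]
            end = self._buf.find(self.END, start)
            if end == -1:
                self._buf = self._buf[start:]  # wait for the closing tag
                break
            block = self._buf[start: end + len(self.END)]
            calls.extend(extract_tool_calls(block).tool_calls)
            self._buf = self._buf[end + len(self.END):]
        return text_out, calls

    def flush(self) -> str:
        """At end of stream, return whatever is still buffered as plain text."""
        rest, self._buf = self._buf, ""
        return rest

    def _partial_tag_len(self, s: str) -> int:
        # Length of the longest suffix of s that is a proper prefix of START.
        for n in range(min(len(s), len(self.START) - 1), 0, -1):
            if self.START.startswith(s[-n:]):
                return n
        return 0
```

Holding back a possible partial `<tool_call>` prefix keeps tag fragments from leaking into the content stream. A production streaming parser would additionally emit argument deltas while the JSON is still incomplete, which is where a partial-JSON parser becomes useful.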
Related Issues:
Files Changed:
- `python/openai/openai_frontend/engine/utils/tool_call_parsers/hermes_tool_call_parser.py` (new)
- `python/openai/openai_frontend/engine/utils/tool_call_parsers/__init__.py` (modified)
- `python/openai/tests/conftest.py` (modified)

Model Compatibility: