Skip to content

[Bugfix] improve regex for hermes tool detection #20474

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 17 additions & 13 deletions vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,7 +44,8 @@ def __init__(self, tokenizer: AnyTokenizer):
self.tool_call_end_token: str = "</tool_call>"

self.tool_call_regex = re.compile(
r"<tool_call>(.*?)</tool_call>|<tool_call>(.*)", re.DOTALL)
r"<tool_call>\s*(.*?)\s*(?:<tool_call>\s*|</tool_call>\s*|$)",
re.DOTALL)
self.scratch_pad_regex = re.compile(
r"<scratch_pad>(.*?)</scratch_pad>", re.DOTALL)

Expand Down Expand Up @@ -80,15 +81,18 @@ def extract_tool_calls(
# tag and end-of-string so the result of
# findall is an array of tuples where one is a function call and
# the other is None
function_call_tuples = (
self.tool_call_regex.findall(model_output))

# load the JSON, and then use it to build the Function and
# Tool Call
raw_function_calls = [
json.loads(match[0] if match[0] else match[1])
for match in function_call_tuples
]
matches = self.tool_call_regex.findall(model_output)
raw_function_calls = []
for match in matches:
if not match:
continue
try:
parsed = json.loads(match.strip())
raw_function_calls.append(parsed)
except json.JSONDecodeError as e:
logger.warning(
"Skipping malformed tool_call block: %s", e)

tool_calls = [
ToolCall(
type="function",
Expand All @@ -99,9 +103,9 @@ def extract_tool_calls(
ensure_ascii=False)))
for function_call in raw_function_calls
]

content = model_output[:model_output.
find(self.tool_call_start_token)]
tool_call_start = model_output.find(self.tool_call_start_token)
content = (model_output[:tool_call_start]
if tool_call_start >= 0 else None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition is sort of a duplicate of the condition on line 112.

The one on line 112 is a bit clearer, can we keep that one?

Copy link

@oOraph oOraph Jul 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @hmellor :), I don't understand what you mean, can you explain ?

test = 'whatever'
lookup = 'sthnotfound'
print(test[:test.find(lookup)]) # print(test[:-1]) -> 'whateve' -> not None, so if we remove this check this will be the content value no ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I've annotated this block to show which conditions I'm referring to:

                # content will either be a non-empty string or None
                content = model_output[:tool_call_start] if tool_call_start >= 0 else None
                return ExtractedToolCallInformation(
                    tools_called=True,
                    tool_calls=tool_calls,
                    # Therefore, this condition is not necessary because content is already a non-empty string or None
                    content=content if content else None)

In my first comment I was suggesting that we instead have:

                content = model_output[:tool_call_start]
                return ExtractedToolCallInformation(
                    tools_called=True,
                    tool_calls=tool_calls,
                    content=content if content else None)

Because it's a little easier to read.

return ExtractedToolCallInformation(
tools_called=True,
tool_calls=tool_calls,
Expand Down