[Bugfix] Mistral tool parser streaming update #19425


Open

avigny wants to merge 14 commits into vllm-project:main from avigny:mistral-tool-parser-streaming-update

Conversation


@avigny avigny commented Jun 10, 2025

Purpose

Fixes #13622
Fixes #17585
Probably fixes #20028

This PR is similar to #16096 (hermes tool parser)

Future improvements

Test Plan

I've added a test file tests/tool_use/test_mistral_tool_parser.py for easy and fast testing. It works like the existing tests/tool_use/test_jamba_tool_parser.py.

Use pytest tests/tool_use/test_mistral_tool_parser.py to run this test file.
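To run a single parametrized case, you can filter by test id with pytest's -k flag, e.g.:

    pytest tests/tool_use/test_mistral_tool_parser.py -k single_tool_add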

Test Result

The following results are from the added tests in tests/tool_use/test_mistral_tool_parser.py

Before the fix:

=========================================================== short test summary info ===========================================================
FAILED tests/tool_use/test_mistral_tool_parser.py::test_extract_tool_calls_streaming[single_tool_add] - AssertionError: got $name='add' arguments='{"a":}'
FAILED tests/tool_use/test_mistral_tool_parser.py::test_extract_tool_calls_streaming[argument_before_name] - partial_json_parser.core.exceptions.MalformedJSON: string index out of range
FAILED tests/tool_use/test_mistral_tool_parser.py::test_extract_tool_calls_streaming[argument_before_name_and_name_in_argument] - partial_json_parser.core.exceptions.MalformedJSON: string index out of range
FAILED tests/tool_use/test_mistral_tool_parser.py::test_extract_tool_calls_streaming[multiple_tools] - AssertionError: got $name='add' arguments='{"a":, "b": 4{"a": 3.5, "b": 4}}'
=================================================== 4 failed, 8 passed, 2 warnings in 1.86s ===================================================

After the fix:

============================================================= test session starts =============================================================
platform darwin -- Python 3.12.10, pytest-8.3.3, pluggy-1.5.0
rootdir: /Users/avigny/devenv/vllm
configfile: pyproject.toml
plugins: asyncio-0.24.0, hypothesis-6.131.0, buildkite-test-collector-0.1.9, subtests-0.14.1, shard-0.1.2, rerunfailures-14.0, schemathesis-3.39.15, timeout-2.3.1, mock-3.14.0, forked-1.6.0, anyio-4.6.2.post1
asyncio: mode=Mode.STRICT, default_loop_scope=None
collected 12 items                                                                                                                            
Running 12 items in this shard

tests/tool_use/test_mistral_tool_parser.py ............                                                                                 [100%]

============================================================== warnings summary ===============================================================
.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305
  /Users/avigny/devenv/vllm/.venv/lib/python3.12/site-packages/schemathesis/generation/coverage.py:305: DeprecationWarning: jsonschema.exceptions.RefResolutionError is deprecated as of version 4.18.0. If you wish to catch potential reference resolution errors, directly catch referencing.exceptions.Unresolvable.
    ref_error: type[Exception] = jsonschema.RefResolutionError,

tests/tool_use/test_mistral_tool_parser.py::test_extract_tool_calls_no_tools
  /Users/avigny/devenv/vllm/tests/tool_use/test_mistral_tool_parser.py:22: FutureWarning: It is strongly recommended to run mistral models with `--tokenizer-mode "mistral"` to ensure correct encoding and decoding.
    return get_tokenizer(tokenizer_name=MODEL)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================= 12 passed, 2 warnings in 1.75s ========================================================

(Optional) Documentation Update

I believe no documentation update is needed.


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, a small and essential subset of tests meant to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @avigny, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses an issue related to streaming tool calls for Mistral models by replacing the previous partial_json_parser-based implementation with a more robust, custom stateful parser. This new approach aims to accurately extract tool call information, including names and arguments, as tokens are streamed, improving the reliability of tool use functionality in streaming mode. The changes include a significant rewrite of the parsing logic and the addition of comprehensive test coverage.

Highlights

  • Refactor Streaming Parser: The core logic for parsing Mistral tool calls during streaming has been completely rewritten.
  • Removed Dependency: The dependency on the partial_json_parser library for streaming tool call extraction has been removed.
  • New Parsing Mechanism: Introduced a custom stateful parsing mechanism using regex and json.JSONDecoder.raw_decode to incrementally extract tool call names and arguments from the raw token stream.
  • Comprehensive Tests: Added a new, extensive test file (tests/tool_use/test_mistral_tool_parser.py) with various test cases covering both streaming and non-streaming scenarios for Mistral tool calls, including single and multiple tool calls with different argument structures.
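For intuition, here is a minimal sketch of the raw_decode idea, assuming a v11-style buffer of the form name{json-args}; the buffer contents and variable names are illustrative, not the PR's actual code:

    import json

    decoder = json.JSONDecoder()
    # Hypothetical raw stream accumulated so far: a function name, then JSON.
    buffer = 'add{"a": 3.5, "b": 4}'
    args_start = buffer.find("{")
    name = buffer[:args_start]
    try:
        # raw_decode parses one JSON value starting at the given index and
        # returns (value, end_index); trailing text after it is tolerated.
        args, end = decoder.raw_decode(buffer, args_start)
    except json.JSONDecodeError:
        args = None  # arguments are still incomplete; wait for more tokens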

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the streaming tool call parsing logic for Mistral models and adds a comprehensive test suite. The core change involves replacing partial_json_parser with a custom regex and json.raw_decode-based approach for more fine-grained control over the streaming process. The new tests cover a variety of scenarios. The review includes stylistic suggestions for the tests and points for consideration regarding complexity and state management in the new parsing logic.

Comment on lines 177 to 286
# select as the current tool call the one we're on the state at
current_tool_call: dict = tool_call_arr[self.current_tool_id] \
    if len(tool_call_arr) > 0 else {}

# case -- if no tokens have been streamed for the tool, e.g.
# only the array brackets, stream nothing
if len(tool_call_arr) == 0:
    return None

# case: we are starting a new tool in the array
# -> array has > 0 length AND length has moved past cursor
elif (len(tool_call_arr) > 0
      and len(tool_call_arr) > self.current_tool_id + 1):

    # if we're moving on to a new call, first make sure we
    # haven't missed anything in the previous one that was
    # auto-generated due to JSON completions, but wasn't
    # streamed to the client yet.
    if self.current_tool_id >= 0:
        diff: Union[str, None] = current_tool_call.get("arguments")

        if diff:
            diff = json.dumps(diff, ensure_ascii=False).replace(
                self.streamed_args_for_tool[self.current_tool_id], "")
            delta = DeltaMessage(tool_calls=[
                DeltaToolCall(index=self.current_tool_id,
                              function=DeltaFunctionCall(
                                  arguments=diff).model_dump(
                                      exclude_none=True))
            ])
            self.streamed_args_for_tool[self.current_tool_id] += diff
        else:
            delta = None
    else:
        delta = None
    # re-set stuff pertaining to progress in the current tool
    self.current_tool_id = len(tool_call_arr) - 1
    self.current_tool_name_sent = False
    self.streamed_args_for_tool.append("")
    logger.debug("starting on new tool %d", self.current_tool_id)
    return delta

# case: update an existing tool - this is handled below

# if the current tool name hasn't been sent, send if available
# - otherwise send nothing
if not self.current_tool_name_sent:
    function_name = current_tool_call.get("name")
    if function_name:
        delta = DeltaMessage(tool_calls=[

[The hunk then switches to the new stateful parser, picking up inside a try block:]

        function_name, name_end_index = self._extracted_complete_name(
            raw_current_tool_call, self.current_attribute_start_index)
    except IndexError:
        # name value has not started being generated
        return self._none_or_additional_content(additional_content)
    if function_name == "":
        return self._none_or_additional_content(additional_content)
    else:
        # the function name was successfully retrieved
        assert name_end_index is not None
        self.current_tool_name_finished = True
        self.current_element_streaming = None
        self.current_attribute_start_index = -1
        self.previous_attribute_end_index = name_end_index
        delta = DeltaMessage(
            content=additional_content,
            tool_calls=[
                DeltaToolCall(index=self.current_tool_id,
                              type="function",
                              id=MistralToolCall.generate_random_id(),
                              function=DeltaFunctionCall(
                                  name=function_name).model_dump(
                                      exclude_none=True))
            ])
        return delta
if self.current_element_streaming == "arguments":
    try:
        diff, arguments_end_index = self._extract_argument_fragment(
            raw_current_tool_call, self.current_attribute_start_index,
            delta_text)
        self.current_tool_arguments_finished = arguments_end_index != -1
        if self.current_tool_arguments_finished:
            self.current_element_streaming = None
            self.current_attribute_start_index = -1
            self.previous_attribute_end_index = arguments_end_index
            delta = DeltaMessage(
                content=additional_content,
                tool_calls=[
                    DeltaToolCall(index=self.current_tool_id,
                                  function=DeltaFunctionCall(
                                      arguments=diff).model_dump(
                                          exclude_none=True))
                ])
            self.current_tool_name_sent = True
        else:
            delta = None
        return delta
    except IndexError:
        # arguments value has not started being generated
        return self._none_or_additional_content(additional_content)


medium

The extract_tool_calls_streaming method is complex. Consider simplifying state transitions or adding detailed comments to improve maintainability.
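One hedged way to make those transitions explicit would be a small state enum; the names below are illustrative only, not the PR's actual code, which tracks state via current_element_streaming and the *_finished flags:

    from enum import Enum, auto

    class ToolStreamState(Enum):
        AWAITING_NAME = auto()        # bot token seen, name not started
        STREAMING_NAME = auto()       # accumulating the function name
        STREAMING_ARGUMENTS = auto()  # accumulating the JSON arguments
        TOOL_FINISHED = auto()        # arguments JSON closed; expect
                                      # end-of-output or another bot token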

avigny added 4 commits June 11, 2025 10:12
Tests are similar to the ones added for Jamba models in vllm-project#9154

Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
@avigny avigny force-pushed the mistral-tool-parser-streaming-update branch from c468495 to d6d17c1 Compare June 11, 2025 08:13
@avigny avigny marked this pull request as ready for review June 11, 2025 09:25
@avigny avigny requested a review from aarnphm as a code owner June 11, 2025 09:25
@avigny
Author

avigny commented Jun 11, 2025

@hibukipanim I did run the test you provided in your issue description #17585 (comment) and got the following output:

ChoiceDeltaToolCall(index=0, id='j6OY9szTS', function=ChoiceDeltaToolCallFunction(arguments=None, name='mcp_confluence'), type='function')
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='{"', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='query', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='":', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' "', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='co', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='ffee', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='",', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' "', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='limit', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='":', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments=' ', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='1', name=None), type=None)
ChoiceDeltaToolCall(index=0, id=None, function=ChoiceDeltaToolCallFunction(arguments='}', name=None), type=None)

It seems to fix your issue.
Please let me know if I missed something.
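For anyone reproducing this, a minimal client-side sketch (assuming the OpenAI Python client against a local vLLM server; the model name and tool definition are illustrative) that reassembles the streamed argument deltas into a single call:

    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    stream = client.chat.completions.create(
        model="mistralai/Mistral-Small-3.2-24B-Instruct-2506",  # illustrative
        messages=[{"role": "user", "content": "What's the weather in New York?"}],
        tools=[{
            "type": "function",
            "function": {
                "name": "get_weather",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }],
        stream=True,
    )

    name, args = None, ""
    for chunk in stream:
        for call in chunk.choices[0].delta.tool_calls or []:
            if call.function.name:
                name = call.function.name
            if call.function.arguments:
                args += call.function.arguments
    print(name, args)  # e.g. get_weather {"location": "New York, NY"}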

@avigny avigny changed the title Mistral tool parser streaming update [Bugfix] Mistral tool parser streaming update Jun 11, 2025
@PedroMiolaSilva

@avigny hey!

I've been trying to test your solution, but with no success. This is what I'm doing:

source ../.env
export MODEL_ID=unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8
export MODEL_ID_PORT=8000
export MODEL_ID_GPU=0

docker run \
  --runtime nvidia \
  -e VLLM_USE_V1=1 \
  --gpus all \
  --ipc=host \
  -p "${MODEL_ID_PORT}:8000" \
  --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" \
  --env "HF_HUB_OFFLINE=0" \
  -v "${HF_HOME}:/root/.cache/huggingface" \
  -v "./mistral_tool_parser.py:/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py" \
  -v "$(pwd):/app" \
  vllm/vllm-openai:latest \
  --model ${MODEL_ID} \
  --tool-call-parser mistral \
  --chat-template /app/template.jinja \
  --enable-auto-tool-choice \
  --limit-mm-per-prompt 'image=1' \
  --tokenizer_mode mistral \
  --config_format mistral \
  --load_format mistral \
  --max-model-len 64000 \
  --gpu-memory-utilization 0.8

Where template.jinja is this one and mistral_tool_parser.py is the one that you've created.

I'm using this test request:

curl -X POST \
   http://localhost:8000/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
   "model": "unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8",
   "messages": [
    {"role":"system","content":"You have access to the weather tool. You should call this tool when you think it makes sense"},
     {"role": "user", "content": "What'\''s the weather in New York?"}
   ],
   "tools": [
     {
       "type": "function",
       "function": {
         "name": "get_weather",
         "description": "Get the current weather in a given location",
         "parameters": {
           "type": "object",
           "properties": {
             "location": {
               "type": "string",
               "description": "The city and state, e.g. San Francisco, CA"
             }
           },
           "required": ["location"]
         }
       }
     }
   ]
 }'

When I set stream to false, I'm getting this response:

{"id":"chatcmpl-0dc2b75406114cbcb4f95735ccfdb094","object":"chat.completion","created":1751490167,"model":"unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"[TOOL_CALLS]get_weather{\"location\": \"New York, NY\"}","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":112,"total_tokens":127,"completion_tokens":15,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}

And this error:

ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] Error in extracting tool call from response.
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] Traceback (most recent call last):
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 131, in extract_tool_calls
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     function_call_arr = json.loads(tool_content)
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]                         ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/lib/python3.12/json/__init__.py", line 346, in loads
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     return _default_decoder.decode(s)
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/lib/python3.12/json/decoder.py", line 338, in decode
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/lib/python3.12/json/decoder.py", line 356, in raw_decode
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     raise JSONDecodeError("Expecting value", s, err.value) from None
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] 
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] During handling of the above exception, another exception occurred:
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] 
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] Traceback (most recent call last):
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 137, in extract_tool_calls
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]     raw_tool_call = self.tool_call_regex.findall(tool_content)[0]
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160]                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
ERROR 07-02 14:00:22 [mistral_tool_parser.py:160] IndexError: list index out of range

When I set stream=true, I don't receive any errors, but the response does not have tool calls:

data: {"id":"chatcmpl-028934e8ee754938943457f631313546","object":"chat.completion.chunk","created":1751490269,"model":"unsloth/Mistral-Small-3.2-24B-Instruct-2506-FP8","choices":[{"index":0,"delta":{"role":"assistant","content":""},"logprobs":null,"finish_reason":null}]}

Am I doing something wrong here?

@rdlh

rdlh commented Jul 3, 2025

Looks like this PR unfortunately doesn't fix the issues on Mistral Small 3.2.

API call:

{
    "stream": false,
    "temperature": 0.15,
    "top_p": 1.0,
    "tool_choice": "auto",
    "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    "messages": [
        {
            "role": "user",
            "content": "Hi ! What's the result of 95478415 / 4571 ?"
        }
    ],
    "tools": [
        {
            "type":"function",
            "function": {
            "name":"calculator",
            "description":"Perform a basic calculation using ruby syntax for arithmetic operations.",
            "parameters": {
                "type":"object",
                "properties": {
                "calculation": {
                    "type":"string",
                    "description":"A basic arithmetic calculation in python language (e.g., \"2+2\", \"10*3\", \"45/9\").",
                    "required":["calculation"]
                }
                },
                "required":["calculation"]
            }
            }
        }
    ]
}

Still have this error:

ERROR 07-03 01:55:20 [mistral_tool_parser.py:166] Error in extracting tool call from response.
ERROR 07-03 01:55:20 [mistral_tool_parser.py:166] Traceback (most recent call last):
ERROR 07-03 01:55:20 [mistral_tool_parser.py:166]     function_call_arr = json.loads(tool_content)

Here are some logs:

=== model_output ===
[TOOL_CALLS]calculator{"calculation": "95478415 / 4571"}
=== tool_content ===
calculator{"calculation": "95478415 / 4571"}

Please note that this issue is NOT happening when using "tool_choice": "required".

@avigny
Author

avigny commented Jul 3, 2025

Yes, you're both right!
I believe I branched off and started working on my fix before the changes introduced by #19193, which added the use of fn_name_regex from the model tokenizer.
I'll try to port this to the extract_tool_calls_streaming method.

Thanks for finding this!

@gaby

gaby commented Jul 4, 2025

Any update on getting this merged?

@DarkLight1337
Member

cc @aarnphm

@sjuxax
Contributor

sjuxax commented Jul 4, 2025

So I did more complete testing and found this wasn't working that well after all -- I was getting the same errors reported above. Not sure what happened on initial testing. But, I've since taken it and have a working implementation, for streaming at least, at https://github.com/sjuxax/vllm/tree/Mistral3.2-tool-call-fix. I'm going to cherry-pick it onto #20471 in a sec. Then using that branch should work with quantized HF models and tool calling.


@PedroMiolaSilva PedroMiolaSilva left a comment


I think replacing lines 127 to 139 with the code below will fix it for non-streaming:

            # First, split on the tool call token; discard the first item,
            # because it is empty
            raw_tool_calls = model_output.split(self.bot_token)[1:]
            function_call_arr = []
            for raw_tool_call in raw_tool_calls:
                tool_name = raw_tool_call.split("{")[0]
                tool_arguments_begin = raw_tool_call.find("{")
                tool_arguments = raw_tool_call[tool_arguments_begin:]
                function_call_arr.append({
                    "name": tool_name,
                    "arguments": json.loads(tool_arguments)
                })
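A quick standalone check of that logic, assuming self.bot_token is "[TOOL_CALLS]" as in the outputs above:

    import json

    bot_token = "[TOOL_CALLS]"  # assumed value of self.bot_token
    model_output = '[TOOL_CALLS]get_weather{"location": "New York, NY"}'

    function_call_arr = []
    for raw_tool_call in model_output.split(bot_token)[1:]:
        tool_name = raw_tool_call.split("{")[0]
        tool_arguments = raw_tool_call[raw_tool_call.find("{"):]
        function_call_arr.append({
            "name": tool_name,
            "arguments": json.loads(tool_arguments),
        })

    print(function_call_arr)
    # [{'name': 'get_weather', 'arguments': {'location': 'New York, NY'}}]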

sjuxax pushed a commit to sjuxax/vllm that referenced this pull request Jul 4, 2025
avigny added 3 commits July 6, 2025 21:14
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
avigny added 3 commits July 8, 2025 18:53
… in the model output

Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Comment on lines +70 to +79
    assert tools is not None
    assistant_msg = AssistantMessage(tool_calls=[
        ToolCall(function=FunctionCall(
            name=name,
            arguments=arg,
        )) for (name, arg) in tools
    ], )
    request = InstructRequest(messages=[assistant_msg], )
    all_token_ids = mistral_tokenizer.instruct.encode_instruct(
        request).tokens
Author


I did not find another way to get a good tokenization of my model output.

I need to have the BOT token [TOOL_CALLS] as a single token and not split into multiple tokens ([, TOOL, _, ... for example)
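For illustration, a hypothetical comparison (whether the Hugging Face tokenizer splits the string depends on the model's tokenizer config, so treat this as an assumption):

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained(
        "mistralai/Mistral-Small-3.2-24B-Instruct-2506")
    # A plain BPE pass may yield pieces like ['[', 'TOOL', '_', 'CALL', 'S', ']'],
    # while the mistral-common tokenizer encodes [TOOL_CALLS] as one control token.
    print(tok.tokenize("[TOOL_CALLS]"))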

Comment on lines 245 to 259
# (
# '''[TOOL_CALLS]add{"a": 3.5, "b": 4}[TOOL_CALLS]multiply{"a": 3, "b": 6}''', # noqa: E501
# [
# ToolCall(function=FunctionCall(name="add",
# arguments=json.dumps({
# "a": 3.5,
# "b": 4
# }))),
# ToolCall(function=FunctionCall(name="multiply",
# arguments=json.dumps({
# "a": 3,
# "b": 6
# })))
# ],
# None) # Was already broken
Author


This is broken in the current released version of vllm:

Mistral tool calls with a v11 tokenizer are broken if multiple tool calls are generated in the same output.

With an add and a multiply tool, I got mistralai/Mistral-Small-3.2-24B-Instruct-2506 to generate the following output: [TOOL_CALLS]add{"a": 4, "b": 5}[TOOL_CALLS]multiply{"a": 6, "b": 7}

and got the following errors (line numbers are off by 1 because I added a print statement):

ERROR 07-08 17:10:24 [mistral_tool_parser.py:190] Error in extracting tool call from response.
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190] Traceback (most recent call last):
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]   File "/home/avigny/vllm-venv/lib/python3.12/site-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 158, in extract_tool_calls
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]     "arguments": json.loads(args)
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]                  ^^^^^^^^^^^^^^^^
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]   File "/home/avigny/.pyenv/versions/3.12.11/lib/python3.12/json/__init__.py", line 346, in loads
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]     return _default_decoder.decode(s)
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]            ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]   File "/home/avigny/.pyenv/versions/3.12.11/lib/python3.12/json/decoder.py", line 341, in decode
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]     raise JSONDecodeError("Extra data", s, end)
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190] json.decoder.JSONDecodeError: Extra data: line 1 column 17 (char 16)
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190] 
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190] During handling of the above exception, another exception occurred:
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190] 
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190] Traceback (most recent call last):
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]   File "/home/avigny/vllm-venv/lib/python3.12/site-packages/vllm/entrypoints/openai/tool_parsers/mistral_tool_parser.py", line 167, in extract_tool_calls
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]     raw_tool_call = self.tool_call_regex.findall(tool_content)[0]
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190]                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
ERROR 07-08 17:10:24 [mistral_tool_parser.py:190] IndexError: list index out of range

Author


I've not looked into this yet.

Author


Should be repaired by 138cef3.

Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
            return DeltaMessage(content=delta_text)

        # if the tool call token ID IS in the tokens generated so far, that
        # means we're parsing as tool calls now
        if _is_fn_name_regex_support(self.model_tokenizer):
Author

@avigny avigny Jul 8, 2025


The naming of this function could be changed to be more readable, as the streaming function does not use this regex.

Basically, this is used here to determine how the tool calls should be parsed (see the sketch after this list):

  • the old parser (tool call like [TOOL_CALLS][{"name":"add" , "arguments":{"a": 3, "b": 4} } ])
  • the new parser (tool call like [TOOL_CALLS]add{"a": "3", "b": "4"})
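For illustration, a hedged sketch of the two formats; the regex and helper names here are illustrative, not the tokenizer's actual fn_name_regex:

    import json
    import re

    BOT = "[TOOL_CALLS]"

    def parse_old_format(output: str) -> list[dict]:
        # Old style: the bot token is followed by a JSON array of calls.
        return json.loads(output.removeprefix(BOT))

    def parse_new_format(output: str) -> list[dict]:
        # New (v11) style: each call is `name{json-args}` after a bot token.
        calls = []
        for part in output.split(BOT)[1:]:
            m = re.match(r"([a-zA-Z0-9_-]+)(\{.*\})", part, re.DOTALL)
            if m:
                calls.append({"name": m.group(1),
                              "arguments": json.loads(m.group(2))})
        return calls

    print(parse_old_format('[TOOL_CALLS][{"name": "add", "arguments": {"a": 3, "b": 4}}]'))
    print(parse_new_format('[TOOL_CALLS]add{"a": "3", "b": "4"}'))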

avigny added 2 commits July 8, 2025 20:09
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
sjuxax pushed a commit to sjuxax/vllm that referenced this pull request Jul 11, 2025
sjuxax pushed a commit to sjuxax/vllm that referenced this pull request Jul 13, 2025
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
6 participants