Hello everyone,
I’m deploying the Mistral Small 3.1 (2503) model with llama.cpp, and I noticed that the default chat template doesn’t include tool annotations. As a result, llama-server cannot properly use the function-calling feature, unlike Ollama.
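For reference, the sketch below is how I check whether tool calls come back from llama-server's OpenAI-compatible endpoint. It is only a minimal sketch: the `localhost:8080` address, the model name, and the `get_weather` tool are placeholders I made up for illustration, not part of my actual setup.

```python
import json
import requests

# Minimal tool-calling probe against llama-server's OpenAI-compatible API.
# Placeholders: server address, model name, and the get_weather tool schema.
payload = {
    "model": "Mistral-Small-3.1-24B-Instruct-2503",
    "messages": [
        {"role": "user", "content": "What's the weather in Taipei right now?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json=payload,
    timeout=120,
)
message = resp.json()["choices"][0]["message"]

# With a working template the model should answer with a tool call
# instead of plain text.
print(json.dumps(message.get("tool_calls"), indent=2))
```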
To address this, I modified the vLLM example and tested it with the following setup:
- Container image: ghcr.io/ggml-org/llama.cpp:server-cuda-b5391
- Model files from unsloth/Mistral-Small-3.1-24B-Instruct-2503-GGUF:
  - Mistral-Small-3.1-24B-Instruct-2503-Q4_K_M.gguf
  - mmproj-F16.gguf
Below is the final version of my modified chat template, along with the execution parameters. With these changes, the server correctly handles chat completions, function calls, and interleaved images in the chat history.
https://gist.github.com/Phate334/dd633561879f41a8c4affc4031df1c7f
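To sanity-check the multimodal side, I send interleaved text and image parts using the OpenAI-style content format. Again, this is only a sketch: the server address, model name, and image paths are placeholders.

```python
import base64
import requests

def image_part(path: str) -> dict:
    """Encode a local image as an OpenAI-style image_url content part."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}}

# Two images interleaved with text in a single user turn.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Here is the first photo."},
            image_part("first.jpg"),
            {"type": "text", "text": "And here is the second one. What changed between them?"},
            image_part("second.jpg"),
        ],
    }
]

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={"model": "Mistral-Small-3.1-24B-Instruct-2503", "messages": messages},
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```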
I’d appreciate any feedback on whether this approach is correct or if there are any missing steps.