Function calling support for Kimi-K2 #628
Conversation
Thank you for this! People have been asking for function calling support, but that is not something I'm very familiar with.
LGTM, but I would appreciate at least one other person testing.
I see your location is Leipzig. I have fond memories of this place, having spent 11 years there studying physics, doing a PhD, and staying for my first postdoc position.
Thanks! I've done the basic tests, but the model loads too slowly from my HDD, so I will test different use cases over the weekend.
I live in a beautiful city, thanks! I've been living here for 3 years and have absolutely no regrets!
Oh hey, this is exciting! I believe we have a PR open for this, #407 (comment), where some folks were trying to use a reverse proxy / wrapper to handle it, similar to claude-code-proxy perhaps. I don't use tool calling myself, but I did notice when adding the Kimi-K2-Instruct PR that I left out one section for the chat endpoint, so if it expects llama-server to handle the chat template internally, that part may be incomplete.
@iSevenDays This seems relevant:
This is very exciting! I would much rather use native function calling!
I took a look at how llama.cpp implements tool calling support, and the task is much more complicated than I thought. Especially the streaming part.
That would be really amazing! ik_llama + tool calling will be a dream come true for me!
- Add new chat.h/chat.cpp and chat-parser.h/chat-parser.cpp for better chat handling
- Improve function calls parsing with fallback to llama.cpp builder pattern
- Add string utility functions (starts_with, ends_with, find_partial_stop)
- Update README with function calls testing instructions
- Enhance Kimi K2 parser and function calls documentation
- Add comprehensive test suite for function calls
- Update CMakeLists.txt and Makefile for new components
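For readers wondering why string helpers matter here, a sketch of the three utilities named above, following the semantics of their llama.cpp counterparts (a best-effort reconstruction, not the exact code from this PR):

```cpp
#include <algorithm>
#include <string>

// True if s begins with / ends with the given affix.
static bool string_starts_with(const std::string & s, const std::string & prefix) {
    return s.rfind(prefix, 0) == 0;
}
static bool string_ends_with(const std::string & s, const std::string & suffix) {
    return s.size() >= suffix.size()
        && s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Position where a *partial* occurrence of `stop` begins at the tail of
// `text`, or npos. While streaming, the server must hold back a tail like
// "<tool" because it may still grow into a full "<tool_call>" marker.
static size_t find_partial_stop(const std::string & text, const std::string & stop) {
    if (!text.empty() && !stop.empty()) {
        for (size_t len = std::min(text.size(), stop.size() - 1); len > 0; --len) {
            if (text.compare(text.size() - len, len, stop, 0, len) == 0) {
                return text.size() - len;
            }
        }
    }
    return std::string::npos;
}
```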
- Fix streaming content cleanup to prevent function syntax in output
- Unify content extraction patterns with llama.cpp approach
- Improve Kimi K2 parser robustness and partial content handling
- Add comprehensive test coverage for function call scenarios
- Optimize chat message parsing and diff computation
- Add compile-time constants for all token format markers
- Add compile-time constants for XML format markers
- Add compile-time constants for simple format patterns
- Replace all hardcoded string literals with named constants
- Use compile-time length calculation to avoid manual counting
- Improve maintainability and reduce magic numbers throughout parser
- Remove duplicate implementation from chat-parser.cpp
- Keep single implementation in chat.cpp following llama.cpp patterns
- Resolves linker error: multiple definition of common_chat_parse
I had to port llama.cpp's function tool call support. Here is a branch of the Claude proxy that you can use with ik_llama.cpp and Claude Code. Steps to test this PR:
I'm doing more tests in the meantime.
- Add proper validation that the 'function' field is an object before accessing nested keys
- Handle a missing 'arguments' field gracefully with the default "{}"
- Prevents crashes when parsing malformed tool call JSON structures
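As an illustration of the defensive checks this commit describes, a minimal sketch using nlohmann::json (the JSON library llama.cpp builds on); the function name and exact handling are illustrative, not the code from the PR:

```cpp
#include <nlohmann/json.hpp>
#include <string>
using json = nlohmann::ordered_json;

// Extract name/arguments from one tool call entry; returns false instead of
// crashing on malformed structures.
static bool extract_tool_call(const json & call, std::string & name, std::string & arguments) {
    // Validate that "function" exists and is an object before touching nested keys.
    if (!call.contains("function") || !call["function"].is_object()) {
        return false;
    }
    const json & fn = call["function"];
    if (!fn.contains("name") || !fn["name"].is_string()) {
        return false;
    }
    name = fn["name"].get<std::string>();
    // A missing "arguments" field degrades to the empty object "{}".
    if (!fn.contains("arguments")) {
        arguments = "{}";
    } else if (fn["arguments"].is_string()) {
        arguments = fn["arguments"].get<std::string>(); // already serialized
    } else {
        arguments = fn["arguments"].dump();
    }
    return true;
}
```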
- Implement Qwen3 XML parser with <tool_call>{"name": "func", "arguments": {...}}</tool_call> format
- Add model detection and routing for Qwen3 vs Kimi-K2 formats
- Create 8 comprehensive unit tests covering parsing, streaming, error handling
- Fix token format cleaning bug in kimi_k2_parser.hpp processing order
- Remove progressive parsing code and related utilities
- Add tool injection support for Qwen3 format in server utils
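For illustration, a concrete instance of the Qwen3 format described above (function name and arguments are made up):

```
<tool_call>
{"name": "get_weather", "arguments": {"location": "Leipzig", "unit": "celsius"}}
</tool_call>
```

The parser extracts the JSON payload between the tags and reports it through the tool_calls path rather than as plain content.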
I added Qwen3 tool calling support.
@ikawrakow I have backported tool calling support. I'm not sure I can make the PR smaller, because the feature in llama.cpp is quite complicated. I suggest using the Kimi-K2 model with Claude Code following these steps: #628 (comment). It seems to work fine; at least it can call tools when I explicitly ask for it.
I think there was a lot of interest in this, so hopefully we will have a few people testing the PR. Hopefully today, so I can merge before going on vacation tomorrow.
@ikawrakow I'll be happy to work on your requests for this PR to get it merged.
Looking forward to Qwen3's tool calling.
- Implement complete DeepSeek R1 tool call parsing in common_chat_parser.cpp
- Add DeepSeek R1 model detection and tool injection in deepseek_r1_tools.hpp
- Update function_calls.hpp with DeepSeek R1 integration and content extraction
- Update documentation to reflect support for Kimi-K2, Qwen3, and DeepSeek R1 models
- Add comprehensive unit tests for DeepSeek R1 reasoning, tool calls, and integration
- Port exact implementation patterns from original llama.cpp for compatibility

Key features:
- Native DeepSeek R1 format: <|tool▁calls▁begin|>function<|tool▁sep|>name```json{}```<|tool▁call▁end|><|tool▁calls▁end|>
- Reasoning content extraction from <think>...</think> tags
- Multiple tool calls support with separate call blocks
- Model detection for deepseek-r1, deepseek_r1 naming patterns
- Integration with incremental parsing and streaming support
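A concrete instance of the native format quoted above might look like this (function name, arguments, and reasoning text invented for illustration):

````
<think>The user asked for the weather, so I should call get_weather.</think>
<|tool▁calls▁begin|>function<|tool▁sep|>get_weather
```json
{"location": "Leipzig"}
```
<|tool▁call▁end|><|tool▁calls▁end|>
````

The `<think>...</think>` span is extracted as reasoning content, and each call block yields one entry in the tool_calls array.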
I have added DeepSeek-R1 tool calling support.
@iSevenDays json-partial.h seems to be missing.
- json-partial.h/cpp: JSON partial parsing functionality
- regex-partial.h/cpp: regex partial parsing functionality
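To make "JSON partial parsing" concrete: while a tool call streams in, the accumulated text is usually not yet valid JSON, so it has to be "healed" before it can be parsed. A highly simplified sketch of the idea (the real json-partial.cpp also handles escapes cut mid-sequence, numbers cut mid-token, healing markers, and more):

```cpp
#include <iostream>
#include <string>
#include <vector>

// Close any open strings/objects/arrays so a partial JSON chunk parses.
static std::string heal_partial_json(const std::string & partial) {
    std::vector<char> closers;
    bool in_string = false, escaped = false;
    for (char c : partial) {
        if (in_string) {
            if (escaped)        escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"')  in_string = false;
            continue;
        }
        switch (c) {
            case '"': in_string = true; break;
            case '{': closers.push_back('}'); break;
            case '[': closers.push_back(']'); break;
            case '}': case ']':
                if (!closers.empty()) closers.pop_back();
                break;
        }
    }
    std::string healed = partial;
    if (in_string) {
        if (escaped) healed += '\\'; // finish a dangling escape
        healed += '"';
    }
    while (!closers.empty()) { healed += closers.back(); closers.pop_back(); }
    return healed;
}

int main() {
    // Prints: {"name": "get_weather", "arguments": {"location": "Lei"}}
    std::cout << heal_partial_json(R"({"name": "get_weather", "arguments": {"location": "Lei)") << "\n";
}
```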
@xldistance thanks for the feedback, the files are there and can be compiled successfully. For those who are testing with Claude Code, here are my suggestions:
@iSevenDays I use qwen3-coder-480b on top of ccr code
@xldistance just make sure to set the correct LLM name in the env and in llama-server. I'll check qwen3-coder-480b that was recently uploaded: https://huggingface.co/ubergarm/Qwen3-Coder-480B-A35B-Instruct-GGUF/tree/main/IQ2_KS
- Add test_qwen3_format_chat_integration() to validate the tool injection pipeline
- Test tool injection conditions and system message enhancement
- Verify JSON formatting and anti-preamble instructions
- Add comprehensive test documentation

Tests confirm tool injection works correctly; the conversational preamble issue is not in ik_llama.cpp but likely in the UI configuration.
The server was not passing the model name to parse_chat_message_incremental(), causing Qwen3 to fall back to the Kimi-K2 parser and return tool calls as content instead of a proper tool_calls array.
Non-streaming responses were hardcoded to use the Kimi-K2 format, causing Qwen3 XML tool calls to be returned as content instead of a proper tool_calls array. They now use the same model detection as the streaming path for consistency.
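Both fixes converge on routing by model name. A hypothetical sketch of what that detection looks like (names are illustrative; the actual helpers in the PR may differ):

```cpp
#include <string>

enum class tool_format { kimi_k2, qwen3, deepseek_r1 };

// Route to the right tool-call parser based on the model name, matching the
// naming patterns mentioned in the commits above.
static tool_format detect_tool_format(const std::string & model_name) {
    auto has = [&](const char * s) { return model_name.find(s) != std::string::npos; };
    if (has("qwen3")) {
        return tool_format::qwen3;
    }
    if (has("deepseek-r1") || has("deepseek_r1")) {
        return tool_format::deepseek_r1;
    }
    return tool_format::kimi_k2; // the previous hardcoded default
}
```

The point of the second fix is that the streaming and non-streaming paths now share this detection instead of one of them assuming Kimi-K2.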
Well, I'll just merge it then.
The implementation adds support for tool calls.
The reason why I think this feature is important is that it allows users of ik_llama.cpp to use this backend with apps like Claude Code that require tool calls.
By using a simple proxy like this one, https://github.com/1rgs/claude-code-proxy (I just found it on GitHub), I could connect Claude Code to ik_llama.cpp using the Kimi-K2 Q2 LLM provided by ubergarm.
In claude-code-proxy you just have to change the .env file:

```
OPENAI_API_BASE="http://192.168.0.24:8080/v1"
```
I had to port llama.cpp's function tool call support. The most difficult part was porting the streaming and JSON healing.