mtmd : Support jinja in libmtmd (Only for QwenVL and Qwen Omni) #14730
That code comes from a private repo I’ve been working on. It provides essential Jinja support in a multimodal setup.
The PR adds two new optional metadata fields for GGUF:
- `tokenizer.ggml.image_token_id`: for the image token, if it exists.
- `tokenizer.ggml.audio_token_id`: for the audio token, if it exists.

If these tokens do not exist, a fallback is used, similar to the FIM lookup. The current tokens used for images are `<|IMAGE|>` and `<IMAGE>`.
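The lookup order described above (metadata field first, then known token texts) could be sketched roughly as follows. This is only an illustration: `metadata_t`, `vocab_t`, and `resolve_token_id` are hypothetical stand-ins rather than llama.cpp APIs, and the `-1` "not found" sentinel is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for GGUF integer metadata and the vocabulary
// (token text -> token id). These are NOT the real llama.cpp types.
using metadata_t = std::map<std::string, int32_t>;
using vocab_t    = std::map<std::string, int32_t>;

// Resolve a special token id:
//   1. use the metadata field `key` if it is present;
//   2. otherwise try each fallback token text, in order;
//   3. otherwise return -1 (assumed "no such token" sentinel).
int32_t resolve_token_id(const metadata_t & meta,
                         const vocab_t & vocab,
                         const std::string & key,
                         const std::vector<std::string> & fallbacks) {
    auto meta_it = meta.find(key);
    if (meta_it != meta.end()) {
        return meta_it->second;
    }
    for (const auto & text : fallbacks) {
        auto vocab_it = vocab.find(text);
        if (vocab_it != vocab.end()) {
            return vocab_it->second;
        }
    }
    return -1;
}
```

With fallbacks `<|IMAGE|>` and `<IMAGE>`, a model that lacks `tokenizer.ggml.image_token_id` but has one of those tokens in its vocabulary would still resolve to a valid id in this sketch.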
For the MTMD tokenizer, I maintained backward compatibility and updated the split function to support multiple delimiters, allowing it to work with both the old marker and the preserved tokens.
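As a rough illustration of splitting on multiple delimiters, the sketch below cuts a prompt at whichever marker occurs first and keeps each marker as its own part. The names `part` and `split_by_markers` are hypothetical, the marker strings in the usage note are assumptions, and the actual implementation in the PR may differ.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One piece of the prompt: either plain text or a media marker.
struct part {
    std::string text;
    bool        is_marker;
};

// Split `input` on any of `markers`. At each step the earliest match wins;
// if two markers match at the same position, the longer one is taken.
std::vector<part> split_by_markers(const std::string & input,
                                   const std::vector<std::string> & markers) {
    std::vector<part> parts;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t best_pos = std::string::npos;
        std::size_t best_len = 0;
        for (const auto & m : markers) {
            if (m.empty()) {
                continue; // an empty marker would match everywhere
            }
            std::size_t found = input.find(m, pos);
            if (found == std::string::npos) {
                continue;
            }
            if (found < best_pos || (found == best_pos && m.size() > best_len)) {
                best_pos = found;
                best_len = m.size();
            }
        }
        if (best_pos == std::string::npos) {
            // No marker left: the remainder is plain text.
            parts.push_back({input.substr(pos), false});
            break;
        }
        if (best_pos > pos) {
            parts.push_back({input.substr(pos, best_pos - pos), false});
        }
        parts.push_back({input.substr(best_pos, best_len), true});
        pos = best_pos + best_len;
    }
    return parts;
}
```

For example, calling it with the markers `<__media__>` (assumed here to be the old marker) and `<|IMAGE|>` lets a single prompt mix both styles, which is the backward-compatible behavior the description refers to.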
One final change (only for Qwen models): I removed the `image_start` and `image_end` tokens, as the model already has its own special tokens.