Special token support #3466
Dampfinchen started this conversation in Ideas
Replies: 2 comments
-
This might be better suggested as an issue? If you use llama-cpp-python, I think you should be able to get around the problem by tokenizing with the actual HF tokenizer, using something like huggingface/tokenizers or simply transformers itself, but it would be nice to have a solution to this baked into llama.cpp, IMO.
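For example, a minimal sketch of that workaround, assuming llama-cpp-python and transformers are installed (the GGUF path, sampling settings, and stop handling below are illustrative, not from the thread): tokenize the ChatML prompt with the model's own HF tokenizer so <|im_end|> survives as a single special token, pass the ids to llama-cpp-python, and stop generation on that token manually:

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

# The HF tokenizer knows the added ChatML tokens; llama.cpp's own
# tokenizer would split <|im_end|> into plain-text pieces.
tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")
llm = Llama(model_path="mistral-7b-openorca.Q5_K_M.gguf")  # illustrative path

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello, who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
tokens = tokenizer.encode(prompt)

# llama.cpp does not treat <|im_end|> as EOS, so break on it ourselves.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
out = []
for tok in llm.generate(tokens, temp=0.7):
    if tok in (im_end_id, llm.token_eos()):
        break
    out.append(tok)
print(tokenizer.decode(out))
```

The point is that llama.cpp only ever sees token ids here, so the special tokens never pass through its text tokenizer.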
-
This has already been noticed, along with a couple of other problems with Mistral-OpenOrca; see #3454, #3455 and #346. A quick-and-dirty solution and the "proper" prompt format parameters: #3455 (comment).
-
A few days ago, Open Orca released a new model called Mistral-7B-OpenOrca. It uses the ChatML format, which has <|im_end|> as a special EOS token that llama.cpp currently does not recognize.
This is the format in question: https://github.com/openai/openai-python/blob/main/chatml.md
Model: https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca/commit/17572416df27482d71dda9ea6bdea1733d8cee5d
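A quick way to see what llama.cpp would need to reproduce (a sketch; the exact id shown is an assumption): the HF tokenizer shipped with the model encodes <|im_end|> as a single added token rather than as several plain-text pieces:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")

ids = tok.encode("<|im_end|>", add_special_tokens=False)
print(ids)                             # one id, e.g. [32000]
print(tok.convert_ids_to_tokens(ids))  # ['<|im_end|>']
```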