Special token support #3466
Dampfinchen started this conversation in Ideas
Replies: 2 comments
-
This might be better suggested as an issue? If you use llama-cpp-python, I think you should be able to get around the problem by tokenizing with the actual HF tokenizer, using something like huggingface/tokenizers or simply transformers itself, but it would be nice to have a solution to this baked into llama.cpp, IMO.
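For example, a minimal sketch of that workaround, assuming llama-cpp-python and transformers are installed (the GGUF path, sampling settings, and stop handling below are illustrative, not from the thread): tokenize the ChatML prompt with the model's own HF tokenizer so <|im_end|> survives as a single special token, pass the ids to llama-cpp-python, and stop generation on that token manually:

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

# The HF tokenizer knows the added ChatML tokens; llama.cpp's own
# tokenizer would split <|im_end|> into plain-text pieces.
tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")
llm = Llama(model_path="mistral-7b-openorca.Q5_K_M.gguf")  # illustrative path

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello, who are you?<|im_end|>\n"
    "<|im_start|>assistant\n"
)
tokens = tokenizer.encode(prompt)

# llama.cpp does not treat <|im_end|> as EOS, so break on it ourselves.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
out = []
for tok in llm.generate(tokens, temp=0.7):
    if tok in (im_end_id, llm.token_eos()):
        break
    out.append(tok)
print(tokenizer.decode(out))
```

The point is that llama.cpp only ever sees token ids here, so the special tokens never pass through its text tokenizer.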
-
This has already been noticed, along with a couple of other problems with Mistral-OpenOrca; see #3454, #3455 and #346. A quick-and-dirty solution and the "proper" prompt format parameters: #3455 (comment).
-
A few days ago, Open Orca released a new model called Mistral-7B-OpenOrca. It uses the ChatML format, which has <|im_end|> as a special EOS token that llama.cpp currently does not recognize.
This is the format in question: https://github.com/openai/openai-python/blob/main/chatml.md
Model: https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca/commit/17572416df27482d71dda9ea6bdea1733d8cee5d
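A quick way to see what llama.cpp would need to reproduce (a sketch; the exact id shown is an assumption): the HF tokenizer shipped with the model encodes <|im_end|> as a single added token rather than as several plain-text pieces:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")

ids = tok.encode("<|im_end|>", add_special_tokens=False)
print(ids)                             # one id, e.g. [32000]
print(tok.convert_ids_to_tokens(ids))  # ['<|im_end|>']
```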