Replies: 1 comment
-
Because it is the first word with an umlaut that appears in the WikiText-2 dataset, which we use to test tokenizers. At the time it was causing the BPE tokenizers to fail, so I added it to the list, like all the other failing words.
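For anyone wondering why an umlaut makes a useful test case, here is a minimal Python sketch. It rests on assumptions: the sample word is hypothetical (the actual word is not named here), and the snippet only shows how the text is represented in Unicode and UTF-8. It does not reproduce the original failure. The point is that the same visible word can reach a tokenizer as different code point and byte sequences.

```python
import unicodedata

# Hypothetical sample word, written with an escape so the source is
# unambiguous: "\u00fc" is the precomposed letter u-with-diaeresis.
word = "M\u00fcller"

# In UTF-8 the umlaut takes two bytes, so the byte count exceeds
# the character count.
utf8_bytes = word.encode("utf-8")
print(len(word), len(utf8_bytes))

# NFD normalization splits the umlaut into "u" plus the combining
# diaeresis U+0308, giving a different code point sequence.
decomposed = unicodedata.normalize("NFD", word)
decomposed_bytes = decomposed.encode("utf-8")
print(len(decomposed), len(decomposed_bytes))

# Both forms render identically but are not equal as strings.
print(word == decomposed)
```

A tokenizer that assumes one byte per character, or that handles only one of the two normalization forms, may split such a word differently from the reference implementation.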
-
https://github.com/ggerganov/llama.cpp/blob/84ec8a58f7b6aad6887bbfbd1321f3ff417341a5/convert_hf_to_gguf_update.py#L279
And if you want another option:
#11600