Simple WordLevel tokenizer #8282
Unanswered
iyubondyrev asked this question in Q&A
Hello!
I’m trying to get a basic word-level tokenizer to work with a scaled-down Phi3ForCausalLM model that has only 2 layers and 4 heads. The vocabulary is small too: only 382 words.
Here's how I've set up my SentencePiece tokenizer:
It basically splits sentences into words and assigns an integer ID to each one. But I’m stuck converting this setup to the .gguf format. I have a tokenizer.model from SentencePiece, but it isn’t working. Here’s what happens when I try to load the model:
Any ideas on how to make this tokenizer work with the transformer model or how to fix the loading issue? Thanks for any help you can offer!
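To make the word-level scheme described above concrete, here is a minimal sketch in plain Python. It is not the SentencePiece setup from this post and says nothing about GGUF conversion; the class name `WordLevelTokenizer`, the `<unk>` token, and the toy corpus are all hypothetical and exist only for illustration.

```python
class WordLevelTokenizer:
    """Hypothetical helper: whitespace split plus vocabulary lookup."""

    def __init__(self, corpus, unk_token="<unk>"):
        self.unk_token = unk_token
        # Reserve id 0 for unknown words, then number words in first-seen order.
        self.token_to_id = {unk_token: 0}
        for sentence in corpus:
            for word in sentence.split():
                self.token_to_id.setdefault(word, len(self.token_to_id))
        # Reverse mapping used for decoding.
        self.id_to_token = {i: t for t, i in self.token_to_id.items()}

    def encode(self, text):
        """Map each whitespace-separated word to its id (unknown words map to 0)."""
        unk_id = self.token_to_id[self.unk_token]
        return [self.token_to_id.get(word, unk_id) for word in text.split()]

    def decode(self, ids):
        """Join the tokens for the given ids with single spaces."""
        return " ".join(self.id_to_token[i] for i in ids)


# Toy corpus (assumption): the real vocabulary would hold the 382 words.
tokenizer = WordLevelTokenizer(["the cat sat", "the dog sat down"])
ids = tokenizer.encode("the dog sat on the cat")
print(ids)                    # [1, 4, 3, 0, 1, 2]
print(tokenizer.decode(ids))  # the dog sat <unk> the cat
```

The word "on" is not in the toy vocabulary, so it maps to the reserved unknown id 0. Any real setup would also need whatever special tokens (beginning of sequence, end of sequence, padding) the model expects.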