ai.djl.engine.EngineException: Out of range: Invalid id at SentencePieceLibrary.decode() problem #3553
JeeDevUser
started this conversation in
General
Replies: 1 comment
-
Can you use |
-
Hi all,
I am using the
ai.djl.sentencepiece.SpProcessor.encode()
method to tokenize the source text.
The result is the input to the google-t5/t5-large model, which should continue a text from a given beginning.
- This is what I am using as inputText, to be completed by the model:
String text = "generate: Once upon a time, in a land far away,";
- During inference, the model returns the logits, i.e. unnormalized scores for each token in the vocabulary (a softmax turns them into probabilities).
- Using simple greedy search, I pick the highest-scoring token at each step and build a sequence of token ids to be decoded, something like this:
[0, 32099, 3, 9, 1322, 623, 550, 6, 16, ...]
- The vocabulary size for T5 large is 32128.
- But when I try to decode this output array with
ai.djl.sentencepiece.SpProcessor.decode()
I get the exception from the title:
ai.djl.engine.EngineException: Out of range: Invalid id
What's bothering me is why token 32099 can't be decoded.
It is smaller than the vocabulary size (32128), so what is the problem?
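One possible explanation, offered as an assumption rather than a confirmed diagnosis: the SentencePiece model file shipped with T5 is generally understood to hold 32,000 pieces (ids 0 to 31999). Ids 32000 to 32099 would then be the sentinel tokens (<extra_id_99> down to <extra_id_0>) that the Hugging Face tokenizer adds on top of it, and the ids up to 32127 only pad the embedding matrix. Under that assumption, 32099 is valid for the model but unknown to SentencePiece itself. The sketch below uses plain Java with no DJL calls; the class name T5IdFilter, the helper names, and the constant SP_PIECE_COUNT are all hypothetical. It shows a greedy argmax step and a filter that drops ids outside the SentencePiece range before decoding.

```java
import java.util.Arrays;

class T5IdFilter {
    // Assumption: the T5 SentencePiece model holds 32,000 pieces (ids 0..31999).
    // Check the actual piece count of your model file before relying on this.
    static final int SP_PIECE_COUNT = 32_000;

    /** One greedy search step: returns the index of the largest logit. */
    public static int argmax(float[] logits) {
        int best = 0;
        for (int i = 1; i < logits.length; i++) {
            if (logits[i] > logits[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Keeps only ids in [0, pieceCount), i.e. ids SentencePiece can look up. */
    public static int[] keepDecodable(int[] ids, int pieceCount) {
        return Arrays.stream(ids)
                .filter(id -> id >= 0 && id < pieceCount)
                .toArray();
    }

    public static void main(String[] args) {
        // The generated sequence from the post, without the elided tail.
        int[] generated = {0, 32099, 3, 9, 1322, 623, 550, 6, 16};
        int[] decodable = keepDecodable(generated, SP_PIECE_COUNT);
        // The filtered array is what would be handed to SpProcessor.decode().
        System.out.println(Arrays.toString(decodable));
    }
}
```

If the assumption holds, the filtered array no longer contains 32099 and should be accepted by the decoder. Whether to drop sentinel ids or map them to their <extra_id_N> strings depends on what the output is used for.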