Replies: 1 comment
-
It seems it's in the scheduler's update step: it generates batches of output for the various prompts being processed, but each sequence is generated in blocks rather than one token at a time, so there may be some waste if a sequence stops matching the constraints partway through a block.
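The waste described above can be illustrated with a small, self-contained sketch. This is not vLLM code; the block size, `step_fn`, and `accepts` constraint checker are hypothetical stand-ins for the scheduler's block-wise decoding and a grammar constraint:

```python
def blockwise_generate(step_fn, accepts, max_tokens, block_size=8):
    """step_fn(seq) -> next token; accepts(seq) -> bool.

    Tokens are produced block_size at a time, and the constraint is only
    checked at block boundaries, so any tokens generated after the first
    violation inside a block are wasted work.
    """
    seq, wasted = [], 0
    while len(seq) < max_tokens:
        # Generate a whole block before checking the constraint.
        block = []
        for _ in range(block_size):
            block.append(step_fn(seq + block))
        # Find the first position in the block that violates the constraint.
        keep = len(block)
        for i in range(len(block)):
            if not accepts(seq + block[: i + 1]):
                keep = i
                break
        seq += block[:keep]
        wasted += len(block) - keep
        if keep < len(block):
            break  # the sequence no longer matches the constraints
    return seq, wasted
```

For example, with a digits-only constraint and a token stream `1 2 3 x 5 6 7 8`, a block size of 8 keeps only `1 2 3` and discards the five tokens generated after the violating `x`.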
-
Just dug into the code, and it looks like the way to obtain forward logits is via `llm.llm_engine.step()`. I am trying to integrate this API into Outlines, but I feel I would need to manage things like unfinished requests and the scheduler. Is there a simple way to just call `model.forward()` or `model.__call__()`, equivalent to what HF Transformers offers?