How can we tell a decoder-only model about a "token budget" when running the inference loop, so that it doesn't just stop mid-way once the limit is reached but somehow "plans ahead" to fit the response into the budget? Thanks for any tips.

Replies: 1 comment 1 reply

@vladfaust What type of decoder-only model are you working with? For general use you can do something like this:
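A minimal sketch, assuming a Hugging Face transformers causal LM (the model name, prompt, and budget value are placeholders): `max_new_tokens` enforces a hard cap, while the "plan ahead" part can only be encouraged by stating the budget in the prompt, since a decoder-only model generates one token at a time and has no built-in notion of a remaining budget.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; substitute whatever decoder-only model you use
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

budget = 64  # token budget for the generated answer

# Soft constraint: state the budget in the prompt so the model can aim for a short answer.
prompt = (
    f"Answer in fewer than {budget} tokens.\n"
    "Question: What is a decoder-only model?\n"
    "Answer:"
)

inputs = tokenizer(prompt, return_tensors="pt")

# Hard constraint: generation stops once the budget is exhausted, even mid-sentence.
outputs = model.generate(
    **inputs,
    max_new_tokens=budget,
    pad_token_id=tokenizer.eos_token_id,  # gpt2 has no pad token
)

answer = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],  # keep only the newly generated tokens
    skip_special_tokens=True,
)
print(answer)
```

If a mid-sentence cutoff is unacceptable, common workarounds are to phrase the budget in words or sentences (which models tend to track more reliably than tokens), or to generate with some headroom and truncate at the last sentence boundary that fits the budget.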