I plan to run llama-server locally with the CUDA backend to service autocomplete requests, and I only care about the most recently sent request.
Is there a way to configure llama-server to "cancel all" pending requests when a new one is received?
Maybe I could set up a drop-head queue with a maximum size of 1. Best of all would be to actually cancel the CUDA work before decoding finishes.

Replies: 1 comment

Unfortunately, there is no cancellation support implemented yet.
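The thread offers no workaround, but the asker's drop-head idea can at least be approximated on the client side: keep at most one request in flight and abort the previous one before issuing the next. Below is a minimal TypeScript sketch, assuming llama-server listens on 127.0.0.1:8080 and exposes a POST /completion endpoint taking prompt and n_predict (details not stated in the thread). Note that aborting only releases the client; per the reply above, the server has no cancellation support, so it will still finish decoding the abandoned request.

```typescript
// Client-side "only the latest request matters" sketch (not from the thread).
// Assumptions: llama-server on 127.0.0.1:8080 with a POST /completion endpoint
// accepting { prompt, n_predict } and returning { content }.

let inFlight: AbortController | null = null;

export async function complete(prompt: string): Promise<string | null> {
  // Drop-head behaviour with queue size 1: abandon the previous request.
  inFlight?.abort();
  const controller = new AbortController();
  inFlight = controller;

  try {
    const res = await fetch("http://127.0.0.1:8080/completion", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt, n_predict: 64 }),
      signal: controller.signal,
    });
    const data = await res.json();
    return data.content as string;
  } catch (err) {
    // An aborted fetch rejects; treat a superseded request as "no result".
    if (controller.signal.aborted) return null;
    throw err;
  } finally {
    if (inFlight === controller) inFlight = null;
  }
}
```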