Replies: 1 comment
-
The server example uses fetch with an AbortController, so aborting should work during token generation. However, prompt processing is usually a single run, so it may not be interrupted immediately unless n_batch is set to a small value.
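A minimal sketch of that pattern, assuming the server's streaming `/completion` endpoint on `localhost:8080` (adjust URL and body to your setup) — passing the controller's signal to fetch lets you cancel the stream between tokens:

```javascript
// Assumed endpoint/body; adapt to your server configuration.
const controller = new AbortController();

async function streamCompletion(prompt) {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, stream: true }),
    signal: controller.signal, // ties this request to the controller
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  try {
    // Read streamed chunks until the server finishes or we abort.
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      process.stdout.write(decoder.decode(value, { stream: true }));
    }
  } catch (err) {
    // AbortError is the expected result of controller.abort().
    if (err.name !== "AbortError") throw err;
  }
}

// Calling controller.abort() cancels the request mid-stream; as noted
// above, it cannot interrupt an in-flight prompt-processing batch.
// controller.abort();
```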
-
I'm trying to stop ./server inference in the middle of the process, without any good results so far.
I'm stuck here. Can someone explain how I can "stop" the process when requesting a streamed inference from the server?
I'm using JavaScript's fetch.