Replies: 1 comment
-
The server example uses fetch with an AbortController, so aborting should work during token generation. However, prompt processing is usually a single run, so it may not be interrupted immediately unless n_batch is set to a small value.
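A minimal sketch of that pattern, assuming the server's streaming `/completion` endpoint on `localhost:8080` (adjust URL and body to your setup) — passing the controller's signal to fetch lets you cancel the stream between tokens:

```javascript
// Assumed endpoint/body; adapt to your server configuration.
const controller = new AbortController();

async function streamCompletion(prompt) {
  const res = await fetch("http://localhost:8080/completion", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, stream: true }),
    signal: controller.signal, // ties this request to the controller
  });
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  try {
    // Read streamed chunks until the server finishes or we abort.
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      process.stdout.write(decoder.decode(value, { stream: true }));
    }
  } catch (err) {
    // AbortError is the expected result of controller.abort().
    if (err.name !== "AbortError") throw err;
  }
}

// Calling controller.abort() cancels the request mid-stream; as noted
// above, it cannot interrupt an in-flight prompt-processing batch.
// controller.abort();
```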
-
I'm trying to stop ./server inference in the middle of the process, without any good results so far.
I'm stuck here. Can someone explain how I can "stop" the process when requesting a streamed inference from the server?
I'm using JavaScript's fetch.