Help! llama.cpp server stream freezes the current request and continues after processing the new request #9367

Answered by ngxson
AnonymousVibrate asked this question in Q&A

You can also try lowering the batch size, e.g. -b 32. Be careful: a lower batch size has a big impact on performance.

Also, it seems you're running on CPU, so the default batch size of 2048 takes significantly long to process.
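A minimal sketch of the suggested change, assuming a locally built llama-server binary and a hypothetical model path; -b / --batch-size is the flag being discussed:

```shell
# Hypothetical model path; adjust to your setup.
# With a smaller batch, prompt processing is split into more, shorter chunks,
# so on CPU the server reaches a point where it can handle a new request
# sooner — at the cost of slower overall prompt processing.
./llama-server -m ./models/model.gguf -c 4096 -b 32
```

This trades prompt-processing throughput for responsiveness, which matches the warning above about the performance impact.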

Replies: 1 comment, 7 replies (ngxson and steampunque, Sep 9, 2024). Answer selected by AnonymousVibrate.