Do you use 1 or multiple slots?
I have a llama-server demo that's been running for about a month on an AWS Graviton instance (c8g.24xlarge). Over time, generation throughput dropped from about 60 tokens per second (TPS) to about 5 TPS. Restarting the server process brought it back to 60 TPS.
I checked the log files and didn't see any single log file growing excessively or anything like that. The only other thing I noticed was that idle CPU usage was about 11% before I restarted llama-server; after the restart it dropped back to near zero.
Has anyone here run llama-server with significant uptime? Are there any known processes or files that snowball over time?
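To find out when the slowdown starts, rather than noticing it after a month, one option is to probe generation throughput on a schedule. The sketch below assumes llama-server is listening on its default port 8080. It reads the `timings` object that recent llama.cpp builds return from the `/completion` endpoint; field names may differ between versions. The probe prompt, the 10-minute interval, and the 50% threshold are arbitrary illustrative choices, and `probe`, `is_degraded`, and `monitor` are hypothetical helper names.

```python
import json
import time
import urllib.request
from statistics import median
from typing import List, Optional

# Assumption: llama-server running locally on its default port.
SERVER_URL = "http://127.0.0.1:8080/completion"


def generation_tps(timings: dict) -> float:
    """Generation-phase tokens/sec from llama-server's `timings` object."""
    if "predicted_per_second" in timings:
        return float(timings["predicted_per_second"])
    # Fall back to token count / elapsed seconds if the rate field is absent.
    return timings["predicted_n"] / (timings["predicted_ms"] / 1000.0)


def probe(url: str = SERVER_URL, n_predict: int = 64) -> float:
    """Send one fixed completion request and return its generation TPS."""
    body = json.dumps({"prompt": "Count from 1 to 100:", "n_predict": n_predict}).encode()
    req = urllib.request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=300) as resp:
        return generation_tps(json.load(resp)["timings"])


def is_degraded(history: List[float], baseline: float,
                ratio: float = 0.5, window: int = 5) -> bool:
    """True when the median of the last `window` samples is below ratio * baseline."""
    if len(history) < window:
        return False
    return median(history[-window:]) < ratio * baseline


def monitor(url: str = SERVER_URL, interval_s: float = 600.0) -> None:
    """Probe forever, logging each sample and flagging sustained slowdowns."""
    history: List[float] = []
    baseline: Optional[float] = None
    while True:
        tps = probe(url)
        if baseline is None:
            baseline = tps  # first sample after a (re)start is the reference
        history.append(tps)
        flag = "  <-- degraded" if is_degraded(history, baseline) else ""
        print(f"{time.strftime('%Y-%m-%d %H:%M:%S')}  {tps:.1f} tok/s{flag}")
        time.sleep(interval_s)
```

Calling `monitor()` from a separate process produces a timestamped throughput log. Using the median over a short window keeps one slow request (for example, one that overlaps a real user's request on a shared slot) from triggering a false alarm.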