Recommended setting for running vLLM for CPU #5672
jerin-scalers-ai announced in Q&A
Replies: 1 comment
48 cores per instance should do fine. It's performing at almost 10 t/s (tokens per second) throughput for a single user.
What are the recommended settings for running vLLM on a CPU to achieve high performance? For instance, if I have a dual-socket server with 96 cores per socket, how many cores (--cpuset-cpus) should be allocated to run multiple replicas of vLLM?
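To make the core allocation concrete, here is a minimal sketch that splits the dual-socket, 96-cores-per-socket server from the question into replicas of 48 cores each, the size suggested in the reply. It rests on assumptions that are not from this thread: CPU IDs are numbered contiguously per socket (0-95 on socket 0, 96-191 on socket 1), each socket is one NUMA node whose ID equals the socket index, and hyperthread siblings are ignored. Check the real layout with `lscpu` before relying on these ranges. The helper name `replica_cpusets` is hypothetical. The sketch also prints Docker's `--cpuset-mems` flag next to `--cpuset-cpus` so that each replica's memory stays on its own socket.

```python
def replica_cpusets(sockets, cores_per_socket, cores_per_replica):
    """Return (cpuset_cpus, numa_node) pairs, one per replica.

    Assumes contiguous CPU numbering per socket and NUMA node ID == socket
    index (an assumption; verify with `lscpu`). A replica never spans sockets.
    """
    if cores_per_replica <= 0 or cores_per_replica > cores_per_socket:
        raise ValueError("cores_per_replica must be in 1..cores_per_socket")
    plans = []
    for socket in range(sockets):
        base = socket * cores_per_socket  # first CPU ID on this socket
        # Only whole replicas fit; leftover cores on a socket stay unused.
        for i in range(cores_per_socket // cores_per_replica):
            start = base + i * cores_per_replica
            end = start + cores_per_replica - 1
            plans.append((f"{start}-{end}", socket))
    return plans


if __name__ == "__main__":
    # Dual socket, 96 cores per socket, 48 cores per replica -> 4 replicas.
    for cpus, node in replica_cpusets(2, 96, 48):
        print(f"--cpuset-cpus={cpus} --cpuset-mems={node}")
```

Under those assumptions this yields four replicas, two per socket. Whether 48 cores is the best size for a given model is something to measure rather than assume; the reply reports only a single-user figure.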