Recommended setting for running vLLM for CPU #5672

jerin-scalers-ai · 2024-06-19T06:27:04Z


What are the recommended settings for running vLLM on a CPU to achieve high performance? For instance, if I have a dual-socket server with 96 cores per socket, how many cores (--cpuset-cpus) should be allocated to run multiple replicas of vLLM?

akhilreddy0703 · 2024-08-07T19:53:09Z


48 cores per instance should do fine. It reaches almost 10 tokens/s throughput for a single user.
Echoswift is a performance benchmarking tool for self-hosted LLMs; it currently supports TGI, vLLM, Llamacpp, and Ollama.
It's very useful for running comparative tests to find the best container size based on latency and throughput.
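
To make the 48-cores-per-instance suggestion concrete for the dual-socket, 96-cores-per-socket server in the question, here is a rough sketch that computes one `--cpuset-cpus` range per replica without letting a replica span two sockets. It assumes core IDs are numbered contiguously per socket (socket 0 is 0-95, socket 1 is 96-191) and ignores hyperthread siblings; neither is guaranteed, so check the real layout with `lscpu` first. The function name `cpuset_ranges` is hypothetical, not part of vLLM or Docker.

```python
def cpuset_ranges(sockets: int, cores_per_socket: int, cores_per_replica: int):
    """Return (socket_index, cpu_range) pairs, one per replica.

    Assumption: core IDs are contiguous within each socket. Verify this
    with `lscpu` before using the ranges on a real machine.
    """
    if not 0 < cores_per_replica <= cores_per_socket:
        raise ValueError("cores_per_replica must be between 1 and cores_per_socket")

    replicas = []
    for socket in range(sockets):
        first_core = socket * cores_per_socket
        # Only whole replicas are placed on a socket; leftover cores stay unused.
        for slot in range(cores_per_socket // cores_per_replica):
            start = first_core + slot * cores_per_replica
            end = start + cores_per_replica - 1
            replicas.append((socket, f"{start}-{end}"))
    return replicas


if __name__ == "__main__":
    # Dual-socket server, 96 cores per socket, 48 cores per vLLM replica.
    for index, (socket, cpus) in enumerate(cpuset_ranges(2, 96, 48)):
        print(f"replica {index}: --cpuset-cpus={cpus} --cpuset-mems={socket}")
```

With these inputs the sketch yields four replicas, two per socket. Passing Docker's `--cpuset-mems` alongside `--cpuset-cpus` keeps each replica's memory on the same NUMA node as its cores, assuming one NUMA node per socket. Whether 48 cores is the best size for a given model is still something to measure with a benchmark, as suggested above.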

