Intel Xeon CPU Max 9480 (OpenVINO) performance? #9486
randomqhacker
announced in
Q&A
Replies: 0 comments
I'm getting less than 4 tok/s on a 78B Qwen2.5 finetune quantized to ~43 GB, running on 1 x Intel Xeon CPU Max 9480 (HBM only) with the OpenVINO backend. pcm-memory confirms that HBM throughput reaches ~170 GB/s. My understanding is that the practical HBM read bandwidth limit on these CPUs should be closer to 575 GB/s.
I also tested Qwen2.5 7B int8 and got ~21 tok/s at ~155 GB/s HBM throughput.
Is this typical throughput for vLLM + OpenVINO on Xeon CPU Max, or on Xeon in general? Any tips?
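For reference, here is a rough sanity check on the numbers above. It is only a sketch, and it assumes that decoding is memory-bandwidth-bound and that each generated token reads the full set of weights once (KV cache and activation traffic are ignored). The function names are made up for illustration; the inputs are the figures quoted in this post.

```python
# Back-of-the-envelope estimate, NOT a benchmark.
# Assumption: single-stream decode is limited by memory bandwidth and
# streams all model weights once per generated token.

def bandwidth_bound_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Upper-bound tokens/s if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb

def implied_gb_per_token(bandwidth_gb_s: float, tok_s: float) -> float:
    """Memory traffic per generated token implied by measured numbers."""
    return bandwidth_gb_s / tok_s

# 78B finetune, ~43 GB of weights, at the measured ~170 GB/s.
measured_78b = bandwidth_bound_tok_s(170.0, 43.0)

# Same model if the ~575 GB/s figure from the post were reached.
ideal_78b = bandwidth_bound_tok_s(575.0, 43.0)

# 7B int8 run: ~21 tok/s at ~155 GB/s implies this much traffic per token.
traffic_7b = implied_gb_per_token(155.0, 21.0)

print(f"78B at 170 GB/s: {measured_78b:.2f} tok/s")
print(f"78B at 575 GB/s: {ideal_78b:.2f} tok/s")
print(f"7B int8 implied traffic: {traffic_7b:.2f} GB/token")
```

Under that assumption, the observed <4 tok/s matches what ~170 GB/s can deliver for ~43 GB of weights, and the 7B int8 run moves roughly one model's worth of data per token. So the tok/s figures look consistent with the achieved bandwidth, and the open question is why HBM throughput stays so far below ~575 GB/s.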