Enhancing LLM Serving with ZenTorch on AMD Gen5 CPUs #13174
Manoj-red-hat
announced in Ideas
With recent advancements in ZenTorch, PyTorch workloads have seen significant speedups, particularly on AMD's latest Genoa and Turin (Gen 5) EPYC CPUs (see the Hugging Face + AMD blog). This presents a strong opportunity for optimizing LLM inference in CPU-based deployments.
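For context, here is a minimal sketch of how ZenTorch is typically wired into a PyTorch workload: importing `zentorch` registers a custom `torch.compile` backend, so an existing Hugging Face model can be routed through ZenDNN-optimized kernels with a one-line change. The model name and generation parameters below are illustrative, not part of this proposal.

```python
# Minimal sketch: compiling a Hugging Face causal LM with the ZenTorch
# torch.compile backend on an AMD EPYC CPU. Model ID and prompt are
# illustrative examples, not part of the actual proposal.
import torch
import zentorch  # noqa: F401 -- importing registers the "zentorch" backend
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()

# Route the model through ZenTorch; ops without a ZenDNN-optimized
# kernel fall back to the default PyTorch implementation.
model = torch.compile(model, backend="zentorch")

inputs = tokenizer("The benefits of CPU inference include", return_tensors="pt")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

A vLLM integration would presumably hook in at a similar point inside the CPU backend's model-compilation path, though the exact wiring is what this proposal is meant to work out.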
I am already working on this and can lead the effort to integrate ZenTorch into vLLM, enabling improved serving performance for users on AMD's latest hardware. This could provide a highly efficient, cost-effective path for CPU-based LLM inference, especially in environments where GPU availability is constrained.
Would love to discuss how we can collaborate on this!