Description
This is a living document! We are eager to know what you want from vLLM Ascend in Q3 2025. Any feedback is welcome.
We will finish planning the roadmap in mid-July.
As a vital component of vLLM, the vLLM Ascend project is dedicated to providing easy, fast, and cheap LLM serving for everyone on Ascend NPU, and to actively contributing to the enrichment of vLLM.
In 2025 Q2, we focused on 4 themes: vLLM Ascend for Production, Performance Optimization, Key Features, and Ecosystem Connect. In 2025 Q3, we will focus on: Full V1 Engine Support, Quality and Production Readiness, User / Developer Experience, and Competitiveness for Key Workflows.
1. Full V1 Engine Support
- Stable plugin architecture for hardware platforms
- Full V1 Engine support and cleanup of the V0 code path: [Feature]: Enable V1 by default and cleanup V0 code #1620
- Enable CustomOP registration: [CustomOP][Refactor] Register CustomOP instead of overwrite forward_oot #1647
- V1 PP (pipeline parallelism) support
2. Quality and Production Readiness
- Unit test coverage enhancement: [RFC]: Unit test coverage improvement #1298
- Multi-node test
3. User / Developer Experience
- User docs: [RFC]: Doc enhancement #1248
- Developer design docs: [RFC]: Doc enhancement #1248
- Distributions
- Perf Dashboard
- Developer Experience
- vLLM commit hash recording: Record vLLM commit in PR description #1623
4. Competitiveness for Key Workflows
- Large Scale Serving
- EPLB
- Dynamic EPLB: [Feature] Dynamic Expert Load Balance Zero-like-overhead #1391
- Static EPLB: Add static EPLB #1116
- Qwen series (Qwen3 / Qwen3 MoE) optimization: [Perf] Optimize perf of Qwen3 #1245
- Qwen series (Qwen3 MoE) optimization: [Bugfix] Support Qwen3-MOE on aclgraph mode #1381
- Disaggregated Prefilling
- EPLB
- RLHF
- Performance improvements
- Parallel support
- Model
- Qwen/DeepSeek/Qwen VL series
- Gemma3
- New model support: vLLM Ascend Model Support Priority #1608
- New trending models such as MiniMax / Hunyuan / ERNIE
- Quantization support: w4a16/w4a8 for dense models
- Quantization support: w4a16/w4a8 for MoE models
- Model format support: GGUF
- Others
- Atlas 300I series experimental support and perf enhancement: [Performance] Disable JIT and nd2nz to improve performance for Atlas 300I series #1591
If an item you want is not on the roadmap, your suggestions and contributions are very welcome! Please feel free to comment in this thread, open a feature request, or create an RFC.
Historical Roadmap: