vLLM Ascend Roadmap Q3 2025 #1168

This is a living document! We are eager to know what you want from vLLM Ascend in Q3 2025. Any feedback is welcome.

We will finalize the roadmap by mid-July.

---

As a vital component of vLLM, the vLLM Ascend project is dedicated to providing easy, fast, and cheap LLM serving for everyone on Ascend NPUs, and to actively contributing to the enrichment of vLLM.

In Q2 2025, we focused on four themes: vLLM Ascend for Production, Performance Optimization, Key Features, and Ecosystem Connect. In Q3 2025, we will focus on: ***V1 Engine Full Support, Quality and Production Readiness, User / Developer Experience, and Competitiveness for Key Workflows***.

## 1. V1 Engine Full Support
- [ ] Stable plugin architecture for hardware platforms
- [ ] Full V1 Engine support and cleanup of the V0 code path: https://github.com/vllm-project/vllm-ascend/issues/1620
- [ ] Enable CustomOP registration: https://github.com/vllm-project/vllm-ascend/pull/1647
- [ ] V1 pipeline parallelism (PP) support

## 2. Quality and Production Readiness
- Unit test coverage enhancement: https://github.com/vllm-project/vllm-ascend/issues/1298
- Multi-node testing

## 3. User / Developer Experience
- [ ] User documentation: https://github.com/vllm-project/vllm-ascend/issues/1248
- [ ] Developer design documentation: https://github.com/vllm-project/vllm-ascend/issues/1248
- [ ] Distributions
- [ ] Performance dashboard
- [ ] Developer Experience
  - [ ] vLLM commit hash recording: https://github.com/vllm-project/vllm-ascend/pull/1623

## 4. Competitiveness for Key Workflows

- Large-scale serving
  - EPLB (Expert Parallelism Load Balancer)
    - [ ] Dynamic EPLB: https://github.com/vllm-project/vllm-ascend/pull/1391
    - [ ] Static EPLB: https://github.com/vllm-project/vllm-ascend/pull/1116
  - Qwen series (Qwen3 / Qwen3 MoE) optimization: https://github.com/vllm-project/vllm-ascend/pull/1245
  - Qwen series (Qwen3 MoE) optimization: https://github.com/vllm-project/vllm-ascend/pull/1381
  - Disaggregated prefilling
    - [ ] LLMDataDist: https://github.com/vllm-project/vllm-ascend/pull/950
    - [ ] Mooncake: https://github.com/vllm-project/vllm-ascend/pull/1568
 
- RLHF
    - [ ] Performance improvements
    - [ ] Parallel support

- Model
    - [ ] Qwen/DeepSeek/Qwen VL series
    - [ ] Gemma3
    - [ ] New model support: https://github.com/vllm-project/vllm-ascend/issues/1608
        - Trending new models such as MiniMax, Hunyuan, and ERNIE
    - [ ] Quantization support: W4A16/W4A8 for dense models
    - [ ] Quantization support: W4A16/W4A8 for MoE models
    - [ ] Model format support: GGUF

- Others
  - [ ] Experimental support and performance improvements for the Atlas 300I series: https://github.com/vllm-project/vllm-ascend/pull/1591



---

If an item you want is not on the roadmap, your suggestions and contributions are strongly welcome! Feel free to comment in this thread, open a feature request, or create an RFC.

Historical roadmaps:
- https://github.com/vllm-project/vllm-ascend/issues/448
- https://github.com/vllm-project/vllm-ascend/issues/71 
