Release v0.9.0rc2 · vllm-project/vllm-ascend

This is the second official release candidate of v0.9.0 for vllm-ascend. Please follow the official doc to get started. Starting with this release, the V1 Engine is recommended. The V0 Engine code is frozen and will no longer be maintained. Set the environment variable VLLM_USE_V1=1 to enable the V1 Engine.
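As a minimal sketch, the variable can be set from Python instead of the shell. Only the name VLLM_USE_V1 and the value 1 come from these notes; the helper `v1_engine_requested` is a hypothetical name used for illustration, and setting the variable before importing vllm is a conservative assumption rather than a documented requirement.

```python
import os

# Request the V1 Engine. Setting this before vllm is imported is a
# conservative assumption, not a requirement stated in the release notes.
os.environ["VLLM_USE_V1"] = "1"


def v1_engine_requested() -> bool:
    """Hypothetical helper: report whether VLLM_USE_V1=1 is set in this process."""
    return os.environ.get("VLLM_USE_V1") == "1"
```

The shell equivalent is to export VLLM_USE_V1=1 before launching the process.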

Highlights

DeepSeek now works with graph mode. Follow the official doc to try it. #789
Qwen series models now work with graph mode, which is enabled by default with the V1 Engine. Note that in this release only Qwen series models are well tested with graph mode; we'll stabilize and generalize it in the next release. If you hit any issues, please open an issue on GitHub and temporarily fall back to eager mode by setting enforce_eager=True when initializing the model.
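The eager-mode fallback mentioned above might look like the following sketch. It assumes the model is created through vLLM's `LLM` class, which accepts `enforce_eager`; the helpers `build_llm_kwargs` and `create_llm` are hypothetical names, and the model identifier is left to the caller.

```python
def build_llm_kwargs(model: str, use_eager: bool = False) -> dict:
    """Hypothetical helper: collect constructor arguments for vllm.LLM."""
    kwargs = {"model": model}
    if use_eager:
        # Skip graph mode and run eagerly, as the notes suggest for troubleshooting.
        kwargs["enforce_eager"] = True
    return kwargs


def create_llm(model: str, use_eager: bool = False):
    """Hypothetical helper: build the engine; vllm is imported only when called."""
    from vllm import LLM

    return LLM(**build_llm_kwargs(model, use_eager))
```

Calling `create_llm(model_name, use_eager=True)` would then start the model in eager mode; leaving `use_eager` at its default keeps the graph-mode behavior of this release.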

Core

The performance of the multi-step scheduler has been improved. Thanks for the contribution from China Merchants Bank. #814
LoRA, Multi-LoRA, and Dynamic Serving are now supported for the V1 Engine. Thanks for the contribution from China Merchants Bank. #893
Prefix cache and chunked prefill features work now. #782 #844
Spec decode and MTP features work with V1 Engine now. #874 #890
DP feature works with DeepSeek now. #1012
Input embedding feature works with V0 Engine now. #916
Sleep mode feature works with V1 Engine now. #1084

Model

Qwen2.5 VL works with V1 Engine now. #736
Llama4 works now. #740
A new kind of DeepSeek model called dual-batch overlap (DBO) is added. Set VLLM_ASCEND_ENABLE_DBO=1 to use it. #941
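One way to pass the DBO switch to a separately launched server process is sketched below. Only the variable name VLLM_ASCEND_ENABLE_DBO and the value 1 come from these notes; the helper `dbo_env` is a hypothetical name, and how the process is launched is left to the caller.

```python
import os
from typing import Mapping, Optional


def dbo_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: copy an environment and enable dual-batch overlap."""
    env = dict(os.environ if base is None else base)
    env["VLLM_ASCEND_ENABLE_DBO"] = "1"
    return env
```

The returned mapping can be handed to `subprocess.run(..., env=dbo_env())` so that the flag applies only to the child process and the parent environment is left unchanged.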

Other

Online serving with Ascend quantization works now. #877
A batch of bugs in graph mode and MoE models have been fixed. #773 #771 #774 #816 #817 #819 #912 #897 #961 #958 #913 #905
A batch of performance improvement PRs have been merged. #784 #803 #966 #839 #970 #947 #987 #1085
Starting with this release, binary wheel packages are published as well. #775
The contributor doc site has been added.

Known Issue

In some cases, the vLLM process may crash with aclgraph enabled. We're working on this issue and it will be fixed in the next release. #1038
Multi-node data parallel doesn't work with this release. This is a known issue in vLLM and has been fixed on the main branch. #18981

New Contributors

@chris668899 made their first contribution in #771
@NeverRaR made their first contribution in #789
@cxcxflying made their first contribution in #740
@22dimensions made their first contribution in #835
@wonderful199082 made their first contribution in #814
@yangpuPKU made their first contribution in #937
@ttanzhiqiang made their first contribution in #909
@ponix-j made their first contribution in #874
@XWFAlone made their first contribution in #890
@NINGBENZHE made their first contribution in #896
@momo609 made their first contribution in #970
@David9857 made their first contribution in #947
@depeng1994 made their first contribution in #1013
@hahazhky made their first contribution in #987
@weijinqian0 made their first contribution in #1067
@sdmyzlp made their first contribution in #1091
@zxdukki made their first contribution in #941
@ChenTaoyu-SJTU made their first contribution in #736
@Yuxiao-Xu made their first contribution in #1116
