v0.9.0rc2
Pre-release
This is the 2nd official release candidate of v0.9.0 for vllm-ascend. Please follow the official doc to get started. From this release, V1 Engine is recommended. The V0 Engine code is frozen and will no longer be maintained. Please set the environment variable VLLM_USE_V1=1 to enable V1 Engine.
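As a minimal sketch, the variable can also be set from Python rather than from the shell. This assumes the variable is read when vLLM is imported or started, so it is set before any vllm import; the helper name `enable_v1_engine` is hypothetical and not part of vllm-ascend.

```python
import os


def enable_v1_engine(env=None):
    """Set VLLM_USE_V1=1 in the given mapping (os.environ by default).

    Hypothetical helper for illustration; not part of vllm-ascend.
    """
    target = os.environ if env is None else env
    target["VLLM_USE_V1"] = "1"
    return target


# Set the variable first; import vllm only after this point
# (assumption: the variable is read at import or startup time).
enable_v1_engine()
```

The shell equivalent is to export VLLM_USE_V1=1 before starting the process.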
Highlights
- DeepSeek works with graph mode now. Follow the official doc to try it. #789
- Qwen series models work with graph mode now. Graph mode is enabled by default with V1 Engine. Please note that in this release, only Qwen series models are well tested with graph mode. We'll make it stable and generalize it in the next release. If you hit any issues, please feel free to open an issue on GitHub and temporarily fall back to eager mode by setting enforce_eager=True when initializing the model.
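The eager-mode fallback described above can be sketched as follows. `enforce_eager` is an argument of vLLM's `LLM` class; the helper names `build_llm_kwargs` and `make_llm` and the model name in the usage note are illustrative assumptions, not taken from this release.

```python
def build_llm_kwargs(model, eager_fallback=False):
    """Collect keyword arguments for vllm.LLM (hypothetical helper).

    When eager_fallback is True, graph mode is bypassed by passing
    enforce_eager=True.
    """
    kwargs = {"model": model}
    if eager_fallback:
        kwargs["enforce_eager"] = True
    return kwargs


def make_llm(model, eager_fallback=False):
    """Create the engine; vllm is imported lazily so that the kwargs
    helper above can be used without vllm installed."""
    from vllm import LLM

    return LLM(**build_llm_kwargs(model, eager_fallback))
```

For example, `make_llm("Qwen/Qwen2.5-7B-Instruct", eager_fallback=True)` would start the model in eager mode; the model name here is only a placeholder.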
Core
- The performance of the multi-step scheduler has been improved. Thanks to China Merchants Bank for the contribution. #814
- LoRA, Multi-LoRA, and Dynamic Serving are supported for V1 Engine now. Thanks to China Merchants Bank for the contribution. #893
- Prefix cache and chunked prefill features work now. #782 #844
- Spec decode and MTP features work with V1 Engine now. #874 #890
- DP feature works with DeepSeek now. #1012
- Input embedding feature works with V0 Engine now. #916
- Sleep mode feature works with V1 Engine now. #1084
Model
- Qwen2.5 VL works with V1 Engine now. #736
- Llama4 works now. #740
- A new kind of DeepSeek model called dual-batch overlap (DBO) is added. Please set VLLM_ASCEND_ENABLE_DBO=1 to use it. #941
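When the server runs as a child process, the DBO flag can be passed through that process's environment instead of the parent's. The sketch below assumes the upstream `vllm serve` command line is used to start the server; `dbo_env` and `launch_server` are hypothetical helpers, and the model argument is a placeholder for a DeepSeek checkpoint.

```python
import os
import subprocess


def dbo_env(base=None):
    """Return a copy of the environment with VLLM_ASCEND_ENABLE_DBO=1.

    The input mapping is left unmodified. Hypothetical helper.
    """
    env = dict(os.environ if base is None else base)
    env["VLLM_ASCEND_ENABLE_DBO"] = "1"
    return env


def launch_server(model):
    """Start `vllm serve <model>` with DBO enabled (assumed entry point).

    Not called here; it requires vllm and vllm-ascend to be installed.
    """
    return subprocess.Popen(["vllm", "serve", model], env=dbo_env())
```

Copying the environment keeps the flag scoped to the launched server and leaves other processes unaffected.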
Other
- Online serving with Ascend quantization works now. #877
- A batch of bugs for graph mode and MoE models have been fixed. #773 #771 #774 #816 #817 #819 #912 #897 #961 #958 #913 #905
- A batch of performance improvement PRs have been merged. #784 #803 #966 #839 #970 #947 #987 #1085
- From this release, a binary wheel package will be released as well. #775
- The contributor doc site has been added.
Known Issues
- In some cases, the vLLM process may crash with aclgraph enabled. We're working on this issue and it'll be fixed in the next release. #1038
- Multi-node data parallel doesn't work with this release. This is a known issue in vLLM and has been fixed on the main branch. #18981
New Contributors
- @chris668899 made their first contribution in #771
- @NeverRaR made their first contribution in #789
- @cxcxflying made their first contribution in #740
- @22dimensions made their first contribution in #835
- @wonderful199082 made their first contribution in #814
- @yangpuPKU made their first contribution in #937
- @ttanzhiqiang made their first contribution in #909
- @ponix-j made their first contribution in #874
- @XWFAlone made their first contribution in #890
- @NINGBENZHE made their first contribution in #896
- @momo609 made their first contribution in #970
- @David9857 made their first contribution in #947
- @depeng1994 made their first contribution in #1013
- @hahazhky made their first contribution in #987
- @weijinqian0 made their first contribution in #1067
- @sdmyzlp made their first contribution in #1091
- @zxdukki made their first contribution in #941
- @ChenTaoyu-SJTU made their first contribution in #736
- @Yuxiao-Xu made their first contribution in #1116