
v0.9.0rc2

Pre-release
@wangxiyuan released this on 10 Jun 14:29
8dd686d

This is the second official release candidate of v0.9.0 for vllm-ascend. Please follow the official documentation to get started. Starting with this release, the V1 Engine is recommended. The V0 Engine code is frozen and will no longer be maintained. Set the environment variable VLLM_USE_V1=1 to enable the V1 Engine.
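
A minimal sketch of turning on the V1 Engine from Python. It assumes the variable has to be in the environment before vLLM is imported or launched; `export VLLM_USE_V1=1` in the shell works just as well. The helper name `enable_v1_engine` is hypothetical and is not part of vLLM or vllm-ascend.

```python
import os


def enable_v1_engine(env=None):
    """Return a copy of `env` (default: os.environ) with the V1 Engine enabled.

    enable_v1_engine is a hypothetical helper for illustration only;
    the input mapping is never modified.
    """
    new_env = dict(os.environ if env is None else env)
    new_env["VLLM_USE_V1"] = "1"  # switch named in these release notes
    return new_env
```

Pass the returned dict as `env=` to `subprocess.run` when starting `vllm serve`, or call `os.environ.update(enable_v1_engine())` before `from vllm import LLM` when running in-process.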

Highlights

  • DeepSeek now works with graph mode. Follow the official documentation to try it out. #789
  • Qwen series models now work with graph mode, which is enabled by default with the V1 Engine. Note that in this release only Qwen series models are well tested with graph mode; we'll make it stable and more general in the next release. If you hit any issues, please open an issue on GitHub and temporarily fall back to eager mode by setting enforce_eager=True when initializing the model.
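
The eager-mode fallback can be sketched as below. `enforce_eager` is the vLLM `LLM` argument named above. The helpers `qwen_llm_kwargs` and `build_llm` are hypothetical, and vllm is imported lazily so the argument-building part runs without vllm or vllm-ascend installed.

```python
from typing import Any, Dict


def qwen_llm_kwargs(model: str, fallback_to_eager: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for vllm.LLM (hypothetical helper).

    With the V1 Engine, Qwen models use graph mode by default;
    enforce_eager=True opts out and runs the model eagerly instead.
    """
    return {"model": model, "enforce_eager": fallback_to_eager}


def build_llm(model: str, fallback_to_eager: bool = False):
    # Lazy import: needs vllm plus the vllm-ascend plugin on an Ascend host.
    from vllm import LLM

    return LLM(**qwen_llm_kwargs(model, fallback_to_eager))
```

For example, `build_llm("Qwen/Qwen2.5-7B-Instruct", fallback_to_eager=True)` would load the model in eager mode. The model ID is only an illustrative placeholder.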

Core

  • The performance of the multi-step scheduler has been improved. Thanks for the contribution from China Merchants Bank. #814
  • LoRA, Multi-LoRA and dynamic serving are now supported with the V1 Engine. Thanks for the contribution from China Merchants Bank. #893
  • Prefix cache and chunked prefill features work now. #782 #844
  • Speculative decoding and MTP features now work with the V1 Engine. #874 #890
  • Data parallelism (DP) now works with DeepSeek. #1012
  • The input embedding feature now works with the V0 Engine. #916
  • The sleep mode feature now works with the V1 Engine. #1084
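
Several of these features are switched on through standard vLLM engine arguments. The sketch below assumes the upstream argument names `enable_lora`, `max_loras`, `enable_prefix_caching`, `enable_chunked_prefill` and `enable_sleep_mode` apply unchanged on Ascend. The helper `core_feature_kwargs` is hypothetical.

```python
from typing import Any, Dict


def core_feature_kwargs(
    model: str,
    *,
    lora: bool = False,
    max_loras: int = 1,
    prefix_cache: bool = False,
    chunked_prefill: bool = False,
    sleep_mode: bool = False,
) -> Dict[str, Any]:
    """Collect vllm.LLM keyword arguments for the features listed above.

    Hypothetical helper: only flags that were requested are emitted,
    so everything else keeps vLLM's own defaults.
    """
    kwargs: Dict[str, Any] = {"model": model}
    if lora:
        kwargs["enable_lora"] = True
        kwargs["max_loras"] = max_loras  # adapters usable in one batch
    if prefix_cache:
        kwargs["enable_prefix_caching"] = True
    if chunked_prefill:
        kwargs["enable_chunked_prefill"] = True
    if sleep_mode:
        kwargs["enable_sleep_mode"] = True
    return kwargs
```

The result would be passed as `LLM(**core_feature_kwargs(...))`. With sleep mode enabled, `llm.sleep(level=1)` releases most device memory and `llm.wake_up()` restores it. With LoRA enabled, individual requests select an adapter through `vllm.lora.request.LoRARequest`.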

Model

  • Qwen2.5-VL now works with the V1 Engine. #736
  • Llama 4 works now. #740
  • A new DeepSeek model implementation with dual-batch overlap (DBO) is added. Set VLLM_ASCEND_ENABLE_DBO=1 to use it. #941
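
A minimal sketch of launching a server with the DBO model enabled. It assumes the model is served with the standard `vllm serve <model>` command. Only the variable name VLLM_ASCEND_ENABLE_DBO comes from these notes; the helper `dbo_launch` and the model path are hypothetical.

```python
import os
from typing import Dict, List, Tuple


def dbo_launch(model: str) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for starting `vllm serve` with DBO enabled.

    Hypothetical helper; os.environ itself is left untouched.
    """
    env = dict(os.environ)
    env["VLLM_ASCEND_ENABLE_DBO"] = "1"  # opt in to the DeepSeek DBO model
    return ["vllm", "serve", model], env
```

Usage: `argv, env = dbo_launch("/path/to/deepseek-model")` followed by `subprocess.run(argv, env=env, check=True)`.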

Other

Known Issue

  • In some cases, the vLLM process may crash when aclgraph is enabled. We're working on this issue and it will be fixed in the next release. #1038
  • Multi-node data parallelism doesn't work in this release. This is a known issue in vLLM that has been fixed on the main branch. #18981

New Contributors