v0.8.5rc1

Pre-release

@Yikun released this 06 May 15:53

This is the first release candidate of v0.8.5 for vllm-ascend. Please follow the official doc to get started.

Experimental: Now you can enable the V1 engine by setting the environment variable VLLM_USE_V1=1; see the feature support status of vLLM Ascend here.
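
For example, a minimal sketch of offline inference with the V1 engine enabled (the model name below is only a placeholder):

```python
# Minimal sketch: enable the experimental V1 engine on vLLM Ascend.
# Set VLLM_USE_V1 before importing vLLM so the engine picks it up.
import os

os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder model
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```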

Highlights

  • Upgrade CANN version to 8.1.RC1 to support chunked prefill and automatic prefix caching (--enable_prefix_caching) when V1 is enabled (see the sketch after this list) #747
  • Optimize Qwen2-VL and Qwen2.5-VL performance #701
  • Improve DeepSeek V3 eager mode and graph mode performance; you can now pass --additional_config={'enable_graph_mode': True} to enable graph mode (see the sketch after this list) #598 #731
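
As a reference for the highlights above, a minimal sketch combining the two new options, assuming the enable_prefix_caching and additional_config engine arguments are also exposed through the Python LLM entry point (the model name is a placeholder):

```python
# Minimal sketch (see assumptions in the lead-in): automatic prefix
# caching plus the DeepSeek graph mode, from the Python API.
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # placeholder model
    enable_prefix_caching=True,  # mirrors --enable_prefix_caching
    additional_config={"enable_graph_mode": True},  # mirrors --additional_config
)
```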

Core

  • Upgrade vLLM to 0.8.5.post1 #715
  • Fix early return in CustomDeepseekV2MoE.forward during profile_run #682
  • Adapt to new quantized models generated by modelslim #719
  • Initial support for P2P disaggregated prefill based on llm_datadist #694
  • Use /vllm-workspace as the code path and include .git in the container image to fix an issue when starting vLLM under /workspace #726
  • Optimize NPU memory usage so that DeepSeek R1 W8A8 works with a 32K model length #728
  • Fix PYTHON_INCLUDE_PATH typo in setup.py #762

Other

  • Add Qwen3-0.6B test #717
  • Add nightly CI #668
  • Add accuracy test report #542

Known issue

  • Running DeepSeek with VLLM_USE_V1=1 enabled will encounter a call aclnnInplaceCopy failed error; please refer to #778 for the fix.