v0.8.5rc1
Pre-release
This is the first release candidate of v0.8.5 for vllm-ascend. Please follow the official doc to start the journey.
Experimental: you can now enable the V1 engine by setting the environment variable `VLLM_USE_V1=1`; see the feature support status of vLLM Ascend here.
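A minimal sketch of opting in, assuming a shell session; the model name is illustrative, not taken from these notes, and V1 feature coverage is still experimental:

```bash
# Experimental: opt in to the V1 engine for this shell session
export VLLM_USE_V1=1

# Model name below is illustrative only
vllm serve Qwen/Qwen2.5-7B-Instruct
```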
Highlights
- Upgrade CANN version to 8.1.RC1 to support chunked prefill and automatic prefix caching (`--enable_prefix_caching`) when V1 is enabled #747 (example after this list)
- Optimize Qwen2 VL and Qwen2.5 VL #701
- Improve DeepSeek V3 eager mode and graph mode performance; you can now pass `--additional_config={'enable_graph_mode': True}` to enable graph mode #598 #731 (example after this list)
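A hedged sketch combining the two highlights above. The model names are illustrative, and the JSON quoting of `--additional_config` on the command line is an assumption, not confirmed by these notes:

```bash
# V1 engine with chunked prefill and automatic prefix caching (#747);
# model name is illustrative
VLLM_USE_V1=1 vllm serve Qwen/Qwen2.5-7B-Instruct --enable_prefix_caching

# DeepSeek V3 graph mode (#598, #731); note the known issue below when
# combining DeepSeek with VLLM_USE_V1=1
vllm serve deepseek-ai/DeepSeek-V3 \
    --additional_config='{"enable_graph_mode": true}'
```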
Core
- Upgrade vLLM to 0.8.5.post1 #715
- Fix early return in `CustomDeepseekV2MoE.forward` during `profile_run` #682
- Adapt to new quantized models generated by modelslim #719
- Initial support for P2P Disaggregated Prefill based on llm_datadist #694
- Use `/vllm-workspace` as the code path and include `.git` in the container image, fixing failures when starting vLLM under `/workspace` #726 (see the container sketch after this list)
- Optimize NPU memory usage so that DeepSeek R1 W8A8 works with a 32K model length #728
- Fix `PYTHON_INCLUDE_PATH` typo in setup.py #762
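A quick way to verify the new code path inside the container; the image name and tag are assumptions based on the project's published images, not taken from these notes:

```bash
# Image name/tag below are assumed; adjust to your local image.
# The source tree now lives under /vllm-workspace and keeps its .git metadata.
docker run --rm -it quay.io/ascend/vllm-ascend:v0.8.5rc1 \
    bash -c 'cd /vllm-workspace && git log --oneline -1'
```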
Other
Known issue
- If you run DeepSeek with `VLLM_USE_V1=1` enabled, you will encounter `call aclnnInplaceCopy failed`; please refer to #778 for the fix.