v0.8.5rc1
Pre-release
This is the first release candidate of v0.8.5 for vllm-ascend. Please follow the official doc to start the journey.
Experimental: you can now enable the V1 engine by setting the environment variable `VLLM_USE_V1=1`; see the feature support status of vLLM Ascend here.
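A minimal sketch of opting in, assuming a shell session; the model name is illustrative, not taken from these notes, and V1 feature coverage is still experimental:

```bash
# Experimental: opt in to the V1 engine for this shell session
export VLLM_USE_V1=1

# Model name below is illustrative only
vllm serve Qwen/Qwen2.5-7B-Instruct
```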
Highlights
- Upgrade CANN version to 8.1.RC1 to support chunked prefill and automatic prefix caching (`--enable_prefix_caching`) when V1 is enabled #747 (example after this list)
- Optimize Qwen2 VL and Qwen2.5 VL #701
- Improve DeepSeek V3 eager mode and graph mode performance; you can now pass `--additional_config={'enable_graph_mode': True}` to enable graph mode #598 #731 (example after this list)
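A hedged sketch combining the two highlights above. The model names are illustrative, and the JSON quoting of `--additional_config` on the command line is an assumption, not confirmed by these notes:

```bash
# V1 engine with chunked prefill and automatic prefix caching (#747);
# model name is illustrative
VLLM_USE_V1=1 vllm serve Qwen/Qwen2.5-7B-Instruct --enable_prefix_caching

# DeepSeek V3 graph mode (#598, #731); note the known issue below when
# combining DeepSeek with VLLM_USE_V1=1
vllm serve deepseek-ai/DeepSeek-V3 \
    --additional_config='{"enable_graph_mode": true}'
```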
Core
- Upgrade vLLM to 0.8.5.post1 #715
- Fix early return in `CustomDeepseekV2MoE.forward` during `profile_run` #682
- Adapt to new quantized models generated by modelslim #719
- Initial support for P2P Disaggregated Prefill based on llm_datadist #694
- Use `/vllm-workspace` as the code path and include `.git` in the container image, fixing failures when starting vLLM under `/workspace` #726 (see the container sketch after this list)
- Optimize NPU memory usage so that DeepSeek R1 W8A8 works with a 32K model length #728
- Fix `PYTHON_INCLUDE_PATH` typo in setup.py #762
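A quick way to verify the new code path inside the container; the image name and tag are assumptions based on the project's published images, not taken from these notes:

```bash
# Image name/tag below are assumed; adjust to your local image.
# The source tree now lives under /vllm-workspace and keeps its .git metadata.
docker run --rm -it quay.io/ascend/vllm-ascend:v0.8.5rc1 \
    bash -c 'cd /vllm-workspace && git log --oneline -1'
```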
Other
Known issue
- If you run DeepSeek with `VLLM_USE_V1=1` enabled, you will encounter `call aclnnInplaceCopy failed`; please refer to #778 for the fix.