
v0.7.3

@Yikun released this 08 May 13:38

🎉 Hello, World!

We are excited to announce the release of vllm-ascend 0.7.3. This is the first official release. The functionality, performance, and stability of this release have been fully tested and verified. We encourage you to try it out and provide feedback. We'll publish bug-fix releases in the future if needed. Please follow the official doc to start the journey.

Highlights

  • This release includes all features landed in the previous release candidates (v0.7.1rc1, v0.7.3rc1, v0.7.3rc2), and all of them are fully tested and verified. Visit the official doc to get the detailed feature and model support matrix.
  • Upgrade CANN to 8.1.RC1 to enable the chunked prefill and automatic prefix caching features. Both can now be enabled; see the sketch after this list.
  • Upgrade PyTorch to 2.5.1. vLLM Ascend no longer relies on the dev version of torch-npu, so users don't need to install torch-npu by hand anymore; the 2.5.1 version of torch-npu is installed automatically. #662
  • Integrate MindIE Turbo into vLLM Ascend to improve the performance of DeepSeek V3/R1 and the Qwen2 series. #708
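
Since chunked prefill and automatic prefix caching are toggled through standard vLLM engine arguments, here is a minimal offline-inference sketch; `enable_chunked_prefill` and `enable_prefix_caching` are the regular vLLM `LLM` parameters, and the model name is only an assumption for illustration.

```python
from vllm import LLM, SamplingParams

# Hypothetical model name; use the model you actually serve on Ascend NPUs.
llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",
    enable_chunked_prefill=True,   # chunked prefill, enabled by the CANN 8.1.RC1 upgrade
    enable_prefix_caching=True,    # automatic prefix caching
)

outputs = llm.generate(["Hello, world!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```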

Core

  • LoRA, Multi-LoRA, and dynamic serving are now supported; a usage sketch follows below. Performance will be improved in the next release. Please follow the official doc for more usage information. Thanks to China Merchants Bank for the contribution. #700
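
As a rough usage sketch of the new LoRA support, the snippet below uses vLLM's standard `LoRARequest` API; the base model and the adapter path are placeholders, not values from this release.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Hypothetical base model and adapter path; replace with your own.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_lora=True)

outputs = llm.generate(
    ["Give me a short introduction to LoRA."],
    SamplingParams(temperature=0.0, max_tokens=64),
    # LoRARequest(adapter_name, adapter_id, adapter_path)
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```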

Model

  • The performance of Qwen2-VL and Qwen2.5-VL is improved. #702
  • The performance of the apply_penalties and topKtopP ops is improved. #525

Other

  • Fixed an issue that could lead to a CPU memory leak. #691 #712
  • A new environment variable SOC_VERSION is added. If you hit any SoC detection error when building with custom ops enabled, please set SOC_VERSION to a suitable value; see the sketch after this list. #606
  • The openEuler container image is now supported with the v0.7.3-openeuler tag. #665
  • The prefix cache feature now works on the V1 engine. #559
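
For the SOC_VERSION workaround mentioned above, here is a rough sketch of a source build with the variable pinned, assuming a pip-based editable install of vllm-ascend; the SoC value is a placeholder and must be replaced with the one matching your device.

```python
import os
import subprocess

# Hypothetical SoC value; set it to the SoC of your Ascend device if
# automatic detection fails while building with custom ops enabled.
env = dict(os.environ, SOC_VERSION="Ascend910B1")
subprocess.run(["pip", "install", "-v", "-e", "."], check=True, env=env)
```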