v0.7.3
🎉 Hello, World!
We are excited to announce the release of 0.7.3 for vllm-ascend. This is the first official release. The functionality, performance, and stability of this release have been fully tested and verified. We encourage you to try it out and provide feedback. We'll post bug-fix versions in the future if needed. Please follow the official doc to start the journey.
Highlights
- This release includes all features landed in the previous release candidates (v0.7.1rc1, v0.7.3rc1, v0.7.3rc2). All of these features are fully tested and verified. Visit the official doc to get the detailed feature and model support matrix.
- Upgrade CANN to 8.1.RC1 to enable the chunked prefill and automatic prefix caching features. You can now turn them on; see the sketch after this list.
- Upgrade PyTorch to 2.5.1. vLLM Ascend no longer relies on the dev version of torch-npu, so users no longer need to install torch-npu by hand; the 2.5.1 release of torch-npu is installed automatically. #662
- Integrate MindIE Turbo into vLLM Ascend to improve the performance of DeepSeek V3/R1 and the Qwen 2 series. #708
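To illustrate the newly enabled features, here is a minimal sketch using the standard vLLM Python API. The model name is only a placeholder, and the sketch assumes vLLM Ascend and its dependencies are already installed:

```python
# Minimal sketch: enabling chunked prefill and automatic prefix caching
# through the standard vLLM Python API. The model below is a placeholder;
# substitute any model supported by vLLM Ascend.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",   # placeholder model
    enable_chunked_prefill=True,        # chunked prefill (requires CANN 8.1.RC1)
    enable_prefix_caching=True,         # automatic prefix caching
)

outputs = llm.generate(
    ["Explain what automatic prefix caching does."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```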
Core
- LoRA, Multi-LoRA, and Dynamic Serving are supported now; a minimal usage sketch follows below. Performance will be improved in the next release. Please follow the official doc for more usage information. Thanks to China Merchants Bank for the contribution. #700
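The following is a minimal multi-LoRA sketch using the vLLM Python API; the base model and adapter path are placeholders rather than values from this release:

```python
# Minimal sketch of multi-LoRA serving with the vLLM Python API.
# The base model name and adapter path are placeholders.
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder base model
    enable_lora=True,                  # turn on LoRA support
    max_loras=2,                       # serve up to two adapters concurrently
)

# Each request can carry its own adapter; the engine swaps adapters dynamically.
outputs = llm.generate(
    ["Summarize the latest quarterly report."],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("finance-adapter", 1, "/path/to/finance_lora"),
)
print(outputs[0].outputs[0].text)
```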
Model
- The performance of Qwen2-VL and Qwen2.5-VL is improved. #702
- The performance of the `apply_penalties` and `topKtopP` ops is improved. #525
Other
- Fixed an issue that may lead to a CPU memory leak. #691 #712
- A new environment variable `SOC_VERSION` is added. If you hit any SoC detection error when building with custom ops enabled, please set `SOC_VERSION` to a suitable value. #606
- openEuler container image is supported with the v0.7.3-openeuler tag. #665
- The prefix cache feature works on the V1 engine now; see the sketch below. #559
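Here is a minimal sketch for trying prefix caching on the V1 engine. Selecting the V1 engine through the VLLM_USE_V1 environment variable follows upstream vLLM conventions and is an assumption here; the model name is a placeholder:

```python
# Minimal sketch: prefix caching on the V1 engine. Setting VLLM_USE_V1 is an
# assumption based on upstream vLLM conventions; the model is a placeholder.
import os
os.environ["VLLM_USE_V1"] = "1"  # opt in to the V1 engine before importing vLLM

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # placeholder model
    enable_prefix_caching=True,        # reuse KV cache for shared prompt prefixes
)

# Prompts sharing a common prefix can reuse its KV cache across requests.
shared_prefix = "You are a helpful assistant. Answer concisely.\n\n"
prompts = [
    shared_prefix + "What is chunked prefill?",
    shared_prefix + "What is prefix caching?",
]

outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=48))
for out in outputs:
    print(out.outputs[0].text)
```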