v0.7.3rc2

Pre-release

wangxiyuan released this 29 Mar 01:12

· 59 commits to v0.7.3-dev since this release

00459ae

This is the second release candidate of v0.7.3 for vllm-ascend. Please follow the official documentation to get started.

Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html

Highlights

Added the Ascend Custom Ops framework. Developers can now write custom ops using AscendC. An example op, rotary_embedding, is included. More tutorials will come soon. Custom ops compilation is disabled by default when installing vllm-ascend. Set COMPILE_CUSTOM_KERNELS=1 to enable it. #371
The V1 engine has basic support in this release. Full support will land in the 0.8.X release. If you hit any issues or have any requirements for the V1 engine, please tell us here. #376
The prefix cache feature now works. Set enable_prefix_caching=True to enable it. #282
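
As a rough illustration of the prefix cache item above, the following minimal sketch passes enable_prefix_caching=True to vLLM's offline LLM class. The model name, the prompts, and the helper build_engine_kwargs are placeholders chosen for this example and are not taken from the release notes. Running main() assumes a working vllm and vllm-ascend installation on an Ascend machine.

```python
def build_engine_kwargs(model: str, enable_prefix_caching: bool = True) -> dict:
    """Collect the keyword arguments passed to vllm.LLM (hypothetical helper)."""
    return {"model": model, "enable_prefix_caching": enable_prefix_caching}


def main() -> None:
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    # Assumption: any model supported by your vllm-ascend install works here.
    llm = LLM(**build_engine_kwargs("Qwen/Qwen2.5-0.5B-Instruct"))

    # Both prompts share a long common prefix, which is the case that
    # prefix caching is meant to speed up.
    shared_prefix = "You are a helpful assistant. Answer briefly.\n\n"
    prompts = [
        shared_prefix + "What is an NPU?",
        shared_prefix + "What is a release candidate?",
    ]

    sampling_params = SamplingParams(temperature=0.0, max_tokens=32)
    for output in llm.generate(prompts, sampling_params):
        print(output.outputs[0].text)


if __name__ == "__main__":
    main()
```

Keeping the engine arguments in one small helper makes it easy to toggle the flag and compare behavior with and without prefix caching.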

Core

Bumped torch_npu version to dev20250320.3 to improve accuracy and fix the "!!!" output problem. #406

Model

The performance of Qwen2-vl is improved by optimizing patch embedding (Conv3D). #398

Other

Fixed a bug so that the multi-step scheduler feature works. #349
Fixed a bug so that the prefix cache feature works with correct accuracy. #424
