v0.7.3rc2

Pre-release
@wangxiyuan wangxiyuan released this 29 Mar 01:12
· 59 commits to v0.7.3-dev since this release
00459ae

This is the second release candidate of v0.7.3 for vllm-ascend. Please follow the official doc to get started.

Highlights

  • Add the Ascend Custom Ops framework. Developers can now write custom ops using AscendC. An example op, rotary_embedding, is included. More tutorials will come soon. Custom ops compilation is disabled by default when installing vllm-ascend. Set COMPILE_CUSTOM_KERNELS=1 to enable it. #371
  • The V1 engine has basic support in this release. Full support will land in the 0.8.X release. If you hit any issue or have any requirement for the V1 engine, please tell us here. #376
  • The prefix cache feature now works. Set enable_prefix_caching=True to enable it. #282
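
As a minimal sketch of the prefix cache option above, the snippet below passes enable_prefix_caching=True to vLLM's offline LLM class and sends two prompts that share a prefix. The helper names (build_engine_kwargs, build_prompts, run_demo), the prompts, and the model name are hypothetical and chosen only for illustration; they are not part of this release.

```python
from typing import Any, Dict, List


def build_engine_kwargs(model: str, prefix_caching: bool = True) -> Dict[str, Any]:
    """Collect the keyword arguments that will be passed to vllm.LLM."""
    return {"model": model, "enable_prefix_caching": prefix_caching}


def build_prompts(shared_prefix: str, questions: List[str]) -> List[str]:
    """Prepend the same prefix to every question so the prefix can be cached once."""
    return [shared_prefix + q for q in questions]


def run_demo(model: str = "Qwen/Qwen2.5-0.5B-Instruct") -> List[str]:
    """Generate with prefix caching enabled. The model name is an assumption."""
    # Imported lazily so the helpers above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(model))
    prompts = build_prompts(
        "You are a helpful assistant.\n",
        ["What is an NPU?", "What is a KV cache?"],
    )
    outputs = llm.generate(prompts, SamplingParams(temperature=0.0, max_tokens=32))
    return [o.outputs[0].text for o in outputs]
```

Calling run_demo() requires a machine with vllm and vllm-ascend installed. The two helper functions only build plain Python values, so they can be inspected without any NPU.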

Core

  • Bump the torch_npu version to dev20250320.3 to improve accuracy and fix the "!!!" output problem. #406

Model

  • The performance of Qwen2-vl is improved by optimizing patch embedding (Conv3D). #398

Other

  • Fixed a bug so that the multi-step scheduler feature works. #349
  • Fixed a bug so that the prefix cache feature produces accurate results. #424