v0.7.3rc2
Pre-release
This is the second release candidate of v0.7.3 for vllm-ascend. Please follow the official documentation to get started.
- Quickstart with container: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/quick_start.html
- Installation: https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/installation.html
Highlights
- Add the Ascend Custom Ops framework. Developers can now write custom ops using AscendC. An example op, rotary_embedding, is added. More tutorials will come soon. Custom ops compilation is disabled by default when installing vllm-ascend. Set COMPILE_CUSTOM_KERNELS=1 to enable it. #371
- The V1 engine has basic support in this release. Full support will come in the 0.8.X release. If you hit any issue or have any requirement for the V1 engine, please tell us here. #376
- The prefix cache feature works now. You can set enable_prefix_caching=True to enable it. #282
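As a minimal sketch of turning on prefix caching through the vLLM offline API: the flag enable_prefix_caching comes from the note above, while the model name, the prompts, and the helper names (build_engine_kwargs, build_prompts, run_demo) are illustrative assumptions and not part of this release. The vllm import is kept inside run_demo so the helpers can be inspected on a machine without vllm and vllm-ascend installed.

```python
def build_engine_kwargs(model: str, enable_prefix_caching: bool = True) -> dict:
    """Collect the keyword arguments passed to vllm.LLM (hypothetical helper)."""
    return {"model": model, "enable_prefix_caching": enable_prefix_caching}


def build_prompts(shared_prefix: str, questions: list[str]) -> list[str]:
    """Prepend the same prefix to every question so the prefix can be reused."""
    return [f"{shared_prefix}\n\nQuestion: {q}\nAnswer:" for q in questions]


def run_demo() -> None:
    # Requires vllm and vllm-ascend to be installed; imported lazily on purpose.
    from vllm import LLM, SamplingParams

    # Assumption: any model supported by vllm-ascend works here; this name is
    # only an example and is not mentioned in the release notes.
    llm = LLM(**build_engine_kwargs("Qwen/Qwen2.5-0.5B-Instruct"))

    prompts = build_prompts(
        "You are a support assistant for an NPU inference stack. "
        "Answer briefly and precisely.",
        ["What is prefix caching?", "When does prefix caching help most?"],
    )
    for output in llm.generate(prompts, SamplingParams(max_tokens=64)):
        print(output.outputs[0].text)
```

Calling run_demo() on a machine with an Ascend NPU should let the second prompt reuse the cached computation for the shared prefix. Prefix caching tends to help most when many requests begin with the same long system prompt or document.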
Core
- Bump torch_npu version to dev20250320.3 to improve accuracy and fix the "!!!" output problem. #406
Model
- The performance of Qwen2-VL is improved by optimizing the patch embedding (Conv3D). #398
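To illustrate why a Conv3D patch embedding leaves room for optimization, the sketch below shows that a 3D convolution whose stride equals its kernel size gives the same result as flattening each patch and doing one matrix multiplication. This is an assumption about the kind of optimization involved, not a description of what #398 actually changes. It uses pure Python, a single input channel, and tiny hypothetical shapes. The function names are illustrative only.

```python
from itertools import product


def _patch_origins(x, kernel):
    """Top-left-front corner of every non-overlapping patch in x[T][H][W]."""
    kt, kh, kw = kernel
    t, h, w = len(x), len(x[0]), len(x[0][0])
    return product(range(0, t, kt), range(0, h, kh), range(0, w, kw))


def conv3d_patch_embed(x, weight, kernel):
    """Direct strided convolution (stride == kernel), one input channel.

    weight is a list of output-channel kernels, each shaped [kt][kh][kw].
    Returns one row of output-channel values per patch.
    """
    kt, kh, kw = kernel
    out = []
    for t0, h0, w0 in _patch_origins(x, kernel):
        out.append([
            sum(x[t0 + a][h0 + b][w0 + c] * w_c[a][b][c]
                for a, b, c in product(range(kt), range(kh), range(kw)))
            for w_c in weight
        ])
    return out


def matmul_patch_embed(x, weight, kernel):
    """Same result computed as (flattened patches) x (flattened weights)^T."""
    kt, kh, kw = kernel
    offsets = list(product(range(kt), range(kh), range(kw)))
    patches = [[x[t0 + a][h0 + b][w0 + c] for a, b, c in offsets]
               for t0, h0, w0 in _patch_origins(x, kernel)]
    flat_w = [[w_c[a][b][c] for a, b, c in offsets] for w_c in weight]
    return [[sum(p * w for p, w in zip(row, col)) for col in flat_w]
            for row in patches]


# Tiny example: a 2x4x4 input, 2x2x2 patches, two output channels.
kernel = (2, 2, 2)
x = [[[t * 16 + h * 4 + w for w in range(4)] for h in range(4)] for t in range(2)]
weight = [
    [[[1 for _ in range(2)] for _ in range(2)] for _ in range(2)],
    [[[a * 4 + b * 2 + c for c in range(2)] for b in range(2)] for a in range(2)],
]
conv_out = conv3d_patch_embed(x, weight, kernel)
matmul_out = matmul_patch_embed(x, weight, kernel)
```

Both functions return the same four rows of two values. On real hardware the matmul form can map to a single large matrix multiply, which is usually better optimized than a general 3D convolution kernel.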