v0.8.4rc2

Pre-release

@Yikun released this on 28 Apr 23:09 · 1fce70a

This is the second release candidate of v0.8.4 for vllm-ascend. Please follow the official doc to get started. This version includes some experimental features, such as W8A8 quantization and EP/DP support; we will stabilize them in the next release.

Highlights

  • Qwen3 and Qwen3MOE are supported now. Please follow the official doc to run the quick demo; a minimal sketch follows this list. #709
  • The Ascend W8A8 quantization method is supported now. Please see the official doc for an example (second sketch below). Any feedback is welcome. #580
  • DeepSeek V3/R1 works with DP, TP and MTP now (third sketch below). Please note that it is still experimental. Let us know if you hit any problem. #429 #585 #626 #636 #671
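
For reference, a minimal offline-inference sketch in the style of the quick demo; the model ID and sampling settings below are illustrative assumptions, not taken from the release notes:

```python
# Minimal sketch: offline inference with a Qwen3 model on vllm-ascend.
# The model ID and sampling settings are illustrative; see the official
# doc for the supported quick-demo configuration.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen3-8B")  # assumed model ID for illustration
params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is the capital of France?"], params)
print(outputs[0].outputs[0].text)
```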
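
A hedged sketch of loading a W8A8-quantized checkpoint; the quantization="ascend" argument and the model path are assumptions based on the feature name, so defer to the official doc for the exact invocation:

```python
# Hedged sketch: loading an Ascend W8A8-quantized model. The
# quantization="ascend" argument and the model path are assumptions;
# consult the official quantization doc for the exact flags.
from vllm import LLM

llm = LLM(
    model="/path/to/w8a8-quantized-model",  # placeholder path
    quantization="ascend",                  # assumed method name
)
```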
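
And a hedged sketch of tensor-parallel inference for DeepSeek V3/R1; the model ID and parallel size are illustrative, and the DP/MTP pieces need additional configuration described in the docs:

```python
# Hedged sketch: tensor-parallel offline inference for DeepSeek V3/R1.
# The parallel size is illustrative; DP and MTP require extra setup
# covered in the official doc, and the feature is still experimental.
from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # illustrative model ID
    tensor_parallel_size=8,           # split weights across 8 NPUs
)
```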

Core

  • The torch.compile feature is now supported with the V1 engine (see the sketch after this list). It is disabled by default because it relies on the CANN 8.1 release; we will enable it by default in the next release. #426
  • Upgraded PyTorch to 2.5.1. vLLM Ascend no longer relies on a dev version of torch-npu, so users no longer need to install torch-npu by hand; the 2.5.1 version of torch-npu is installed automatically. #661
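
A hedged sketch of opting in, assuming upstream vLLM's VLLM_USE_V1 switch and compilation_config knob also control this path on Ascend (an assumption on our part; CANN 8.1 is required either way):

```python
# Hedged sketch: opting in to torch.compile on the V1 engine. Whether
# these upstream vLLM knobs enable the Ascend path is an assumption;
# CANN 8.1 is required regardless.
import os

os.environ["VLLM_USE_V1"] = "1"  # select the V1 engine

from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # illustrative model ID
    compilation_config=3,              # request the full compile level
)
```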

Other

  • MiniCPM model works now. #645
  • openEuler container image is now supported with the v0.8.4-openeuler tag, and custom ops build is enabled by default for openEuler OS. #689
  • Fix a ModuleNotFoundError bug to make LoRA work. #600
  • Add a "Using EvalScope evaluation" doc. #611
  • Add a VLLM_VERSION environment variable to make the vLLM version configurable, helping developers set the correct vLLM version when the vLLM code has been modified locally by hand (sketch below). #651
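
A hedged sketch of the override; the version-string format is an assumption, and in practice you would typically export the variable in your shell before launching vLLM:

```python
# Hedged sketch: pinning the vLLM version string that vllm-ascend
# detects, useful when the local vLLM checkout has been modified by
# hand. The value format is an assumption; match your installed vLLM.
import os

# Must be set before vLLM / vllm-ascend are imported so the plugin
# reads the overridden version.
os.environ["VLLM_VERSION"] = "0.8.4"

from vllm import LLM  # the override takes effect from this import on
```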