v0.8.4rc2
Pre-release
This is the second release candidate of v0.8.4 for vllm-ascend. Please follow the official doc to start the journey. Some experimental features are included in this version, such as W8A8 quantization and EP/DP support. We'll make them stable enough in the next release.
Highlights
- Qwen3 and Qwen3MOE are supported now. Please follow the official doc to run the quick demo (a minimal sketch also follows this list). #709
- The Ascend W8A8 quantization method is supported now. See the official doc for an example, and the hedged sketch below this list. Any feedback is welcome. #580
- DeepSeek V3/R1 works with DP, TP and MTP now (see the parallel-inference sketch below). Please note that it is still experimental. Let us know if you hit any problem. #429 #585 #626 #636 #671
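
For reference, a minimal offline-inference sketch for the Qwen3 quick demo, using vLLM's standard Python API. The model id and sampling settings here are illustrative; follow the official doc for the supported setup.

```python
from vllm import LLM, SamplingParams

# Illustrative model id and settings; see the official doc for details.
llm = LLM(model="Qwen/Qwen3-8B", max_model_len=4096)
outputs = llm.generate(
    ["Hello, my name is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```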
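A hedged sketch of loading a W8A8 checkpoint. It assumes the weights have already been converted as described in the official doc and that the Ascend backend is selected via `quantization="ascend"`; the model path is a placeholder.

```python
from vllm import LLM

# Assumption: a locally converted W8A8 checkpoint; the path is hypothetical.
llm = LLM(
    model="/path/to/model-w8a8",  # placeholder path to the quantized weights
    quantization="ascend",        # select the Ascend quantization backend
)
```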
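And a sketch of running DeepSeek V3/R1 with tensor parallelism through the offline API. Only `tensor_parallel_size` is shown; the DP and MTP switches are still experimental, so consult the linked PRs and docs for their exact flags rather than this sketch.

```python
from vllm import LLM

# Experimental: TP across 8 NPUs is shown; DP and MTP have their own options,
# which may change while this support remains experimental.
llm = LLM(
    model="deepseek-ai/DeepSeek-V3",
    tensor_parallel_size=8,
    trust_remote_code=True,
)
```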
Core
- The torch.compile feature is supported with the V1 engine now. It is disabled by default because it relies on the CANN 8.1 release; we'll enable it by default in the next release (a hedged sketch follows this list). #426
- Upgrade PyTorch to 2.5.1. vLLM Ascend no longer relies on a dev version of torch-npu, so users don't need to install torch-npu by hand; torch-npu 2.5.1 is installed automatically. #661
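
A hedged sketch of opting in to the V1 engine with compilation enabled. The `VLLM_USE_V1` switch and the `compilation_config` level are illustrative assumptions and, per the note above, require the CANN 8.1 release.

```python
import os

# Opt in to the V1 engine before importing vLLM (illustrative setting).
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM

# compilation_config=3 requests piecewise torch.compile in recent vLLM;
# treat the exact level as an assumption and check your version's docs.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", compilation_config=3)
```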
Other
- MiniCPM model works now. #645
- openEuler container image is supported with the `v0.8.4-openeuler` tag, and the custom Ops build is enabled by default on openEuler OS. #689
- Fix a ModuleNotFoundError bug to make LoRA work (a hedged LoRA sketch closes these notes). #600
- Add "Using EvalScope evaluation" doc #611
- Add a `VLLM_VERSION` environment variable to make the vLLM version configurable, so developers can set the correct vLLM version when they have modified the vLLM code locally (see the sketch below). #651
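
A sketch of using the `VLLM_VERSION` variable from a Python entry point; exporting it in the shell before launching works the same way. The version string is illustrative.

```python
import os

# Tell vllm-ascend which vLLM version a locally modified vLLM tree
# corresponds to; set it before vLLM / vllm-ascend are imported.
os.environ["VLLM_VERSION"] = "0.8.4"  # illustrative version string

from vllm import LLM  # imported after the variable is set
```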
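Finally, a hedged sketch of the LoRA path that the #600 fix unblocks, using vLLM's standard LoRA API; the base model, adapter name, and adapter path are placeholders.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholder base model; enable_lora turns on adapter support.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", enable_lora=True)
outputs = llm.generate(
    ["Give me a short introduction to large language models."],
    SamplingParams(max_tokens=64),
    # Placeholder adapter name, id, and path.
    lora_request=LoRARequest("demo_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```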