This is the first post release of 0.7.3 and includes the following changes. Please follow the official doc to get started.
Highlights
- Qwen3 and Qwen3MOE are now supported. The performance and accuracy of Qwen3 are well tested, so you can try it now. MindIE Turbo is recommended to improve the performance of Qwen3. #903 #915
- Added a new performance guide. The guide aims to help users improve vllm-ascend performance at the system level. It covers OS configuration, library optimization, deployment guidance and so on. #878 Doc Link
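As a rough illustration of trying Qwen3, the following sketch uses vLLM's offline `LLM` API. The model ID `Qwen/Qwen3-8B` and the helper names `build_prompts` and `run_qwen3` are assumptions made for this example; the release notes do not name a specific checkpoint or prescribe this workflow.

```python
# Assumed model ID for illustration; the release notes do not name a checkpoint.
MODEL_ID = "Qwen/Qwen3-8B"


def build_prompts(questions):
    """Wrap plain questions in a minimal prompt format (hypothetical helper)."""
    return [f"Question: {q}\nAnswer:" for q in questions]


def run_qwen3(questions, max_tokens=64):
    """Generate one completion per question with vLLM's offline API."""
    # Imported lazily so this module still loads where vLLM is not installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=MODEL_ID)
    # temperature=0.0 selects greedy decoding, which is convenient for spot checks.
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(build_prompts(questions), params)
    return [out.outputs[0].text for out in outputs]
```

On a machine with vllm and vllm-ascend installed, calling `run_qwen3(["What is an NPU?"])` should return a list with one generated answer. Any MindIE Turbo setup is separate and not shown here.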
Bug Fix
- Qwen2.5-VL works for RLHF scenarios now. #928
- Users can now launch a model directly from online weights, e.g. from Hugging Face or ModelScope. #858 #918
- The meaningless log info `UserWorkspaceSize0` has been cleaned up. #911
- The log level for `Failed to import vllm_ascend_C` has been changed to `warning` instead of `error`. #956
- DeepSeek MLA now works with chunked prefill in the V1 Engine. Please note that the V1 engine in 0.7.3 is experimental and intended for testing only. #849 #936
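Two of the fixes above depend on runtime switches, so a small sketch may help. It assumes vLLM's `VLLM_USE_MODELSCOPE` and `VLLM_USE_V1` environment variables and the `enable_chunked_prefill` engine argument. The helpers `configure_env` and `load_model` are hypothetical names introduced only for this example, not part of vllm-ascend.

```python
import os


def configure_env(use_modelscope=False, use_v1=False, environ=None):
    """Set vLLM environment switches; call this before importing vLLM."""
    env = os.environ if environ is None else environ
    if use_modelscope:
        # Ask vLLM to fetch weights from ModelScope instead of Hugging Face.
        env["VLLM_USE_MODELSCOPE"] = "True"
    if use_v1:
        # Opt in to the V1 engine (experimental in 0.7.3, for testing only).
        env["VLLM_USE_V1"] = "1"
    return env


def load_model(model_id, **engine_kwargs):
    """Create an LLM whose weights are downloaded on demand from the hub."""
    # Imported lazily so the environment switches above take effect first.
    from vllm import LLM

    return LLM(model=model_id, **engine_kwargs)
```

For example, `configure_env(use_v1=True)` followed by `load_model("deepseek-ai/DeepSeek-V2-Lite", enable_chunked_prefill=True)` would exercise the chunked prefill path. The model ID here is an assumption chosen for illustration.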