Commit 86d818b

wangxiyuan authored and wangxiaoxin (A) committed
[Bugfix] add compilation/__init__.py to fix import error (vllm-project#1152)
1. Add `__init__.py` for `vllm_ascend/compilation` to make sure it's a Python module.
2. Fix a model runner bug to keep it consistent with vllm.
3. Add the release note for 0.9.0rc2.

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1 parent a7de091 · commit 86d818b
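Why the missing `__init__.py` matters: with a setuptools-based build (an assumption about this project's packaging), `find_packages()` only collects directories that contain an `__init__.py`. A subpackage without one is silently left out of the built wheel, while an editable install still works because Python resolves the bare source directory as a namespace package — exactly the "works in editable mode, fails when installed normally" symptom this commit fixes. A minimal illustration:

```python
# Minimal illustration, assuming a setuptools-based build and this
# hypothetical source tree:
#   vllm_ascend/__init__.py
#   vllm_ascend/compilation/some_module.py   # <-- no __init__.py
from setuptools import find_packages

print(find_packages())
# -> ['vllm_ascend']
# 'vllm_ascend.compilation' is skipped because it lacks __init__.py, so the
# directory is missing from the wheel and `import vllm_ascend.compilation`
# raises ModuleNotFoundError in a non-editable install. Adding
# vllm_ascend/compilation/__init__.py makes find_packages() pick it up.
```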

File tree

5 files changed: +18 -9 lines


docs/source/conf.py

Lines changed: 2 additions & 2 deletions
@@ -69,10 +69,10 @@
     # the branch of vllm-ascend, used in vllm-ascend clone and image tag
     # - main branch: 'main'
     # - vX.Y.Z branch: latest vllm-ascend release tag
-    'vllm_ascend_version': 'v0.9.0rc1',
+    'vllm_ascend_version': 'v0.9.0rc2',
     # the newest release version of vllm-ascend and matched vLLM, used in pip install.
     # This value should be updated when cut down release.
-    'pip_vllm_ascend_version': "0.9.0rc1",
+    'pip_vllm_ascend_version': "0.9.0rc2",
     'pip_vllm_version': "0.9.0",
     # CANN image tag
     'cann_image_tag': "8.1.rc1-910b-ubuntu22.04-py3.10",
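For readers unfamiliar with where these keys end up: a minimal sketch of how a Sphinx `conf.py` can expose such values to Markdown pages via MyST substitutions. `myst_substitutions` and the `substitution` extension are standard myst-parser options; whether vllm-ascend wires these keys exactly this way is an assumption.

```python
# docs/source/conf.py -- hedged sketch, not the project's actual config
extensions = ["myst_parser"]
myst_enable_extensions = ["substitution"]

# Markdown pages can then render {{ vllm_ascend_version }} and friends,
# so version bumps like the one above propagate through the docs.
myst_substitutions = {
    'vllm_ascend_version': 'v0.9.0rc2',
    'pip_vllm_ascend_version': "0.9.0rc2",
    'pip_vllm_version': "0.9.0",
    'cann_image_tag': "8.1.rc1-910b-ubuntu22.04-py3.10",
}
```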

docs/source/developer_guide/versioning_policy.md

Lines changed: 3 additions & 0 deletions
@@ -22,6 +22,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | vLLM Ascend | vLLM         | Python         | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
 |-------------|--------------|----------------|-------------|-------------------|--------------|
+| v0.9.0rc2   | v0.9.0       | >= 3.9, < 3.12 | 8.1.RC1     | 2.5.1 / 2.5.1     |              |
 | v0.9.0rc1   | v0.9.0       | >= 3.9, < 3.12 | 8.1.RC1     | 2.5.1 / 2.5.1     |              |
 | v0.8.5rc1   | v0.8.5.post1 | >= 3.9, < 3.12 | 8.1.RC1     | 2.5.1 / 2.5.1     |              |
 | v0.8.4rc2   | v0.8.4       | >= 3.9, < 3.12 | 8.0.0       | 2.5.1 / 2.5.1     |              |
@@ -34,6 +35,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
 | Date       | Event                             |
 |------------|-----------------------------------|
+| 2025.06.10 | Release candidates, v0.9.0rc2     |
 | 2025.06.09 | Release candidates, v0.9.0rc1     |
 | 2025.05.29 | v0.7.x post release, v0.7.3.post1 |
 | 2025.05.08 | v0.7.x Final release, v0.7.3      |
@@ -71,6 +73,7 @@ Usually, each minor version of vLLM (such as 0.7) will correspond to a vLLM Asce
 | Branch     | Status       | Note                                                     |
 |------------|--------------|----------------------------------------------------------|
 | main       | Maintained   | CI commitment for vLLM main branch and vLLM 0.9.x branch |
+| v0.9.1-dev | Maintained   | CI commitment for vLLM 0.9.0 and 0.9.1 version           |
 | v0.7.3-dev | Maintained   | CI commitment for vLLM 0.7.3 version                     |
 | v0.7.1-dev | Unmaintained | Replaced by v0.7.3-dev                                   |

docs/source/faqs.md

Lines changed: 4 additions & 6 deletions
@@ -3,7 +3,7 @@
 ## Version Specific FAQs
 
 - [[v0.7.3.post1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1007)
-- [[v0.9.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1115)
+- [[v0.9.0rc2] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/1115)
 
 ## General FAQs
 
@@ -69,24 +69,22 @@ If all above steps are not working, feel free to submit a GitHub issue.
 
 ### 7. How does vllm-ascend perform?
 
-Currently, only some models, such as `Qwen2 VL` and `Deepseek V3`, are well optimized; others may not perform as well yet. In the future, we will support graph mode and custom ops to improve the performance of vllm-ascend. And once the official release of vllm-ascend is out, you can install `mindie-turbo` with `vllm-ascend` to speed up inference as well.
+Currently, only some models, such as `Qwen2 VL` and `Deepseek V3`, are well optimized; others may not perform as well yet. From 0.9.0rc2, Qwen and Deepseek models work with graph mode for better performance. What's more, you can install `mindie-turbo` with `vllm-ascend v0.7.3` to speed up inference as well.
 
 ### 8. How does vllm-ascend work with vllm?
 vllm-ascend is a plugin for vllm. Basically, the version of vllm-ascend is the same as the version of vllm. For example, if you use vllm 0.7.3, you should use vllm-ascend 0.7.3 as well. For the main branch, we make sure `vllm-ascend` and `vllm` stay compatible on each commit.
 
 ### 9. Does vllm-ascend support the Prefill Disaggregation feature?
 
-Currently, only 1P1D is supported by vllm. For vllm-ascend, it will be added by [this PR](https://github.com/vllm-project/vllm-ascend/pull/432). NPND is not yet stable or fully supported in vllm. We will make it stable and supported by vllm-ascend in the future.
+Currently, only 1P1D is supported on the V0 Engine. V1 Engine and NPND support will be stabilized and added to vllm-ascend in the future.
 
 ### 10. Does vllm-ascend support quantization methods?
 
 Currently, w8a8 quantization is natively supported by vllm-ascend on v0.8.4rc2 or higher. If you're using vllm 0.7.3, w8a8 quantization is supported through the integration of vllm-ascend and mindie-turbo; please use `pip install vllm-ascend[mindie-turbo]`.
 
 ### 11. How to run the w8a8 DeepSeek model?
 
-Currently, w8a8 DeepSeek support is a work in progress: [support AscendW8A8 quantization](https://github.com/vllm-project/vllm-ascend/pull/511)
-
-Please run DeepSeek with BF16 for now, following the [Multi-Node DeepSeek inferencing tutorial](https://vllm-ascend.readthedocs.io/en/main/tutorials/multi_node.html)
+Please follow the [quantization inferencing tutorial](https://vllm-ascend.readthedocs.io/en/main/tutorials/multi_npu_quantization.html) and replace the model with DeepSeek.
 
 ### 12. There is no output in the log when loading models using vllm-ascend. How to solve it?
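To make the FAQ 11 answer concrete: a hedged sketch of "replace the model with DeepSeek" using vllm's offline API. The checkpoint path is hypothetical, and `quantization="ascend"` is an assumption based on the linked multi-NPU quantization tutorial; check that tutorial for the exact arguments.

```python
# Hedged sketch: the model path is hypothetical; quantization="ascend" is
# assumed from the linked multi-NPU quantization tutorial.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/path/to/DeepSeek-w8a8",  # your w8a8-quantized DeepSeek weights
    quantization="ascend",           # Ascend w8a8 quantization (assumption)
    tensor_parallel_size=4,          # spread the model across 4 NPUs
)
outputs = llm.generate(["Hello, how are you?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```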

docs/source/user_guide/graph_mode.md

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ outputs = model.generate("Hello, how are you?")
 online example:
 
 ```shell
-vllm serve Qwen/Qwen2-7B-Instruct --additional-config='{"torchair_graph_config": {"enable": True}}'
+vllm serve Qwen/Qwen2-7B-Instruct --additional-config='{"torchair_graph_config": {"enable": true}}'
 ```
 
 You can find more detail about additional config [here](./additional_config.md)
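The one-character fix above matters because `--additional-config` is parsed as JSON, and JSON only accepts lowercase booleans; Python-style `True` makes the whole string invalid. A quick self-contained check (plain standard-library Python, not vllm code):

```python
import json

# Lowercase `true` is valid JSON and parses to Python's True...
print(json.loads('{"torchair_graph_config": {"enable": true}}'))
# -> {'torchair_graph_config': {'enable': True}}

# ...while capitalized `True` is rejected by the JSON parser.
try:
    json.loads('{"torchair_graph_config": {"enable": True}}')
except json.JSONDecodeError as err:
    print("invalid JSON:", err)
```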

docs/source/user_guide/release_notes.md

Lines changed: 8 additions & 0 deletions
@@ -1,5 +1,13 @@
 # Release note
 
+## v0.9.0rc2 - 2025.06.10
+
+This release contains some quick fixes for v0.9.0rc1. Please use this release instead of v0.9.0rc1.
+
+### Highlights
+
+- Fix the import error that occurs when vllm-ascend is installed in non-editable mode. [#1152](https://github.com/vllm-project/vllm-ascend/pull/1152)
+
 ## v0.9.0rc1 - 2025.06.09
 
 This is the 1st release candidate of v0.9.0 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to start the journey. From this release, the V1 Engine is recommended. The V0 Engine code is frozen and will no longer be maintained. Please set the environment variable `VLLM_USE_V1=1` to enable the V1 Engine.
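A minimal sketch of enabling the V1 Engine as the note above describes, setting the variable before vllm is imported to be safe; the model name is just the example used elsewhere in these docs.

```python
import os

# Set before importing vllm so the V1 Engine is selected (per the note above).
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM

model = LLM(model="Qwen/Qwen2-7B-Instruct")
outputs = model.generate("Hello, how are you?")
print(outputs)
```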
