
Commit ec27af3

[Doc] Add 0.8.5rc1 release note (#756)
### What this PR does / why we need it?
Add 0.8.5rc1 release note and bump vllm version to v0.8.5.post1.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
CI passed

Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
1 parent 2cd036e commit ec27af3

File tree: 7 files changed, +45 −26 lines


README.md

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@ By using vLLM Ascend plugin, popular open-source models, including Transformer-l
 - OS: Linux
 - Software:
   * Python >= 3.9, < 3.12
-  * CANN >= 8.1.rc1
+  * CANN >= 8.1.RC1
   * PyTorch >= 2.5.1, torch-npu >= 2.5.1
   * vLLM (the same version as vllm-ascend)


docs/source/conf.py

Lines changed: 4 additions & 4 deletions
@@ -63,15 +63,15 @@
     # the branch of vllm, used in vllm clone
     # - main branch: 'main'
     # - vX.Y.Z branch: 'vX.Y.Z'
-    'vllm_version': 'v0.8.4',
+    'vllm_version': 'v0.8.5.post1',
     # the branch of vllm-ascend, used in vllm-ascend clone and image tag
     # - main branch: 'main'
     # - vX.Y.Z branch: latest vllm-ascend release tag
-    'vllm_ascend_version': 'v0.8.4rc2',
+    'vllm_ascend_version': 'v0.8.5rc1',
     # the newest release version of vllm-ascend and matched vLLM, used in pip install.
     # This value should be updated when cut down release.
-    'pip_vllm_ascend_version': "0.8.4rc2",
-    'pip_vllm_version': "0.8.4",
+    'pip_vllm_ascend_version': "0.8.5rc1",
+    'pip_vllm_version': "0.8.5.post1",
     # CANN image tag
     'cann_image_tag': "8.1.rc1-910b-ubuntu22.04-py3.10",
 }
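The values above are consumed as documentation substitutions: the `|pip_vllm_version|`-style placeholders in installation.md below are expanded at build time. Here is a minimal sketch of how such a version dict can be wired into Sphinx via MyST substitutions; the exact wiring in vllm-ascend's conf.py is an assumption (the `|name|` form inside code blocks typically comes from an extra extension such as sphinx-substitution-extensions):

```python
# conf.py sketch (assumed wiring; the real vllm-ascend conf.py may differ).
# MyST-Parser expands {{ key }} placeholders in Markdown using this dict.
extensions = ["myst_parser"]
myst_enable_extensions = ["substitution"]

myst_substitutions = {
    "vllm_version": "v0.8.5.post1",      # branch used when cloning vLLM
    "vllm_ascend_version": "v0.8.5rc1",  # branch/image tag for vllm-ascend
    "pip_vllm_version": "0.8.5.post1",   # version shown in pip install docs
    "pip_vllm_ascend_version": "0.8.5rc1",
    "cann_image_tag": "8.1.rc1-910b-ubuntu22.04-py3.10",
}
```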

docs/source/developer_guide/versioning_policy.md

Lines changed: 3 additions & 1 deletion
@@ -80,6 +80,7 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:

 | vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
 |--------------|--------------|----------------| --- | --- |
+| v0.8.5rc1 | v0.8.5.post1 | >= 3.9, < 3.12 | 8.1.RC1 | 2.5.1 / 2.5.1 |
 | v0.8.4rc2 | v0.8.4 | >= 3.9, < 3.12 | 8.0.0 | 2.5.1 / 2.5.1 |
 | v0.8.4rc1 | v0.8.4 | >= 3.9, < 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250320 |
 | v0.7.3rc2 | v0.7.3 | >= 3.9, < 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250320 |
@@ -92,7 +93,8 @@ Following is the Release Compatibility Matrix for vLLM Ascend Plugin:

 | Date | Event |
 |------------|-------------------------------------------|
-| End of 2025.04 | v0.7.x Final release, v0.7.3 |
+| Early of 2025.05 | v0.7.x Final release, v0.7.3 |
+| 2025.05.06 | Release candidates, v0.8.5rc1 |
 | 2025.04.28 | Release candidates, v0.8.4rc2 |
 | 2025.04.18 | Release candidates, v0.8.4rc1 |
 | 2025.03.28 | Release candidates, v0.7.3rc2 |

docs/source/faqs.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 - [[v0.7.3rc2] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/418)
 - [[v0.8.4rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/546)
 - [[v0.8.4rc2] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/707)
+- [[v0.8.5rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/754)

 ## General FAQs

docs/source/installation.md

Lines changed: 2 additions & 13 deletions
@@ -11,7 +11,7 @@ This document describes how to install vllm-ascend manually.

 | Software | Supported version | Note |
 |-----------|-------------------|----------------------------------------|
-| CANN | >= 8.1.rc1 | Required for vllm-ascend and torch-npu |
+| CANN | >= 8.1.RC1 | Required for vllm-ascend and torch-npu |
 | torch-npu | >= 2.5.1 | Required for vllm-ascend |
 | torch | >= 2.5.1 | Required for torch-npu and vllm |

@@ -135,23 +135,12 @@ Then you can install `vllm` and `vllm-ascend` from **pre-built wheel**:
 :substitutions:

 # Install vllm-project/vllm from pypi
-# (v0.8.4 aarch64 is unsupported see detail in below note)
-# pip install vllm==|pip_vllm_version|
-# Install vLLM
-git clone --depth 1 --branch |vllm_version| https://github.com/vllm-project/vllm
-cd vllm
-VLLM_TARGET_DEVICE=empty pip install -v -e .
-cd ..
+pip install vllm==|pip_vllm_version|

 # Install vllm-project/vllm-ascend from pypi.
 pip install vllm-ascend==|pip_vllm_ascend_version|
 ```

-```{note}
-There was a installation bug on vLLM v0.8.4 aarch64: [No matching distribution found for triton](https://github.com/vllm-project/vllm-ascend/issues/581).
-If you failed to install vLLM due to it, please build from source code.
-```
-
 :::{dropdown} Click here to see "Build from source code"
 or build from **source code**:

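After the pre-built wheel installation above succeeds, a quick import check confirms that the two packages resolved to matching versions. Below is a minimal sketch, assuming the pip package `vllm-ascend` exposes the `vllm_ascend` module and that the Ascend toolchain is installed on the machine:

```python
# Post-install smoke test (sketch). Only verifies imports and versions;
# running inference additionally requires Ascend NPU hardware and CANN.
import vllm

print("vllm version:", vllm.__version__)  # expect 0.8.5.post1 for this release

import vllm_ascend  # module name assumed from the `vllm-ascend` package name

print("vllm-ascend loaded from:", vllm_ascend.__file__)
```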
docs/source/user_guide/release_notes.md

Lines changed: 23 additions & 0 deletions
@@ -1,5 +1,28 @@
 # Release note

+## v0.8.5rc1
+
+This is the 1st release candidate of v0.8.5 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to start the journey. You can now enable the V1 engine by setting the environment variable `VLLM_USE_V1=1`; see the feature support status of vLLM Ascend [here](https://vllm-ascend.readthedocs.io/en/latest/user_guide/suppoted_features.html).
+
+### Highlights
+- Upgrade CANN version to 8.1.RC1 to support chunked prefill and automatic prefix caching (`--enable_prefix_caching`) when V1 is enabled [#747](https://github.com/vllm-project/vllm-ascend/pull/747)
+- Optimize Qwen2 VL and Qwen2.5 VL [#701](https://github.com/vllm-project/vllm-ascend/pull/701)
+- Improve DeepSeek V3 eager mode and graph mode performance; you can now use `--additional_config={'enable_graph_mode': True}` to enable graph mode [#598](https://github.com/vllm-project/vllm-ascend/pull/598) [#719](https://github.com/vllm-project/vllm-ascend/pull/719)
+
+### Core
+- Upgrade vLLM to 0.8.5.post1 [#715](https://github.com/vllm-project/vllm-ascend/pull/715)
+- Fix early return in CustomDeepseekV2MoE.forward during profile_run [#682](https://github.com/vllm-project/vllm-ascend/pull/682)
+- Adapt to new quantized models generated by modelslim [#719](https://github.com/vllm-project/vllm-ascend/pull/719)
+- Initial support for P2P disaggregated prefill based on llm_datadist [#694](https://github.com/vllm-project/vllm-ascend/pull/694)
+- Use `/vllm-workspace` as the code path and include `.git` in the container image to fix an issue when starting vllm under `/workspace` [#726](https://github.com/vllm-project/vllm-ascend/pull/726)
+- Optimize NPU memory usage so that DeepSeek R1 W8A8 works with a 32K model length [#728](https://github.com/vllm-project/vllm-ascend/pull/728)
+- Fix `PYTHON_INCLUDE_PATH` typo in setup.py [#762](https://github.com/vllm-project/vllm-ascend/pull/762)
+
+### Other
+- Add Qwen3-0.6B test [#717](https://github.com/vllm-project/vllm-ascend/pull/717)
+- Add nightly CI [#668](https://github.com/vllm-project/vllm-ascend/pull/668)
+- Add accuracy test report [#542](https://github.com/vllm-project/vllm-ascend/pull/542)
+
 ## v0.8.4rc2

 This is the second release candidate of v0.8.4 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to start the journey. Some experimental features are included in this version, such as W8A8 quantization and EP/DP support. We'll make them stable enough in the next release.
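Taken together, the notes above call out three user-visible switches: `VLLM_USE_V1=1` for the V1 engine, automatic prefix caching (functional with CANN 8.1.RC1), and vllm-ascend's graph mode via `--additional_config`. Below is a minimal offline-inference sketch combining them; the model name is a placeholder, Ascend NPU hardware is required, and the `additional_config` keyword mirroring the CLI flag is an assumption:

```python
# Sketch of the v0.8.5rc1 features (hedged: placeholder model, NPU required).
import os

os.environ["VLLM_USE_V1"] = "1"  # enable the vLLM V1 engine, per the note above

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder; graph mode is highlighted for DeepSeek
    enable_prefix_caching=True,          # automatic prefix caching (V1 + CANN 8.1.RC1)
    additional_config={"enable_graph_mode": True},  # assumed Python-API twin of --additional_config
)
outputs = llm.generate(["Hello, Ascend!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```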

docs/source/user_guide/suppoted_features.md

Lines changed: 11 additions & 7 deletions
@@ -6,18 +6,18 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th

 | Feature | vLLM V0 Engine | vLLM V1 Engine | Next Step |
 |-------------------------------|----------------|----------------|------------------------------------------------------------------------|
-| Chunked Prefill | 🚧 WIP | 🚧 WIP | Functional, waiting for CANN 8.1 nnal package release |
-| Automatic Prefix Caching | 🚧 WIP | 🚧 WIP | Functional, waiting for CANN 8.1 nnal package release |
+| Chunked Prefill | 🚧 WIP | 🟢 Functional | Functional, see detail note: [Chunked Prefill][cp] |
+| Automatic Prefix Caching | 🚧 WIP | 🟢 Functional | Functional, see detail note: [vllm-ascend#732][apc] |
 | LoRA | 🟢 Functional | 🚧 WIP | [vllm-ascend#396][multilora], CI needed, working on V1 support |
-| Prompt adapter | No plan | 🟡 Planned | Plan in 2025.06.30 |
+| Prompt adapter | 🔴 No plan | 🟡 Planned | Plan in 2025.06.30 |
 | Speculative decoding | 🟢 Functional | 🚧 WIP | CI needed; working on V1 support |
-| Pooling | 🟢 Functional | 🟢 Functional | CI needed and adapting more models; V1 support rely on vLLM support. |
+| Pooling | 🟢 Functional | 🟡 Planned | CI needed and adapting more models; V1 support rely on vLLM support. |
 | Enc-dec | 🔴 NO plan | 🟡 Planned | Plan in 2025.06.30 |
 | Multi Modality | 🟢 Functional | 🟢 Functional | [Tutorial][multimodal], optimizing and adapting more models |
 | LogProbs | 🟢 Functional | 🟢 Functional | CI needed |
 | Prompt logProbs | 🟢 Functional | 🟢 Functional | CI needed |
 | Async output | 🟢 Functional | 🟢 Functional | CI needed |
-| Multi step scheduler | 🟢 Functional | 🔴 Deprecated | [vllm#8779][v1_rfc], replaced by [vLLM V1 Scheduler][v1_scheduler]) |
+| Multi step scheduler | 🟢 Functional | 🔴 Deprecated | [vllm#8779][v1_rfc], replaced by [vLLM V1 Scheduler][v1_scheduler] |
 | Best of | 🟢 Functional | 🔴 Deprecated | [vllm#13361][best_of], CI needed |
 | Beam search | 🟢 Functional | 🟢 Functional | CI needed |
 | Guided Decoding | 🟢 Functional | 🟢 Functional | [vllm-ascend#177][guided_decoding] |
@@ -27,11 +27,12 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 | Data Parallel | 🔴 NO plan | 🟢 Functional | CI needed; No plan on V0 support |
 | Prefill Decode Disaggregation | 🟢 Functional | 🟢 Functional | 1P1D available, working on xPyD and V1 support. |
 | Quantization | 🟢 Functional | 🟢 Functional | W8A8 available, CI needed; working on more quantization method support |
-| Graph Mode | 🔴 NO plan | 🟢 Functional | Functional, waiting for CANN 8.1 nnal package release |
+| Graph Mode | 🔴 NO plan | 🔵 Experimental| Experimental, see detail note: [vllm-ascend#767][graph_mode] |
 | Sleep Mode | 🟢 Functional | 🟢 Functional | level=1 available, CI needed, working on V1 support |

 - 🟢 Functional: Fully operational, with ongoing optimizations.
-- 🚧 WIP: Under active development
+- 🔵 Experimental: Experimental support, interfaces and functions may change.
+- 🚧 WIP: Under active development, will be supported soon.
 - 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
 - 🔴 NO plan / Deprecated: No plan for V0 or deprecated by vLLM v1.

@@ -42,3 +43,6 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 [v1_scheduler]: https://github.com/vllm-project/vllm/blob/main/vllm/v1/core/sched/scheduler.py
 [v1_rfc]: https://github.com/vllm-project/vllm/issues/8779
 [multilora]: https://github.com/vllm-project/vllm-ascend/issues/396
+[graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767
+[apc]: https://github.com/vllm-project/vllm-ascend/issues/732
+[cp]: https://docs.vllm.ai/en/stable/performance/optimization.html#chunked-prefill
