### What this PR does / why we need it?
Add 0.8.5rc1 release note and bump vllm version to v0.8.5.post1
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
CI passed
---------
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
There was an installation bug on vLLM v0.8.4 aarch64: [No matching distribution found for triton](https://github.com/vllm-project/vllm-ascend/issues/581). If you failed to install vLLM because of it, please build from source code.
`docs/source/user_guide/release_notes.md` (+23 lines)
# Release note
## v0.8.5rc1
This is the first release candidate of v0.8.5 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to start the journey. You can now enable the V1 engine by setting the environment variable `VLLM_USE_V1=1`; see the feature support status of vLLM Ascend [here](https://vllm-ascend.readthedocs.io/en/latest/user_guide/suppoted_features.html).
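A minimal sketch of enabling the V1 engine from Python, assuming a working vllm-ascend installation; the model name and prompt are only illustrations, not part of this release:

```python
# Minimal sketch: enable the vLLM V1 engine via VLLM_USE_V1 before the engine
# is constructed. The model name below is a placeholder for illustration.
import os

os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")
outputs = llm.generate(["Hello, Ascend!"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```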
### Highlights
- Upgrade CANN version to 8.1.RC1 to support chunked prefill and automatic prefix caching (`--enable_prefix_caching`) when V1 is enabled [#747](https://github.com/vllm-project/vllm-ascend/pull/747)
- Optimize Qwen2-VL and Qwen2.5-VL [#701](https://github.com/vllm-project/vllm-ascend/pull/701)
- Improve DeepSeek V3 eager mode and graph mode performance; you can now pass `--additional_config={'enable_graph_mode': True}` to enable graph mode (see the sketch after this list). [#598](https://github.com/vllm-project/vllm-ascend/pull/598) [#719](https://github.com/vllm-project/vllm-ascend/pull/719)
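Below is a rough sketch of how the two highlighted options could be combined from the Python API; it assumes the V1 engine and an Ascend NPU environment, and the model name and parallel size are placeholders rather than values taken from this release:

```python
# Rough sketch, not an official recipe: automatic prefix caching plus
# vllm-ascend graph mode. Assumes VLLM_USE_V1=1 and an Ascend NPU environment;
# the model name and parallel size are placeholders.
import os

os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",                # placeholder model
    tensor_parallel_size=16,                        # placeholder parallelism
    enable_prefix_caching=True,                     # automatic prefix caching (V1)
    additional_config={"enable_graph_mode": True},  # graph mode per #598 / #719
)
```

The same two options correspond to the command-line flags quoted in the bullets above.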
### Core
- Upgrade vLLM to 0.8.5.post1 [#715](https://github.com/vllm-project/vllm-ascend/pull/715)
- Fix early return in CustomDeepseekV2MoE.forward during profile_run [#682](https://github.com/vllm-project/vllm-ascend/pull/682)
- Adapt to new quant models generated by modelslim [#719](https://github.com/vllm-project/vllm-ascend/pull/719)
- Initial support for P2P Disaggregated Prefill based on llm_datadist [#694](https://github.com/vllm-project/vllm-ascend/pull/694)
- Use `/vllm-workspace` as the code path and include `.git` in the container image to fix an issue when starting vLLM under `/workspace` [#726](https://github.com/vllm-project/vllm-ascend/pull/726)
- Optimize NPU memory usage to make DeepSeek R1 W8A8 work with a 32K model length [#728](https://github.com/vllm-project/vllm-ascend/pull/728)
- Fix `PYTHON_INCLUDE_PATH` typo in setup.py [#762](https://github.com/vllm-project/vllm-ascend/pull/762)
### Other
- Add Qwen3-0.6B test [#717](https://github.com/vllm-project/vllm-ascend/pull/717)
- Add nightly CI [#668](https://github.com/vllm-project/vllm-ascend/pull/668)
- Add accuracy test report [#542](https://github.com/vllm-project/vllm-ascend/pull/542)
## v0.8.4rc2
This is the second release candidate of v0.8.4 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to start the journey. Some experimental features are included in this version, such as W8A8 quantization and EP/DP support. We'll make them stable enough in the next release.