[DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 #1235

MengqingCao · 2025-06-16T03:53:15Z

What this PR does / why we need it?

Fix rank set in DP scenario. The new POC version of torch-npu supports setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, so we can use the rank set in `DPEngineCoreProc` directly instead of calculating the local rank across DP by hand in the patched `_init_data_parallel`.

Closes: #1170
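
To illustrate the idea behind the rank fix, here is a minimal sketch. It is not the code from this PR: the helper names (`devices_for_dp_replica`, `manual_device_index`, `restrict_visible_devices`) are hypothetical, and it assumes each DP replica owns a contiguous block of `devices_per_replica` devices. It compares computing a global device index by hand with narrowing `ASCEND_RT_VISIBLE_DEVICES` so that the plain local rank can be used directly.

```python
import os
from typing import MutableMapping, Optional

ENV_VAR = "ASCEND_RT_VISIBLE_DEVICES"


def devices_for_dp_replica(local_dp_rank: int, devices_per_replica: int) -> list[int]:
    """Hypothetical helper: physical device ids owned by one DP replica.

    Assumes replicas are laid out contiguously: replica 0 gets devices
    [0, n), replica 1 gets [n, 2n), and so on.
    """
    start = local_dp_rank * devices_per_replica
    return list(range(start, start + devices_per_replica))


def manual_device_index(local_dp_rank: int, devices_per_replica: int, local_rank: int) -> int:
    """The 'by hand' approach: offset the local rank by the DP replica's base."""
    return local_dp_rank * devices_per_replica + local_rank


def restrict_visible_devices(
    local_dp_rank: int,
    devices_per_replica: int,
    env: Optional[MutableMapping[str, str]] = None,
) -> str:
    """Narrow the visible devices so that local rank i maps to the i-th entry.

    Writes to os.environ by default; pass a dict to try it without side effects.
    """
    env = os.environ if env is None else env
    value = ",".join(str(d) for d in devices_for_dp_replica(local_dp_rank, devices_per_replica))
    env[ENV_VAR] = value
    return value


if __name__ == "__main__":
    fake_env: dict[str, str] = {}
    restrict_visible_devices(local_dp_rank=1, devices_per_replica=4, env=fake_env)
    visible = [int(d) for d in fake_env[ENV_VAR].split(",")]
    # With the narrowed list, indexing by local rank gives the same device
    # that the manual offset calculation would have selected.
    for local_rank in range(4):
        assert visible[local_rank] == manual_device_index(1, 4, local_rank)
    print(fake_env[ENV_VAR])
```

Under these assumptions both approaches select the same physical device. The difference is only where the offset lives: in the environment variable, or in hand-written rank arithmetic inside a patch.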

Bump torch-npu version to 2.5.1.post1.dev20250528

Closes: #1242
Closes: #1232

How was this patch tested?

CI passed with the newly added test.

MengqingCao · 2025-06-16T04:39:35Z

This should be merged after #884

wangxiyuan · 2025-06-16T06:14:28Z

I like this PR, which removes the patch code. Let's merge it ASAP.

github-actions · 2025-06-16T12:26:28Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Yikun · 2025-06-17T00:44:55Z

pd: https://github.com/vllm-project/vllm-ascend/actions/runs/15684668372/job/44184711624
accuracy: https://github.com/vllm-project/vllm-ascend/actions/runs/15684668412
long term: https://github.com/vllm-project/vllm-ascend/actions/runs/15684668386

All passed.

… to 2.5.1.post1.dev20250528 (#1247)

### What this PR does / why we need it?

Cherry-pick from #1235

1. Fix rank set in DP scenario. The new POC version of torch-npu supports setting `ASCEND_RT_VISIBLE_DEVICES` dynamically, so we can use the rank set in `DPEngineCoreProc` directly instead of calculating the local rank across DP by hand in the patched `_init_data_parallel`.

   Closes: #1170

2. Bump torch-npu version to 2.5.1.post1.dev20250528

   Closes: #1242
   Closes: #1232

### How was this patch tested?

CI passed with the newly added test.

---------

Signed-off-by: Icey <1790571317@qq.com>
Signed-off-by: MengqingCao <cmq0113@163.com>
Co-authored-by: Icey <1790571317@qq.com>

@jianzs

…to main

* 'main' of https://github.com/vllm-project/vllm-ascend: (22 commits)
  [Bugfix] Remove cuda related lines and add additional pip mirror (vllm-project#1252)
  [refactor] Refactoring AscendFusedMoE (vllm-project#1229)
  [Doc] Refactor and init user story page (vllm-project#1224)
  [Doctest] add installation doctest (vllm-project#1179)
  [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 (vllm-project#1235)
  Fix the device error when using ray as vllm-acend backend (vllm-project#884)
  [CI] Add unit test framework (vllm-project#1201)
  [Build] Speedup image build (vllm-project#1216)
  [CI] Make e2e test to be preemptible and simple (vllm-project#1217)
  Waiting for BMM NZ support(Improve TPOP 2ms performance) (vllm-project#1131)
  [Doc] fix VLLM_USE_V1 value in graph mode docs (vllm-project#1226)
  vllm-ascend support chunked prefill (vllm-project#1172)
  [CI/UT][Graph] Add ut for torchair graph mode (vllm-project#1103)
  Add ShouJian Zheng (@jianzs) as vLLM Ascend maintainer (vllm-project#1203)
  [CI] Recover ut for ascend scheduler only in ci of v1. (vllm-project#1180)
  Support multistream of MLA vector operations (vllm-project#1135)
  [Doc] Add Referer header for CANN package download url. (vllm-project#1192)
  [fix] fix bug in 1p1d disaggregated_prefill example (vllm-project#1184)
  [CI][Benchmark] Add qwen2.5-7b test (vllm-project#1104)
  [CI][Benchmark] Add new model and v1 test to perf benchmarks (vllm-project#1099)
  ...

Sync with upstream main branch

github-actions bot added the module:tests label Jun 16, 2025

MengqingCao mentioned this pull request Jun 16, 2025

Fix the device error when using ray and add initialize_cache to support vllm main #884

Merged

realliujiaxu approved these changes Jun 16, 2025

MengqingCao force-pushed the dpfix branch 2 times, most recently from 6f48563 to a92e5fb June 16, 2025 08:36

github-actions bot added the merge-conflicts label Jun 16, 2025

github-actions bot added the documentation (Improvements or additions to documentation) and ci/build labels Jun 16, 2025

MengqingCao and others added 3 commits June 16, 2025 13:05

[DP][V1] Fix rank set in DP scenario

d6718ce

Signed-off-by: MengqingCao <cmq0113@163.com>

rm no-build-isolation

9ad7522

Signed-off-by: MengqingCao <cmq0113@163.com>

Bump torch-npu version to 2.5.1.post1.dev20250528

c11220f

Signed-off-by: Icey <1790571317@qq.com> Signed-off-by: MengqingCao <cmq0113@163.com>

MengqingCao force-pushed the dpfix branch from c555787 to c11220f June 16, 2025 13:08

github-actions bot removed the merge-conflicts label Jun 16, 2025

MengqingCao mentioned this pull request Jun 16, 2025

[RFC]: E2E CI test for key features #413

Open

83 tasks

Yikun changed the title ~~[DP][V1] Fix rank set in DP scenario~~ [DP][V1] Bump torch-npu version to 2.5.1.post1.dev20250528 and fix rank set in DP Jun 16, 2025

MengqingCao changed the title ~~[DP][V1] Bump torch-npu version to 2.5.1.post1.dev20250528 and fix rank set in DP~~ [DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 Jun 16, 2025

Yikun approved these changes Jun 16, 2025

Yikun added the ready (read for review) label Jun 16, 2025

MengqingCao mentioned this pull request Jun 16, 2025

[v0.9.1][DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 #1247

Merged

Yikun merged commit 96fa7ff into vllm-project:main Jun 16, 2025
24 checks passed

Yikun added the long-term-test (enable long term test for PR), accuracy-test (enable all accuracy test for PR), pd-test (enable pd test for PR), and ready-for-test (start test by label for PR) labels Jun 16, 2025

MengqingCao mentioned this pull request Jun 17, 2025

[Bug]: Stuck at "Adjusting world_size=16 rank=0 distributed_init_method=tcp://127.0.0.1:42051 for DP" #1255

Open

Yikun added a commit to Yikun/vllm-ascend that referenced this pull request Jun 21, 2025

Revert "[DP][V1] Fix rank set in DP scenario & Bump torch-npu version…

715da90

… to 2.5.1.post1.dev20250528 (vllm-project#1235)" This reverts commit 96fa7ff.

MengqingCao deleted the dpfix branch June 28, 2025 01:43
