[v0.9.1][DP][V1] Fix rank set in DP scenario & Bump torch-npu version to 2.5.1.post1.dev20250528 #1247
just a soft cherrypick
This should be merged after #1234.
CI failed because this is not compatible with V0.9.0; will fix this later.
Looks good
5c6c5bc to b45f0ab
@ganyi1996ppo DP raises a timeout error on A2 with this PR, so we are skipping the newly added UT for now. This PR does fix DP on A3; could you merge it now? All CI jobs except DP have passed in https://github.com/vllm-project/vllm-ascend/actions/runs/15700818646/job/44238373529
Why did A2 fail on the DP case? Is this failure related to torch_npu?
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: Icey <1790571317@qq.com> Signed-off-by: MengqingCao <cmq0113@163.com>
Sorry for the wrong info: there is no bug on A2. The failure was caused by the wrong method of enabling DP in this PR, and it has been fixed in #1273. cc @ganyi1996ppo
What this PR does / why we need it?
Cherry-pick from #1235
ASCEND_RT_VISIBLE_DEVICES can now be set dynamically, so we can use the rank set in DPEngineCoreProc directly instead of calculating the local rank across DP by hand in the patched _init_data_parallel.
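As a rough illustration of the idea (not the code in this PR): once the visible-device variable can be set at runtime, each DP rank can restrict itself to its own devices and then address them by plain local index, so no cross-DP rank arithmetic is needed afterwards. The helper names `dp_visible_devices` and `restrict_devices` are hypothetical, and the contiguous one-block-per-rank layout is an assumption made for the sketch.

```python
import os


def dp_visible_devices(local_dp_rank: int, devices_per_dp_rank: int) -> str:
    """Return the comma-separated device ids owned by one DP rank.

    Assumption for this sketch: each DP rank owns one contiguous block
    of `devices_per_dp_rank` devices.
    """
    if local_dp_rank < 0:
        raise ValueError("local_dp_rank must be non-negative")
    if devices_per_dp_rank < 1:
        raise ValueError("devices_per_dp_rank must be at least 1")
    start = local_dp_rank * devices_per_dp_rank
    return ",".join(str(i) for i in range(start, start + devices_per_dp_rank))


def restrict_devices(local_dp_rank: int, devices_per_dp_rank: int, env=None) -> str:
    """Write the device list into ASCEND_RT_VISIBLE_DEVICES and return it.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to try out without touching the real process environment.
    """
    env = os.environ if env is None else env
    value = dp_visible_devices(local_dp_rank, devices_per_dp_rank)
    env["ASCEND_RT_VISIBLE_DEVICES"] = value
    return value
```

With two devices per DP rank, `restrict_devices(1, 2, env={})` would expose devices 2 and 3 to that rank, which the process can then see as local devices 0 and 1.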
Closes: #1170
Closes: #1242
Closes: #1232
How was this patch tested?
CI passed with the newly added test.