
[bugfix] Add ep initialization check and change the return check to is_driver_worker #896


Merged
merged 1 commit into vllm-project:main on Jun 3, 2025

Conversation

Contributor

@NINGBENZHE commented May 19, 2025

What this PR does / why we need it?

Fix a bug where graph mode handles prefill (p) and decode (d) the same way, along with some other bugs.

Does this PR introduce any user-facing change?

No, it wouldn't.

How was this patch tested?

Tested via the existing end-to-end tests.

port = int(os.environ.get("MASTER_PORT", answer)) # type: ignore
port = int(os.environ.get("VLLM_DP_MASTER_PORT", answer)) # type: ignore
Collaborator

Would using envs.VLLM_DP_MASTER_PORT be better?

Contributor Author

fixed
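For reference, a minimal sketch of the suggested change, assuming the installed vLLM exposes VLLM_DP_MASTER_PORT in vllm.envs:

```python
# Sketch only: prefer vLLM's envs accessor over a raw os.environ lookup.
# `fallback_port` stands in for the locally computed default ("answer" in the
# snippet above) and its value here is purely illustrative.
import vllm.envs as envs

fallback_port = 29510
port = int(envs.VLLM_DP_MASTER_PORT or fallback_port)
```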

from torch.distributed import ProcessGroup
from torch.distributed.distributed_c10d import (Backend, PrefixStore,
_get_default_timeout,
is_nccl_available)
from torch.distributed.rendezvous import rendezvous
from vllm.config import ParallelConfig

_DP_GROUP = None
Collaborator

There is already a process group for DP in vLLM now; why do we add this here?

Contributor Author

This is used to decide whether to execute the dummy_run for the prefill process. The native stateless process group does not expose a global variable we can read.
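A minimal sketch of the pattern being described here, i.e. caching the stateless DP group in a module-level handle with accessors (the setter/getter names are illustrative, not necessarily what the PR uses):

```python
# Cache the DP process group at module level so other components (e.g. the
# model runner deciding whether to issue a prefill dummy_run) can query it.
# torch's stateless group helpers do not register the group anywhere global,
# hence the explicit cache.
from typing import Optional

from torch.distributed import ProcessGroup

_DP_GROUP: Optional[ProcessGroup] = None


def set_dp_group(group: ProcessGroup) -> None:
    global _DP_GROUP
    _DP_GROUP = group


def get_dp_group() -> Optional[ProcessGroup]:
    return _DP_GROUP
```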

@@ -21,12 +21,18 @@ def get_etp_group() -> GroupCoordinator:
return _ETP


def model_parallel_initialized():
return (_ETP is not None and _EP is not None)
Collaborator

I think we could use EP without ETP, so this will break that scenario.

Contributor Author

No. Even if ETP is not enabled, the communication groups are still created.
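For illustration, this is roughly how such an initialization check is used as a guard; the caller is made up, while _EP and _ETP stand in for the module-level GroupCoordinators from the diff:

```python
# Guard expert-parallel code paths behind an initialization check so anything
# that runs before (or without) model-parallel setup never touches
# uninitialized groups.
_EP = None   # expert-parallel GroupCoordinator, set during initialization
_ETP = None  # expert-tensor-parallel GroupCoordinator, set during initialization


def model_parallel_initialized() -> bool:
    # Per the discussion above, both coordinators are created together even
    # when ETP is effectively disabled, so checking both is safe.
    return _ETP is not None and _EP is not None


def destroy_model_parallel_if_needed() -> None:
    # Illustrative caller: skip teardown when nothing was ever initialized.
    if not model_parallel_initialized():
        return
    ...
```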

@MengqingCao
Collaborator

@NINGBENZHE Thanks for the details!

FYI, vllm-project/vllm#18763 has been merged now. But I think the global DP group may still need to be kept in patch_distributed.py for now, because the DP group is created in ascend_stateless_init_dp_group instead of stateless_init_torch_distributed_process_group.

@MengqingCao
Collaborator

@zzzzwwjj please take a look at this PR; it fixes a bug in graph mode.

intermediate_tensors=intermediate_tensors,
inputs_embeds=inputs_embeds)
if self.enable_torchair_graph_mode and attn_state == AscendAttentionState.DecodeOnly:
attn_metadata = self.attn_metadata_builder.dummy_build(
Collaborator

I can't seem to find where this dummy_build is implemented.

Contributor Author

this pr:
#839

num_reqs=num_tokens, num_actual_tokens=1)
torch._dynamo.mark_static(input_ids)
torch._dynamo.mark_static(positions)
torch._dynamo.mark_static(attn_metadata.decode.block_table)
Collaborator

How do you implement this dummy block_table? Did you request the free block IDs from the scheduler?

Contributor Author

this pr:
#839

attn_metadata.attn_state == AscendAttentionState.DecodeOnly)

if self.dp_group:
while not self.has_prefilled and self.enable_torchair_graph_mode and attn_metadata.attn_state == AscendAttentionState.DecodeOnly:
Collaborator

This part is quite complicated. It's aimed at mixed running of decode and prefill across DP ranks, right? Can you extract this part out with a simpler strategy? It may trigger unexpected errors when someone tries to update it.

Contributor Author

fixed. It will be removed along with its related calls once the official solution is implemented.

@@ -624,6 +629,9 @@ def _process_reqs(
input_ids = torch.cat([input_ids, padding])
positions = torch.cat([positions, padding])

if self.enable_torchair_graph_mode:
self.sync_prefill_when_enable_graph(attn_metadata)
Collaborator

Can you add more comments here? This sync between prefill and decode will be removed in the future in favor of a more robust implementation.
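For example, the call site could carry a note along these lines (the comment wording is mine, not the PR's):

```python
# NOTE: temporary workaround to keep prefill and decode aligned across DP
# ranks when torchair graph mode is enabled. It will be removed once the more
# robust implementation (chunked mc2 support) lands.
if self.enable_torchair_graph_mode:
    self.sync_prefill_when_enable_graph(attn_metadata)
```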

@ganyi1996ppo
Collaborator

This implementation relies on PR #839 and is more of a workaround for an urgent task. I just discussed with @NINGBENZHE: some of the main code pieces will be wrapped into functions and removed after a more robust implementation is delivered. This should help with code maintenance and keep us from diverging too far from the vLLM main tree.

@ganyi1996ppo
Collaborator

@wangxiyuan @Yikun Please be aware of this:

This implementation relies on PR #839 and is more of a workaround for an urgent task. I just discussed with @NINGBENZHE: some of the main code pieces will be wrapped into functions and removed after a more robust implementation is delivered. This should help with code maintenance and keep us from diverging too far from the vLLM main tree.

@@ -685,6 +693,41 @@ def _process_reqs(
return (attn_metadata, hidden_states, spec_decode_metadata, positions,
total_num_scheduled_tokens, sample_indices)

def sync_prefill_when_enable_graph(self, attn_metadata):
Collaborator

This part will be removed after chunked mc2 support lands, cc @zzzzwwjj.
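In rough terms, the sync has every DP rank report whether it is still running prefill in the current step, so decode-only ranks in graph mode know whether all ranks have moved past prefill. A simplified illustration of that idea, not the PR's actual implementation (the attribute names and the single all-reduce are assumptions):

```python
import torch
import torch.distributed as dist


def sync_prefill_when_enable_graph(self, attn_metadata):
    """Simplified sketch: agree across the DP group on whether any rank is
    still prefilling, so decode-only ranks can react accordingly."""
    if self.dp_group is None:
        return
    # 1 if this rank runs prefill in the current step, 0 if it is decode-only.
    # AscendAttentionState comes from the surrounding model-runner module.
    is_prefill = int(
        attn_metadata.attn_state != AscendAttentionState.DecodeOnly)
    flag = torch.tensor([is_prefill], dtype=torch.int32)
    # MAX-reduce: the result is 1 as long as any DP rank is still prefilling.
    dist.all_reduce(flag, op=dist.ReduceOp.MAX, group=self.dp_group)
    # has_prefilled becomes True once no rank reports prefill anymore.
    self.has_prefilled = flag.item() == 0
```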

@NeverRaR
Contributor

DP support will be implemented soon in the next PR, so perhaps this workaround doesn't need to be merged into main for now?

@NINGBENZHE
Contributor Author

DP support will be implemented soon in the next PR, so perhaps this workaround doesn't need to be merged into main for now?

When will the next PR be merged?

@NeverRaR
Contributor

NeverRaR commented May 29, 2025

DP support will be implemented soon in the next PR, so perhaps this workaround doesn't need to be merged into main for now?

When will the next PR be merged?

After #839 is merged.

@NINGBENZHE
Contributor Author

DP support will be implemented soon in the next PR, so perhaps this workaround doesn't need to be merged into main for now?

When will the next PR be merged?

After #839 is merged.

Would it be possible to merge this first since another feature depends on the current functionality? When you submit the next PR, you could remove the related content later.

@NeverRaR
Contributor

Would it be possible to merge this first since another feature depends on the current functionality? When you submit the next PR, you could remove the related content later.

If this PR is so urgent, perhaps you can add an option to disable these changes by default?
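One common way to do that would be an opt-in environment flag that defaults to off; a minimal sketch below, where the variable name is made up rather than an existing vllm-ascend option:

```python
import os

# Illustrative opt-in switch: the temporary prefill/decode sync stays disabled
# unless the user explicitly turns it on, so the default behaviour is unchanged.
ENABLE_GRAPH_PREFILL_SYNC = os.environ.get(
    "VLLM_ASCEND_ENABLE_GRAPH_PREFILL_SYNC", "0") == "1"

# ...and at the call site in the model runner:
#     if ENABLE_GRAPH_PREFILL_SYNC and self.enable_torchair_graph_mode:
#         self.sync_prefill_when_enable_graph(attn_metadata)
```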

@ganyi1996ppo
Collaborator

@wangxiyuan can you look at this PR and decide whether it needs to be merged?

@NINGBENZHE
Contributor Author

Would it be possible to merge this first since another feature depends on the current functionality? When you submit the next PR, you could remove the related content later.

If this PR is so urgent, perhaps you can add an option to disable these changes by default?

The code stays in model_runner_v1 and explanatory comments have been added.

ganyi1996ppo previously approved these changes May 29, 2025
@Yikun
Collaborator

Yikun commented May 29, 2025

I'd prefer to merge #839 first and do a quick review of #1012; then we won't need this PR anymore.

Because v0.9.0rc1 should also support #1012, IMO.

@ganyi1996ppo dismissed their stale review May 29, 2025 10:54

The DP dummy run support on torchair will be handled in another PR; this PR will not be merged in vllm-ascend.

@NINGBENZHE force-pushed the main branch 3 times, most recently from 2a308fa to 552daf0 on May 30, 2025 at 02:23
@@ -66,8 +66,7 @@ def fused_experts_with_mc2(
local_rank = torch.distributed.get_rank(group=ep_group)
all_to_all_group_size = torch.distributed.get_world_size(ep_group)

world_szie = torch.distributed.get_world_size()
tp_size = world_szie // all_to_all_group_size
tp_size = get_etp_group().world_size
Collaborator

@ganyi1996ppo please double check this change.

Collaborator

looks fine
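For context, the diff stops deriving the tensor-parallel size from the global world size and instead reads it from the ETP group directly, presumably because the global world size can also include other parallel dimensions. A rough illustration of the two formulas (the helper signatures are mine, not the kernel code):

```python
import torch.distributed as dist
from torch.distributed import ProcessGroup


def tp_size_before(ep_group: ProcessGroup) -> int:
    # Previous behaviour: infer the TP size as the global world size divided
    # by the expert-parallel (all-to-all) group size.
    return dist.get_world_size() // dist.get_world_size(ep_group)


def tp_size_after(etp_group) -> int:
    # Behaviour in the diff: read the size of the expert-tensor-parallel group
    # directly (get_etp_group().world_size in the PR).
    return etp_group.world_size
```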

@wangxiyuan
Collaborator

The DP-related change has been removed. I'm fine with this PR. Thanks for the fix.

Signed-off-by: ningbenzhe1 <ningbenzhe@huawei.com>
@wangxiyuan merged commit 6ec64a3 into vllm-project:main on Jun 3, 2025
20 of 22 checks passed
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 3, 2025
### What this PR does / why we need it?
Solve the bug that the graph mode is the same as p and d, and some other
bugs.
### Does this PR introduce _any_ user-facing change?
Wouldn't be
### How was this patch tested?
Follow the end-to-end test

Signed-off-by: ningbenzhe1 <ningbenzhe@huawei.com>
Signed-off-by: wangxiaoxin (A) <w00664509@china.huawei.com>
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 3, 2025
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 3, 2025
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 3, 2025
@Yikun
Collaborator

Yikun commented Jun 3, 2025

Please update the PR title and message to be more meaningful.

David9857 pushed a commit to David9857/vllm-ascend that referenced this pull request Jun 3, 2025
momo609 pushed a commit to momo609/vllm-ascend that referenced this pull request Jun 4, 2025
@NINGBENZHE changed the title from "[bugfix] some bugs maybe fail to run" to "[bugfix] Add ep initialization check and change the return check to is_driver_worker" on Jun 4, 2025
@NINGBENZHE
Contributor Author

Please update the PR title and message to be more meaningful.

fixed
