[bugfix] Add ep initialization check and change the return check to is_driver_worker #896
Conversation
port = int(os.environ.get("MASTER_PORT", answer)) # type: ignore
port = int(os.environ.get("VLLM_DP_MASTER_PORT", answer)) # type: ignore
Would using envs.VLLM_DP_MASTER_PORT be better here?
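For illustration only, a minimal sketch of the suggested change, assuming vllm.envs exposes VLLM_DP_MASTER_PORT (as in recent vllm releases); `answer` stands in for the fallback port already computed in the surrounding code:

```python
# Hedged sketch, not the exact diff: prefer vllm's env accessor over a raw
# os.environ lookup for the data-parallel master port.
import vllm.envs as envs


def resolve_dp_master_port(answer: int) -> int:
    # VLLM_DP_MASTER_PORT is assumed to be falsy (0/unset) when not configured,
    # in which case we fall back to the locally computed port.
    return int(envs.VLLM_DP_MASTER_PORT or answer)
```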
fixed
force-pushed from 9df5c0f to 293eefe
from torch.distributed import ProcessGroup
from torch.distributed.distributed_c10d import (Backend, PrefixStore,
                                                _get_default_timeout,
                                                is_nccl_available)
from torch.distributed.rendezvous import rendezvous
from vllm.config import ParallelConfig

_DP_GROUP = None
There is already a process group for DP in vllm now, so why do we add this here?
This is used to determine whether to execute the dummy_run for the prefill process. The native stateless process group does not provide a global variable we can query.
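To make the pattern concrete, a minimal sketch (the helper names init_dp_group/get_dp_group are illustrative, not necessarily those in the diff) of caching the stateless DP process group in a module-level global so later code, such as the prefill dummy_run decision, can query it:

```python
from typing import Optional

from torch.distributed import ProcessGroup

# Module-level handle; stays None when data parallelism is not enabled.
_DP_GROUP: Optional[ProcessGroup] = None


def init_dp_group(dp_group: ProcessGroup) -> None:
    # Cache the stateless DP process group once it has been created.
    global _DP_GROUP
    _DP_GROUP = dp_group


def get_dp_group() -> Optional[ProcessGroup]:
    # Callers (e.g. the dummy-run check in the model runner) treat None as
    # "no DP group available".
    return _DP_GROUP
```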
@@ -21,12 +21,18 @@ def get_etp_group() -> GroupCoordinator:
    return _ETP


def model_parallel_initialized():
    return (_ETP is not None and _EP is not None)
I think we could use EP without ETP, so this check would break that scenario.
No. Even if ETP is not enabled, the communication groups are still created.
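As a sketch of why the joint check is safe under that assumption (the initializer name init_ascend_model_parallel is an assumption here, and _ETP/_EP come from the module being discussed): both coordinators are created in the same initialization path, so testing both globals simply asks whether that path has run.

```python
def model_parallel_initialized() -> bool:
    # Both groups are created together during initialization, even when expert
    # tensor parallelism is effectively disabled (ETP size 1), so this is
    # equivalent to "the expert-parallel groups are ready".
    return _ETP is not None and _EP is not None


def ensure_model_parallel_initialized(ep_size: int, etp_size: int = 1) -> None:
    # Illustrative guard: skip re-initialization when the groups already exist.
    if model_parallel_initialized():
        return
    init_ascend_model_parallel(ep_size, etp_size)  # assumed initializer name
```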
@NINGBENZHE Thanks for the details! FYI, vllm-project/vllm#18763 has been merged now. But I think the global dp group should maybe still be kept in
@zzzzwwjj please take a look at this PR; it fixes a bug in graph mode.
    intermediate_tensors=intermediate_tensors,
    inputs_embeds=inputs_embeds)
if self.enable_torchair_graph_mode and attn_state == AscendAttentionState.DecodeOnly:
    attn_metadata = self.attn_metadata_builder.dummy_build(
I can't seem to find where this dummy_build is implemented.
This PR: #839
        num_reqs=num_tokens, num_actual_tokens=1)
torch._dynamo.mark_static(input_ids)
torch._dynamo.mark_static(positions)
torch._dynamo.mark_static(attn_metadata.decode.block_table)
How do you implement this dummy block_table? Did you request the free block ids from the scheduler?
This PR: #839
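Purely as an illustration (the real implementation lives in #839 and may well differ), a dummy decode block table does not need real block ids from the scheduler; a zero-filled tensor of the expected shape is enough for a throwaway run whose outputs are discarded:

```python
import torch


def make_dummy_block_table(num_reqs: int, max_blocks_per_req: int,
                           device: torch.device = torch.device("cpu")) -> torch.Tensor:
    # Hypothetical helper: block id 0 points at allocated KV-cache memory, so
    # zeros of shape [num_reqs, max_blocks_per_req] are valid inputs for a dummy
    # decode step (on Ascend the device would be the NPU).
    return torch.zeros(num_reqs, max_blocks_per_req,
                       dtype=torch.int32, device=device)
```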
    attn_metadata.attn_state == AscendAttentionState.DecodeOnly)

if self.dp_group:
    while not self.has_prefilled and self.enable_torchair_graph_mode and attn_metadata.attn_state == AscendAttentionState.DecodeOnly:
This part is quite complicated. It is aimed at mixed running of decode and prefill across DP ranks, right? Can you extract this part out with a simpler strategy? It may trigger unexpected errors when someone tries to update this code.
Fixed. It will be removed along with its related calls once the official solution is implemented.
@@ -624,6 +629,9 @@ def _process_reqs(
input_ids = torch.cat([input_ids, padding])
positions = torch.cat([positions, padding])

if self.enable_torchair_graph_mode:
    self.sync_prefill_when_enable_graph(attn_metadata)
Can you add more comments here? This prefill/decode sync code will be removed in the future in favor of a more robust implementation.
This implementation relies on PR #839 and is more of a workaround for an urgent task. As just discussed with @NINGBENZHE, some of the main code pieces will be wrapped into functions and removed once a more robust implementation is delivered. This should help with code maintenance and keep us from diverging too far from the vllm main tree.
@wangxiyuan @Yikun Please be aware of this.
@@ -685,6 +693,41 @@ def _process_reqs(
return (attn_metadata, hidden_states, spec_decode_metadata, positions,
        total_num_scheduled_tokens, sample_indices)


def sync_prefill_when_enable_graph(self, attn_metadata):
This part will be removed after chunked MC2 support lands, cc @zzzzwwjj
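As a rough, hedged sketch of the synchronization idea behind sync_prefill_when_enable_graph (an interpretation of the workaround, not the diff itself): every DP rank reports whether it is still running a prefill step, and decode-only ranks in torchair graph mode keep issuing eager dummy prefills until no rank reports prefilling, so the expert-parallel collectives stay aligned.

```python
import torch
import torch.distributed as dist


def dp_ranks_done_prefilling(is_prefill_step: bool,
                             dp_group: dist.ProcessGroup) -> bool:
    # Each DP rank contributes 1 while it is still in a prefill step; MAX over
    # the group is therefore 1 as long as any rank is still prefilling.
    flag = torch.tensor([int(is_prefill_step)], dtype=torch.int32)
    dist.all_reduce(flag, op=dist.ReduceOp.MAX, group=dp_group)
    return flag.item() == 0
```

A decode-only rank that gets False back would then run an eager dummy prefill (a _dummy_run-style call) instead of the compiled decode graph for that step; once the check returns True, has_prefilled can be latched and the graph path used unconditionally.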
DP support will soon be implemented in the next PR, so I think this workaround could temporarily not be merged to main?
When will the next PR be merged?
After #839 is merged.
Would it be possible to merge this first, since another feature depends on the current functionality? When you submit the next PR, you could remove the related content.
If this PR is so urgent, perhaps you can add an option so these changes are disabled by default?
@wangxiyuan can you look at this PR and decide whether it needs to be merged?
The code is written in model_runner_v1 and comments have been added.
The DP dummy-run support on torchair will be handled in another PR; this PR will not be merged into vllm-ascend.
force-pushed from 2a308fa to 552daf0
@@ -66,8 +66,7 @@ def fused_experts_with_mc2(
local_rank = torch.distributed.get_rank(group=ep_group)
all_to_all_group_size = torch.distributed.get_world_size(ep_group)

world_szie = torch.distributed.get_world_size()
tp_size = world_szie // all_to_all_group_size
tp_size = get_etp_group().world_size
@ganyi1996ppo please double check this change.
looks fine
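For context, a small sketch contrasting the two ways of obtaining tp_size (an illustration of the change, not the diff itself): the old code derived it from the global world size, which assumes the world factors only into EP × ETP, whereas the new code asks the ETP coordinator directly.

```python
import torch.distributed as dist


def derive_tp_size_old(ep_group: dist.ProcessGroup) -> int:
    # Old approach (modulo the "world_szie" typo): infer the expert TP width by
    # dividing the global world size by the EP group size.
    world_size = dist.get_world_size()
    return world_size // dist.get_world_size(ep_group)

# New approach in the diff: read the size straight from the ETP coordinator,
# i.e. tp_size = get_etp_group().world_size, which does not depend on how the
# rest of the world is partitioned.
```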
The DP-related changes have been removed. I'm fine with this PR. Thanks for the fix.
Signed-off-by: ningbenzhe1 <ningbenzhe@huawei.com>
Please update the PR title and message to be more meaningful.
Fixed.
What this PR does / why we need it?
Fixes a bug where graph mode treats prefill (P) and decode (D) the same, along with some other bugs.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested via the end-to-end tests.