
draft of dual stream launch #842


Draft
wants to merge 1 commit into
base: main
from


ganyi1996ppo
Collaborator

@ganyi1996ppo commented May 14, 2025

What this PR does / why we need it?

The MoE layer in sparse models such as DeepSeek tends to introduce significant communication overhead relative to its computation. The communication result is consumed directly by the routed experts, so compute resources stay idle for long periods even when the shared expert executes in parallel. As the DeepEP repo indicates, slicing the input into several micro-batches and overlapping one micro-batch's attention computation with another micro-batch's communication can reduce this overhead significantly and thus improve performance. Here we propose a draft implementation of dual stream launch.
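To make the overlap argument concrete, here is a toy cost model, not a measurement. It assumes each micro-batch runs four dependent stages per layer (attention and expert computation on a "compute" resource, dispatch and combine on a "comm" resource), that each resource runs one task at a time, and that stage durations scale linearly with micro-batch size. The function name `makespan` and all durations are hypothetical, chosen only for illustration.

```python
# Toy model: compute and communication are two independent resources.
# Stages of one micro-batch must run in order; each resource serves one task at a time.
# Durations are hypothetical and assumed to scale linearly with micro-batch size.
FULL = [("compute", 4.0),  # attention
        ("comm", 6.0),     # dispatch
        ("compute", 4.0),  # experts
        ("comm", 6.0)]     # combine
HALF = [(res, dur / 2) for res, dur in FULL]  # one of two equal micro-batches


def makespan(stages, num_microbatches):
    """Greedy in-order schedule: issue stage s for every micro-batch, then stage s+1."""
    free = {"compute": 0.0, "comm": 0.0}   # time each resource becomes idle
    ready = [0.0] * num_microbatches       # finish time of each micro-batch's last stage
    for resource, duration in stages:
        for mb in range(num_microbatches):
            start = max(ready[mb], free[resource])
            end = start + duration
            free[resource] = end
            ready[mb] = end
    return max(ready)


print(makespan(FULL, 1))  # one un-split batch: every stage waits on the previous one
print(makespan(HALF, 2))  # two micro-batches: one's compute overlaps the other's comm
```

Under these assumptions the un-split batch takes 20.0 time units and the two-micro-batch schedule takes 14.0, because the second micro-batch's compute fills the gaps left while the first one is communicating. Real gains depend on the actual compute/communication ratio and on kernel efficiency at smaller batch sizes.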

[Figure: dual stream]

As the figure shows, we slice the input into two micro-batches in model_runner and maintain a thread pool that launches each stream on a different thread. The second stream can only start executing after the first stream has finished its first attention task.
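The launch ordering described above can be sketched with plain Python threads. This is a minimal illustration, not the PR's code: `split_microbatches`, `run_dual_stream` and the logged "attention"/"moe_comm" steps are hypothetical placeholders, and a real implementation would enqueue kernels on separate device streams rather than only appending to a log. The `threading.Event` plays the role of the trigger that holds back the second stream until the first stream's first attention task is done.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def split_microbatches(tokens, num_microbatches=2):
    """Split a flat token list into contiguous, near-equal micro-batches."""
    size, rem = divmod(len(tokens), num_microbatches)
    out, start = [], 0
    for i in range(num_microbatches):
        end = start + size + (1 if i < rem else 0)
        out.append(tokens[start:end])
        start = end
    return out


def run_dual_stream(tokens, num_layers=2):
    log, lock = [], threading.Lock()
    first_attn_done = threading.Event()  # hypothetical trigger for the second stream

    def record(msg):
        with lock:
            log.append(msg)

    def stream(idx, mb):
        if idx == 1:
            first_attn_done.wait()  # stream 1 starts only after stream 0's first attention
        try:
            for layer in range(num_layers):
                record(f"s{idx} L{layer} attention({len(mb)} tokens)")
                if idx == 0 and layer == 0:
                    first_attn_done.set()
                record(f"s{idx} L{layer} moe_comm")  # placeholder for dispatch/combine
            return mb  # placeholder: a real stream would return hidden states
        finally:
            if idx == 0:
                first_attn_done.set()  # never leave stream 1 blocked if stream 0 fails

    microbatches = split_microbatches(tokens, 2)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(stream, i, mb) for i, mb in enumerate(microbatches)]
        outputs = [f.result() for f in futures]
    merged = [t for out in outputs for t in out]  # re-concatenate in original order
    return merged, log


merged, log = run_dual_stream(list(range(5)))
print(merged)
print(log)
```

After its first attention step, the two threads may interleave freely; the only guaranteed ordering is the one the event enforces, which mirrors the trigger condition described for the second stream.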

Does this PR introduce any user-facing change?

How was this patch tested?

Signed-off-by: ganyi <pleaplusone.gy@gmail.com>
@ganyi1996ppo changed the title from "draft of duel stream launch" to "draft of dual stream launch" on May 18, 2025