Support Data Parallel MOE on HPU #1022
Conversation
Force-pushed 1b3558b to 820ad1d
Force-pushed b718e34 to 455cf52
/run-gaudi-tests
Force-pushed 455cf52 to 915c389
Signed-off-by: Xinyu Chen <xichen@habana.ai>
/run-gaudi-tests
@xinyu-intel, please add a docstring to the hacked code in llm_engine.py, so that when the Habana team rebases they can avoid unknowingly breaking the DP path.
@xinyu-intel, please also add a unit test. Once this PR is merged, the DP path will be quite easy to break during a rebase.
It's hard to add a unit test: there is a known hang issue in the mixed-batch scenario.
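As a rough illustration of the docstring request above, a guard docstring on the patched section of llm_engine.py could look like the sketch below. The class stub, method name, and dummy-batch rationale are assumptions for illustration, not taken from this PR's diff.

```python
# Illustrative stub only; the real class lives in vllm/engine/llm_engine.py.
class LLMEngine:
    def _maybe_schedule_dp_dummy_batch(self) -> None:
        """HPU data-parallel (DP) support: keep this block intact when rebasing.

        Upstream vLLM v0 has no DP support; this block is patched in by the
        DP MoE change. When a DP rank has no live requests it still has to
        run a dummy batch so the collective ops inside the MoE layers stay
        in lockstep across ranks; removing or reordering this code during a
        rebase silently breaks the DP path.
        """
        ...  # patched scheduling logic would go here
```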
LGTM
/run-gaudi-tests
Upstream code does not support data parallel (DP) for v0, so we implement it here.
Based on #947
Test command:
PT_HPU_LAZY_MODE=1 VLLM_WORKER_MULTIPROC_METHOD=spawn VLLM_USE_V1=0 VLLM_SKIP_WARMUP=true python examples/offline_inference/data_parallel.py --model="ibm-research/PowerMoE-3b" --dp-size=2 --tp-size=2
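For reference, here is a minimal sketch of what that command exercises: one spawned process per DP rank, each building a TP-sized engine and generating independently. The launcher structure and the VLLM_DP_RANK / VLLM_DP_SIZE environment variables are assumptions based on the upstream examples/offline_inference/data_parallel.py example, not on this PR's diff.

```python
# Hypothetical per-rank data-parallel launcher, loosely following the
# upstream data_parallel.py example; env var names are assumptions, and
# additional master IP/port variables may be required in practice.
import os
from multiprocessing import get_context


def dp_worker(dp_rank: int, dp_size: int, tp_size: int) -> None:
    os.environ["VLLM_DP_RANK"] = str(dp_rank)
    os.environ["VLLM_DP_SIZE"] = str(dp_size)

    # Import after the env vars are set so the engine picks them up.
    from vllm import LLM, SamplingParams

    llm = LLM(model="ibm-research/PowerMoE-3b",
              tensor_parallel_size=tp_size,
              enforce_eager=True)
    outputs = llm.generate(["Hello, my name is"],
                           SamplingParams(max_tokens=16))
    print(f"[DP rank {dp_rank}] {outputs[0].outputs[0].text!r}")


if __name__ == "__main__":
    dp_size, tp_size = 2, 2
    ctx = get_context("spawn")  # mirrors VLLM_WORKER_MULTIPROC_METHOD=spawn
    procs = [ctx.Process(target=dp_worker, args=(rank, dp_size, tp_size))
             for rank in range(dp_size)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```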