[MLA][Graph] Improve assertion on Graph mode with MLA #933
Conversation
LGTM. Note that the MLA kernel may support numHeads / numKvHeads < 16 in the future.
thanks for the info!
Maybe we should add a note or TODO in the code, and document the current limitation in the FAQ or somewhere similar.
Done, PTAL, thanks!
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: wangxiaoxin (A) <wangxiaoxin7@huawei.com>
What determines this constraint? Why does it have to be 32/64/128? Can other values like 28 be used (or why are they not recommended)? There's a user-defined model with num_heads=28 and num_kv_heads=1, which fails to run here.
This constraint comes from the CANN op, and it will be removed after #1653 and #1508.
1. Although the restriction to multiples of 32 has been lifted, is it still recommended to set these parameters to multiples of 32? I've seen analyses suggesting that multiples of 32 allow for hardware alignment and full utilization of hardware resources.
What this PR does / why we need it?
Improve assertion on Graph mode with MLA.
When running deepseek with graph mode, the fused MLA op only supports `numHeads / numKvHeads ∈ {32, 64, 128}`, so we improve the assertion message here to avoid confusing users.
Does this PR introduce any user-facing change?
Adjusting the tp size is required when running deepseek-v3/r1 with graph mode. deepseek-v2-lite is not supported in graph mode.
How was this patch tested?
Tested locally, as the CI machine could not run V3 due to HBM limits.
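As a rough illustration of the kind of check this PR describes (a minimal sketch, not the actual vllm-ascend code; the function name, constant, and message wording are hypothetical):

```python
# Hypothetical sketch of an improved assertion for the fused MLA op
# in graph mode. The supported ratios come from the PR description;
# per the discussion above, the underlying limit is imposed by the
# CANN op and may be lifted later.
SUPPORTED_RATIOS = (32, 64, 128)

def check_mla_graph_mode(num_heads: int, num_kv_heads: int) -> None:
    """Fail fast with a clear message when the head ratio is unsupported."""
    ratio = num_heads // num_kv_heads
    assert ratio in SUPPORTED_RATIOS, (
        f"Graph mode with MLA only supports numHeads / numKvHeads in "
        f"{SUPPORTED_RATIOS}, but got {ratio} (num_heads={num_heads}, "
        f"num_kv_heads={num_kv_heads}). Please adjust the tensor "
        f"parallel size, or disable graph mode.")
```

With a check like this, the user-defined model mentioned above (num_heads=28, num_kv_heads=1, ratio 28) fails immediately with an explanatory message instead of a confusing kernel error.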