
[FEA] Has CUTLASS considered supporting zero-points and block-wise scaling in Hopper Mixed Grouped GEMM recently? #2261


Open
mengsoso opened this issue Apr 24, 2025 · 1 comment

@mengsoso

On the Hopper architecture, mixed_dtype_grouped_gemm currently supports only row-wise scaling. For AWQ quantization, however, the precision loss with row-wise scaling alone is still quite significant.

Image

Will CUTLASS support the zero-points and block-wise scaling used by AWQ (W4A16 / W4A8) for MoE models?
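To make the request concrete, here is a minimal pure-Python sketch of the two schemes being compared. It is not CUTLASS code. It assumes "row-wise" means one symmetric scale per weight row with no zero-point, and "block-wise" means one scale plus one integer zero-point per block of `group_size` weights, dequantized as `(q - zero) * scale`. The function names and the sample row are hypothetical, chosen only for illustration.

```python
def dequant_rowwise(row, bits=4):
    """Symmetric quantize/dequantize using a single scale for the whole row."""
    qmax = 2 ** (bits - 1) - 1        # 7 for signed 4-bit
    qmin = -(2 ** (bits - 1))         # -8 for signed 4-bit
    max_abs = max(abs(w) for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for w in row:
        q = min(max(round(w / scale), qmin), qmax)
        out.append(q * scale)
    return out


def dequant_blockwise(row, group_size, bits=4):
    """Asymmetric quantize/dequantize with one (scale, zero-point) per block."""
    qmax = 2 ** bits - 1              # 15 for unsigned 4-bit
    out = []
    for start in range(0, len(row), group_size):
        block = row[start:start + group_size]
        lo, hi = min(block), max(block)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        # Integer zero-point, clamped to the same unsigned range as the weights.
        zero = min(max(round(-lo / scale), 0), qmax)
        for w in block:
            q = min(max(round(w / scale) + zero, 0), qmax)
            out.append((q - zero) * scale)
    return out


def max_abs_error(ref, approx):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(a - b) for a, b in zip(ref, approx))


# Hypothetical row: a block of small weights followed by a block of large ones.
row = [-0.02, -0.01, 0.0, 0.01, 0.02, 0.03, 0.04, 0.05,
       -1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

rowwise = dequant_rowwise(row)
blockwise = dequant_blockwise(row, group_size=8)

print("row-wise max error:  ", max_abs_error(row, rowwise))
print("block-wise max error:", max_abs_error(row, blockwise))
```

In this toy row, the single row-wise scale is set by the large weights, so every small weight in the first block rounds to zero. The per-block scale and zero-point keep both blocks within their own quantization step. A real kernel would apply the same dequantization to packed 4-bit data inside the GEMM mainloop, which this sketch does not attempt to model.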

Thanks~

mengsoso added the "? - Needs Triage" and "feature request" (New feature or request) labels Apr 24, 2025
@mnicely
Collaborator

mnicely commented Apr 24, 2025

Hi @mengsoso, this feature is not on our roadmap, but we welcome community contributions!
