[FEA] Has CUTLASS considered supporting Zero-points and block-wise scaling in Hopper Mixed Grouped Gemm recently? #2261

mengsoso · 2025-04-24T09:30:56Z

On the Hopper architecture, mixed_dtype_grouped_gemm currently supports only row-wise scaling. For AWQ quantization, however, the precision loss with row-wise scaling alone is still quite significant.

Will CUTLASS support the zero-points and block-wise scaling used by AWQ (W4A16 / W4A8) for MoE models?
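
To make the request concrete, here is a minimal sketch of the difference between the two schemes. It is plain Python, not CUTLASS code, and every name in it (`quantize_groups`, `dequantize`, `mean_abs_error`, `group_size`) is hypothetical. It assumes "row-wise" means one scale for an entire weight row and "block-wise" means one scale and one integer zero-point per group of `group_size` weights along the row; the sample data is made up for illustration.

```python
# Illustrative sketch only: plain Python, not CUTLASS code.
# All function and parameter names here are hypothetical.

def quantize_groups(weights, group_size, bits=4):
    """Asymmetric quantization of one weight row: one (scale, zero-point) per group."""
    qmax = (1 << bits) - 1  # 15 for 4-bit values
    q, scales, zeros = [], [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / qmax or 1.0            # fall back to 1.0 for a constant group
        zero = max(0, min(qmax, round(-lo / scale)))  # integer zero-point, kept in [0, qmax]
        scales.append(scale)
        zeros.append(zero)
        q.extend(max(0, min(qmax, round(w / scale) + zero)) for w in group)
    return q, scales, zeros

def dequantize(q, scales, zeros, group_size):
    """w_hat[k] = (q[k] - zero[g]) * scale[g], where g = k // group_size."""
    return [(qk - zeros[k // group_size]) * scales[k // group_size]
            for k, qk in enumerate(q)]

def mean_abs_error(weights, group_size):
    """Quantize, dequantize, and report the mean absolute reconstruction error."""
    q, scales, zeros = quantize_groups(weights, group_size)
    w_hat = dequantize(q, scales, zeros, group_size)
    return sum(abs(a - b) for a, b in zip(weights, w_hat)) / len(weights)

# A made-up row whose two halves have very different dynamic ranges.
row = [0.01 * (i - 4) for i in range(8)] + [0.5 * (i - 4) for i in range(8)]

row_wise_err = mean_abs_error(row, group_size=len(row))  # one scale for the whole row
block_wise_err = mean_abs_error(row, group_size=8)       # one scale + zero-point per 8 weights
print(row_wise_err, block_wise_err)
```

With a single scale for the whole row, the small-magnitude half collapses onto one quantization level, while per-group scales and zero-points keep it resolved. That is the kind of precision gap this request is about.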

Thanks~

mnicely · 2025-04-24T15:40:48Z

Hi @mengsoso, this feature is not on our roadmap, but we welcome community contributions!

mengsoso added the "? - Needs Triage" and "feature request" labels on Apr 24, 2025
