Commit bea4890

Tianyu Liang authored and committed

Enabling stacked NVFP4 quantization kernels (#4458)

Summary:
Pull Request resolved: #4458

X-link: facebookresearch/FBGEMM#1518

Enable the end-to-end (E2E) path under a modified setting where the input x is already concatenated and only m_sizes is available, on the GPU. The new design is compatible with CUDA graph usage at the cost of a few extra kernels.

Reviewed By: q10, jiawenliu64

Differential Revision: D77420779

fbshipit-source-id: 891ab3f99356a279b608c494bb9eac7a8745f084
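In the setting described above, the kernels receive one concatenated x plus m_sizes, so per-group row boundaries have to be derived from m_sizes rather than from separately shaped tensors. The following is a minimal CPU-only sketch of that indexing in plain Python. It is not the FBGEMM kernel code: the helper names `group_offsets` and `row_to_group` and the example sizes are hypothetical, and the assumption that the kernels use a prefix sum over m_sizes is an illustration, not something the commit message states.

```python
from bisect import bisect_right
from itertools import accumulate


def group_offsets(m_sizes):
    """Return the start row of each group in the stacked input.

    The result has len(m_sizes) + 1 entries; the last one is the total row count.
    """
    return [0] + list(accumulate(m_sizes))


def row_to_group(row, offsets):
    """Return the index g such that offsets[g] <= row < offsets[g + 1]."""
    # bisect_right steps past repeated offsets, so empty groups are skipped.
    return bisect_right(offsets, row) - 1


# Hypothetical example: three groups with 2, 0, and 3 rows, stacked into 5 rows.
m_sizes = [2, 0, 3]
offsets = group_offsets(m_sizes)
groups = [row_to_group(r, offsets) for r in range(offsets[-1])]

print(offsets)
print(groups)
```

On a GPU the same two steps would presumably run as device-side operations on m_sizes, so the host never has to read the sizes back to split x. Avoiding that read-back is what one would expect to matter for CUDA graph capture, which is consistent with the "few extra kernels" trade-off mentioned in the summary.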

1 parent c732774, commit bea4890

5 files changed, +540 -56 lines changed

fbgemm_gpu/experimental
- gemm/triton_gemm
  - fp4_quantize.py
- gen_ai
  - bench
    - quantize_ops.py
  - src/quantize
    - cutlass_extensions
      - f4f4bf16_grouped.cu
    - quantize.cpp
    - quantize_defs.cpp
