
Commit bea4890

Tianyu Liang authored and facebook-github-bot committed
Enabling stacked NVFP4 quantization kernels (#4458)
Summary:
Pull Request resolved: #4458
X-link: facebookresearch/FBGEMM#1518

Enable end-to-end (E2E) use in a modified setting: the input x is already concatenated, and the only size information available is m_sizes, which resides on the GPU. The new design is compatible with CUDA graph usage at the cost of a few extra kernels.

Reviewed By: q10, jiawenliu64
Differential Revision: D77420779
fbshipit-source-id: 891ab3f99356a279b608c494bb9eac7a8745f084
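The following is a minimal pure-Python sketch of what "x is already concatenated and only m_sizes is available" implies for a stacked quantizer. It assumes the stacked layout places groups (for example, experts) back to back along the rows of x, with m_sizes[g] giving the row count of group g.

The sketch shows two things:
- how per-group row offsets and a row-to-group mapping can be derived from m_sizes alone;
- how a per-group NVFP4 global scale could then be computed. It uses one common convention, 448 * 6 / amax, where 448 and 6 are the FP8 E4M3 and FP4 E2M1 maxima.

The helper names (group_offsets, row_to_group, per_group_global_scales) are hypothetical and are not FBGEMM APIs. This is not the commit's implementation.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List

# Largest finite magnitudes of the NVFP4 element and block-scale formats.
FP4_E2M1_MAX = 6.0
FP8_E4M3_MAX = 448.0


def group_offsets(m_sizes: List[int]) -> List[int]:
    """Hypothetical helper: exclusive prefix sum of m_sizes.

    offsets[g] is the first row of group g inside the concatenated x,
    and offsets[-1] is the total number of rows.
    """
    return [0] + list(accumulate(m_sizes))


def row_to_group(row: int, offsets: List[int]) -> int:
    """Hypothetical helper: find g such that offsets[g] <= row < offsets[g + 1].

    bisect_right skips over repeated offsets, so empty groups
    (m_sizes[g] == 0) are never returned.
    """
    return bisect_right(offsets, row) - 1


def per_group_global_scales(x: List[List[float]], m_sizes: List[int]) -> List[float]:
    """Hypothetical helper: one NVFP4-style global scale per group.

    Assumes the convention scale = (FP8_E4M3_MAX * FP4_E2M1_MAX) / amax.
    Empty or all-zero groups get a placeholder scale of 1.0.
    """
    offsets = group_offsets(m_sizes)
    scales = []
    for g in range(len(m_sizes)):
        rows = x[offsets[g]:offsets[g + 1]]
        amax = max((abs(v) for r in rows for v in r), default=0.0)
        scales.append(FP8_E4M3_MAX * FP4_E2M1_MAX / amax if amax > 0 else 1.0)
    return scales


if __name__ == "__main__":
    m_sizes = [2, 0, 3]  # group 1 is empty
    x = [[1.0, -2.0], [0.5, 0.5], [3.0, 0.0], [-6.0, 1.0], [0.0, 0.0]]
    offsets = group_offsets(m_sizes)
    print(offsets)
    print([row_to_group(r, offsets) for r in range(offsets[-1])])
    print(per_group_global_scales(x, m_sizes))
```

On a GPU, the prefix sum over m_sizes would itself be a small device-side kernel. That avoids copying m_sizes to the host, and a host sync of that kind would prevent CUDA graph capture. This is one plausible reading of "compatible with CUDA graph usage at the cost of a few extra kernels", offered as an assumption rather than a description of the commit.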
1 parent c732774 commit bea4890


5 files changed (+540, -56 lines)

