[NVFP4][WIP] Add NVFP4 Support #287

dsikka · 2025-03-31T21:33:37Z

Summary

Introduce FloatArgs classes to manage float args data (min/max/rounding table)
Introduce new global_scale parameter
Update calculate_qparams, quant, dequant to handle global scale operations
Add NVFP4 QuantScheme to match the expected group_size
Add ModelOptQuantizer to compress models using the expected format

For LLM Compressor, requires: vllm-project/llm-compressor#1309
For vLLM Emulation Runs: neuralmagic/vllm#59
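For readers unfamiliar with the two-level scaling that the summary refers to, here is a minimal pure-Python sketch of quantizing and dequantizing with a per-group scale plus a tensor-wide global scale. It is not the code in this PR. `FloatArgsSketch`, `quantize`, `dequantize`, and the other helper names are hypothetical, and the group size of 16, the FP4 E2M1 value table, and the `(448 * 6) / amax` global scale formula are assumptions about the NVFP4 format rather than details stated in this thread. The per-group scale is kept as a plain Python float here; a real format would store it in FP8 E4M3.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FloatArgsSketch:
    """Hypothetical stand-in for a float-format descriptor (not the PR's FloatArgs)."""
    exponent: int
    mantissa: int
    bits: int
    max: float
    min: float


# Assumed format limits: FP4 E2M1 tops out at 6.0, FP8 E4M3 at 448.0.
FP4_E2M1 = FloatArgsSketch(exponent=2, mantissa=1, bits=4, max=6.0, min=-6.0)
FP8_E4M3 = FloatArgsSketch(exponent=4, mantissa=3, bits=8, max=448.0, min=-448.0)

# Assumed table of non-negative magnitudes representable in FP4 E2M1.
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_fp4(x: float) -> float:
    """Clamp to the FP4 range and snap to the nearest table entry.

    Ties resolve to the smaller magnitude; real kernels may round differently.
    """
    mag = min(abs(x), FP4_E2M1.max)
    nearest = min(FP4_MAGNITUDES, key=lambda v: abs(v - mag))
    return -nearest if x < 0 else nearest


def global_scale_for(values: Sequence[float]) -> float:
    """Assumed formula: map the tensor-wide amax onto the FP8 * FP4 range."""
    amax = max(abs(v) for v in values)
    return (FP8_E4M3.max * FP4_E2M1.max) / amax if amax > 0 else 1.0


def quantize_group(group: Sequence[float], global_scale: float) -> Tuple[List[float], float]:
    """Quantize one group; returns FP4 values and the stored per-group scale."""
    amax = max(abs(v) for v in group)
    local_scale = min(global_scale * amax / FP4_E2M1.max, FP8_E4M3.max)
    if local_scale == 0.0:
        return [0.0] * len(group), 0.0
    effective = local_scale / global_scale  # scale actually applied to the data
    return [round_to_fp4(v / effective) for v in group], local_scale


def quantize(values: Sequence[float], group_size: int = 16):
    """Quantize a flat list group by group with one shared global scale."""
    gs = global_scale_for(values)
    codes: List[float] = []
    scales: List[float] = []
    for start in range(0, len(values), group_size):
        q, s = quantize_group(values[start:start + group_size], gs)
        codes.extend(q)
        scales.append(s)
    return codes, scales, gs


def dequantize(codes: Sequence[float], scales: Sequence[float], gs: float,
               group_size: int = 16) -> List[float]:
    """Invert quantize(): value = fp4 * local_scale / global_scale."""
    out: List[float] = []
    for i, s in enumerate(scales):
        group = codes[i * group_size:(i + 1) * group_size]
        out.extend(c * s / gs for c in group)
    return out
```

The point of the global scale in this sketch is that it keeps every per-group scale inside the assumed FP8 range, so small-magnitude groups still get a usable scale.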

brian-dellabetta · 2025-04-01T16:20:41Z

src/compressed_tensors/quantization/quant_args.py

+FP8_E4M3_DATA = FloatArgs(
+    exponent=4,
+    mantissa=3,
+    bits=8,


I think we usually use num_bits, right? I changed AWQ Modifier to it based on what I saw elsewhere in llm-compressor.

Suggested change

-    bits=8,
+    num_bits=8,

This PR isn’t ready for review lol. It’s set to Draft


brian-dellabetta reviewed Apr 1, 2025


dsikka added 10 commits April 1, 2025 18:49

initial commit

d22a137

update

be02849

update

974953c

update

79437ef

update quant/dequant steps; update scale calculation step

36204f0

update NVFP4 data type; add scheme

d49830d

update datatype/look-up table

9254821

fix param name

1727508

update

eec7bd3

swap operations

b11b96a

dsikka force-pushed the nvfp4 branch from c24756c to b11b96a Compare April 1, 2025 18:49

dsikka added 2 commits April 1, 2025 19:23

fix typo

e8c6c8f

fix condition

be30822

dsikka requested a review from anmarques April 2, 2025 20:05

Merge branch 'main' into nvfp4

271a936

mgoin self-requested a review April 24, 2025 19:18

dsikka added 3 commits April 24, 2025 19:58

fix condition

682c110

per tensor input scales are never good???

35d98d5

remove scheme

107bd93

dsikka mentioned this pull request May 1, 2025

NVFP4 Emulation neuralmagic/vllm#59

Closed

dsikka and others added 6 commits May 5, 2025 10:47

[WIP][NVFP4] Add compression/decompression code (#291)

daca970

* add nvfp4 packing
* add model_opt compressor
* update script
* update compress/decompress methods
* update
* update
* update

remove script, add tests

7fbf300

Optimize pack_fp4_to_uint8 for fp4 (#309)

5544ef4

Signed-off-by: mgoin <michael@neuralmagic.com>

fix pack dtype, update test, clean-up

ddb41ab

update compressor

a95520d

update global scale calculation

7435b3f
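Several of the commits above deal with packing FP4 values into uint8. As a rough illustration only, the sketch below packs two 4-bit codes into each byte and unpacks them again. The helper names are hypothetical and this is not the PR's `pack_fp4_to_uint8`. The nibble order (first value in the low nibble) and the code layout (sign in bit 3, a 3-bit index into the E2M1 magnitude table in bits 0 to 2) are assumptions.

```python
import math
from typing import List, Sequence

# Assumed FP4 E2M1 magnitudes, indexed by the low 3 bits of the 4-bit code.
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def fp4_to_code(value: float) -> int:
    """Map an already-quantized FP4 value to its 4-bit code.

    Raises ValueError if the value is not in the magnitude table.
    """
    sign = 0x8 if math.copysign(1.0, value) < 0 else 0x0
    return sign | FP4_MAGNITUDES.index(abs(value))


def code_to_fp4(code: int) -> float:
    """Inverse of fp4_to_code for a 4-bit code."""
    mag = FP4_MAGNITUDES[code & 0x7]
    return -mag if code & 0x8 else mag


def pack_fp4_pairs(values: Sequence[float]) -> bytes:
    """Pack an even-length sequence of FP4 values, two per byte (low nibble first)."""
    if len(values) % 2 != 0:
        raise ValueError("need an even number of values to pack two per byte")
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        out.append(fp4_to_code(lo) | (fp4_to_code(hi) << 4))
    return bytes(out)


def unpack_fp4_pairs(packed: bytes) -> List[float]:
    """Recover the FP4 values from bytes produced by pack_fp4_pairs."""
    values: List[float] = []
    for byte in packed:
        values.append(code_to_fp4(byte & 0x0F))
        values.append(code_to_fp4(byte >> 4))
    return values
```

Packing this way halves the storage for the quantized values compared with one byte per element; a tensor-library version would typically do the same bit operations vectorized.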

dsikka closed this Jun 12, 2025
