feat: enable FP8 quantized models loading #316

Open
rafvasq wants to merge 7 commits into main

Conversation

rafvasq (Collaborator) commented on Jul 16, 2025

Description

  • "Enable" FP8 quantized models to load via fms_mo
  • Updates example to include quantization flag
  • Adds quantization flag (unused for now) in generate_spyre_vllm_output
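
As a usage sketch (not code from this PR), loading an fms_mo FP8 checkpoint through vLLM's offline Python API would presumably look like the following; the model path is a placeholder and the exact flag handling on Spyre may differ:

from vllm import LLM, SamplingParams

# Hypothetical local path to an FP8 checkpoint produced with fms_mo (placeholder).
llm = LLM(
    model="/models/granite-8b-fp8",
    quantization="fp8",  # must be listed in SpyrePlatform.supported_quantization
    max_model_len=2048,
)

outputs = llm.generate(["Hello from Spyre"], SamplingParams(max_tokens=20))
print(outputs[0].outputs[0].text)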

Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: make sure that your code passes all of the linting checks, otherwise your PR cannot be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

rafvasq changed the title from "[DRAFT] Quantized model testing" to "[WIP] Quantized model testing" on Jul 16, 2025
rafvasq added 3 commits on July 22, 2025
rafvasq changed the title from "[WIP] Quantized model testing" to "feat: enable FP8 quantized models loading" on Jul 22, 2025
rafvasq marked this pull request as ready for review on July 22, 2025
@@ -40,7 +40,7 @@ class SpyrePlatform(Platform):
     # "spyre" device_name no longer worked due to https://github.com/vllm-project/vllm/pull/16464
     device_name: str = "cpu"
     device_type: str = "cpu"
-    supported_quantization: list[str] = ["gptq"]
+    supported_quantization: list[str] = ["gptq", "fp8", "compressed-tensors"]
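
For context, supported_quantization is the per-platform allow-list that vLLM validates a requested quantization method against. A minimal illustrative check of that kind (not code from this PR, and not vLLM's actual implementation):

from vllm.platforms import current_platform

# current_platform resolves to SpyrePlatform when the Spyre plugin is active.
requested = "fp8"
if requested not in current_platform.supported_quantization:
    raise ValueError(
        f"{requested} quantization is not supported on {current_platform.device_name}")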

Collaborator:

I don't see a mention of compressed-tensors anywhere else in this PR.

Collaborator:

Maybe worth a comment linking to https://github.com/foundation-model-stack/fms-model-optimizer/pull/154/files#diff-1fb88c10872b0f03f1d0f6a00cd20328cdbb5e0e8bc53aa623be49b9ee3efe57R4-R9

It sounds like fp8 support in fms-mo is focused on compressed-tensors.
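
If fms-mo FP8 checkpoints do ship a compressed-tensors style quantization_config, one quick way to confirm what a given checkpoint declares is to inspect its config.json; a small sketch (checkpoint path is a placeholder):

import json

# Placeholder path to an fms-mo produced FP8 checkpoint.
with open("/models/granite-8b-fp8/config.json") as f:
    cfg = json.load(f)

# HF-style checkpoints record the method under quantization_config["quant_method"],
# e.g. "compressed-tensors" or "fp8", which vLLM can use to pick the loading path.
print(cfg.get("quantization_config", {}).get("quant_method"))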

rafvasq requested a review from maxdebayser on July 23, 2025
3 participants