feat: enable FP8 quantized models loading #316
base: main
Conversation
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
👋 Hi! Thank you for contributing to vLLM support on Spyre.
Or this can be done with
Now you are good to go 🚀
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
@@ -40,7 +40,7 @@ class SpyrePlatform(Platform):
    # "spyre" device_name no longer worked due to https://github.com/vllm-project/vllm/pull/16464
    device_name: str = "cpu"
    device_type: str = "cpu"
-   supported_quantization: list[str] = ["gptq"]
+   supported_quantization: list[str] = ["gptq", "fp8", "compressed-tensors"]
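To illustrate what this one-line diff enables: vLLM platforms advertise the quantization schemes they accept via a class-level `supported_quantization` list, and model loading is rejected when the checkpoint's scheme is not in that list. The sketch below is a simplified stand-in (the `verify_quantization` helper is an illustrative assumption, not the actual vLLM code path):

```python
# Minimal sketch of platform-level quantization gating, assuming a
# validation helper similar in spirit to vLLM's platform checks.
# Class layout mirrors the diff above; verify_quantization is hypothetical.

class SpyrePlatform:
    device_name: str = "cpu"
    device_type: str = "cpu"
    # After this PR, FP8 and compressed-tensors checkpoints are accepted.
    supported_quantization: list[str] = ["gptq", "fp8", "compressed-tensors"]

    @classmethod
    def verify_quantization(cls, method: str) -> None:
        # Reject any quantization scheme this platform does not support.
        if method not in cls.supported_quantization:
            raise ValueError(
                f"{method!r} quantization is not supported on this platform. "
                f"Supported methods: {cls.supported_quantization}")


SpyrePlatform.verify_quantization("fp8")   # accepted after this change
```

Before this PR, the same `verify_quantization("fp8")` call would have raised, since only `"gptq"` was listed.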
I don't see a mention of `compressed-tensors` anywhere else in this PR.
Maybe worth a comment linking to https://github.com/foundation-model-stack/fms-model-optimizer/pull/154/files#diff-1fb88c10872b0f03f1d0f6a00cd20328cdbb5e0e8bc53aa623be49b9ee3efe57R4-R9 — it sounds like FP8 support in fms-mo is focused on compressed-tensors.
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Description
fms_mo
generate_spyre_vllm_output