Enabled BnB NF4 inference on Gaudi #1457

rsshaik1 · 2025-06-19T09:57:14Z

This PR adds inference tests for bitsandbytes NF4 quantization on Gaudi, supporting both pre-quantized models and models that are quantized at runtime.
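For readers unfamiliar with NF4, the following is a minimal pure-Python sketch of blockwise NF4 quantization. It is not the bitsandbytes kernel or the code in this PR. The codebook values are the 16 NF4 levels published with QLoRA. The helper names `quantize_block` and `dequantize_block` are hypothetical, and nearest-value rounding is an assumed simplification.

```python
# The 16 NF4 levels (normal-distribution quantiles rescaled to [-1, 1]).
NF4_CODEBOOK = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]


def quantize_block(values):
    """Quantize one block of floats; returns (absmax scale, 4-bit code indices)."""
    # Scale by the block's largest magnitude; fall back to 1.0 for an all-zero block.
    absmax = max(abs(v) for v in values) or 1.0
    indices = []
    for v in values:
        x = v / absmax
        # Hypothetical simplification: pick the nearest codebook entry.
        indices.append(min(range(16), key=lambda i: abs(NF4_CODEBOOK[i] - x)))
    return absmax, indices


def dequantize_block(absmax, indices):
    """Map 4-bit code indices back to approximate float values."""
    return [NF4_CODEBOOK[i] * absmax for i in indices]


scale, codes = quantize_block([0.5, -1.0, 0.0, 0.25])
restored = dequantize_block(scale, codes)
```

A pre-quantized checkpoint already stores such codes and scales on disk, whereas runtime quantization computes them from full-precision weights while the model loads.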

vllm/model_executor/model_loader/loader.py

vllm/model_executor/layers/quantization/bitsandbytes.py

vllm/config.py

tests/quantization/test_bitsandbytes.py

vivekgoe

Can we also modify the CI file to add the two tests that are passing?

vivekgoe · 2025-06-23T07:54:49Z

tests/quantization/test_bitsandbytes_hpu.py

+models_pre_quant_4bit_to_test = [("hugging-quants/Meta-Llama-3.1-8B-BNB-NF4",
+                                  "read_pre-quantized_4-bit_NF4_opt_model")]
+
+models_pre_quant_8bit_to_test = [


Since we are not testing 8-bit quantization support for now, we can remove these lines.

vivekgoe · 2025-06-23T07:56:34Z

tests/quantization/test_bitsandbytes_hpu.py

+                             model_name, True)
+
+
+@pytest.mark.skipif(torch.cuda.device_count() < 2,


Does this work for Gaudi?
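To illustrate the concern: on a host without CUDA, `torch.cuda.device_count()` typically returns 0, so a guard written this way would skip the test unconditionally on Gaudi. Below is a minimal sketch of a device-agnostic guard. It is not the fix adopted in this PR. The helper names `accelerator_count` and `should_skip_multi_device` are hypothetical, and the sketch assumes the Habana PyTorch bridge exposes `is_available()` and `device_count()` under `habana_frameworks.torch.hpu`.

```python
import importlib


def accelerator_count() -> int:
    """Count visible HPU devices if the Habana stack is importable, else CUDA devices."""
    try:
        # Assumption: the Habana bridge provides is_available() and device_count().
        hpu = importlib.import_module("habana_frameworks.torch.hpu")
        if hpu.is_available():
            return int(hpu.device_count())
    except ImportError:
        pass
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # Neither stack is installed, so report no accelerators.
        return 0
    return int(torch.cuda.device_count())


def should_skip_multi_device(min_devices: int = 2, count=None) -> bool:
    """Return True when fewer than min_devices accelerators are visible.

    `count` overrides the detected device count, which keeps the helper testable.
    """
    n = accelerator_count() if count is None else count
    return n < min_devices
```

A test could then be decorated with `@pytest.mark.skipif(should_skip_multi_device(2), reason="needs at least 2 accelerators")` in place of the CUDA-only check.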

vivekgoe · 2025-06-23T07:56:50Z

tests/quantization/test_bitsandbytes_hpu.py

+    )
+
+
+@pytest.mark.skipif(torch.cuda.device_count() < 2,


Does this work for Gaudi?

rsshaik1 force-pushed the tests_bnb branch from 7b45f27 to 319d258 Compare June 19, 2025 10:02

vivekgoe requested changes Jun 19, 2025

rsshaik1 force-pushed the tests_bnb branch 2 times, most recently from 2cc1fdb to aac9d3f Compare June 23, 2025 05:19

rsshaik1 changed the title ~~supports bnb 4-bit quantization for llama~~ Enabled BnB NF4 inference on Gaudi Jun 23, 2025

rsshaik1 force-pushed the tests_bnb branch from aac9d3f to f12557a Compare June 23, 2025 06:33

vivekgoe requested changes Jun 23, 2025

rsshaik1 force-pushed the tests_bnb branch from f12557a to 7d71049 Compare June 23, 2025 08:47

Gaudi support to bnb NF4 inference tests

df66153

rsshaik1 force-pushed the tests_bnb branch from 7d71049 to df66153 Compare June 23, 2025 08:49

Added bnb NF4 tests to CI

46612f4
