Is quantization performed locally? #6583
-
Hello,

```python
pn = PromptNode(
    model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
    model_kwargs={"stream": True, "device": None, "load_in_4bit": True, "device_map": "auto"},
    max_length=256,
)
```

I have a question regarding the `load_in_4bit` flag. When it is enabled, are the model files first downloaded to the local machine and then quantized locally, or is a pre-quantized model configuration downloaded from Hugging Face automatically? Thank you in advance for any clarification.
-
Hello, @ciliamadani...

`load_in_4bit` is a flag that makes the Hugging Face model run with 4-bit quantization via the bitsandbytes library. The model is downloaded at its full size and then quantized locally when it is loaded into memory, so yes, the quantization itself happens on your machine; no pre-quantized checkpoint is fetched from Hugging Face.
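To make the mechanics concrete, here is a minimal sketch of what that flag roughly translates to when using `transformers` directly (an illustration of the general mechanism, not Haystack's exact internals; it assumes `transformers`, `accelerate`, and `bitsandbytes` are installed):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize the weights to 4 bits at load time
    bnb_4bit_compute_dtype=torch.float16,  # run the matmuls in fp16
)

# The full-precision checkpoint is downloaded (or read from the local cache),
# and each weight tensor is quantized to 4 bits as it is placed on the device.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=quant_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
```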
There are other options that download and run an already pre-quantized model (not all of them are supported in Haystack). For a complete overview, read this good blog post.
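By contrast, a pre-quantized checkpoint ships with the quantized weights, so only the small quantized files are downloaded and no local quantization pass runs. As a hedged sketch, loading a community GPTQ conversion with `transformers` might look like this (the repo id below is an example, not something from this thread; verify it on the Hub, and note that GPTQ support requires the optional `optimum` and `auto-gptq` packages):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The quantization config is read from the repo itself, so the weights
# arrive already quantized -- no local quantization step takes place.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-Chat-GPTQ",  # example community conversion; check the Hub
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("TheBloke/Llama-2-7B-Chat-GPTQ")
```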