
Is quantization performed locally? #6583

Answered by anakin87
ciliamadani asked this question in Questions


Hello, @ciliamadani...

load_in_4bit is a flag that makes a Hugging Face model run with 4-bit quantization via the bitsandbytes library. The model is downloaded at its full size and then loaded and executed in quantized form.

There are other options that download and run an already-quantized model. (Not all of them are supported in Haystack.)

For a complete overview, read this good blog post.
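As a rough sketch, the flag typically reaches transformers through a quantization config like the one below. The model name is purely illustrative, and this assumes transformers, accelerate, and bitsandbytes are installed and a CUDA GPU is available:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative checkpoint; any causal-LM model id works the same way.
model_name = "mistralai/Mistral-7B-v0.1"

# The full-size weights are downloaded first, then quantized to 4-bit
# on load by bitsandbytes (this step requires a CUDA GPU).
quant_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quant_config,
    device_map="auto",
)
```

Note the distinction from pre-quantized formats (e.g. GPTQ or GGUF checkpoints), where the download itself is already small; with load_in_4bit only the in-memory footprint shrinks, not the download.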

Answer selected by ciliamadani