Replies: 2 comments
-
Hi! Any update on this error? I replicated it and got the same error: `ValueError: .half() is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been casted to the correct dtype.`
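For reference, a minimal sketch of the kind of setup that reproduces it (an assumed setup with a placeholder model id, not the original poster's code):

```python
# Assumed repro sketch: load a model in 4-bit with bitsandbytes, then
# request an fp16 cast. transformers blocks .half()/.float() on
# quantized models, which raises the ValueError quoted above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",  # placeholder model id
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

model.half()  # raises the ValueError quoted above
```

DeepSpeed's inference wrapper presumably performs the same cast internally when asked for `torch.half`, which would be how the error surfaces in this thread.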
-
@ocesp98, @rubenCrayon can you please try the updated zero-inference with 4-bit quantization?
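A hedged sketch of what that path looks like, using DeepSpeed's native 4-bit weight quantization in place of bitsandbytes. The config keys follow the DeepSpeedExamples zero_inference README and should be checked against the installed DeepSpeed version; the model id is a placeholder:

```python
# Hedged sketch: ZeRO-Inference (stage 3 + CPU parameter offload) with
# DeepSpeed's native 4-bit weight quantization. Config keys follow the
# DeepSpeedExamples zero_inference README; verify against your version.
import deepspeed
from transformers import AutoModelForCausalLM
from transformers.integrations import HfDeepSpeedConfig

ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "weight_quantization": {
        "quantized_initialization": {
            "num_bits": 4,
            "group_size": 64,
            "group_dim": 1,
            "symmetric": False,
        }
    },
    "train_micro_batch_size_per_gpu": 1,
}

dschf = HfDeepSpeedConfig(ds_config)  # must be created before from_pretrained
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")  # placeholder id
engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()
```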
-
Is it possible to use DeepSpeed inference with a 4/8-bit quantized model using bitsandbytes?
I use the bitsandbytes package like this:
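(The snippet itself was not preserved on this page; the following is an assumed reconstruction of a typical 4-bit bitsandbytes load plus the DeepSpeed wrap, with a placeholder model id and typical parameters.)

```python
# Assumed reconstruction, not the poster's exact code: model id and
# quantization parameters are placeholders.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b",  # placeholder model id
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_quant_type="nf4",
    ),
    device_map="auto",
)

# Wrapping with DeepSpeed inference and asking for fp16 is presumably
# where the cast (and hence the error below) comes from.
engine = deepspeed.init_inference(model, dtype=torch.half)
```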
However, it throws an error: `ValueError: .half() is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been casted to the correct dtype.`
The ultimate goal is to combine quantization with DeepSpeed ZeRO-Infinity offload, in the hope of running a larger model that currently does not fit on my GPU.
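For context, a hedged sketch of the ZeRO-Infinity side of that goal: a stage-3 config with parameters offloaded to NVMe. The `nvme_path` and `aio` tuning values are placeholders to adapt, not settings from this thread:

```python
# Hedged sketch of a ZeRO-Infinity inference config: ZeRO stage 3 with
# parameters offloaded to NVMe. nvme_path and aio values are placeholders.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {
            "device": "nvme",
            "nvme_path": "/local_nvme",  # placeholder path
            "pin_memory": True,
        },
    },
    "aio": {
        "block_size": 1048576,
        "queue_depth": 8,
        "thread_count": 1,
        "single_submit": False,
        "overlap_events": True,
    },
    "train_micro_batch_size_per_gpu": 1,
}
```

Whether this composes with a bitsandbytes-quantized model is exactly the open question here; the reply above points to DeepSpeed's own 4-bit weight quantization as the suggested route.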