"torch.bfloat16 is not supported for quantization method awq. Supported dtypes: [torch.float16]" error even after trying dtypr=auto/half/float16/bfloat16 #9116
lavishasharma announced in Q&A
from vllm import LLM, SamplingParams

# This line raises: "torch.bfloat16 is not supported for quantization method awq.
# Supported dtypes: [torch.float16]"
llm = LLM(model="TheBloke/Llama-2-7b-Chat-AWQ", device="cpu", dtype="half", quantization="AWQ")
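For reference, here is a minimal sketch of the other dtype settings mentioned in the title (auto/half/float16/bfloat16), using the same model, CPU device, and AWQ quantization as above; each attempt ends with the same dtype error on my setup:

# Sketch of the dtype values tried; every run still reports the AWQ dtype error.
from vllm import LLM

for dtype in ("auto", "half", "float16", "bfloat16"):
    llm = LLM(
        model="TheBloke/Llama-2-7b-Chat-AWQ",
        device="cpu",
        dtype=dtype,
        quantization="AWQ",
    )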
I am also hitting a CUDA error, "no NVIDIA drivers found". I would really like to resolve these two errors; I have tried everything I could think of. Please help.