Replies: 2 comments 9 replies
-
See: https://github.com/vllm-project/vllm/blob/main/examples/lora_with_quantization_inference.py
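For reference, here is a minimal offline-inference sketch of the pattern that example demonstrates: a quantized base model with a LoRA adapter attached per request. The model name and adapter path below are placeholders, not values taken from the linked example.

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Quantized base model (AWQ here) with LoRA support enabled.
# The model name and adapter path are placeholders.
llm = LLM(
    model="TheBloke/Llama-2-7B-AWQ",
    quantization="awq",
    enable_lora=True,
    max_lora_rank=64,
)

sampling = SamplingParams(temperature=0.0, max_tokens=64)

# The adapter is attached per request via LoRARequest(name, id, path).
outputs = llm.generate(
    ["Explain LoRA fine-tuning in one sentence."],
    sampling,
    lora_request=LoRARequest("my_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```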
-
@tarukumar Feel free to open an issue at https://github.com/bd-iaas-us/vllm/issues and assign it to me. In the issue, please let us know why the existing Quantization + LoRA solution in vLLM does not suffice, as well as some models that need this feature. Thanks!
-
What I have observed is that when I try to deploy the model using qlora_adapter_name_or_path for a QLoRA adapter, deployment fails with the error raised at https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L899-L911. So, to deploy a QLoRA adapter, should I use the --lora-modules or the adapter-cache parameter? What is the best approach here?
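For context: one approach that sidesteps qlora_adapter_name_or_path is to load the base model with bitsandbytes quantization and attach the QLoRA adapter as a regular LoRA adapter per request, which is the offline equivalent of serving with --enable-lora and --lora-modules name=path. A minimal sketch, assuming the adapter is a standard PEFT-style LoRA checkpoint and that your vLLM version supports LoRA together with bitsandbytes; the base model name and adapter path are placeholders:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Load the base model 4-bit-quantized with bitsandbytes (QLoRA-style),
# then pass the adapter at request time instead of via
# qlora_adapter_name_or_path. Model name and adapter path are placeholders.
llm = LLM(
    model="huggyllama/llama-7b",
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    enable_lora=True,
)

sampling = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(
    ["Summarize QLoRA in one sentence."],
    sampling,
    lora_request=LoRARequest("qlora_adapter", 1, "/path/to/qlora_adapter"),
)
print(outputs[0].outputs[0].text)
```

The same configuration should map onto the OpenAI-compatible server as --quantization bitsandbytes --enable-lora --lora-modules qlora_adapter=/path/to/qlora_adapter, again assuming LoRA with bitsandbytes is supported in your vLLM version.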