How to use TGI with a fine-tuned LoRA adapter for Llama-3.3-70B trained with Unsloth? #3278
InderjeetVishnoi started this conversation in General
Hi,
I've fine-tuned a LoRA adapter on Llama-3.3-70B with Unsloth for my task, and I'm exploring TGI for optimized inference. From what I understand, TGI doesn't natively load an adapter directly at inference time; is there a workaround for this?
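For context, here is a sketch of what I was hoping would work. I understand recent TGI releases added multi-LoRA serving, where adapters are preloaded at launch and selected per request, but I'm not certain of the exact names, so everything below (the `--lora-adapters` flag, the endpoint URL, and the `my-org/my-llama-70b-lora` adapter id) is an assumption on my part:

```python
# Sketch only: query a TGI server that was (hypothetically) launched with
# the adapter preloaded, e.g.:
#   text-generation-launcher \
#       --model-id meta-llama/Llama-3.3-70B-Instruct \
#       --lora-adapters my-org/my-llama-70b-lora   # flag name assumed
import requests

TGI_URL = "http://localhost:8080/generate"  # assumed local TGI endpoint

payload = {
    "inputs": "Summarize the following support ticket: ...",
    "parameters": {
        "max_new_tokens": 128,
        # Route this request through the fine-tuned adapter rather than
        # the bare base model; "adapter_id" is how I understand TGI's
        # multi-LoRA feature selects an adapter per request.
        "adapter_id": "my-org/my-llama-70b-lora",
    },
}

response = requests.post(TGI_URL, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["generated_text"])
```

If per-request adapter selection like this works for Llama-3.3-70B, it would let me keep a single copy of the base weights on the GPUs, which is the whole point of avoiding a merge.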
Or do I need to merge the adapter into the base model before serving it via TGI? I've been trying to avoid that due to compute constraints.
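If merging turns out to be the only option, this is the PEFT-based route I'd expect to use; the model and adapter names are placeholders for my own artifacts, and materializing the merged 70B weights in 16-bit is exactly the compute and storage cost I'd like to avoid:

```python
# Fallback: fold the LoRA deltas into the base weights offline, then
# serve the merged checkpoint with TGI as a plain model.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-3.3-70B-Instruct"   # placeholder base model
ADAPTER = "my-org/my-llama-70b-lora"         # placeholder adapter repo

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, ADAPTER)

# merge_and_unload() bakes the low-rank updates into the base weights and
# returns a vanilla transformers model with no PEFT wrappers left.
merged = model.merge_and_unload()
merged.save_pretrained("llama-3.3-70b-merged")
AutoTokenizer.from_pretrained(BASE).save_pretrained("llama-3.3-70b-merged")
```

I believe Unsloth also exposes a one-step `model.save_pretrained_merged(...)` helper that does roughly the same thing, though I haven't verified it at 70B scale.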
Any guidance or suggestions would be greatly appreciated.