Replies: 1 comment 1 reply
- Are you using the same LoRA in both cases?
-
First of all, exllama v2 is a really great module.
But there is one problem.
In exllama v1, using a LoRA caused only a slight slowdown, roughly 10%.
In exllama v2, it is good that LoRA is supported, but with a LoRA applied, token generation slows down sharply.
(13B on a 4090: without LoRA ~80 tokens/s => with LoRA ~30 tokens/s)
Does anyone know how to solve this?
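For anyone trying to reproduce the comparison above, a minimal timing harness like the following can measure tokens/s for both the with-LoRA and without-LoRA runs under identical conditions. The `generate` callable here is a hypothetical stand-in for whatever generation call you use; only the timing logic is shown.

```python
import time

def measure_tokens_per_second(generate, prompt, max_new_tokens=128):
    """Time a single generation call and return throughput in tokens/s.

    `generate` is assumed to take (prompt, max_new_tokens) and return
    the number of tokens it actually produced.
    """
    start = time.perf_counter()
    n_generated = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    return n_generated / elapsed

# Dummy generator standing in for a real model call, so the harness runs:
def dummy_generate(prompt, max_new_tokens):
    time.sleep(0.01)  # placeholder for real decoding work
    return max_new_tokens

rate = measure_tokens_per_second(dummy_generate, "Hello", max_new_tokens=64)
print(f"{rate:.0f} tokens/s")
```

Running the same harness twice, once with the LoRA attached and once without, on the same prompt and token budget, rules out differences in prompt length or sampling settings as the cause of the gap.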