Something wrong when I try to use speculative decoding in llama.cpp #9228
Unanswered
bulaikexiansheng
asked this question in
Q&A
Replies: 1 comment
-
I use the |
0 replies
-
I am trying to use the speculative decoding script; the command is shown below:
However, I find that the model's weights have been offloaded to the GPU, but the GPU is not being utilized.

Is there something wrong?