Why is llama.cpp server so slow compared to Ollama? #8413
Unanswered · EduardTalianu asked this question in Q&A

What options can I enable to make it faster in a no-GPU environment?

Replies: 1 comment
- @EdwardDali you can use the --ngl N option to offload layers to the GPU, where N is the number of layers you want to offload; for a 7B model it is around 30. See https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md for more details, or run llama-cli -h.
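  For example, the flag can be passed like this (a minimal sketch; model.gguf and the prompt are placeholders, and 30 layers is the rough value suggested above for a 7B model):

  ```sh
  # Offload 30 transformer layers to the GPU (roughly right for a 7B model);
  # requires a llama.cpp build compiled with GPU support (e.g. CUDA or Metal).
  ./llama-server -m model.gguf -ngl 30

  # The same flag works with the CLI:
  ./llama-cli -m model.gguf -ngl 30 -p "Hello"
  ```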