Why is llama.cpp server so slow compared to Ollama? #8413
Unanswered · EduardTalianu asked this question in Q&A

What options can I enable to make it faster in a no-GPU environment?

Replies: 1 comment
- @EdwardDali you can use the --ngl N option to offload layers to the GPU, where N is the number of layers you want to offload; for a 7B model it is around 30. See https://github.com/ggerganov/llama.cpp/blob/master/examples/main/README.md for more details, or run llama-cli -h.
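  For example, the flag can be passed like this (a minimal sketch; model.gguf and the prompt are placeholders, and 30 layers is the rough value suggested above for a 7B model):

  ```sh
  # Offload 30 transformer layers to the GPU (roughly right for a 7B model);
  # requires a llama.cpp build compiled with GPU support (e.g. CUDA or Metal).
  ./llama-server -m model.gguf -ngl 30

  # The same flag works with the CLI:
  ./llama-cli -m model.gguf -ngl 30 -p "Hello"
  ```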