- Do you have an output log from the beginning of the llama-cli run?
- You need to specify --n-gpu-layers 35, for example, for a typical 8B model. Something like this: ../llama.cpp/llama-cli --model models/Meta-Llama-3-8B-Instruct_Q5_K_S.gguf --n-gpu-layers 25 -cnv --interactive-first --simple-io -b 512 -n -1 --ctx_size 0 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 6 -r "/n>" --log-disable
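If the binary was built with CUDA, the load log should report how many layers were offloaded. A quick check is to run a short generation without --log-disable and grep for the offload summary (a sketch; the exact "offloaded ... layers to GPU" wording may vary between llama.cpp versions):

    # short one-off run, keep the load log and look for the offload summary line
    ../llama.cpp/llama-cli --model models/Meta-Llama-3-8B-Instruct_Q5_K_S.gguf \
        --n-gpu-layers 25 -p "hello" -n 8 2>&1 | grep -i "offloaded"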
Environment
Steps
- Place the model under models/llama-7b and rename it to ggml-model-q4_0.gguf
- Run ../examples/chat.sh
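For reference, those steps as shell commands (a sketch; the pre-rename filename is a placeholder, and it assumes the model is already converted and quantized to a Q4_0 GGUF file):

    # <downloaded-model>.gguf is a placeholder for whatever GGUF file you start from
    mkdir -p models/llama-7b
    mv <downloaded-model>.gguf models/llama-7b/ggml-model-q4_0.gguf
    ../examples/chat.sh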
Expected and actual
It should utilize the GPU, but the CUDA utilization in Task Manager stays at 0%.
I also have ollama-cuda, which links CUDA statically; it does utilize the GPU.
I have tried the same thing on Linux, and it doesn't work there either.
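One way to cross-check Task Manager is to watch the GPU directly while chat.sh is generating; a sketch using nvidia-smi (assuming the NVIDIA driver tools are on the PATH, which works on both Windows and Linux):

    # run in a second terminal while chat.sh is generating; a CUDA-enabled build
    # should show the llama process with non-zero GPU memory here
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv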