-
Hello, glad to see you here.
All of these technologies are very complicated (in my personal view, because there are so many layers of encapsulation that it is hard to see clearly how they work), and that is the reason ggml-qnn came about. One more important thing: we already know that "Qualcomm engineers have been participating in llama.cpp development for some time now", as noted in this post #8273 (comment).
-
I was able to run llama.cpp on my new Samsung S25 phone using the Termux app, following the instructions at https://github.com/ggml-org/llama.cpp/blob/master/docs/android.md
The S25 uses the Qualcomm Snapdragon 8 Elite CPU, and I built llama.cpp for CPU mode.
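For anyone reproducing this, the Termux CPU build in that doc boils down to roughly the following. This is a sketch of the android.md flow, not a verbatim copy; package names and cmake flags may differ slightly from the current doc:
# sketch of the Termux CPU build; see android.md for the authoritative steps
pkg update && pkg upgrade
pkg install git cmake clang
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release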
After building, I downloaded this model:
pwd
/data/data/com.termux/files/home
curl -L https://huggingface.co/bartowski/Phi-3.5-mini-instruct-GGUF/resolve/main/Phi-3.5-mini-instruct-Q4_K_M.gguf -o ~/Phi-3.5-mini-instruct-Q4_K_M.gguf
and ran it with:
./llama.cpp/build/bin/llama-server -m ~/Phi-3.5-mini-instruct-Q4_K_M.gguf -c 16384 --n-gpu-layers 99 --host 10.0.0.172
Note that -c 32768 crashed Termux. I am using the model over my local WiFi network from a PC and tablets.
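For reference, other devices on the network can talk to llama-server through its OpenAI-compatible HTTP API. A minimal sketch, assuming the default port 8080 (since no --port was given above) and an example prompt:
# example client request from a PC on the same LAN; port 8080 is the llama-server default
curl http://10.0.0.172:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"Hello from my PC"}]}'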
I am happy with the model performance/speed on the S25 and quality of the model output so far.
Is anyone working on support for the (S25) Qualcomm NPU in Android builds of llama.cpp?
Thank You.