Replies: 9 comments 1 reply
-
Interesting. The multimodality aspect sounds great, but the LLM inference speed doesn’t seem to be significantly different on phones, from what folks report:
-
I think there's still a big difference.
-
PocketPal: 5 tokens/s
-
MNN beats llama.cpp.
-
llama.cpp is easier to use; I hope its performance can be improved.
-
Speed test: PocketPal vs MNN Chat
Prompt: "Why the sky is blue?"
Results:
PocketPal model: bartowski/Qwen2.5-3B-Instruct-GGUF
Device: Samsung SM-G780G, Android 13, 8 cores, 7.4 GB RAM
App version: 1.8.5
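The tokens/s figures quoted in this thread come from timing a generation call; a minimal sketch of how such a number can be computed (the `generate` callable is a placeholder, not the real API of PocketPal or MNN Chat):

```python
import time

def tokens_per_second(generate, prompt):
    """Time one generation call and report decode speed.

    `generate` is a stand-in for whatever inference call the app
    exposes; it is assumed to return the list of generated tokens.
    """
    start = time.perf_counter()
    tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return len(tokens) / elapsed

# Usage with a stand-in "model" that emits 50 tokens:
fake_generate = lambda prompt: ["tok"] * 50
speed = tokens_per_second(fake_generate, "Why the sky is blue?")
print(f"{speed:.1f} tokens/s")
```

Note that apps often report prefill and decode speed separately, so a single wall-clock figure like this is only a rough comparison.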
-
MNN vs PocketPal on a Pixel 6a with an Unsloth model: MNN is 3x faster.
-
The new MNN version is even faster than the previous one: https://github.com/alibaba/MNN/blob/master/apps/Android/MnnLlmChat/README.md#version-040
-
MNN is faster. I don't have statistics; honestly, no need, trust me. But not everything is about speed. The ecosystem matters.
-
llama.cpp vs mnn-llm speed
https://www.reddit.com/r/LocalLLaMA/s/tDh1l0cvVe
https://github.com/alibaba/MNN/blob/master/project/android/apps/MnnLlmApp/README.md
https://github.com/alibaba/MNN