I am running a Llama 3 8B model on a Raspberry Pi 5.

Numbers for Q4_0:
```
llama_perf_sampler_print: sampling time    =     51.03 ms /   272 runs (   0.19 ms per token,  5329.99 tokens per second)
llama_perf_context_print: load time        =  26701.98 ms
llama_perf_context_print: prompt eval time =  27882.46 ms /   104 tokens ( 268.10 ms per token,     3.73 tokens per second)
llama_perf_context_print: eval time        = 162237.56 ms /   342 runs (  474.38 ms per token,     2.11 tokens per second)
llama_perf_context_print: total time       = 314657.72 ms /   446 tokens
```
Numbers for Q4_0_4_4:
```
llama_perf_sampler_print: sampling time    =     17.94 ms /   117 runs (   0.15 ms per token,  6522.10 tokens per second)
llama_perf_context_print: load time        =   2299.44 ms
llama_perf_context_print: prompt eval time =  10097.25 ms /    98 tokens ( 103.03 ms per token,     9.71 tokens per second)
llama_perf_context_print: eval time        =  48624.09 ms /   108 runs (  450.22 ms per token,     2.22 tokens per second)
llama_perf_context_print: total time       = 120127.36 ms /   206 tokens
```
There is a significant improvement in prompt processing speed (3.73 → 9.71 tok/s), but not much in text generation speed (2.11 → 2.22 tok/s).

Whereas if we look at #5780 (review), text generation speed increases by 5-6 tok/s or more in some cases on an Apple M2 Ultra. Why is this happening?
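For anyone double-checking the comparison: the tokens-per-second figures in the logs above follow directly from the printed totals. A quick sanity check (just the arithmetic, not llama.cpp code):

```python
def tok_per_sec(total_ms: float, n_tokens: int) -> float:
    """Convert a total wall-clock time in ms over n_tokens into tokens/second."""
    return n_tokens / (total_ms / 1000.0)

# Q4_0 on the Pi 5 (values copied from the log above)
print(round(tok_per_sec(27882.46, 104), 2))   # prompt eval -> 3.73 tok/s
print(round(tok_per_sec(162237.56, 342), 2))  # text gen    -> 2.11 tok/s

# Q4_0_4_4
print(round(tok_per_sec(10097.25, 98), 2))    # prompt eval -> 9.71 tok/s
print(round(tok_per_sec(48624.09, 108), 2))   # text gen    -> 2.22 tok/s
```

So prompt eval is roughly 2.6x faster with Q4_0_4_4, while text generation gains only about 5%.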
I hope it is okay to tag the creator of this format here: @Dibakar