I am running a Llama 3 8B model on a Raspberry Pi 5.

Numbers for Q4_0:
```
llama_perf_sampler_print: sampling time    =     51.03 ms /   272 runs (   0.19 ms per token,  5329.99 tokens per second)
llama_perf_context_print: load time        =  26701.98 ms
llama_perf_context_print: prompt eval time =  27882.46 ms /   104 tokens ( 268.10 ms per token,     3.73 tokens per second)
llama_perf_context_print: eval time        = 162237.56 ms /   342 runs (  474.38 ms per token,     2.11 tokens per second)
llama_perf_context_print: total time       = 314657.72 ms /   446 tokens
```
Numbers for Q4_0_4_4:
```
llama_perf_sampler_print: sampling time    =     17.94 ms /   117 runs (   0.15 ms per token,  6522.10 tokens per second)
llama_perf_context_print: load time        =   2299.44 ms
llama_perf_context_print: prompt eval time =  10097.25 ms /    98 tokens ( 103.03 ms per token,     9.71 tokens per second)
llama_perf_context_print: eval time        =  48624.09 ms /   108 runs (  450.22 ms per token,     2.22 tokens per second)
llama_perf_context_print: total time       = 120127.36 ms /   206 tokens
```
There is a significant improvement in prompt processing speed (3.73 → 9.71 tok/s), but not much in text generation speed (2.11 → 2.22 tok/s).

Whereas if we look at #5780 (review), text generation speed increases by 5-6 tok/s or more in some cases on an Apple M2 Ultra. Why is this happening?
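For anyone double-checking the comparison: the tokens-per-second figures in the logs above follow directly from the printed totals. A quick sanity check (just the arithmetic, not llama.cpp code):

```python
def tok_per_sec(total_ms: float, n_tokens: int) -> float:
    """Convert a total wall-clock time in ms over n_tokens into tokens/second."""
    return n_tokens / (total_ms / 1000.0)

# Q4_0 on the Pi 5 (values copied from the log above)
print(round(tok_per_sec(27882.46, 104), 2))   # prompt eval -> 3.73 tok/s
print(round(tok_per_sec(162237.56, 342), 2))  # text gen    -> 2.11 tok/s

# Q4_0_4_4
print(round(tok_per_sec(10097.25, 98), 2))    # prompt eval -> 9.71 tok/s
print(round(tok_per_sec(48624.09, 108), 2))   # text gen    -> 2.22 tok/s
```

So prompt eval is roughly 2.6x faster with Q4_0_4_4, while text generation gains only about 5%.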
I hope it is okay to tag the creator of this format here: @Dibakar