Inference time a lot higher when using cmake compared to w64devkit. #3834
v4lentin1879 asked this question in Q&A
For some reason, inference with my mistral-orca model is much slower when the binaries are compiled with cmake than when they are compiled with w64devkit. What could be the cause of this? I'm on a 2019 MacBook Pro with an i7 and want to run inference on CPU only.
w64devkit:
```
llama_print_timings:        load time =    2789.31 ms
llama_print_timings:      sample time =       7.55 ms /    18 runs   (    0.42 ms per token,  2383.16 tokens per second)
llama_print_timings: prompt eval time =    1925.06 ms /    20 tokens (   96.25 ms per token,    10.39 tokens per second)
llama_print_timings:        eval time =    8256.93 ms /    18 runs   (  458.72 ms per token,     2.18 tokens per second)
llama_print_timings:       total time =   23842.73 ms
```
cmake:
```
llama_print_timings:        load time =    4133.27 ms
llama_print_timings:      sample time =       5.71 ms /    18 runs   (    0.32 ms per token,  3153.47 tokens per second)
llama_print_timings: prompt eval time =   25917.85 ms /    19 tokens ( 1364.10 ms per token,     0.73 tokens per second)
llama_print_timings:        eval time =   43493.77 ms /    18 runs   ( 2416.32 ms per token,     0.41 tokens per second)
llama_print_timings:       total time =   74989.23 ms
```
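A common cause of a gap this large (not confirmed in this thread) is an unoptimized cmake configuration: cmake's default build type is empty, which on most generators means no `-O` optimization flags at all, whereas llama.cpp's plain Makefile (what a w64devkit build uses) compiles with `-O3` and native CPU tuning. Below is a minimal sketch of forcing an optimized cmake build; `LLAMA_NATIVE` is an assumption based on the llama.cpp CMake options around this time and may differ in your checkout:

```sh
# Configure an optimized build. Without CMAKE_BUILD_TYPE, cmake emits
# no optimization flags on most generators, which can explain a large slowdown.
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLAMA_NATIVE=ON  # LLAMA_NATIVE (assumed option): enable -march=native
cmake --build . --config Release
```

Comparing the actual compiler invocations of both builds (e.g. `cmake --build . --verbose` against the w64devkit `make` output) would confirm whether the optimization flags differ.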