Inference time a lot higher when using cmake compared to w64devkit. #3834
v4lentin1879 asked this question in Q&A
For some reason, inference with my mistral-orca model is much slower when the binaries are compiled with cmake than when they are compiled with w64devkit. What could be the cause of this? I'm on a 2019 MacBook Pro with an i7 and want to run inference on CPU only.
w64devkit:
```
llama_print_timings:        load time =    2789.31 ms
llama_print_timings:      sample time =       7.55 ms /    18 runs   (    0.42 ms per token,  2383.16 tokens per second)
llama_print_timings: prompt eval time =    1925.06 ms /    20 tokens (   96.25 ms per token,    10.39 tokens per second)
llama_print_timings:        eval time =    8256.93 ms /    18 runs   (  458.72 ms per token,     2.18 tokens per second)
llama_print_timings:       total time =   23842.73 ms
```
cmake:
```
llama_print_timings:        load time =    4133.27 ms
llama_print_timings:      sample time =       5.71 ms /    18 runs   (    0.32 ms per token,  3153.47 tokens per second)
llama_print_timings: prompt eval time =   25917.85 ms /    19 tokens ( 1364.10 ms per token,     0.73 tokens per second)
llama_print_timings:        eval time =   43493.77 ms /    18 runs   ( 2416.32 ms per token,     0.41 tokens per second)
llama_print_timings:       total time =   74989.23 ms
```
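A common cause of a gap this large (not confirmed in this thread) is an unoptimized cmake configuration: cmake's default build type is empty, which on most generators means no `-O` optimization flags at all, whereas llama.cpp's plain Makefile (what a w64devkit build uses) compiles with `-O3` and native CPU tuning. Below is a minimal sketch of forcing an optimized cmake build; `LLAMA_NATIVE` is an assumption based on the llama.cpp CMake options around this time and may differ in your checkout:

```sh
# Configure an optimized build. Without CMAKE_BUILD_TYPE, cmake emits
# no optimization flags on most generators, which can explain a large slowdown.
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DLLAMA_NATIVE=ON  # LLAMA_NATIVE (assumed option): enable -march=native
cmake --build . --config Release
```

Comparing the actual compiler invocations of both builds (e.g. `cmake --build . --verbose` against the w64devkit `make` output) would confirm whether the optimization flags differ.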