Replies: 1 comment 1 reply
If you don't specify a BLAS backend, it defaults to llamafile, I think, which is faster on CPU, but that's not relevant unless you're using `-nkvo`?
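A minimal sketch of what that split looks like, assuming a CMake build of mainline llama.cpp (flag and option names as in recent versions; `model.gguf` is a placeholder):

```sh
# Default build: no external BLAS; the CPU backend uses the bundled
# llamafile matmul kernels (GGML_LLAMAFILE is ON by default).
cmake -B build
cmake --build build --config Release

# Build against an explicit BLAS backend instead (e.g. OpenBLAS).
cmake -B build-blas -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build-blas --config Release

# -nkvo / --no-kv-offload keeps the KV cache on the CPU, which is when
# the CPU matmul path matters for attention even with layers on the GPU.
./build/bin/llama-cli -m model.gguf -ngl 99 -nkvo -p "hi" -n 32
```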
Hi! Recently (as in, I finished 5 minutes ago) I got curious as to how fast my shitbox (for AI use, anyways) can run.
Honestly, pretty fast! But the main thing here is the comparison between LCPP and IK_LCPP, and (un)surprisingly mainline LCPP gets pretty hosed.
Specs:
Here are the cherry-picked results that show each framework at its best -- both are running with `-ot exps=CPU` (with the LCPP table slightly modified because they output different formats). And here's the full log, including the commands used and other random attempts.
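For anyone who wants to reproduce the shape of these runs, something along these lines exercises the same `-ot exps=CPU` split on both builds. This is an illustrative sketch, not the exact commands from the log: the model path, thread count, and batch sizes are placeholders, and it assumes builds recent enough that `llama-bench` accepts `-ot`.

```sh
# Mainline llama.cpp: all layers nominally on GPU (-ngl 99), but the MoE
# expert tensors are overridden back to the CPU with -ot.
./llama.cpp/build/bin/llama-bench -m model.gguf -ngl 99 -fa 1 \
    -ot "exps=CPU" -p 512 -n 128 -t 16

# ik_llama.cpp with the same override, for comparison.
./ik_llama.cpp/build/bin/llama-bench -m model.gguf -ngl 99 -fa 1 \
    -ot "exps=CPU" -p 512 -n 128 -t 16
```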
Some other interesting notes:

- `amb` is higher, but it's faster for `amb` to be lower with FA. ???
- I tried both `exps=CPU` (which I later found only offloads parts of the FFN to the CPU) and `ffn=CPU` (which offloads all of the FFN to the CPU, as I was originally intending)... but it's slower to use the one which offloads the norms and stuff too! For some reason! (See the sketch at the end of this post.)

I still need to try dense models, CPU without offload, etc. etc. for this to be a fair comparison, but I hope this is still interesting data :)
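As promised above, here's a sketch of the `exps=CPU` vs `ffn=CPU` difference. `-ot` takes a regex that's matched against tensor names, so which tensors move depends entirely on the naming; the tensor names in the comments are typical for MoE GGUFs but vary by model, and the commands are placeholder invocations rather than my exact ones.

```sh
# "exps" only matches the expert weights, e.g. blk.N.ffn_gate_exps.weight,
# blk.N.ffn_up_exps.weight, blk.N.ffn_down_exps.weight.
./build/bin/llama-cli -m model.gguf -ngl 99 -ot "exps=CPU" -p "hi" -n 32

# "ffn" also matches the rest of the FFN, including the tiny ffn_norm
# weights -- this is the variant that turned out to be slower for me.
./build/bin/llama-cli -m model.gguf -ngl 99 -ot "ffn=CPU" -p "hi" -n 32
```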