Replies: 2 comments
- Of course it can be faster, and it will be, given enough time. It is also important to note that PowerInfer is great for certain hardware but is, right now, lacking for others. For example, the speedup on Apple Silicon is not great (yet). I am hopeful, though, that this project will mature and that more ideas along these lines will take off.
- I am thinking about the new speculative streaming paper: https://arxiv.org/abs/2402.11131. I am also wondering if and when concepts like speculative decoding, LLM in a Flash, and FlashAttention will become applicable in llama.cpp. I know the maintainers are working on some of this. Curious to read what others think 🤔
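  Since speculative decoding keeps coming up, here is a minimal, self-contained sketch of the idea in its simplest greedy form (this is not llama.cpp's implementation): a cheap draft model proposes a few tokens, the large target model verifies them, and only the agreeing prefix is kept, so the expensive model effectively produces several tokens per verification step. The `draft_next` / `target_next` callables and the toy pattern below are placeholders just so the example runs.

  ```python
  # Greedy speculative decoding, conceptual sketch (not llama.cpp code).

  def speculative_decode(target_next, draft_next, prompt, n_draft=4, max_new=16):
      tokens = list(prompt)
      while len(tokens) - len(prompt) < max_new:
          # 1. The cheap draft model proposes n_draft tokens autoregressively.
          draft = []
          for _ in range(n_draft):
              draft.append(draft_next(tokens + draft))
          # 2. The target model checks every draft position; in a real system this
          #    is a single batched forward pass, which is where the speedup comes from.
          verified = [target_next(tokens + draft[:i]) for i in range(n_draft)]
          # 3. Keep the longest agreeing prefix, then fall back to the target's own token.
          n_ok = 0
          while n_ok < n_draft and draft[n_ok] == verified[n_ok]:
              n_ok += 1
          tokens += draft[:n_ok]
          if n_ok < n_draft:
              tokens.append(verified[n_ok])
      return tokens

  # Toy "models" (placeholders): the target repeats a fixed pattern, the draft
  # agrees with it most of the time, so most drafted tokens get accepted.
  PATTERN = "abcabcabd" * 10
  target_next = lambda toks: PATTERN[len(toks) % len(PATTERN)]
  draft_next = lambda toks: "abc"[len(toks) % 3]

  print("".join(speculative_decode(target_next, draft_next, list("abc"))))
  ```

  Real implementations verify against the target's probabilities with an accept/reject rule rather than exact greedy agreement, but the structure (draft, verify in one pass, keep the accepted prefix) is the same.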
- Is PowerInfer the fastest, or can it still be faster?