Replies: 2 comments
- Of course it can be faster, and it will be, given enough time. It is also important to note that PowerInfer is great for certain hardware but is, right now, lacking for others. For example, the speedup on Apple Silicon is not great (yet). I am hopeful, though, that this project will mature and that more ideas along these lines will take off.
- I am thinking about the new speculative streaming paper: https://arxiv.org/abs/2402.11131. I am also wondering if and when concepts like speculative decoding, LLM in a Flash, and FlashAttention will become applicable in llama.cpp. I know the maintainers are working on some of this. Curious to read what others think 🤔
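  Since speculative decoding keeps coming up, here is a minimal, self-contained sketch of the idea in its simplest greedy form (this is not llama.cpp's implementation): a cheap draft model proposes a few tokens, the large target model verifies them, and only the agreeing prefix is kept, so the expensive model effectively produces several tokens per verification step. The `draft_next` / `target_next` callables and the toy pattern below are placeholders just so the example runs.

  ```python
  # Greedy speculative decoding, conceptual sketch (not llama.cpp code).

  def speculative_decode(target_next, draft_next, prompt, n_draft=4, max_new=16):
      tokens = list(prompt)
      while len(tokens) - len(prompt) < max_new:
          # 1. The cheap draft model proposes n_draft tokens autoregressively.
          draft = []
          for _ in range(n_draft):
              draft.append(draft_next(tokens + draft))
          # 2. The target model checks every draft position; in a real system this
          #    is a single batched forward pass, which is where the speedup comes from.
          verified = [target_next(tokens + draft[:i]) for i in range(n_draft)]
          # 3. Keep the longest agreeing prefix, then fall back to the target's own token.
          n_ok = 0
          while n_ok < n_draft and draft[n_ok] == verified[n_ok]:
              n_ok += 1
          tokens += draft[:n_ok]
          if n_ok < n_draft:
              tokens.append(verified[n_ok])
      return tokens

  # Toy "models" (placeholders): the target repeats a fixed pattern, the draft
  # agrees with it most of the time, so most drafted tokens get accepted.
  PATTERN = "abcabcabd" * 10
  target_next = lambda toks: PATTERN[len(toks) % len(PATTERN)]
  draft_next = lambda toks: "abc"[len(toks) % 3]

  print("".join(speculative_decode(target_next, draft_next, list("abc"))))
  ```

  Real implementations verify against the target's probabilities with an accept/reject rule rather than exact greedy agreement, but the structure (draft, verify in one pass, keep the accepted prefix) is the same.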
- Is PowerInfer the fastest, or can it still be faster?