@ggerganov, what are your thoughts on this? Could this be useful for llama.cpp? 🤔
I think the main breakthrough is that it places weight parameters according to how frequently each neuron is activated, keeping the frequently activated ("hot") weights in faster memory to improve inference speed. They developed a Neuron-aware Operator that bypasses neurons that are not activated, along with an offline profiling technique so the Neuron Placement Policy can be determined ahead of time.
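To make that concrete, here is a minimal Python/NumPy sketch of the two ideas as I understand them: offline profiling of activation frequency feeding a hot/cold placement policy, and an operator that computes only the rows belonging to neurons predicted active. The function names are hypothetical; this is not PowerInfer's actual implementation (per the paper, hot neurons are preloaded onto the GPU, cold neurons stay on the CPU, and activity is predicted by small per-layer predictors).

```python
import numpy as np

def profile_activation_frequency(activations: np.ndarray) -> np.ndarray:
    """Offline profiling: fraction of tokens on which each neuron fired.
    `activations` is (num_tokens, num_neurons), post-ReLU."""
    return (activations > 0).mean(axis=0)

def placement_policy(freq: np.ndarray, hot_fraction: float = 0.2) -> np.ndarray:
    """Mark the most frequently activated neurons as 'hot' so their weights
    can be kept in the faster tier (e.g. GPU VRAM); the rest stay 'cold'."""
    k = int(len(freq) * hot_fraction)
    hot = np.zeros(len(freq), dtype=bool)
    if k > 0:
        hot[np.argsort(freq)[-k:]] = True
    return hot

def neuron_aware_matvec(W: np.ndarray, x: np.ndarray,
                        predicted_active: np.ndarray) -> np.ndarray:
    """Neuron-aware operator: compute only the rows of W for neurons
    predicted active, bypassing the rest (their output is taken as zero)."""
    y = np.zeros(W.shape[0])
    idx = np.flatnonzero(predicted_active)
    y[idx] = W[idx] @ x  # dense GEMV restricted to the active rows
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake ReLU activation traces: (tokens, neurons)
    acts = np.maximum(rng.normal(size=(1000, 64)), 0)
    freq = profile_activation_frequency(acts)
    hot = placement_policy(freq)          # which weights go in fast memory
    W = rng.normal(size=(64, 32))
    x = rng.normal(size=32)
    y = neuron_aware_matvec(W, x, predicted_active=freq > 0.5)
```

The interesting part is that both decisions are cheap at run time: the placement is fixed offline, and the sparse operator only needs an activity mask per layer, so the savings come almost for free when activation sparsity is high.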
GitHub: https://github.com/SJTU-IPADS/PowerInfer
Paper: https://ipads.se.sjtu.edu.cn/_media/publications/powerinfer-20231219.pdf
Relevant discussions: #4543 #4542 #4534