@ggerganov, what are your thoughts on this? Could this be useful for llama.cpp? 🤔
I think the main breakthrough is that it places weight parameters according to how frequently each neuron is activated, keeping the frequently activated ("hot") weights in faster memory to improve inference speed. They developed a Neuron-aware Operator that bypasses neurons that are not activated, along with an offline profiling technique so the Neuron Placement Policy can be determined ahead of time.
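To make that concrete, here is a minimal Python/NumPy sketch of the two ideas as I understand them: offline profiling of activation frequency feeding a hot/cold placement policy, and an operator that computes only the rows belonging to neurons predicted active. The function names are hypothetical; this is not PowerInfer's actual implementation (per the paper, hot neurons are preloaded onto the GPU, cold neurons stay on the CPU, and activity is predicted by small per-layer predictors).

```python
import numpy as np

def profile_activation_frequency(activations: np.ndarray) -> np.ndarray:
    """Offline profiling: fraction of tokens on which each neuron fired.
    `activations` is (num_tokens, num_neurons), post-ReLU."""
    return (activations > 0).mean(axis=0)

def placement_policy(freq: np.ndarray, hot_fraction: float = 0.2) -> np.ndarray:
    """Mark the most frequently activated neurons as 'hot' so their weights
    can be kept in the faster tier (e.g. GPU VRAM); the rest stay 'cold'."""
    k = int(len(freq) * hot_fraction)
    hot = np.zeros(len(freq), dtype=bool)
    if k > 0:
        hot[np.argsort(freq)[-k:]] = True
    return hot

def neuron_aware_matvec(W: np.ndarray, x: np.ndarray,
                        predicted_active: np.ndarray) -> np.ndarray:
    """Neuron-aware operator: compute only the rows of W for neurons
    predicted active, bypassing the rest (their output is taken as zero)."""
    y = np.zeros(W.shape[0])
    idx = np.flatnonzero(predicted_active)
    y[idx] = W[idx] @ x  # dense GEMV restricted to the active rows
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake ReLU activation traces: (tokens, neurons)
    acts = np.maximum(rng.normal(size=(1000, 64)), 0)
    freq = profile_activation_frequency(acts)
    hot = placement_policy(freq)          # which weights go in fast memory
    W = rng.normal(size=(64, 32))
    x = rng.normal(size=32)
    y = neuron_aware_matvec(W, x, predicted_active=freq > 0.5)
```

The interesting part is that both decisions are cheap at run time: the placement is fixed offline, and the sparse operator only needs an activity mask per layer, so the savings come almost for free when activation sparsity is high.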
GitHub: https://github.com/SJTU-IPADS/PowerInfer
Paper: https://ipads.se.sjtu.edu.cn/_media/publications/powerinfer-20231219.pdf
Relevant discussions: #4543 #4542 #4534