Replies: 3 comments
-
Following! @bachittle if you find any implementations, please share!
-
(Just my personal opinion) I don't think it contains anything of real use for now. Implementing it without code is not realistic: we already have all the code for PowerInfer, and even that is currently not on the table for inclusion. This one is far further from it.
-
FYI. Here is a related topic discussed in the PowerInfer community.
-
I'm wondering if any of the techniques proposed in the following paper could be implemented here: https://huggingface.co/papers/2312.11514
https://arxiv.org/abs/2312.11514
This goes above my level of understanding, but I'm wondering if they give enough technical details to implement this as a new example. The headline result is that they were able to run a model roughly twice the size of available DRAM, with large speedups in inference compared to naively loading weights from flash. So I figured I'd make a discussion post as an initial seed and see where this idea goes.
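To make the core idea a bit more concrete, here is a minimal sketch of my own (not the paper's implementation and not llama.cpp code): keep a large FFN weight matrix in a file on flash, mmap it, and compute only the rows a sparsity predictor says will be active, so untouched rows are never paged into DRAM. The file name `weights.bin`, the matrix dimensions, and `predict_active_rows` are all hypothetical placeholders.

```c
// Sketch: sparse, on-demand row loading of an mmapped FFN weight matrix.
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical stand-in for the paper's predictor that guesses which FFN
// neurons (rows) will survive the activation function.
static int predict_active_rows(const float *x, int n_cols, int *rows, int n_rows) {
    (void) x; (void) n_cols;
    int n = 0;
    for (int r = 0; r < n_rows && n < 64; r += 97) rows[n++] = r; // placeholder pattern
    return n;
}

int main(void) {
    // Hypothetical layout: one row-major float matrix of n_rows x n_cols in "weights.bin".
    const int n_rows = 11008, n_cols = 4096;
    const size_t size = (size_t) n_rows * n_cols * sizeof(float);

    int fd = open("weights.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    // Map the whole matrix; nothing is read from flash until a page is touched.
    const float *W = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (W == MAP_FAILED) { perror("mmap"); return 1; }
    madvise((void *) W, size, MADV_RANDOM); // access pattern is sparse rows

    float *x = malloc(n_cols * sizeof(float)); // dummy input activation
    for (int i = 0; i < n_cols; i++) x[i] = 0.01f * (float) i;

    int rows[64];
    const int n_active = predict_active_rows(x, n_cols, rows, n_rows);

    float *y = calloc(n_rows, sizeof(float)); // sparse output, inactive rows stay 0
    for (int k = 0; k < n_active; k++) {
        const int r = rows[k];
        const float *w = W + (size_t) r * n_cols; // only these pages get faulted in
        float acc = 0.0f;
        for (int c = 0; c < n_cols; c++) acc += w[c] * x[c];
        y[r] = acc;
    }

    printf("computed %d of %d rows\n", n_active, n_rows);
    free(x); free(y);
    munmap((void *) W, size);
    close(fd);
    return 0;
}
```

As I read it, the paper goes further than this: it keeps a sliding window of recently activated neurons cached in DRAM so they are not re-read, and bundles the corresponding up- and down-projection rows/columns so each flash read is a larger contiguous chunk. The sketch above just leans on the OS page cache instead.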