-
Interesting stuff! I don't think I can implement this approach - it seems quite complicated. But I got inspired by this idea and started implementing an n-bit quantisation + matrix multiplication in |
-
Hi!
Great work with this project, love it!
As this will probably always be running on a CPU, the project reminded me of a paper from a year or so back on fast approximate matrix multiplication on CPU. The authors report that it often runs around 100x faster than exact matrix multiplication:
https://arxiv.org/abs/2106.10860
There is a C++ implementation here:
https://github.com/dblalock/bolt
Maybe this would get things to GPU speeds on CPU!
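For anyone curious what the lookup-table idea behind this family of methods looks like, here is a rough sketch in plain Python. It shows the simpler product-quantisation variant, not the paper's actual algorithm: as I understand it, the paper replaces the nearest-prototype search below with a learned hash function so that encoding needs no multiplications. The prototypes are assumed to be given (in practice they would be learned from training data), and all function names here are made up for illustration, not taken from Bolt.

```python
def exact_matmul(A, B):
    """Plain O(N*D*M) matrix product on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def build_tables(prototypes, B, sub_len):
    """Precompute tables[c][k][m] = prototype k of subspace c, dotted with
    the matching slice of column m of B. Done once, since B is fixed."""
    tables = []
    for c, protos in enumerate(prototypes):
        B_sub = B[c * sub_len:(c + 1) * sub_len]
        tables.append(exact_matmul(protos, B_sub))
    return tables


def encode(row, prototypes, sub_len):
    """Replace each sub-vector of `row` by the index of its nearest prototype."""
    codes = []
    for c, protos in enumerate(prototypes):
        sub = row[c * sub_len:(c + 1) * sub_len]
        dists = [sum((x - p) ** 2 for x, p in zip(sub, proto)) for proto in protos]
        codes.append(dists.index(min(dists)))
    return codes


def approx_matmul(A, prototypes, tables, sub_len):
    """Approximate A @ B using only table lookups and additions per output."""
    n_cols = len(tables[0][0])
    out = []
    for row in A:
        codes = encode(row, prototypes, sub_len)
        out.append([sum(tables[c][k][m] for c, k in enumerate(codes))
                    for m in range(n_cols)])
    return out


# Toy data: 4-dimensional rows split into 2 subspaces of length 2,
# with 2 hand-picked (hypothetical) prototypes per subspace.
prototypes = [[[0.0, 0.0], [1.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]]
B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
A = [[1.0, 1.0, 0.0, 1.0],    # built exactly from prototypes
     [0.1, -0.1, 0.9, 0.2]]   # only close to prototypes

tables = build_tables(prototypes, B, sub_len=2)
print(approx_matmul(A, prototypes, tables, sub_len=2))
print(exact_matmul(A, B))
```

The first row is made of exact prototypes, so its approximate result matches the exact product; the second row is only near its prototypes, so its result is off by the quantisation error. The speed-up in real implementations comes from keeping the codes and tables small enough to do the lookups with SIMD instructions, which this sketch does not attempt.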