1.58-bit BitNets - a new opportunity for llamafile? #313
Replies: 4 comments 10 replies
-
Is there any news?
-
I have now submitted #552, which adds support for these ternary models. In terms of recommending a really good model: the ternary models released so far are just toys, and I haven't done much experimentation, so it is hard to make a recommendation. My guess is that it is best to go with the largest TriLM model. It has 4B parameters, but with #552 it quantizes to 1.31 GiB and has very decent inference speed, so it can be a viable option even for low-end devices.
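As a rough sanity check on those numbers (a sketch only; the "4B" parameter count is approximate), 1.31 GiB for ~4 billion parameters works out to about 2.8 bits per weight, which is plausible for a ~2-bit ternary encoding with a few tensors kept at higher precision:

```python
# Back-of-envelope bits-per-weight from the quoted figures.
params = 4e9                       # largest TriLM model, approximate parameter count
size_bytes = 1.31 * 1024**3        # quoted quantized size: 1.31 GiB
bits_per_weight = size_bytes * 8 / params
print(f"{bits_per_weight:.2f} bits per weight")   # ~2.81
```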
-
I was (and still am) skeptical, for a reason. Here is the performance quoted in the T-MAC repository for the 3B Bitnet-1.58b model on an M2-Ultra (I have copy/pasted the graph from the T-MAC repository here for convenience).

I don't have an M2-Ultra, but I do have an M2-Max laptop (so basically half of an M2-Ultra). Here is what I get using #552: very similar performance to T-MAC for 1-3 threads, but then, instead of saturating at ~60-65 tokens/second as they do, we a) get 99 t/s at 8 threads (50+% faster than T-MAC), and b) performance does not look at all like it is saturating the way it does with T-MAC, so I wouldn't be surprised if we got 150 t/s on an M2-Ultra with 16 threads (2.5X T-MAC). T-MAC saturates because the threads start fighting for the available bandwidth to load values from the lookup table(s).
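To illustrate why a lookup-table approach can become bandwidth-bound, here is a minimal Python sketch of LUT-based ternary dot products. This is my own toy construction, not T-MAC's or #552's actual kernel layout: I assume groups of 4 ternary weights packed into one byte (2 bits each), and for every group of 4 activations a 256-entry table of partial sums is precomputed so the inner loop is just table lookups. In a real kernel the tables are built once per activation block and reused across many weight rows, and at high thread counts those table loads compete for memory bandwidth, which matches the saturation described above.

```python
import numpy as np

def pack_ternary(w):
    """Pack groups of 4 ternary weights {-1, 0, +1} into one byte (2 bits each)."""
    w = np.asarray(w).reshape(-1, 4)
    codes = (w + 1).astype(np.uint8)          # map {-1, 0, +1} -> {0, 1, 2}
    return (codes[:, 0] | (codes[:, 1] << 2) |
            (codes[:, 2] << 4) | (codes[:, 3] << 6))

def build_lut(x_group):
    """For one group of 4 activations, precompute the partial dot product
    for every possible packed weight byte (256 entries, 81 of them valid)."""
    lut = np.zeros(256, dtype=np.float32)
    for byte in range(256):
        acc = 0.0
        for j in range(4):
            code = (byte >> (2 * j)) & 3      # 2-bit field -> weight code
            if code < 3:
                acc += (code - 1) * x_group[j]
        lut[byte] = acc
    return lut

def lut_dot(packed_w, x):
    """Dot product of a ternary weight row with activations x via table lookups."""
    x = np.asarray(x, dtype=np.float32).reshape(-1, 4)
    return sum(build_lut(xg)[pw] for pw, xg in zip(packed_w, x))

# Quick check against a direct dot product.
rng = np.random.default_rng(0)
w = rng.integers(-1, 2, size=32)
x = rng.standard_normal(32).astype(np.float32)
assert np.isclose(lut_dot(pack_ternary(w), x), float(np.dot(w, x)), atol=1e-4)
```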
-
Any support for a 1.58-bit Falcon-based model?
-
BitNets are the most exciting thing happening for LLMs right now. @jart - llamafile can become the BitNet leader if you get in early! The big advantages are:
Check out these resources: