Mixtral 8x7b #4539
Replies: 2 comments
-
Sure, it works well with recent llama.cpp. I don't have a good GPU, so I run it on CPU only. The amount of RAM required depends on the quantisation method; I use a 6-bit quant at the moment.
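For anyone wondering how the quant level maps to memory, here is a rough back-of-the-envelope sketch (my own estimate, not something measured in this thread). It assumes Mixtral 8x7B has roughly 46.7B total parameters and uses approximate effective bits-per-weight figures for the k-quants:

```python
# Rough file-size estimate for Mixtral 8x7B GGUF quants.
# Assumptions: ~46.7e9 total parameters; effective bits-per-weight
# values are approximate, and actual files vary slightly.
TOTAL_PARAMS = 46.7e9

def approx_size_gb(bits_per_weight: float) -> float:
    """Approximate model file size in GB; runtime RAM adds
    context/KV-cache overhead on top of this."""
    return TOTAL_PARAMS * bits_per_weight / 8 / 1e9

print(f"Q6_K  : ~{approx_size_gb(6.6):.0f} GB")   # roughly 39 GB
print(f"Q4_K_M: ~{approx_size_gb(4.8):.0f} GB")   # roughly 28 GB
```

So dropping from a 6-bit to a 4-bit quant saves on the order of 10 GB of RAM, at some cost in output quality.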
-
I tested the Q4_K_M variant of hf/TheBloke/Mixtral_Instruct on an Intel i5 limited to 3 cores, CPU only as well. It needs about 30 GB of RAM and generates around 3 tokens per second. It is of course not at the level of GPT-4, but it is still incredibly smart; the smartest LLM I have seen so far after GPT-4. In my case, however, it was not really usable for everyday use cases due to extremely long prompt evaluation times. As far as I know there is a fix for that in the latest llama.cpp, but I haven't tested it yet.
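If anyone wants to reproduce a setup like this, here is a minimal sketch using the llama-cpp-python bindings (the model path is hypothetical; any Mixtral Instruct Q4_K_M GGUF should work):

```python
from llama_cpp import Llama

# Load a Q4_K_M Mixtral GGUF, CPU only, pinned to 3 threads
# to match the 3-core setup described above.
llm = Llama(
    model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # hypothetical local path
    n_ctx=2048,    # context window
    n_threads=3,   # number of CPU threads
)

# Mixtral Instruct expects the [INST] ... [/INST] prompt format.
out = llm("[INST] Explain mixture-of-experts in one paragraph. [/INST]",
          max_tokens=128)
print(out["choices"][0]["text"])
```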
-
This is not a small model, but it has been shown to perform at the level of GPT-4 and is open source. I'm super curious whether anyone has gotten it working on their machine.