llama.cpp no longer supports GGML model files #3070
Replies: 3 comments · 3 replies
-
Not sure there's a good answer here other than to get the HF version and convert it to GGUF/quantize it yourself. The LLaMA GGML-to-GGUF conversion script could probably be updated to support Falcon relatively easily, but I'm not that familiar with Falcon models. I guess there's also trying to contact TheBloke and asking him to post GGUF Falcon models. If you're actually asking for something that doesn't exist, you might want to make a donation or something too. He's on GitHub and you could potentially @ him.
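Roughly, the convert-it-yourself route looks like this. This is a minimal sketch assuming a GGUF-era llama.cpp checkout in `./llama.cpp`; the Falcon convert script name, its ftype argument, and the intermediate output path are assumptions and vary between llama.cpp versions, so check your own checkout.

```python
# Sketch of the "get the HF version and convert/quantize it yourself" route.
# Assumptions: a llama.cpp checkout in ./llama.cpp that ships a Falcon HF->GGUF
# convert script and the quantize binary; names and arguments differ by version.
import subprocess
from huggingface_hub import snapshot_download  # pip install huggingface_hub

# 1. Grab the original HF weights (this repo id is just an example).
model_dir = snapshot_download("tiiuae/falcon-7b-instruct")

# 2. Convert the HF weights to an f16 GGUF. In the early GGUF-era scripts,
#    the trailing "1" selected f16 output; adjust for your version.
subprocess.run(
    ["python3", "convert-falcon-hf-to-gguf.py", model_dir, "1"],
    cwd="llama.cpp", check=True,
)

# 3. Quantize the f16 GGUF down to something that fits in ~16 GB of VRAM.
#    Point the input path at wherever the convert step wrote its output.
subprocess.run(
    ["./quantize", f"{model_dir}/ggml-model-f16.gguf",
     "falcon-7b-instruct-q4_k_m.gguf", "Q4_K_M"],
    cwd="llama.cpp", check=True,
)
```

The resulting .gguf should then load in llama.cpp the usual way (e.g. `./main -m falcon-7b-instruct-q4_k_m.gguf`).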
-
Don't worry, I plan to do Falcon GGUFs fairly soon. I'm a bit confused by the original question, though: llama.cpp never used to support Falcon GGMLs :) They were supported by ggllm.cpp, a llama.cpp fork, and (I think?) by other clients that used the GGML library, like KoboldCpp. All of those continue to work with the original GGML, or GGCC as ggllm.cpp latterly renamed it. It's only with GGUF that llama.cpp now supports Falcon. Anyway, yeah, I'll try to have some Falcon 40Bs and 7Bs out by the weekend.
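For anyone staring at a folder of mixed downloads, the formats can be told apart from the first four bytes of the file. This is just an illustrative check: b"GGUF" is the GGUF magic, while the older GGML-family magics listed below are from memory and should be treated as assumptions.

```python
# Peek at a model file's magic bytes to see whether current llama.cpp can load it.
# b"GGUF" is the GGUF magic; the older GGML-family magics below are listed from
# memory and should be treated as assumptions.
from pathlib import Path

OLD_MAGICS = {
    b"lmgg": "legacy GGML",
    b"fmgg": "GGMF",
    b"tjgg": "GGJT (the 'GGML v3' files most quantized releases used)",
}

def sniff(path: str) -> str:
    magic = Path(path).read_bytes()[:4]
    if magic == b"GGUF":
        return "GGUF: loadable by current llama.cpp"
    kind = OLD_MAGICS.get(magic, "unknown format")
    return f"{kind}: needs conversion or one of the older clients"

print(sniff("falcon-40b-instruct.q4_k.bin"))  # example filename, not a real path
```

Anything that doesn't start with GGUF needs either a conversion step or one of the older clients mentioned above.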
-
Thank you for your response @TheBloke, and for all of the work that you do to keep these quantized models coming at us! Is there a way I can help out with the 40B or 7B using my hardware?
-
I have been using TheBloke/Falcon-40b-Instruct-GGML with llama.cpp and getting great results on my modest hardware (~16 GB VRAM). Since llama.cpp no longer supports GGML models, and TheBloke has yet to release GGUF Falcon models smaller than 180B, what are those affected doing as a workaround?