llama.cpp no longer supports GGML model files #3070
Replies: 3 comments · 3 replies
-
Not sure there's a good answer here other than to get the HF version and convert it to GGUF/quantize it yourself. The LLaMA GGML-to-GGUF conversion script could probably be updated to support Falcon relatively easily, but I'm not that familiar with Falcon models. I guess there's also trying to contact TheBloke and asking him to post GGUF Falcon models. If you're actually asking for something that doesn't exist, you might want to make a donation or something too. He's on GitHub and you could potentially @ him.
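Roughly, the convert-it-yourself route looks like this. This is a minimal sketch assuming a GGUF-era llama.cpp checkout in `./llama.cpp`; the Falcon convert script name, its ftype argument, and the intermediate output path are assumptions and vary between llama.cpp versions, so check your own checkout.

```python
# Sketch of the "get the HF version and convert/quantize it yourself" route.
# Assumptions: a llama.cpp checkout in ./llama.cpp that ships a Falcon HF->GGUF
# convert script and the quantize binary; names and arguments differ by version.
import subprocess
from huggingface_hub import snapshot_download  # pip install huggingface_hub

# 1. Grab the original HF weights (this repo id is just an example).
model_dir = snapshot_download("tiiuae/falcon-7b-instruct")

# 2. Convert the HF weights to an f16 GGUF. In the early GGUF-era scripts,
#    the trailing "1" selected f16 output; adjust for your version.
subprocess.run(
    ["python3", "convert-falcon-hf-to-gguf.py", model_dir, "1"],
    cwd="llama.cpp", check=True,
)

# 3. Quantize the f16 GGUF down to something that fits in ~16 GB of VRAM.
#    Point the input path at wherever the convert step wrote its output.
subprocess.run(
    ["./quantize", f"{model_dir}/ggml-model-f16.gguf",
     "falcon-7b-instruct-q4_k_m.gguf", "Q4_K_M"],
    cwd="llama.cpp", check=True,
)
```

The resulting .gguf should then load in llama.cpp the usual way (e.g. `./main -m falcon-7b-instruct-q4_k_m.gguf`).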
-
Don't worry, I plan to do Falcon GGUFs fairly soon. I'm a bit confused by the original question, though: llama.cpp never used to support Falcon GGMLs :) They were supported by ggllm.cpp, a llama.cpp fork, and (I think?) by other clients that used the GGML library, like KoboldCpp. All of those continue to work with the original GGML, or GGCC as ggllm.cpp latterly renamed it. It's only with GGUF that llama.cpp now supports Falcon. Anyway, yeah, I'll try to have some Falcon 40Bs and 7Bs out by the weekend.
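For anyone staring at a folder of mixed downloads, the formats can be told apart from the first four bytes of the file. This is just an illustrative check: b"GGUF" is the GGUF magic, while the older GGML-family magics listed below are from memory and should be treated as assumptions.

```python
# Peek at a model file's magic bytes to see whether current llama.cpp can load it.
# b"GGUF" is the GGUF magic; the older GGML-family magics below are listed from
# memory and should be treated as assumptions.
from pathlib import Path

OLD_MAGICS = {
    b"lmgg": "legacy GGML",
    b"fmgg": "GGMF",
    b"tjgg": "GGJT (the 'GGML v3' files most quantized releases used)",
}

def sniff(path: str) -> str:
    magic = Path(path).read_bytes()[:4]
    if magic == b"GGUF":
        return "GGUF: loadable by current llama.cpp"
    kind = OLD_MAGICS.get(magic, "unknown format")
    return f"{kind}: needs conversion or one of the older clients"

print(sniff("falcon-40b-instruct.q4_k.bin"))  # example filename, not a real path
```

Anything that doesn't start with GGUF needs either a conversion step or one of the older clients mentioned above.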
-
Thank you for your response @TheBloke, and for all of the work that you do to keep these quantized models coming at us! Is there a way I can help out with the 40B or 7B using my hardware?
-
I have been using TheBloke/Falcon-40b-Instruct-GGML with llama.cpp and getting great results on my modest hardware (~16 GB VRAM). Since llama.cpp no longer supports GGML models, and TheBloke has yet to release GGUF Falcon models smaller than 180B, what are those affected doing as a workaround?