Model llama-2-7b.Q4_0.gguf Loads with llama.cpp but Fails with whisper.cpp #1316
Replies: 2 comments
-
Further Investigation on Model Compatibility: I've done some additional testing to narrow down the problem. I converted a Llama model from the Hugging Face Meta repo using the following command: … Pleasingly, the converted model (…
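The command in this comment is truncated above; as a rough sketch, the usual llama.cpp flow at the time for converting Meta's Hugging Face weights to GGUF and quantizing them looked like this (the directory and file names are illustrative placeholders, not the commenter's actual paths):

```sh
# Convert the Hugging Face / Meta checkpoint directory to an f16 GGUF file
python3 convert.py models/llama-2-7b/ --outtype f16

# Quantize the f16 GGUF down to Q4_0 with llama.cpp's quantize tool
./quantize models/llama-2-7b/ggml-model-f16.gguf \
           models/llama-2-7b/ggml-model-Q4_0.gguf Q4_0
```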
-
The issue seems to be with the newer quantization models; a Q8 GGUF model works fine.
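For anyone wanting to test this observation, a Q8_0 file can be produced with llama.cpp's quantize tool (file names here are placeholders):

```sh
# Re-quantize an f16 GGUF to Q8_0, which reportedly loads fine in talk-llama
./quantize models/llama-2-7b/ggml-model-f16.gguf \
           models/llama-2-7b/ggml-model-Q8_0.gguf Q8_0
```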
-
Description:
Hello! When I try to run the model `llama-2-7b.Q4_0.gguf` (TheBloke repo) using `llama.cpp`, everything works fine. However, when I attempt to use the same model with `whisper.cpp`'s `talk-llama`, I encounter an error. Additionally, I'd like to mention that executing `./main -m models/ggml-small.en.bin -f samples/jfk.wav` works correctly without any issues.

Steps to Reproduce:
1. Load the `llama-2-7b.Q4_0.gguf` model using `llama.cpp` (works without issues; a sample command is shown after these steps).
2. Attempt to use the above model with `whisper.cpp`'s `talk-llama` using the following command:

```sh
./talk-llama -mw ./models/ggml-small.en.bin -ml ../llama.cpp/models/llama-2-7b.Q4_0.gguf -p "Hey, there" -t 4
```
Expected Behavior:
The model should load and work without any issues, just as it does with `llama.cpp`.

Actual Behavior:
An error message is displayed, stating:
This is followed by a segmentation fault.
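Not part of the original report, but on macOS a backtrace for the segmentation fault can be captured by running the same command under lldb, which would help pinpoint where `talk-llama` fails:

```sh
# Run talk-llama under lldb; after the crash, `bt` prints the backtrace
lldb -- ./talk-llama -mw ./models/ggml-small.en.bin \
    -ml ../llama.cpp/models/llama-2-7b.Q4_0.gguf -p "Hey, there" -t 4
# Inside lldb:
#   (lldb) run
#   (lldb) bt
```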
Additional Information:
Device: Apple M2
Model file: llama-2-7b.Q4_0.gguf
Whisper model file: ./models/ggml-small.en.bin
I would appreciate any guidance or insights into why this might be happening and how to resolve it. Thanks for your time!
Full Error Message: