Replies: 1 comment 6 replies
-
What prompt format are you using?
-
I don't know where else to ask, but this issue is killing me. When I run llama.cpp in the console, I use a command like this: ./main -m models/7B/ggml-model-q4_1.bin -ins --color -ngl 1
Then I type my prompt and receive exactly the output I want.
However, when I try to use llama.cpp in production, for example via ./server -m models/7B/ggml-model-q4_1.bin --port 8001 -ngl 1, the same prompt produces garbage at the beginning of the output.
Example:
alex@M1 llama-test % python3 ll10.py
razor blade technology
The Adidas brand has been synonymous with high-quality sports equipment for decades, and their razor blade technology is no exception. This innovative design feature allows athletes to experience unparalleled performance on the field or court.... etc.
I didn't ask anything about "razor blade technology".
In my case, I'm using the llama-2-chat weights officially requested from Meta, which I quantized to 4-bit myself using llama.cpp.
I've found that this issue shows up with any wrapper around Llama 2. I suppose I'm missing some basics, but as you can see from the file name, this is my tenth attempt at the script, and in every variation I still get this garbage pre-output.
I have seen many answers saying it's a Llama 2 bug. However, it never (100% never) happens when I run llama.cpp in -ins mode on the command line.
Please help me with this issue. Thank you.
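The difference is very likely the prompt format: ./main -ins wraps whatever you type in its own instruction markers before it reaches the model, while ./server passes the prompt from your script through verbatim, so a llama-2-chat model sees an unformatted prompt and free-associates. Below is a minimal sketch of one way to wrap a request, assuming the server's /completion endpoint with "prompt"/"n_predict"/"stop" request fields and a "content" field in the reply (check the server README of your build); the ask helper, SERVER_URL, and the system prompt text are illustrative, and the chat template follows Meta's documented llama-2-chat format.

import json
import urllib.request

SERVER_URL = "http://127.0.0.1:8001/completion"  # matches --port 8001 above

# Llama 2 chat template as documented by Meta. The leading BOS token ("<s>")
# is omitted on the assumption that the server adds it while tokenizing the
# prompt; put it back if your build does not.
LLAMA2_CHAT_TEMPLATE = (
    "[INST] <<SYS>>\n"
    "{system}\n"
    "<</SYS>>\n\n"
    "{user} [/INST]"
)

def ask(user_message: str, system_prompt: str = "You are a helpful assistant.") -> str:
    # Wrap the raw question in the chat template before sending it.
    prompt = LLAMA2_CHAT_TEMPLATE.format(system=system_prompt, user=user_message)
    payload = json.dumps({
        "prompt": prompt,
        "n_predict": 256,
        "stop": ["</s>"],  # stop on the end-of-turn marker
    }).encode("utf-8")
    request = urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["content"]

if __name__ == "__main__":
    print(ask("Tell me about Adidas."))

If the template is the culprit, the garbage pre-output should disappear with a wrapper like this; if it persists, BOS handling in your build is the next thing to check.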