Replies: 1 comment 6 replies
-
What prompt format are you using?
-
I don't know where else to ask, but this issue is killing me. When I run llama.cpp in the console, I use a command like this: ./main -m models/7B/ggml-model-q4_1.bin -ins --color -ngl 1
Then I type my prompt and receive exactly the output I want.
However, when I try to use llama.cpp in production, for example via ./server -m models/7B/ggml-model-q4_1.bin --port 8001 -ngl 1, the same prompt produces garbage at the beginning of the output.
Example:
alex@M1 llama-test % python3 ll10.py
razor blade technology
The Adidas brand has been synonymous with high-quality sports equipment for decades, and their razor blade technology is no exception. This innovative design feature allows athletes to experience unparalleled performance on the field or court.... etc.
I didn't ask anything about "razor blade technology".
In my case, I'm using the llama-2-chat weights officially requested from Meta, which I quantized to 4-bit myself using llama.cpp.
I've found that this issue shows up with any wrapper around Llama 2. I suppose I'm missing some basics, but as you can see from the file name, this is my tenth attempt at the script, and in every variation I still get this garbage pre-output.
I have seen many answers saying it's a Llama 2 bug. However, it never (100% never) happens when I run llama.cpp in -ins mode on the command line.
Please help me with this issue. Thank you.
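The difference is very likely the prompt format: ./main -ins wraps whatever you type in its own instruction markers before it reaches the model, while ./server passes the prompt from your script through verbatim, so a llama-2-chat model sees an unformatted prompt and free-associates. Below is a minimal sketch of one way to wrap a request, assuming the server's /completion endpoint with "prompt"/"n_predict"/"stop" request fields and a "content" field in the reply (check the server README of your build); the ask helper, SERVER_URL, and the system prompt text are illustrative, and the chat template follows Meta's documented llama-2-chat format.

import json
import urllib.request

SERVER_URL = "http://127.0.0.1:8001/completion"  # matches --port 8001 above

# Llama 2 chat template as documented by Meta. The leading BOS token ("<s>")
# is omitted on the assumption that the server adds it while tokenizing the
# prompt; put it back if your build does not.
LLAMA2_CHAT_TEMPLATE = (
    "[INST] <<SYS>>\n"
    "{system}\n"
    "<</SYS>>\n\n"
    "{user} [/INST]"
)

def ask(user_message: str, system_prompt: str = "You are a helpful assistant.") -> str:
    # Wrap the raw question in the chat template before sending it.
    prompt = LLAMA2_CHAT_TEMPLATE.format(system=system_prompt, user=user_message)
    payload = json.dumps({
        "prompt": prompt,
        "n_predict": 256,
        "stop": ["</s>"],  # stop on the end-of-turn marker
    }).encode("utf-8")
    request = urllib.request.Request(
        SERVER_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["content"]

if __name__ == "__main__":
    print(ask("Tell me about Adidas."))

If the template is the culprit, the garbage pre-output should disappear with a wrapper like this; if it persists, BOS handling in your build is the next thing to check.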