llama.cpp server support for alternate EOS/antiprompt settings to support non-llama prompt formats #4474
SanDiegoDude started this conversation in Ideas
Replies: 1 comment, 1 reply
-
Same problems here. Even with main, anything other than the Llama 2 70B model with the Llama 2 prompt format just starts outputting a conversation without any control. I don't know whether the reverse prompt even works with Mixtral; for me it sometimes keeps going right past the prompt I give it. Adding ### Response: did not stop it from continuing, and it just seems "out of control" compared to 70B with the Llama 2 syntax. I haven't had anything else handle prompting for any length of time without completely falling apart into repetition or continuous output, unless it is the Llama 2 70B chat model.
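For context on the two prompt styles compared above, here is a minimal sketch of each. The system message, whitespace, and instruction text are illustrative assumptions, not settings taken from this thread, and BOS/EOS token handling is left to the loader:

```python
# Sketches of the two prompt formats discussed above.
# The system text, whitespace, and example instruction are assumptions
# for illustration only; BOS/EOS tokens are left to the tokenizer/loader.

def llama2_chat_prompt(system: str, user: str) -> str:
    """Llama 2 chat template, the format the 70B chat model expects."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

def alpaca_prompt(instruction: str) -> str:
    """Alpaca-style '###' template; '### Response:' doubles as the antiprompt."""
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
    )

if __name__ == "__main__":
    print(llama2_chat_prompt("You are a helpful assistant.", "Describe this image."))
    print(alpaca_prompt("Describe this image."))
```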
-
Hi there,
Support for the Obsidian 3B models was added recently; however, attempting to use them in multimodal form with the llama.cpp server is an exercise in frustration because there is no way to set the EOS/antiprompt for the model, so it keeps repeating itself until it caps out on tokens. I'm building a captioning tool that depends on speed, so simply filtering out the response past the first ### isn't a viable option. llama.cpp main supports manually setting --reverse-prompt and even works in instruction mode (which catches the ### properly), but that doesn't work as a server.
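To illustrate the kind of control being requested, here is a minimal client-side sketch against the server's /completion endpoint, mirroring what main's --reverse-prompt does. The host, port, prompt text, and "###" stop string are placeholder assumptions, and whether a given server build honors them for this model is exactly the open question in this post:

```python
# Sketch of a client request to the llama.cpp server's /completion endpoint.
# Host, port, prompt, and the "###" stop string are placeholder assumptions;
# this shows the antiprompt-style control being asked for, analogous to
# main's --reverse-prompt, not a confirmed working configuration.
import json
import urllib.request

payload = {
    "prompt": "### Instruction:\nDescribe this image.\n\n### Response:\n",
    "n_predict": 256,   # cap generation so a runaway model cannot exhaust tokens
    "stop": ["###"],    # stopping string(s) to cut off repeated '###' turns
}

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```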