Chat templates and llama-server #9741
Unanswered · tesseract241 asked this question in Q&A
Hello all, I'm trying to wrap my mind around how to use llama-server.

The chat template wiki page says:

I'm not sure how to read this: is a chat template that is embedded in a gguf file used automatically when one is present?

Now let's assume the model has no chat template and I want to provide my own. Should I still keep the system prompt (wrapped in its special tokens) separate from the user prompt (also wrapped in its special tokens and ending with the token that starts the AI reply), or should I just put everything in the user prompt? A sketch of the two layouts follows below.
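For concreteness, here is a sketch of the two layouts the question describes. The ChatML-style special tokens are purely an assumed example; the actual tokens depend on the model's template.

```python
# Sketch of the two prompt layouts from the question, using ChatML-style
# tokens purely as an example; real tokens depend on the model's template.

system = "You are a helpful assistant."
user = "What is the capital of France?"

# Alternative 1: system prompt and user prompt each wrapped in their own
# special tokens, ending with the sequence that starts the AI reply.
separate = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)

# Alternative 2: everything folded into a single user turn.
combined = (
    f"<|im_start|>user\n{system}\n\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
```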
Replies: 1 comment

It's not separate; there is just one prompt for everything. Put the system prompt first, then append the user prompt, then append the conversation. There is some prompt-template information in gguf files, and it is shown in the console output on startup, but I don't know whether it can be used automatically; I just copied it over manually. A minimal sketch of this recipe follows below.
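As a minimal sketch of that recipe, assuming llama-server is running at http://localhost:8080 and using ChatML-style tokens purely as placeholders (the real template should be copied from the gguf metadata printed at startup), one prompt string can be assembled and sent to the raw /completion endpoint, which takes the prompt verbatim:

```python
import requests  # assumes llama-server is running on localhost:8080

# Manually assemble a single prompt, as the reply suggests: system prompt
# first, then the conversation turns, ending with the sequence that cues
# the assistant's reply. ChatML-style tokens are used here only as an
# example; copy the real template from the console output on startup.
def build_prompt(system: str, turns: list[tuple[str, str]]) -> str:
    parts = [f"<|im_start|>system\n{system}<|im_end|>\n"]
    for role, text in turns:  # role is "user" or "assistant"
        parts.append(f"<|im_start|>{role}\n{text}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # cue the model to answer
    return "".join(parts)

prompt = build_prompt(
    "You are a helpful assistant.",
    [("user", "How do chat templates work in llama-server?")],
)

# POST to the raw completion endpoint; no template is applied on this route.
resp = requests.post(
    "http://localhost:8080/completion",
    json={"prompt": prompt, "n_predict": 128},
)
print(resp.json()["content"])
```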