Dynamic System Prompt #10006
-
I am using the CLI version of llama.cpp right now in conversation mode. My assistant's system prompt is supposed to change over time (after a while it will have access to additional knowledge, or an entirely different personality). To my understanding so far, I should be able to change the system prompt using the llama3 template. Somehow the model seems to ignore a new system prompt in some cases. I am guessing it has to do with the context and that it still "remembers" the previous instruction.

Model:

Example Input message:
E.g. in my next turn, I want to change the personality to someone else (for example, a funny clown). The input could look as follows:
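(Sketch of the kind of injected turn I mean — the exact wording is just an example, assuming the llama3 chat template tokens:)

```
<|eot_id|><|start_header_id|>system<|end_header_id|>

You are a funny clown. Answer every question with a joke.<|eot_id|><|start_header_id|>user<|end_header_id|>

How is the weather today?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```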
In some cases it works, but in other cases it doesn't and the model keeps relating to the previous conversation snippets.
-
Injecting chat template tokens through the user input is likely not going to work as you expect since the structure of the template will be destroyed. There is no easy way to do what you want with `llama-cli`. My recommendation is to use the `llama-server` instead:

```sh
./llama-server -m models/llama-3.1-8b-instruct/ggml-model-q8_0.gguf -c 2048 -ngl 18 -fa -mg 1 --port 8012
```

```sh
curl -s \
    --request POST --url http://127.0.0.1:8012/v1/chat/completions \
    --header "Content-Type: application/json" \
    --data '{
        "messages": [
            { "role": "system",    "content": "End each sentence with a smiley emoji." },
            { "role": "user",      "content": "Hello, how are you today?" },
            { "role": "assistant", "content": "I am functioning properly and ready to assist you, thanks for asking! 😊" },
            { "role": "system",    "content": "End each sentence with a party emoji." },
            { "role": "user",      "content": "Nice to meet you, my name is Georgi." }
        ],
        "cache_prompt": true,
        "top_k": 1
    }' | jq
```
```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "It's great to meet you too, Georgi, I'm happy to chat with you 🎉",
        "role": "assistant"
      }
    }
  ],
  ...
}
```

Note you will need to keep appending the assistant and user messages to each new request and be careful to not overrun the context. In the latter situation, you can start evicting old messages and consider using the new …
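For example, a minimal shell sketch of that request loop, assuming the same endpoint as above (the `ask` helper and the use of `jq` to maintain the history are just one possible approach, not part of llama.cpp itself):

```sh
# Sketch only: keep the full conversation in $MSGS and re-send it with every request.
MSGS='[{"role": "system", "content": "End each sentence with a smiley emoji."}]'

ask() {
  # append the next user turn to the history
  MSGS=$(echo "$MSGS" | jq --arg c "$1" '. + [{"role": "user", "content": $c}]')
  # send the full history and extract the assistant reply
  REPLY=$(jq -n --argjson m "$MSGS" '{messages: $m, cache_prompt: true, top_k: 1}' |
    curl -s --request POST --url http://127.0.0.1:8012/v1/chat/completions \
         --header "Content-Type: application/json" --data @- |
    jq -r '.choices[0].message.content')
  # append the assistant reply so the next request sees it
  MSGS=$(echo "$MSGS" | jq --arg c "$REPLY" '. + [{"role": "assistant", "content": $c}]')
  echo "$REPLY"
}

ask "Hello, how are you today?"
# change the behaviour mid-conversation by appending a new system message
MSGS=$(echo "$MSGS" | jq '. + [{"role": "system", "content": "End each sentence with a party emoji."}]')
ask "Nice to meet you, my name is Georgi."
```

Once the history approaches the context limit, you would drop the oldest non-system messages from `$MSGS` before sending the next request.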