Replies: 2 comments
-
I am trying to understand this too. I am studying main.cpp, deleting everything I do not need so I can restart from the basics. So far I've found a suggestion from SuperMonkeyCollider.
-
Is there a way to load the cache dynamically for the server or llama-cpp-python? It would be very useful for use cases like extracting information based on in-context learning, because prompt processing takes time.
-
Hello,
I'm trying to understand how to load a session from disk for in-context learning. I would like to preprocess the in-context learning prompt, persist it to disk, and then, for each future use, load the saved session and append a new prompt (different each time) to generate a response.
I initially thought this was the purpose of `--prompt-cache`. I expect to be able to run:
./main -ngl 84 -m models/llama-2-7b.Q4_0.gguf -c 4096 -n 400 -s 42 --temp 0.7 --repeat_penalty 1.1 --prompt-cache prompt.cache.bin -f ./prompts/chat-with-bob.txt
to create a cache of the prompt template for in-context learning. I then expect to be able to run:
./main -ngl 84 -m models/llama-2-7b.Q4_0.gguf -c 4096 -n 400 -s 42 --temp 0.7 --repeat_penalty 1.1 --prompt-cache prompt.cache.bin --prompt-cache-ro -f ./chat/default/new-prompt.txt
where new-prompt.txt contains "What is your first name?". Given the cached prompt, I'd expect the answer to be "Bob", and I'd expect the cache file not to be updated because of `--prompt-cache-ro`. However, it seems that I cannot use the file parameter like this with the prompt cache.
This expectation is based on:
https://github.com/ggerganov/llama.cpp/discussions/2110
https://github.com/ggerganov/llama.cpp/issues/1398
I then came across:
https://github.com/ggerganov/llama.cpp/pull/1169
However, I do not see a `--session` parameter available for ./main. I do see `llama_save_session_file` in main.cpp, but I do not see any examples of how to use it for my purposes.
Does anyone have any insights into how I can save an in-context learning template session, and then load it for future use while appending a text file?
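For reference, here is a minimal sketch of how the session functions declared in llama.h (`llama_load_session_file` / `llama_save_session_file`) could be wired up for this. This is not the author's code or main.cpp verbatim: model/context setup is abbreviated, the tokenize/decode loop is elided, and the exact setup signatures vary between llama.cpp versions.

```cpp
// Hedged sketch: cache an in-context-learning prefix with
// llama_save_session_file, then restore it on later runs with
// llama_load_session_file. Setup calls vary across llama.cpp versions.
#include "llama.h"
#include <vector>

int main() {
    llama_backend_init(false); // newer llama.cpp versions take no argument

    llama_model_params mparams = llama_model_default_params();
    llama_model * model = llama_load_model_from_file(
        "models/llama-2-7b.Q4_0.gguf", mparams);

    llama_context_params cparams = llama_context_default_params();
    cparams.n_ctx = 4096;
    llama_context * ctx = llama_new_context_with_model(model, cparams);

    const char * path_session = "prompt.cache.bin";

    // Restore a previously saved session, if any. On success the KV cache
    // is repopulated and session_tokens holds the tokens it covers.
    std::vector<llama_token> session_tokens(cparams.n_ctx);
    size_t n_session = 0;
    if (llama_load_session_file(ctx, path_session,
                                session_tokens.data(), session_tokens.size(),
                                &n_session)) {
        session_tokens.resize(n_session);
    } else {
        session_tokens.clear();
    }

    // ... tokenize (cached template + new user prompt), compare against
    // session_tokens to find the longest matching token prefix, set n_past
    // to that length, evaluate only the remaining tokens, then sample ...

    // On the first run (no session yet), persist the evaluated template
    // tokens so later runs can skip prompt processing entirely:
    // llama_save_session_file(ctx, path_session,
    //                         prompt_tokens.data(), prompt_tokens.size());

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```

As far as I understand, this mirrors what ./main does internally when `--prompt-cache` is set: reuse only applies to the longest common token prefix between the saved session and the new prompt, so the second run's prompt file has to begin with the cached template text for the cache to pay off.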