KV Cache and Context Shifting #8973
Unanswered
hentaitaku
asked this question in
Q&A
Replies: 1 comment
-
It is possible to do so, as we discussed in #5793 But it will be messy to implement for now, so I think we will do it later on. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Hello i try to use a history in python to store my last context for my prompt and when the list gets to big delete all but leave the last 20 entries.
That works great when i use the first 20 entries but after 21 entries and first one gets deleted (Context Shifting).
The KV Cache will eval the complete prompt again.
How can i remove from prompt the first user/assistant entrie and get KV cache working with the Context Shift.
Or is there a way to remove context from prompt after the context leaves the max context size.
Dont like to have my prompt to big so when can i remove context from it and dont break kv cache.
Here some codes i use.
`
"app/llama-server.exe" -t 12 -c 1024 -ngl 33 -m "models/vntl-llama3-8b-hf-q4_k_m.gguf" --port 8003
kccp_json = {
"prompt": prompt,
"temperature": 0.0,
"dynatemp_range": 0.0,
"dynatemp_exponent": 1.0,
"top_k": 100,
"top_p": 0.92,
"min_p": 0.00,
"n_predict": 50,
"n_keep": 1,
"stream": True,
"stop": [],
"tfs_z": 1.0,
"typical_p": 1.0,
"repeat_penalty": 1.04,
"repeat_last_n": 64,
"penalize_nl": True,
"presence_penalty": 0.0,
"frequency_penalty": 0.0,
"mirostat": 0,
"seed": -1,
"ignore_eos": False,
"logit_bias": [],
"n_probs": 0,
"min_keep": 0,
"cache_prompt": True,
"system_prompt": system_prompt,
"samplers": ["top_k", "tfs_z", "typical_p", "top_p", "min_p", "temperature"]
}
[{'role': 'user', 'content': 'そしたらなんか人がめちゃくちゃいっぱいで座れないんです。\r\n'}, {'role': 'assistant', 'content': "There are so many people here that I can't find a seat."}, {'role': 'user', 'content': 'で、よく見たらなんか垂れ幕下がってて、150円引き、とか書いてあるんです。\r\n'}, {'role': 'assistant', 'content': 'Then I look down and see a curtain hanging over it, with a sign saying "150 yen off."'}, {'role': 'user', 'content': 'もうね、アホかと。馬鹿かと。\r\n'}, {'role': 'assistant', 'content': 'I mean, seriously. What the hell?'}, {'role': 'user', 'content': 'お前らな、150円引き如きで普段来てない吉野家に来てんじゃねーよ、ボケが。\r\n'}, {'role': 'assistant', 'content': "Hey, you two, don't come to Yoshinoya just because they're offering 150 yen off."}, {'role': 'user', 'content': 'なんか親子連れとかもいるし。一 家4人で吉野家か。おめでてーな。\r\n'}, {'role': 'assistant', 'content': "There are even families here. Four people in a family, eating at Yoshinoya. That's great."}, {'role': 'user', 'content': 'よーしパパ特盛頼んじゃうぞー、とか言ってるの 。もう見てらんない。\r\n'}, {'role': 'assistant', 'content': '"Okay, I'll order the special." "I can't take it anymore."'}, {'role': 'user', 'content': 'お前らな、150円やるからその席空けろと。\r\n'}, {'role': 'assistant', 'content': "They said they'd give us 150 yen if we moved out of the way."}, {'role': 'user', 'content': '吉野家ってのはな、もっと殺伐と してるべきなんだよ。\r\n'}, {'role': 'assistant', 'content': 'Yoshinoya should be more violent.'}, {'role': 'user', 'content': 'Uの字テーブルの向かいに座った奴といつ喧嘩が始まってもおかしくない、\r\n'}, {'role': 'assistant', 'content': "I sit down across from him at a U-shaped table, and I'm ready to start a fight with him at any moment."}, {'role': 'user', 'content': '刺すか刺されるか、そんな雰囲気がいいんじゃねーか。女子供は、すっこんでろ。\r\n'}, {'role': 'assistant', 'content': "I'm ready to stab him or be stabbed. That's the way it should be. The kids and women should stay out of it."}]
`
Beta Was this translation helpful? Give feedback.
All reactions