Using context shift with models that require a chat template is tricky, because you have to be careful not to destroy the structure of the template. In your tests, you are likely discarding the entire system prompt together with some important control tokens for the chat turns.
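For reference, here is a simplified sketch of the context-shift logic from llama.cpp's `examples/main/main.cpp` (abridged, and the exact KV-cache function names have changed across versions). The key point is that it is purely positional and knows nothing about the chat template:

```cpp
// Simplified context shift, adapted from examples/main/main.cpp.
// Keep the first n_keep tokens, discard half of what follows, and
// shift the rest left. The discard window is chosen by position only,
// so it can cut through the system prompt and chat-turn control tokens.
const int n_left    = n_past - params.n_keep;
const int n_discard = n_left / 2;

// remove tokens [n_keep, n_keep + n_discard) from the KV cache
llama_kv_cache_seq_rm (ctx, 0, params.n_keep, params.n_keep + n_discard);
// shift the surviving tokens left by n_discard positions
llama_kv_cache_seq_add(ctx, 0, params.n_keep + n_discard, n_past, -n_discard);

n_past -= n_discard;
```

If `n_keep` does not cover the full system prompt including its control tokens, the shift slices right through it, which would explain the behavior described below.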
-
I am testing context shift with a small context size (160) and facing some issues. I selected 160 because I wanted context shifting to happen during the second conversation (and not the first).

The conversation went as follows:

As you can see from the output, `Great Wall of China` appearing immediately after `Pyramids of Giza` seems unusual. This issue does not occur when the context size is larger. One possibility is that the small context size caused the loss of the African context, but the way the sentence is structured still feels odd. To test further, I enabled logs in `main.cpp` and increased `n_predict` to 128.

Additionally, I have noticed that sometimes (not always) the next decoded token after a context shift is EOG.
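To pin down the EOG observation, a minimal diagnostic is to check the sampled token right after a shift. This is a hypothetical sketch against the public `llama.h` API (depending on the version, the call is `llama_token_is_eog` or `llama_vocab_is_eog`), meant to be dropped into the generation loop of `main.cpp`:

```cpp
// Hypothetical diagnostic (assumes ctx, model, n_past, and the sampled
// token `id` exist in the surrounding generation loop of main.cpp).
// `shifted` is a flag you would set inside the context-shift branch.
static bool shifted = false; // set to true right after a context shift

if (shifted) {
    if (llama_token_is_eog(model, id)) {
        fprintf(stderr, "EOG sampled immediately after context shift (n_past = %d)\n", n_past);
    }
    shifted = false;
}
```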