Reduce Token Usage of System Prompt #2935
Following observations by GosuCoder in his video, it seems prudent to optimise token usage at every stage. I have devised a methodology that reduces the system prompt footprint by up to 93.5% in my tests, with no observed side effects. For example, the current total system prompt sent to the LLM is over 10,000 tokens; this can be reduced to fewer than 1,000 tokens by having the target LLM optimise/distil the system prompt for its own use.

This has a huge effect on reducing the required context window size, and I have used it to great effect to run local models on my limited 6GB-VRAM GPU, where non-optimised system prompts made the context window requirements too large for me to run a local model at all. I have explained the process in great detail here. In my limited tests I was able to create a sample Python application just as effectively as without the distillation, and with lower token usage.

I suspect it won't work on every model, but you wouldn't want to code with the less-capable models anyway. This could be an optional experimental feature whereby models could be selected "all"/"opt-in"/"opt-out". The entire system prompt should be re-distilled every time the user's custom instructions change, and distilled whenever a new model is used.
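A minimal sketch of the distillation step, assuming an OpenAI-compatible endpoint via the `openai` Python client; the default model name and the wording of the compression instruction are illustrative, not part of any existing Roo feature:

```python
# Sketch: ask the target model to compress its own system prompt.
# Assumes an OpenAI-compatible API; OPENAI_API_KEY is read from the environment.
from openai import OpenAI

client = OpenAI()

DISTILL_INSTRUCTION = (
    "Rewrite the following system prompt for your own use. Preserve every "
    "rule, tool definition, and output format, but express them as tersely "
    "as possible. Return only the rewritten prompt."
)

def distill_system_prompt(full_prompt: str, model: str = "gpt-4o-mini") -> str:
    """Have `model` distil `full_prompt` for its own use.

    Re-run whenever the user's custom instructions change or a different
    model is selected, since the distilled wording is model-specific.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": DISTILL_INSTRUCTION},
            {"role": "user", "content": full_prompt},
        ],
    )
    return response.choices[0].message.content
```

The distilled prompt could then be cached per model and substituted for the full prompt on subsequent requests.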
I often use DeepSeek-V3 as my AI model due to its extremely low cost. However, its API only supports a 64K context window, so I have to be very mindful of token usage.
While using Roo (in code mode only), I noticed that the following tools are never invoked in my workflow:
- `switch_mode`
- `new_task`
- `fetch_instructions`
So, I copied the system prompt and manually removed all references to these tools. I also removed the `mode` information and `ask_followup_question` (as I prefer the AI not to interrupt tasks by asking questions). According to OpenAI's token calculator, this reduced the prompt size by approximately 1,000 tokens.

Still, the prompt felt too large. I then fed it into Gemini 2.5 Pro Exp and asked it to rewrite it in more concise language, producing both English and Chinese versions. This brought the total token count down to under 4,000, the lowest being around 3,000. I tested one of the optimized Chinese prompts by asking for the same "Hello World" HTML page, and it produced an identical result compared to the original prompt, at half the token cost.
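For anyone wanting to measure the savings, here is a small sketch using `tiktoken`, the tokenizer behind OpenAI's calculator; the file names are placeholders, and counts under DeepSeek's or Gemini's tokenizers will differ somewhat, though the relative savings should hold:

```python
# Compare token counts of the original and trimmed system prompts.
# File names are placeholders; cl100k_base matches OpenAI's token calculator.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

with open("system_prompt_original.txt", encoding="utf-8") as f:
    original = f.read()
with open("system_prompt_trimmed.txt", encoding="utf-8") as f:
    trimmed = f.read()

print(f"original: {count_tokens(original)} tokens")
print(f"trimmed:  {count_tokens(trimmed)} tokens")
print(f"saved:    {count_tokens(original) - count_tokens(trimmed)} tokens")
```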
I'll continue testing these new prompts for a while. Meanwhile, it would be great if Roo could offer settings to disable unused tools like the ones mentioned above — this would help reduce unnecessary token usage in the system prompt.
Additionally, I noticed that the `insert_content` and `search_and_replace` tools mix XML and JSON formats in their usage. This feels a bit inelegant; intuitively, it seems more error-prone and could increase the model's cognitive load when interpreting the instructions.
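For illustration, the mixed style embeds a JSON array inside XML tags, roughly like this (reconstructed from memory, so the tag and field names may not be verbatim):

```xml
<insert_content>
<path>src/app.py</path>
<operations>[
  {"start_line": 10, "content": "print('hello')"}
]</operations>
</insert_content>
```

A consistent all-XML shape, e.g. one `<operation>` element per edit, would give the model a single grammar to follow.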