Reduce Token Usage of System Prompt #2935
Following observations by GosuCoder in his video, it seems prudent to optimise token usage at every stage. I have devised a methodology that reduces the system prompt footprint by up to 93.5% in my tests, with no observed side effects. For example, the current total system prompt sent to the LLM is over 10,000 tokens; this can be reduced to fewer than 1,000 tokens by having the target LLM optimise/distil the system prompt for its own use.

This has a huge effect on reducing the required context window size, and I have used it to great effect to run local models on my limited 6GB-VRAM GPU, where non-optimised system prompts made the context window requirements too large for me to run a local model at all. I have explained the process in great detail here. In my limited tests I was able to create a sample Python application just as effectively as without the distillation, and with lower token usage.

I suspect it won't work on every model, but you wouldn't want to code with the less-capable models anyway. This could be an optional experimental feature whereby models could be selected "all"/"opt-in"/"opt-out". The entire system prompt should be re-distilled every time the user's custom instructions change, and distilled whenever a new model is used.
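A minimal sketch of the distillation step, assuming an OpenAI-compatible endpoint via the `openai` Python client; the default model name and the wording of the compression instruction are illustrative, not part of any existing Roo feature:

```python
# Sketch: ask the target model to compress its own system prompt.
# Assumes an OpenAI-compatible API; OPENAI_API_KEY is read from the environment.
from openai import OpenAI

client = OpenAI()

DISTILL_INSTRUCTION = (
    "Rewrite the following system prompt for your own use. Preserve every "
    "rule, tool definition, and output format, but express them as tersely "
    "as possible. Return only the rewritten prompt."
)

def distill_system_prompt(full_prompt: str, model: str = "gpt-4o-mini") -> str:
    """Have `model` distil `full_prompt` for its own use.

    Re-run whenever the user's custom instructions change or a different
    model is selected, since the distilled wording is model-specific.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": DISTILL_INSTRUCTION},
            {"role": "user", "content": full_prompt},
        ],
    )
    return response.choices[0].message.content
```

The distilled prompt could then be cached per model and substituted for the full prompt on subsequent requests.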
I often use DeepSeek-V3 as my AI model due to its extremely low cost. However, its API only supports a 64K context window, so I have to be very mindful of token usage.
While using Roo (in code mode only), I noticed that the following tools are never invoked in my workflow:
- `switch_mode`
- `new_task`
- `fetch_instructions`
So, I copied the system prompt and manually removed all references to these tools. I also removed the `mode` information and `ask_followup_question` (as I prefer the AI not to interrupt tasks by asking questions). According to OpenAI's token calculator, this reduced the prompt size by approximately 1,000 tokens.

Still, the prompt felt too large. I then fed it into Gemini 2.5 Pro Exp and asked it to rewrite it in more concise language, producing both English and Chinese versions. This brought the total token count down to under 4,000, the lowest being around 3,000. I tested one of the optimized Chinese prompts by asking for the same "Hello World" HTML page, and it produced an identical result compared to the original prompt, at half the token cost.
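For anyone wanting to measure the savings, here is a small sketch using `tiktoken`, the tokenizer behind OpenAI's calculator; the file names are placeholders, and counts under DeepSeek's or Gemini's tokenizers will differ somewhat, though the relative savings should hold:

```python
# Compare token counts of the original and trimmed system prompts.
# File names are placeholders; cl100k_base matches OpenAI's token calculator.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(enc.encode(text))

with open("system_prompt_original.txt", encoding="utf-8") as f:
    original = f.read()
with open("system_prompt_trimmed.txt", encoding="utf-8") as f:
    trimmed = f.read()

print(f"original: {count_tokens(original)} tokens")
print(f"trimmed:  {count_tokens(trimmed)} tokens")
print(f"saved:    {count_tokens(original) - count_tokens(trimmed)} tokens")
```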
I'll continue testing these new prompts for a while. Meanwhile, it would be great if Roo could offer settings to disable unused tools like the ones mentioned above — this would help reduce unnecessary token usage in the system prompt.
Additionally, I noticed that the `insert_content` and `search_and_replace` tools mix XML and JSON formats in their usage. This feels a bit inelegant; intuitively, it seems more error-prone and could increase the model's cognitive load when interpreting the instructions.
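For illustration, the mixed style embeds a JSON array inside XML tags, roughly like this (reconstructed from memory, so the tag and field names may not be verbatim):

```xml
<insert_content>
<path>src/app.py</path>
<operations>[
  {"start_line": 10, "content": "print('hello')"}
]</operations>
</insert_content>
```

A consistent all-XML shape, e.g. one `<operation>` element per edit, would give the model a single grammar to follow.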