Add the ability to limit context length for any model, with a setting similar to the one for OpenAI-compatible models. #3703
explaindio started this conversation in Feature Requests
Why This Is Extremely Useful:
Sometimes the context length reported by the provider is inaccurate. When the actual limit is exceeded, the model malfunctions, especially if the router (e.g., Roo) does not enforce the real context limit. The result is often endless output repetition or interrupted coding.
Example:
The free "Maverick" model on OpenRouter (by Chute AI) is listed as supporting a 256k context. However, based on tests and direct feedback from Chute AI, the real context limit is 128k. When the same model is used directly through Chute AI with the context correctly set to 128k, the issue is resolved.
Solution:
Allowing a user-defined maximum context length would prevent these issues entirely.
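For illustration, here is a minimal sketch of how such an override could work, assuming a hypothetical maxContextTokens field on the model profile; none of the names below are existing Roo Code settings or APIs:

```typescript
// Hypothetical per-model override; contextWindow / maxContextTokens are
// illustrative names, not existing Roo Code settings.
interface ModelInfo {
  contextWindow: number;      // context length reported by the provider
  maxContextTokens?: number;  // optional user-defined cap
}

// Use the smaller of what the provider reports and what the user sets.
function effectiveContextWindow(model: ModelInfo): number {
  return model.maxContextTokens !== undefined
    ? Math.min(model.contextWindow, model.maxContextTokens)
    : model.contextWindow;
}

// Example: the provider reports 256k, but the user caps it at 128k.
const maverick: ModelInfo = { contextWindow: 262_144, maxContextTokens: 131_072 };
console.log(effectiveContextWindow(maverick)); // -> 131072
```

A single setting like this would also cover the rate-limit and cost scenarios below, since the user's cap simply wins whenever it is lower than the reported window.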
A second scenario leads to the same problem as above: if a model's context window exceeds the provider's token rate limit, sending a full context can result in repeated outputs or broken sessions.
Example:
The free experimental Gemini 2.5 Pro has a token-per-minute limit of 250k, but the model supports a 1 million token context. When more than 250k tokens are sent in context, the model enters a loop or stalls.
Solution:
Setting a custom context limit of 250k for this model would prevent this issue and significantly improve the user experience.
Excessive context size often adds little value but significantly increases cost.
Example:
In the paid Gemini 2.5 Pro Preview, output quality decreases once the context exceeds 500k tokens. Unless there's a specific need for more, setting the context limit to 500k can reduce costs without sacrificing quality.
Solution:
By enforcing a context cap (e.g., 500k), tools like Roo Code could automatically trim unnecessary context, reducing usage costs by 20–40% for longer tasks.
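As a rough illustration of how a cap could trim context before each request, here is a sketch that drops the oldest turns until the conversation fits under the cap. The character-based token estimate and the function names are assumptions for this example, not Roo Code's actual condensing logic:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Very rough token estimate (~4 characters per token); a real implementation
// would use the provider's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop the oldest non-system messages until the conversation fits the cap.
function trimToContextCap(messages: ChatMessage[], capTokens: number): ChatMessage[] {
  const system = messages.filter(m => m.role === "system");
  const rest = messages.filter(m => m.role !== "system");
  const total = (msgs: ChatMessage[]) =>
    msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  while (rest.length > 1 && total([...system, ...rest]) > capTokens) {
    rest.shift(); // discard the oldest turn first
  }
  return [...system, ...rest];
}
```

A real implementation would use the provider's tokenizer and smarter summarization, but even this naive trimming keeps every request under the user-defined limit.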