Add the ability to limit context length for any model, with a setting similar to the one for OpenAI-compatible models. #3703
explaindio started this conversation in Feature Requests
Why This Is Extremely Useful:
Sometimes the context length reported by the provider is inaccurate. When the actual limit is exceeded, the model malfunctions, especially if the router (e.g., Roo) does not enforce the real context limit. The result is often endless output repetition or interrupted coding.
Example:
The free "Maverick" model on OpenRouter (by Chute AI) is listed as supporting a 256k context. However, based on tests and direct feedback from Chute AI, the real context limit is 128k. When the same model is used directly through Chute AI with the context correctly set to 128k, the issue is resolved.
Solution:
Allowing a user-defined maximum context length would prevent these issues entirely.
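For illustration, here is a minimal sketch of how such an override could work, assuming a hypothetical maxContextTokens field on the model profile; none of the names below are existing Roo Code settings or APIs:

```typescript
// Hypothetical per-model override; contextWindow / maxContextTokens are
// illustrative names, not existing Roo Code settings.
interface ModelInfo {
  contextWindow: number;      // context length reported by the provider
  maxContextTokens?: number;  // optional user-defined cap
}

// Use the smaller of what the provider reports and what the user sets.
function effectiveContextWindow(model: ModelInfo): number {
  return model.maxContextTokens !== undefined
    ? Math.min(model.contextWindow, model.maxContextTokens)
    : model.contextWindow;
}

// Example: the provider reports 256k, but the user caps it at 128k.
const maverick: ModelInfo = { contextWindow: 262_144, maxContextTokens: 131_072 };
console.log(effectiveContextWindow(maverick)); // -> 131072
```

A single setting like this would also cover the rate-limit and cost scenarios below, since the user's cap simply wins whenever it is lower than the reported window.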
A second scenario leads to the same problem as above: if a model's context window exceeds the provider's token rate limit, sending a full context can result in repeated outputs or broken sessions.
Example:
The free experimental Gemini 2.5 Pro has a token-per-minute limit of 250k, but the model supports a 1 million token context. When more than 250k tokens are sent in context, the model enters a loop or stalls.
Solution:
Setting a custom context limit of 250k for this model would prevent this issue and significantly improve the user experience.
Excessive context size often adds little value but significantly increases cost.
Example:
In the paid Gemini 2.5 Pro Preview, output quality decreases once the context exceeds 500k tokens. Unless there's a specific need for more, setting the context limit to 500k can reduce costs without sacrificing quality.
Solution:
By enforcing a context cap (e.g., 500k), tools like Roo Code could automatically trim unnecessary context, reducing usage costs by 20–40% for longer tasks.
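As a rough illustration of how a cap could trim context before each request, here is a sketch that drops the oldest turns until the conversation fits under the cap. The character-based token estimate and the function names are assumptions for this example, not Roo Code's actual condensing logic:

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Very rough token estimate (~4 characters per token); a real implementation
// would use the provider's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Drop the oldest non-system messages until the conversation fits the cap.
function trimToContextCap(messages: ChatMessage[], capTokens: number): ChatMessage[] {
  const system = messages.filter(m => m.role === "system");
  const rest = messages.filter(m => m.role !== "system");
  const total = (msgs: ChatMessage[]) =>
    msgs.reduce((sum, m) => sum + estimateTokens(m.content), 0);

  while (rest.length > 1 && total([...system, ...rest]) > capTokens) {
    rest.shift(); // discard the oldest turn first
  }
  return [...system, ...rest];
}
```

A real implementation would use the provider's tokenizer and smarter summarization, but even this naive trimming keeps every request under the user-defined limit.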