Ability to limit context window to keep API costs manageable #2585
myevolve started this conversation in Feature Requests · Replies: 1 comment
-
Just to chime in with some specific numbers: at ~300k context a single response costs around $0.85! It's not just the raw context length; it's also that the Gemini 2.5 Pro preview doesn't support input caching the way Claude does, so you pay the full input price for the entire context on every request. As a side note, being able to limit the context to, say, 100k would also be great for accuracy. Gemini 2.5, like most long-context models, tends to get a bit confused as its context fills up, leading to a lot of failed diff edits, references to old errors that were fixed 30 minutes ago, etc.
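
  For what it's worth, that kind of cap could be applied client-side before the request ever goes out. Below is a minimal sketch, assuming a rough 4-characters-per-token estimate and a 100k budget; the `Message` shape and the `trimToTokenBudget` name are made up for illustration, not anything in the extension's actual code:

  ```ts
  // Minimal sketch of a client-side context cap. The Message shape, the
  // 4-chars-per-token estimate, and the 100k budget are assumptions for
  // illustration, not the extension's real types or limits.
  interface Message {
    role: "system" | "user" | "assistant";
    content: string;
  }

  // Very rough token estimate; a real tokenizer would be more accurate.
  const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

  function trimToTokenBudget(messages: Message[], budget = 100_000): Message[] {
    if (messages.length === 0) return [];

    // Always keep the system prompt (assumed to be the first message).
    const [system, ...rest] = messages;
    const kept: Message[] = [];
    let used = estimateTokens(system.content);

    // Walk backwards so the most recent turns survive the cut.
    for (let i = rest.length - 1; i >= 0; i--) {
      const cost = estimateTokens(rest[i].content);
      if (used + cost > budget) break;
      kept.unshift(rest[i]);
      used += cost;
    }
    return [system, ...kept];
  }
  ```

  Dropping the oldest turns first (while pinning the system prompt) keeps the most recent instructions in view, which would also help with the stale-error confusion mentioned above.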
-
For example, with Gemini 2.5 Pro the 1M context window means individual API call costs eventually skyrocket, because the whole accumulated context is sent as input even when most of it isn't needed. Is there a way to limit this, and then restore the full context only when we actually need to send it? Letting it just run this way has run my bill up to almost $1K in under a day.
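
  To make the "skyrocket" concrete, here is the rough arithmetic, assuming the preview's published rate of about $2.50 per million input tokens for prompts over 200k (pricing may have changed since):

  $$
  10^6 \ \text{input tokens} \times \frac{\$2.50}{10^6 \ \text{tokens}} \approx \$2.50 \ \text{per request, before output tokens}
  $$

  At that rate, a few hundred full-context requests in a single day already accounts for most of a $1K bill.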