Ability to limit context window to keep API costs manageable #2585
myevolve started this conversation in Feature Requests · Replies: 1 comment
-
Just to chime in with some specific numbers: at ~300k context a single response costs around $0.85! It's not just the raw context length; it's also that the Gemini 2.5 Pro preview doesn't support input caching the way Claude does, so you pay the full input price for the entire context on every request. As a side note, being able to limit the context to, say, 100k would also be great for accuracy. Gemini 2.5, like most long-context models, tends to get a bit confused as its context fills up, leading to a lot of failed diff edits, references to old errors that were fixed 30 minutes ago, etc.
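
  For what it's worth, that kind of cap could be applied client-side before the request ever goes out. Below is a minimal sketch, assuming a rough 4-characters-per-token estimate and a 100k budget; the `Message` shape and the `trimToTokenBudget` name are made up for illustration, not anything in the extension's actual code:

  ```ts
  // Minimal sketch of a client-side context cap. The Message shape, the
  // 4-chars-per-token estimate, and the 100k budget are assumptions for
  // illustration, not the extension's real types or limits.
  interface Message {
    role: "system" | "user" | "assistant";
    content: string;
  }

  // Very rough token estimate; a real tokenizer would be more accurate.
  const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

  function trimToTokenBudget(messages: Message[], budget = 100_000): Message[] {
    if (messages.length === 0) return [];

    // Always keep the system prompt (assumed to be the first message).
    const [system, ...rest] = messages;
    const kept: Message[] = [];
    let used = estimateTokens(system.content);

    // Walk backwards so the most recent turns survive the cut.
    for (let i = rest.length - 1; i >= 0; i--) {
      const cost = estimateTokens(rest[i].content);
      if (used + cost > budget) break;
      kept.unshift(rest[i]);
      used += cost;
    }
    return [system, ...kept];
  }
  ```

  Dropping the oldest turns first (while pinning the system prompt) keeps the most recent instructions in view, which would also help with the stale-error confusion mentioned above.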
-
For example, with Gemini 2.5 Pro the 1M context window means individual API call costs eventually skyrocket, because the whole accumulated context is sent as input even when most of it isn't needed. Is there a way to limit this, and then restore the full context only when we actually need to send it? Letting it just run this way has run my bill up to almost $1K in under a day.
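
  To make the "skyrocket" concrete, here is the rough arithmetic, assuming the preview's published rate of about $2.50 per million input tokens for prompts over 200k (pricing may have changed since):

  $$
  10^6 \ \text{input tokens} \times \frac{\$2.50}{10^6 \ \text{tokens}} \approx \$2.50 \ \text{per request, before output tokens}
  $$

  At that rate, a few hundred full-context requests in a single day already accounts for most of a $1K bill.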