High Token Usage and Model Limits: Best Practices & Feature Requests #451
Replies: 2 comments 1 reply
-
Hey @a-chumagin, thanks so much for this thoughtful discussion. I'll have a think about some of these feature requests and get back to you. Do you have any public examples of this overuse of context that I can check out? Best,
-
Hi @mattzcarey ,
These features would make it much easier to manage costs and stay within provider limits, especially on platforms like Azure OpenAI.
-
High Token Usage and Model Limits: Best Practices & Feature Requests
I’m a user of Shippie and I’ve been running into issues with very high token usage during reviews, which has led to hitting model usage limits (especially on Azure OpenAI) and increased costs. I’d like to open a discussion on best practices for reducing token usage and controlling model limits in Shippie, and to share what I’ve already tried.
My Situation
Efforts I’ve Already Made
- Custom Instructions: I tried using `--customInstructions` to tell the agent to only use certain tools (e.g., `read_diff`, `thinking`, `suggest_change`, `submit_summary`) and avoid others. However, this is not strictly enforced and the agent sometimes ignores it.
- File Exclusion: I used the `--ignore` flag to exclude as many files and directories as possible from the review.
- Lowered maxSteps: I set `"shippieMaxSteps": 20` to reduce the number of agentic steps.
- Minimized rules/docs: I tried to reduce the size of the project rules and documentation files included in the prompt. But this loses one of Shippie's advantages: it's cool when you can add custom rules!
- Cheaper models: I used less expensive models (like GPT-4o) for reviews, but then review quality decreased.
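Taken together, a single review invocation combining these mitigations might look like the sketch below. Only the flag names (`--customInstructions`, `--ignore`) and the `"shippieMaxSteps"` config key come from the steps above; the `shippie review` entry point and the glob patterns are assumptions for illustration.

```
# Hypothetical combined invocation; subcommand and ignore patterns are placeholders.
shippie review \
  --customInstructions "Only use the read_diff, thinking, suggest_change and submit_summary tools." \
  --ignore "dist/**,*.lock"
```

with `"shippieMaxSteps": 20` set in the project config file rather than on the command line.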
Despite these efforts, token usage during reviews remains very high and I still regularly hit model limits.
Questions & Feature Requests
- Could a `--maxPromptTokens` or similar flag be added to cap the prompt/context size?

I believe these features would help a lot of users who are concerned about cost and provider limits, especially on Azure OpenAI and other OpenAI-compatible platforms.
Thanks for your work on Shippie! I’d love to hear any advice, and I’m happy to help test or contribute if these features are on the roadmap.