
feat: implement smart exponential backoff with rate limit headers for OpenAI embedder #6068


Draft · wants to merge 2 commits into base: main

Conversation

daniel-lxs
Collaborator

Summary

This PR implements intelligent rate limit handling for the OpenAI embedder by utilizing response headers to calculate optimal retry delays. The implementation ensures no embedding batches are lost due to rate limiting while minimizing unnecessary wait times.

Problem

Previously, the OpenAI embedder used a simple exponential backoff strategy with a fixed retry limit (3 attempts). This approach had several issues:

  • Batches could be lost if rate limits persisted beyond 3 retries
  • Fixed exponential backoff didn't consider actual rate limit reset times
  • Multiple concurrent batches could hit the API simultaneously when rate limited
  • Excessive retry logs flooded stderr

Solution

1. Smart Backoff Using Rate Limit Headers

  • Extracts OpenAI's rate limit headers:
    • x-ratelimit-limit-requests / x-ratelimit-limit-tokens
    • x-ratelimit-remaining-requests / x-ratelimit-remaining-tokens
    • x-ratelimit-reset-requests / x-ratelimit-reset-tokens
  • Calculates optimal wait time based on the maximum of request and token reset times
  • Adds 10% buffer to account for clock differences
  • Falls back to exponential backoff when headers are unavailable
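The calculation described above can be sketched as follows. The header names are OpenAI's documented `x-ratelimit-*` set; the 10% buffer and the exponential fallback mirror the PR description, but the function names, the duration-string format handling, and the base delay of 500ms are illustrative assumptions, not the PR's actual identifiers.

```typescript
/** Parse OpenAI-style reset durations like "1s", "6m0s", or "850ms" into milliseconds. */
function parseResetDuration(value: string | undefined): number | undefined {
	if (!value) return undefined
	let ms = 0
	let matched = false
	// Durations may combine units, e.g. "6m0s"
	for (const m of value.matchAll(/(\d+(?:\.\d+)?)(ms|s|m|h)/g)) {
		matched = true
		const n = parseFloat(m[1])
		ms += m[2] === "ms" ? n : m[2] === "s" ? n * 1000 : m[2] === "m" ? n * 60_000 : n * 3_600_000
	}
	return matched ? ms : undefined
}

/** Wait for the later of the request/token reset times plus a 10% buffer,
 *  falling back to exponential backoff when headers are unavailable. */
function computeBackoffMs(headers: Record<string, string>, attempt: number): number {
	const resets = [
		parseResetDuration(headers["x-ratelimit-reset-requests"]),
		parseResetDuration(headers["x-ratelimit-reset-tokens"]),
	].filter((v): v is number => v !== undefined)
	if (resets.length > 0) {
		return Math.round(Math.max(...resets) * 1.1) // 10% buffer for clock skew
	}
	return 500 * 2 ** (attempt - 1) // fallback: plain exponential backoff
}
```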

2. Infinite Retries for Rate Limits

  • HTTP 429 (rate limit) errors now retry indefinitely
  • Ensures no embedding batches are lost
  • Other errors (401, 500, etc.) fail immediately without retries
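The retry policy above can be sketched as a small wrapper: retry 429s indefinitely, rethrow everything else immediately, and only log on the first attempt. The wrapper's signature, the `status` property on the error, and the injected backoff callback are illustrative assumptions, not the embedder's real API.

```typescript
/** Retry `fn` forever on HTTP 429; fail fast on any other error. */
async function withRateLimitRetry<T>(fn: () => Promise<T>, backoffMs: (attempt: number) => number): Promise<T> {
	for (let attempt = 1; ; attempt++) {
		try {
			return await fn()
		} catch (err: any) {
			if (err?.status !== 429) throw err // 401, 500, etc. fail immediately
			if (attempt === 1) {
				// Only the first retry is logged, to avoid flooding stderr
				console.warn(`Rate limit hit, retrying in ${backoffMs(attempt)}ms (attempt 1/\u221e)`)
			}
			await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)))
		}
	}
}
```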

3. Global Rate Limit Coordination

  • Implemented mutex-based coordination using async-mutex
  • Prevents multiple concurrent batches from hitting the API when rate limited
  • All batches wait for the global rate limit to clear before proceeding
  • Thread-safe access to rate limit state
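The coordination idea above can be sketched as a shared wait that every batch honors. The PR uses the async-mutex package; this dependency-free gate is an illustrative stand-in with the same effect, and both function names are hypothetical: the first batch to see a 429 sets a module-level wait, and all batches await it before calling the API.

```typescript
let globalWait: Promise<void> | null = null // shared across all concurrent batches

/** Every batch awaits the shared wait (if any) before hitting the API. */
async function waitForGlobalRateLimit(): Promise<void> {
	// Loop in case a new wait was set while we slept; no logging here,
	// matching the "silent waiting" behavior described above.
	while (globalWait) await globalWait
}

/** First batch to see a 429 sets a shared wait that all batches honor. */
function setGlobalRateLimit(delayMs: number): void {
	if (globalWait) return // a wait is already in progress
	const p: Promise<void> = new Promise((resolve) =>
		setTimeout(() => {
			if (globalWait === p) globalWait = null // clear only our own wait
			resolve()
		}, delayMs),
	)
	globalWait = p
}
```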

4. Reduced Logging

  • Only logs rate limit warnings on first retry
  • Silent waiting during global rate limit periods
  • Prevents stderr flooding

Code Changes

Modified Files:

  • src/services/code-index/embedders/openai.ts

    • Added rate limit header extraction
    • Implemented smart backoff calculation
    • Added global rate limit state with mutex
    • Modified retry logic for infinite retries on 429
    • Reduced logging frequency
  • src/services/code-index/embedders/__tests__/openai.spec.ts

    • Updated tests for new retry behavior
    • Added tests for smart backoff calculation
    • Added tests for rate limit header parsing
    • Reset global state in beforeEach

Testing

All existing tests pass with the following updates:

  • Rate limit errors are tested to retry indefinitely
  • Smart backoff calculation is tested with various header formats
  • Non-rate-limit errors are tested to fail immediately
  • Global rate limit coordination is implicitly tested

Example Log Output

Before:

Rate limit hit, retrying in 500ms (attempt 1/3)
Rate limit hit, retrying in 1000ms (attempt 2/3)
Rate limit hit, retrying in 2000ms (attempt 3/3)
[DirectoryScanner] Error processing batch: Failed to create embeddings after 3 attempts

After:

Rate limit hit, retrying in 33000ms (attempt 1/∞)
Rate limits - Requests: 0/60, Tokens: 0/150000
[Silent waiting for subsequent retries]

Benefits

  1. No Data Loss: Infinite retries ensure all batches are eventually processed
  2. Optimal Throughput: Smart backoff minimizes wait times based on actual rate limits
  3. Better Resource Usage: Mutex prevents thundering herd problem
  4. Cleaner Logs: Reduced logging prevents stderr flooding
  5. Consistent Behavior: Aligns with openai-compatible embedder implementation

Breaking Changes

None. The public API is unchanged; only the internal retry behavior has changed.

Related Issues

  • Fixes rate limit handling issues where batches were lost after 3 retries
  • Addresses log flooding during rate limiting periods
  • Implements feature parity with openai-compatible embedder

Checklist

  • Code follows project style guidelines
  • Tests have been added/updated
  • All tests pass
  • No breaking changes to public API
  • Documentation has been updated (inline comments)

… OpenAI embedder

- Extract rate limit headers from OpenAI API responses
- Calculate optimal wait times based on reset headers
- Implement infinite retries for rate limit errors (HTTP 429)
- Add mutex-based global rate limit coordination
- Reduce logging to prevent stderr flooding
- Update tests for new retry behavior
@daniel-lxs daniel-lxs moved this from Triage to PR [Needs Prelim Review] in Roo Code Roadmap Jul 22, 2025
@daniel-lxs daniel-lxs moved this from PR [Needs Prelim Review] to PR [Draft / In Progress] in Roo Code Roadmap Jul 22, 2025

delve-auditor bot commented Jul 22, 2025

No security or compliance issues detected. Reviewed everything up to 2d15f06.

Security Overview
  • 🔎 Scanned files: 2 changed file(s)
Detected Code Changes

  • Enhancement (openai.ts): implement smart exponential backoff with rate limit headers; add mutex-based global rate limit coordination; reduce logging for rate limits
  • Enhancement (openai.spec.ts): update tests for new retry behavior
  • Bug Fix (check-services.sh): update service start command references
  • Bug Fix (package.json): fix evals command
  • Bug Fix (new-run.tsx): add missing unit to timeout display
  • Configuration (package.json): update service commands
  • Configuration (migrations/*): remove migration files

Reply to this PR with @delve-auditor followed by a description of what change you want and we'll auto-submit a change to this PR to implement it.

@daniel-lxs
Collaborator Author

@roomote-agent use switch_mode to switch to pr-fixer mode and fix the unit tests on this PR, do not create a new PR.


roomote bot commented Jul 22, 2025

Hi @daniel-lxs! I see your request to fix the unit tests on this PR. I notice the platform-unit-test checks are failing on both Ubuntu and Windows. Let me check out the branch and investigate the failing tests.

- Switch to real timers for non-rate-limit error retry tests
- Prevents async promise rejections from being detected as unhandled
- Fixes failing CI tests on Ubuntu and Windows platforms

roomote bot commented Jul 22, 2025

Hi @daniel-lxs! I've successfully fixed the unit tests on this PR.

Summary of changes:

  • Fixed unhandled promise rejections in the OpenAI embedder tests
  • Modified two test cases that were testing non-rate-limit error retries to use real timers instead of fake timers
  • This prevents async promise rejections from being detected as unhandled by Vitest

Results:

✅ All CI checks are now passing
✅ platform-unit-test (ubuntu-latest) - PASS
✅ platform-unit-test (windows-latest) - PASS

The fix was minimal and only affected the test implementation, not the actual embedder logic. The tests still verify the same behavior but now handle the async nature of the retries properly.

Projects
Status: PR [Draft / In Progress]
3 participants