feat: add prompt caching support for LiteLLM (#5791) #6074


Open
wants to merge 1 commit into base: main

Conversation

MuriloFP
Collaborator

@MuriloFP commented Jul 22, 2025

Related GitHub Issue

Closes: #5791

Roo Code Task Context (Optional)

No Roo Code task context for this PR

Description

This PR implements prompt caching support for LiteLLM, allowing users to benefit from reduced costs and improved response times when using models that support prompt caching (like Claude 3.7).

Key implementation details:

  • Added litellmUsePromptCache boolean option to provider settings schema
  • Modified the LiteLLM handler to apply cache control markers (cache_control) to the system prompt and the last two user messages when the feature is enabled (a rough sketch follows this list)
  • Enhanced usage tracking to properly capture cache read/write tokens from LiteLLM responses, including alternative field names
  • Added UI checkbox that only appears when the selected model supports prompt caching
  • Reused existing translation keys (enablePromptCaching and enablePromptCachingTitle) to maintain consistency across all supported languages
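
As a rough illustration of the cache-control placement described above (a minimal sketch, not the actual handler code in lite-llm.ts; the message shape and helper name are assumptions):

type CacheControl = { type: "ephemeral" }

interface ChatMessage {
	role: "system" | "user" | "assistant"
	content: string
	cache_control?: CacheControl
}

// Hypothetical helper: mark the system prompt and the last two user messages
// as cacheable when the user has enabled litellmUsePromptCache.
function applyPromptCaching(messages: ChatMessage[], usePromptCache: boolean): ChatMessage[] {
	if (!usePromptCache) return messages

	const result = messages.map((m) => ({ ...m }))

	// Cache the system prompt.
	const system = result.find((m) => m.role === "system")
	if (system) system.cache_control = { type: "ephemeral" }

	// Cache the last two user messages so the shared conversation prefix is reused.
	const userIndexes = result
		.map((m, i) => (m.role === "user" ? i : -1))
		.filter((i) => i !== -1)
		.slice(-2)
	for (const i of userIndexes) {
		result[i].cache_control = { type: "ephemeral" }
	}

	return result
}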

Design choices:

  • Followed the same caching pattern as the Anthropic provider (caching system prompt + last 2 user messages)
  • Made the feature opt-in via a checkbox so users control when caching is used (a sketch of the conditional checkbox follows this list)
  • Kept the test focused on the critical functionality, in line with project patterns
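
A rough sketch of that conditional, opt-in checkbox (component and prop names other than litellmUsePromptCache and supportsPromptCache are illustrative, not copied from LiteLLM.tsx):

import { VSCodeCheckbox } from "@vscode/webview-ui-toolkit/react"

interface PromptCachingCheckboxProps {
	// Derived from the selected model's supportsPromptCache flag.
	modelSupportsPromptCache: boolean
	litellmUsePromptCache?: boolean
	onChange: (enabled: boolean) => void
}

// Render the opt-in control only when the selected model supports prompt caching.
export function PromptCachingCheckbox({
	modelSupportsPromptCache,
	litellmUsePromptCache,
	onChange,
}: PromptCachingCheckboxProps) {
	if (!modelSupportsPromptCache) {
		return null
	}

	return (
		<VSCodeCheckbox
			checked={!!litellmUsePromptCache}
			onChange={(e: any) => onChange(!!e.target?.checked)}>
			Enable prompt caching
		</VSCodeCheckbox>
	)
}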

Test Procedure

Automated Testing:

  • Added a unit test in src/api/providers/__tests__/lite-llm.spec.ts (a simplified sketch follows this list) that verifies:
    • Cache control headers are properly added when litellmUsePromptCache is enabled
    • Cache tokens are correctly tracked in usage data
    • The feature respects the model's prompt caching support capability
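
A simplified vitest-style sketch of that check, reusing the hypothetical applyPromptCaching helper sketched in the description above rather than the real handler and mocks:

import { describe, expect, it } from "vitest"

// applyPromptCaching here refers to the illustrative helper from the sketch above.
describe("LiteLLM prompt caching (sketch)", () => {
	it("marks the system prompt and the last two user messages when enabled", () => {
		const messages = [
			{ role: "system" as const, content: "You are Roo." },
			{ role: "user" as const, content: "first question" },
			{ role: "assistant" as const, content: "first answer" },
			{ role: "user" as const, content: "second question" },
			{ role: "user" as const, content: "follow-up" },
		]

		const result = applyPromptCaching(messages, true)

		// System prompt and the last two user messages carry the ephemeral marker...
		expect(result[0]).toMatchObject({ cache_control: { type: "ephemeral" } })
		expect(result[3]).toMatchObject({ cache_control: { type: "ephemeral" } })
		expect(result[4]).toMatchObject({ cache_control: { type: "ephemeral" } })

		// ...while earlier user messages are left untouched.
		expect(result[1].cache_control).toBeUndefined()
	})
})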

Manual Testing Steps:

  1. Configure LiteLLM as your provider with a model that supports prompt caching (e.g., Claude 3.7)
  2. Navigate to Settings > Providers > LiteLLM
  3. Verify the "Enable prompt caching" checkbox appears
  4. Enable the checkbox and save settings
  5. Start a conversation and monitor the LiteLLM logs/dashboard
  6. Verify that cache hits/misses are being recorded
  7. Check that the usage tracking in Roo Code shows cache read/write tokens

Test Command:

cd src && npx vitest run api/providers/__tests__/lite-llm.spec.ts

Pre-Submission Checklist

  • Issue Linked: This PR is linked to an approved GitHub Issue (see "Related GitHub Issue" above).
  • Scope: My changes are focused on the linked issue (one major feature/fix per PR).
  • Self-Review: I have performed a thorough self-review of my code.
  • Testing: New and/or updated tests have been added to cover my changes (if applicable).
  • Documentation Impact: I have considered if my changes require documentation updates (see "Documentation Updates" section below).
  • Contribution Guidelines: I have read and agree to the Contributor Guidelines.

Screenshots / Videos

Before: The LiteLLM settings page shows only Base URL, API Key, and Model selection.

After: When a model that supports prompt caching is selected, an additional "Enable prompt caching" checkbox appears with a description.

Note: The checkbox only appears for models that have supportsPromptCache: true in their model info.

Documentation Updates

  • No documentation updates are required.

The feature is self-explanatory through the UI, using existing translation keys that are already documented.

Additional Notes

This implementation follows the same approach as the referenced Cline commit but adapts it to Roo Code's architecture. The main difference is that we reuse existing translation keys instead of creating new ones, which ensures all languages are supported without additional translation work.

Get in Touch

@MuriloFP


Important

Adds prompt caching support for LiteLLM, including schema updates, handler modifications, UI changes, and tests.

  • Behavior:
    • Adds a litellmUsePromptCache boolean to the provider settings schema in provider-settings.ts (a minimal schema sketch follows this list).
    • Modifies LiteLLMHandler in lite-llm.ts to add cache control markers to the system prompt and the last two user messages when caching is enabled.
    • Tracks cache read/write tokens in LiteLLMHandler.
  • UI:
    • Adds a checkbox for enabling prompt caching in LiteLLM.tsx, visible only for models supporting caching.
  • Testing:
    • Adds unit test in lite-llm.spec.ts to verify cache control headers and token tracking when caching is enabled.
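
A minimal sketch of the schema addition, assuming a zod-based settings schema (the surrounding fields are illustrative; only litellmUsePromptCache comes from this PR):

import { z } from "zod"

// Illustrative fragment only: the real provider-settings.ts schema is much larger.
const liteLLMSettingsSchema = z.object({
	litellmBaseUrl: z.string().optional(),
	litellmApiKey: z.string().optional(),
	litellmModelId: z.string().optional(),
	// New opt-in flag: add cache control markers when the selected model supports prompt caching.
	litellmUsePromptCache: z.boolean().optional(),
})

type LiteLLMSettings = z.infer<typeof liteLLMSettingsSchema>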

This description was created by Ellipsis for d460f43.

- Add litellmUsePromptCache configuration option to provider settings
- Implement cache control headers in LiteLLM handler when enabled
- Add UI checkbox for enabling prompt caching (only shown for supported models)
- Track cache read/write tokens in usage data (a usage-normalization sketch follows below)
- Add comprehensive test for prompt caching functionality
- Reuse existing translation keys for consistency across languages

This allows LiteLLM users to benefit from prompt caching with supported models
like Claude 3.7, reducing costs and improving response times.
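
As a hedged illustration of the usage tracking mentioned above, the handler might normalize cache token counts along these lines (the response field names shown are assumptions about what LiteLLM backends return, not taken from the diff):

// Sketch: pull cache read/write token counts out of a LiteLLM usage payload,
// checking both Anthropic-style top-level fields and an OpenAI-style
// prompt_tokens_details fallback.
interface LiteLLMUsage {
	prompt_tokens?: number
	completion_tokens?: number
	cache_creation_input_tokens?: number
	cache_read_input_tokens?: number
	prompt_tokens_details?: { cached_tokens?: number }
}

function extractCacheTokens(usage: LiteLLMUsage): { cacheWriteTokens: number; cacheReadTokens: number } {
	const cacheWriteTokens = usage.cache_creation_input_tokens ?? 0
	const cacheReadTokens = usage.cache_read_input_tokens ?? usage.prompt_tokens_details?.cached_tokens ?? 0
	return { cacheWriteTokens, cacheReadTokens }
}
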
@MuriloFP MuriloFP requested review from mrubens, cte and jr as code owners July 22, 2025 18:51
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. enhancement New feature or request labels Jul 22, 2025

expect(createCall.messages[lastUserIdx]).toMatchObject({
	cache_control: { type: "ephemeral" },
})

Consider adding an assertion for the second last user message as well, to fully verify that cache control headers are applied to both the last two user messages.
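
A sketch of the suggested extra assertion, assuming the test also tracks the index of the second-to-last user message in a hypothetical secondLastUserIdx variable:

// Hypothetical companion assertion for the second-to-last user message.
expect(createCall.messages[secondLastUserIdx]).toMatchObject({
	cache_control: { type: "ephemeral" },
})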

@hannesrudolph hannesrudolph added the Issue/PR - Triage New issue. Needs quick review to confirm validity and assign labels. label Jul 22, 2025