
Feature request: Cache control support to BedrockLLMAgent, AgentTools #382

@manikandan-m2

Description


Use case

Summary
Introduce a prompt-caching option that caches the static portion of prompts, documents, and queries to reduce latency and minimize cost.

Motivation
When building applications that rely on large prompt templates, many parts of the prompt (system instructions, reference documents, metadata, etc.) remain static across multiple requests. Currently, these repeated static tokens are re-sent and re-processed for every query, which:

Increases latency due to redundant processing.
Leads to higher costs since repeated tokens contribute to billable usage.
Adds unnecessary overhead when only the user’s dynamic query changes.

Solution/User Experience

Proposed Solution

  • Provide an opt-in cache_control flag (or similar) in the API to enable prompt caching.

  • Allow cache checkpoints to be placed in the prompt: these mark the end of the static portion (the prefix) that can be cached.

  • Only cache the prefix if it meets a minimum token count requirement.
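The opt-in flag and minimum-token gate described above could look roughly like the following (a minimal sketch; `build_cached_prompt`, `MIN_PREFIX_TOKENS`, and the message shapes are illustrative, not an existing API):

```python
# Sketch of the proposed opt-in caching behavior. All names here
# (build_cached_prompt, MIN_PREFIX_TOKENS) are illustrative, not part
# of any existing API.

MIN_PREFIX_TOKENS = 1024  # hypothetical minimum size for a cacheable prefix


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)


def build_cached_prompt(static_parts: list[str], user_query: str,
                        use_prompt_cache: bool = False) -> list[dict]:
    """Assemble messages, inserting a cache checkpoint after the static
    prefix only when caching is enabled and the prefix is large enough."""
    messages = [{"role": "system", "content": part} for part in static_parts]

    prefix_tokens = sum(estimate_tokens(p) for p in static_parts)
    if use_prompt_cache and prefix_tokens >= MIN_PREFIX_TOKENS:
        # The checkpoint marks the end of the cacheable (static) portion.
        messages.append({"cache_control": {"type": "checkpoint"}})

    messages.append({"role": "user", "content": user_query})
    return messages
```

With a short prefix the checkpoint is skipped entirely, so callers pay no caching overhead for small prompts.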


Key Details

Component | Description
-- | --
Static portion | System messages, reference documents, and instructions that remain unchanged across many queries.
Dynamic portion | The user query and changing context; comes after the checkpoint.
Cache checkpoint | A marker in the prompt that demarcates the end of static content.
Token thresholds | Minimum tokens required in the prefix; a limit on the number of cache checkpoints per request.
TTL / expiration | A cached prefix expires if not reused within a set time window.
Hit / miss behavior | On a new request, if the prefix matches a cached one it is a cache hit (processing is reused); otherwise it is a miss.

API Sketch

```json
{
  "model": "...",
  "use_prompt_cache": true,
  "cache_checkpoints": [
    { "location": "system", "after_message_index": 0 }
  ],
  "messages": [
    { "role": "system", "content": "Static instructions..." },
    {
      "role": "system",
      "content": {
        "text": "Long document content...",
        "cache_control": { "type": "checkpoint" }
      }
    },
    { "role": "user", "content": "User query here" }
  ]
}
```
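For comparison, Amazon Bedrock's Converse API already exposes a similar mechanism through `cachePoint` content blocks, which is the hook a BedrockLLMAgent option would likely build on. A rough mapping of the sketch above onto a Converse-style request dict (no network call; exact support varies by model, and the field values here are placeholders):

```python
# Sketch: mapping the proposed API onto a Bedrock Converse-style request.
# The cachePoint block mirrors Bedrock's prompt-caching mechanism; verify
# model support before relying on it. Values are placeholders.

def to_converse_request(model_id: str, static_doc: str, user_query: str) -> dict:
    return {
        "modelId": model_id,
        "system": [
            {"text": "Static instructions..."},
            {"text": static_doc},
            # Everything above this block forms the cacheable prefix.
            {"cachePoint": {"type": "default"}},
        ],
        "messages": [
            {"role": "user", "content": [{"text": user_query}]},
        ],
    }
```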

Benefits

  • Lower latency (skip re-processing static content)

  • Reduced cost (fewer input tokens billable)

  • Better performance for use cases with large or repeated context (documents, few-shot examples, system prompts)

Alternative solutions
