Description
Use case
Summary
Introduce a prompt caching option that allows caching of the static portion of prompts, documents, and queries to optimize performance, reduce latency, and minimize costs.
Motivation
When building applications that rely on large prompt templates, many parts of the prompt (system instructions, reference documents, metadata, etc.) remain static across multiple requests. Currently, these repeated static tokens are re-sent and re-processed for every query, which:
- Increases latency due to redundant processing.
- Leads to higher costs, since repeated tokens contribute to billable usage.
- Adds unnecessary overhead when only the user's dynamic query changes.
Solution/User Experience
Proposed Solution
- Provide an opt-in cache_control flag (or similar) in the API to enable prompt caching.
- Allow cache checkpoints to be placed in the prompt: these mark the end of the static portion (the prefix) that can be cached.
- Only cache the prefix if it meets a minimum token count requirement (a rough sketch of this check follows the list).
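The last two points could interact roughly as in the sketch below. This is only an illustrative Python sketch under assumptions that are not part of the proposal: the `Message` structure, `count_tokens`, and `MIN_CACHEABLE_TOKENS` are hypothetical names, and the real threshold and tokenizer would be implementation-defined.

```python
# Hypothetical sketch: deciding whether a prompt prefix is cacheable.
# All names here are illustrative, not an existing API.
from dataclasses import dataclass
from typing import Optional

MIN_CACHEABLE_TOKENS = 1024  # assumed minimum prefix size; actual value TBD

@dataclass
class Message:
    role: str
    content: str
    cache_checkpoint: bool = False  # marks the end of the static prefix

def count_tokens(text: str) -> int:
    # Placeholder tokenizer; a real implementation would use the model's tokenizer.
    return len(text.split())

def cacheable_prefix(messages: list[Message]) -> Optional[list[Message]]:
    """Return the static prefix up to the last checkpoint, if it is large enough to cache."""
    checkpoint_idx = None
    for i, msg in enumerate(messages):
        if msg.cache_checkpoint:
            checkpoint_idx = i
    if checkpoint_idx is None:
        return None  # no checkpoint placed, nothing to cache
    prefix = messages[: checkpoint_idx + 1]
    prefix_tokens = sum(count_tokens(m.content) for m in prefix)
    return prefix if prefix_tokens >= MIN_CACHEABLE_TOKENS else None
```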
Key Details
API Sketch
{ "model": "...", "use_prompt_cache": true, "cache_checkpoints": [ { "location": "system", "after_message_index": 0 } ], "messages": [ { "role": "system", "content": "Static instructions..." }, { "role": "system", "content": { "text": "Long document content...", "cache_control": { "type": "checkpoint" } } }, { "role": "user", "content": "User query here" } ] }
Benefits
- Lower latency (skip re-processing static content)
- Reduced cost (fewer billable input tokens)
- Better performance for use cases with large or repeated context (documents, few-shot examples, system prompts)