# Assistant: Anthropic prompt caching extension API (#8336)
This PR makes it possible for extensions to manually define cache
breakpoints in every location Anthropic supports, except tool
definitions (although tools will often be cached anyway: an Anthropic cache
breakpoint caches the entire prefix before it, and tool definitions precede
the system prompt in that prefix, so a system prompt breakpoint covers them).
Addresses #8325.
This PR also moves the user context message from _before_ the user query
to _after_ it, for better prompt caching. @wch mentioned that he noticed no
changes to model responses when experimenting with the context/query
order, but we should double-check.
I cherry-picked an upstream commit to bring in updates to
`LanguageModelDataPart` so that we can implement this the same way the
Copilot extension does. The added benefit is that once the
`LanguageModelDataPart` API proposal is accepted, extensions like
`shiny-vscode` will be able to set cache breakpoints for Anthropic
models contributed by both the Copilot extension and Positron Assistant.
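For illustration, here is a minimal sketch of what setting a breakpoint could look like from an extension, written against the proposed `LanguageModelDataPart` API. The `cache_control` mime type, the `{ type: 'ephemeral' }` payload, and the helper names are assumptions modeled on my understanding of the Copilot extension's convention, not a finalized contract:

```ts
import * as vscode from 'vscode';

// Hypothetical helper: a data part that marks an Anthropic cache breakpoint.
// The 'cache_control' mime type and ephemeral payload are assumptions based on
// the Copilot extension's convention, not a published contract.
function cacheBreakpointPart(): vscode.LanguageModelDataPart {
	const payload = new TextEncoder().encode(JSON.stringify({ type: 'ephemeral' }));
	return new vscode.LanguageModelDataPart(payload, 'cache_control');
}

async function sendWithCachedContext(
	model: vscode.LanguageModelChat,
	context: string, // large, stable content worth caching
	query: string,   // short, frequently changing user query
	token: vscode.CancellationToken
) {
	const messages = [
		// A breakpoint after the large context asks the provider to cache
		// everything up to and including this message.
		vscode.LanguageModelChatMessage.User([
			new vscode.LanguageModelTextPart(context),
			cacheBreakpointPart(),
		]),
		vscode.LanguageModelChatMessage.User(query),
	];
	return model.sendRequest(messages, {}, token);
}
```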
### Release Notes
#### New Features
- Extensions can set Anthropic prompt cache breakpoints in the message
history (#8325).
#### Bug Fixes
- N/A
### QA Notes
Since this PR moves the user context message from _before_ the user
query to _after_ it, double-check that the quality of responses is
roughly the same as before.
In the cases below, if caching is working, you should see logs
indicating a cache write followed by cache reads: nonzero
`cache_creation_input_tokens` on the first request, then nonzero
`cache_read_input_tokens` on subsequent requests. For example:
```
2025-06-27 18:58:33.965 [debug] [anthropic] Adding cache breakpoint to text part. Source: User message 0
2025-06-27 18:58:40.010 [debug] [anthropic] SEND messages.stream [req_011CQZ5f35yWxARapzaM565V]: model: claude-3-5-sonnet-latest; cache options: default; tools: <snip>; tool choice: {"type":"auto"}; system chars: 0; user messages: 1; user message characters: 151384; assistant messages: 0; assistant message characters: 2
2025-06-27 18:58:41.896 [debug] [anthropic] RECV messages.stream [req_011CQZ5f35yWxARapzaM565V]: usage: {"input_tokens":4,"cache_creation_input_tokens":45353,"cache_read_input_tokens":0,"output_tokens":74,"service_tier":"standard"}
2025-06-27 18:59:05.508 [debug] [anthropic] Adding cache breakpoint to text part. Source: User message 0
2025-06-27 18:59:07.680 [debug] [anthropic] SEND messages.stream [req_011CQZ5hNZZE1XbxccZ8pvh6]: model: claude-3-5-sonnet-latest; cache options: default; tools: <snip>; tool choice: {"type":"auto"}; system chars: 0; user messages: 1; user message characters: 151384; assistant messages: 0; assistant message characters: 2
2025-06-27 18:59:14.208 [debug] [anthropic] RECV messages.stream [req_011CQZ5hNZZE1XbxccZ8pvh6]: usage: {"input_tokens":4,"cache_creation_input_tokens":0,"cache_read_input_tokens":45353,"output_tokens":289,"service_tier":"standard"}
```
Step-by-step instructions:
1. Verify that Positron Assistant participants cache write/read the last 2
user messages when using Anthropic models.
2. Verify that Positron Assistant participants behave as before for Vercel
models (e.g. after disabling `positron.assistant.useAnthropicSdk` and
restarting; see the settings snippet after this list).
3. Testing the Shiny extension in Positron requires a bit more setup:
   1. Start a Positron dev instance on the
   `feature/anthropic-cache-messages` branch.
   2. In Positron, open the Shiny extension repo at the branch in
   posit-dev/shiny-vscode#94. Open the
   `src/extension.ts` file and press F5 to start debugging.
   3. Try the `@shiny` participant in the Positron Assistant chat pane,
   with both Anthropic and Vercel models.
4. The Shiny extension can also be tested in VS Code by following the
same steps. There will be no caching, but nothing should break.
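For step 2, a minimal `settings.json` change looks like the following; the setting name comes from this PR's QA notes above:

```jsonc
{
	// Disable the Anthropic SDK path so that Vercel models are used;
	// restart Positron after changing this setting.
	"positron.assistant.useAnthropicSdk": false
}
```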
@:assistant