
Commit 03ae7a5

Assistant: Anthropic prompt caching extension API (#8336)
This PR makes it possible for extensions to manually define cache breakpoints everywhere that Anthropic supports them, except tool definitions (although tools will often be cached via system prompt cache breakpoints). Addresses #8325.

This PR also moves the user context message from _before_ the user query to _after_ for better prompt caching. @wch mentioned that he noticed no changes to model responses when experimenting with the context/query order, but we should double-check.

I cherry-picked an upstream commit to bring in updates to `LanguageModelDataPart` so that we can implement this in the same way as the Copilot extension. That gives us the added benefit that when the `LanguageModelDataPart` API proposal is accepted, extensions like `shiny-vscode` will be able to set cache breakpoints for Anthropic models contributed by both the Copilot extension and Positron Assistant.

### Release Notes

#### New Features

- Extensions can set Anthropic prompt cache breakpoints in the message history (#8325).

#### Bug Fixes

- N/A

### QA Notes

Since this PR moves the user context message from _before_ the user query to _after_ for better prompt caching, we should double-check that the quality of responses is roughly the same.

In the cases below, if caching is working, you should see logs indicating cache writes followed by cache reads, for example:

```
2025-06-27 18:58:33.965 [debug] [anthropic] Adding cache breakpoint to text part. Source: User message 0
2025-06-27 18:58:40.010 [debug] [anthropic] SEND messages.stream [req_011CQZ5f35yWxARapzaM565V]: model: claude-3-5-sonnet-latest; cache options: default; tools: <snip>; tool choice: {"type":"auto"}; system chars: 0; user messages: 1; user message characters: 151384; assistant messages: 0; assistant message characters: 2
2025-06-27 18:58:41.896 [debug] [anthropic] RECV messages.stream [req_011CQZ5f35yWxARapzaM565V]: usage: {"input_tokens":4,"cache_creation_input_tokens":45353,"cache_read_input_tokens":0,"output_tokens":74,"service_tier":"standard"}
2025-06-27 18:59:05.508 [debug] [anthropic] Adding cache breakpoint to text part. Source: User message 0
2025-06-27 18:59:07.680 [debug] [anthropic] SEND messages.stream [req_011CQZ5hNZZE1XbxccZ8pvh6]: model: claude-3-5-sonnet-latest; cache options: default; tools: <snip>; tool choice: {"type":"auto"}; system chars: 0; user messages: 1; user message characters: 151384; assistant messages: 0; assistant message characters: 2
2025-06-27 18:59:14.208 [debug] [anthropic] RECV messages.stream [req_011CQZ5hNZZE1XbxccZ8pvh6]: usage: {"input_tokens":4,"cache_creation_input_tokens":0,"cache_read_input_tokens":45353,"output_tokens":289,"service_tier":"standard"}
```

Step-by-step instructions:

1. Verify that Positron Assistant participants cache write/read the last 2 user messages when using Anthropic models.
2. Verify that Positron Assistant participants behave as before for Vercel models (e.g. after disabling `positron.assistant.useAnthropicSdk` and restarting).
3. Testing the Shiny extension in Positron requires a bit more setup:
   1. Start a Positron dev instance at branch `feature/anthropic-cache-messages`.
   2. In Positron, open the Shiny extension repo at the branch in posit-dev/shiny-vscode#94. Open the `src/extension.ts` file and press F5 to start debugging.
   3. Try the `@shiny` participant in the Positron Assistant chat pane, with Anthropic and Vercel models.
4. Similarly, the Shiny extension can be tested in VSCode by following the same steps as above. There will be no caching, but nothing should break.
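For illustration, here is a minimal sketch of how an extension might mark a cache breakpoint once the `LanguageModelDataPart` proposal lands. The `'cache_control'` mime type, the JSON payload, the `(data, mimeType)` constructor order, and the cast are all assumptions for the sketch; the real wire format is whatever `isCacheBreakpointPart`/`parseCacheBreakpoint` in `utils.ts` accept (not shown in this diff):

```ts
import * as vscode from 'vscode';

// Hypothetical helper: build a data part that marks the *preceding* content
// part as a cache breakpoint. The converters in anthropic.ts inspect the part
// that follows each content part, so the marker must come immediately after
// the content it applies to.
function cacheBreakpointPart(): vscode.LanguageModelDataPart {
	// Assumed payload: Anthropic's cache_control object, JSON-encoded and
	// tagged with an assumed 'cache_control' mime type.
	const payload = new TextEncoder().encode(JSON.stringify({ type: 'ephemeral' }));
	return new vscode.LanguageModelDataPart(payload, 'cache_control');
}

// Stable context goes first and ends with a breakpoint; the volatile query
// follows, so each turn reuses the cached prefix.
const message = vscode.LanguageModelChatMessage.User([
	new vscode.LanguageModelTextPart('Long, stable project context...'),
	cacheBreakpointPart(),
	new vscode.LanguageModelTextPart('The user query, which changes every turn.'),
] as any); // cast: data parts are not yet in the stable message-content type
```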
1 parent 8df24d0 commit 03ae7a5

File tree

14 files changed (+666 −332 lines)


extensions/positron-assistant/src/anthropic.ts

Lines changed: 162 additions & 64 deletions

```diff
@@ -8,18 +8,25 @@ import * as vscode from 'vscode';
 import Anthropic from '@anthropic-ai/sdk';
 import { ModelConfig } from './config';
 import { isLanguageModelImagePart, LanguageModelImagePart } from './languageModelParts.js';
-import { isChatImagePart, processMessages } from './utils.js';
+import { isChatImagePart, isCacheBreakpointPart, parseCacheBreakpoint, processMessages } from './utils.js';
 import { DEFAULT_MAX_TOKEN_OUTPUT } from './constants.js';
 import { log } from './extension.js';
 
 /**
  * Options for controlling cache behavior in the Anthropic language model.
  */
 export interface CacheControlOptions {
-	/** Add a cache control point to the system prompt (default: true). */
+	/** Add a cache breakpoint to the system prompt (default: true). */
 	system?: boolean;
 }
 
+/**
+ * Block params that set cache breakpoints.
+ */
+type CacheControllableBlockParam = Anthropic.TextBlockParam |
+	Anthropic.ImageBlockParam |
+	Anthropic.ToolUseBlockParam |
+	Anthropic.ToolResultBlockParam;
 
 export class AnthropicLanguageModel implements positron.ai.LanguageModelChatProvider {
 	name: string;
@@ -201,11 +208,10 @@ export class AnthropicLanguageModel implements positron.ai.LanguageModelChatProv
 			});
 		} else {
 			// For LanguageModelChatMessage, ensure it has non-empty message content
-			const processedMessages = processMessages([text]);
-			if (processedMessages.length === 0) {
+			messages.push(...toAnthropicMessages([text]));
+			if (messages.length === 0) {
 				return 0;
 			}
-			messages.push(...processedMessages.map(toAnthropicMessage));
 		}
 		const result = await this._client.messages.countTokens({
 			model: this._config.model,
@@ -224,28 +230,39 @@ export class AnthropicLanguageModel implements positron.ai.LanguageModelChatProv
 }
 
 function toAnthropicMessages(messages: vscode.LanguageModelChatMessage2[]): Anthropic.MessageParam[] {
-	const anthropicMessages = processMessages(messages).map(toAnthropicMessage);
+	let userMessageIndex = 0;
+	let assistantMessageIndex = 0;
+	const anthropicMessages = processMessages(messages).map((message) => {
+		const source = message.role === vscode.LanguageModelChatMessageRole.User ?
+			`User message ${userMessageIndex++}` :
+			`Assistant message ${assistantMessageIndex++}`;
+		return toAnthropicMessage(message, source);
+	});
 	return anthropicMessages;
 }
 
-function toAnthropicMessage(message: vscode.LanguageModelChatMessage2): Anthropic.MessageParam {
+function toAnthropicMessage(message: vscode.LanguageModelChatMessage2, source: string): Anthropic.MessageParam {
 	switch (message.role) {
 		case vscode.LanguageModelChatMessageRole.Assistant:
-			return toAnthropicAssistantMessage(message);
+			return toAnthropicAssistantMessage(message, source);
 		case vscode.LanguageModelChatMessageRole.User:
-			return toAnthropicUserMessage(message);
+			return toAnthropicUserMessage(message, source);
 		default:
 			throw new Error(`Unsupported message role: ${message.role}`);
 	}
 }
 
-function toAnthropicAssistantMessage(message: vscode.LanguageModelChatMessage2): Anthropic.MessageParam {
+function toAnthropicAssistantMessage(message: vscode.LanguageModelChatMessage2, source: string): Anthropic.MessageParam {
 	const content: Anthropic.ContentBlockParam[] = [];
-	for (const part of message.content) {
+	for (let i = 0; i < message.content.length; i++) {
+		const [part, nextPart] = [message.content[i], message.content[i + 1]];
+		const dataPart = nextPart instanceof vscode.LanguageModelDataPart ? nextPart : undefined;
 		if (part instanceof vscode.LanguageModelTextPart) {
-			content.push(toAnthropicTextBlock(part));
+			content.push(toAnthropicTextBlock(part, source, dataPart));
 		} else if (part instanceof vscode.LanguageModelToolCallPart) {
-			content.push(toAnthropicToolUseBlock(part));
+			content.push(toAnthropicToolUseBlock(part, source, dataPart));
+		} else if (part instanceof vscode.LanguageModelDataPart) {
+			// Skip extra data parts. They're handled in part conversion.
 		} else {
 			throw new Error('Unsupported part type on assistant message');
 		}
@@ -256,20 +273,22 @@ function toAnthropicAssistantMessage(message: vscode.LanguageModelChatMessage2):
 	};
 }
 
-function toAnthropicUserMessage(message: vscode.LanguageModelChatMessage2): Anthropic.MessageParam {
+function toAnthropicUserMessage(message: vscode.LanguageModelChatMessage2, source: string): Anthropic.MessageParam {
 	const content: Anthropic.ContentBlockParam[] = [];
-	for (const part of message.content) {
+	for (let i = 0; i < message.content.length; i++) {
+		const [part, nextPart] = [message.content[i], message.content[i + 1]];
+		const dataPart = nextPart instanceof vscode.LanguageModelDataPart ? nextPart : undefined;
 		if (part instanceof vscode.LanguageModelTextPart) {
-			content.push(toAnthropicTextBlock(part));
+			content.push(toAnthropicTextBlock(part, source, dataPart));
 		} else if (part instanceof vscode.LanguageModelToolResultPart) {
-			content.push(toAnthropicToolResultBlock(part));
+			content.push(toAnthropicToolResultBlock(part, source, dataPart));
 		} else if (part instanceof vscode.LanguageModelToolResultPart2) {
-			content.push(toAnthropicToolResultBlock(part));
+			content.push(toAnthropicToolResultBlock(part, source, dataPart));
 		} else if (part instanceof vscode.LanguageModelDataPart) {
 			if (isChatImagePart(part)) {
-				content.push(chatImagePartToAnthropicImageBlock(part));
+				content.push(chatImagePartToAnthropicImageBlock(part, source, dataPart));
 			} else {
-				throw new Error('Unsupported language model data part type on user message');
+				// Skip other data parts.
 			}
 		} else {
 			throw new Error('Unsupported part type on user message');
@@ -281,62 +300,106 @@ function toAnthropicUserMessage(message: vscode.LanguageModelChatMessage2): Anth
 	};
 }
 
-function toAnthropicTextBlock(part: vscode.LanguageModelTextPart): Anthropic.TextBlockParam {
-	return {
-		type: 'text',
-		text: part.value,
-	};
+function toAnthropicTextBlock(
+	part: vscode.LanguageModelTextPart,
+	source: string,
+	dataPart?: vscode.LanguageModelDataPart,
+): Anthropic.TextBlockParam {
+	return withCacheControl(
+		{
+			type: 'text',
+			text: part.value,
+		},
+		source,
+		dataPart,
+	);
 }
 
-function toAnthropicToolUseBlock(part: vscode.LanguageModelToolCallPart): Anthropic.ToolUseBlockParam {
-	return {
-		type: 'tool_use',
-		id: part.callId,
-		name: part.name,
-		input: part.input,
-	};
+function toAnthropicToolUseBlock(
+	part: vscode.LanguageModelToolCallPart,
+	source: string,
+	dataPart?: vscode.LanguageModelDataPart,
+): Anthropic.ToolUseBlockParam {
+	return withCacheControl(
+		{
+			type: 'tool_use',
+			id: part.callId,
+			name: part.name,
+			input: part.input,
+		},
+		source,
+		dataPart,
+	);
 }
 
-function toAnthropicToolResultBlock(part: vscode.LanguageModelToolResultPart): Anthropic.ToolResultBlockParam {
+function toAnthropicToolResultBlock(
+	part: vscode.LanguageModelToolResultPart,
+	source: string,
+	dataPart?: vscode.LanguageModelDataPart,
+): Anthropic.ToolResultBlockParam {
 	const content: Anthropic.ToolResultBlockParam['content'] = [];
-	for (const resultPart of part.content) {
+	for (let i = 0; i < part.content.length; i++) {
+		const [resultPart, resultNextPart] = [part.content[i], part.content[i + 1]];
+		const resultDataPart = resultNextPart instanceof vscode.LanguageModelDataPart ? resultNextPart : undefined;
 		if (resultPart instanceof vscode.LanguageModelTextPart) {
-			content.push(toAnthropicTextBlock(resultPart));
+			content.push(toAnthropicTextBlock(resultPart, source, resultDataPart));
 		} else if (isLanguageModelImagePart(resultPart)) {
-			content.push(languageModelImagePartToAnthropicImageBlock(resultPart));
+			content.push(languageModelImagePartToAnthropicImageBlock(resultPart, source, resultDataPart));
+		} else if (resultPart instanceof vscode.LanguageModelDataPart) {
+			// Skip data parts.
 		} else {
 			throw new Error('Unsupported part type on tool result part content');
 		}
 	}
-	return {
-		type: 'tool_result',
-		tool_use_id: part.callId,
-		content,
-	};
+	return withCacheControl(
+		{
+			type: 'tool_result',
+			tool_use_id: part.callId,
+			content,
+		},
+		source,
+		dataPart,
+	);
 }
 
-function chatImagePartToAnthropicImageBlock(part: vscode.LanguageModelDataPart): Anthropic.ImageBlockParam {
-	return {
-		type: 'image',
-		source: {
-			type: 'base64',
-			// We may pass an unsupported mime type; let Anthropic throw the error.
-			media_type: part.mimeType as Anthropic.Base64ImageSource['media_type'],
-			data: Buffer.from(part.data).toString('base64'),
+function chatImagePartToAnthropicImageBlock(
+	part: vscode.LanguageModelDataPart,
+	source: string,
+	dataPart?: vscode.LanguageModelDataPart,
+): Anthropic.ImageBlockParam {
+	return withCacheControl(
+		{
+			type: 'image',
+			source: {
+				type: 'base64',
+				// We may pass an unsupported mime type; let Anthropic throw the error.
+				media_type: part.mimeType as Anthropic.Base64ImageSource['media_type'],
+				data: Buffer.from(part.data).toString('base64'),
+			},
 		},
-	};
+		source,
+		dataPart,
+	);
 }
 
-function languageModelImagePartToAnthropicImageBlock(part: LanguageModelImagePart): Anthropic.ImageBlockParam {
-	return {
-		type: 'image',
-		source: {
-			type: 'base64',
-			// We may pass an unsupported mime type; let Anthropic throw the error.
-			media_type: part.value.mimeType as Anthropic.Base64ImageSource['media_type'],
-			data: part.value.base64,
+function languageModelImagePartToAnthropicImageBlock(
+	part: LanguageModelImagePart,
+	source: string,
+	dataPart?: vscode.LanguageModelDataPart,
+): Anthropic.ImageBlockParam {
+	return withCacheControl(
+		{
+			type: 'image',
+			source: {
+				type: 'base64',
+				// We may pass an unsupported mime type; let Anthropic throw the error.
+				media_type: part.value.mimeType as Anthropic.Base64ImageSource['media_type'],
+				data: part.value.base64,
+			},
 		},
-	};
+		source,
+		dataPart,
+	);
 }
 
 function toAnthropicTools(tools: vscode.LanguageModelChatTool[]): Anthropic.ToolUnion[] {
@@ -387,17 +450,30 @@ function toAnthropicSystem(system: unknown, cacheSystem = true): Anthropic.Messa
 		}];
 
 		if (cacheSystem) {
-			// Add a cache control point to the last system prompt block.
+			// Add a cache breakpoint to the last system prompt block.
 			const lastSystemBlock = anthropicSystem[anthropicSystem.length - 1];
 			lastSystemBlock.cache_control = { type: 'ephemeral' };
-			log.debug(`[anthropic] Adding cache control point to system prompt`);
+			log.debug(`[anthropic] Adding cache breakpoint to system prompt`);
 		}
 
 		return anthropicSystem;
 	}
-	// Pass the system prompt through as-is.
-	// We may pass an invalid system prompt; let Anthropic throw the error.
-	return system as Anthropic.MessageCreateParams['system'];
+
+	// Check if it's an array of parts.
+	if (Array.isArray(system) && system.every(part => (part instanceof vscode.LanguageModelTextPart) ||
+		(part instanceof vscode.LanguageModelDataPart))) {
+		const anthropicSystem: Anthropic.MessageCreateParams['system'] = [];
+		for (let i = 0; i < system.length; i++) {
+			const [part, nextPart] = [system[i], system[i + 1]];
+			const dataPart = nextPart instanceof vscode.LanguageModelDataPart ? nextPart : undefined;
+			if (part instanceof vscode.LanguageModelTextPart) {
+				anthropicSystem.push(toAnthropicTextBlock(part, 'System prompt', dataPart));
+			}
+		}
+		return anthropicSystem;
+	}
+
+	throw new Error(`Unexpected system prompt value`);
 }
 
 function isCacheControlOptions(options: unknown): options is CacheControlOptions {
@@ -407,3 +483,25 @@ function isCacheControlOptions(options: unknown): options is CacheControlOptions
 	const cacheControlOptions = options as CacheControlOptions;
 	return cacheControlOptions.system === undefined || typeof cacheControlOptions.system === 'boolean';
 }
+
+function withCacheControl<T extends CacheControllableBlockParam>(
+	part: T,
+	source: string,
+	dataPart: vscode.LanguageModelDataPart | undefined,
+): T {
+	if (!isCacheBreakpointPart(dataPart)) {
+		return part;
+	}
+
+	try {
+		const cacheBreakpoint = parseCacheBreakpoint(dataPart);
+		log.debug(`[anthropic] Adding cache breakpoint to ${part.type} part. Source: ${source}`);
+		return {
+			...part,
+			cache_control: cacheBreakpoint,
+		};
+	} catch (error) {
+		log.error(`[anthropic] Failed to parse cache breakpoint: ${error}`);
+		return part;
+	}
+}
```
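The net effect of `withCacheControl`: a content part immediately followed by a cache-breakpoint data part converts to a single Anthropic block with `cache_control` attached, and the data part itself never reaches the API. A rough sketch of the resulting block, assuming the breakpoint parses to `{ type: 'ephemeral' }` (the example text is illustrative only):

```ts
import Anthropic from '@anthropic-ai/sdk';

// What toAnthropicTextBlock + withCacheControl would emit for
// [TextPart('Long, stable project context...'), <cache breakpoint data part>]
const block: Anthropic.TextBlockParam = {
	type: 'text',
	text: 'Long, stable project context...',
	cache_control: { type: 'ephemeral' },
};
```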

extensions/positron-assistant/src/models.ts

Lines changed: 1 addition & 1 deletion

```diff
@@ -102,7 +102,7 @@ class EchoLanguageModel implements positron.ai.LanguageModelChatProvider {
 		token: vscode.CancellationToken
 	): Promise<any> {
 		const _messages = toAIMessage(messages);
-		const message = _messages[_messages.length - 1];
+		const message = _messages[0];
 
 		if (typeof message.content === 'string') {
 			message.content = [{ type: 'text', text: message.content }];
```
