Add ChatContextOptions to ChatOptions #347

Merged
austin-denoble merged 3 commits into main from adenoble/fix-assistant-data-options on May 30, 2025

Conversation

austin-denoble (Contributor) commented May 30, 2025

Problem

A change was made to the Assistant chat interface in the last release that was not captured in the TypeScript client: topK and snippetSize have been moved into a context_options value: https://docs.pinecone.io/reference/api/2025-04/assistant/chat_assistant#body-context-options. The client currently has topK at the top level of the ChatOptions payload.
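
For reference, the request body the API now expects nests these fields under context_options (this matches the PINECONE_DEBUG output further down; the values here are just placeholders), whereas the client previously sent topK at the top level:

{
  "messages": [{ "role": "user", "content": "..." }],
  "model": "claude-3-5-sonnet",
  "context_options": { "top_k": 20, "snippet_size": 584 }
}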

While cleaning this up, I also noticed a few inconsistencies in how some things are passed to the Assistant API, which I've also addressed here. There's some added complexity in the TypeScript implementation for Assistant because the generated code did not fully cover streaming and file upload, so chatStream, chatCompletionStream, and uploadFile all use fetch directly rather than going through the generated code.

This should address this community post: https://community.pinecone.io/t/pinecone-assistant-chatstream-topk-and-snippet-size/8065/1

Solution

  • Add and export ChatContextOptions as a new type that wraps topK and snippetSize. To keep this somewhat non-breaking, I've left the topK field at the top level of the ChatOptions interface as well, with a bit of logic to use it if present and otherwise fall back to contextOptions.topK (see the sketch after this list). I wanted to keep things non-breaking so we can release this as a minor version.
  • In the chatStream and chatCompletionStream functions we need to manually convert the keys of the passed objects to snake_case. The API expects snake_case, and normally the generated OpenAPI types handle this for us; since we're not using those directly in these cases, we need to handle it ourselves. I had missed this previously, although I did handle converting responses from snake_case to camelCase where necessary. I know this is a bit confusing - apologies.
  • The context function wasn't passing messages, topK, or snippetSize properly. This was just a miss on my part.
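
As a rough illustration of the first two points, the new type and the payload handling look something like this. This is a minimal sketch only: ChatContextOptions and the topK/snippetSize/contextOptions field names come from this PR, but the helper names, the extra ChatOptions fields shown, and the precedence logic are illustrative rather than the exact implementation in the diff.

// Sketch only - the real ChatOptions interface has more fields, and the
// actual precedence/conversion logic lives in the PR diff.
export interface ChatContextOptions {
  topK?: number;
  snippetSize?: number;
}

export interface ChatOptions {
  messages: Array<{ role: string; content: string }>;
  model?: string;
  filter?: Record<string, unknown>;
  jsonResponse?: boolean;
  includeHighlights?: boolean;
  // Kept at the top level for backwards compatibility with the old interface.
  topK?: number;
  contextOptions?: ChatContextOptions;
}

// Convert a camelCase key to the snake_case form the Assistant API expects.
function camelToSnake(key: string): string {
  return key.replace(/[A-Z]/g, (c) => `_${c.toLowerCase()}`);
}

// Build the context_options portion of the request body, using the top-level
// topK if present and otherwise falling back to contextOptions.topK
// (illustrative precedence - check the diff for the exact behavior).
function buildContextOptions(options: ChatOptions) {
  const topK = options.topK ?? options.contextOptions?.topK;
  const snippetSize = options.contextOptions?.snippetSize;
  if (topK === undefined && snippetSize === undefined) return undefined;
  return {
    ...(topK !== undefined && { [camelToSnake('topK')]: topK }),
    ...(snippetSize !== undefined && { [camelToSnake('snippetSize')]: snippetSize }),
  };
}

With something like this in place, both chat({ topK: 20 }) and chat({ contextOptions: { topK: 20 } }) end up serialized as "context_options": {"top_k": 20} on the wire, which is what the debug output below shows.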

We need better test coverage for assistant operations in general; the implementation was a bit rushed on my part earlier this year.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Infrastructure change (CI configs, etc)
  • Non-code change (docs, etc)
  • None of the above: (explain here)

Test Plan

CI - external app tests, unit tests, integration tests

Manually tested using the assistant interface via the repl. You can pull this branch down yourself and play around locally using npm run repl:

npm run repl
await init()

await client.createAssistant({ name: 'test-assistant' })

// you'll need to use a local path
await client.Assistant('test-assistant').uploadFile({ path: '/Users/austin/Downloads/A_Primer_on_Memory_Consistency_and_Cache_Coherence-2nd-Edition.pdf', metadata: { genre: 'classical', meat: 'cute' }})

// wait for the file to process, etc.

// non-streaming chat, contextOptions
await client.assistant('test-assistant').chat({ messages: [{ role: 'user', content: 'tell me a few basics about caching'}], model: 'claude-3-5-sonnet', filter: { genre: 'classical'}, jsonResponse: true, includeHighlights: true, contextOptions: {topK:20, snippetSize: 584 }})

// non-streaming chat, topK only
await client.assistant('test-assistant').chat({ messages: [{ role: 'user', content: 'tell me a few basics about caching'}], model: 'claude-3-5-sonnet', filter: { genre: 'classical'}, jsonResponse: true, includeHighlights: true, topK: 20})

// streaming chat, contextOptions
const chatStream = await client.assistant('test-assistant').chatStream({ messages: [{ role: 'user', content: 'tell me a few basics about caching'}], model: 'claude-3-5-sonnet', filter: { genre: 'classical'}, includeHighlights: true, contextOptions: {topK:20, snippetSize: 584 }})
for await (const chunk of chatStream) { console.log(chunk) }

// streaming chat, topK only
const chatStream = await client.assistant('test-assistant').chatStream({ messages: [{ role: 'user', content: 'tell me a few basics about caching'}], model: 'claude-3-5-sonnet', filter: { genre: 'classical'}, includeHighlights: true, topK: 20})
for await (const chunk of chatStream) { console.log(chunk) }

Here are some examples with PINECONE_DEBUG output from my local runs:

> await client.assistant('test-assistant').chat({ messages: [{ role: 'user', content: 'tell me a few basics about caching'}], model: 'claude-3-5-sonnet', filter: { genre: 'classical'}, jsonResponse: true, includeHighlights: true, contextOptions: {topK:20, snippetSize: 584 }})
>>> Request: GET https://api.pinecone.io/assistant/assistants/test-assistant
>>> Headers: {"User-Agent":"@pinecone-database/pinecone v6.0.1; lang=typescript; node v18.17.1","X-Pinecone-Api-Version":"2025-04","Api-Key":"***REDACTED***"}

<<< Status: 200
<<< Body: {"name":"test-assistant","instructions":null,"metadata":null,"status":"Ready","host":"https://prod-1-data.ke.pinecone.io","created_at":"2025-05-30T16:51:21.987343597Z","updated_at":"2025-05-30T16:51:22.374927670Z"}

curl -X GET https://api.pinecone.io/assistant/assistants/test-assistant -H "Api-Key: ****REDACTED***" 

>>> Request: POST https://prod-1-data.ke.pinecone.io/assistant/chat/test-assistant
>>> Headers: {"User-Agent":"@pinecone-database/pinecone v6.0.1; lang=typescript; node v18.17.1","X-Pinecone-Api-Version":"2025-04","Content-Type":"application/json","Api-Key":"***REDACTED***"}
>>> Body: {"messages":[{"role":"user","content":"tell me a few basics about caching"}],"stream":false,"model":"claude-3-5-sonnet","filter":{"genre":"classical"},"json_response":true,"include_highlights":true,"context_options":{"top_k":20,"snippet_size":584}}

<<< Status: 200
<<< Body: {"finish_reason":"stop","message":{"role":"assistant","content":"{\n  \"basics\": [\n ... ETC }

curl -X POST https://prod-1-data.ke.pinecone.io/assistant/chat/test-assistant -H "Api-Key: ***REDACTED***" -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"tell me a few basics about caching"}],"stream":false,"model":"claude-3-5-sonnet","filter":{"genre":"classical"},"json_response":true,"include_highlights":true,"context_options":{"top_k":20,"snippet_size":584}}'

{
  id: '00000000000000005ec73ab3e9e573ad',
  finishReason: 'stop',
  message: {
    role: 'assistant',
    content: '{\n' +
      '  "basics": [\n' +
      '    "Caches are used to reduce average latencies to access storage structures.",\n' +
      '    "A typical system model includes a multicore processor chip with private data caches for each core and a shared last-level cache (LLC).",\n' +
      '    "Cache coherence is needed to maintain consistency between multiple cached copies of data.",\n' +
      '    "The granularity of coherence is usually maintained at the level of cache blocks, rather than individual bytes.",\n' +
      '    "Common cache states include Modified (M), Shared (S), and Invalid (I), which are part of the MSI protocol."\n' +
      '  ]\n' +
      '}'
  },
  model: 'arn:aws:bedrock:us-east-1::inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0',
  citations: [
    { position: 93, references: [Array] },
    { position: 235, references: [Array] },
    { position: 332, references: [Array] },
    { position: 450, references: [Array] },
    { position: 564, references: [Array] }
  ],
  usage: { promptTokens: 13795, completionTokens: 189, totalTokens: 13984 }
}
>  await client.assistant('test-assistant').chat({ messages: [{ role: 'user', content: 'tell me a few basics about caching'}], model: 'claude-3-5-sonnet', filter: { genre: 'classical'}, jsonResponse: true, includeHighlights: true, topK: 20})
>>> Request: GET https://api.pinecone.io/assistant/assistants/test-assistant
>>> Headers: {"User-Agent":"@pinecone-database/pinecone v6.0.1; lang=typescript; node v18.17.1","X-Pinecone-Api-Version":"2025-04","Api-Key":"***REDACTED***"}

<<< Status: 200
<<< Body: {"name":"test-assistant","instructions":null,"metadata":null,"status":"Ready","host":"https://prod-1-data.ke.pinecone.io","created_at":"2025-05-30T16:51:21.987343597Z","updated_at":"2025-05-30T16:51:22.374927670Z"}

curl -X GET https://api.pinecone.io/assistant/assistants/test-assistant -H "Api-Key: ***REDACTED***" 

>>> Request: POST https://prod-1-data.ke.pinecone.io/assistant/chat/test-assistant
>>> Headers: {"User-Agent":"@pinecone-database/pinecone v6.0.1; lang=typescript; node v18.17.1","X-Pinecone-Api-Version":"2025-04","Content-Type":"application/json","Api-Key":"***REDACTED***"}
>>> Body: {"messages":[{"role":"user","content":"tell me a few basics about caching"}],"stream":false,"model":"claude-3-5-sonnet","filter":{"genre":"classical"},"json_response":true,"include_highlights":true,"context_options":{"top_k":20}}

<<< Status: 200
<<< Body: {"finish_reason":"stop","message":{"role":"assistant","content":"{\n  \"basics\":...ETC}

curl -X POST https://prod-1-data.ke.pinecone.io/assistant/chat/test-assistant -H "Api-Key: ***REDACTED***" -H "Content-Type: application/json" -d '{"messages":[{"role":"user","content":"tell me a few basics about caching"}],"stream":false,"model":"claude-3-5-sonnet","filter":{"genre":"classical"},"json_response":true,"include_highlights":true,"context_options":{"top_k":20}}'

{
  id: '00000000000000004e4877c14db3faa7',
  finishReason: 'stop',
  message: {
    role: 'assistant',
    content: '{\n' +
      '  "basics": [\n' +
      '    "Caches are used to store recently accessed data for faster retrieval, reducing the need to access slower main memory. A cache contains copies of data from frequently used main memory locations.",\n' +
      '    "There are typically multiple levels of caches in a system, including private level-one (L1) caches for each processor core and a shared last-level cache (LLC).",\n' +
      '    "Caches can be virtually addressed or physically addressed. Most modern systems use physically addressed caches, where the cache is accessed using physical memory addresses.",\n' +
      '    "Cache coherence is needed to ensure that multiple copies of data in different caches remain consistent. Coherence protocols define rules for maintaining consistency between caches.",\n' +
      '    "Two main types of coherence protocols are snooping protocols, which broadcast requests to all caches, and directory protocols, which use a centralized directory to track which caches have copies of data."\n' +
      '  ]\n' +
      '}'
  },
  model: 'arn:aws:bedrock:us-east-1::inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0',
  citations: [
    { position: 137, references: [Array] },
    { position: 380, references: [Array] },
    { position: 446, references: [Array] },
    { position: 560, references: [Array] },
    { position: 671, references: [Array] },
    { position: 748, references: [Array] },
    { position: 959, references: [Array] }
  ],
  usage: { promptTokens: 46851, completionTokens: 286, totalTokens: 47137 }
}

…hape, update chat, chatStream, and context to handle sending requests properly
…llow backwards compatibility with the previous interface
austin-denoble marked this pull request as ready for review May 30, 2025 17:48
austin-denoble merged commit f752c3a into main May 30, 2025
53 of 54 checks passed
austin-denoble deleted the adenoble/fix-assistant-data-options branch May 30, 2025 21:29