Skip to content

Conversation

kalpadhwaryu
Copy link
Contributor

@kalpadhwaryu kalpadhwaryu commented Sep 3, 2025

Description

Support JSON body input and allow specifying the path in Agent Chat API via API key.

Summary by CodeRabbit

  • New Features

    • Chat and search requests can include an optional path to auto-attach relevant documents as context.
    • Smarter link and spreadsheet handling: expands Google Sheets into subsheets; collects Gmail thread context when available.
    • Enhanced document lookup from collection/folder/file references for richer responses.
  • Refactor

    • Agent chat endpoint validates input from JSON body instead of query params.
    • Request schemas now require a modelId and support the optional path field.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 3, 2025

Walkthrough

Adds path-based context to chat agents: request bodies may include a path used to resolve Vespa IDs via knowledgeBase.getRecordBypath, which feed into extractFileIdsFromMessage and downstream RAG logic. Schemas updated to accept path and require modelId. Server route validates JSON body. Utils expand link/pill parsing and return threadIds. Minor formatting elsewhere.

Changes

Cohort / File(s) Summary
Path-based context in chat agents
server/api/chat/agents.ts
Accepts optional path in MessageReqType; looks up IDs via getRecordBypath(path, db); includes path-derived IDs in context checks and in extractFileIdsFromMessage(message, email, ids) across MessageWithToolsApi and AgentMessageApi.
Chat utilities: file/thread extraction
server/api/chat/utils.ts
extractFileIdsFromMessage now takes optional pathRefId, returns { totalValidFileIdsFromLinkCount, fileIds, threadIds }. Adds wholeSheet expansion for Sheets, collects Gmail threadIds, expands links to subsheets, swallows JSON parse errors, integrates collection/folder Vespa ID seeds.
Knowledge base enhancements
server/db/knowledgeBase.ts
Adds getRecordBypath(path, trx) to resolve Vespa docId from collection/path/name. Extends getAllCollectionAndFolderItems to accept Vespa IDs (clfd-/clf-) as seeds. Minor signature/formatting adjustments.
Schema and validation updates
server/api/search.ts
Adds optional path to agentChatMessageSchema and messageSchema. Makes modelId required (z.string().min(1)) in agentChatMessageSchema.
Routing validation source
server/server.ts
/api/v1/agent/chat now validates request body JSON with agentChatMessageSchema instead of query params.
Formatting only
server/api/chat/chat.ts
Import spacing change for GroundingSupport; no functional impact.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Server as API Route (/api/v1/agent/chat)
  participant Validator as zValidator(JSON)
  participant Agent as MessageWithToolsApi / AgentMessageApi
  participant KB as knowledgeBase.getRecordBypath
  participant Utils as extractFileIdsFromMessage
  participant RAG as Understand/Answer

  Client->>Server: POST body { message, modelId, path? }
  Server->>Validator: Validate agentChatMessageSchema
  alt valid
    Server->>Agent: Invoke with { message, email, path? }
    opt path provided
      Agent->>KB: getRecordBypath(path)
      KB-->>Agent: ids (vespaDocId or null)
    end
    Agent->>Utils: extractFileIdsFromMessage(message, email, ids?)
    Utils-->>Agent: { fileIds, threadIds, totalValidFileIdsFromLinkCount }
    Agent->>RAG: Decide/augment context with fileIds/threadIds
    RAG-->>Agent: Response
    Agent-->>Server: Stream/Return answer
    Server-->>Client: 200 OK + response
  else invalid
    Validator-->>Server: Error
    Server-->>Client: 400 Bad Request
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • zereraz
  • shivamashtikar
  • junaid-shirur

Poem

I mapped a path through folders deep,
Hop-hop, I fetched the IDs we keep.
Threads and sheets now join the chat,
Tiny links expand—imagine that!
With JSON checked and models set,
This bunny ships the smartest set. 🐇✨

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-custom-apis

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @kalpadhwaryu, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly upgrades the Agent Chat API's capabilities. It introduces the ability to send chat requests with JSON bodies when using API keys, aligning with modern API design practices. Furthermore, it implements a powerful new feature that allows users to specify direct paths to knowledge base items, ensuring that the agent's responses are informed by highly relevant and precisely located information. These changes streamline API interactions and enhance the accuracy of agent-driven conversations.

Highlights

  • Enhanced API Key Integration: The Agent Chat API now accepts JSON request bodies when authenticated via API keys, providing a more flexible and standard way to send data.
  • Knowledge Base Path Support: Users can now specify a "path" in agent chat requests, allowing the system to retrieve relevant documents or folders from the knowledge base based on their hierarchical location.
  • Improved Data Retrieval for RAG: The system can now resolve knowledge base paths to specific document IDs, enabling more precise retrieval-augmented generation (RAG) by grounding agent responses in targeted information.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for JSON body input and a path parameter to the agent chat API. The changes are well-aligned with the goal, touching API schemas, request handlers, and database logic. My review focuses on improving the robustness and maintainability of these changes. I've identified a few areas for improvement, including a critical bug in the new path resolution logic that could lead to arbitrary behavior, a need for safer error handling when parsing JSON, and opportunities to improve type safety and remove fragile code patterns.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
server/db/knowledgeBase.ts (1)

782-843: Add workspaceId scoping and update usages

  • Change signature to getRecordBypath(path, trx, workspaceId), filter both collections and collectionItems on workspaceId, and return collection.vespaDocId when no item is specified.
  • Update all call sites to pass the new workspaceId argument (e.g. server/api/chat/agents.ts:3549).
  • Ensure the workspaceId column exists on both collections and collectionItems.
server/server.ts (2)

285-333: API key extraction does not read JSON body; error message contradicts behavior

  • Middleware reads x-api-key header or api_key query param, but the message says “provide apiKey in request body.”
  • PR goal is JSON body support; implement body extraction without breaking validators.

Minimal fix:

  • Try header → query → validated JSON (if available) in that order.
  • Keep backward compatibility for other routes.
 const ApiKeyMiddleware = async (c: Context, next: Next) => {
-  let apiKey: string
+  let apiKey: string | undefined
   try {
-    // Extract API key from request body
-    apiKey = c.req.header("x-api-key") || (c.req.query("api_key") as string)
+    // Extract API key: header → query → validated JSON (if a prior validator ran)
+    apiKey =
+      c.req.header("x-api-key") ||
+      (c.req.query("api_key") as string | undefined)
+    if (!apiKey) {
+      try {
+        // Will throw if no prior zValidator("json") has run
+        const body = c.req.valid?.("json") as Record<string, unknown>
+        apiKey = typeof body?.["apiKey"] === "string" ? (body["apiKey"] as string) : undefined
+      } catch {}
+    }
 
     if (!apiKey) {
       Logger.error(
-        "API key verification failed: Missing apiKey in request body",
+        "API key verification failed: Missing API key (header x-api-key, query api_key, or JSON apiKey)",
       )
       throw new HTTPException(401, {
-        message: "Missing API key. Please provide apiKey in request body.",
+        message:
+          "Missing API key. Provide it via header 'x-api-key', query 'api_key', or JSON 'apiKey'.",
       })
     }

725-733: Update Agent Chat endpoint and ApiKeyMiddleware to support JSON-body apiKey

  • Extend agentChatMessageSchema to include apiKey: z.string().min(1)
  • Swap middleware order in server/server.ts:726–732 so zValidator("json", agentChatMessageWithApiKeySchema) runs before ApiKeyMiddleware
  • In ApiKeyMiddleware (around line 285), after parsing JSON via c.req.valid("json"), fallback to body.apiKey before header/query extraction
server/api/chat/agents.ts (1)

3507-3558: Enforce workspace-scoped authorization for path lookups

getRecordBypath(path, db) does not appear to include workspace scoping; a crafted path could cross tenants. Pass workspace context and verify ownership before expanding to fileIds.

- let ids
- if (path) {
-   ids = await getRecordBypath(path, db)
- }
+ let ids: string | null = null
+ if (path) {
+   // TODO: ensure getRecordBypath filters by workspace; otherwise create a workspace-aware variant
+   ids = await getRecordBypath(path, db)
+   if (ids) {
+     // Optional: audit
+     loggerWithChild({ email }).info(`Resolved path to refId`, { path })
+   }
+ }

Follow-up: add workspaceId to the DB query (e.g., join on collections.workspaceId = current workspace.id).

🧹 Nitpick comments (9)
server/db/knowledgeBase.ts (1)

573-660: BFS traversal can reduce round-trips; consider workspace scoping in vespaDocId lookups

  • Performance: Each queue pop triggers a DB round-trip. Batch children fetches with inArray on parent IDs to reduce latency in large trees.
  • Security/consistency: When resolving cl-/clfd-/clf- seeds, optionally constrain by workspaceId to avoid accidental cross-tenant matches (docIds are opaque but scoping is safer).

Optional refactor sketch:

-  while (queue.length > 0) {
-    const { itemId } = queue.shift()!
-    const children = await trx
-      .select()
-      .from(collectionItems)
-      .where(and(eq(collectionItems.parentId, itemId), isNull(collectionItems.deletedAt)))
-    ...
-  }
+  while (queue.length > 0) {
+    const batch = queue.splice(0, 100).map(q => q.itemId)
+    const children = await trx
+      .select()
+      .from(collectionItems)
+      .where(
+        and(
+          inArray(collectionItems.parentId, batch),
+          isNull(collectionItems.deletedAt),
+        ),
+      )
+    // group by parentId, push folders back to queue, collect files
+  }
server/api/search.ts (1)

160-162: Add minimal validation for "path" to prevent malformed inputs

Recommend constraining allowed characters and rejecting path traversal patterns.

-  path: z.string().optional(),
+  path: z
+    .string()
+    .trim()
+    .regex(/^(\/)?[A-Za-z0-9._\-\/]+$/, "Invalid path format")
+    .refine((p) => !p?.includes("..") && !p?.includes("//"), "Invalid path")
+    .optional(),
server/api/chat/agents.ts (3)

3545-3546: Guard decodeURIComponent to handle plain-text messages

External API clients may send plain JSON (not URL-encoded). Avoid exceptions.

- message = decodeURIComponent(message)
+ try { message = decodeURIComponent(message) } catch { /* already plain */ }

3560-3560: Remove unused variable

agentDocs is declared but never used.

- const agentDocs = agentForDb?.docIds || []

4023-4077: Limit history carefully to control token growth

Slicing to last 8 is fine; consider making this a config (e.g., chatHistoryMessageLimit) to tune without code changes.

server/api/chat/utils.ts (4)

508-521: Type and name the new parameter; avoid any

pathRefId is a Vespa docId string (or null). Tighten types to prevent misuse.

-export const extractFileIdsFromMessage = async (
-  message: string,
-  email?: string,
-  pathRefId?: any,
+export const extractFileIdsFromMessage = async (
+  message: string,
+  email?: string,
+  pathRefId?: string | null,
 ): Promise<{

603-604: Log JSON parse failures for diagnostics (without throwing)

Swallowing errors hides malformed inputs during debugging.

-  } catch (error) {}
+  } catch (error) {
+    getLoggerWithChild(Subsystem.Chat)().debug(
+      "Failed to parse message as JSON for context extraction",
+      { err: getErrorMessage(error) },
+    )
+  }

646-654: De-duplicate fileIds before returning

Prevents redundant fetches and duplicate citations.

-  return { totalValidFileIdsFromLinkCount, fileIds, threadIds }
+  const uniqueFileIds = [...new Set(fileIds)]
+  return { totalValidFileIdsFromLinkCount, fileIds: uniqueFileIds, threadIds }

64-64: Remove unused import

"http.get" is not used.

-import { get } from "http"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f1b157a and 60b743c.

📒 Files selected for processing (6)
  • server/api/chat/agents.ts (13 hunks)
  • server/api/chat/chat.ts (1 hunks)
  • server/api/chat/utils.ts (3 hunks)
  • server/api/search.ts (2 hunks)
  • server/db/knowledgeBase.ts (18 hunks)
  • server/server.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
server/server.ts (1)
server/api/search.ts (1)
  • agentChatMessageSchema (157-177)
server/api/chat/agents.ts (5)
server/db/knowledgeBase.ts (1)
  • getRecordBypath (782-842)
server/api/chat/utils.ts (2)
  • isMessageWithContext (495-497)
  • extractFileIdsFromMessage (505-655)
server/ai/types.ts (1)
  • QueryRouterLLMResponse (212-212)
server/ai/provider/index.ts (1)
  • generateSearchQueryOrAnswerFromConversation (1459-1499)
server/api/chat/types.ts (1)
  • ragPipelineConfig (35-63)
server/api/chat/utils.ts (4)
server/ai/types.ts (1)
  • UserQuery (259-259)
server/shared/types.ts (2)
  • DriveEntity (26-26)
  • Apps (31-31)
server/integrations/google/sync.ts (1)
  • getDocumentOrSpreadsheet (91-109)
server/search/types.ts (1)
  • VespaFile (926-926)
server/db/knowledgeBase.ts (3)
server/db/schema/knowledgeBase.ts (3)
  • NewCollection (180-180)
  • Collection (179-179)
  • collections (19-58)
frontend/src/types/knowledgeBase.ts (1)
  • Collection (1-18)
server/types.ts (1)
  • TxnOrClient (301-301)
🔇 Additional comments (2)
server/api/chat/chat.ts (1)

229-229: Confirm GroundingSupport export in @google/genai ^1.9.0

server/package.json pins @google/genai to ^1.9.0 — verify that this version actually exports the GroundingSupport type to prevent TS import errors.

server/api/chat/agents.ts (1)

2772-2790: Good: JSON body for API-key flow, query for JWT flow

This aligns input handling with auth mode. No issues spotted.

@zereraz zereraz merged commit 58704db into main Sep 3, 2025
4 checks passed
@zereraz zereraz deleted the fix-custom-apis branch September 3, 2025 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants