Skip to content

Conversation

naSim087
Copy link
Contributor

@naSim087 naSim087 commented Oct 1, 2025

Description

if the path is passed to AgentMessageApi , while doing the vespa call we replace the kb integration with the path referenced folder or file

Testing

Additional Notes

Summary by CodeRabbit

  • New Features

    • Answers now use collection/folder/file context derived from the current path, improving relevance.
    • Path-based context applied consistently for both streamed and non-streamed responses.
    • Model selection now receives explicit model info for more accurate results.
  • Bug Fixes

    • More resilient handling of missing or invalid path context, gracefully falling back to previous behavior.
    • Reduced edge-case inconsistencies when selecting knowledge-base items from navigation paths.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 1, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Threaded path-derived collection identifiers through the chat/agent pipeline: added extractItemIdsFromPath, introduced PathExtractedInfo, updated multiple function signatures to accept actualModelId and pathExtractedInfo, and replaced prior folderIds/ids usage with path-derived collectionFileIds/collectionFolderIds/collectionIds where applicable.

Changes

Cohort / File(s) Summary
Agent flow updates
server/api/chat/agents.ts
Import extractItemIdsFromPath; compute pathExtractedInfo when a valid pathRefId is present; derive extractedInfo from message context only; call extractFileIdsFromMessage(message, email) without ids; pass actualModelId and pathExtractedInfo into UnderstandMessageAndAnswer / UnderstandMessageAndAnswerForGivenContext across stream and non-stream flows.
Chat core signatures & sourcing
server/api/chat/chat.ts
Add exported PathExtractedInfo type; add optional pathExtractedInfo parameter to generators and helpers (generateIterativeTimeFilterAndQueryRewrite, generatePointQueryTimeExpansion, generateMetadataQueryAnswer, UnderstandMessageAndAnswer, UnderstandMessageAndAnswerForGivenContext); introduce getCollectionSource(pathExtractedInfo, selectedItems) to prefer folder → file → collection IDs and thread source selection through existing flows while preserving defaults.
Utils: path ID extraction
server/api/chat/utils.ts
New extractItemIdsFromPath(pathRefId) that queries collectionItems and collections, returns { collectionFileIds, collectionFolderIds, collectionIds }, prefixes IDs (clf-, clfd-, cl-), logs errors without throwing, and returns empty arrays for null/failed lookups.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as User
  participant A as agents.ts
  participant UT as utils.extractItemIdsFromPath
  participant C as chat.ts (Understand...)
  participant KB as Knowledge Base

  U->>A: Send message (+ optional pathRefId)
  A->>UT: extractItemIdsFromPath(pathRefId)
  UT-->>A: pathExtractedInfo {collectionFileIds, collectionFolderIds, collectionIds}
  A->>C: UnderstandMessageAndAnswer(message, email, actualModelId, pathExtractedInfo)
  activate C
  Note over C: source = pathExtractedInfo ? (folder→file→collection) : selectedItems[KB]
  C->>KB: Fetch/process docs from chosen source
  KB-->>C: Context chunks / metadata
  C-->>A: Answer (stream or non-stream)
  A-->>U: Final/streamed response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • zereraz
  • shivamashtikar
  • devesh-juspay
  • kalpadhwaryu

Poem

I’m a rabbit in the code-lined glade,
I sniff the path where IDs are laid.
I hand the map to models bright,
They fetch the context, shed some light.
Hooray—answers hop into sight! 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The title, while using a conventional commit prefix, is awkwardly phrased and does not clearly communicate the primary change, which is that when a path is provided to the AgentMessageApi the corresponding folder or file is used in place of the knowledge-base integration reference. Its current wording is confusing and fails to convey the intent or impact of the change to reviewers scanning the history. Please rephrase the title to clearly and concisely state that the PR makes the AgentMessageApi use the folder or file identified by the provided path as the KB integration reference, for example: “fix(agentPathSupport): use folder or file from path as KB integration reference.”
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/agentPathSupport

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @naSim087, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves an issue where the AgentMessageApi did not correctly interpret and utilize knowledge base items (files, folders, or collections) when they were referenced via a path. The changes introduce a dedicated mechanism to extract and categorize these path-based references, ensuring they are properly integrated into the RAG (Retrieval Augmented Generation) process. This enhances the accuracy and relevance of AI agent responses when specific knowledge base locations are provided.

Highlights

  • Path-based ID Extraction: Introduced a new utility function, extractItemIdsFromPath, in server/api/chat/utils.ts. This function is responsible for parsing a given pathRefId and querying the database (collectionItems and collections tables) to determine if it corresponds to a knowledge base file, folder, or an entire collection, returning their respective IDs with appropriate prefixes.
  • AgentMessageApi Refactoring: The AgentMessageApi in server/api/chat/agents.ts has been refactored to leverage the new extractItemIdsFromPath utility. This change separates the logic for extracting IDs from message content versus those derived from a provided path, ensuring that path-referred items are correctly identified and processed.
  • Knowledge Base Integration Logic Update: Core chat functions, including generateIterativeTimeFilterAndQueryRewrite, generatePointQueryTimeExpansion, and generateMetadataQueryAnswer in server/api/chat/chat.ts, now accept a pathExtractedInfo parameter. This allows these functions to prioritize and utilize the specific knowledge base items (files, folders, or collections) identified by a path, overriding general message-based item selection when a path is provided.
  • Type Definition for Path Information: A new type, pathExtractedInfo, has been defined in server/api/chat/chat.ts to standardize the structure for passing extracted file, folder, and collection IDs related to a path throughout the chat processing pipeline.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the AgentMessageApi to support knowledge base paths by introducing a new extractItemIdsFromPath utility and modifying how context is passed to the RAG pipeline. The changes correctly separate path-based context extraction from message-based extraction. My review includes suggestions to improve code quality and maintainability. I've pointed out areas with code duplication that could be refactored into a helper function, inconsistencies in naming conventions and property access that should be standardized, and an opportunity to improve type safety by avoiding the any type. Addressing these points will make the code more robust and easier to maintain.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/api/chat/chat.ts (1)

1212-1226: Rename collectionfileIds to collectionFileIds throughout
Update the pathExtractedInfo type to use collectionFileIds (PascalCase) and replace all occurrences of collectionfileIds with collectionFileIds in both server/api/chat/chat.ts and server/api/chat/agents.ts.

🧹 Nitpick comments (2)
server/api/chat/utils.ts (1)

718-752: Add .limit(1) to DB lookups to avoid unnecessary scans

Both selects fetch a single row; limit the query explicitly for performance and clarity.

     const [item] = await db
       .select({ id: collectionItems.id, type: collectionItems.type })
       .from(collectionItems)
       .where(
         and(
           eq(collectionItems.vespaDocId, vespaId),
           isNull(collectionItems.deletedAt),
         ),
       )
+      .limit(1)

     if (item) {
       // ...
     } else {
       // If not found in collectionItems, check collections table
       const [collection] = await db
         .select({ id: collections.id })
         .from(collections)
         .where(
           and(
             eq(collections.vespaDocId, vespaId),
             isNull(collections.deletedAt),
           ),
         )
+        .limit(1)
server/api/chat/agents.ts (1)

3536-3546: Add explicit type annotation to pathExtractedInfo

Import the exported type from utils and annotate to lock in the shape:

-    const pathExtractedInfo = isValidPath
+    // import type { PathExtractedInfo } from "@/api/chat/utils"
+    const pathExtractedInfo: PathExtractedInfo = isValidPath
       ? await extractItemIdsFromPath(ids)
       : { collectionFileIds: [], collectionFolderIds: [], collectionIds: [] }

getRecordBypath returns the Vespa doc ID (string) as expected by extractItemIdsFromPath—no further verification needed.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 096480e and e80edce.

📒 Files selected for processing (3)
  • server/api/chat/agents.ts (3 hunks)
  • server/api/chat/chat.ts (11 hunks)
  • server/api/chat/utils.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
server/api/chat/chat.ts (1)
server/shared/types.ts (1)
  • Apps (38-38)
server/api/chat/utils.ts (2)
server/db/schema/knowledgeBase.ts (2)
  • collectionItems (69-140)
  • collections (20-66)
server/logger/index.ts (2)
  • getLoggerWithChild (192-200)
  • Subsystem (15-15)
server/api/chat/agents.ts (2)
server/api/chat/utils.ts (2)
  • extractFileIdsFromMessage (513-695)
  • extractItemIdsFromPath (696-767)
server/api/chat/chat.ts (1)
  • pathExtractedInfo (3832-3836)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (1)
server/api/chat/agents.ts (1)

124-125: Import addition looks good

extractItemIdsFromPath is correctly imported from ./utils and used below.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e80edce and 301d246.

📒 Files selected for processing (2)
  • server/api/chat/agents.ts (3 hunks)
  • server/api/chat/chat.ts (11 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
server/api/chat/chat.ts (1)
server/shared/types.ts (1)
  • Apps (38-38)
server/api/chat/agents.ts (2)
server/api/chat/utils.ts (2)
  • extractFileIdsFromMessage (513-695)
  • extractItemIdsFromPath (696-767)
server/api/chat/chat.ts (1)
  • pathExtractedInfo (3832-3836)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
server/api/chat/agents.ts (3)

124-124: LGTM!

The import of extractItemIdsFromPath is correctly added and aligns with its usage later in the file.


3536-3545: LGTM!

The refactored extraction logic correctly separates concerns:

  • extractFileIdsFromMessage handles message-embedded context (pills, links, folders) without path info
  • extractItemIdsFromPath handles path-derived collection context when a valid path is provided

The object literals use the correct field name collectionFileIds (capital F) matching the type definition.


4306-4307: LGTM!

The parameters passed to UnderstandMessageAndAnswer are correct:

  • actualModelId provides the model configuration
  • pathExtractedInfo (constructed earlier with correct field names) enables path-based collection filtering

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
server/api/chat/utils.ts (1)

761-764: [Duplicate] Logger argument order still inverted.

This issue was flagged in a previous review but not yet fixed. The logger call passes the message before the error, which breaks the project's convention and can drop stack/metadata.

Apply this diff:

     getLoggerWithChild(Subsystem.Chat)().error(
+      error,
       `Error extracting item IDs from pathRefId: ${vespaId}`,
-      error,
     )
server/api/chat/chat.ts (1)

1349-1376: Path-derived KB IDs are still dropped

When pathExtractedInfo is present we now return its raw IDs from getCollectionSource, but we still immediately run them through the legacy cl-/clfd-/clf- prefix parsing. IDs coming from extractItemIdsFromPath do not carry those prefixes, so every branch falls through and we end up pushing zero IDs into agentSpecificCollectionSelections, meaning path-based folder/file selections remain broken. The same regression exists in the duplicated blocks at Line 2607-2634 and Line 3171-3198.

We need to short-circuit the prefix parsing whenever pathExtractedInfo is populated and push its arrays straight into collectionIds / collectionFolderIds / collectionFileIds instead.

         const collectionIds: string[] = []
         const collectionFolderIds: string[] = []
         const collectionFileIds: string[] = []
-        const source = getCollectionSource(pathExtractedInfo, selectedItems)
-
-        for (const itemId of source) {
-          if (itemId.startsWith("cl-")) {
-            // Entire collection - remove cl- prefix
-            collectionIds.push(itemId.replace(/^cl[-_]/, ""))
-          } else if (itemId.startsWith("clfd-")) {
-            // Collection folder - remove clfd- prefix
-            collectionFolderIds.push(itemId.replace(/^clfd[-_]/, ""))
-          } else if (itemId.startsWith("clf-")) {
-            // Collection file - remove clf- prefix
-            collectionFileIds.push(itemId.replace(/^clf[-_]/, ""))
-          }
-        }
+        const hasPathCollections =
+          pathExtractedInfo &&
+          (pathExtractedInfo.collectionFolderIds.length ||
+            pathExtractedInfo.collectionFileIds.length ||
+            pathExtractedInfo.collectionIds.length)
+
+        if (hasPathCollections) {
+          collectionIds.push(...pathExtractedInfo!.collectionIds)
+          collectionFolderIds.push(...pathExtractedInfo!.collectionFolderIds)
+          collectionFileIds.push(...pathExtractedInfo!.collectionFileIds)
+        } else {
+          const source = selectedItems[Apps.KnowledgeBase] || []
+          for (const itemId of source) {
+            if (itemId.startsWith("cl-")) {
+              collectionIds.push(itemId.replace(/^cl[-_]/, ""))
+            } else if (itemId.startsWith("clfd-")) {
+              collectionFolderIds.push(itemId.replace(/^clfd[-_]/, ""))
+            } else if (itemId.startsWith("clf-")) {
+              collectionFileIds.push(itemId.replace(/^clf[-_]/, ""))
+            }
+          }
+        }

Apply the same fix in the corresponding blocks inside generatePointQueryTimeExpansion and generateMetadataQueryAnswer so path-based selections flow all the way through.

🧹 Nitpick comments (1)
server/api/chat/utils.ts (1)

758-758: Empty else block—clarify intent or remove.

Line 758 contains an empty else block. If no action is needed for unrecognized prefixes, consider adding a comment explaining why (e.g., "silently ignore unrecognized prefixes") or remove the block entirely for clarity.

Apply this diff if no action is intended:

       if (collection) {
         collectionIds.push(`cl-${collection.id}`) // Keep the original prefixed ID
       }
-    } else {
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 301d246 and fba2af2.

📒 Files selected for processing (3)
  • server/api/chat/agents.ts (3 hunks)
  • server/api/chat/chat.ts (11 hunks)
  • server/api/chat/utils.ts (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
server/api/chat/utils.ts (2)
server/db/schema/knowledgeBase.ts (2)
  • collectionItems (69-140)
  • collections (20-66)
server/logger/index.ts (2)
  • getLoggerWithChild (192-200)
  • Subsystem (15-15)
server/api/chat/agents.ts (1)
server/api/chat/utils.ts (2)
  • extractFileIdsFromMessage (513-695)
  • extractItemIdsFromPath (696-773)
server/api/chat/chat.ts (1)
server/shared/types.ts (1)
  • Apps (38-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (3)
server/api/chat/agents.ts (2)

3536-3545: Path extraction logic looks correct.

The extraction flow properly separates message-based context (extractedInfo) from path-based context (pathExtractedInfo). The conditional extraction based on isMsgWithContext and isValidPath ensures that each type of context is only extracted when applicable, with appropriate empty defaults when not.


4295-4308: UnderstandMessageAndAnswer signature and PathExtractedInfo casing are correct
The function in server/api/chat/chat.ts accepts modelId? followed by pathExtractedInfo?: PathExtractedInfo, and the PathExtractedInfo type defines collectionFileIds, collectionFolderIds, and collectionIds with proper casing.

server/api/chat/utils.ts (1)

720-757: Verify ID prefixing in utils.ts matches downstream search filters
Path‐extracted IDs are built as clf-<item.id>, clfd-<item.id>, cl-<item.id>, then later stripped back to raw UUIDs before calling the Vespa query layer. Confirm whether Vespa expects:

  • Vespa document IDs (original vespaDocId, e.g. clf-12345…), in which case use item.vespaDocId when prefixing in utils.ts, or
  • Raw Postgres UUIDs, in which case drop the clf-/clfd-/cl- prefixes in utils.ts to avoid passing invalid identifiers.

@junaid-shirur junaid-shirur merged commit cc74086 into main Oct 1, 2025
3 of 4 checks passed
@junaid-shirur junaid-shirur deleted the fix/agentPathSupport branch October 1, 2025 10:00
@coderabbitai coderabbitai bot mentioned this pull request Oct 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants