Skip to content

Conversation

Himanshvarma
Copy link
Contributor

@Himanshvarma Himanshvarma commented Oct 16, 2025

Description

added chunk highlight feature in regular chat
now when user clicks on a citation that belongs to kb files

  1. it will open the preview of that file
  2. it will go to that specific chunk location and highlight it

Testing

  1. upload files in kb
  2. create an agent that includes one or more kb files (this agent can also have access to non kb files)
  3. chat in this agent mode
  4. test it by clicking on each citation if it belongs to kb it should highlight the relevant chunk if not it will open the url of that doc might be drive or mail or any other file

Summary by CodeRabbit

  • New Features

    • Chunk-level navigation for jumping to specific citation sections.
    • KB-item citations recognized and rendered as links.
    • Richer document preview viewers (PDF, DOCX, XLSX, CSV, TXT, Markdown) with back/close controls and external document operation hooks.
  • Bug Fixes

    • More reliable preview loading, cleanup on close, and highlight synchronization.
    • Refreshed citation numbering and improved tooltip/click behavior.
  • Refactor

    • Simplified and consolidated citation parsing for more consistent UI behavior.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 16, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Removes public chunkIndex from citation types and derives chunk indices locally (including K[...] KB-item parsing), adds KB-item support end-to-end, exposes documentOperationsRef for viewers, adds waitForTextLayer to highlighting flows, and renames/threads isMsgWithKbItems through prompts and chat logic.

Changes

Cohort / File(s) Change Summary
Citation parsing & types
frontend/src/components/CitationLink.tsx, server/api/chat/types.ts
Removed public chunkIndex; CitationLink now parses identifiers from children (index, optional chunk part, spreadsheet-aware adjustments, K[...] KB items) and passes derived chunkIndex to click handlers; server MinimalCitationSchema no longer includes chunkIndex.
Citation preview & viewers
frontend/src/components/CitationPreview.tsx, frontend/src/hooks/useScopedFind.ts, frontend/src/contexts/DocumentOperationsContext.tsx
CitationPreview converted from memoized to stateful component, accepts documentOperationsRef and onDocumentLoaded; viewer selection expanded (PDF/MD/DOCX/XLSX/CSV/TXT fallback); highlightText gains optional waitForTextLayer and is propagated through DocumentOperations provider and useScopedFind, with wait-for-text-layer readiness logic.
Chat UI & routing plumbing
frontend/src/routes/_authenticated/chat.tsx, frontend/src/components/CitationList.tsx, frontend/src/components/Sources.tsx
Added selectedChunkIndex state and chunk navigation handlers; wrapped ChatPage with DocumentOperationsProvider; changed onCitationClick signatures to accept optional chunkIndex and pass documentOperationsRef into CitationPreview; effects to reset chunk state on navigation.
Client citation utilities
frontend/src/utils/chatUtils.tsx
Added textToKbItemCitationIndex for K[...]; tightened textToImageCitationIndex to exclude K[...] via negative lookbehind; cleanCitationsFromResponse and processMessage updated to resolve/format KB-item citations alongside image citations.
Server chat logic & utils
server/api/chat/chat.ts, server/api/chat/utils.ts
Replaced isMsgWithSources with isMsgWithKbItems across public signatures and call sites; threaded flag through build/context/generator flows; added textToKbItemCitationIndex and removed chunkIndex parameter/return in KB-items search/citation paths.
Agent prompts & provider
server/ai/agentPrompts.ts, server/ai/context.ts, server/ai/provider/index.ts
Renamed agentBaselineFileContextPromptJsonagentBaselineKbContextPromptJson, added optional agentPromptData, expanded file/chunk formatting and guidance, and select KB-specific prompt variants when isMsgWithKbItems is enabled.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CitationLink
    participant ChatPage
    participant DocOps as DocumentOperationsProvider
    participant ScopedFind as useScopedFind
    participant Preview as CitationPreview

    User->>CitationLink: click child id ("12" / "12_3" / "K[5_2]")
    CitationLink->>ChatPage: onCitationClick(citation, derivedChunkIndex)
    ChatPage->>ChatPage: set selectedChunkIndex / open preview
    ChatPage->>DocOps: clearHighlights()
    ChatPage->>Preview: open preview (pass documentOperationsRef)
    ChatPage->>DocOps: documentOperationsRef.highlightText(text, chunkIndex, page?, waitForTextLayer=true)
    DocOps->>ScopedFind: highlightText(..., waitForTextLayer)
    alt waitForTextLayer true
        ScopedFind->>ScopedFind: waitForTextLayerReady(container)
    end
    ScopedFind->>ScopedFind: extract text & apply highlights
    ScopedFind-->>DocOps: highlight result (success/failure)
    DocOps-->>Preview: navigate/highlight viewer
    Preview-->>User: preview shown with highlights
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • zereraz
  • shivamashtikar
  • kalpadhwaryu
  • junaid-shirur
  • rahul1841

Poem

🐰 I hopped through K‑chunks with a cheerful cheer,

Split numbers, found pages, brought chunks near,
I waited on text layers to gently glow,
Highlighted sheets and PDFs in a row,
A rabbit's nibble — citations clear!

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Feat/chat chunk highlight" directly and clearly summarizes the main feature being added: chunk highlighting functionality in the chat interface. The changes across frontend and backend components—including citation parsing, document preview rendering, document operations context, chunk navigation, and KB item citation handling—all work together to implement this core feature. The title is specific (mentioning both "chat" and "chunk highlight") and avoids vague language, allowing a teammate to quickly understand the primary purpose of the changeset without needing to review the full implementation details.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/chatChunkHighlight

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Himanshvarma, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant enhancement to the chat experience by enabling direct interaction with Knowledge Base document citations. Users can now click on a citation to not only open the relevant document but also to be automatically directed to and have the specific referenced content chunk highlighted. This feature aims to improve the contextual understanding of AI responses by providing immediate visual access to the source material.

Highlights

  • Chunk Highlighting in Chat: Implemented a new feature that allows users to click on citations from Knowledge Base (KB) files within the chat interface. This action will now open the corresponding document preview and automatically scroll to and highlight the specific content chunk referenced by the citation.
  • Enhanced Citation Handling: The system now parses citations that include both a document index and a chunk index (e.g., '1_2'), enabling precise navigation within documents. This also includes special handling for spreadsheet files to adjust chunk indexing.
  • Updated AI Prompting for KB Items: The AI agent's baseline prompt has been updated to specifically handle Knowledge Base items, introducing a new citation format (K[docId_chunkIndex]) to ensure accurate and traceable references to specific document chunks.
  • Document Viewer Integration: The CitationPreview component has been refactored to integrate with a new useScopedFind hook, allowing it to receive and execute highlight and scroll commands from the chat interface, facilitating the chunk highlighting functionality.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a chunk highlighting feature for citations from knowledge base files. The changes span across the frontend and backend to support a new citation format and the highlighting mechanism in the document previewer. The implementation is solid, introducing a DocumentOperationsContext for better component communication and updating AI prompts to generate the new citation format. I've identified a critical bug in the frontend citation processing logic that could lead to incorrect data being displayed, along with a couple of medium-severity suggestions for improving maintainability and performance.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/api/chat/chat.ts (1)

481-503: Retain the KB chunk index in SSE citations.

Right now the KB branch strips off the _chunk portion of K[doc_chunk] and never forwards it, so the CitationsUpdate payloads expose only the doc metadata. The UI can no longer know which chunk to scroll/highlight when a user clicks a KB citation (especially from the side-panel list where the raw K[...] token isn’t available). Please carry the chunk index through the generator, e.g. parse both parts and include a kbChunkIndex (or similar) on the emitted citation object, then extend the Citation type accordingly so the frontend can highlight the correct chunk.

-      if(match) {
-        citationIndex = parseInt(match[1], 10)
-      } else if (kbMatch) {
-        citationIndex = parseInt(kbMatch[1].split("_")[0], 10)
-      }
+      if (match) {
+        citationIndex = parseInt(match[1], 10)
+      } else if (kbMatch) {
+        const [docIndex, chunkIndex] = kbMatch[1].split("_")
+        citationIndex = parseInt(docIndex, 10)
+        currentKbChunkIndex = Number.isFinite(Number(chunkIndex))
+          ? parseInt(chunkIndex, 10)
+          : undefined
+      }
@@
-          yield {
-            citation: {
-              index: citationIndex,
-              item: searchToCitation(item as VespaSearchResults),
-            },
-          }
+          const citationPayload: any = {
+            index: citationIndex,
+            item: searchToCitation(item as VespaSearchResults),
+          }
+          if (currentKbChunkIndex !== undefined) {
+            citationPayload.kbChunkIndex = currentKbChunkIndex
+          }
+          yield { citation: citationPayload }

(Ensure currentKbChunkIndex is declared/reset each loop, and update the Citation type so this new field is typed.)

🧹 Nitpick comments (3)
server/ai/provider/index.ts (1)

1263-1292: Clarify logging and intent in specificFiles KB/files branch

  • Log message always says “Using baselineFilesContextPromptJson” even when KB prompt is used. Adjust for clarity.
  • No-agent path uses agentBaselineKbContextPromptJson; confirm this is intentional despite the “agent” name.

Apply for clearer logs:

-    Logger.info("Using baselineFilesContextPromptJson")
+    Logger.info(
+      isMsgWithKbItems
+        ? "Using agentBaselineKbContextPromptJson (KB chunks)"
+        : "Using *Files* baseline context prompt"
+    )
frontend/src/components/CitationPreview.tsx (2)

100-121: Expose waitForTextLayer in highlightText wrapper

Forward the optional flag so PDF/spreadsheet highlighting can wait for text-layer readiness.

-      documentOperationsRef.current.highlightText = async (text: string, chunkIndex: number, pageIndex?: number) => {
+      documentOperationsRef.current.highlightText = async (
+        text: string,
+        chunkIndex: number,
+        pageIndex?: number,
+        waitForTextLayer?: boolean,
+      ) => {
         if (!containerRef.current) {
           return false
         }
         try {
-          const success = await highlightText(text, chunkIndex, pageIndex)
+          const success = await highlightText(text, chunkIndex, pageIndex, waitForTextLayer)
           return success
         } catch (error) {
           console.error("Error calling highlightText:", error)
           return false
         }
       }

347-349: Attach containerRef to the actual scroll container; remove inner refs

useScopedFind’s scrolling works best when the ref points to the scrollable element. Currently, the outer container scrolls while refs are on inner wrappers.

-        {!loading && !error && documentContent && (
-          <div className="h-full overflow-auto">{viewerElement}</div>
-        )}
+        {!loading && !error && documentContent && (
+          <div ref={containerRef} data-container-ref="true" className="h-full overflow-auto">
+            {viewerElement}
+          </div>
+        )}

Remove inner refs from viewer wrappers:

-          <div ref={containerRef} data-container-ref="true" className="h-full">
+          <div className="h-full">
             <PdfViewer 
-          <div ref={containerRef} data-container-ref="true" className="h-full">
+          <div className="h-full">
             <ReadmeViewer
-          <div ref={containerRef} data-container-ref="true" className="h-full">
+          <div className="h-full">
             <DocxViewer
-          return(
-            <div ref={containerRef} data-container-ref="true" className="h-full">
+          return(
+            <div className="h-full">
               <ExcelViewer
-            return(
-              <div ref={containerRef} data-container-ref="true" className="h-full">
+            return(
+              <div className="h-full">
                 <CsvViewer
-              return(
-                <div ref={containerRef} data-container-ref="true" className="h-full">
+              return(
+                <div className="h-full">
                   <TxtViewer

Also applies to: 144-155, 160-168, 172-194, 198-206, 211-219, 223-231

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00d9fac and c6a17fe.

📒 Files selected for processing (12)
  • frontend/src/components/CitationLink.tsx (5 hunks)
  • frontend/src/components/CitationPreview.tsx (3 hunks)
  • frontend/src/contexts/DocumentOperationsContext.tsx (2 hunks)
  • frontend/src/hooks/useScopedFind.ts (2 hunks)
  • frontend/src/routes/_authenticated/chat.tsx (11 hunks)
  • frontend/src/utils/chatUtils.tsx (2 hunks)
  • server/ai/agentPrompts.ts (3 hunks)
  • server/ai/context.ts (4 hunks)
  • server/ai/provider/index.ts (3 hunks)
  • server/api/chat/chat.ts (33 hunks)
  • server/api/chat/types.ts (0 hunks)
  • server/api/chat/utils.ts (1 hunks)
💤 Files with no reviewable changes (1)
  • server/api/chat/types.ts
🧰 Additional context used
🧬 Code graph analysis (8)
frontend/src/components/CitationLink.tsx (1)
server/shared/fileUtils.ts (1)
  • getFileType (10-32)
server/ai/agentPrompts.ts (2)
server/ai/context.ts (1)
  • userContext (1072-1091)
server/ai/provider/index.ts (1)
  • AgentPromptData (128-133)
frontend/src/components/CitationPreview.tsx (7)
frontend/src/components/CitationLink.tsx (1)
  • Citation (11-17)
frontend/src/contexts/DocumentOperationsContext.tsx (1)
  • DocumentOperations (4-9)
frontend/src/api.ts (1)
  • api (5-5)
frontend/src/utils/authFetch.ts (1)
  • authFetch (4-27)
frontend/src/hooks/useScopedFind.ts (1)
  • useScopedFind (43-512)
frontend/src/components/PdfViewer.tsx (1)
  • PdfViewer (62-704)
frontend/src/components/ReadmeViewer.tsx (1)
  • ReadmeViewer (18-272)
server/ai/provider/index.ts (2)
server/ai/agentPrompts.ts (2)
  • agentBaselineKbContextPromptJson (862-992)
  • agentBaselineFilesContextPromptJson (752-860)
server/ai/prompts.ts (1)
  • baselineFilesContextPromptJson (751-848)
server/api/chat/utils.ts (1)
frontend/src/utils/chatUtils.tsx (2)
  • textToImageCitationIndex (7-7)
  • textToKbItemCitationIndex (8-8)
frontend/src/utils/chatUtils.tsx (1)
server/api/chat/utils.ts (3)
  • textToImageCitationIndex (559-559)
  • textToKbItemCitationIndex (560-560)
  • textToCitationIndex (558-558)
frontend/src/routes/_authenticated/chat.tsx (3)
frontend/src/contexts/DocumentOperationsContext.tsx (2)
  • useDocumentOperations (38-44)
  • DocumentOperationsProvider (18-35)
frontend/src/api.ts (1)
  • api (5-5)
frontend/src/components/CitationLink.tsx (1)
  • Citation (11-17)
server/api/chat/chat.ts (2)
server/utils.ts (1)
  • splitGroupedCitationsWithSpaces (267-290)
server/api/chat/utils.ts (4)
  • textToCitationIndex (558-558)
  • textToImageCitationIndex (559-559)
  • textToKbItemCitationIndex (560-560)
  • searchToCitation (453-549)
🔇 Additional comments (1)
server/api/chat/utils.ts (1)

558-561: Regex split for image vs KB citations looks good; verify runtime lookbehind support

The negative lookbehind in textToImageCitationIndex is correct for excluding K[...] and matches frontend. Please confirm the Node/runtime supports lookbehind; else consider an alternate pattern.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
frontend/src/components/CitationPreview.tsx (1)

133-205: Still missing MIME fallback for viewer detection

If a KB title has no extension (common in uploads), extension stays empty and we hit the default branch — users see “Preview not available” even though documentContent.type tells us it’s a PDF/DOCX/etc. This was flagged earlier and is still outstanding. Please derive the extension (and default MIME) from the blob when the title lacks one so previews continue to render.

-    const fileName = citation.title || ""
-    const extension = getFileExtension(fileName)
+    const fileName = citation.title || ""
+    let extension = getFileExtension(fileName)
+    if (!extension && documentContent.type) {
+      extension = getExtensionFromMimeType(documentContent.type)
+    }
 ...
-    const file = new File([documentContent], fileName, {
-      type: documentContent.type || getDefaultMimeType(extension),
-    })
+    const file = new File([documentContent], fileName, {
+      type:
+        documentContent.type ||
+        getDefaultMimeType(extension || getFileExtension(fileName)),
+    })
@@
-  const getDefaultMimeType = (extension: string): string => {
+  const getDefaultMimeType = (extension: string): string => {
     switch (extension) {
       case "pdf":
         return "application/pdf"
       case "docx":
         return "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
       case "doc":
         return "application/msword"
+      case "xlsx":
+        return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+      case "xls":
+        return "application/vnd.ms-excel"
+      case "csv":
+        return "text/csv"
+      case "tsv":
+        return "text/tab-separated-values"
       case "md":
       case "markdown":
         return "text/markdown"
       case "txt":
         return "text/plain"
       default:
         return "application/octet-stream"
     }
   }
+
+  const getExtensionFromMimeType = (mime: string): string => {
+    const lowerMime = mime.toLowerCase()
+    if (lowerMime.includes("pdf")) return "pdf"
+    if (lowerMime.includes("msword")) return "doc"
+    if (lowerMime.includes("wordprocessingml")) return "docx"
+    if (lowerMime.includes("spreadsheetml")) return "xlsx"
+    if (lowerMime.includes("ms-excel")) return "xls"
+    if (lowerMime.includes("csv")) return "csv"
+    if (lowerMime.endsWith("/tsv") || lowerMime.includes("tab-separated")) return "tsv"
+    if (lowerMime.includes("markdown")) return "md"
+    if (lowerMime.includes("plain")) return "txt"
+    return ""
+  }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 07d9363 and dd81055.

📒 Files selected for processing (2)
  • frontend/src/components/CitationPreview.tsx (3 hunks)
  • frontend/src/routes/_authenticated/chat.tsx (12 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.
🧬 Code graph analysis (2)
frontend/src/components/CitationPreview.tsx (4)
frontend/src/components/CitationLink.tsx (1)
  • Citation (11-17)
frontend/src/contexts/DocumentOperationsContext.tsx (1)
  • DocumentOperations (4-9)
frontend/src/hooks/useScopedFind.ts (1)
  • useScopedFind (43-512)
frontend/src/components/PdfViewer.tsx (1)
  • PdfViewer (62-704)
frontend/src/routes/_authenticated/chat.tsx (3)
frontend/src/contexts/DocumentOperationsContext.tsx (2)
  • useDocumentOperations (38-44)
  • DocumentOperationsProvider (18-35)
frontend/src/api.ts (1)
  • api (5-5)
frontend/src/components/CitationLink.tsx (1)
  • Citation (11-17)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
frontend/src/components/CitationLink.tsx (1)

37-52: Guard against non-numeric KB citation labels before rewriting children.

When the incoming label follows the KB format (K0_3, per the KB citation spec), parseInt(parts[0]) returns NaN. We still run the block, so children = (citationIndex + 1).toString() renders "NaN" in the UI and drops the original KB identifier. Wrap that rewrite in a numeric check (e.g. if (citationIndex >= 0) { … }) or otherwise preserve parts[0] for the KB case before touching children. This keeps KB badges legible while retaining the spreadsheet chunk correction. Based on learnings

-    if(chunkIndex !== undefined && citation) {
-      children = (citationIndex + 1).toString()
+    if(chunkIndex !== undefined && citation) {
+      if (citationIndex >= 0) {
+        children = (citationIndex + 1).toString()
+      } else if (parts[0]) {
+        children = parts[0]
+      }
       if(getFileType({type: "", name: citation?.title ?? ""}) === FileType.SPREADSHEET) chunkIndex = Math.max(chunkIndex - 1, 0)
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd81055 and 418f776.

📒 Files selected for processing (3)
  • frontend/src/components/CitationLink.tsx (5 hunks)
  • frontend/src/components/CitationPreview.tsx (3 hunks)
  • frontend/src/routes/_authenticated/chat.tsx (12 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.
📚 Learning: 2025-10-16T18:43:30.423Z
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.

Applied to files:

  • frontend/src/components/CitationLink.tsx
🧬 Code graph analysis (3)
frontend/src/components/CitationLink.tsx (1)
server/shared/fileUtils.ts (1)
  • getFileType (10-32)
frontend/src/routes/_authenticated/chat.tsx (3)
frontend/src/contexts/DocumentOperationsContext.tsx (2)
  • useDocumentOperations (38-44)
  • DocumentOperationsProvider (18-35)
frontend/src/api.ts (1)
  • api (5-5)
frontend/src/components/CitationLink.tsx (1)
  • Citation (11-17)
frontend/src/components/CitationPreview.tsx (4)
frontend/src/components/CitationLink.tsx (1)
  • Citation (11-17)
frontend/src/contexts/DocumentOperationsContext.tsx (1)
  • DocumentOperations (4-9)
frontend/src/hooks/useScopedFind.ts (1)
  • useScopedFind (43-512)
frontend/src/components/PdfViewer.tsx (1)
  • PdfViewer (62-704)

naSim087
naSim087 previously approved these changes Oct 17, 2025
@junaid-shirur junaid-shirur merged commit c48f04a into main Oct 17, 2025
3 of 4 checks passed
@junaid-shirur junaid-shirur deleted the feat/chatChunkHighlight branch October 17, 2025 11:58
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
frontend/src/components/CitationLink.tsx (1)

49-56: Explain spreadsheet chunkIndex decrement

Good call adjusting for spreadsheet 1‑based labels. Add a brief comment so future readers don’t regress this.

frontend/src/components/CitationPreview.tsx (1)

40-89: Still missing: Cancel stale document fetches when citation changes.

The document loading effect can complete out of order when switching between citations rapidly. The async loadDocument started for citation A can resolve after citation B is selected, causing the preview to display the wrong document.

Apply this diff to add cancellation:

   useEffect(() => {
     if (!citation || !isOpen) {
       setDocumentContent(null)
       setError(null)
       return
     }
+
+    let cancelled = false

     const loadDocument = async () => {
       setLoading(true)
       setError(null)
       try {
         if (
           citation.app === "KnowledgeBase" &&
           citation.itemId &&
           citation.clId
         ) {
           const response =
             await api.cl[citation.clId].files[citation.itemId].content.$get()

           if (!response.ok) {
             throw new Error(`Failed to fetch document: ${response.statusText}`)
           }

           const blob = await response.blob()
+          if (cancelled) return
           setDocumentContent(blob)
         } else if (citation.url) {
           // For external documents, try to fetch directly
           const response = await authFetch(citation.url, {
             method: "GET",
           })

           if (!response.ok) {
             throw new Error(`Failed to fetch document: ${response.statusText}`)
           }

           const blob = await response.blob()
+          if (cancelled) return
           setDocumentContent(blob)
         } else {
           throw new Error("No document source available")
         }
       } catch (err) {
+        if (cancelled) return
         console.error("Error loading document:", err)
         setError(err instanceof Error ? err.message : "Failed to load document")
       } finally {
+        if (cancelled) return
         setLoading(false)
       }
     }

     loadDocument()
+
+    return () => {
+      cancelled = true
+    }
   }, [citation, isOpen])
🧹 Nitpick comments (13)
server/ai/context.ts (1)

40-130: Consider extracting a type alias for the chunk union type.

The type (string | { chunk: string; score: number; index: number })[] is repeated multiple times throughout this function and others. Extracting it to a named type alias (e.g., type ChunkSummary = (string | { chunk: string; score: number; index: number })[]) would improve readability and maintainability.

Apply this diff at the top of the file after imports:

+type ChunkSummary = (string | { chunk: string; score: number; index: number })[]
+
 // Utility function to extract header from chunks and remove headers from each chunk
 const extractHeaderAndDataChunks = (
-  chunks_summary:
-    | (string | { chunk: string; score: number; index: number })[]
-    | undefined,
+  chunks_summary: ChunkSummary | undefined,
   matchfeatures?: any,
 ): {
-  chunks_summary: (string | { chunk: string; score: number; index: number })[]
+  chunks_summary: ChunkSummary
   matchfeatures?: any
 } => {

Then apply similar changes to processSheetQuery and other functions that use this type.

server/api/chat/chat.ts (2)

468-488: Preserve in-text citation order across [n], image, and K[...] matches

The while-condition short-circuits each regex family until exhaustion, then moves to the next. This can reorder emissions vs. their textual positions. Consider selecting the next earliest match among all three regexes per iteration to preserve original order. Keep current behavior as fallback if you prefer simplicity.

Also applies to: 475-480, 500-506, 514-565


488-506: Dedup key collision between [n] and K[n_m]

yieldedCitations uses the raw doc index for both [n] and K[...] paths. If the model outputs both forms for the same doc index, that’s fine; but if different sources share the same numeric index in different contexts, they’ll be coalesced. If this becomes noisy, include the match kind (e.g., "kb" | "std") in the dedup key.

frontend/src/hooks/useScopedFind.ts (3)

171-178: Avoid writing data-match-index twice

createHighlightMarks sets data-match-index="0" and later the caller rewrites it with the real matchIndex. Drop the initial attribute to reduce DOM churn.

-              mark.setAttribute("data-match-index", "0")
+              // attribute set by caller after marks are created
...
-                  mark.setAttribute("data-match-index", "0")
+                  // attribute set by caller after marks are created

Also applies to: 199-206


256-315: Text-layer wait loop is solid; add early exit on non-empty stable text

You already stabilize via three identical lengths; optionally short‑circuit when stableCount hits 1 and waitForTextLayer is false or when text contains the exact chunk substring we’re about to highlight. This trims latency on large pages.


338-371: Return early when pageIndex is undefined in PDF path

You already do; consider surfacing a debug warning (once) so callers know a pageIndex is required for PDFs/spreadsheets.

frontend/src/utils/chatUtils.tsx (2)

39-48: K[…] replacement: parse once, tighten guards

Functionally correct (0‑based doc index). Simplify and harden parsing to avoid multiple split/parse operations and NaN pitfalls.

-  text = text.replace(textToKbItemCitationIndex, (_, citationKey) => {
-    const index = citationMap
-      ? citationMap[parseInt(citationKey.split("_")[0], 10)]
-      : parseInt(citationKey.split("_")[0], 10)
-    const chunkIndex = parseInt(citationKey.split("_")[1], 10)
-    const url = citationUrls[index]
-    return typeof index === "number" && typeof chunkIndex === "number" && url
-      ? `[${index + 1}_${chunkIndex}](${url})`
-      : ""
-  })
+  text = text.replace(textToKbItemCitationIndex, (_, citationKey) => {
+    const [rawDocIdxStr, chunkIdxStr] = citationKey.split("_")
+    const rawDocIdx = Number.parseInt(rawDocIdxStr, 10)
+    const parsedChunkIdx = Number.parseInt(chunkIdxStr, 10)
+    const resolvedIdx =
+      Number.isInteger(rawDocIdx) ? (citationMap?.[rawDocIdx] ?? rawDocIdx) : undefined
+    const url =
+      typeof resolvedIdx === "number" && resolvedIdx >= 0 ? citationUrls[resolvedIdx] : undefined
+    return typeof resolvedIdx === "number" && Number.isInteger(parsedChunkIdx) && url
+      ? `[${resolvedIdx + 1}_${parsedChunkIdx}](${url})`
+      : ""
+  })

Based on learnings


21-61: Fallback without citationMap may misalign KB links

For standard [n], the no‑map fallback uses num‑1 into citationUrls; for K[n_m], the fallback uses the raw doc index. If citationUrls is ordered by encountered citations (not by doc index), K fallbacks could resolve to the wrong URL. If feasible, require citationMap for K[...] or document this invariant.

Also applies to: 39-48

frontend/src/components/CitationLink.tsx (1)

37-40: Harden parsing of children and radix

Guard against non-string children and NaN; always pass radix 10.

-    const parts = typeof children === "string" ? children.split("_") : []
-    const citationIndex = parts.length > 0 ? parseInt(parts[0]) - 1 : -1
-    let chunkIndex = parts.length > 1 ? parseInt(parts[1]) : undefined
+    const textChild = typeof children === "string" ? children.trim() : ""
+    const parts = textChild ? textChild.split("_") : []
+    const citationIndex = parts.length > 0 && /^\d+$/.test(parts[0])
+      ? parseInt(parts[0], 10) - 1
+      : -1
+    let chunkIndex = parts.length > 1 && /^\d+$/.test(parts[1])
+      ? parseInt(parts[1], 10)
+      : undefined
frontend/src/contexts/DocumentOperationsContext.tsx (1)

45-53: Memoize provider value to avoid unnecessary re-renders

Wrap the value object in useMemo with setGoToPageFn dependency to prevent child re-renders on every parent render.

-  return (
-    <DocumentOperationsContext.Provider
-      value={{
-        documentOperationsRef,
-        setGoToPage: setGoToPageFn,
-      }}
-    >
+  const contextValue = React.useMemo(
+    () => ({ documentOperationsRef, setGoToPage: setGoToPageFn }),
+    [setGoToPageFn],
+  )
+  return (
+    <DocumentOperationsContext.Provider value={contextValue}>
       {children}
     </DocumentOperationsContext.Provider>
   )
frontend/src/components/CitationPreview.tsx (2)

98-128: Optional: Remove stable ref from dependency array.

Including documentOperationsRef in the dependency array is unnecessary since refs are stable and don't trigger re-renders. The effect will work correctly with just the function dependencies.

Apply this diff:

-  }, [documentOperationsRef, highlightText, clearHighlights, scrollToMatch])
+  }, [highlightText, clearHighlights, scrollToMatch])

171-296: Consider passing documentOperationsRef to all viewers that support navigation.

Only PDF and Excel viewers receive documentOperationsRef, but if other viewers (ReadmeViewer, DocxViewer, etc.) support operations like goToPage in the future, they won't have access to the ref.

For future extensibility, you may want to pass documentOperationsRef to all viewers, even if they don't currently use it:

       case "md":
       case "markdown":
         return (
           <div ref={containerRef} data-container-ref="true" className="h-full">
             <ReadmeViewer
               key={citation.docId}
               source={file}
               className="h-full"
               style={{ height: "100%", overflow: "auto", padding: "16px" }}
+              documentOperationsRef={documentOperationsRef}
             />
           </div>
         )

Apply similarly to DocxViewer, CsvViewer, and TxtViewer if their interfaces support it.

frontend/src/routes/_authenticated/chat.tsx (1)

1269-1282: Optional: Remove handleChunkIndexChange from effect dependencies.

Including handleChunkIndexChange in the dependency array creates redundant effect runs when selectedCitation changes, since the callback already depends on selectedCitation. The effect will run twice: once when selectedCitation changes and once when handleChunkIndexChange is recreated.

You can safely remove it from the dependencies since the callback is stable for each citation:

   useEffect(() => {
     if (selectedCitation && isDocumentLoaded) {
       handleChunkIndexChange(
         selectedChunkIndex,
         selectedCitation?.itemId ?? "",
         selectedCitation?.docId ?? "",
       )
     }
   }, [
     selectedChunkIndex,
     selectedCitation,
     isDocumentLoaded,
-    handleChunkIndexChange,
   ])

React Hook exhaustive-deps may warn, but this is safe because handleChunkIndexChange is memoized with the same dependencies already in the array.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 418f776 and 17ce7ad.

📒 Files selected for processing (10)
  • frontend/src/components/CitationLink.tsx (6 hunks)
  • frontend/src/components/CitationPreview.tsx (3 hunks)
  • frontend/src/contexts/DocumentOperationsContext.tsx (3 hunks)
  • frontend/src/hooks/useScopedFind.ts (3 hunks)
  • frontend/src/routes/_authenticated/chat.tsx (12 hunks)
  • frontend/src/utils/chatUtils.tsx (2 hunks)
  • server/ai/context.ts (19 hunks)
  • server/ai/provider/index.ts (3 hunks)
  • server/api/chat/chat.ts (36 hunks)
  • server/api/chat/utils.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/ai/provider/index.ts
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.
📚 Learning: 2025-10-16T18:43:30.423Z
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.

Applied to files:

  • frontend/src/components/CitationLink.tsx
  • frontend/src/utils/chatUtils.tsx
  • server/api/chat/utils.ts
  • server/api/chat/chat.ts
🧬 Code graph analysis (8)
frontend/src/components/CitationLink.tsx (1)
server/shared/fileUtils.ts (1)
  • getFileType (10-32)
frontend/src/hooks/useScopedFind.ts (1)
frontend/src/contexts/DocumentOperationsContext.tsx (1)
  • useDocumentOperations (58-66)
frontend/src/components/CitationPreview.tsx (6)
frontend/src/components/CitationLink.tsx (1)
  • Citation (11-17)
frontend/src/contexts/DocumentOperationsContext.tsx (1)
  • DocumentOperations (10-20)
frontend/src/utils/authFetch.ts (1)
  • authFetch (4-27)
frontend/src/hooks/useScopedFind.ts (1)
  • useScopedFind (43-575)
frontend/src/components/PdfViewer.tsx (1)
  • PdfViewer (62-704)
frontend/src/components/ReadmeViewer.tsx (1)
  • ReadmeViewer (18-272)
frontend/src/utils/chatUtils.tsx (1)
server/api/chat/utils.ts (3)
  • textToImageCitationIndex (557-557)
  • textToKbItemCitationIndex (558-558)
  • textToCitationIndex (556-556)
frontend/src/routes/_authenticated/chat.tsx (3)
frontend/src/contexts/DocumentOperationsContext.tsx (2)
  • useDocumentOperations (58-66)
  • DocumentOperationsProvider (29-55)
frontend/src/api.ts (1)
  • api (5-5)
frontend/src/components/CitationLink.tsx (1)
  • Citation (11-17)
server/api/chat/utils.ts (3)
frontend/src/components/CitationLink.tsx (1)
  • Citation (11-17)
server/api/chat/types.ts (1)
  • Citation (77-77)
frontend/src/utils/chatUtils.tsx (2)
  • textToImageCitationIndex (7-7)
  • textToKbItemCitationIndex (8-8)
server/api/chat/chat.ts (2)
server/utils.ts (1)
  • splitGroupedCitationsWithSpaces (267-290)
server/api/chat/utils.ts (4)
  • textToCitationIndex (556-556)
  • textToImageCitationIndex (557-557)
  • textToKbItemCitationIndex (558-558)
  • searchToCitation (453-547)
server/ai/context.ts (3)
server/lib/duckdb.ts (1)
  • querySheetChunks (32-251)
server/sheetChunk.ts (1)
  • chunkSheetWithHeaders (408-465)
server/utils.ts (1)
  • getRelativeTime (109-133)
🔇 Additional comments (23)
server/ai/context.ts (2)

909-1034: LGTM! Parameter threading and conditional logic look correct.

The isMsgWithKbItems parameter is properly threaded through to constructCollectionFileContext, and the sheet processing logic correctly handles the type unions for different schema types.


768-792: The concern is already addressed by the existing code logic.

The conditional at line 768 protects against the scenario: if (isMsgWithKbItems && fields.chunks_pos_summary) short-circuits if chunks_pos_summary is undefined. Additionally, line 771 uses optional chaining with a fallback (fields.chunks_pos_summary?.[v.index] ?? v.index), providing defense-in-depth. The code does not require changes.

server/api/chat/utils.ts (1)

453-547: LGTM! Removal of chunkIndex aligns with the PR objective.

The function now returns Citation objects without chunkIndex, correctly delegating chunk index derivation to the frontend via the K[n_m] citation format. This change is consistent across all schema branches.

server/api/chat/chat.ts (1)

482-490: KB citation index parsing looks correct (0‑based doc index)

Parsing K[n_m] with a 0‑based doc index aligns with our KB semantics; feeding that through to results[...] is consistent with startIndex/baseIndex usage. LGTM.

Based on learnings

frontend/src/hooks/useScopedFind.ts (1)

380-386: Cache keys and TTL look good

Cache key uses documentId|chunkIndex and TTL cleanup runs periodically. LGTM.

Also applies to: 388-396, 424-432

frontend/src/components/CitationLink.tsx (1)

146-165: Badge rendering LGTM

Numeric badge UX, dark mode classes, and no-underline behavior look good.

frontend/src/contexts/DocumentOperationsContext.tsx (1)

11-20: API extension for waitForTextLayer is clean

Signature threading and ref exposure are consistent. LGTM.

Also applies to: 77-114

frontend/src/utils/chatUtils.tsx (1)

7-8: Negative lookbehind support is confirmed for this project

Negative lookbehind (?<!) is supported in current Chrome, Firefox, Edge, and Safari releases. The project targets ES2020 with no transpilation layer and no browserslist configuration for legacy browser support, confirming it expects modern browsers where this feature is fully available.

frontend/src/components/CitationPreview.tsx (4)

91-96: LGTM!

The integration of useScopedFind correctly provides highlight and navigation capabilities, passing the container ref and document ID for proper scoping.


130-132: LGTM!

Properly clears highlights when switching between citations to avoid showing stale highlights from the previous document.


134-169: LGTM!

The extension detection prioritizes the blob's MIME type over the filename, which is more reliable. This addresses the past review concern about KB titles lacking extensions, since the MIME type is checked first and covers all major document types.


317-327: LGTM!

The effect correctly notifies the parent component only when the document is fully loaded and the viewer is rendered, with comprehensive guard conditions to prevent premature notifications.

frontend/src/routes/_authenticated/chat.tsx (11)

108-111: LGTM!

Correct import and usage of the DocumentOperationsContext to enable document interaction capabilities throughout the chat route.


232-232: LGTM!

Properly extracts documentOperationsRef from the DocumentOperations context for passing to CitationPreview.


416-420: LGTM!

The new state variables properly track chunk navigation (selectedChunkIndex) and document readiness (isDocumentLoaded) to coordinate highlighting operations.


1189-1267: Well-structured with proper race condition guards.

The callback correctly handles chunk navigation with multiple guards to prevent stale updates when the user switches citations rapidly. The double-check at lines 1230-1232 is especially important for avoiding highlights on the wrong document.

Optional: Consider removing documentOperationsRef from the dependency array since refs are stable:

-  [selectedCitation, toast, documentOperationsRef],
+  [selectedCitation, toast],

1286-1311: LGTM!

The citation click handler properly routes external citations to new windows and KB citations to the preview panel. The chunk index reset pattern (lines 1305-1308) is intentional: resetting to null first ensures the effect at lines 1269-1282 fires even when clicking the same chunk again, which is correct behavior for re-highlighting.


1314-1320: LGTM!

Properly cleans up all chunk navigation state when closing the citation preview, preventing stale highlights or navigation state from persisting.


1322-1337: LGTM!

The document lifecycle callbacks and cleanup effects properly coordinate the document loading and chunk highlighting sequence, ensuring highlights only appear after the document is fully loaded and that state is cleaned up when navigating between chats.


1742-1743: LGTM!

Correctly passes documentOperationsRef and onDocumentLoaded to CitationPreview, enabling the preview component to expose highlight operations and notify when the document is ready.


1817-1831: LGTM!

The updated CitationList signature correctly supports optional chunkIndex parameter, and line 1831 properly passes undefined for chunkIndex and true for fromSources when clicked from the Sources panel, maintaining correct navigation context.


1867-1871: LGTM!

The Sources component signature is updated to support chunk-aware citation clicks, maintaining consistency with the rest of the citation handling flow.


2865-2871: LGTM!

Wrapping the ChatPage in DocumentOperationsProvider at the route level is the correct pattern, enabling document operations context throughout the entire chat interface and its child components.

Comment on lines 556 to +558
export const textToCitationIndex = /\[(\d+)\]/g
export const textToImageCitationIndex = /\[(\d+_\d+)\]/g
export const textToImageCitationIndex = /(?<!K)\[(\d+_\d+)\]/g
export const textToKbItemCitationIndex = /K\[(\d+_\d+)\]/g
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

🧩 Analysis chain

Verify the regex patterns work correctly for all citation types.

The updated textToImageCitationIndex with negative lookbehind (?<!K) and the new textToKbItemCitationIndex should correctly separate image citations [N_M] from KB citations K[N_M]. However, ensure these patterns don't conflict or miss edge cases.

Run the following script to verify usage and test the patterns:


🏁 Script executed:

#!/bin/bash
# Test the regex patterns with sample citations
cat << 'EOF' > /tmp/test_citations.js
// Test regex patterns
const textToCitationIndex = /\[(\d+)\]/g
const textToImageCitationIndex = /(?<!K)\[(\d+_\d+)\]/g
const textToKbItemCitationIndex = /K\[(\d+_\d+)\]/g

const testCases = [
  { input: "See [1] and [2]", expected: { citation: ["1", "2"], image: [], kb: [] } },
  { input: "Image [0_5] here", expected: { citation: [], image: ["0_5"], kb: [] } },
  { input: "KB item K[1_3]", expected: { citation: [], image: [], kb: ["1_3"] } },
  { input: "Mixed [1] and [2_3] and K[4_5]", expected: { citation: ["1"], image: ["2_3"], kb: ["4_5"] } },
]

testCases.forEach((tc, i) => {
  const citations = [...tc.input.matchAll(textToCitationIndex)].map(m => m[1])
  const images = [...tc.input.matchAll(textToImageCitationIndex)].map(m => m[1])
  const kbs = [...tc.input.matchAll(textToKbItemCitationIndex)].map(m => m[1])
  
  const passed = 
    JSON.stringify(citations) === JSON.stringify(tc.expected.citation) &&
    JSON.stringify(images) === JSON.stringify(tc.expected.image) &&
    JSON.stringify(kbs) === JSON.stringify(tc.expected.kb)
  
  console.log(`Test ${i+1}: ${passed ? "PASS" : "FAIL"}`)
  if (!passed) {
    console.log(`  Expected: ${JSON.stringify(tc.expected)}`)
    console.log(`  Got: { citation: ${JSON.stringify(citations)}, image: ${JSON.stringify(images)}, kb: ${JSON.stringify(kbs)} }`)
  }
})
EOF

node /tmp/test_citations.js

Length of output: 826


I need to verify the regex patterns with a corrected approach. The error suggests an issue with how the negative lookbehind is being parsed. Let me check the actual codebase and test with a safer method:

Node.js version 8.10.0 and later support negative lookbehind by default, so the regex syntax should work in Node v24. Let me verify the actual regex patterns in the codebase:


🏁 Script executed:

#!/bin/bash
# Check the actual regex patterns in the codebase
cat server/api/chat/utils.ts | grep -A 2 "textToCitationIndex\|textToImageCitationIndex\|textToKbItemCitationIndex" | head -10

Length of output: 302


Now let me verify how these patterns are used in the codebase and check for any potential issues:


🏁 Script executed:

#!/bin/bash
# Search for usage of the regex patterns
echo "=== textToImageCitationIndex usage ==="
rg -n "textToImageCitationIndex" server/api/chat/utils.ts -A 3 -B 1

echo -e "\n=== textToKbItemCitationIndex usage ==="
rg -n "textToKbItemCitationIndex" server/api/chat/utils.ts -A 3 -B 1

echo -e "\n=== Check for any test files ==="
fd -e test.ts -e test.js -e spec.ts -e spec.js | head -20

Length of output: 684


Let me check the test file to verify these patterns are properly tested:


🏁 Script executed:

#!/bin/bash
# Check the citations test file
cat server/tests/citations.test.ts

Length of output: 4450


Let me check if these new patterns are actually used elsewhere in the codebase:


🏁 Script executed:

#!/bin/bash
# Search for actual usage of the new citation patterns
echo "=== Searching for textToImageCitationIndex usage (excluding export) ==="
rg "textToImageCitationIndex" --type ts -B 2 -A 2 | grep -v "^export const textToImageCitationIndex"

echo -e "\n=== Searching for textToKbItemCitationIndex usage (excluding export) ==="
rg "textToKbItemCitationIndex" --type ts -B 2 -A 2 | grep -v "^export const textToKbItemCitationIndex"

echo -e "\n=== Check the processMessage function implementation ==="
rg -A 30 "export const processMessage" server/api/chat/utils.ts

Length of output: 5602


Add test coverage for the new image and KB citation regex patterns.

The regex patterns are syntactically correct and actively used in production code (server/api/chat/chat.ts, agents.ts, and frontend). However, textToImageCitationIndex and textToKbItemCitationIndex have no test coverage, unlike textToCitationIndex. Add tests to server/tests/citations.test.ts to verify:

  • textToImageCitationIndex correctly matches [N_M] but excludes K[N_M]
  • textToKbItemCitationIndex correctly matches K[N_M]
  • The patterns don't conflict when used together in the same text

rahul1841 pushed a commit that referenced this pull request Oct 19, 2025
## [3.18.8](v3.18.7...v3.18.8) (2025-10-17)

### Bug Fixes

* chat chunk highlight ([#1122](#1122)) ([c48f04a](c48f04a))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants