-
Notifications
You must be signed in to change notification settings - Fork 56
Feat/chat chunk highlight #1122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughRemoves public Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant CitationLink
participant ChatPage
participant DocOps as DocumentOperationsProvider
participant ScopedFind as useScopedFind
participant Preview as CitationPreview
User->>CitationLink: click child id ("12" / "12_3" / "K[5_2]")
CitationLink->>ChatPage: onCitationClick(citation, derivedChunkIndex)
ChatPage->>ChatPage: set selectedChunkIndex / open preview
ChatPage->>DocOps: clearHighlights()
ChatPage->>Preview: open preview (pass documentOperationsRef)
ChatPage->>DocOps: documentOperationsRef.highlightText(text, chunkIndex, page?, waitForTextLayer=true)
DocOps->>ScopedFind: highlightText(..., waitForTextLayer)
alt waitForTextLayer true
ScopedFind->>ScopedFind: waitForTextLayerReady(container)
end
ScopedFind->>ScopedFind: extract text & apply highlights
ScopedFind-->>DocOps: highlight result (success/failure)
DocOps-->>Preview: navigate/highlight viewer
Preview-->>User: preview shown with highlights
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @Himanshvarma, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement to the chat experience by enabling direct interaction with Knowledge Base document citations. Users can now click on a citation to not only open the relevant document but also to be automatically directed to and have the specific referenced content chunk highlighted. This feature aims to improve the contextual understanding of AI responses by providing immediate visual access to the source material. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a chunk highlighting feature for citations from knowledge base files. The changes span across the frontend and backend to support a new citation format and the highlighting mechanism in the document previewer. The implementation is solid, introducing a DocumentOperationsContext
for better component communication and updating AI prompts to generate the new citation format. I've identified a critical bug in the frontend citation processing logic that could lead to incorrect data being displayed, along with a couple of medium-severity suggestions for improving maintainability and performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/api/chat/chat.ts (1)
481-503
: Retain the KB chunk index in SSE citations.Right now the KB branch strips off the
_chunk
portion ofK[doc_chunk]
and never forwards it, so theCitationsUpdate
payloads expose only the doc metadata. The UI can no longer know which chunk to scroll/highlight when a user clicks a KB citation (especially from the side-panel list where the rawK[...]
token isn’t available). Please carry the chunk index through the generator, e.g. parse both parts and include akbChunkIndex
(or similar) on the emitted citation object, then extend theCitation
type accordingly so the frontend can highlight the correct chunk.- if(match) { - citationIndex = parseInt(match[1], 10) - } else if (kbMatch) { - citationIndex = parseInt(kbMatch[1].split("_")[0], 10) - } + if (match) { + citationIndex = parseInt(match[1], 10) + } else if (kbMatch) { + const [docIndex, chunkIndex] = kbMatch[1].split("_") + citationIndex = parseInt(docIndex, 10) + currentKbChunkIndex = Number.isFinite(Number(chunkIndex)) + ? parseInt(chunkIndex, 10) + : undefined + } @@ - yield { - citation: { - index: citationIndex, - item: searchToCitation(item as VespaSearchResults), - }, - } + const citationPayload: any = { + index: citationIndex, + item: searchToCitation(item as VespaSearchResults), + } + if (currentKbChunkIndex !== undefined) { + citationPayload.kbChunkIndex = currentKbChunkIndex + } + yield { citation: citationPayload }(Ensure
currentKbChunkIndex
is declared/reset each loop, and update theCitation
type so this new field is typed.)
🧹 Nitpick comments (3)
server/ai/provider/index.ts (1)
1263-1292
: Clarify logging and intent in specificFiles KB/files branch
- Log message always says “Using baselineFilesContextPromptJson” even when KB prompt is used. Adjust for clarity.
- No-agent path uses agentBaselineKbContextPromptJson; confirm this is intentional despite the “agent” name.
Apply for clearer logs:
- Logger.info("Using baselineFilesContextPromptJson") + Logger.info( + isMsgWithKbItems + ? "Using agentBaselineKbContextPromptJson (KB chunks)" + : "Using *Files* baseline context prompt" + )frontend/src/components/CitationPreview.tsx (2)
100-121
: Expose waitForTextLayer in highlightText wrapperForward the optional flag so PDF/spreadsheet highlighting can wait for text-layer readiness.
- documentOperationsRef.current.highlightText = async (text: string, chunkIndex: number, pageIndex?: number) => { + documentOperationsRef.current.highlightText = async ( + text: string, + chunkIndex: number, + pageIndex?: number, + waitForTextLayer?: boolean, + ) => { if (!containerRef.current) { return false } try { - const success = await highlightText(text, chunkIndex, pageIndex) + const success = await highlightText(text, chunkIndex, pageIndex, waitForTextLayer) return success } catch (error) { console.error("Error calling highlightText:", error) return false } }
347-349
: Attach containerRef to the actual scroll container; remove inner refsuseScopedFind’s scrolling works best when the ref points to the scrollable element. Currently, the outer container scrolls while refs are on inner wrappers.
- {!loading && !error && documentContent && ( - <div className="h-full overflow-auto">{viewerElement}</div> - )} + {!loading && !error && documentContent && ( + <div ref={containerRef} data-container-ref="true" className="h-full overflow-auto"> + {viewerElement} + </div> + )}Remove inner refs from viewer wrappers:
- <div ref={containerRef} data-container-ref="true" className="h-full"> + <div className="h-full"> <PdfViewer- <div ref={containerRef} data-container-ref="true" className="h-full"> + <div className="h-full"> <ReadmeViewer- <div ref={containerRef} data-container-ref="true" className="h-full"> + <div className="h-full"> <DocxViewer- return( - <div ref={containerRef} data-container-ref="true" className="h-full"> + return( + <div className="h-full"> <ExcelViewer- return( - <div ref={containerRef} data-container-ref="true" className="h-full"> + return( + <div className="h-full"> <CsvViewer- return( - <div ref={containerRef} data-container-ref="true" className="h-full"> + return( + <div className="h-full"> <TxtViewerAlso applies to: 144-155, 160-168, 172-194, 198-206, 211-219, 223-231
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
frontend/src/components/CitationLink.tsx
(5 hunks)frontend/src/components/CitationPreview.tsx
(3 hunks)frontend/src/contexts/DocumentOperationsContext.tsx
(2 hunks)frontend/src/hooks/useScopedFind.ts
(2 hunks)frontend/src/routes/_authenticated/chat.tsx
(11 hunks)frontend/src/utils/chatUtils.tsx
(2 hunks)server/ai/agentPrompts.ts
(3 hunks)server/ai/context.ts
(4 hunks)server/ai/provider/index.ts
(3 hunks)server/api/chat/chat.ts
(33 hunks)server/api/chat/types.ts
(0 hunks)server/api/chat/utils.ts
(1 hunks)
💤 Files with no reviewable changes (1)
- server/api/chat/types.ts
🧰 Additional context used
🧬 Code graph analysis (8)
frontend/src/components/CitationLink.tsx (1)
server/shared/fileUtils.ts (1)
getFileType
(10-32)
server/ai/agentPrompts.ts (2)
server/ai/context.ts (1)
userContext
(1072-1091)server/ai/provider/index.ts (1)
AgentPromptData
(128-133)
frontend/src/components/CitationPreview.tsx (7)
frontend/src/components/CitationLink.tsx (1)
Citation
(11-17)frontend/src/contexts/DocumentOperationsContext.tsx (1)
DocumentOperations
(4-9)frontend/src/api.ts (1)
api
(5-5)frontend/src/utils/authFetch.ts (1)
authFetch
(4-27)frontend/src/hooks/useScopedFind.ts (1)
useScopedFind
(43-512)frontend/src/components/PdfViewer.tsx (1)
PdfViewer
(62-704)frontend/src/components/ReadmeViewer.tsx (1)
ReadmeViewer
(18-272)
server/ai/provider/index.ts (2)
server/ai/agentPrompts.ts (2)
agentBaselineKbContextPromptJson
(862-992)agentBaselineFilesContextPromptJson
(752-860)server/ai/prompts.ts (1)
baselineFilesContextPromptJson
(751-848)
server/api/chat/utils.ts (1)
frontend/src/utils/chatUtils.tsx (2)
textToImageCitationIndex
(7-7)textToKbItemCitationIndex
(8-8)
frontend/src/utils/chatUtils.tsx (1)
server/api/chat/utils.ts (3)
textToImageCitationIndex
(559-559)textToKbItemCitationIndex
(560-560)textToCitationIndex
(558-558)
frontend/src/routes/_authenticated/chat.tsx (3)
frontend/src/contexts/DocumentOperationsContext.tsx (2)
useDocumentOperations
(38-44)DocumentOperationsProvider
(18-35)frontend/src/api.ts (1)
api
(5-5)frontend/src/components/CitationLink.tsx (1)
Citation
(11-17)
server/api/chat/chat.ts (2)
server/utils.ts (1)
splitGroupedCitationsWithSpaces
(267-290)server/api/chat/utils.ts (4)
textToCitationIndex
(558-558)textToImageCitationIndex
(559-559)textToKbItemCitationIndex
(560-560)searchToCitation
(453-549)
🔇 Additional comments (1)
server/api/chat/utils.ts (1)
558-561
: Regex split for image vs KB citations looks good; verify runtime lookbehind supportThe negative lookbehind in textToImageCitationIndex is correct for excluding K[...] and matches frontend. Please confirm the Node/runtime supports lookbehind; else consider an alternate pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
frontend/src/components/CitationPreview.tsx (1)
133-205
: Still missing MIME fallback for viewer detectionIf a KB title has no extension (common in uploads),
extension
stays empty and we hit the default branch — users see “Preview not available” even thoughdocumentContent.type
tells us it’s a PDF/DOCX/etc. This was flagged earlier and is still outstanding. Please derive the extension (and default MIME) from the blob when the title lacks one so previews continue to render.- const fileName = citation.title || "" - const extension = getFileExtension(fileName) + const fileName = citation.title || "" + let extension = getFileExtension(fileName) + if (!extension && documentContent.type) { + extension = getExtensionFromMimeType(documentContent.type) + } ... - const file = new File([documentContent], fileName, { - type: documentContent.type || getDefaultMimeType(extension), - }) + const file = new File([documentContent], fileName, { + type: + documentContent.type || + getDefaultMimeType(extension || getFileExtension(fileName)), + }) @@ - const getDefaultMimeType = (extension: string): string => { + const getDefaultMimeType = (extension: string): string => { switch (extension) { case "pdf": return "application/pdf" case "docx": return "application/vnd.openxmlformats-officedocument.wordprocessingml.document" case "doc": return "application/msword" + case "xlsx": + return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" + case "xls": + return "application/vnd.ms-excel" + case "csv": + return "text/csv" + case "tsv": + return "text/tab-separated-values" case "md": case "markdown": return "text/markdown" case "txt": return "text/plain" default: return "application/octet-stream" } } + + const getExtensionFromMimeType = (mime: string): string => { + const lowerMime = mime.toLowerCase() + if (lowerMime.includes("pdf")) return "pdf" + if (lowerMime.includes("msword")) return "doc" + if (lowerMime.includes("wordprocessingml")) return "docx" + if (lowerMime.includes("spreadsheetml")) return "xlsx" + if (lowerMime.includes("ms-excel")) return "xls" + if (lowerMime.includes("csv")) return "csv" + if (lowerMime.endsWith("/tsv") || lowerMime.includes("tab-separated")) return "tsv" + if (lowerMime.includes("markdown")) return "md" + if (lowerMime.includes("plain")) return "txt" + return "" + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
frontend/src/components/CitationPreview.tsx
(3 hunks)frontend/src/routes/_authenticated/chat.tsx
(12 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.
🧬 Code graph analysis (2)
frontend/src/components/CitationPreview.tsx (4)
frontend/src/components/CitationLink.tsx (1)
Citation
(11-17)frontend/src/contexts/DocumentOperationsContext.tsx (1)
DocumentOperations
(4-9)frontend/src/hooks/useScopedFind.ts (1)
useScopedFind
(43-512)frontend/src/components/PdfViewer.tsx (1)
PdfViewer
(62-704)
frontend/src/routes/_authenticated/chat.tsx (3)
frontend/src/contexts/DocumentOperationsContext.tsx (2)
useDocumentOperations
(38-44)DocumentOperationsProvider
(18-35)frontend/src/api.ts (1)
api
(5-5)frontend/src/components/CitationLink.tsx (1)
Citation
(11-17)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frontend/src/components/CitationLink.tsx (1)
37-52
: Guard against non-numeric KB citation labels before rewritingchildren
.When the incoming label follows the KB format (
K0_3
, per the KB citation spec),parseInt(parts[0])
returnsNaN
. We still run the block, sochildren = (citationIndex + 1).toString()
renders"NaN"
in the UI and drops the original KB identifier. Wrap that rewrite in a numeric check (e.g.if (citationIndex >= 0) { … }
) or otherwise preserveparts[0]
for the KB case before touchingchildren
. This keeps KB badges legible while retaining the spreadsheet chunk correction. Based on learnings- if(chunkIndex !== undefined && citation) { - children = (citationIndex + 1).toString() + if(chunkIndex !== undefined && citation) { + if (citationIndex >= 0) { + children = (citationIndex + 1).toString() + } else if (parts[0]) { + children = parts[0] + } if(getFileType({type: "", name: citation?.title ?? ""}) === FileType.SPREADSHEET) chunkIndex = Math.max(chunkIndex - 1, 0) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
frontend/src/components/CitationLink.tsx
(5 hunks)frontend/src/components/CitationPreview.tsx
(3 hunks)frontend/src/routes/_authenticated/chat.tsx
(12 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.
📚 Learning: 2025-10-16T18:43:30.423Z
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.
Applied to files:
frontend/src/components/CitationLink.tsx
🧬 Code graph analysis (3)
frontend/src/components/CitationLink.tsx (1)
server/shared/fileUtils.ts (1)
getFileType
(10-32)
frontend/src/routes/_authenticated/chat.tsx (3)
frontend/src/contexts/DocumentOperationsContext.tsx (2)
useDocumentOperations
(38-44)DocumentOperationsProvider
(18-35)frontend/src/api.ts (1)
api
(5-5)frontend/src/components/CitationLink.tsx (1)
Citation
(11-17)
frontend/src/components/CitationPreview.tsx (4)
frontend/src/components/CitationLink.tsx (1)
Citation
(11-17)frontend/src/contexts/DocumentOperationsContext.tsx (1)
DocumentOperations
(4-9)frontend/src/hooks/useScopedFind.ts (1)
useScopedFind
(43-512)frontend/src/components/PdfViewer.tsx (1)
PdfViewer
(62-704)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
frontend/src/components/CitationLink.tsx (1)
49-56
: Explain spreadsheet chunkIndex decrementGood call adjusting for spreadsheet 1‑based labels. Add a brief comment so future readers don’t regress this.
frontend/src/components/CitationPreview.tsx (1)
40-89
: Still missing: Cancel stale document fetches when citation changes.The document loading effect can complete out of order when switching between citations rapidly. The async
loadDocument
started for citation A can resolve after citation B is selected, causing the preview to display the wrong document.Apply this diff to add cancellation:
useEffect(() => { if (!citation || !isOpen) { setDocumentContent(null) setError(null) return } + + let cancelled = false const loadDocument = async () => { setLoading(true) setError(null) try { if ( citation.app === "KnowledgeBase" && citation.itemId && citation.clId ) { const response = await api.cl[citation.clId].files[citation.itemId].content.$get() if (!response.ok) { throw new Error(`Failed to fetch document: ${response.statusText}`) } const blob = await response.blob() + if (cancelled) return setDocumentContent(blob) } else if (citation.url) { // For external documents, try to fetch directly const response = await authFetch(citation.url, { method: "GET", }) if (!response.ok) { throw new Error(`Failed to fetch document: ${response.statusText}`) } const blob = await response.blob() + if (cancelled) return setDocumentContent(blob) } else { throw new Error("No document source available") } } catch (err) { + if (cancelled) return console.error("Error loading document:", err) setError(err instanceof Error ? err.message : "Failed to load document") } finally { + if (cancelled) return setLoading(false) } } loadDocument() + + return () => { + cancelled = true + } }, [citation, isOpen])
🧹 Nitpick comments (13)
server/ai/context.ts (1)
40-130
: Consider extracting a type alias for the chunk union type.The type
(string | { chunk: string; score: number; index: number })[]
is repeated multiple times throughout this function and others. Extracting it to a named type alias (e.g.,type ChunkSummary = (string | { chunk: string; score: number; index: number })[]
) would improve readability and maintainability.Apply this diff at the top of the file after imports:
+type ChunkSummary = (string | { chunk: string; score: number; index: number })[] + // Utility function to extract header from chunks and remove headers from each chunk const extractHeaderAndDataChunks = ( - chunks_summary: - | (string | { chunk: string; score: number; index: number })[] - | undefined, + chunks_summary: ChunkSummary | undefined, matchfeatures?: any, ): { - chunks_summary: (string | { chunk: string; score: number; index: number })[] + chunks_summary: ChunkSummary matchfeatures?: any } => {Then apply similar changes to
processSheetQuery
and other functions that use this type.server/api/chat/chat.ts (2)
468-488
: Preserve in-text citation order across [n], image, and K[...] matchesThe while-condition short-circuits each regex family until exhaustion, then moves to the next. This can reorder emissions vs. their textual positions. Consider selecting the next earliest match among all three regexes per iteration to preserve original order. Keep current behavior as fallback if you prefer simplicity.
Also applies to: 475-480, 500-506, 514-565
488-506
: Dedup key collision between [n] and K[n_m]yieldedCitations uses the raw doc index for both [n] and K[...] paths. If the model outputs both forms for the same doc index, that’s fine; but if different sources share the same numeric index in different contexts, they’ll be coalesced. If this becomes noisy, include the match kind (e.g., "kb" | "std") in the dedup key.
frontend/src/hooks/useScopedFind.ts (3)
171-178
: Avoid writing data-match-index twicecreateHighlightMarks sets data-match-index="0" and later the caller rewrites it with the real matchIndex. Drop the initial attribute to reduce DOM churn.
- mark.setAttribute("data-match-index", "0") + // attribute set by caller after marks are created ... - mark.setAttribute("data-match-index", "0") + // attribute set by caller after marks are createdAlso applies to: 199-206
256-315
: Text-layer wait loop is solid; add early exit on non-empty stable textYou already stabilize via three identical lengths; optionally short‑circuit when stableCount hits 1 and waitForTextLayer is false or when text contains the exact chunk substring we’re about to highlight. This trims latency on large pages.
338-371
: Return early when pageIndex is undefined in PDF pathYou already do; consider surfacing a debug warning (once) so callers know a pageIndex is required for PDFs/spreadsheets.
frontend/src/utils/chatUtils.tsx (2)
39-48
: K[…] replacement: parse once, tighten guardsFunctionally correct (0‑based doc index). Simplify and harden parsing to avoid multiple split/parse operations and NaN pitfalls.
- text = text.replace(textToKbItemCitationIndex, (_, citationKey) => { - const index = citationMap - ? citationMap[parseInt(citationKey.split("_")[0], 10)] - : parseInt(citationKey.split("_")[0], 10) - const chunkIndex = parseInt(citationKey.split("_")[1], 10) - const url = citationUrls[index] - return typeof index === "number" && typeof chunkIndex === "number" && url - ? `[${index + 1}_${chunkIndex}](${url})` - : "" - }) + text = text.replace(textToKbItemCitationIndex, (_, citationKey) => { + const [rawDocIdxStr, chunkIdxStr] = citationKey.split("_") + const rawDocIdx = Number.parseInt(rawDocIdxStr, 10) + const parsedChunkIdx = Number.parseInt(chunkIdxStr, 10) + const resolvedIdx = + Number.isInteger(rawDocIdx) ? (citationMap?.[rawDocIdx] ?? rawDocIdx) : undefined + const url = + typeof resolvedIdx === "number" && resolvedIdx >= 0 ? citationUrls[resolvedIdx] : undefined + return typeof resolvedIdx === "number" && Number.isInteger(parsedChunkIdx) && url + ? `[${resolvedIdx + 1}_${parsedChunkIdx}](${url})` + : "" + })Based on learnings
21-61
: Fallback without citationMap may misalign KB linksFor standard [n], the no‑map fallback uses num‑1 into citationUrls; for K[n_m], the fallback uses the raw doc index. If citationUrls is ordered by encountered citations (not by doc index), K fallbacks could resolve to the wrong URL. If feasible, require citationMap for K[...] or document this invariant.
Also applies to: 39-48
frontend/src/components/CitationLink.tsx (1)
37-40
: Harden parsing of children and radixGuard against non-string children and NaN; always pass radix 10.
- const parts = typeof children === "string" ? children.split("_") : [] - const citationIndex = parts.length > 0 ? parseInt(parts[0]) - 1 : -1 - let chunkIndex = parts.length > 1 ? parseInt(parts[1]) : undefined + const textChild = typeof children === "string" ? children.trim() : "" + const parts = textChild ? textChild.split("_") : [] + const citationIndex = parts.length > 0 && /^\d+$/.test(parts[0]) + ? parseInt(parts[0], 10) - 1 + : -1 + let chunkIndex = parts.length > 1 && /^\d+$/.test(parts[1]) + ? parseInt(parts[1], 10) + : undefinedfrontend/src/contexts/DocumentOperationsContext.tsx (1)
45-53
: Memoize provider value to avoid unnecessary re-rendersWrap the value object in useMemo with setGoToPageFn dependency to prevent child re-renders on every parent render.
- return ( - <DocumentOperationsContext.Provider - value={{ - documentOperationsRef, - setGoToPage: setGoToPageFn, - }} - > + const contextValue = React.useMemo( + () => ({ documentOperationsRef, setGoToPage: setGoToPageFn }), + [setGoToPageFn], + ) + return ( + <DocumentOperationsContext.Provider value={contextValue}> {children} </DocumentOperationsContext.Provider> )frontend/src/components/CitationPreview.tsx (2)
98-128
: Optional: Remove stable ref from dependency array.Including
documentOperationsRef
in the dependency array is unnecessary since refs are stable and don't trigger re-renders. The effect will work correctly with just the function dependencies.Apply this diff:
- }, [documentOperationsRef, highlightText, clearHighlights, scrollToMatch]) + }, [highlightText, clearHighlights, scrollToMatch])
171-296
: Consider passing documentOperationsRef to all viewers that support navigation.Only PDF and Excel viewers receive
documentOperationsRef
, but if other viewers (ReadmeViewer, DocxViewer, etc.) support operations likegoToPage
in the future, they won't have access to the ref.For future extensibility, you may want to pass
documentOperationsRef
to all viewers, even if they don't currently use it:case "md": case "markdown": return ( <div ref={containerRef} data-container-ref="true" className="h-full"> <ReadmeViewer key={citation.docId} source={file} className="h-full" style={{ height: "100%", overflow: "auto", padding: "16px" }} + documentOperationsRef={documentOperationsRef} /> </div> )
Apply similarly to DocxViewer, CsvViewer, and TxtViewer if their interfaces support it.
frontend/src/routes/_authenticated/chat.tsx (1)
1269-1282
: Optional: Remove handleChunkIndexChange from effect dependencies.Including
handleChunkIndexChange
in the dependency array creates redundant effect runs whenselectedCitation
changes, since the callback already depends onselectedCitation
. The effect will run twice: once whenselectedCitation
changes and once whenhandleChunkIndexChange
is recreated.You can safely remove it from the dependencies since the callback is stable for each citation:
useEffect(() => { if (selectedCitation && isDocumentLoaded) { handleChunkIndexChange( selectedChunkIndex, selectedCitation?.itemId ?? "", selectedCitation?.docId ?? "", ) } }, [ selectedChunkIndex, selectedCitation, isDocumentLoaded, - handleChunkIndexChange, ])
React Hook exhaustive-deps may warn, but this is safe because
handleChunkIndexChange
is memoized with the same dependencies already in the array.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
frontend/src/components/CitationLink.tsx
(6 hunks)frontend/src/components/CitationPreview.tsx
(3 hunks)frontend/src/contexts/DocumentOperationsContext.tsx
(3 hunks)frontend/src/hooks/useScopedFind.ts
(3 hunks)frontend/src/routes/_authenticated/chat.tsx
(12 hunks)frontend/src/utils/chatUtils.tsx
(2 hunks)server/ai/context.ts
(19 hunks)server/ai/provider/index.ts
(3 hunks)server/api/chat/chat.ts
(36 hunks)server/api/chat/utils.ts
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/ai/provider/index.ts
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.
📚 Learning: 2025-10-16T18:43:30.423Z
Learnt from: Himanshvarma
PR: xynehq/xyne#1122
File: frontend/src/utils/chatUtils.tsx:40-47
Timestamp: 2025-10-16T18:43:30.423Z
Learning: KB item citations (K[n_m] format) use 0-based indexing for the document index in frontend/src/utils/chatUtils.tsx, unlike regular citations which use 1-based indexing. When processing textToKbItemCitationIndex replacements, the raw parsed index should be used directly with citationUrls without subtracting 1.
Applied to files:
frontend/src/components/CitationLink.tsx
frontend/src/utils/chatUtils.tsx
server/api/chat/utils.ts
server/api/chat/chat.ts
🧬 Code graph analysis (8)
frontend/src/components/CitationLink.tsx (1)
server/shared/fileUtils.ts (1)
getFileType
(10-32)
frontend/src/hooks/useScopedFind.ts (1)
frontend/src/contexts/DocumentOperationsContext.tsx (1)
useDocumentOperations
(58-66)
frontend/src/components/CitationPreview.tsx (6)
frontend/src/components/CitationLink.tsx (1)
Citation
(11-17)frontend/src/contexts/DocumentOperationsContext.tsx (1)
DocumentOperations
(10-20)frontend/src/utils/authFetch.ts (1)
authFetch
(4-27)frontend/src/hooks/useScopedFind.ts (1)
useScopedFind
(43-575)frontend/src/components/PdfViewer.tsx (1)
PdfViewer
(62-704)frontend/src/components/ReadmeViewer.tsx (1)
ReadmeViewer
(18-272)
frontend/src/utils/chatUtils.tsx (1)
server/api/chat/utils.ts (3)
textToImageCitationIndex
(557-557)textToKbItemCitationIndex
(558-558)textToCitationIndex
(556-556)
frontend/src/routes/_authenticated/chat.tsx (3)
frontend/src/contexts/DocumentOperationsContext.tsx (2)
useDocumentOperations
(58-66)DocumentOperationsProvider
(29-55)frontend/src/api.ts (1)
api
(5-5)frontend/src/components/CitationLink.tsx (1)
Citation
(11-17)
server/api/chat/utils.ts (3)
frontend/src/components/CitationLink.tsx (1)
Citation
(11-17)server/api/chat/types.ts (1)
Citation
(77-77)frontend/src/utils/chatUtils.tsx (2)
textToImageCitationIndex
(7-7)textToKbItemCitationIndex
(8-8)
server/api/chat/chat.ts (2)
server/utils.ts (1)
splitGroupedCitationsWithSpaces
(267-290)server/api/chat/utils.ts (4)
textToCitationIndex
(556-556)textToImageCitationIndex
(557-557)textToKbItemCitationIndex
(558-558)searchToCitation
(453-547)
server/ai/context.ts (3)
server/lib/duckdb.ts (1)
querySheetChunks
(32-251)server/sheetChunk.ts (1)
chunkSheetWithHeaders
(408-465)server/utils.ts (1)
getRelativeTime
(109-133)
🔇 Additional comments (23)
server/ai/context.ts (2)
909-1034
: LGTM! Parameter threading and conditional logic look correct.The
isMsgWithKbItems
parameter is properly threaded through toconstructCollectionFileContext
, and the sheet processing logic correctly handles the type unions for different schema types.
768-792
: The concern is already addressed by the existing code logic.The conditional at line 768 protects against the scenario:
if (isMsgWithKbItems && fields.chunks_pos_summary)
short-circuits ifchunks_pos_summary
is undefined. Additionally, line 771 uses optional chaining with a fallback (fields.chunks_pos_summary?.[v.index] ?? v.index
), providing defense-in-depth. The code does not require changes.server/api/chat/utils.ts (1)
453-547
: LGTM! Removal ofchunkIndex
aligns with the PR objective.The function now returns
Citation
objects withoutchunkIndex
, correctly delegating chunk index derivation to the frontend via the K[n_m] citation format. This change is consistent across all schema branches.server/api/chat/chat.ts (1)
482-490
: KB citation index parsing looks correct (0‑based doc index)Parsing K[n_m] with a 0‑based doc index aligns with our KB semantics; feeding that through to results[...] is consistent with startIndex/baseIndex usage. LGTM.
Based on learnings
frontend/src/hooks/useScopedFind.ts (1)
380-386
: Cache keys and TTL look goodCache key uses documentId|chunkIndex and TTL cleanup runs periodically. LGTM.
Also applies to: 388-396, 424-432
frontend/src/components/CitationLink.tsx (1)
146-165
: Badge rendering LGTMNumeric badge UX, dark mode classes, and no-underline behavior look good.
frontend/src/contexts/DocumentOperationsContext.tsx (1)
11-20
: API extension for waitForTextLayer is cleanSignature threading and ref exposure are consistent. LGTM.
Also applies to: 77-114
frontend/src/utils/chatUtils.tsx (1)
7-8
: Negative lookbehind support is confirmed for this projectNegative lookbehind (?<!) is supported in current Chrome, Firefox, Edge, and Safari releases. The project targets ES2020 with no transpilation layer and no browserslist configuration for legacy browser support, confirming it expects modern browsers where this feature is fully available.
frontend/src/components/CitationPreview.tsx (4)
91-96
: LGTM!The integration of
useScopedFind
correctly provides highlight and navigation capabilities, passing the container ref and document ID for proper scoping.
130-132
: LGTM!Properly clears highlights when switching between citations to avoid showing stale highlights from the previous document.
134-169
: LGTM!The extension detection prioritizes the blob's MIME type over the filename, which is more reliable. This addresses the past review concern about KB titles lacking extensions, since the MIME type is checked first and covers all major document types.
317-327
: LGTM!The effect correctly notifies the parent component only when the document is fully loaded and the viewer is rendered, with comprehensive guard conditions to prevent premature notifications.
frontend/src/routes/_authenticated/chat.tsx (11)
108-111
: LGTM!Correct import and usage of the DocumentOperationsContext to enable document interaction capabilities throughout the chat route.
232-232
: LGTM!Properly extracts
documentOperationsRef
from the DocumentOperations context for passing to CitationPreview.
416-420
: LGTM!The new state variables properly track chunk navigation (
selectedChunkIndex
) and document readiness (isDocumentLoaded
) to coordinate highlighting operations.
1189-1267
: Well-structured with proper race condition guards.The callback correctly handles chunk navigation with multiple guards to prevent stale updates when the user switches citations rapidly. The double-check at lines 1230-1232 is especially important for avoiding highlights on the wrong document.
Optional: Consider removing
documentOperationsRef
from the dependency array since refs are stable:- [selectedCitation, toast, documentOperationsRef], + [selectedCitation, toast],
1286-1311
: LGTM!The citation click handler properly routes external citations to new windows and KB citations to the preview panel. The chunk index reset pattern (lines 1305-1308) is intentional: resetting to
null
first ensures the effect at lines 1269-1282 fires even when clicking the same chunk again, which is correct behavior for re-highlighting.
1314-1320
: LGTM!Properly cleans up all chunk navigation state when closing the citation preview, preventing stale highlights or navigation state from persisting.
1322-1337
: LGTM!The document lifecycle callbacks and cleanup effects properly coordinate the document loading and chunk highlighting sequence, ensuring highlights only appear after the document is fully loaded and that state is cleaned up when navigating between chats.
1742-1743
: LGTM!Correctly passes
documentOperationsRef
andonDocumentLoaded
to CitationPreview, enabling the preview component to expose highlight operations and notify when the document is ready.
1817-1831
: LGTM!The updated
CitationList
signature correctly supports optionalchunkIndex
parameter, and line 1831 properly passesundefined
for chunkIndex andtrue
for fromSources when clicked from the Sources panel, maintaining correct navigation context.
1867-1871
: LGTM!The
Sources
component signature is updated to support chunk-aware citation clicks, maintaining consistency with the rest of the citation handling flow.
2865-2871
: LGTM!Wrapping the ChatPage in
DocumentOperationsProvider
at the route level is the correct pattern, enabling document operations context throughout the entire chat interface and its child components.
export const textToCitationIndex = /\[(\d+)\]/g | ||
export const textToImageCitationIndex = /\[(\d+_\d+)\]/g | ||
export const textToImageCitationIndex = /(?<!K)\[(\d+_\d+)\]/g | ||
export const textToKbItemCitationIndex = /K\[(\d+_\d+)\]/g |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
🧩 Analysis chain
Verify the regex patterns work correctly for all citation types.
The updated textToImageCitationIndex
with negative lookbehind (?<!K)
and the new textToKbItemCitationIndex
should correctly separate image citations [N_M]
from KB citations K[N_M]
. However, ensure these patterns don't conflict or miss edge cases.
Run the following script to verify usage and test the patterns:
🏁 Script executed:
#!/bin/bash
# Test the regex patterns with sample citations
cat << 'EOF' > /tmp/test_citations.js
// Test regex patterns
const textToCitationIndex = /\[(\d+)\]/g
const textToImageCitationIndex = /(?<!K)\[(\d+_\d+)\]/g
const textToKbItemCitationIndex = /K\[(\d+_\d+)\]/g
const testCases = [
{ input: "See [1] and [2]", expected: { citation: ["1", "2"], image: [], kb: [] } },
{ input: "Image [0_5] here", expected: { citation: [], image: ["0_5"], kb: [] } },
{ input: "KB item K[1_3]", expected: { citation: [], image: [], kb: ["1_3"] } },
{ input: "Mixed [1] and [2_3] and K[4_5]", expected: { citation: ["1"], image: ["2_3"], kb: ["4_5"] } },
]
testCases.forEach((tc, i) => {
const citations = [...tc.input.matchAll(textToCitationIndex)].map(m => m[1])
const images = [...tc.input.matchAll(textToImageCitationIndex)].map(m => m[1])
const kbs = [...tc.input.matchAll(textToKbItemCitationIndex)].map(m => m[1])
const passed =
JSON.stringify(citations) === JSON.stringify(tc.expected.citation) &&
JSON.stringify(images) === JSON.stringify(tc.expected.image) &&
JSON.stringify(kbs) === JSON.stringify(tc.expected.kb)
console.log(`Test ${i+1}: ${passed ? "PASS" : "FAIL"}`)
if (!passed) {
console.log(` Expected: ${JSON.stringify(tc.expected)}`)
console.log(` Got: { citation: ${JSON.stringify(citations)}, image: ${JSON.stringify(images)}, kb: ${JSON.stringify(kbs)} }`)
}
})
EOF
node /tmp/test_citations.js
Length of output: 826
I need to verify the regex patterns with a corrected approach. The error suggests an issue with how the negative lookbehind is being parsed. Let me check the actual codebase and test with a safer method:
Node.js version 8.10.0 and later support negative lookbehind by default, so the regex syntax should work in Node v24. Let me verify the actual regex patterns in the codebase:
🏁 Script executed:
#!/bin/bash
# Check the actual regex patterns in the codebase
cat server/api/chat/utils.ts | grep -A 2 "textToCitationIndex\|textToImageCitationIndex\|textToKbItemCitationIndex" | head -10
Length of output: 302
Now let me verify how these patterns are used in the codebase and check for any potential issues:
🏁 Script executed:
#!/bin/bash
# Search for usage of the regex patterns
echo "=== textToImageCitationIndex usage ==="
rg -n "textToImageCitationIndex" server/api/chat/utils.ts -A 3 -B 1
echo -e "\n=== textToKbItemCitationIndex usage ==="
rg -n "textToKbItemCitationIndex" server/api/chat/utils.ts -A 3 -B 1
echo -e "\n=== Check for any test files ==="
fd -e test.ts -e test.js -e spec.ts -e spec.js | head -20
Length of output: 684
Let me check the test file to verify these patterns are properly tested:
🏁 Script executed:
#!/bin/bash
# Check the citations test file
cat server/tests/citations.test.ts
Length of output: 4450
Let me check if these new patterns are actually used elsewhere in the codebase:
🏁 Script executed:
#!/bin/bash
# Search for actual usage of the new citation patterns
echo "=== Searching for textToImageCitationIndex usage (excluding export) ==="
rg "textToImageCitationIndex" --type ts -B 2 -A 2 | grep -v "^export const textToImageCitationIndex"
echo -e "\n=== Searching for textToKbItemCitationIndex usage (excluding export) ==="
rg "textToKbItemCitationIndex" --type ts -B 2 -A 2 | grep -v "^export const textToKbItemCitationIndex"
echo -e "\n=== Check the processMessage function implementation ==="
rg -A 30 "export const processMessage" server/api/chat/utils.ts
Length of output: 5602
Add test coverage for the new image and KB citation regex patterns.
The regex patterns are syntactically correct and actively used in production code (server/api/chat/chat.ts, agents.ts, and frontend). However, textToImageCitationIndex
and textToKbItemCitationIndex
have no test coverage, unlike textToCitationIndex
. Add tests to server/tests/citations.test.ts
to verify:
textToImageCitationIndex
correctly matches[N_M]
but excludesK[N_M]
textToKbItemCitationIndex
correctly matchesK[N_M]
- The patterns don't conflict when used together in the same text
## [3.18.8](v3.18.7...v3.18.8) (2025-10-17) ### Bug Fixes * chat chunk highlight ([#1122](#1122)) ([c48f04a](c48f04a))
Description
added chunk highlight feature in regular chat
now when user clicks on a citation that belongs to kb files
Testing
Summary by CodeRabbit
New Features
Bug Fixes
Refactor