-
Notifications
You must be signed in to change notification settings - Fork 56
added multi file attachment support #774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughSeparates image vs non-image attachments end-to-end, centralizes file-type enums/mappings and getFileType/getFileIcon utilities, routes non-image files into a FileProcessor/Vespa ingestion path while images are stored with thumbnails, centralizes provider image-name regex, and adds describeImages toggles in chunkers and processor. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User
participant FE as Frontend (ChatBox)
participant API as /api/files
participant FS as File System
participant FP as FileProcessorService
participant V as Vespa
U->>FE: Attach or paste file
FE->>FE: getFileType(file) → getFileIcon
FE-->>U: show preview (icon + type)
alt Image
FE->>API: upload image
API->>FS: save image + thumbnail
API-->>FE: response att_<uuid> (isImage:true)
else Non-image
FE->>API: upload non-image
API->>FP: processFile(buffer,..., extractImages?, describeImages?)
FP->>V: insert document (chunks, metadata)
V-->>API: ok
API-->>FE: response att_<uuid> (isImage:false)
end
sequenceDiagram
autonumber
participant Client as Client
participant ChatSvc as server/api/chat
participant Agents as server/api/chat/agents
participant Provider as server/ai/provider/vertex_ai
participant V as Vespa
Client->>ChatSvc: message + attachments
ChatSvc->>ChatSvc: split ImageAttachmentFileIds / NonImageAttachmentFileIds
ChatSvc->>V: search using fileIds (+ non-image IDs)
ChatSvc->>Agents: build planningContext (finalImageFileNames)
Agents->>Provider: request with imageFileNames
Provider->>Provider: parse token via regex (docIndex_docId_imageNumber), validate docId/path, load imageNumber
Provider-->>Agents: image data (media_type)
Agents-->>ChatSvc: synthesized answer
ChatSvc-->>Client: streamed reply
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @Himanshvarma, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a robust multi-file attachment system, enabling users to upload and share a wider array of document types. The changes span both the frontend, for an improved user experience with diverse file previews, and the backend, where a new processing pipeline ensures that non-image files are intelligently parsed and integrated into the system's RAG capabilities. This enhancement broadens the scope of information that can be leveraged by the AI, moving beyond simple image recognition to comprehensive document understanding.
Highlights
- Expanded File Attachment Support: The system now supports attaching multiple file types beyond just images, including documents (DOCX, DOC), spreadsheets (XLSX, XLS, CSV), presentations (PPTX, PPT), PDFs, and plain text files (TXT, MD). This significantly enhances the types of content users can share and process.
- Frontend UI/UX Enhancements: The user interface for attachments has been updated to better display multiple files. This includes visual adjustments in the attachment gallery and chatbox to show appropriate icons and file type labels for various document types. The file input now explicitly lists supported file types.
- Backend File Processing Integration: Non-image attachments are now processed by a dedicated
FileProcessorService
on the backend. This service extracts text and other relevant data from documents, spreadsheets, and presentations, and ingests this content into Vespa, integrating them into the system's knowledge base for RAG (Retrieval-Augmented Generation). - Image and Non-Image Attachment Separation: Backend logic has been refined to differentiate between image and non-image attachments. Image files continue to be stored on disk with thumbnails, while non-image files are processed for their content and stored in Vespa, allowing for more intelligent handling based on file type.
- Improved Image Handling in LLM Context: The system now passes image attachment IDs more explicitly to the LLM providers (like Vertex AI) for visual understanding, ensuring that attached images are correctly incorporated into the model's context for multimodal reasoning.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces support for attaching multiple file types, not just images. The frontend changes include UI updates for displaying various file attachments and refactoring file utility functions. The backend is updated to process and ingest non-image files into Vespa for RAG, while images are stored on disk as before. The changes are generally well-implemented, but I've identified a critical regression where the ability to download non-image attachments was removed. There is also a code duplication issue that should be addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (6)
server/api/chat/utils.ts (1)
321-346
: Fix docId parsing and support shorthand image numbers in "Image File Names" blockCurrent docId extraction uses the entire block and lastIndexOf("_"), which breaks when multiple filenames are listed. It also fails to reconstruct entries like "1 2 3" (image numbers) back into docIndex_docId_imageNumber. Parse docId from the first token, and prefix numeric-only tokens with that docId.
Apply this diff:
- const docId = imageContent.substring(0, imageContent.lastIndexOf("_")) + // Derive docId from the first non-empty token (supports docIds with underscores) + const tokens = imageContent + .split(/\s+/) + .map((t) => t.trim()) + .filter((t) => t.length > 0) + if (tokens.length === 0) { + console.warn( + "No tokens found in image content for image content extraction.", + ) + continue + } + const first = tokens[0] + const us = first.lastIndexOf("_") + if (us === -1) { + console.warn( + `No underscore found in first token; cannot derive docId from "${first}".`, + ) + continue + } + const docId = first.substring(0, us) @@ - const fileNames = imageContent - .split("\n") - .map((name) => name.trim()) - .filter((name) => name.length > 0) - // Additional safety: split by spaces and filter out empty strings - // in case multiple filenames are on the same line - .flatMap((name) => - name.split(/\s+/).filter((part) => part.length > 0), - ) - .map((name) => `${docIndex}_${name}`) + const fileNames = tokens + .map((name) => { + // If token already has <docId>_<n>, keep it; else if numeric, attach docId + if (name.includes("_")) return `${docIndex}_${name}` + if (/^\d+$/.test(name)) return `${docIndex}_${docId}_${name}` + // Unrecognized token — skip + return null + }) + .filter((v): v is string => v !== null)server/api/chat/agents.ts (2)
420-424
: Image citation index collision for attachments.You prepend attachments to finalImageFileNames using indices starting at 0. checkAndYieldCitationsForAgent indexes into results by docIndex, so attachment citations will collide with result indices and yield wrong/failed image citations. Offset attachment indices by results.length.
Apply:
- if (attachmentFileIds?.length) { - finalImageFileNames.push( - ...attachmentFileIds.map((fileid, index) => `${index}_${fileid}_${0}`), - ) - } + if (attachmentFileIds?.length) { + const base = results.length + finalImageFileNames.push( + ...attachmentFileIds.map((fileid, index) => `${base + index}_${fileid}_${0}`), + ) + }
558-561
: Same index-offset bug in synthesis path.Use results.length (or gatheredFragments.length where appropriate) as base to avoid collisions.
Apply:
- imageFileNames: attachmentFileIds?.map( - (fileId, index) => `${index}_${fileId}_${0}`, - ), + imageFileNames: attachmentFileIds?.map( + (fileId, index) => `${gatheredFragments.length + index}_${fileId}_${0}`, + ),server/api/chat/chat.ts (3)
4290-4318
: Guard and naming for image-only path.Use the camelCase array name and optional chaining to simplify the guard; pass the correctly typed array to the generator.
- (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (imageAttachmentFileIds?.length ?? 0) > 0- ImageAttachmentFileIds, + imageAttachmentFileIds,
5608-5640
: Use the fixed, narrowed imageAttachmentFileIds in retry path.- (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (imageAttachmentFileIds?.length ?? 0) > 0- ImageAttachmentFileIds, + imageAttachmentFileIds,
757-786
: Harden path validation for attachment deletion (path traversal).String checks for "..", "/" and "" are insufficient. Validate the ID and ensure the resolved path stays within the base directory.
- // Validate fileId to prevent path traversal - if ( - fileId.includes("..") || - fileId.includes("/") || - fileId.includes("\\") - ) { + // Strong validation: allow only safe chars and enforce base-dir containment + const SAFE_ID = /^[A-Za-z0-9_-]+$/ + if (!SAFE_ID.test(fileId)) { loggerWithChild({ email: email }).error( `Invalid fileId detected: ${fileId}. Skipping deletion for security.`, ) continue } const imageBaseDir = path.resolve( process.env.IMAGE_DIR || "downloads/xyne_images_db", ) - - const imageDir = path.join(imageBaseDir, fileId) - try { - await fs.access(imageDir) - await fs.rm(imageDir, { recursive: true, force: true }) + const imageDir = path.resolve(path.join(imageBaseDir, fileId)) + if (!imageDir.startsWith(imageBaseDir + path.sep)) { + loggerWithChild({ email: email }).error( + `Resolved path escapes base directory for fileId: ${fileId}`, + ) + continue + } + try { + await fs.rm(imageDir, { recursive: true, force: true }) loggerWithChild({ email: email }).info( `Deleted image attachment directory: ${imageDir}`, ) } catch (attachmentError) { loggerWithChild({ email: email }).warn( - `Image attachment file ${fileId} not found in either directory during chat deletion`, + `Image attachment directory ${imageDir} not found during chat deletion`, ) }
🧹 Nitpick comments (21)
frontend/src/utils/fileUtils.ts (1)
22-55
: Type safety: return a narrowed union instead of stringReturning a specific union improves call-site safety (e.g., getFileIcon mapping).
For example:
export type FileKind = "Image" | "Document" | "Spreadsheet" | "Presentation" | "PDF" | "Text" | "File" export const getFileType = (file: File | { type: string; name: string }): FileKind => { /* … */ }frontend/src/components/AttachmentGallery.tsx (1)
125-136
: Make “Other Files” layout responsiveHardcoding w-[40%] leaves unused space and can wrap awkwardly on small screens.
Apply this diff:
- <div className="flex justify-end flex-wrap gap-2"> + <div className="flex justify-end flex-wrap gap-2"> {otherFiles.map((file) => ( <AttachmentPreview key={file.fileId} attachment={file} - className="w-[40%]" + className="w-full sm:w-1/2 md:w-[40%]" /> ))} </div>frontend/src/components/AttachmentPreview.tsx (2)
13-15
: Avoid cross-component dependency for iconsImporting getFileIcon from ChatBox couples unrelated UI components. Extract the icon mapper to a shared util (e.g., utils/fileIcons.ts) and import from there in both places.
I can provide a small patch to introduce utils/fileIcons.ts and update imports if you’d like.
74-86
: Use a single computed type label (and it’ll be correct after getFileType fix)You call getFileType twice; compute once for consistency and render icon + label from it.
Example:
const typeLabel = getFileType({ type: attachment.fileType, name: attachment.fileName }) … {getFileIcon(typeLabel)} … {formatFileSize(attachment.fileSize)} • {typeLabel}frontend/src/components/ChatBox.tsx (3)
90-91
: Narrow SelectedFile.fileType to a unionStrengthen typing to match the finite set of labels produced by getFileType.
Apply this diff:
- fileType?: string + fileType?: "Image" | "Document" | "Spreadsheet" | "Presentation" | "PDF" | "Text" | "File"
2146-2183
: Avoid double validation on paste pathYou validate with validateAndDeduplicateFiles, then call processFiles which validates again and may duplicate toasts.
Consider relying on processFiles only and emitting the “File pasted” toast after it accepts the file.
3360-3362
: Align accepted types with validationIf you restrict images to jpeg/png/gif/webp in validation, the accept list looks consistent. Consider dropping non-standard image/jpg.
- accept="image/jpeg,image/jpg,image/png,image/gif,image/webp,… + accept="image/jpeg,image/png,image/gif,image/webp,…server/pptChunks.ts (2)
865-869
: Guarded LLM calls look good; minor style/readability nitThe conditional is correct. Consider spacing and consistent placeholder text to keep logs uniform across formats.
Apply this tiny style diff:
- if(describeImages) { + if (describeImages) {
890-893
: Avoid variable shadowing for imageExtensionYou compute imageExtension earlier (Line 840). Redeclaring it here is confusing.
- const imageExtension = path.extname(imagePath) || ".png" - const imageFilename = `${globalSeq}${imageExtension}` + const resolvedExt = imageExtension || ".png" + const imageFilename = `${globalSeq}${resolvedExt}`server/pdfChunks.ts (1)
461-465
: Conditional LLM description is correct; tiny style nitThe branch is fine. Keep spacing consistent.
- if(describeImages) { + if (describeImages) {server/ai/provider/vertex_ai.ts (4)
35-47
: Strict image path parsing is good; handle bad inputs without throwing the whole batchThe regex and validation look solid. Consider skipping invalid entries (log + continue) instead of throwing, so one bad path doesn't drop all images.
- if (!match) { - Logger.error( - `Invalid image path format: ${imgPath}. Expected format: docIndex_docId_imageNumber`, - ) - throw new Error(`Invalid image path: ${imgPath}`) - } + if (!match) { + Logger.error( + `Invalid image path format: ${imgPath}. Expected format: docIndex_docId_imageNumber`, + ) + return null + } @@ - const docIndex = match[1] + const docIndex = match[1] // (unused) remove if not neededRemove unused docIndex if not used later.
74-79
: Path containment check: prefer path.relative for robustnessUsing startsWith can be brittle across platforms. path.relative avoids case/sep issues.
- const resolvedPath = path.resolve(imageDir) - if (!resolvedPath.startsWith(baseDir)) { + const resolvedPath = path.resolve(imageDir) + const rel = path.relative(baseDir, resolvedPath) + if (rel.startsWith("..") || path.isAbsolute(rel)) { Logger.error(`Path traversal attempt detected: ${imageDir}`) throw new Error(`Invalid path: ${imageDir}`) }
81-93
: Add diagnostic details for read/size failuresLog the size overflow and include the error object to aid debugging.
- if (imgBuffer.length > MAX_IMAGE_SIZE_BYTES) return null + if (imgBuffer.length > MAX_IMAGE_SIZE_BYTES) { + Logger.warn( + `Skipping image exceeding MAX_IMAGE_SIZE_BYTES (${imgBuffer.length} > ${MAX_IMAGE_SIZE_BYTES}): ${absolutePath}`, + ) + return null + } @@ - } catch (err) { - Logger.error(`Failed to read image: ${absolutePath}`) + } catch (err) { + Logger.error(`Failed to read image: ${absolutePath}`, err) return null }
503-505
: Fix stray quote and spacing in injected prompt textThere’s an extra double-quote and a missing space (“image(s)as”) in the user-facing prompt string.
- text: `You may receive image(s)as part of the conversation. If images are attached, treat them as essential context for the user's question.\n\n" - ${userText}`, + text: `You may receive image(s) as part of the conversation. If images are attached, treat them as essential context for the user's question.\n\n${userText}`,server/docxChunks.ts (1)
2627-2631
: Guard against LLM unavailability; avoid accidental image drops.If describeImages is true but LLM_API_ENDPOINT is unset, images get skipped due to "No description returned." Consider falling back to the static description instead of skipping.
Apply:
- if(describeImages) { - description = await describeImageWithllm(imageBuffer) - } else { - description = "This is an image." - } + if (describeImages) { + const resp = await describeImageWithllm(imageBuffer) + description = + resp === "No description returned." + ? "This is an image." + : resp + } else { + description = "This is an image." + }server/services/fileProcessor.ts (1)
77-82
: Spreadsheet source handling: OK; minor enhancement.Preferring storagePath when present is fine. Optionally validate that storagePath exists/readable before attempting XLSX.readFile to produce a clearer error.
server/api/files.ts (2)
33-34
: Top-level await mkdir: consider moving to startup/bootstrap.Top-level await can delay module load and may misbehave in some runtimes. Prefer initializing directories in an app bootstrap.
300-303
: thumbnailPath relative to outputDir may be of limited use.Frontend likely needs a URL; consider returning the dedicated thumbnail API URL instead.
Apply:
- thumbnailPath: (thumbnailPath && outputDir) - ? path.relative(outputDir, thumbnailPath) - : "", + thumbnailPath: thumbnailPath ? `/api/v1/attachments/thumbnail/${fileId}` : "",server/api/chat/chat.ts (3)
2125-2125
: Avoid mutating imageFileNames; copy before augmenting.If baselineRAGJsonStream reuses the array, later pushes can leak across calls. Use a shallow copy when building finalImageFileNames.
- imageFileNames: finalImageFileNames, + imageFileNames: [...finalImageFileNames],
2142-2147
: Early-return on attachments may skip useful fallback.When attachments are present and no answer is found, you return immediately, unlike the non-attachment path that performs a targeted vespa search. Consider keeping the same fallback for parity.
- } else if (!answer) { - if(attachmentFileIds && attachmentFileIds.length > 0) { - yield { text: "From the selected context, I could not find any information to answer it, please change your query" } - return - } + } else if (!answer) { + // Keep fallback search even when attachments are present for better recall.
721-745
: Deduplicate attachment IDs to avoid redundant IO and logs.Multiple messages may reference the same attachment. Dedup before deletion and TODO logging.
- const imageAttachmentFileIds: string[] = [] - const nonImageAttachmentFileIds: string[] = [] + const imageAttachmentFileIdsSet = new Set<string>() + const nonImageAttachmentFileIdsSet = new Set<string>() ... - if (isImageAttachment) { - imageAttachmentFileIds.push(attachment.fileId) - } else { - nonImageAttachmentFileIds.push(attachment.fileId) + if (isImageAttachment) { + imageAttachmentFileIdsSet.add(attachment.fileId) + } else { + nonImageAttachmentFileIdsSet.add(attachment.fileId) ... - if (imageAttachmentFileIds.length > 0) { + const imageAttachmentFileIds = Array.from(imageAttachmentFileIdsSet) + if (imageAttachmentFileIds.length > 0) { ... - if (nonImageAttachmentFileIds.length > 0) { + const nonImageAttachmentFileIds = Array.from(nonImageAttachmentFileIdsSet) + if (nonImageAttachmentFileIds.length > 0) {Also applies to: 795-809
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (14)
frontend/src/components/AttachmentGallery.tsx
(1 hunks)frontend/src/components/AttachmentPreview.tsx
(3 hunks)frontend/src/components/ChatBox.tsx
(11 hunks)frontend/src/utils/fileUtils.ts
(1 hunks)server/ai/provider/vertex_ai.ts
(2 hunks)server/api/chat/agents.ts
(15 hunks)server/api/chat/chat.ts
(10 hunks)server/api/chat/utils.ts
(1 hunks)server/api/files.ts
(5 hunks)server/docxChunks.ts
(2 hunks)server/integrations/dataSource/index.ts
(1 hunks)server/pdfChunks.ts
(2 hunks)server/pptChunks.ts
(2 hunks)server/services/fileProcessor.ts
(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.
Applied to files:
server/integrations/dataSource/index.ts
🧬 Code graph analysis (12)
frontend/src/components/AttachmentGallery.tsx (1)
frontend/src/components/AttachmentPreview.tsx (1)
AttachmentPreview
(29-126)
frontend/src/utils/fileUtils.ts (2)
server/db/schema/knowledgeBase.ts (1)
File
(186-186)frontend/src/types/knowledgeBase.ts (1)
File
(61-71)
server/pptChunks.ts (1)
server/lib/describeImageWithllm.ts (1)
describeImageWithllm
(56-112)
server/pdfChunks.ts (1)
server/lib/describeImageWithllm.ts (1)
describeImageWithllm
(56-112)
server/ai/provider/vertex_ai.ts (1)
server/ai/provider/base.ts (1)
findImageByName
(42-49)
server/docxChunks.ts (1)
server/lib/describeImageWithllm.ts (1)
describeImageWithllm
(56-112)
server/services/fileProcessor.ts (1)
server/integrations/dataSource/config.ts (1)
isSheetFile
(84-87)
frontend/src/components/ChatBox.tsx (1)
frontend/src/utils/fileUtils.ts (2)
getFileType
(23-55)validateAndDeduplicateFiles
(109-153)
server/api/chat/chat.ts (3)
server/shared/types.ts (1)
AttachmentMetadata
(604-604)server/api/chat/utils.ts (2)
isMessageWithContext
(495-497)extractFileIdsFromMessage
(505-655)server/db/attachment.ts (1)
getAttachmentsByMessageId
(54-120)
frontend/src/components/AttachmentPreview.tsx (2)
frontend/src/components/ChatBox.tsx (1)
getFileIcon
(93-110)frontend/src/utils/fileUtils.ts (1)
getFileType
(23-55)
server/api/chat/agents.ts (2)
server/shared/types.ts (1)
AttachmentMetadata
(604-604)server/api/chat/utils.ts (3)
extractImageFileNames
(307-356)isMessageWithContext
(495-497)extractFileIdsFromMessage
(505-655)
server/api/files.ts (6)
server/utils/image.ts (3)
isImageFile
(49-58)getThumbnailPath
(60-62)generateThumbnail
(18-47)server/services/fileProcessor.ts (1)
FileProcessorService
(22-161)server/search/types.ts (3)
VespaDataSourceFile
(390-390)datasourceSchema
(22-22)dataSourceFileSchema
(23-23)server/search/vespa.ts (2)
NAMESPACE
(82-82)insert
(158-169)server/integrations/dataSource/index.ts (1)
createFileMetadata
(96-109)server/search/vespaClient.ts (1)
insert
(247-284)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test
🔇 Additional comments (24)
frontend/src/components/AttachmentPreview.tsx (1)
85-86
: Guard fileSize if nullableIf AttachmentMetadata.fileSize can be null/undefined, formatFileSize will misbehave.
Confirm fileSize is always a number here, or change to:
{formatFileSize(attachment.fileSize || 0)} • {typeLabel}frontend/src/components/ChatBox.tsx (1)
521-527
: Nice: set fileType at source of truthPopulating fileType when files are added keeps rendering simple downstream.
server/integrations/dataSource/index.ts (1)
96-109
: createFileMetadata export and import sites verifiedcreateFileMetadata is only imported in server/api/files.ts (line 27) and used internally in its own module; no duplicate metadata builders detected.
server/pptChunks.ts (1)
647-653
: All call sites use default flags; no changes needed
All existing calls only pass (data, docid), so extractImages=false and describeImages=true remain intact. Only supply describeImages=false when you deliberately want to suppress LLM image descriptions.server/pdfChunks.ts (1)
124-131
: Confirm explicitextractImages
/describeImages
flags at all extractor call sites
- server/services/fileProcessor.ts (≈ lines 41–43, 53–55, 65–67): PDF/DOCX/PPTX calls omit flags (defaults →
extractImages=false
,describeImages=true
).- server/integrations/google/worker-utils.ts (≈ lines 48–50, 67–69, 29–31): same.
- server/integrations/dataSource/index.ts:
- PDF (≈ lines 205–209) passes
true
forextractImages
.- DOCX (≈ lines 241–245) & PPTX (≈ lines 289–293) omit flags (defaults). Ensure each site’s flag usage matches intended image extraction.
server/docxChunks.ts (1)
2416-2418
: Signature change looks good; confirm defaults across formats.Defaulting describeImages to true here is fine, but FileProcessorService passes false by default. Please confirm PDF/PPTX have the same semantics so callers don’t get surprising behavior differences.
server/services/fileProcessor.ts (4)
41-46
: PDF forwarding: OK.Flags are forwarded correctly.
56-58
: DOCX forwarding: OK.Flags are forwarded correctly.
66-70
: PPTX forwarding: OK.Flags are forwarded correctly.
28-31
: Extended signature: OK – call sites updated correctly
All callers now pass the new parameters (defaults used where flags aren’t supplied).server/api/chat/agents.ts (8)
671-674
: Merging fileIds: OK; ensure defined.extractedInfo.fileIds defaults to [], so merge is safe.
1450-1451
: Passing ImageAttachmentFileIds into synthesis: OK.This aligns with the new image handling.
2230-2230
: Propagation to synthesis: OK.Same comment as above; once index-offset fix is applied, this will be consistent.
3516-3519
: RAG-off bypass with attachments: OK; verify desired UX.This changes behavior: ragOff now disabled when attachments exist. Confirm this matches product intent.
3535-3541
: Path-aware extractFileIdsFromMessage: OK; ensure ids format.Since pathRefId is forwarded, confirm getRecordBypath returns what extractFileIdsFromMessage expects.
3543-3547
: Context gating includes non-image attachments: OK.Merging NonImageAttachmentFileIds into fileIds is correct for Vespa-backed context.
3737-3771
: Attachment-aware context answer path: OK.Passing ImageAttachmentFileIds through UnderstandMessageAndAnswerForGivenContext aligns with new separation.
4677-4694
: Non-streaming path mirrors streaming: OK.Same attachment flow mirrored. Apply the earlier index-offset fix wherever imageFileNames are constructed in non-streaming flows too (if applicable).
server/api/files.ts (6)
15-17
: Ensure NAMESPACE is configured.NAMESPACE currently defaults to "namespace". If not overridden, Vespa insert will target a placeholder. Validate env/config before inserts.
24-28
: New imports: OK.FileProcessorService + Vespa types wired correctly.
228-231
: Vars initialized: OK.thumbnailPath/outputDir optional handling is clear.
234-241
: Disk write + thumbnail: OK; minor consistency.Writes to IMAGE_DIR/xyne_images_db; ensure this matches handleAttachmentServe’s lookup path.
245-291
: Non-image ingestion into Vespa: verify dataSourceRef and schema.dataSourceRef points to datasource schema but no actual datasource linkage is created here. Confirm Vespa schema accepts this reference and downstream features rely on it. Also consider setting a stable title/description convention for discoverability.
383-388
: 404 message for non-image attachments: good DX.Clear guidance that non-images are not stored on disk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/api/chat/utils.ts (1)
746-757
: Alternatively (or additionally), make getCitationToImage parser tolerant to both formats.If other code now relies on
${docIndex}_${docId}_${imageNumber}
, accept that too by taking the last numeric segment.- const parts = citationIndex.split("_") - if (parts.length < 2) { + const parts = citationIndex.split("_") + if (parts.length < 2) { loggerWithChild({ email: email }).error( "Invalid citation index format, expected docIndex_imageNumber", citationIndex, ) return null } - const docIndex = parseInt(parts[0], 10) - const imageNumber = parseInt(parts[1], 10) + const docIndex = Number.parseInt(parts[0], 10) + const lastPart = parts[parts.length - 1] ?? "" + const imageNumMatch = lastPart.match(/^(\d+)/) + const imageNumber = imageNumMatch ? Number.parseInt(imageNumMatch[1], 10) : NaN
🧹 Nitpick comments (3)
server/api/chat/utils.ts (3)
322-326
: Make tokenization resilient to commas/quotes.Emails/LLM output often comma-separate items or wrap them in quotes. Split on commas too and strip quotes.
- .split(/\s+/) - .map((name) => name.trim()) + .split(/[,\s]+/) + .map((name) => name.trim().replace(/^["'`]|["'`]+$/g, "")) .filter((name) => name.length > 0)
336-338
: Avoid O(n*m) scans by pre-indexing results.When many images,
findIndex
per token can be costly. Precompute a docId → index map once.Example (outside this hunk):
const docIdToIndex = new Map(results?.map((r, i) => [((r.fields as any).docId as string), i]) ?? []); const docIndex = docIdToIndex.get(docId) ?? -1;
350-353
: Use structured logger instead of console.error for consistency/observability.Replace with:
getLoggerWithChild(Subsystem.Chat)().error( `Error processing image content: ${getErrorMessage(error)}`, { imageContent } )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
server/api/chat/utils.ts
(1 hunks)
🔇 Additional comments (1)
server/api/chat/utils.ts (1)
307-358
: Verified no remaining consumers of the old${docIndex}_${fileName}
or${docIndex}_${docId}_${imageNumber}
formats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
server/search/types.ts (1)
176-181
: Add KB.Attachment is fine, but include it in isValidEntity or KB entities will fail validationWithout updating isValidEntity, KnowledgeBase entities (including Attachment) won’t be recognized by validators that rely on it.
Apply:
export const isValidEntity = (entity: string): boolean => { const normalizedEntity = entity?.toLowerCase() return normalizedEntity ? Object.values(DriveEntity) .map((v) => v.toLowerCase()) .includes(normalizedEntity) || Object.values(MailEntity) .map((v) => v.toLowerCase()) .includes(normalizedEntity) || Object.values(CalendarEntity) .map((v) => v.toLowerCase()) .includes(normalizedEntity) || Object.values(MailAttachmentEntity) .map((v) => v.toLowerCase()) .includes(normalizedEntity) || Object.values(GooglePeopleEntity) .map((v) => v.toLowerCase()) .includes(normalizedEntity) || Object.values(SlackEntity) .map((v) => v.toLowerCase()) .includes(normalizedEntity) || Object.values(WebSearchEntity) .map((v) => v.toLowerCase()) - .includes(normalizedEntity) + .includes(normalizedEntity) || + Object.values(KnowledgeBaseEntity) + .map((v) => v.toLowerCase()) + .includes(normalizedEntity) : // Object.values(NotionEntity).map(v => v.toLowerCase()).includes(normalizedEntity) false }server/api/chat/agents.ts (1)
426-431
: Image citation key collision for attachments (index starts at 0).Citation keys like
${index}_...
must not overlap with result indices;checkAndYieldCitationsForAgent
indexes intoresults[docIndex]
. Offset attachment indices byresults.length
.Apply:
- if (attachmentFileIds?.length) { - finalImageFileNames.push( - ...attachmentFileIds.map((fileid, index) => `${index}_${fileid}_${0}`), - ) - } + if (attachmentFileIds?.length) { + const base = results.length + finalImageFileNames.push( + ...attachmentFileIds.map((fileid, index) => `${base + index}_${fileid}_0`), + ) + }server/api/files.ts (2)
357-365
: Validate fileId to prevent path traversal.Route param is concatenated into paths. Enforce a strict pattern (
att_
+ UUID) before file IO.try { const fileId = c.req.param("fileId") if (!fileId) { throw new HTTPException(400, { message: "File ID is required" }) } + // Enforce expected format: att_<uuid> + if (!/^att_[a-f0-9-]{36}$/i.test(fileId)) { + throw new HTTPException(400, { message: "Invalid File ID format" }) + }
431-447
: Also validate fileId for thumbnail route.Mirror the same regex check here.
try { const fileId = c.req.param("fileId") if (!fileId) { throw new HTTPException(400, { message: "File ID is required" }) } + if (!/^att_[a-f0-9-]{36}$/i.test(fileId)) { + throw new HTTPException(400, { message: "Invalid File ID format" }) + }
♻️ Duplicate comments (9)
server/api/chat/chat.ts (6)
4034-4036
: Rename to camelCase and produce string[] without nullsKeep naming consistent and avoid (string | null)[] shapes. Use flatMap to ensure string[].
- const ImageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) - const NonImageAttachmentFileIds = attachmentMetadata.filter(m => !m.isImage).map(m => m.fileId) + const imageAttachmentFileIds: string[] = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [] + ) + const nonImageAttachmentFileIds: string[] = attachmentMetadata.flatMap((m) => + !m?.isImage && m.fileId ? [m.fileId] : [] + )
4064-4076
: Keep fileIds typed as string[]; append non-image IDs safelyInitialize with fallback and concat safely; also use the camelCase names.
- let isMsgWithContext = isMessageWithContext(message) + let isMsgWithContext = isMessageWithContext(message) const extractedInfo = isMsgWithContext ? await extractFileIdsFromMessage(message, email) : { totalValidFileIdsFromLinkCount: 0, fileIds: [], threadIds: [], } - isMsgWithContext = isMsgWithContext || (NonImageAttachmentFileIds && NonImageAttachmentFileIds.length > 0) - let fileIds = extractedInfo?.fileIds - if (NonImageAttachmentFileIds && NonImageAttachmentFileIds.length > 0) { - fileIds = fileIds.concat(NonImageAttachmentFileIds) - } + isMsgWithContext = + isMsgWithContext || (nonImageAttachmentFileIds?.length ?? 0) > 0 + let fileIds: string[] = extractedInfo.fileIds ?? [] + if (nonImageAttachmentFileIds.length > 0) { + fileIds = fileIds.concat(nonImageAttachmentFileIds) + }
4286-4288
: Use camelCase variable in conditions and callUpdate usages to match the renamed identifiers and ensure correct types.
- (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (imageAttachmentFileIds && imageAttachmentFileIds.length > 0)- ImageAttachmentFileIds, + imageAttachmentFileIds,Also applies to: 4314-4315
5465-5470
: Retry flow: same naming + type fixAvoid (string | null)[]; mirror the main path logic.
- let ImageAttachmentFileIds: string[] = [] + let imageAttachmentFileIds: string[] = []- ImageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) + imageAttachmentFileIds = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [] + )
5523-5526
: Retry flow: remove nulls via flatMapThe map-to-null pattern widens to (string | null)[]; use flatMap.
- ImageAttachmentFileIds = attachmentMetadata.map( - (m: AttachmentMetadata) => m.isImage ? m.fileId : null, - ).filter((m: string | null) => m !== null) + imageAttachmentFileIds = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [] + )
5601-5604
: Retry flow: update usages to camelCaseUse the renamed variable consistently.
- if ( - (fileIds && fileIds?.length > 0) || - (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) - ) { + if ((fileIds?.length ?? 0) > 0 || (imageAttachmentFileIds.length > 0)) {- ImageAttachmentFileIds, + imageAttachmentFileIds,Also applies to: 5634-5635
server/api/chat/agents.ts (2)
661-663
: Type-safe split of attachment IDs.Switched to filter→map; avoids
(string|null)[]
. Nice.
1408-1418
: Repeat: attachment image indices can collide with planning results.Use a base offset from current planning results length before pushing attachment image names.
- if (ImageAttachmentFileIds?.length) { - finalImageFileNames.push( - ...ImageAttachmentFileIds.map((fileid, index) => `${index}_${fileid}_${0}`), - ) - } + if (ImageAttachmentFileIds?.length) { + const base = results.root?.children?.length ?? 0 + finalImageFileNames.push( + ...ImageAttachmentFileIds.map((fileid, index) => `${base + index}_${fileid}_0`), + ) + }server/api/files.ts (1)
221-224
: randomUUID source portability.Using
crypto.randomUUID()
can break in some runtimes. Prefernode:crypto
import or a safe fallback.+import { randomUUID } from "node:crypto" ... - const fileId = `att_${crypto.randomUUID()}` + const fileId = `att_${randomUUID()}`
🧹 Nitpick comments (6)
server/api/chat/chat.ts (1)
2087-2093
: Mixed imageFileNames key formats — confirm provider expectationsYou’re pushing both “docIndex_imageIndex” and “index__0”-style keys into imageFileNames. If the provider expects strictly numeric pairs for citation mapping, attachments won’t ever emit image citations (by design?). If that’s intentional, all good; otherwise, align formats or adapt the regex.
server/api/chat/agents.ts (2)
564-566
: Optional: keep attachment image indices consistent during synthesis.Use the same index offset strategy as above to avoid future mismatches if image citations appear in synthesis outputs.
- imageFileNames: attachmentFileIds?.map( - (fileId, index) => `${index}_${fileId}_${0}`, - ), + imageFileNames: (() => { + const base = gatheredFragments.length + return attachmentFileIds?.map((fileId, index) => `${base + index}_${fileId}_0`) + })(),
673-676
: De-duplicate combined fileIds to avoid redundant fetches.When appending NonImageAttachmentFileIds, remove duplicates.
- let fileIds = extractedInfo?.fileIds - if (NonImageAttachmentFileIds && NonImageAttachmentFileIds.length > 0) { - fileIds = [...fileIds, ...NonImageAttachmentFileIds] - } + let fileIds = extractedInfo?.fileIds + if (NonImageAttachmentFileIds?.length) { + fileIds = Array.from(new Set([...fileIds, ...NonImageAttachmentFileIds])) + }server/api/files.ts (3)
15-17
: Unused import.
NAMESPACE
is imported but not used. Drop it.- insert, - NAMESPACE, + insert,
298-311
: Non-image attachments return 404 at serve URL. Consider signaling in metadata.
url
points to/api/v1/attachments/:fileId
for all files, but non-images aren’t on disk and 404 here. Either seturl
only for images or add a flag so UI avoids broken links.- const metadata: AttachmentMetadata = { + const metadata: AttachmentMetadata = { fileId, fileName: file.name, fileType: file.type, fileSize: file.size, isImage, thumbnailPath: (thumbnailPath && outputDir) ? path.relative(outputDir, thumbnailPath) : "", createdAt: new Date(), - url: `/api/v1/attachments/${fileId}`, + url: isImage ? `/api/v1/attachments/${fileId}` : "", + // optionally: + // downloadable: isImage, }
232-243
: Runtime dependency on Bun APIs.
Bun.write
is fine if Bun is guaranteed. If Node is also a target, fallback tofs.promises.writeFile
.- await Bun.write(filePath, new Uint8Array(fileBuffer)) + if ((globalThis as any).Bun?.write) { + await (globalThis as any).Bun.write(filePath, new Uint8Array(fileBuffer)) + } else { + const { writeFile } = await import("node:fs/promises") + await writeFile(filePath, Buffer.from(fileBuffer)) + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (5)
server/api/chat/agents.ts
(16 hunks)server/api/chat/chat.ts
(12 hunks)server/api/chat/utils.ts
(2 hunks)server/api/files.ts
(5 hunks)server/search/types.ts
(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/api/chat/utils.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-11T08:44:08.412Z
Learnt from: MayankBansal2004
PR: xynehq/xyne#719
File: server/api/chat/chat.ts:4023-4029
Timestamp: 2025-08-11T08:44:08.412Z
Learning: In server/api/chat/chat.ts and similar files, MayankBansal2004 prefers not to add explicit type definitions for metadata.usage structures when the receiving array is already typed and default values ensure type safety through the || 0 pattern.
Applied to files:
server/api/chat/agents.ts
server/api/chat/chat.ts
🧬 Code graph analysis (3)
server/api/chat/agents.ts (1)
server/api/chat/utils.ts (3)
extractImageFileNames
(307-358)isMessageWithContext
(497-499)extractFileIdsFromMessage
(507-657)
server/api/chat/chat.ts (3)
server/search/types.ts (1)
dataSourceFileSchema
(23-23)server/api/chat/utils.ts (2)
isMessageWithContext
(497-499)extractFileIdsFromMessage
(507-657)server/db/attachment.ts (1)
getAttachmentsByMessageId
(54-120)
server/api/files.ts (6)
server/integrations/dataSource/config.ts (2)
isImageFile
(89-92)getBaseMimeType
(74-76)server/utils/image.ts (3)
isImageFile
(49-58)getThumbnailPath
(60-62)generateThumbnail
(18-47)server/services/fileProcessor.ts (1)
FileProcessorService
(22-161)server/search/vespa.ts (1)
insert
(158-169)server/search/vespaClient.ts (1)
insert
(247-284)server/search/types.ts (1)
KbItemsSchema
(25-25)
🔇 Additional comments (7)
server/search/types.ts (2)
200-201
: LGTM: Runtime validator addedKnowledgeBaseEntitySchema export is correct and consistent with the enum.
230-231
: LGTM: Public unions extendedentitySchema and Entity now include KnowledgeBaseEntity; downstream narrowing will work as expected.
Also applies to: 244-245
server/api/chat/chat.ts (2)
158-159
: LGTM: Importing KnowledgeBaseEntityImport is correct given its use in citation filtering.
440-442
: Skip citations for DS and KB attachments — good guardAvoids emitting citations for non-searchable attachment content.
server/api/chat/agents.ts (3)
145-146
: Good: Attachment entity import wired for citation filtering.
KnowledgeBaseEntity
import is correct and used consistently below.
298-302
: Correct: exclude attachment docs from citations.Prevents noisy/irrelevant citations for uploads.
3513-3515
: RagOff bypass when attachments exist — confirm intent.This forces RAG path if any attachments are present (even non-image). If intended only for image attachments, gate on
ImageAttachmentFileIds.length
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frontend/src/components/ChatBox.tsx (1)
1-1
: Remove duplicated getFileIcon and update imports
getFileIcon remains defined in ChatBox.tsx (around line 95) and imported in AttachmentPreview.tsx (line 14); remove the local export and switch AttachmentPreview (and any other consumers) to import from "@/utils/fileIcons".
♻️ Duplicate comments (10)
frontend/src/components/ChatBox.tsx (1)
95-112
: Extract getFileIcon into a shared util to avoid odd cross-component dependencyThis helper is exported from ChatBox and imported by other components, which bloats bundles and creates an awkward dependency. Centralize it in utils and import from there. This was flagged previously as well.
Apply these diffs:
@@ -export const getFileIcon = (fileType: string | undefined) => { - switch (fileType) { - case "Image": - return <FileImage size={24} className="text-blue-500 dark:text-blue-400 flex-shrink-0" /> - case "Document": - return <FileText size={24} className="text-blue-600 dark:text-blue-400 flex-shrink-0" /> - case "Spreadsheet": - return <FileSpreadsheet size={24} className="text-green-600 dark:text-green-400 flex-shrink-0" /> - case "Presentation": - return <Presentation size={24} className="text-orange-600 dark:text-orange-400 flex-shrink-0" /> - case "PDF": - return <FileText size={24} className="text-red-600 dark:text-red-400 flex-shrink-0" /> - case "Text": - return <FileText size={24} className="text-gray-600 dark:text-gray-400 flex-shrink-0" /> - default: - return <File size={24} className="text-gray-500 dark:text-gray-400 flex-shrink-0" /> - } -} +// import from shared util instead +// (see import change above)server/api/chat/chat.ts (7)
4069-4071
: Use camelCase names and produce string[] safely.Rename to camelCase and avoid (string | undefined)[] if fileId may be absent.
- const ImageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) - const NonImageAttachmentFileIds = attachmentMetadata.filter(m => !m.isImage).map(m => m.fileId) + const imageAttachmentFileIds: string[] = attachmentMetadata.flatMap( + (m) => (m?.isImage && m.fileId ? [m.fileId] : []) + ) + const nonImageAttachmentFileIds: string[] = attachmentMetadata.flatMap( + (m) => (!m?.isImage && m.fileId ? [m.fileId] : []) + )Run to catch remaining PascalCase usages:
#!/bin/bash rg -nP --type ts -C2 '\b(ImageAttachmentFileIds|NonImageAttachmentFileIds)\b'
4104-4116
: Keep fileIds typed and append non-image IDs safely.Ensure string[] type and null-safe length checks.
- let isMsgWithContext = isMessageWithContext(message) + let isMsgWithContext = isMessageWithContext(message) @@ - isMsgWithContext = isMsgWithContext || (NonImageAttachmentFileIds && NonImageAttachmentFileIds.length > 0) - let fileIds = extractedInfo?.fileIds - if (NonImageAttachmentFileIds && NonImageAttachmentFileIds.length > 0) { - fileIds = fileIds.concat(NonImageAttachmentFileIds) - } + isMsgWithContext = + isMsgWithContext || (nonImageAttachmentFileIds?.length ?? 0) > 0 + let fileIds: string[] = extractedInfo.fileIds ?? [] + if (nonImageAttachmentFileIds.length > 0) { + fileIds = fileIds.concat(nonImageAttachmentFileIds) + }
4327-4355
: Update usages to camelCase arrays.Rename in conditions and when passing to UnderstandMessageAndAnswerForGivenContext.
- (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (imageAttachmentFileIds && imageAttachmentFileIds.length > 0) @@ - ImageAttachmentFileIds, + imageAttachmentFileIds,
5612-5617
: Retry flow: use camelCase and initialize once.Unify naming with main flow.
- let ImageAttachmentFileIds: string[] = [] + let imageAttachmentFileIds: string[] = [] @@ - ImageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) + imageAttachmentFileIds = attachmentMetadata.flatMap( + (m) => (m?.isImage && m.fileId ? [m.fileId] : []) + )
5758-5792
: Retry flow: update usages to camelCase.Keep variable names consistent.
- (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (imageAttachmentFileIds && imageAttachmentFileIds.length > 0) @@ - ImageAttachmentFileIds, + imageAttachmentFileIds,
2169-2174
: End tracing span before early return (trace leak).When no answer and attachments exist, the span isn’t closed.
} else if (!answer) { - if(attachmentFileIds && attachmentFileIds.length > 0) { + if (attachmentFileIds && attachmentFileIds.length > 0) { yield { text: "From the selected context, I could not find any information to answer it, please change your query", } + generateAnswerSpan?.end() return }
5670-5673
: Fix (string | null)[] narrowing bug.Use flatMap to return string[]; current map+filter leaves a union.
- ImageAttachmentFileIds = attachmentMetadata.map( - (m: AttachmentMetadata) => m.isImage ? m.fileId : null, - ).filter((m: string | null) => m !== null) + imageAttachmentFileIds = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [] + )server/api/chat/agents.ts (2)
3603-3604
: Consistent implementation across API methodsThe attachment filtering logic is consistently applied in the
AgentMessageApi
method, maintaining the same separation between image and non-image attachments.
1492-1503
: Potential index collision when appending attachment file namesWhen appending
ImageAttachmentFileIds
tofinalImageFileNames
, the indices start from 0, which could collide with existing image names from the planning context. This may cause confusion in image references.const { imageFileNames } = extractImageFileNames( planningContext, [...results.root.children, ...chatContexts, ...threadContexts], ) const finalImageFileNames = imageFileNames || [] if (ImageAttachmentFileIds?.length) { + const baseIndex = finalImageFileNames.length finalImageFileNames.push( - ...ImageAttachmentFileIds.map((fileid, index) => `${index}_${fileid}_${0}`), + ...ImageAttachmentFileIds.map((fileid, index) => `${baseIndex + index}_${fileid}_${0}`), ) }
🧹 Nitpick comments (7)
frontend/src/components/ChatBox.tsx (5)
31-36
: Remove icon imports no longer needed after extractionOnce getFileIcon lives in utils, ChatBox should not import icon variants it doesn’t use directly.
-import { - File, - Loader2, - FileText, - FileSpreadsheet, - Presentation, - FileImage, - Globe, -} from "lucide-react" +import { File, Loader2, Globe } from "lucide-react"
81-82
: Import getFileIcon from utilsWire the shared helper.
import { generateFileId, @@ cleanupPreviewUrls, - getFileType, + getFileType, } from "@/utils/fileUtils" +import { getFileIcon } from "@/utils/fileIcons"
2354-2390
: Avoid double validation and conflicting toasts in paste handlerYou pre-validate with validateAndDeduplicateFiles and then call processFiles which validates and toasts again, causing duplicate/misleading feedback (e.g., “File pasted” even when dedup skips it). Let processFiles own validation and UX.
- // Handle file paste (all supported file types) + // Handle file paste (all supported file types) for (let i = 0; i < items.length; i++) { const item = items[i] - if (item.kind === "file") { + if (item.kind === "file") { // Check attachment limit before processing if (selectedFiles.length >= MAX_ATTACHMENTS) { showToast( "Attachment limit reached", `You can only attach up to ${MAX_ATTACHMENTS} files at a time.`, true, ) return } const file = item.getAsFile() if (file) { - // Check if the file type is supported - const isValid = validateAndDeduplicateFiles([file], showToast) - if (isValid.length > 0) { - // Process the pasted file - processFiles([file]) - const fileType = getFileType(file) - - showToast( - "File pasted", - `${fileType} has been added to your message.`, - false, - ) - return // Exit early since we handled the file - } else { - showToast( - "Unsupported file type", - "This file type is not supported for attachments.", - true, - ) - return - } + // Delegate validation, dedup, limit checks, and toasts to processFiles + processFiles([file]) + return } } }
3787-3789
: Consolidate accept list and drop nonstandard image/jpgMaintain a single source of truth for supported types (shared constant) to prevent drift with validateAndDeduplicateFiles; also image/jpg is redundant with image/jpeg.
- accept="image/jpeg,image/jpg,image/png,image/gif,image/webp,text/plain,text/csv,application/pdf,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/vnd.ms-excel,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.ms-powerpoint,application/vnd.openxmlformats-officedocument.presentationml.presentation,text/markdown,.txt,.csv,.pdf,.doc,.docx,.xls,.xlsx,.ppt,.pptx,.md" + accept="image/jpeg,image/png,image/gif,image/webp,text/plain,text/csv,application/pdf,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/vnd.ms-excel,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.ms-powerpoint,application/vnd.openxmlformats-officedocument.presentationml.presentation,text/markdown,.txt,.csv,.pdf,.doc,.docx,.xls,.xlsx,.ppt,.pptx,.md"Optional: export a SUPPORTED_FILE_ACCEPT string from a utils module and use it here and anywhere else we present supported types.
85-93
: Type-safety: tighten fileType to a discriminated unionfileType is optional and a plain string. Since getFileType returns a known set, prefer a union type for compile-time safety and exhaustive switching.
Example (in a shared types file):
export type FileType = "Image" | "Document" | "Spreadsheet" | "Presentation" | "PDF" | "Text" | "File"Then:
interface SelectedFile { ... fileType: FileType }And have getFileType return FileType.
server/api/chat/agents.ts (2)
744-745
: Improve type safety when creating attachment arraysThe current approach using
map
followed by filtering creates intermediate arrays with mixed types. Usefilter
first, thenmap
to ensure better type safety.-const ImageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) -const NonImageAttachmentFileIds = attachmentMetadata.filter(m => !m.isImage).map(m => m.fileId) +const ImageAttachmentFileIds = attachmentMetadata + .filter((m: AttachmentMetadata) => m.isImage) + .map((m) => m.fileId) +const NonImageAttachmentFileIds = attachmentMetadata + .filter((m: AttachmentMetadata) => !m.isImage) + .map((m) => m.fileId)
3937-3938
: Consider extracting attachment check into a helper functionThe condition checking for attachments is repeated multiple times. Consider creating a helper function for better maintainability.
+const hasAttachments = ( + fileIds: string[] | undefined, + imageAttachmentFileIds: string[] | undefined +): boolean => { + return (fileIds && fileIds.length > 0) || + (imageAttachmentFileIds && imageAttachmentFileIds.length > 0) +} if ( - (isMsgWithContext && fileIds && fileIds?.length > 0) || - (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (isMsgWithContext && fileIds && fileIds.length > 0) || + hasAttachments(fileIds, ImageAttachmentFileIds) ) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
frontend/src/components/ChatBox.tsx
(10 hunks)server/ai/provider/vertex_ai.ts
(2 hunks)server/api/chat/agents.ts
(16 hunks)server/api/chat/chat.ts
(12 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/ai/provider/vertex_ai.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-11T08:44:08.412Z
Learnt from: MayankBansal2004
PR: xynehq/xyne#719
File: server/api/chat/chat.ts:4023-4029
Timestamp: 2025-08-11T08:44:08.412Z
Learning: In server/api/chat/chat.ts and similar files, MayankBansal2004 prefers not to add explicit type definitions for metadata.usage structures when the receiving array is already typed and default values ensure type safety through the || 0 pattern.
Applied to files:
server/api/chat/agents.ts
server/api/chat/chat.ts
🧬 Code graph analysis (3)
frontend/src/components/ChatBox.tsx (1)
frontend/src/utils/fileUtils.ts (2)
getFileType
(23-55)validateAndDeduplicateFiles
(109-153)
server/api/chat/agents.ts (1)
server/api/chat/utils.ts (3)
extractImageFileNames
(307-358)isMessageWithContext
(497-499)extractFileIdsFromMessage
(507-657)
server/api/chat/chat.ts (3)
server/search/types.ts (1)
dataSourceFileSchema
(23-23)server/api/chat/utils.ts (2)
isMessageWithContext
(497-499)extractFileIdsFromMessage
(507-657)server/db/attachment.ts (1)
getAttachmentsByMessageId
(54-120)
🔇 Additional comments (13)
frontend/src/components/ChatBox.tsx (3)
731-738
: Populate SelectedFile.fileType on add — goodStoring a normalized fileType upfront simplifies rendering and toasts. LGTM.
2720-2739
: Clearer attachment preview with type label and icon — good UXShowing the inferred type alongside the icon improves clarity for non-image files. Looks good.
2799-2803
: Helpful, specific attach tooltip — goodAccurately reflects supported categories. LGTM.
server/api/chat/chat.ts (3)
121-122
: LGTM: Import for KnowledgeBaseEntity is correct.This enables skipping KB attachments in citation processing downstream.
465-467
: Citations: correct to skip datasource files and KB attachments.Prevents noisy/irrelevant citations.
2152-2153
: LGTM: Pass finalImageFileNames.Propagating the consolidated image list is the right call.
server/api/chat/agents.ts (7)
317-320
: Good implementation of attachment citation filteringThe logic correctly prevents attachment citations from appearing in the chat by checking for
KnowledgeBaseEntity.Attachment
and continuing without yielding.
756-759
: Good implementation of non-image attachment handlingThe logic correctly merges non-image attachments into the general
fileIds
array for link-based context processing while keeping image attachments separate.
3715-3717
: RAG-off condition properly excludes attachment scenariosThe logic correctly ensures that when attachments are present, the RAG-off path is not taken, which is appropriate since attachments should trigger the full RAG pipeline.
3733-3745
: Proper integration of non-image attachments with contextThe code correctly:
- Checks for context using
isMessageWithContext
- Extracts file IDs from the message
- Merges non-image attachment file IDs into the main
fileIds
array- Updates the context check to include non-image attachments
537-564
: Consistent use of ImageAttachmentFileIds in synthesisThe
performSynthesis
function correctly receives and usesImageAttachmentFileIds
for image file name generation when performing context synthesis.
2130-2318
: Proper propagation of ImageAttachmentFileIds throughout the agent loopThe code consistently passes
ImageAttachmentFileIds
through all synthesis and continuation calls within the agent iteration loop, ensuring image attachments are properly handled at each step.
146-146
: KnowledgeBaseEntity import verified
TheKnowledgeBaseEntity
enum inserver/search/types.ts
includes theAttachment
member, so the import and its usage are correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
server/api/chat/chat.ts (2)
2113-2120
: Attachment image citation encoding breaks parser.You push strings like
${index}_${fileid}_${0}
into imageFileNames. checkAndYieldCitations expects “[docIndex_imageIndex]” where both parts are numbers and uses results[docIndex]. The current format makes parseInt(parts[1]) NaN and results lookups fail.Apply: append synthetic attachment “results” and use numeric keys so citations resolve.
- const finalImageFileNames = imageFileNames || [] - if (attachmentFileIds?.length) { - finalImageFileNames.push( - ...attachmentFileIds.map((fileid, index) => `${index}_${fileid}_${0}`), - ) - } + const finalImageFileNames = imageFileNames || [] + if (attachmentFileIds?.length) { + const baseIndex = combinedSearchResponse.length + for (const fileId of attachmentFileIds) { + // Minimal stub so getCitationToImage can resolve via item + fileId + combinedSearchResponse.push({ + fields: { entity: KnowledgeBaseEntity.Attachment, docId: fileId }, + } as any) + } + finalImageFileNames.push( + ...attachmentFileIds.map((_, i) => `${baseIndex + i}_0`), + ) + }If you want a dedicated “att__” scheme instead, extend checkAndYieldCitations to branch on that prefix and resolve directly from disk.
747-835
: Tighten fileId validation to prevent path-escape and odd inputs.String contains checks miss unicode/encoded separators and allow weird ids. Prefer strict allowlist and a post-join baseDir check.
Apply:
- // Validate fileId to prevent path traversal - if ( - fileId.includes("..") || - fileId.includes("/") || - fileId.includes("\\") - ) { - loggerWithChild({ email: email }).error( - `Invalid fileId detected: ${fileId}. Skipping deletion for security.`, - ) - continue - } - const imageBaseDir = path.resolve( + // Validate fileId (allow only safe chars) and prevent base-dir escape + if (!/^[A-Za-z0-9_-]+$/.test(fileId)) { + loggerWithChild({ email }).error( + `Invalid fileId detected: ${fileId}. Skipping deletion for security.`, + ) + continue + } + const imageBaseDir = path.resolve( process.env.IMAGE_DIR || "downloads/xyne_images_db", ) - - const imageDir = path.join(imageBaseDir, fileId) + const imageDir = path.join(imageBaseDir, fileId) + const resolved = path.resolve(imageDir) + if (!resolved.startsWith(imageBaseDir + path.sep)) { + loggerWithChild({ email }).error( + `Resolved path escapes base dir: ${resolved}. Skipping.`, + ) + continue + }Also update the warning text on Line 809 to drop “either directory” since there’s now a single base directory.
frontend/src/components/FileUpload.tsx (1)
361-367
: Directory drops: readEntries is called only once (misses >100 entries).FileSystemDirectoryReader returns at most 100 entries per call; loop until empty to avoid silently skipping files in larger folders.
Apply:
- const reader = dirEntry.createReader() - return new Promise<void>((resolve) => { - reader.readEntries(async (entries) => { - await Promise.all(entries.map(processEntry)) - resolve() - }) - }) + const reader = dirEntry.createReader() + return new Promise<void>((resolve) => { + const acc: FileSystemEntry[] = [] + const readAll = () => { + reader.readEntries(async (entries) => { + if (!entries.length) { + await Promise.all(acc.map(processEntry)) + resolve() + } else { + acc.push(...entries) + readAll() + } + }) + } + readAll() + })server/api/chat/agents.ts (4)
446-451
: Image citation key collisions: offset attachment indices.Appending
${index}_...
can collide with existing image doc indices; offset by results length.- if (attachmentFileIds?.length) { - finalImageFileNames.push( - ...attachmentFileIds.map((fileid, index) => `${index}_${fileid}_${0}`), - ) - } + if (attachmentFileIds?.length) { + const base = results?.length || 0 + finalImageFileNames.push( + ...attachmentFileIds.map((fileid, index) => `${base + index}_${fileid}_${0}`), + ) + }
575-589
: Same collision in synthesis path.Use gatheredFragments length as base to keep keys unique.
- imageFileNames: attachmentFileIds?.map( - (fileId, index) => `${index}_${fileId}_${0}`, - ), + imageFileNames: attachmentFileIds?.map( + (fileId, index) => `${(gatheredFragments?.length || 0) + index}_${fileId}_${0}`, + ),
2625-2663
: Guard stream writes and lastMessage accesses in error callback.stream may be undefined and lastMessage may not exist; both can throw during error handling.
- if (errorChatId !== "unknown" && errorMsgId !== "unknown") { + if (stream && errorChatId !== "unknown" && errorMsgId !== "unknown") { @@ - if (allMessages.length > 0) { - const lastMessage = allMessages[allMessages.length - 1] - await addErrMessageToMessage(lastMessage, errFromMap) - } + if (allMessages.length > 0) { + const lm = allMessages[allMessages.length - 1] + await addErrMessageToMessage(lm, errFromMap) + } @@ - await stream.writeSSE({ + if (stream) await stream.writeSSE({ event: ChatSSEvents.Error, data: errFromMap, }) - await addErrMessageToMessage(lastMessage, errFromMap) + if (typeof lastMessage !== "undefined") { + await addErrMessageToMessage(lastMessage, errFromMap) + }
5048-5066
: Top-level catch: stream may be undefined.Protect writeSSE with a null-check to avoid cascading errors outside the streaming context.
- if (stream) { - await stream.writeSSE({ - event: ChatSSEvents.Error, - data: errFromMap, - }) - } + if (stream) { + await stream.writeSSE({ + event: ChatSSEvents.Error, + data: errFromMap, + }) + }Note: apply same guard anywhere writeSSE is used in catch paths.
♻️ Duplicate comments (11)
frontend/src/components/ChatBox.tsx (1)
96-113
: Centralize getFileIcon in a shared util to avoid cross-component coupling.AttachmentPreview importing this from ChatBox is awkward; extract once and import where needed.
Apply in this file (remove local impl):
-export const getFileIcon = (fileType: FileType | string | undefined) => { - switch (fileType) { - case FileType.IMAGE: - return <FileImage size={24} className="text-blue-500 dark:text-blue-400 flex-shrink-0" /> - case FileType.DOCUMENT: - return <FileText size={24} className="text-blue-600 dark:text-blue-400 flex-shrink-0" /> - case FileType.SPREADSHEET: - return <FileSpreadsheet size={24} className="text-green-600 dark:text-green-400 flex-shrink-0" /> - case FileType.PRESENTATION: - return <Presentation size={24} className="text-orange-600 dark:text-orange-400 flex-shrink-0" /> - case FileType.PDF: - return <FileText size={24} className="text-red-600 dark:text-red-400 flex-shrink-0" /> - case FileType.TEXT: - return <FileText size={24} className="text-gray-600 dark:text-gray-400 flex-shrink-0" /> - default: - return <File size={24} className="text-gray-500 dark:text-gray-400 flex-shrink-0" /> - } -} +// Moved to '@/utils/fileIcons'. Import and reuse: +// import { getFileIcon } from "@/utils/fileIcons"New util (add file frontend/src/utils/fileIcons.ts):
import React from "react" import { File, FileImage, FileSpreadsheet, FileText, Presentation } from "lucide-react" import { FileType } from "shared/types" export const getFileIcon = (fileType?: FileType) => { switch (fileType) { case FileType.IMAGE: return <FileImage size={24} className="text-blue-500 dark:text-blue-400 flex-shrink-0" /> case FileType.DOCUMENT: return <FileText size={24} className="text-blue-600 dark:text-blue-400 flex-shrink-0" /> case FileType.SPREADSHEET: return <FileSpreadsheet size={24} className="text-green-600 dark:text-green-400 flex-shrink-0" /> case FileType.PRESENTATION: return <Presentation size={24} className="text-orange-600 dark:text-orange-400 flex-shrink-0" /> case FileType.PDF: return <FileText size={24} className="text-red-600 dark:text-red-400 flex-shrink-0" /> case FileType.TEXT: return <FileText size={24} className="text-gray-600 dark:text-gray-400 flex-shrink-0" /> default: return <File size={24} className="text-gray-500 dark:text-gray-400 flex-shrink-0" /> } }server/api/chat/chat.ts (9)
4327-4329
: Use camelCase variable in condition.Apply:
- (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (imageAttachmentFileIds && imageAttachmentFileIds?.length > 0)
4355-4356
: Pass camelCase attachment IDs to the given-context handler.Apply:
- ImageAttachmentFileIds, + imageAttachmentFileIds,
5612-5612
: Rename for consistency and future strictness.Apply:
- let ImageAttachmentFileIds: string[] = [] + let imageAttachmentFileIds: string[] = []
5759-5759
: Condition should reference camelCase array.Apply:
- (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (imageAttachmentFileIds && imageAttachmentFileIds?.length > 0)
5791-5793
: Pass camelCase array to retry flow.Apply:
- ImageAttachmentFileIds, + imageAttachmentFileIds,
4070-4072
: Rename to camelCase and fix type narrowing for attachment IDs.map(...).filter(...) leaves (string | undefined)[] under strict TS; use flatMap and idiomatic camelCase.
Apply:
- const ImageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) - const NonImageAttachmentFileIds = attachmentMetadata.filter(m => !m.isImage).map(m => m.fileId) + const imageAttachmentFileIds: string[] = attachmentMetadata.flatMap( + (m) => (m?.isImage && m.fileId ? [m.fileId] : []), + ) + const nonImageAttachmentFileIds: string[] = attachmentMetadata.flatMap( + (m) => (!m?.isImage && m.fileId ? [m.fileId] : []), + )
4113-4117
: Ensure fileIds stays string[] and use the camelCase IDs.Avoid possibly undefined and keep type stable.
Apply:
- isMsgWithContext = isMsgWithContext || (NonImageAttachmentFileIds && NonImageAttachmentFileIds.length > 0) - let fileIds = extractedInfo?.fileIds - if (NonImageAttachmentFileIds && NonImageAttachmentFileIds.length > 0) { - fileIds = fileIds.concat(NonImageAttachmentFileIds) - } + isMsgWithContext = + isMsgWithContext || (nonImageAttachmentFileIds?.length ?? 0) > 0 + let fileIds: string[] = extractedInfo.fileIds ?? [] + if (nonImageAttachmentFileIds.length > 0) { + fileIds = fileIds.concat(nonImageAttachmentFileIds) + }
5618-5618
: Use flatMap to keep string[] and skip falsy IDs.Apply:
- ImageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) + imageAttachmentFileIds = attachmentMetadata.flatMap( + (m) => (m?.isImage && m.fileId ? [m.fileId] : []), + )
5671-5674
: Same narrowing bug here; avoid (string | null)[].Apply:
- ImageAttachmentFileIds = attachmentMetadata.map( - (m: AttachmentMetadata) => m.isImage ? m.fileId : null, - ).filter((m: string | null) => m !== null) + imageAttachmentFileIds = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [], + )server/api/files.ts (1)
223-223
: randomUUID source portability.Use node:crypto’s randomUUID to avoid runtime differences; fall back if needed.
+import { randomUUID } from "node:crypto" @@ - const fileId = `att_${crypto.randomUUID()}` + const fileId = `att_${(typeof randomUUID === 'function' ? randomUUID() : (globalThis.crypto?.randomUUID?.() ?? Math.random().toString(36).slice(2)))}`
🧹 Nitpick comments (19)
server/ai/provider/base.ts (2)
47-49
: Name the regex explicitly and add a tiny parser helper."regex" is too generic for a public export. A clearer name reduces misuse across providers and enables typed parsing in one place.
Apply:
-// format: docIndex_docId_imageNumber -export const regex = /^([0-9]+)_(.+)_([0-9]+)$/ +// format: docIndex_docId_imageNumber +export const IMAGE_NAME_REGEX = /^([0-9]+)_(.+)_([0-9]+)$/ + +/** + * Parse "docIndex_docId_imageNumber" and validate docId for traversal chars. + */ +export function parseImageRef(imgPath: string): { + docIndex: string; docId: string; imageNumber: string +} { + const m = imgPath.match(IMAGE_NAME_REGEX) + if (!m) throw new Error(`Invalid image path: ${imgPath} (expected docIndex_docId_imageNumber)`) + const [, docIndex, docId, imageNumber] = m + if (docId.includes("..") || docId.includes("/") || docId.includes("\\")) { + throw new Error(`Invalid docId: ${docId}`) + } + return { docIndex, docId, imageNumber } +}Follow-ups: replace imports of
{ regex }
with{ IMAGE_NAME_REGEX, parseImageRef }
in providers and useparseImageRef(imgPath)
.
50-57
: Consider making file lookup resilient and non-blocking.
readdirSync
blocks the event loop and throws; most call sites don’t catch it. An async variant that returnsnull
on miss would make multi-attachment flows degrade gracefully.For example:
export function findImageByName(directory: string, imageName: string) { - const files = fs.readdirSync(directory) + const files = fs.readdirSync(directory) const match = files.find((file) => path.parse(file).name === imageName) if (!match) { throw new Error(`Image "${imageName}" not found`) } return path.join(directory, match) } + +export async function findImageByNameAsync(directory: string, imageName: string) { + try { + const files = await fs.promises.readdir(directory) + const match = files.find((file) => path.parse(file).name === imageName) + return match ? path.join(directory, match) : null + } catch { + return null + } +}Then providers can skip missing images instead of failing the whole request.
server/ai/provider/gemini.ts (2)
34-49
: Guard findImageByName and skip missing images instead of failing all attachments.
findImageByName
throws and is called outside any try/catch; one missing image will reject the entire Promise.all.Apply:
- const imageDir = path.join(baseDir, docId) - const absolutePath = findImageByName(imageDir, imageNumber) - const extension = path.extname(absolutePath).toLowerCase() + const imageDir = path.join(baseDir, docId) + let absolutePath: string + try { + absolutePath = findImageByName(imageDir, imageNumber) + } catch (e) { + Logger.warn(`Image not found: ${path.join(imageDir, imageNumber)}.* Skipping.`) + return null + } + const extension = path.extname(absolutePath).toLowerCase()
69-75
: Harden baseDir containment check.
startsWith
can be brittle on similar prefixes. Usepath.relative
to robustly detect escapes.- const resolvedPath = path.resolve(imageDir) - if (!resolvedPath.startsWith(baseDir)) { + const resolvedPath = path.resolve(imageDir) + const rel = path.relative(baseDir, resolvedPath) + if (rel.startsWith("..")) { Logger.error(`Path traversal attempt detected: ${imageDir}`) throw new Error(`Invalid path: ${imageDir}`) }server/ai/provider/bedrock.ts (2)
33-39
: Avoid failing the whole request on a single missing image.Same rationale:
findImageByName
may throw; wrap and skip.- const docIndex = match[1] - const docId = match[2] - const imageNumber = match[3] + const docIndex = match[1] + const docId = match[2] + const imageNumber = match[3] @@ - const imageDir = path.join(baseDir, docId) - const absolutePath = findImageByName(imageDir, imageNumber) - const extension = path.extname(absolutePath).toLowerCase() + const imageDir = path.join(baseDir, docId) + let absolutePath: string + try { + absolutePath = findImageByName(imageDir, imageNumber) + } catch { + Logger.warn(`Image not found: ${path.join(imageDir, imageNumber)}. Skipping.`) + return null + } + const extension = path.extname(absolutePath).toLowerCase()
71-76
: Use path.relative for containment.- const resolvedPath = path.resolve(imageDir) - if (!resolvedPath.startsWith(baseDir)) { + const resolvedPath = path.resolve(imageDir) + const rel = path.relative(baseDir, resolvedPath) + if (rel.startsWith("..")) { Logger.error(`Path traversal attempt detected: ${imageDir}`) throw new Error(`Invalid path: ${imageDir}`) }server/ai/provider/openai.ts (2)
23-29
: Gracefully handle missing images before reading.Wrap
findImageByName
to avoid rejecting all images when one is absent.- const match = imgPath.match(regex) + const match = imgPath.match(regex) @@ - const docIndex = match[1] - const docId = match[2] - const imageNumber = match[3] + const docIndex = match[1] + const docId = match[2] + const imageNumber = match[3] @@ - const imageDir = path.join(baseDir, docId) - const absolutePath = findImageByName(imageDir, imageNumber) + const imageDir = path.join(baseDir, docId) + let absolutePath: string + try { + absolutePath = findImageByName(imageDir, imageNumber) + } catch { + Logger.warn(`Image not found: ${path.join(imageDir, imageNumber)}. Skipping.`) + return null + }
61-67
: Robust path containment check.- const resolvedPath = path.resolve(imageDir) - if (!resolvedPath.startsWith(baseDir)) { + const resolvedPath = path.resolve(imageDir) + const rel = path.relative(baseDir, resolvedPath) + if (rel.startsWith("..")) { Logger.error(`Path traversal attempt detected: ${imageDir}`) throw new Error(`Invalid path: ${imageDir}`) }server/ai/provider/vertex_ai.ts (3)
52-55
: Handle not-found images and log oversize skips.Move
findImageByName
under try/catch so one missing file doesn’t fail all attachments; also log when skipping due to size.- const imageDir = path.join(baseDir, docId) - const absolutePath = findImageByName(imageDir, imageNumber) - const extension = path.extname(absolutePath).toLowerCase() + const imageDir = path.join(baseDir, docId) + let absolutePath: string + try { + absolutePath = findImageByName(imageDir, imageNumber) + } catch { + Logger.warn(`Image not found: ${path.join(imageDir, imageNumber)}. Skipping.`) + return null + } + const extension = path.extname(absolutePath).toLowerCase() @@ - await fs.promises.access(absolutePath, fs.constants.F_OK) - const imgBuffer = await fs.promises.readFile(absolutePath) - if (imgBuffer.length > MAX_IMAGE_SIZE_BYTES) return null + await fs.promises.access(absolutePath, fs.constants.F_OK) + const imgBuffer = await fs.promises.readFile(absolutePath) + if (imgBuffer.length > MAX_IMAGE_SIZE_BYTES) { + Logger.warn( + `Image buffer too large after read (${imgBuffer.length} bytes): ${absolutePath}. Skipping.`, + ) + return null + }Also applies to: 80-88
73-78
: Use path.relative for containment.- const resolvedPath = path.resolve(imageDir) - if (!resolvedPath.startsWith(baseDir)) { + const resolvedPath = path.resolve(imageDir) + const rel = path.relative(baseDir, resolvedPath) + if (rel.startsWith("..")) { Logger.error(`Path traversal attempt detected: ${imageDir}`) throw new Error(`Invalid path: ${imageDir}`) }
56-57
: Comment tweak for accuracy.This maps extensions to media types (Vertex), not “Bedrock format values.”
- // Map file extensions to Bedrock format values + // Map file extensions to media types for Vertexfrontend/src/components/ChatBox.tsx (2)
2355-2391
: Avoid double validation and duplicate toasts on paste.You validate here and again inside processFiles → validateAndDeduplicateFiles, causing duplicate “Invalid file(s)” toasts.
Apply:
- // Check if the file type is supported - const isValid = validateAndDeduplicateFiles([file], showToast) + // Check support without surfacing duplicate toasts + const silentToast = (_t: string, _d: string, _e?: boolean) => {} + const isValid = validateAndDeduplicateFiles([file], silentToast) if (isValid.length > 0) { // Process the pasted file processFiles([file]) const fileType = getFileType({ type: file.type, name: file.name })
3788-3790
: Drop non-standard image/jpg from accept.Browsers recognize image/jpeg; image/jpg is redundant/non-standard.
-accept="image/jpeg,image/jpg,image/png,image/gif,image/webp, ... +accept="image/jpeg,image/png,image/gif,image/webp, ...frontend/src/components/FileUpload.tsx (2)
381-386
: User message under-specifies validation rules.When all dropped items are rejected, the toast mentions only “under 40MB” but ignores image 5MB cap and type allowlist, which will confuse users.
- "No valid files found", - "Files must be under 40MB. All oversized files were ignored.", + "No valid files found", + "Files must be under 40MB (images under 5MB) and of supported types. Oversized/unsupported files were ignored.",
390-391
: Unnecessary hook dependency.isValidFile is a module import and won’t change; remove it from the dependency array to avoid churn.
- [processFiles, showToast, isValidFile], + [processFiles, showToast],frontend/src/utils/fileUtils.ts (2)
32-38
: createImagePreview should rely on base MIME.Guard against parameters (e.g., "image/jpeg; charset=binary").
- if (isImageFile(file.type)) { + if (isImageFile(file.type.split(";")[0])) { return URL.createObjectURL(file) }
149-202
: Tree builder: maintain folder aggregates and metadata.Currently files count/updated fields aren’t aggregated up the tree; optional but improves UX for folder rows.
I can provide a small helper to bubble up files and latest updatedAt if you want this behavior.
server/shared/types.ts (1)
125-131
: Consider adding svg/bmp if desired.If you plan to support vector/legacy images, add "image/svg+xml" and "image/bmp" and their extensions.
server/api/files.ts (1)
15-17
: Remove unused import.NAMESPACE is unused.
-import { - insert, - NAMESPACE, -} from "../search/vespa" +import { insert } from "../search/vespa"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
frontend/src/components/ChatBox.tsx
(11 hunks)frontend/src/components/FileUpload.tsx
(1 hunks)frontend/src/utils/fileUtils.ts
(1 hunks)server/ai/provider/base.ts
(1 hunks)server/ai/provider/bedrock.ts
(2 hunks)server/ai/provider/gemini.ts
(2 hunks)server/ai/provider/openai.ts
(2 hunks)server/ai/provider/vertex_ai.ts
(3 hunks)server/api/chat/agents.ts
(15 hunks)server/api/chat/chat.ts
(13 hunks)server/api/files.ts
(5 hunks)server/shared/fileUtils.ts
(1 hunks)server/shared/types.ts
(1 hunks)server/utils/image.ts
(0 hunks)shared/filesutils.ts
(0 hunks)
💤 Files with no reviewable changes (2)
- server/utils/image.ts
- shared/filesutils.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-11T08:44:08.412Z
Learnt from: MayankBansal2004
PR: xynehq/xyne#719
File: server/api/chat/chat.ts:4023-4029
Timestamp: 2025-08-11T08:44:08.412Z
Learning: In server/api/chat/chat.ts and similar files, MayankBansal2004 prefers not to add explicit type definitions for metadata.usage structures when the receiving array is already typed and default values ensure type safety through the || 0 pattern.
Applied to files:
server/api/chat/agents.ts
server/api/chat/chat.ts
🧬 Code graph analysis (10)
server/ai/provider/bedrock.ts (1)
server/ai/provider/base.ts (1)
regex
(48-48)
server/shared/fileUtils.ts (1)
server/shared/types.ts (2)
MIME_TYPE_MAPPINGS
(124-152)EXTENSION_MAPPINGS
(155-162)
server/ai/provider/gemini.ts (1)
server/ai/provider/base.ts (1)
regex
(48-48)
server/ai/provider/vertex_ai.ts (1)
server/ai/provider/base.ts (2)
regex
(48-48)findImageByName
(50-57)
frontend/src/utils/fileUtils.ts (3)
server/shared/types.ts (2)
MIME_TYPE_MAPPINGS
(124-152)EXTENSION_MAPPINGS
(155-162)frontend/src/types/knowledgeBase.ts (1)
File
(61-71)server/shared/fileUtils.ts (1)
isImageFile
(4-8)
server/ai/provider/openai.ts (1)
server/ai/provider/base.ts (2)
regex
(48-48)findImageByName
(50-57)
server/api/chat/agents.ts (1)
server/api/chat/utils.ts (2)
isMessageWithContext
(497-499)extractFileIdsFromMessage
(507-657)
server/api/files.ts (6)
server/shared/fileUtils.ts (1)
isImageFile
(4-8)server/integrations/dataSource/config.ts (2)
isImageFile
(89-92)getBaseMimeType
(74-76)server/utils/image.ts (2)
getThumbnailPath
(49-51)generateThumbnail
(18-47)server/services/fileProcessor.ts (1)
FileProcessorService
(22-161)server/search/vespa.ts (1)
insert
(158-169)server/search/types.ts (1)
KbItemsSchema
(25-25)
server/api/chat/chat.ts (4)
server/shared/types.ts (2)
dataSourceFileSchema
(34-34)AttachmentMetadata
(657-657)server/search/types.ts (1)
dataSourceFileSchema
(23-23)server/api/chat/utils.ts (2)
isMessageWithContext
(497-499)extractFileIdsFromMessage
(507-657)server/db/attachment.ts (1)
getAttachmentsByMessageId
(54-120)
frontend/src/components/ChatBox.tsx (1)
frontend/src/utils/fileUtils.ts (2)
getFileType
(9-30)validateAndDeduplicateFiles
(84-128)
🔇 Additional comments (9)
frontend/src/components/ChatBox.tsx (3)
31-35
: Icon and type imports look good.These are minimal and directly used; no concerns.
Also applies to: 50-50, 82-83
737-738
: LGTM: Persisting detected fileType.Correctly storing the derived FileType alongside preview metadata.
2721-2722
: Verify label rendering for fileType.If FileType is a numeric enum, UI will show numbers. Confirm it’s a string enum or map to labels before rendering.
If needed:
const fileTypeLabel = (t?: FileType) => ({ [FileType.IMAGE]:"Image", [FileType.DOCUMENT]:"Document", [FileType.SPREADSHEET]:"Spreadsheet", [FileType.PRESENTATION]:"Presentation", [FileType.PDF]:"PDF", [FileType.TEXT]:"Text", [FileType.FILE]:"File" } as const)[t ?? FileType.FILE]Use {fileTypeLabel(selectedFile.fileType)}.
Also applies to: 2739-2740
server/api/chat/chat.ts (3)
2153-2153
: LGTM: passes computed finalImageFileNames to RAG.Correctly threads the new image names list into the model call.
2169-2175
: Good: end span before early return on attachments.This prevents trace leaks on the “attachments but no answer” branch.
121-122
: EnumKnowledgeBaseEntity
is properly exported and used
export enum KnowledgeBaseEntity
exists in server/search/types.ts (line 175).- Import from
@/search/types
resolves correctly and will be retained in production builds, as the enum is referenced at runtime.frontend/src/components/FileUpload.tsx (1)
15-15
: Verified shared alias and browser-safe fileUtils
Vite’s resolve.alias correctly maps “shared” to ../server/shared, and server/shared/fileUtils.ts (plus its types) uses only browser-safe APIs with no Node-only imports—no changes needed.server/shared/types.ts (1)
112-162
: LGTM; clear, centralized file type taxonomy.Enum + MIME/extension maps look good and make classification consistent across FE/BE.
server/api/files.ts (1)
363-394
: Non-image attachments 404 by design—ensure clients don’t fetch them.handleAttachmentServe returns 404 for non-images. Verify UI never attempts to GET non-image attachments via this endpoint (it should use Vespa metadata instead).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (6)
frontend/src/utils/fileUtils.ts (1)
9-20
: getFileType: mapping-first detection is solid (fixes prior image misclassification).Exact base-MIME matching + extension fallback aligns frontend and backend types and addresses the earlier “plain object” image misclassification and substring MIME issues. Nice.
frontend/src/components/ChatBox.tsx (1)
96-112
: De-duplicate and decouple: move getFileIcon to a shared util and import here.This function is shared and also imported by other components; keeping it in ChatBox creates odd dependencies and risks drift.
Apply:
-export const getFileIcon = (fileType: FileType | string | undefined) => { - switch (fileType) { - case FileType.IMAGE: - return <FileImage size={24} className="text-blue-500 dark:text-blue-400 flex-shrink-0" /> - case FileType.DOCUMENT: - return <FileText size={24} className="text-blue-600 dark:text-blue-400 flex-shrink-0" /> - case FileType.SPREADSHEET: - return <FileSpreadsheet size={24} className="text-green-600 dark:text-green-400 flex-shrink-0" /> - case FileType.PRESENTATION: - return <Presentation size={24} className="text-orange-600 dark:text-orange-400 flex-shrink-0" /> - case FileType.PDF: - return <FileText size={24} className="text-red-600 dark:text-red-400 flex-shrink-0" /> - case FileType.TEXT: - return <FileText size={24} className="text-gray-600 dark:text-gray-400 flex-shrink-0" /> - default: - return <File size={24} className="text-gray-500 dark:text-gray-400 flex-shrink-0" /> - } -} +// Use shared utility +// import { getFileIcon, getFileTypeLabel } from "@/utils/fileIcons"And add this new utility:
// frontend/src/utils/fileIcons.ts import React from "react" import { File, FileText, FileSpreadsheet, Presentation, FileImage } from "lucide-react" import { FileType } from "shared/types" export const getFileIcon = (fileType: FileType | string | undefined) => { switch (fileType) { case FileType.IMAGE: return <FileImage size={24} className="text-blue-500 dark:text-blue-400 flex-shrink-0" /> case FileType.DOCUMENT: return <FileText size={24} className="text-blue-600 dark:text-blue-400 flex-shrink-0" /> case FileType.SPREADSHEET: return <FileSpreadsheet size={24} className="text-green-600 dark:text-green-400 flex-shrink-0" /> case FileType.PRESENTATION: return <Presentation size={24} className="text-orange-600 dark:text-orange-400 flex-shrink-0" /> case FileType.PDF: return <FileText size={24} className="text-red-600 dark:text-red-400 flex-shrink-0" /> case FileType.TEXT: return <FileText size={24} className="text-gray-600 dark:text-gray-400 flex-shrink-0" /> default: return <File size={24} className="text-gray-500 dark:text-gray-400 flex-shrink-0" /> } } export const getFileTypeLabel = (t?: FileType | string) => { switch (t) { case FileType.IMAGE: return "Image" case FileType.DOCUMENT: return "Document" case FileType.SPREADSHEET: return "Spreadsheet" case FileType.PRESENTATION: return "Presentation" case FileType.PDF: return "PDF" case FileType.TEXT: return "Text" default: return "File" } }Note: After moving, remove now-unused lucide-react icon imports from ChatBox if any.
server/api/chat/chat.ts (4)
181-181
: Fix import alias for file utils (build-breaker).Use the "@/..." alias like the rest of the server imports; current path likely won’t resolve.
-import { isImageFile } from "shared/fileUtils" +import { isImageFile } from "@/shared/fileUtils"
4071-4074
: Build image/non-image attachment arrays type-safely and guard parse result.Ensure we never produce (string | undefined)[] and handle undefined metadata.
- const attachmentMetadata = parseAttachmentMetadata(c) - const imageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) - const nonImageAttachmentFileIds = attachmentMetadata.filter(m => !m.isImage).map(m => m.fileId) + const attachmentMetadata = parseAttachmentMetadata(c) || [] + const imageAttachmentFileIds: string[] = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [] + ) + const nonImageAttachmentFileIds: string[] = attachmentMetadata.flatMap((m) => + !m?.isImage && m.fileId ? [m.fileId] : [] + )
4107-4119
: fileIds may be undefined; concat can throw. Default and append safely.Current concat on possibly-undefined fileIds is a runtime risk.
- let isMsgWithContext = isMessageWithContext(message) + let isMsgWithContext = isMessageWithContext(message) @@ - isMsgWithContext = isMsgWithContext || (nonImageAttachmentFileIds && nonImageAttachmentFileIds.length > 0) - let fileIds = extractedInfo?.fileIds - if (nonImageAttachmentFileIds && nonImageAttachmentFileIds.length > 0) { - fileIds = fileIds.concat(nonImageAttachmentFileIds) - } + isMsgWithContext = + isMsgWithContext || (nonImageAttachmentFileIds?.length ?? 0) > 0 + let fileIds: string[] = extractedInfo?.fileIds ?? [] + if (nonImageAttachmentFileIds.length > 0) { + fileIds = fileIds.concat(nonImageAttachmentFileIds) + }
5614-5614
: Retry flow: rename to camelCase and avoid (string | null)[] narrowing issues.Use imageAttachmentFileIds and flatMap to keep string[]; update all usages accordingly.
- let ImageAttachmentFileIds: string[] = [] + let imageAttachmentFileIds: string[] = []- ImageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) + imageAttachmentFileIds = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [] + )- ImageAttachmentFileIds = attachmentMetadata.map( - (m: AttachmentMetadata) => m.isImage ? m.fileId : null, - ).filter((m: string | null) => m !== null) + imageAttachmentFileIds = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [] + )- (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (imageAttachmentFileIds && imageAttachmentFileIds.length > 0)- ImageAttachmentFileIds, + imageAttachmentFileIds,Run to find any remaining PascalCase identifiers:
#!/bin/bash rg -nP '\b(ImageAttachmentFileIds|NonImageAttachmentFileIds)\b' --type ts -C2Also applies to: 5673-5676, 5762-5762, 5792-5792
🧹 Nitpick comments (7)
frontend/src/utils/fileUtils.ts (1)
22-30
: Optional: trim base MIME; minor perf nit.Tiny hardening and negligible perf polish:
- Trim parameters after the “;”.
- If this runs hot, prebuild Sets for MIME/extension lookups at module load; not required.
- const baseMime = mimeType.split(";")[0] + const baseMime = mimeType.split(";")[0].trim()frontend/src/components/ChatBox.tsx (3)
2355-2391
: Avoid double validation/toasts on paste; rely on processFiles’ path.validateAndDeduplicateFiles already shows error toasts for invalid files; the extra “Unsupported file type” branch can double-toast. Let processFiles handle the normal add path.
Apply:
- // Check if the file type is supported - const isValid = validateAndDeduplicateFiles([file], showToast) - if (isValid.length > 0) { - // Process the pasted file - processFiles([file]) - const fileType = getFileType({ type: file.type, name: file.name }) - - showToast( - "File pasted", - `${fileType} has been added to your message.`, - false, - ) - return // Exit early since we handled the file - } else { - showToast( - "Unsupported file type", - "This file type is not supported for attachments.", - true, - ) - return - } + // Defer to processFiles (it validates + dedupes + toasts) + processFiles([file]) + // Optional: friendly success toast + const ft = getFileType({ type: file.type, name: file.name }) + // Use a human label (see getFileTypeLabel util) + // showToast("File pasted", `${getFileTypeLabel(ft)} added.`, false) + return
2721-2723
: Use a human label for file type, not the enum value.Showing “IMAGE/SPREADSHEET” is a bit raw. Prefer “Image/Spreadsheet”.
Apply:
- <span className="text-white text-xs opacity-80 block"> - {selectedFile.fileType} - </span> + <span className="text-white text-xs opacity-80 block"> + {getFileTypeLabel(selectedFile.fileType)} + </span>- {selectedFile.fileType} + {getFileTypeLabel(selectedFile.fileType)}Also applies to: 2739-2739
3788-3790
: Keep input accept list in sync with shared mappings via a single source of truth.Avoid drift between UI accept, frontend utils, and backend mappings.
Apply (plus a small util addition):
- accept="image/jpeg,image/jpg,image/png,image/gif,image/webp,text/plain,text/csv,application/pdf,application/msword,application/vnd.openxmlformats-officedocument.wordprocessingml.document,application/vnd.ms-excel,application/vnd.openxmlformats-officedocument.spreadsheetml.sheet,application/vnd.ms-powerpoint,application/vnd.openxmlformats-officedocument.presentationml.presentation,text/markdown,.txt,.csv,.pdf,.doc,.docx,.xls,.xlsx,.ppt,.pptx,.md" + accept={ACCEPT_STRING}Add in frontend/src/utils/fileUtils.ts:
// Place near other exports export const ACCEPT_STRING = [ ...Object.values(MIME_TYPE_MAPPINGS).flat(), ...Object.values(EXTENSION_MAPPINGS).flat(), ].join(",")server/api/chat/agents.ts (2)
756-759
: Deduplicate fileIds after merging non-image attachmentsPrevents redundant Vespa lookups and duplicate fragments.
Apply:
- let fileIds = extractedInfo?.fileIds - if (nonImageAttachmentFileIds && nonImageAttachmentFileIds.length > 0) { - fileIds = [...fileIds, ...nonImageAttachmentFileIds] - } + let fileIds = extractedInfo?.fileIds + if (nonImageAttachmentFileIds && nonImageAttachmentFileIds.length > 0) { + fileIds = Array.from(new Set([...(fileIds ?? []), ...nonImageAttachmentFileIds])) + }
3721-3733
: Context detection and fileIds merge — add dedupeMarking isMsgWithContext true when non-image attachments exist is good. Also dedupe merged fileIds to prevent repeats.
Apply:
- isMsgWithContext = isMsgWithContext || (nonImageAttachmentFileIds && nonImageAttachmentFileIds.length > 0) - let fileIds = extractedInfo?.fileIds - if (nonImageAttachmentFileIds && nonImageAttachmentFileIds.length > 0) { - fileIds = [...fileIds, ...nonImageAttachmentFileIds] - } + isMsgWithContext = isMsgWithContext || (!!nonImageAttachmentFileIds?.length) + let fileIds = extractedInfo?.fileIds ?? [] + if (nonImageAttachmentFileIds?.length) { + fileIds = Array.from(new Set([...fileIds, ...nonImageAttachmentFileIds])) + }server/api/chat/chat.ts (1)
2116-2120
: Attachment image citation key format likely incompatible with parser.textToImageCitationIndex expects “[]” (digits only). Keys like “index_0” won’t match and won’t resolve via results[docIndex]. Either (a) inject attachments into results and use numeric indices, or (b) avoid emitting image citations for attachments.
Would you like a patch to inject lightweight “attachment result” stubs so citations can resolve?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
frontend/src/components/ChatBox.tsx
(11 hunks)frontend/src/utils/fileUtils.ts
(1 hunks)server/api/chat/agents.ts
(15 hunks)server/api/chat/chat.ts
(13 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-11T08:44:08.412Z
Learnt from: MayankBansal2004
PR: xynehq/xyne#719
File: server/api/chat/chat.ts:4023-4029
Timestamp: 2025-08-11T08:44:08.412Z
Learning: In server/api/chat/chat.ts and similar files, MayankBansal2004 prefers not to add explicit type definitions for metadata.usage structures when the receiving array is already typed and default values ensure type safety through the || 0 pattern.
Applied to files:
server/api/chat/chat.ts
server/api/chat/agents.ts
🧬 Code graph analysis (4)
frontend/src/components/ChatBox.tsx (1)
frontend/src/utils/fileUtils.ts (2)
getFileType
(9-31)validateAndDeduplicateFiles
(85-129)
frontend/src/utils/fileUtils.ts (2)
server/shared/types.ts (2)
MIME_TYPE_MAPPINGS
(124-152)EXTENSION_MAPPINGS
(155-162)server/shared/fileUtils.ts (1)
isImageFile
(4-8)
server/api/chat/chat.ts (4)
server/shared/types.ts (2)
dataSourceFileSchema
(34-34)AttachmentMetadata
(657-657)server/search/types.ts (1)
dataSourceFileSchema
(23-23)server/api/chat/utils.ts (2)
isMessageWithContext
(497-499)extractFileIdsFromMessage
(507-657)server/db/attachment.ts (1)
getAttachmentsByMessageId
(54-120)
server/api/chat/agents.ts (1)
server/api/chat/utils.ts (2)
isMessageWithContext
(497-499)extractFileIdsFromMessage
(507-657)
🔇 Additional comments (16)
frontend/src/utils/fileUtils.ts (1)
35-39
: createImagePreview: LGTM.Uses MIME-based check and revokes URLs elsewhere to avoid leaks.
frontend/src/components/ChatBox.tsx (3)
93-94
: Type-align SelectedFile.fileType with FileType enum (good).Prevents arbitrary strings and keeps icon/type detection consistent.
732-739
: Storing computed fileType on add: LGTM.Keeps render logic cheap and prevents recomputation on every render.
2801-2804
: Updated tooltip text reads well.Clear and aligned with supported types.
server/api/chat/agents.ts (9)
146-147
: Import for attachment entity looks correctBrings in KnowledgeBaseEntity for attachment gating in citations. Good.
317-321
: Skip citations for chat attachments — correct behaviorAvoids emitting citations for KnowledgeBaseEntity.Attachment. Ensure upstream consistently sets entity for attachments.
Would you confirm that all attachment-ingested items set source.entity === KnowledgeBaseEntity.Attachment?
743-746
: Filter→map pattern for type-safety looks goodSwitch to filter().map() narrows to string[] without leaking nulls.
3704-3704
: Bypass RagOff when attachments exist — correct gateKeeps attachment flows working even if agent disables RAG.
3923-3926
: Correct: treat image attachments as explicit contextEnsures UnderstandMessageAndAnswerForGivenContext path is taken when only image attachments are provided.
3957-3959
: Plumb imageAttachmentFileIds to contextual answererGood: allows image reasoning even without non-image fileIds.
4926-4929
: Non-stream path mirrors streaming gating — goodEnsures parity when only image attachments are present.
4942-4943
: Non-stream: pass imageAttachmentFileIds to context-aware pathConsistent with streaming path.
743-746
: No additional guards needed for parseAttachmentMetadata
parseAttachmentMetadata always returns an AttachmentMetadata[] (never null), so calling.filter()
and.map()
directly is safe.server/api/chat/chat.ts (3)
465-467
: Good: safe field access and clean skip logic.Optional chaining avoids runtime errors; skipping data source and KB attachments in citations is correct.
2153-2153
: Good: use finalImageFileNames consistently.This aligns the provider input with the computed image list.
2170-2177
: Good: end span before early return (fixes trace leaks).Correctly closes generateAnswerSpan in the no-answer-with-attachments branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
server/api/chat/chat.ts (3)
4118-4130
: Ensure fileIds is always string[] and concat safely.Keeps types strict and avoids undefined concat edge cases (also aligns with earlier guidance).
- let isMsgWithContext = isMessageWithContext(message) + let isMsgWithContext = isMessageWithContext(message) const extractedInfo = isMsgWithContext ? await extractFileIdsFromMessage(message, email) : { totalValidFileIdsFromLinkCount: 0, fileIds: [], threadIds: [], } - isMsgWithContext = isMsgWithContext || (nonImageAttachmentFileIds && nonImageAttachmentFileIds.length > 0) - let fileIds = extractedInfo?.fileIds - if (nonImageAttachmentFileIds && nonImageAttachmentFileIds.length > 0) { - fileIds = fileIds.concat(nonImageAttachmentFileIds) - } + isMsgWithContext = isMsgWithContext || ((nonImageAttachmentFileIds?.length ?? 0) > 0) + let fileIds: string[] = extractedInfo.fileIds ?? [] + if ((nonImageAttachmentFileIds?.length ?? 0) > 0) { + fileIds = fileIds.concat(nonImageAttachmentFileIds) + }
5625-5631
: Retry flow: fix type narrowing bug and rename PascalCase vars (build-blocker).
- ImageAttachmentFileIds is PascalCase; the codebase uses camelCase.
- map(...).filter(m !== null) yields (string | null)[], then assigned to string[] → TS error. Use flatMap or reduce to produce string[].
Apply:
- let ImageAttachmentFileIds: string[] = [] + let imageAttachmentFileIds: string[] = [] @@ - ImageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) + imageAttachmentFileIds = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [] + ) @@ - ImageAttachmentFileIds = attachmentMetadata.map( - (m: AttachmentMetadata) => m.isImage ? m.fileId : null, - ).filter((m: string | null) => m !== null) + imageAttachmentFileIds = attachmentMetadata.flatMap((m) => + m?.isImage && m.fileId ? [m.fileId] : [] + ) @@ - (ImageAttachmentFileIds && ImageAttachmentFileIds?.length > 0) + (imageAttachmentFileIds && imageAttachmentFileIds.length > 0) @@ - ImageAttachmentFileIds, + imageAttachmentFileIds,Run to confirm no PascalCase leftovers and no (string | null)[] constructs remain:
#!/bin/bash # Find PascalCase leftovers rg -nP '\b(ImageAttachmentFileIds|NonImageAttachmentFileIds)\b' --type ts -C2 # Check for map-to-null patterns that cause (string|null)[] rg -nP --type ts -C2 'map\s*\([^)]*\?\s*[^:]+:\s*null\)\s*\.filter\('Also applies to: 5684-5687, 5771-5773, 5803-5804
183-183
: Fix path alias for isImageFile import (will break builds).Other imports in this file use "@/..." aliasing. "shared/fileUtils" won’t resolve in most TS path setups and previously caused build failures.
Apply:
-import { isImageFile } from "shared/fileUtils" +import { isImageFile } from "@/shared/fileUtils"
🧹 Nitpick comments (2)
server/api/chat/chat.ts (2)
820-846
: Non-image attachment cleanup via Vespa DeleteDocument is correct; consider concurrency and clearer 404 checks.
- Current for-of deletes serially; for large chats this is slow. Consider limited concurrency with Promise.allSettled and a small pool (e.g., p-limit 5).
- Matching on "404 Not Found" substring is brittle; prefer checking a structured error (status/code) if available from DeleteDocument, or normalize errors at the source.
Example with limited concurrency:
- for (const fileId of nonImageAttachmentFileIds) { - try { - await DeleteDocument(fileId, KbItemsSchema) - loggerWithChild({ email: email }).info( - `Successfully deleted non-image attachment ${fileId} from Vespa kb_items schema`, - ) - } catch (error) { - const errorMessage = getErrorMessage(error) - if (errorMessage.includes("404 Not Found")) { - loggerWithChild({ email: email }).warn( - `Non-image attachment ${fileId} not found in Vespa kb_items schema (may have been already deleted)`, - ) - } else { - loggerWithChild({ email: email }).error( - error, - `Failed to delete non-image attachment ${fileId} from Vespa kb_items schema: ${errorMessage}`, - ) - } - } - } + const pLimit = 5 + let active = 0 + const queue = [...nonImageAttachmentFileIds] + const runNext = async (): Promise<void> => { + const fileId = queue.shift() + if (!fileId) return + active++ + try { + await DeleteDocument(fileId, KbItemsSchema) + loggerWithChild({ email: email }).info( + `Successfully deleted non-image attachment ${fileId} from Vespa kb_items schema`, + ) + } catch (error) { + const errorMessage = getErrorMessage(error) + if (errorMessage.includes("404 Not Found")) { + loggerWithChild({ email: email }).warn( + `Non-image attachment ${fileId} not found in Vespa kb_items schema (may have been already deleted)`, + ) + } else { + loggerWithChild({ email: email }).error( + error, + `Failed to delete non-image attachment ${fileId} from Vespa kb_items schema: ${errorMessage}`, + ) + } + } finally { + active-- + if (queue.length) await runNext() + } + } + await Promise.all(new Array(Math.min(pLimit, queue.length)).fill(0).map(() => runNext()))
4083-4085
: Harden attachment ID extraction + dedupe.Guard against missing fileId and avoid accidental empties; also dedupe to reduce redundant downstream calls.
- const imageAttachmentFileIds = attachmentMetadata.filter(m => m.isImage).map(m => m.fileId) - const nonImageAttachmentFileIds = attachmentMetadata.filter(m => !m.isImage).map(m => m.fileId) + const imageAttachmentFileIds: string[] = Array.from( + new Set( + attachmentMetadata.flatMap((m) => (m?.isImage && m?.fileId ? [m.fileId] : [])), + ), + ) + const nonImageAttachmentFileIds: string[] = Array.from( + new Set( + attachmentMetadata.flatMap((m) => (!m?.isImage && m?.fileId ? [m.fileId] : [])), + ), + )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/api/chat/chat.ts
(16 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-11T08:44:08.412Z
Learnt from: MayankBansal2004
PR: xynehq/xyne#719
File: server/api/chat/chat.ts:4023-4029
Timestamp: 2025-08-11T08:44:08.412Z
Learning: In server/api/chat/chat.ts and similar files, MayankBansal2004 prefers not to add explicit type definitions for metadata.usage structures when the receiving array is already typed and default values ensure type safety through the || 0 pattern.
Applied to files:
server/api/chat/chat.ts
🧬 Code graph analysis (1)
server/api/chat/chat.ts (5)
server/search/types.ts (2)
dataSourceFileSchema
(23-23)KbItemsSchema
(25-25)server/search/vespa.ts (1)
DeleteDocument
(2058-2064)server/utils.ts (1)
getErrorMessage
(89-92)server/api/chat/utils.ts (2)
isMessageWithContext
(497-499)extractFileIdsFromMessage
(507-657)server/db/attachment.ts (1)
getAttachmentsByMessageId
(54-120)
🔇 Additional comments (4)
server/api/chat/chat.ts (4)
96-97
: New imports look correct and scoped appropriately.The DeleteDocument, KnowledgeBaseEntity, and KbItemsSchema imports align with their downstream usage below. No issues.
Also applies to: 122-124
467-471
: Safe field access + correct skip logic for attachments.Optional chaining on fields prevents runtime errors, and excluding datasource/KB attachments from citations is correct.
2125-2166
: Passing finalImageFileNames into RAG is correct and consistent with earlier extraction.Keeps image-context path separate from non-image context as intended.
2181-2188
: Early-return path now ends span; trace leak fixed.Span termination before return avoids dangling spans in no-answer-with-attachments branch.
Description
Testing
Additional Notes
Summary by CodeRabbit
New Features
Improvements
Bug Fixes