Skip to content

Conversation

Himanshvarma
Copy link
Contributor

@Himanshvarma Himanshvarma commented Oct 8, 2025

Description

now attachments will be ingested into file schema instead of kb_items schema

Summary by CodeRabbit

  • New Features

    • Server-backed attachment deletion with batch handling and a dedicated delete endpoint.
    • Improved search: image-aware chunking, image embeddings, and enhanced attachment ranking; sheet-ID expansion for broader results.
    • Centralized file-type detection shared between frontend and server.
  • Bug Fixes

    • UI reliably removes files even if server deletion fails; selection state preserved and previews cleaned up.
  • Chores

    • Routing, schema, and dependency updates to support deletion and ingestion workflows.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 8, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Centralizes file-type detection into a shared utility; frontend ChatBox removeFile becomes async and calls a new server-side delete API while still removing files locally on failure; server adds handleAttachmentDelete + POST /api/v1/files/delete, broadens attachment-entity checks, and extends Vespa file schema with image chunking and embeddings.

Changes

Cohort / File(s) Summary
Frontend: file-type import & utils removal
frontend/src/components/AttachmentPreview.tsx, frontend/src/lib/common.tsx, frontend/src/utils/fileUtils.ts
Removed local getFileType export/logic; replaced imports to use shared/fileUtils; getFileIcon now calls external getFileType; fileUtils.ts no longer exports getFileType.
Frontend: ChatBox remove flow & props
frontend/src/components/ChatBox.tsx
removeFile made async, performs server-side deletion when metadata.fileId (logs errors but still removes locally), revokes object URLs using current selectedFiles, updated hook deps; added props `agentIdFromChatData?: string
Server: centralized attachment deletion & API
server/api/files.ts, server/api/search.ts, server/server.ts
Added handleAttachmentDelete and handleAttachmentDeleteApi; added handleAttachmentDeleteSchema; wired POST /api/v1/files/delete; batch deletion separates image disk deletes and Vespa document deletes with validation and permission checks; exported new handlers.
Server: chat citation & deletion flows
server/api/chat/chat.ts, server/api/chat/agents.ts
Replaced KnowledgeBaseEntity.Attachment checks with AttachmentEntity inclusion checks; ChatDeleteApi accumulates attachments and delegates deletion to handleAttachmentDelete (removed inline deletion); added attachmentRank profile usage for attachment searches; replaced local expandSheetIds with import.
Server: file ingestion & Vespa doc updates
server/api/files.ts, server/vespa/schemas/file.sd
New attf_ ID prefix for uploads; images and non-images ingested under fileSchema with updated fields (permissions, mimeType, metadata); file.sd extended with chunk_meta, image_chunks, image_chunk_embeddings and image-aware rank profile attachmentRank.
Server: shared utils & types
server/shared/fileUtils.ts, server/shared/types.ts
Added server-side getFileType({type,name}) (MIME then extension lookup); exported AttachmentEntity; extended FileEntitySchema union and added attachmentFileTypeMap.
Server: search helper
server/search/utils.ts
Added expandSheetIds(fileId: string): string[] to expand sheet-pattern IDs into enumerated sheet IDs.
Integrations: immutability fix
server/integrations/google/sync.ts
Stop mutating vespaData.permissions in-place; create vespaDataForInsert with permissionsAsString for insertion.
Misc (packages & formatting)
package.json, server/package.json
Added trailing newline to root package.json; bumped @xyne/vespa-ts from 1.1.01.1.4 in server/package.json.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Frontend as ChatBox (UI)
  participant API as /api/v1/files/delete
  participant FilesSvc as server/api/files.ts
  participant Handler as handleAttachmentDelete
  participant Disk as Disk (images)
  participant Vespa as Vespa (fileSchema)

  User->>Frontend: Click remove attachment
  Frontend->>API: POST /api/v1/files/delete { attachment }
  API->>FilesSvc: handleAttachmentDeleteApi(payload)
  FilesSvc->>Handler: handleAttachmentDelete([attachment], ownerEmail)
  alt attachment is image
    Handler->>Disk: remove image directory/files
    Handler->>Vespa: DeleteDocument(fileId) (fileSchema)
  else non-image attachment
    Handler->>Vespa: DeleteDocument(fileId) (fileSchema or KbItemsSchema by prefix)
  end
  Handler-->>FilesSvc: deletion result
  FilesSvc-->>API: 200 / error
  API-->>Frontend: response (Frontend still removes file locally)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • zereraz
  • shivamashtikar
  • kalpadhwaryu
  • naSim087
  • devesh-juspay

Poem

I nibble bytes and tidy trails,
I hop through chunks and mend the rails,
Shared types snug in one small den,
Attachments vanish—cleaned again,
A happy rabbit hops—code patched then! 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly summarizes the primary change of migrating attachments to the file.sd schema and includes the reference ID, making it concise and specific to the pull request objectives.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/attachmentToFile

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Himanshvarma, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request, authored by Himanshvarma, refactors the system to store and manage attachments within a dedicated file.sd schema in Vespa, rather than the previous kb_items schema. Key changes include extensive updates to the file.sd schema to support attachment-specific fields and ranking, refactoring of backend attachment upload and deletion logic to interact with the new schema, and corresponding frontend adjustments to utilize the updated APIs and shared utility functions. A new API endpoint for attachment deletion has also been introduced.

Highlights

  • Intent: This pull request refactors the attachment ingestion process. Previously, attachments were ingested into the kb_items schema in Vespa. This change moves them to the file.sd schema, centralizing attachment management and search capabilities within a dedicated file schema.
  • Key Changes: The changes span both frontend and backend, as well as the Vespa schema definition:
  1. Vespa Schema (server/vespa/schemas/file.sd): The file.sd schema has been significantly extended to accommodate attachment data. This includes new fields for image_chunks, chunks_pos, image_chunks_pos, chunks_map, image_chunks_map, and image_chunk_embeddings. Ranking profiles and document summaries within file.sd have been updated to leverage these new fields for improved search and retrieval of attachment content.
  2. Backend Attachment Handling (server/api/files.ts, server/api/chat/chat.ts, server/api/search.ts, server/server.ts):
    • Non-image attachments are now ingested into the fileSchema instead of KbItemsSchema during upload. The fileId prefix for new attachments is changed from att_ to attf_.
    • A new shared utility function handleAttachmentDelete has been introduced in server/api/files.ts to manage the deletion of both image (from disk) and non-image (from Vespa fileSchema or KbItemsSchema based on fileId prefix) attachments. This function is now used by the chat deletion API (ChatDeleteApi) to ensure proper cleanup of associated attachments.
    • A new API endpoint /files/delete is added to allow frontend-initiated deletion of attachments, including permission checks against the fileSchema.
    • Imports and usage of AttachmentEntity and fileSchema are updated across relevant backend files.
  3. Frontend Updates (frontend/src/components/AttachmentPreview.tsx, frontend/src/components/ChatBox.tsx, frontend/src/lib/common.tsx, frontend/src/utils/fileUtils.ts):
    • The getFileType utility function has been centralized by moving it from frontend/src/utils/fileUtils.ts to server/shared/fileUtils.ts, and frontend components now import it from the shared module.
    • The ChatBox component now includes logic to call the new /files/delete API endpoint when a user removes an already uploaded attachment from the UI.
  4. Shared Types and Utilities (server/shared/types.ts, server/shared/fileUtils.ts):
    • AttachmentEntity is now part of the FileEntitySchema in shared types.
    • A new attachmentFileTypeMap is added to map generic FileType to specific AttachmentEntity types for better categorization within Vespa.
  • Reviewer Activity: No reviewer activity has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is a significant and well-executed refactoring to move attachments to the file.sd schema. The changes are consistent across the frontend, backend, and Vespa schema definitions. I've identified a few areas for improvement, mainly concerning error handling and code style, to enhance the robustness and readability of the new implementation.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (2)
server/api/files.ts (2)

428-434: Consider error propagation for DeleteImages failures.

If DeleteImages fails but the subsequent Vespa deletion succeeds, orphaned images remain on disk. Consider whether to:

  1. Rethrow the error to prevent Vespa deletion
  2. Collect failed deletions and report them
  3. Continue as-is if orphaned images are acceptable

486-490: Prefer explicit typing over 'as any'.

While the Array.isArray check makes this safe, using any reduces type safety.

Consider this more type-safe approach:

-    const fields = attachmentDoc.fields as any
-    const permissions = Array.isArray(fields.permissions) ? fields.permissions as string[] : []
+    const fields = attachmentDoc.fields as Record<string, unknown>
+    const permissions = Array.isArray(fields.permissions) 
+      ? (fields.permissions as string[]).filter((p): p is string => typeof p === 'string')
+      : []
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9f2e57 and 66b52ed.

📒 Files selected for processing (2)
  • server/api/files.ts (5 hunks)
  • server/package.json (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/files.ts (7)
server/shared/types.ts (2)
  • attachmentFileTypeMap (210-218)
  • AttachmentMetadata (707-707)
server/shared/fileUtils.ts (2)
  • getFileType (10-32)
  • isImageFile (4-8)
server/integrations/dataSource/config.ts (2)
  • getBaseMimeType (74-76)
  • isImageFile (89-92)
server/search/vespa.ts (3)
  • insert (20-20)
  • DeleteDocument (24-24)
  • GetDocument (21-21)
server/utils.ts (1)
  • getErrorMessage (103-106)
server/api/dataSource.ts (1)
  • DeleteImages (235-261)
server/api/search.ts (1)
  • handleAttachmentDeleteSchema (259-261)
🔇 Additional comments (6)
server/api/files.ts (6)

8-8: LGTM!

Import additions support the new attachment deletion and file schema migration functionality.

Also applies to: 15-15, 18-18, 23-23, 25-25, 27-27, 30-32


384-393: LGTM!

Path traversal validation prevents security vulnerabilities by rejecting fileIds with directory traversal characters.


436-440: LGTM!

Backward compatibility logic elegantly handles both old (att_) and new (attf_) fileId prefixes during the migration period.


487-490: LGTM!

The permission check correctly handles the case where fields.permissions might not be an array, addressing the previous review concern.


266-266: ProcessFile return value includes all destructured fields The static async processFile method in server/services/fileProcessor.ts returns chunks, chunks_pos, image_chunks, and image_chunks_pos as expected.


230-230: No legacy att_ prefix found outside tests
Non-test code contains no hardcoded att_ prefixes and handleAttachmentDelete already handles both att_ and attf_. Tests and example comments still use the old prefix—a cosmetic concern only.

@Himanshvarma Himanshvarma requested a review from naSim087 October 8, 2025 13:52
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/api/files.ts (1)

272-301: Sheet ingestion bug: using fileId/title instead of per‑sheet docId/fileName.

For spreadsheets, you compute docId and fileName per processingResult but insert with docId: fileId and title: file.name. This will overwrite sheets or create duplicate IDs.

Apply this diff:

-            const vespaDoc = {
-              title: file.name,
+            const vespaDoc = {
+              title: fileName,
               url: "",
               app: Apps.Attachment,
-              docId: fileId,
+              docId: docId,
               parentId: null,
               owner: email,
               photoLink: "",
               ownerEmail: email,
-              entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })],
+              entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })],
               chunks: chunks,
               chunks_pos: chunks_pos,
               image_chunks: image_chunks,
               image_chunks_pos: image_chunks_pos,
               chunks_map: processingResult.chunks_map,
               image_chunks_map: processingResult.image_chunks_map,
               permissions: [email],
               mimeType: getBaseMimeType(file.type || "text/plain"),
               metadata: JSON.stringify({
-                originalFileName: file.name,
+                originalFileName: fileName,
                 uploadedBy: email,
                 chunksCount: chunks.length,
                 imageChunksCount: image_chunks.length,
                 processingMethod: getBaseMimeType(file.type || "text/plain"),
                 lastModified: Date.now(),
                 ...(('sheetName' in processingResult) && {
                   sheetName: (processingResult as SheetProcessingResult).sheetName,
                   sheetIndex: (processingResult as SheetProcessingResult).sheetIndex,
                   totalSheets: (processingResult as SheetProcessingResult).totalSheets,
                 }),
               }),
               createdAt: Date.now(),
               updatedAt: Date.now(),
             }
♻️ Duplicate comments (3)
server/package.json (1)

81-81: Blocking: @xyne/vespa-ts@1.1.2 appears unpublished; installs will fail unless a private registry serves it.

If your @xyne scope isn’t pointed to a private registry hosting 1.1.2, please either publish 1.1.2, downgrade to a published version, or reference a git commit.

Run to verify registry setup and availability:

#!/bin/bash
set -euo pipefail

echo "== Check for scoped registry config (.npmrc) =="
fd -H '.npmrc' -x sh -c 'echo "--- {} ---"; cat {}' || true
echo
echo "== Look for publishConfig in package.json files =="
fd package.json -t f -x jq -r 'select(.publishConfig!=null) | .name + " -> " + (.publishConfig.registry // "no registry")' || true
echo
echo "== Check lockfiles referencing @xyne/vespa-ts =="
fd -H 'bun.lockb|package-lock.json|pnpm-lock.yaml|yarn.lock' -E 'node_modules' -x sh -c 'echo "--- {} ---"; rg -n "@xyne/vespa-ts" {} || true' || true
echo
echo "== Public npm versions for @xyne/vespa-ts (if any) =="
curl -fsS https://registry.npmjs.org/%40xyne%2Fvespa-ts | jq -r '.["dist-tags"], (.versions|keys|sort|join(", "))' || echo "Package not visible on public npm"

If you need a quick unblock and 1.0.4 suffices, change to a published version:

-    "@xyne/vespa-ts": "1.1.2",
+    "@xyne/vespa-ts": "1.0.4",

If 1.1.2 is required and not yet published, consider a temporary git reference:

-    "@xyne/vespa-ts": "1.1.2",
+    "@xyne/vespa-ts": "git+ssh://git@github.com/xynehq/vespa-ts.git#<commit-sha>",
server/api/files.ts (2)

297-303: Optional: add defensive fallback for entity mapping.

getFileType defaults to FileType.FILE and attachmentFileTypeMap covers FILE, so this is safe today. Adding a fallback guards future changes.

-              entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })],
+              entity:
+                attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })] ??
+                AttachmentEntity.File,

505-536: Unauthorized deletion risk for images; enforce permission/ownership.

Image attachments are deleted without verifying requester’s access. A caller can supply any fileId and set isImage=true.

Suggested approach:

  • Validate that the image attachment belongs to the user before deletion by checking your attachments store (e.g., a lookup by fileId and email in the attachments table), or persist image metadata in Vespa and use the same permissions check as non‑images.
  • Alternatively, require messageId in the payload and verify the attachment exists on that message for the user.

Minimal patch sketch:

-  if(isImage) {
-    await handleAttachmentDelete([attachment], email)
+  if (isImage) {
+    // Verify ownership via attachments metadata
+    const owned = await findAttachmentByFileIdAndEmail(fileId, email) // implement in db/attachment
+    if (!owned) {
+      throw new HTTPException(403, { message: "Access denied to this attachment" })
+    }
+    await handleAttachmentDelete([attachment], email)
     return c.json({ success: true, message: "Attachment deleted successfully" })
   }

If a DB helper doesn’t exist yet, I can draft it. Do you want me to wire this up?

🧹 Nitpick comments (2)
server/api/chat/chat.ts (1)

511-516: Broadened attachment citation skip is correct; consider a small perf tweak.

The Object.values(AttachmentEntity).includes(...) check runs in a tight streaming loop. Precompute a Set once and use .has().

Apply within this block:

-            Object.values(AttachmentEntity).includes(f?.entity)
+            attachmentEntities.has(f?.entity)

Add near imports:

const attachmentEntities = new Set(Object.values(AttachmentEntity))
server/api/files.ts (1)

384-492: Centralized deletion helper is good; add small safety improvements.

  • Deduplicate fileIds to avoid repeated I/O/calls on duplicate metadata.
  • Consider short‑circuiting when arrays are empty to skip logging.

Example:

-  const imageAttachmentFileIds: string[] = []
-  const nonImageAttachmentFileIds: string[] = []
+  const imageAttachmentFileIds = new Set<string>()
+  const nonImageAttachmentFileIds = new Set<string>()
...
-          imageAttachmentFileIds.push(attachment.fileId)
+          imageAttachmentFileIds.add(attachment.fileId)
...
-          nonImageAttachmentFileIds.push(attachment.fileId)
+          nonImageAttachmentFileIds.add(attachment.fileId)
...
-  if (imageAttachmentFileIds.length > 0) {
+  if (imageAttachmentFileIds.size > 0) {
...
-    for (const fileId of imageAttachmentFileIds) {
+    for (const fileId of imageAttachmentFileIds) {
...
-  if (nonImageAttachmentFileIds.length > 0) {
+  if (nonImageAttachmentFileIds.size > 0) {
...
-    for (const fileId of nonImageAttachmentFileIds) {
+    for (const fileId of nonImageAttachmentFileIds) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 66b52ed and a0e76dd.

📒 Files selected for processing (4)
  • server/api/chat/agents.ts (2 hunks)
  • server/api/chat/chat.ts (4 hunks)
  • server/api/files.ts (6 hunks)
  • server/package.json (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/api/chat/agents.ts
🧰 Additional context used
🧬 Code graph analysis (2)
server/api/files.ts (8)
server/shared/types.ts (2)
  • attachmentFileTypeMap (210-218)
  • AttachmentMetadata (707-707)
server/shared/fileUtils.ts (2)
  • getFileType (10-32)
  • isImageFile (4-8)
server/integrations/dataSource/config.ts (2)
  • getBaseMimeType (69-71)
  • isImageFile (84-87)
server/search/vespa.ts (3)
  • insert (20-20)
  • DeleteDocument (24-24)
  • GetDocument (21-21)
server/utils.ts (1)
  • getErrorMessage (103-106)
server/api/chat/chat.ts (1)
  • expandSheetIds (223-242)
server/api/dataSource.ts (1)
  • DeleteImages (235-261)
server/api/search.ts (1)
  • handleAttachmentDeleteSchema (259-261)
server/api/chat/chat.ts (3)
server/shared/types.ts (2)
  • AttachmentEntity (39-39)
  • AttachmentMetadata (707-707)
server/db/message.ts (1)
  • getChatMessagesWithAuth (47-58)
server/api/files.ts (1)
  • handleAttachmentDelete (384-492)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
server/integrations/google/sync.ts (1)

372-380: Good refactor to avoid mutation, but consider making the array path consistent.

The change correctly avoids mutating vespaData.permissions in place by creating a new object with the spread operator. However, the array path (lines 362-370) still mutates v.permissions within the map callback, creating an inconsistency.

Consider updating the array path for consistency:

 if (Array.isArray(vespaData)) {
-  let allData: VespaFileWithDrivePermission[] = [...vespaData].map(
-    (v) => {
-      v.permissions = toPermissionsList(v.permissions, email)
-      return v
-    },
-  )
+  let allData: VespaFileWithDrivePermission[] = vespaData.map((v) => ({
+    ...v,
+    permissions: toPermissionsList(v.permissions, email),
+  }))
   for (const data of allData) {
     await insertWithRetry(data, fileSchema)
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0e76dd and aa45a8e.

📒 Files selected for processing (1)
  • server/integrations/google/sync.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/integrations/google/sync.ts (2)
server/integrations/google/utils.ts (1)
  • toPermissionsList (355-381)
server/search/vespa.ts (1)
  • insertWithRetry (167-167)

naSim087
naSim087 previously approved these changes Oct 8, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
server/api/chat/chat.ts (1)

791-820: Attachments still not collected for deletion

getChatMessagesWithAuth returns SelectMessage objects without an attachments property, so attachmentsToDelete stays empty and uploaded files remain orphaned. Collect the metadata per message inside the transaction (falling back to getAttachmentsByMessageId) before calling handleAttachmentDelete after commit.

@@
-    const attachmentsToDelete: AttachmentMetadata[] = []
+    const attachmentsToDelete: AttachmentMetadata[] = []
@@
-      for (const message of messagesToDelete) {
-        if (message.attachments && Array.isArray(message.attachments)) {
-          const attachments = message.attachments as AttachmentMetadata[]
-          attachmentsToDelete.push(...attachments)
-        }
-      }
+      for (const message of messagesToDelete) {
+        const attachments =
+          Array.isArray((message as any).attachments)
+            ? ((message as any).attachments as AttachmentMetadata[])
+            : await getAttachmentsByMessageId(tx, message.externalId, email)
+        if (attachments?.length) {
+          attachmentsToDelete.push(...attachments)
+        }
+      }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aa45a8e and 3a56bc3.

📒 Files selected for processing (1)
  • server/api/chat/chat.ts (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/chat/chat.ts (4)
server/shared/types.ts (1)
  • AttachmentMetadata (707-707)
server/db/chat.ts (1)
  • getChatByExternalIdWithAuth (70-83)
server/db/message.ts (1)
  • getChatMessagesWithAuth (47-58)
server/api/files.ts (1)
  • handleAttachmentDelete (384-492)

naSim087
naSim087 previously approved these changes Oct 8, 2025
Copy link
Contributor

@naSim087 naSim087 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backend is fine

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
server/api/files.ts (1)

293-301: Add fallback for entity field.

The entity field at line 301 could be undefined if getFileType returns a FileType not present in attachmentFileTypeMap. This was flagged in a previous review and remains unaddressed.

Apply this diff to add a safe fallback:

+            const fileType = getFileType({ type: file.type, name: file.name })
             const vespaDoc = {
               title: file.name,
               url: "",
               app: Apps.Attachment,
               docId: docId,
               parentId: null,
               owner: email,
               photoLink: "",
               ownerEmail: email,
-              entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })],
+              entity: attachmentFileTypeMap[fileType] ?? AttachmentEntity.File,

This ensures a valid entity is always provided, preventing potential Vespa ingestion failures.

🧹 Nitpick comments (1)
server/api/files.ts (1)

18-18: Remove unused GetDocument import
The GetDocument import on line 18 isn’t used anywhere in this file; please remove it.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3a56bc3 and 1998628.

📒 Files selected for processing (5)
  • server/api/chat/agents.ts (3 hunks)
  • server/api/chat/chat.ts (5 hunks)
  • server/api/files.ts (6 hunks)
  • server/api/knowledgeBase.ts (1 hunks)
  • server/search/utils.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/api/chat/agents.ts
  • server/api/chat/chat.ts
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/files.ts (7)
server/shared/types.ts (2)
  • attachmentFileTypeMap (210-218)
  • AttachmentMetadata (707-707)
server/shared/fileUtils.ts (2)
  • getFileType (10-32)
  • isImageFile (4-8)
server/search/vespa.ts (2)
  • insert (20-20)
  • DeleteDocument (24-24)
server/utils.ts (1)
  • getErrorMessage (103-106)
server/search/utils.ts (1)
  • expandSheetIds (23-42)
server/api/dataSource.ts (1)
  • DeleteImages (235-261)
server/api/search.ts (1)
  • handleAttachmentDeleteSchema (259-261)
🔇 Additional comments (1)
server/api/knowledgeBase.ts (1)

61-61: LGTM! Clean import refactor.

The import path update centralizes expandSheetIds in a shared utility module, improving code organization and reusability.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6fc4a51 and 1b1d6d4.

📒 Files selected for processing (1)
  • server/api/files.ts (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/files.ts (8)
server/shared/types.ts (2)
  • attachmentFileTypeMap (210-218)
  • AttachmentMetadata (707-707)
server/shared/fileUtils.ts (2)
  • getFileType (10-32)
  • isImageFile (4-8)
server/integrations/dataSource/config.ts (2)
  • getBaseMimeType (69-71)
  • isImageFile (84-87)
server/search/vespa.ts (3)
  • insert (20-20)
  • DeleteDocument (24-24)
  • GetDocument (21-21)
server/utils.ts (1)
  • getErrorMessage (103-106)
server/search/utils.ts (1)
  • expandSheetIds (23-42)
server/api/dataSource.ts (1)
  • DeleteImages (235-261)
server/api/search.ts (1)
  • handleAttachmentDeleteSchema (259-261)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
server/api/files.ts (1)

264-264: Add fallback for unmapped file types.

If getFileType returns a FileType not present in attachmentFileTypeMap, the entity field will be undefined, which could cause downstream issues in Vespa or search logic.

Apply this defensive pattern at both locations (lines 264 and 326):

-            entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })],
+            entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })] ?? AttachmentEntity.File,

Also applies to: 326-326

🧹 Nitpick comments (1)
server/api/files.ts (1)

409-513: Deletion function lacks internal permission validation.

While handleAttachmentDeleteApi validates permissions before calling this function, handleAttachmentDelete is also called directly from chat.ts (ChatDeleteApi). In that flow, chat ownership is verified before collecting attachments, providing implicit authorization. However, as a reusable utility function, it would be safer to include permission validation inside handleAttachmentDelete itself to prevent misuse if called from new locations in the future.

Consider adding an optional permission check inside handleAttachmentDelete:

-export const handleAttachmentDelete = async (attachments: AttachmentMetadata [], email: string) => {
+export const handleAttachmentDelete = async (
+  attachments: AttachmentMetadata [],
+  email: string,
+  skipPermissionCheck = false
+) => {
+  if (!skipPermissionCheck) {
+    // Verify permissions for each attachment before deletion
+    for (const attachment of attachments) {
+      if (!attachment.fileId) continue
+      const vespaIds = expandSheetIds(attachment.fileId)
+      for (const vespaId of vespaIds) {
+        try {
+          const schema = vespaId.startsWith("att_") ? KbItemsSchema : fileSchema
+          const doc = await GetDocument(schema, vespaId)
+          if (doc?.fields?.permissions && !doc.fields.permissions.includes(email)) {
+            loggerWithChild({ email }).warn(
+              `User ${email} attempted to delete attachment ${vespaId} without permission`
+            )
+            throw new HTTPException(403, { message: "Access denied" })
+          }
+        } catch (error) {
+          if (error instanceof HTTPException) throw error
+          // Document not found, allow deletion to proceed
+        }
+      }
+    }
+  }
  const imageAttachmentFileIds: string[] = []

Then call it with skipPermissionCheck: true from ChatDeleteApi where permissions are already validated upstream.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1b1d6d4 and fb78401.

📒 Files selected for processing (1)
  • server/api/files.ts (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/files.ts (8)
server/shared/types.ts (2)
  • attachmentFileTypeMap (210-218)
  • AttachmentMetadata (707-707)
server/shared/fileUtils.ts (2)
  • getFileType (10-32)
  • isImageFile (4-8)
server/integrations/dataSource/config.ts (2)
  • getBaseMimeType (69-71)
  • isImageFile (84-87)
server/search/vespa.ts (3)
  • insert (20-20)
  • DeleteDocument (24-24)
  • GetDocument (21-21)
server/utils.ts (1)
  • getErrorMessage (103-106)
server/search/utils.ts (1)
  • expandSheetIds (23-42)
server/api/dataSource.ts (1)
  • DeleteImages (235-261)
server/api/search.ts (1)
  • handleAttachmentDeleteSchema (259-261)
🔇 Additional comments (6)
server/api/files.ts (6)

255-278: Image attachments now create Vespa documents with image chunk fields.

This change aligns with the PR objective to store attachments in the file schema. Previously, images were only stored on disk. Now they have Vespa metadata records with image_chunks and image_chunks_pos fields for potential future embedding/search capabilities.

However, note the potential issue at line 264 below.


439-448: Path traversal validation is correctly implemented.

The validation prevents directory traversal attacks by checking for .., /, and \ characters in the fileId before constructing file paths.


487-491: Schema selection handles migration path correctly.

The logic correctly routes legacy att_ attachments to KbItemsSchema and new attf_ attachments to fileSchema, supporting the migration from the old schema to the new one.


535-538: Permission check correctly handles edge cases.

The Array.isArray check ensures that if permissions is not an array (e.g., malformed data), it safely defaults to an empty array, preventing runtime errors.

This addresses the concern raised by gemini-code-assist in the past review.


454-461: Image deletion now removes Vespa documents. Legacy attachments without Vespa metadata may cause DeleteDocument to error (404); verify there are no legacy image directories in your production storage path (e.g. $IMAGE_DIR), or wrap DeleteDocument in error handling to ignore missing Vespa documents.


527-531: Add documentation for legacy attachment limitation
This endpoint only deletes new attachments (attf_* via fileSchema); legacy att_* documents in KbItemsSchema are removed by the chat‐deletion flow—add a comment here to clarify.

naSim087
naSim087 previously approved these changes Oct 9, 2025
Copy link
Contributor

@naSim087 naSim087 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

backend is fine

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb78401 and 878ad4a.

📒 Files selected for processing (1)
  • server/vespa/schemas/file.sd (6 hunks)

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
server/api/chat/chat.ts (1)

771-785: Attachments are still never fetched
SelectMessage rows coming from getChatMessagesWithAuth don’t hydrate an attachments array, so attachmentsToDelete remains empty. The earlier review already pointed this out—please fetch via getAttachmentsByMessageId(tx, message.externalId, email) inside the loop, collect them, and only call handleAttachmentDelete after the transaction.

🧹 Nitpick comments (1)
server/vespa/schemas/file.sd (1)

50-71: LGTM! Consider clarifying field naming.

The new fields for image chunk support are well-structured and consistent with the existing chunks fields. The use of BM25 indexing on image_chunks is appropriate.

Minor observation: The name image_chunks suggests these contain text extracted from or describing images (OCR text, captions, etc.) rather than binary image data, which is correct given the array<string> type. This is fine, though if you want to be more explicit, names like image_text_chunks could clarify the content type.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 878ad4a and 86edf8f.

📒 Files selected for processing (3)
  • server/api/chat/chat.ts (6 hunks)
  • server/package.json (1 hunks)
  • server/vespa/schemas/file.sd (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/package.json
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/chat/chat.ts (4)
server/shared/types.ts (2)
  • AttachmentEntity (39-39)
  • AttachmentMetadata (707-707)
server/db/chat.ts (1)
  • getChatByExternalIdWithAuth (70-83)
server/db/message.ts (1)
  • getChatMessagesWithAuth (47-58)
server/api/files.ts (1)
  • handleAttachmentDelete (409-513)
🪛 GitHub Actions: TypeScript Build Check
server/api/chat/chat.ts

[error] 1973-1973: api/chat/chat.ts(1973,38): Property 'attachmentRank' does not exist on type 'typeof SearchModes'.

🔇 Additional comments (6)
server/vespa/schemas/file.sd (6)

3-7: LGTM! Well-structured metadata.

The chunk_meta struct provides appropriate fields for tracking chunk metadata including index, page numbers, and block labels.


117-121: Verify DIMS configuration at deployment.

The image_chunk_embeddings field is correctly configured with the same pattern as chunk_embeddings. The indexing pipeline will automatically embed the image_chunks field using the configured embedder.

Ensure that the DIMS placeholder is properly replaced with the actual embedding dimension at deployment time, and verify that the embedder model handles both regular text chunks and image-extracted text effectively.


196-219: LGTM! Image-aware scoring functions properly implemented.

The initial_image rank profile correctly extends the base initial profile with image-specific scoring functions:

  • vector_score_image() combines text and image embeddings scores
  • combined_nativeRank_image() includes proper zero-division protection
  • All functions appropriately aggregate scores from both text and image chunks

The implementation follows Vespa ranking best practices.


249-275: LGTM! Attachment ranking profile correctly configured.

The attachmentRank profile properly:

  • Inherits image-aware scoring from initial_image
  • Uses vector_score_image and combined_nativeRank_image in both first-phase and global-phase
  • Applies recency decay via doc_recency
  • Exposes comprehensive match-features including image_chunk_scores

The ranking expressions are syntactically valid and follow the expected pattern for hybrid search with image support.


441-450: LGTM! Document summaries properly configured.

The new summaries are correctly defined:

  • image_chunks_summary has bolding enabled for search result highlighting
  • Position summaries (chunks_pos_summary, image_chunks_pos_summary) are properly sourced

These summaries will enable retrieval of image chunk data and position information in query results.


135-141: Verify fieldset configuration for image chunks.

The image_chunks field is not included in any fieldset (neither default nor autocomplete). This means generic queries won't automatically search image chunk content unless the field is explicitly specified.

Please confirm this is intentional—typically this would be correct if:

  • Attachment/image searches always use the attachmentRank profile with explicit field targeting
  • You want to prevent generic searches from matching image-extracted text

If image chunks should be searchable in general queries, consider adding them to the default fieldset.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
server/api/chat/agents.ts (2)

873-874: Add null-safety check for appIntegrations.

The type assertion as string[] is unsafe because agentForDb?.appIntegrations can be null or undefined. Calling flatMap on a null/undefined value will cause a runtime exception.

Apply this diff to add proper null-checking:

-const appIntegrations = agentForDb?.appIntegrations as string[]
-const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds)
+const appIntegrations = (agentForDb?.appIntegrations as string[]) || []
+const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds)

Alternatively, use optional chaining:

-const appIntegrations = agentForDb?.appIntegrations as string[]
-const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds)
+const appIntegrations = agentForDb?.appIntegrations as string[] | undefined
+const appIntegrationsWithSheetIds = appIntegrations?.flatMap(expandSheetIds) ?? []

3873-3874: Add null-safety check for appIntegrations.

The type assertion as string[] is unsafe because agentForDb?.appIntegrations can be null or undefined. Calling flatMap on a null/undefined value will cause a runtime exception.

Apply this diff to add proper null-checking:

-const appIntegrations = agentForDb?.appIntegrations as string[]
-const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds)
+const appIntegrations = (agentForDb?.appIntegrations as string[]) || []
+const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds)

Alternatively, use optional chaining:

-const appIntegrations = agentForDb?.appIntegrations as string[]
-const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds)
+const appIntegrations = agentForDb?.appIntegrations as string[] | undefined
+const appIntegrationsWithSheetIds = appIntegrations?.flatMap(expandSheetIds) ?? []
♻️ Duplicate comments (1)
server/api/chat/chat.ts (1)

782-785: Attachments never get deleted

getChatMessagesWithAuth returns SelectMessage rows that don’t include attachment metadata (we persist it via storeAttachmentMetadata). As a result attachmentsToDelete stays empty and nothing is passed to handleAttachmentDelete, so files survive chat deletion. Fetch the metadata per message inside the transaction (falling back to message.attachments only if present) before pushing into attachmentsToDelete.

-      for (const message of messagesToDelete) {
-        if (message.attachments && Array.isArray(message.attachments)) {
-          const attachments = message.attachments as AttachmentMetadata[]
-          attachmentsToDelete.push(...attachments)
-        }
-      }
+      for (const message of messagesToDelete) {
+        const attachments = Array.isArray((message as any).attachments)
+          ? ((message as any).attachments as AttachmentMetadata[])
+          : await getAttachmentsByMessageId(tx, message.externalId, email)
+
+        if (attachments.length) {
+          attachmentsToDelete.push(...attachments)
+        }
+      }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 86edf8f and 74e93ca.

📒 Files selected for processing (2)
  • server/api/chat/agents.ts (5 hunks)
  • server/api/chat/chat.ts (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
server/api/chat/agents.ts (3)
server/shared/types.ts (2)
  • AttachmentEntity (39-39)
  • Apps (40-40)
server/search/utils.ts (1)
  • expandSheetIds (23-42)
server/search/vespa.ts (1)
  • getAllDocumentsForAgent (160-160)
server/api/chat/chat.ts (3)
server/shared/types.ts (2)
  • AttachmentEntity (39-39)
  • AttachmentMetadata (707-707)
server/api/files.ts (1)
  • handleAttachmentDelete (409-513)
server/search/utils.ts (1)
  • expandSheetIds (23-42)
🪛 GitHub Actions: TypeScript Build Check
server/api/chat/chat.ts

[error] 1973-1973: Property 'attachmentRank' does not exist on type 'typeof SearchModes'.

🔇 Additional comments (2)
server/api/chat/agents.ts (2)

396-398: LGTM! Attachment filtering is correct.

The logic properly filters out attachment entities from citations by checking against the AttachmentEntity enum values.


847-849: LGTM! Sheet expansion for attachments is correct.

The code correctly expands sheet IDs for non-image attachments using flatMap(expandSheetIds), ensuring multi-sheet files are properly handled.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
server/vespa/schemas/file.sd (1)

262-271: Consider adding individual image field metrics for completeness.

While the combined image-aware metrics are present, adding nativeRank(image_chunks) and bm25(image_chunks) to the match-features list could be helpful for debugging and fine-tuning the ranking behavior of image-based searches.

You can apply this diff to add the individual metrics:

     match-features {
       matchedFieldCount_image
       vector_score_image
       combined_nativeRank_image
       nativeRank(title)
       nativeRank(chunks)
+      nativeRank(image_chunks)
       chunk_scores
       image_chunk_scores
+      bm25(image_chunks)
       doc_recency
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b7e061 and 549d8c2.

📒 Files selected for processing (1)
  • server/vespa/schemas/file.sd (5 hunks)
🔇 Additional comments (6)
server/vespa/schemas/file.sd (6)

3-7: LGTM!

The chunk_meta struct is well-defined with appropriate types for tracking chunk metadata including index, page numbers, and block labels.


50-71: LGTM!

The new document fields for image chunk support are properly defined with appropriate types and indexing directives. The symmetric structure between text chunks and image chunks is clean and maintainable.


117-121: LGTM!

The image_chunk_embeddings field is correctly configured with the appropriate tensor type, indexing pipeline, distance metric, and memory optimization. It's consistent with the existing chunk_embeddings field structure.


223-272: LGTM! Past issues resolved.

The attachmentRank profile is correctly implemented with image-aware scoring functions that properly combine text and image signals. Both first-phase and global-phase expressions use the appropriate _image variants, and match-features include all relevant image metrics. The previously reported issues with malformed expressions and missing image signals have been successfully resolved.


135-137: Verify if image_chunks should be included in the default fieldset.

The default fieldset excludes image_chunks, meaning queries using this fieldset won't search image content unless explicitly specified. This might be intentional if image search requires explicit selection of the attachmentRank profile, but please confirm this design choice aligns with your search requirements.


438-447: LGTM!

The document summaries for image chunks and positions are properly defined with appropriate bolding settings and source references, consistent with the existing text chunk summaries.

@junaid-shirur junaid-shirur merged commit 0cee0c0 into main Oct 9, 2025
3 of 4 checks passed
@junaid-shirur junaid-shirur deleted the feat/attachmentToFile branch October 9, 2025 13:37
@Himanshvarma Himanshvarma restored the feat/attachmentToFile branch October 17, 2025 11:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants