-
Notifications
You must be signed in to change notification settings - Fork 56
fix: XYN-288 unlink file from disk after transaction of postgress and vespa ingestion of the file fails #905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Caution Review failedThe pull request is closed. Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds frontend accumulation and conditional toasts for partial-success batch uploads; server upload handler now uses mutable storageKey/storagePath, persists storagePath in Vespa docs, and attempts per-file on-disk cleanup on failures while adding richer file metadata. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor U as User
participant FE as Frontend (KnowledgeManagement)
participant S as Server API (Upload)
participant FS as Filesystem
participant V as Vespa
participant LOG as Logger
U->>FE: Select files & start upload
loop per batch
FE->>S: POST batch
S->>FS: write file -> storagePath
alt write success
S->>V: upsert doc (includes storagePath, metadata)
S-->>FE: UploadedResult { summary.successful }
else write/error
S->>FS: unlink(storagePath) for failed file
S->>LOG: warn if unlink fails
S-->>FE: UploadedResult { summary.successful }
end
FE->>FE: accumulate summary.successful
end
alt any successful
FE-->>U: Toast "Successfully added X out of Y files"
else all failed
FE-->>U: Toast "Add Files Failed"
end
sequenceDiagram
autonumber
participant S as Server Upload Handler
participant PR as Path Resolver
participant FS as Filesystem
participant V as Vespa
participant LOG as Logger
S->>PR: resolve storagePath(file, collection?)
PR-->>S: storagePath (mutable)
S->>FS: stream write -> storagePath
alt success
S->>V: upsert doc { fileName: collection?.name?, storagePath, metadata }
V-->>S: ack
S-->>Caller: success summary
else failure
S->>FS: unlink(storagePath)
alt unlink error
S->>LOG: warn cleanup failed
end
S-->>Caller: partial/failed summary
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @Ravishekhar7870, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a critical issue where files could be left on the disk if their processing and ingestion into the database or search index failed. It introduces a cleanup mechanism to ensure that such orphaned files are removed, thereby improving data consistency and resource management. Concurrently, the user experience for file uploads has been refined to offer more precise feedback on the success or failure of batch uploads. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request correctly implements a cleanup mechanism to unlink files from disk if the ingestion transaction fails, preventing orphaned files. The frontend changes also improve user feedback by showing the number of successfully uploaded files. My review focuses on improving code quality and maintainability. I've suggested using the structured logger instead of console.log
for better error tracking on the server, removing an unused import, and several minor improvements to code style and readability for consistency with common practices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (7)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (4)
770-781
: Propagate partial‑success handling to “create new collection” flow.Right now you ignore the per‑batch result and always toast as if all files succeeded. Mirror the add‑to‑collection logic to avoid misleading UX.
Apply this diff:
for (let i = 0; i < batches.length; i++) { setBatchProgress((prev: typeof batchProgress) => ({ ...prev, batch: i + 1, })) const batchFiles = batches[i].map((f) => f.file) - await uploadFileBatch(batchFiles, cl.id) + const uploadedResult = await uploadFileBatch(batchFiles, cl.id) + const successfulInBatch = Number(uploadedResult?.summary?.successful ?? 0) + // Optionally accumulate if you want to reflect partial success in the final toast + // keep a running counter declared before the loop: let successfullyUploadedFiles = 0 + successfullyUploadedFiles += successfulInBatch setBatchProgress((prev: typeof batchProgress) => ({ ...prev, current: prev.current + batchFiles.length, })) }And later adjust the toast (see next comment) to reflect partial success.
835-840
: Toast should reflect actual upload outcome (partial successes).Use the accumulated successful count or updatedCl.totalCount rather than selectedFiles.length.
Apply this diff:
- showToast( - "Knowledge Base Created", - `Successfully created knowledge base "${collectionName.trim()}" with ${selectedFiles.length} files.`, - ) + const totalRequested = selectedFiles.length + const totalSucceeded = typeof successfullyUploadedFiles === 'number' + ? successfullyUploadedFiles + : Number(updatedCl?.totalCount ?? 0) + showToast( + "Knowledge Base Created", + `Added ${totalSucceeded} of ${totalRequested} files to "${collectionName.trim()}".` + )
904-930
: Defensive read of API response to avoid runtime errors on non‑JSON bodies.uploadFileBatch can return a non‑JSON fallback object; directly accessing uploadedResult.summary.successful can throw. Guard it.
Apply this diff:
- let successfullyUploadedFiles=0; + let successfullyUploadedFiles = 0; const batches = createBatches(selectedFiles, addingToCollection.name) @@ - const batchFiles = batches[i].map((f) => f.file) - const uploadedResult= await uploadFileBatch( + const batchFiles = batches[i].map((f) => f.file) + const uploadedResult = await uploadFileBatch( batchFiles, addingToCollection.id, targetFolder?.id, ) - successfullyUploadedFiles=successfullyUploadedFiles+ uploadedResult.summary.successful; + const successfulInBatch = Number(uploadedResult?.summary?.successful ?? 0) + successfullyUploadedFiles += successfulInBatch
976-984
: Simplify conditional toast; avoid ternary used for side‑effects.Readability nit: prefer a simple if/else.
Apply this diff:
- successfullyUploadedFiles? showToast( - "Files Added", - `Successfully added ${successfullyUploadedFiles} out of ${selectedFiles.length} files to collection "${addingToCollection.name}".`, - ): - showToast( - "Add Files Failed", - "Failed to add files to collection. Please try again.", - true, - ) + if (successfullyUploadedFiles > 0) { + showToast( + "Files Added", + `Successfully added ${successfullyUploadedFiles} out of ${selectedFiles.length} files to collection "${addingToCollection.name}".` + ) + } else { + showToast( + "Add Files Failed", + "Failed to add files to collection. Please try again.", + true + ) + }server/api/knowledgeBase.ts (3)
52-53
: Remove unused import.storage from googleapis is unused and may fail lint/noUnusedLocals.
Apply this diff:
-import { storage } from "googleapis/build/src/apis/storage"
1243-1246
: Unnecessary optional chaining; also normalize composed path.collection is already validated; optional chaining is noise. Consider precomputing path to avoid duplicate logic.
Apply this diff:
- fileName: - targetPath === "/" - ? collection?.name + targetPath + filePath - : collection?.name + targetPath + fileName, + fileName: collection.name + targetPath + (targetPath === "/" ? filePath : fileName),
1298-1306
: Use structured logger for cleanup errors; avoid console.log.Aligns with existing logging strategy and preserves context.
Apply this diff:
- if(storagePath){ - try{ - await unlink(storagePath); - } - catch(err){ - console.log(`Failed to clean up storage file at ${storagePath}: ${getErrorMessage(err)}`); - } - - } + if (storagePath) { + try { + await unlink(storagePath) + } catch (err) { + loggerWithChild({ email: userEmail }).warn( + `Failed to clean up storage file at ${storagePath}: ${getErrorMessage(err)}` + ) + } + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
frontend/src/routes/_authenticated/knowledgeManagement.tsx
(5 hunks)server/api/knowledgeBase.ts
(8 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-16T11:56:22.752Z
Learnt from: oindrila-b
PR: xynehq/xyne#545
File: server/integrations/google/index.ts:137-145
Timestamp: 2025-06-16T11:56:22.752Z
Learning: In server/integrations/google/index.ts, both Logger (base logger) and loggerWithChild (factory for email-scoped child loggers) are intentionally maintained to handle different logging scenarios - one for when email context is not available and one for when it is available.
Applied to files:
server/api/knowledgeBase.ts
🧬 Code graph analysis (2)
frontend/src/routes/_authenticated/knowledgeManagement.tsx (1)
frontend/src/utils/fileUtils.ts (1)
uploadFileBatch
(222-269)
server/api/knowledgeBase.ts (2)
server/db/knowledgeBase.ts (2)
generateStorageKey
(763-765)generateFileVespaDocId
(768-770)server/utils.ts (1)
getErrorMessage
(89-92)
🔇 Additional comments (2)
server/api/knowledgeBase.ts (2)
1019-1205
: Good: on‑disk rollback on failure now in place.Declaring storagePath before write and unlinking on error covers DB/Vespa/processing failures; this meets the PR objective.
Please confirm we also cover failures after DB commit but before response write (rare). If needed, consider a best‑effort compensating task/queue to unlink orphaned files discovered later.
1184-1189
: Incorrect — workspaceExternalId is non-null on SelectUser; no change needed.selectUserSchema defines workspaceExternalId as notNull (server/db/schema/users.ts) and user objects returned by getUserByEmail include it; the codebase already relies on user.workspaceExternalId (e.g., server/api/knowledgeBase.ts:1184–1189), so passing user.workspaceExternalId to getStoragePath is safe.
Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
server/api/knowledgeBase.ts (2)
52-52
: Remove unused import.The
storage
import fromgoogleapis
is not used anywhere in this file and should be removed to keep the code clean.-import { storage } from "googleapis/build/src/apis/storage"
1298-1308
: Fix error logging and use structured logger.The error handling for file cleanup has syntax issues and uses incomplete logging. The
loggerWithChild
call is malformed and missing proper parameters.if (storagePath) { try { await unlink(storagePath) } catch (err) { - loggerWithChild({ email: userEmail, }).error( - error, - `Failed to clean up storage file` - ); + loggerWithChild({ + email: userEmail, + storagePath, + error: getErrorMessage(err) + }).warn( + `Failed to clean up storage file` + ) } }
🧹 Nitpick comments (4)
server/api/knowledgeBase.ts (4)
1022-1022
: Initialize variable with proper spacing.For consistency and readability, add spaces around the assignment operator.
- let storagePath = "" + let storagePath = ""
1181-1181
: Consider usingconst
for immutable variable.Since
storageKey
doesn't appear to be reassigned after generation, consider usingconst
instead oflet
.- let storageKey = generateStorageKey() + const storageKey = generateStorageKey()
1619-1637
: Verify Vespa response structure handling.The code directly accesses
resp.fields
properties without proper null checks on intermediate objects. While there are some checks, they could be more comprehensive.- const index = resp.fields.chunks_pos.findIndex( - (pos: number) => pos === chunkIndex, - ) + const chunksPos = resp.fields.chunks_pos + const chunks = resp.fields.chunks + + if (!Array.isArray(chunksPos) || !Array.isArray(chunks)) { + throw new HTTPException(404, { message: "Invalid chunk data format" }) + } + + const index = chunksPos.findIndex((pos: number) => pos === chunkIndex)
1772-1787
: Review filename sanitization logic.The filename sanitization functions have potential issues:
- Line 1779: The regex replacement might be too aggressive
- Line 1785-1786: The chaining of
escape
andunescape
calls seems redundantfunction sanitizeFilename(name: string): string { - // Replace non-ASCII and problematic characters with '_' - return name.replace(/[^\x20-\x7E]|["\\]/g, "_") + // Replace control characters and problematic characters with '_' + return name.replace(/[\x00-\x1f\x7f-\x9f"\\<>:|?*]/g, "_") } // RFC 5987 encoding for filename* function encodeRFC5987ValueChars(str: string): string { - return encodeURIComponent(str) - .replace(/['()]/g, escape) - .replace(/%(7C|60|5E)/g, unescape) + return encodeURIComponent(str) + .replace(/['()]/g, (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/api/knowledgeBase.ts
(11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-16T11:56:22.752Z
Learnt from: oindrila-b
PR: xynehq/xyne#545
File: server/integrations/google/index.ts:137-145
Timestamp: 2025-06-16T11:56:22.752Z
Learning: In server/integrations/google/index.ts, both Logger (base logger) and loggerWithChild (factory for email-scoped child loggers) are intentionally maintained to handle different logging scenarios - one for when email context is not available and one for when it is available.
Applied to files:
server/api/knowledgeBase.ts
🧬 Code graph analysis (1)
server/api/knowledgeBase.ts (1)
server/db/knowledgeBase.ts (3)
getCollectionItemsByParent
(142-164)generateStorageKey
(763-765)generateFileVespaDocId
(768-770)
🔇 Additional comments (2)
server/api/knowledgeBase.ts (2)
1686-1686
: Handle potential undefined originalName gracefully.The fallback to "file" when
originalName
is missing is good defensive programming.
1244-1245
: Replace optional chaining with direct access — collection is validated earlier.This handler calls getCollectionById and immediately throws if not found (server/api/knowledgeBase.ts:903–906), so collection is non-null when the Vespa doc is created (server/api/knowledgeBase.ts ≈1160–1260 / original lines 1244–1245). Use collection.name (not collection?.name) to avoid
undefined
prefixes in Vespa fileName.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
server/api/knowledgeBase.ts (1)
1297-1307
: Cleanup logging uses wrong error variable and misses context; treat ENOENT as non-issue.You're logging
error
from the outer catch instead oferr
, and severity should be warn. IncludestoragePath
and the unlink error message.Apply this diff:
- if (storagePath) { - try { - await unlink(storagePath) - } catch (err) { - loggerWithChild({ email: userEmail, }).error( - error, - `Failed to clean up storage file` - ); - - } - } + if (storagePath) { + try { + await unlink(storagePath) + } catch (err) { + // Ignore missing file; warn on other errors + if ((err as NodeJS.ErrnoException)?.code !== 'ENOENT') { + loggerWithChild({ email: userEmail, storagePath, error: getErrorMessage(err) }) + .warn(`Failed to clean up storage file`) + } + } + }
🧹 Nitpick comments (5)
server/api/knowledgeBase.ts (5)
1180-1180
: Prefer const for immutablestorageKey
.
storageKey
is never reassigned; useconst
.- let storageKey = generateStorageKey() + const storageKey = generateStorageKey()
1021-1021
: Use null sentinel for unset path.
""
can be mistaken for a valid path. Prefernull
and check truthiness accordingly.- let storagePath = "" + let storagePath: string | null = null
1242-1245
: Inconsistent Vespa fileName composition; avoid mixingfilePath
.Using
filePath
in the root case can double-embed folder segments if logic changes. Build uniformly fromcollection.name
,targetPath
, andfileName
.Apply this diff:
- fileName: - targetPath === "/" - ? collection?.name + targetPath + filePath - : collection?.name + targetPath + fileName, + fileName: `${collection!.name}${targetPath}${fileName}`,
249-283
: Potential N+1 when includeItems=true.Fetching items per collection can be heavy for large lists. Consider a batched query by collection IDs and grouping in memory, or gate behind a max collections threshold.
If interested, I can sketch a batched query using a single SELECT with
collectionId IN (...)
and grouping.
1777-1786
: Avoid deprecated escape/unescape in RFC5987 encoding helper.
escape
/unescape
are deprecated. Use a safe replacer.- function encodeRFC5987ValueChars(str: string): string { - return encodeURIComponent(str) - .replace(/['()]/g, escape) - .replace(/%(7C|60|5E)/g, unescape) - } + function encodeRFC5987ValueChars(str: string): string { + // Encode as per RFC5987 without deprecated APIs + return encodeURIComponent(str) + .replace(/[!'()*]/g, (c) => `%${c.charCodeAt(0).toString(16).toUpperCase()}`) + .replace(/%20/g, '%20') + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
frontend/src/routes/_authenticated/knowledgeManagement.tsx
(6 hunks)server/api/knowledgeBase.ts
(10 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- frontend/src/routes/_authenticated/knowledgeManagement.tsx
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-09-16T08:57:58.762Z
Learnt from: rahul1841
PR: xynehq/xyne#843
File: server/server.ts:996-996
Timestamp: 2025-09-16T08:57:58.762Z
Learning: The DownloadFileApi handler in server/api/knowledgeBase.ts properly uses the clId route parameter to scope and validate file access within the correct collection.
Applied to files:
server/api/knowledgeBase.ts
📚 Learning: 2025-09-16T08:57:58.762Z
Learnt from: rahul1841
PR: xynehq/xyne#843
File: server/server.ts:996-996
Timestamp: 2025-09-16T08:57:58.762Z
Learning: The DownloadFileApi handler in server/api/knowledgeBase.ts properly uses the clId route parameter to scope and validate file access within the correct collection, including fetching the collection, checking permissions, and verifying the file belongs to that collection.
Applied to files:
server/api/knowledgeBase.ts
📚 Learning: 2025-06-16T11:56:22.752Z
Learnt from: oindrila-b
PR: xynehq/xyne#545
File: server/integrations/google/index.ts:137-145
Timestamp: 2025-06-16T11:56:22.752Z
Learning: In server/integrations/google/index.ts, both Logger (base logger) and loggerWithChild (factory for email-scoped child loggers) are intentionally maintained to handle different logging scenarios - one for when email context is not available and one for when it is available.
Applied to files:
server/api/knowledgeBase.ts
🧬 Code graph analysis (1)
server/api/knowledgeBase.ts (1)
server/db/knowledgeBase.ts (3)
getCollectionItemsByParent
(142-164)generateStorageKey
(763-765)generateFileVespaDocId
(768-770)
🔇 Additional comments (3)
server/api/knowledgeBase.ts (3)
1619-1636
: Good defensive checks for Vespa response shape.The added null/type guards and index lookup reduce 404s and undefined access.
1685-1685
: Content-Disposition filename=UTF-8 encoding LGTM.*Safer for non-ASCII filenames in previews.
1184-1189
: Use user.workspaceId (not workspaceExternalId) for storage pathgetStoragePath expects a workspace ID; passing user.workspaceExternalId (unused elsewhere) will likely be undefined at runtime.
File: server/api/knowledgeBase.ts Lines: 1184-1189
- storagePath = getStoragePath( - user.workspaceExternalId, + storagePath = getStoragePath( + user.workspaceId, collectionId, storageKey, fileName, )
… vespa ingestion of the file fails (#905) * fix: XYN-288 unlink file from disk after vespa ijection fails * chore: XYN-288 resolved comments * Fix spacing in variable declaration for clarity * Fix assignment operator for successfullyUploadedFiles * Remove unused Google Cloud Storage import * Fix formatting of uploadedResult assignment --------- Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local> Co-authored-by: Sahil Kumar <119723019+SahilKumar000@users.noreply.github.com>
* fix: XYN-288 unlink file from disk after transaction of postgress and vespa ingestion of the file fails (#905) * fix: XYN-288 unlink file from disk after vespa ijection fails * chore: XYN-288 resolved comments * Fix spacing in variable declaration for clarity * Fix assignment operator for successfullyUploadedFiles * Remove unused Google Cloud Storage import * Fix formatting of uploadedResult assignment --------- Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local> Co-authored-by: Sahil Kumar <119723019+SahilKumar000@users.noreply.github.com> * fix: XYN-285 pdf view in citation (#881) resolved issue of viewing pdf file in citation Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local> --------- Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local> Co-authored-by: Sahil Kumar <119723019+SahilKumar000@users.noreply.github.com>
Description
unlink file from disk after transaction of postgress and vespa ingestion of the file fails
Testing
locally tested
Additional Notes
Summary by CodeRabbit
New Features
Bug Fixes