Feat/paddle #1019
Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Adds an OCR-based document chunking pipeline, a new ChunkMetadata type, persists chunk metadata maps (chunks_map, image_chunks_map) through file processing, queueing, APIs, and Vespa schema, and updates imports/logging and minor formatting across related files.
Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor Uploader
  participant FPS as FileProcessorService
  participant OCR as Layout/OCR API
  participant FS as Disk
  participant Q as FileProcessingQueue
  participant V as Vespa
  Uploader->>FPS: process(fileBuffer, fileName, docId)
  alt PDF / multi-page
    FPS->>OCR: POST /parse (base64 payload)
    OCR-->>FPS: OcrResponse (pages, blocks, images)
    FPS->>FPS: chunkByOCR(...) ⇒ chunks, image_chunks, chunks_map, image_chunks_map
    FPS->>FS: save extracted images (document dir)
  else Non-PDF or fallback
    FPS->>FPS: fallback chunking ⇒ chunks + default maps
  end
  FPS-->>Q: enqueue vespaDoc {chunks, image_chunks, chunks_map, image_chunks_map}
  Q->>V: insert/update `kb_items` (includes chunk maps)
  V-->>Q: ack
  Q-->>Uploader: processing complete
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Summary of Changes
Hello @Aayushjshah, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly upgrades the document processing pipeline by integrating a robust OCR-based chunking solution. The core change involves replacing the existing PDF extraction method with a new module that leverages an external layout parsing API to extract text and images from documents, along with rich metadata such as page numbers and block types. This enhancement aims to improve the accuracy and detail of document chunking, particularly for complex PDF layouts, and ensures that this granular information is available throughout the system for better knowledge base management and retrieval.
Code Review
This pull request introduces a new OCR-based file processing pipeline, primarily for PDFs, replacing the previous Gemini-based implementation. It adds a new `chunkByOCR` service and integrates it into the `FileProcessorService`. The data structures for chunking results have been expanded to include more detailed metadata (`chunks_map`, `image_chunks_map`), and these changes are propagated through the file processing and Vespa indexing logic. Several new test scripts have been added to validate the new OCR functionality. My review focuses on some critical issues in the new implementation and tests, including a syntax error, incorrect environment variable parsing, hardcoded local paths in tests that will break CI/CD, and broken test logic. There are also several opportunities to improve code quality by using a structured logger instead of `console.log` and improving type safety.
Actionable comments posted: 16
🧹 Nitpick comments (5)
server/services/fileProcessor.ts (1)
3-3: Remove commented-out import.
The commented import on line 3 appears to be dead code from the old PDF processing approach. Clean it up to reduce clutter.
```diff
-// import { extractTextAndImagesWithChunksFromPDF } from "@/pdf
```
server/scripts/testOCRFromProcessFile.ts (1)
8-8: Empty default PDF path may cause confusing errors.
If no PDF path is provided via CLI or environment, `DEFAULT_TEST_PDF` is an empty string, which will produce a cryptic "ENOENT" error when the script attempts to read it. Consider providing a clear error message:
```diff
-const DEFAULT_TEST_PDF = ""
+const DEFAULT_TEST_PDF = ""

 function resolvePdfPath(args: string[], envs: EnvMap): string {
   // Priority: CLI arg -> TEST_PDF_PATH -> PDF_PATH
   const cli = args[0]
   const fromEnv =
     envs["TEST_PDF_PATH"] ||
     envs["PDF_PATH"] ||
     process.env.TEST_PDF_PATH ||
     process.env.PDF_PATH
   const p = cli || fromEnv || DEFAULT_TEST_PDF
+  if (!p) {
+    throw new Error("No PDF path provided. Set TEST_PDF_PATH env var or pass as CLI argument.")
+  }
   return path.resolve(p)
 }
```

server/test-chunkByOCR-enhanced.ts (1)
223-224: Use proper typing instead of `@ts-ignore`.
The `@ts-ignore` directive suppresses type errors but reduces type safety. Consider using proper typing for the mock. Replace with a typed mock:

```diff
-  // @ts-ignore
-  global.fetch = async (url: string, options: any) => {
+  global.fetch = async (url: string | URL | Request, options?: RequestInit): Promise<Response> => {
     console.log("🌐 Mock API called:", { url, method: options?.method })
```

server/lib/chunkByOCR.ts (2)
246-253: Remove or uncomment dead code.
The table handling logic is commented out. If it's needed for future use, move it to a tracked TODO or issue. Otherwise, remove it to keep the codebase clean.
567-574: Clarify commented code and filename strategy.
The complex filename derivation logic is commented out (lines 567-573) in favor of a simple sequential pattern (line 574). Either remove the commented code or document why it's preserved for future use.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
server/api/knowledgeBase.ts (3 hunks)
server/docxChunks.ts (1 hunks)
server/lib/chunkByOCR.ts (1 hunks)
server/queue/fileProcessor.ts (3 hunks)
server/scripts/testOCRFromProcessFile.ts (1 hunks)
server/services/fileProcessor.ts (4 hunks)
server/test-chunkByOCR-enhanced.ts (1 hunks)
server/test-chunkByOCR.ts (1 hunks)
server/types.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
server/services/fileProcessor.ts (2)
  server/types.ts (1)
    ChunkMetadata (605-609)
  server/lib/chunkByOCR.ts (1)
    chunkByOCRFromBuffer (447-477)
server/lib/chunkByOCR.ts (3)
  server/logger/index.ts (2)
    getLogger (36-93)
    Subsystem (15-15)
  server/services/fileProcessor.ts (1)
    ProcessingResult (19-26)
  server/types.ts (1)
    ChunkMetadata (605-609)
server/test-chunkByOCR.ts (1)
  server/lib/chunkByOCR.ts (2)
    chunkByOCR (479-653)
    chunkByOCRFromBuffer (447-477)
server/test-chunkByOCR-enhanced.ts (1)
  server/lib/chunkByOCR.ts (2)
    chunkByOCR (479-653)
    chunkByOCRFromBuffer (447-477)
server/scripts/testOCRFromProcessFile.ts (1)
  server/lib/chunkByOCR.ts (1)
    chunkByOCRFromBuffer (447-477)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (20)
server/types.ts (1)
604-609: LGTM! Clean type definition for chunk metadata.
The `ChunkMetadata` type is well-structured and clearly documents the metadata associated with each chunk in the OCR-based processing pipeline. The fields are appropriately typed and self-explanatory.
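For reference, a shape consistent with how this type is used elsewhere in the review (the exact declaration in `server/types.ts` is not shown here, so anything beyond the `chunk_index`, `page_number`, and `block_labels` fields referenced in this thread is an assumption):

```typescript
// Hypothetical reconstruction of ChunkMetadata from the fields referenced
// throughout this review; the actual definition may differ.
export type ChunkMetadata = {
  chunk_index: number // position of the chunk within the document
  page_number: number // source page (0 for non-paginated formats)
  block_labels: string[] // OCR layout labels, e.g. ["text"] or ["image"]
}
```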
server/docxChunks.ts (1)
2627-2627: LGTM! Formatting improvement.
Minor style fix adding space before the opening parenthesis, aligning with standard JavaScript/TypeScript conventions.
server/queue/fileProcessor.ts (2)
220-221: LGTM! Chunk metadata propagation for files.
Correctly populates `chunks_map` and `image_chunks_map` from the processing result for file entities. This aligns with the new OCR-based processing workflow.
330-331: LGTM! Consistent metadata initialization for collections and folders.
Appropriately initializes `chunks_map` and `image_chunks_map` as empty arrays for collection and folder entities, which do not have content to chunk.
Also applies to: 451-452
server/api/knowledgeBase.ts (3)
47-47: LGTM! Import style change.
Switching from default to namespace import (`import * as crypto`) is a valid style choice with no functional impact.
755-759: LGTM! Cleaner iteration pattern.
The switch from `for-of` with `.entries()` to `.forEach()` is more idiomatic for this cleanup scenario and maintains the same functionality.
1579-1598: LGTM! Robust dual-format chunk index lookup.
The enhanced lookup logic correctly handles both legacy `number[]` and new `ChunkMetadata[]` formats:
- Type checking discriminates between formats
- Warning logged for unexpected object formats (good defensive programming)
- Maintains backward compatibility
- Existing 404 behavior preserved when no match is found
server/services/fileProcessor.ts (1)
164-175: LGTM! Sensible default metadata for non-PDF files.
The default chunk metadata generation for non-PDF processing paths is well-implemented (see the sketch after this list):
- Correct index mapping
- Reasonable defaults (page_number: 0, block_labels: ["text"]/["image"])
- Image chunks continue indexing after text chunks to maintain uniqueness
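A minimal sketch of that fallback mapping, using the `ChunkMetadata` shape above (the surrounding `chunks` and `image_chunks` variables are hypothetical stand-ins; the actual code in `server/services/fileProcessor.ts` may differ in detail):

```typescript
// Hypothetical sketch of the default-metadata fallback described above.
// Text chunks take indices 0..N-1; image chunks continue at N so that
// chunk_index stays unique across both maps.
const chunks_map: ChunkMetadata[] = chunks.map((_, i) => ({
  chunk_index: i,
  page_number: 0, // no page information outside the OCR path
  block_labels: ["text"],
}))
const image_chunks_map: ChunkMetadata[] = image_chunks.map((_, i) => ({
  chunk_index: chunks.length + i, // continue numbering after text chunks
  page_number: 0,
  block_labels: ["image"],
}))
```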
server/scripts/testOCRFromProcessFile.ts (1)
79-119: LGTM!
The result extraction and logging correctly uses the `ProcessingResult` fields (`chunks`, `chunks_map`, `image_chunks`, `image_chunks_map`) and provides comprehensive output for debugging.
server/test-chunkByOCR-enhanced.ts (1)
258-424: LGTM!
The test function is well-structured with proper setup/teardown, comprehensive logging, and correct usage of the `ProcessingResult` fields. The try/finally ensures mock restoration even on failure.
server/lib/chunkByOCR.ts (10)
125-149: LGTM!
The magic byte detection for common image formats is correctly implemented with a sensible fallback.
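For context, magic-byte sniffing of this kind typically looks like the following (a sketch under assumed format coverage, not the PR's exact implementation; the magic numbers themselves are standard):

```typescript
// Illustrative magic-byte detection; the PR's detectImageExtension may
// support a different set of formats.
function detectImageExtension(buf: Buffer): string {
  if (buf.length >= 4 && buf[0] === 0x89 && buf[1] === 0x50 && buf[2] === 0x4e && buf[3] === 0x47) {
    return "png" // \x89PNG
  }
  if (buf.length >= 3 && buf[0] === 0xff && buf[1] === 0xd8 && buf[2] === 0xff) {
    return "jpg" // JPEG SOI marker
  }
  if (buf.length >= 4 && buf.subarray(0, 4).toString("ascii") === "GIF8") {
    return "gif" // GIF87a / GIF89a
  }
  return "png" // sensible fallback, as the review notes
}
```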
151-173: LGTM!
File name sanitization and uniqueness logic is correctly implemented.
175-200: LGTM!
Bounding box normalization and parsing logic includes appropriate validation and error handling.
202-222: LGTM!
Image lookup construction correctly deduplicates by bounding box key.
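As an illustration of deduplication by bounding-box key (hypothetical shapes and a first-wins policy; the actual `normalizeBBox` and lookup construction in `chunkByOCR.ts` may differ):

```typescript
// Hypothetical sketch: key images by a normalized bbox string so that
// duplicate detections of the same region collapse to one entry.
function normalizeBBox(bbox: number[]): string {
  return bbox.map((v) => Math.round(v)).join(",") // e.g. "12,40,380,220"
}

function buildImageLookup(
  images: Array<{ bbox: number[]; data: Buffer }>,
): Map<string, Buffer> {
  const lookup = new Map<string, Buffer>()
  for (const img of images) {
    const key = normalizeBBox(img.bbox)
    if (!lookup.has(key)) lookup.set(key, img.data) // first occurrence wins
  }
  return lookup
}
```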
224-238: LGTM!
Block content transformation applies appropriate markdown formatting based on label type.
262-291: LGTM!
Image filename derivation handles multiple scenarios with appropriate fallbacks.
293-372: LGTM!
The Layout Parsing API call is well-structured with proper timeout handling, error messages, and cleanup. The Triton request format is correctly implemented.
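For readers unfamiliar with the pattern being praised, a generic fetch-with-timeout sketch (placeholder endpoint and payload; not the PR's actual Triton request format):

```typescript
// Generic AbortController-based timeout around fetch; url and body are
// placeholders, not the PR's real Layout Parsing API contract.
async function postWithTimeout(
  url: string,
  body: unknown,
  timeoutMs: number,
): Promise<Response> {
  const controller = new AbortController()
  const timer = setTimeout(() => controller.abort(), timeoutMs)
  try {
    return await fetch(url, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
      signal: controller.signal, // aborts the request when the timer fires
    })
  } finally {
    clearTimeout(timer) // cleanup on success and failure alike
  }
}
```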
498-500: LGTM!
Directory creation with `recursive: true` ensures the path exists before image writes.
504-531: LGTM!
The `addChunk` closure correctly handles splitting oversized text buffers into multiple chunks with continuation markers, assuming individual chunks don't exceed the limit.
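To make the described behavior concrete, a sketch of such a closure (the continuation-marker text and helper names are hypothetical; `splitText` is the helper quoted in full later in this review):

```typescript
// Hypothetical sketch of the splitting behavior described above.
const getByteLength = (s: string): number => Buffer.byteLength(s, "utf8")

function makeAddChunk(maxBytes: number, chunks: string[]) {
  return function addChunk(buffer: string): void {
    if (getByteLength(buffer) <= maxBytes) {
      chunks.push(buffer)
      return
    }
    // Oversized buffer: split it (e.g. with the splitText helper quoted
    // below) and mark continuations so readers can tell the pieces
    // belong to one logical block.
    splitText(buffer, maxBytes).forEach((piece, i) => {
      chunks.push(i === 0 ? piece : `(cont.) ${piece}`)
    })
  }
}
```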
543-606: LGTM!
Image block processing correctly handles buffer lookup, metadata association, file saving with error handling, and chunk map creation.
Actionable comments posted: 1
♻️ Duplicate comments (1)
server/api/knowledgeBase.ts (1)
1593-1610: Type the callback parameter explicitly for better clarity.
The chunk position handling correctly supports both legacy `number[]` and new `ChunkMetadata[]` formats. However, the `pos` parameter in the `findIndex` callback relies on type inference. For better type safety and clarity, consider explicitly typing it as `number | ChunkMetadata`. Apply this diff for explicit typing:

```diff
-  const index = resp.fields.chunks_pos.findIndex((pos: number | ChunkMetadata) => {
+  const index = resp.fields.chunks_pos.findIndex((pos: number | ChunkMetadata): boolean => {
     // If it's a number (legacy format), compare directly
     if (typeof pos === "number") {
       return pos === chunkIndex
     }
     // If it's a ChunkMetadata object, compare the index field
     if (typeof pos === "object") {
       if (pos.chunk_index !== undefined) {
         return pos.chunk_index === chunkIndex
       } else {
         loggerWithChild({ email: userEmail }).warn(
           `Unexpected chunk position object format: ${JSON.stringify(pos)}`,
         )
       }
     }
     return false
   })
```

Note: The callback logic is correct and the warning for unexpected formats is a good defensive practice.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/api/knowledgeBase.ts (9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/knowledgeBase.ts (2)
server/queue/api-server-queue.ts (2)
  boss (9-12)
  FileProcessingQueue (14-14)
server/types.ts (1)
  ChunkMetadata (605-609)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (3)
server/api/knowledgeBase.ts (3)
193-203: LGTM! Improved readability.
The reformatting of `boss.send` calls to multi-line style improves readability without changing semantics.
Also applies to: 686-696, 836-846, 1213-1220
1208-1208: LGTM! Clearer status message.
The updated status message clearly indicates that the file has been uploaded and is awaiting processing.
1251-1254: LGTM! Better error logging.
The reformatted error logging follows structured logging best practices by passing the error object as the first parameter.
Actionable comments posted: 1
♻️ Duplicate comments (1)
server/services/fileProcessor.ts (1)
24-25: Verify all consumers handle the new required fields.
The `ProcessingResult` interface now includes `chunks_map` and `image_chunks_map` as required fields. Ensure all consumers (particularly in `server/queue/fileProcessor.ts` and Vespa integration) correctly handle these new fields.
🧹 Nitpick comments (2)
server/services/fileProcessor.ts (2)
29-29: Logger instance defined but not used in this file.
The logger is instantiated but never referenced within `FileProcessorService`. The past review flagged `console.log` statements in the PDF processing path (lines 48-53), but those logs are not visible in the current code. If logging was removed or moved to `chunkByOCRFromBuffer`, consider removing this unused field. Otherwise, use it for error logging at line 144.
152-163: Clarify metadata creation scope.
The comment states "For non-PDF files" but this code executes for ALL files, including PDFs that encounter errors in the OCR path (lines 143-150). If PDFs should always produce metadata via OCR, consider moving this logic inside conditional branches for specific file types, or update the comment to reflect that it serves as a fallback for any processing failure.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/services/fileProcessor.ts (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/services/fileProcessor.ts (2)
server/lib/chunkByOCR.ts (2)
  ProcessingResult (14-21)
  chunkByOCRFromBuffer (447-477)
server/types.ts (1)
  ChunkMetadata (605-609)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (4)
server/services/fileProcessor.ts (4)
7-8: LGTM!
The new imports are correctly structured and align with the OCR processing enhancements.
17-17: LGTM!
The logger import enables structured logging as recommended in past reviews.
47-53: LGTM! PDF processing correctly uses OCR.
The PDF processing path now directly returns the result from `chunkByOCRFromBuffer`, which already includes `chunks_map` and `image_chunks_map` (as shown in the relevant code snippet from `server/lib/chunkByOCR.ts`). This correctly bypasses the fallback metadata creation at lines 152-163.
165-172: LGTM! Return statement now includes all required fields.
The return statement correctly includes `chunks_map` and `image_chunks_map`, addressing the issue flagged in the previous review.
Actionable comments posted: 1
♻️ Duplicate comments (4)
server/lib/chunkByOCR.ts (4)
713-721: Remove commented-out code or use the more descriptive naming.
The `deriveImageFileName` function is commented out in favor of a simple sequential numbering scheme. If the commented code is not needed, remove it to improve clarity. If the more descriptive naming is valuable, uncomment and use it. Apply this diff if removing:

```diff
- // const fileName = deriveImageFileName(
- //   metadata.fileName,
- //   metadata.bboxKey ?? normalizeBBox(block.block_bbox),
- //   imageBuffer,
- //   block.image_index,
- //   metadata.pageIndex ?? pageNumber,
- // )
   const extension = detectImageExtension(imageBuffer)
   const fileName = String(globalSeq.value) + "." + extension
```

Based on past review comments.
16-23: Remove duplicate type definition and import from fileProcessor.
The `ProcessingResult` interface is already defined in `server/services/fileProcessor.ts` (as shown in relevant code snippets). This duplication violates DRY and creates a maintenance burden: changes must be synchronized across files. Apply this diff:

```diff
-export interface ProcessingResult {
-  chunks: string[]
-  chunks_pos: number[]
-  image_chunks: string[]
-  image_chunks_pos: number[]
-  chunks_map: ChunkMetadata[]
-  image_chunks_map: ChunkMetadata[]
-}
+import type { ProcessingResult } from "../services/fileProcessor"
```

Based on past review comments.
99-125: Handle edge case where single sentence exceeds maxBytes.
If a single sentence is larger than `maxBytes`, the current logic adds it to `chunks` as-is (line 112), potentially violating the size constraint. This can cause downstream issues with storage or processing systems that enforce strict size limits. Consider splitting oversized sentences at word or character boundaries to ensure no chunk exceeds `maxBytes`. For example:

```typescript
function splitText(text: string, maxBytes: number): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+|\S+/g) ?? []
  const chunks: string[] = []
  let currentChunk: string[] = []
  let currentBytes = 0
  for (const sentence of sentences) {
    const sentenceBytes = getByteLength(sentence) + 1
    // If single sentence exceeds limit, split it further
    if (sentenceBytes > maxBytes) {
      if (currentChunk.length > 0) {
        chunks.push(currentChunk.join(" "))
        currentChunk = []
        currentBytes = 0
      }
      // Split oversized sentence by words
      const words = sentence.split(/\s+/)
      let wordBuffer = ""
      for (const word of words) {
        const wordBytes = getByteLength(word) + 1
        if (getByteLength(wordBuffer) + wordBytes > maxBytes) {
          if (wordBuffer) chunks.push(wordBuffer.trim())
          wordBuffer = word
        } else {
          wordBuffer += (wordBuffer ? " " : "") + word
        }
      }
      if (wordBuffer) chunks.push(wordBuffer.trim())
      continue
    }
    if (currentBytes + sentenceBytes > maxBytes) {
      if (currentChunk.length > 0) {
        chunks.push(currentChunk.join(" "))
      }
      currentChunk = [sentence]
      currentBytes = sentenceBytes
    } else {
      currentChunk.push(sentence)
      currentBytes += sentenceBytes
    }
  }
  if (currentChunk.length > 0) {
    chunks.push(currentChunk.join(" "))
  }
  return chunks
}
```

Based on past review comments.
631-638: Fix parseInt radix argument.
The second argument to `parseInt` must be the radix (2-36), but `1024` is passed. This causes `parseInt` to return `NaN`, and the fallback logic on lines 635-638 accidentally works around it. However, this is incorrect and fragile. Apply this diff:

```diff
 const maxChunkBytes = Number.parseInt(
   process.env.OCR_MAX_CHUNK_BYTES ?? "",
-  1024,
+  10,
 )
```

Based on past review comments.
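To see why the fallback only "accidentally" works, note how `parseInt` treats an out-of-range radix:

```typescript
// Any radix outside 2-36 (other than 0/undefined) makes parseInt return NaN,
// so with radix 1024 the env override was silently never applied.
Number.parseInt("4096", 10) // 4096
Number.parseInt("4096", 1024) // NaN
```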
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/lib/chunkByOCR.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/lib/chunkByOCR.ts (3)
server/logger/index.ts (2)
  getLogger (36-93)
  Subsystem (15-15)
server/services/fileProcessor.ts (1)
  ProcessingResult (19-26)
server/types.ts (1)
  ChunkMetadata (605-609)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/api/knowledgeBase.ts (9 hunks)
server/lib/chunkByOCR.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/lib/chunkByOCR.ts
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/knowledgeBase.ts (2)
server/queue/api-server-queue.ts (2)
  boss (9-12)
  FileProcessingQueue (14-14)
server/types.ts (1)
  ChunkMetadata (605-609)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (6)
server/api/knowledgeBase.ts (6)
109-126: LGTM - Return type annotation added.
The explicit `: string` return type annotation improves type safety and code clarity.
192-202: LGTM - Queue send calls reformatted for readability.
The multi-line format for `boss.send()` calls improves readability without changing semantics. All calls use consistent retry/expiry configuration.
Also applies to: 685-695, 835-845, 1212-1219
760-767: LGTM - Session cleanup formatting.
The session cleanup logic correctly removes sessions older than 1 hour. The formatting changes maintain the same behavior.
1207-1207: LGTM - Status message added for file upload.
Adding the status message "File uploaded successfully, queued for processing" provides clear feedback about the file's processing state and aligns with the `UploadStatus.PENDING` status set later.
1250-1253: LGTM - Error logging reformatted.
Multi-line format improves readability without changing behavior.
47-47: LGTM - Explicit crypto namespace import.
The explicit namespace import `import * as crypto from "crypto"` is the recommended pattern for Node.js built-in modules.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/api/files.ts (1)
267-292: Preserve the metadata returned by `FileProcessorService`.
The OCR pipeline already supplies `chunks_map` and `image_chunks_map`. Rebuilding placeholder maps here wipes page numbers and block labels, so Vespa never sees the enriched metadata. Destructure the maps from `processingResult` and pass them through unchanged (the service already backfills sensible defaults for non-OCR formats).

```diff
-  const { chunks, chunks_pos, image_chunks, image_chunks_pos } =
-    processingResult
+  const {
+    chunks,
+    chunks_pos,
+    image_chunks,
+    image_chunks_pos,
+    chunks_map,
+    image_chunks_map,
+  } = processingResult
 ...
-  chunks_map: chunks.map((_, index) => ({
-    chunk_index: index,
-    page_number: 0,
-    block_labels: [],
-  })),
-  image_chunks_map: image_chunks.map((_, index) => ({
-    chunk_index: index,
-    page_number: 0,
-    block_labels: [],
-  })),
+  chunks_map,
+  image_chunks_map,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/api/files.ts (3 hunks)
server/services/fileProcessor.ts (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
server/api/files.ts (1)
server/integrations/dataSource/errors.ts (1)
  isDataSourceError (137-139)
server/services/fileProcessor.ts (3)
server/logger/index.ts (2)
  getLogger (36-93)
  Subsystem (15-15)
server/types.ts (1)
  ChunkMetadata (605-609)
server/lib/chunkByOCR.ts (1)
  chunkByOCRFromBuffer (536-609)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
Description
Testing
Additional Notes
Summary by CodeRabbit
New Features
Improvements
Refactor / Chores