
Conversation

Aayushjshah
@Aayushjshah Aayushjshah commented Sep 18, 2025

Description

Testing

Additional Notes

Summary by CodeRabbit

  • New Features

    • Enhanced PDF processing powered by AI for more accurate, structured text chunking.
    • Supports large PDFs by automatically splitting them into processable parts.
  • Bug Fixes

    • Clearer, user-friendly error messages surfaced during file uploads from connected data sources.
    • Explicit feedback when an individual PDF page exceeds the allowed size limit, with guidance to adjust the file.

Aayushjshah and others added 27 commits September 12, 2025 13:07
* Added support for viewing xlsx, xls, and csv files, and title generation based on the user's first query and the assistant's response

* added untitled as default chat title

* fixed generateTitle function

* removed the title generation changes

---------

Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local>
…868)

* added back button in citation preview

* comments resolved
* temporarily disabled chunk citations for pdfs

* resolved comments
* fix:title-generation

* made the suggested changes

* made the suggested changes

* made the suggested changes

---------

Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local>
* fix:app exclusion (#855)

* using sync job existence check, modify all sources

* resolved issue

* resolved comments

* add routeTree and format code

* fix:conditional syncjob check  (#858)

* conditional syncjob check based on local or production

* resolved comments
…dential requirement (#876) (#878)

- Add new vespa‑deploy service built from Dockerfile that copies server/vespa code and runs deploy‑docker.sh
- Mount server/vespa directory into existing Vespa containers
- Update compose commands to always use `docker‑compose … up … --build` so images are rebuilt before start
- Relax Microsoft integration validation: missing MICROSOFT_CLIENT_ID/SECRET now logs a warning and skips sync jobs instead of failing
)

- Bump Vespa base image to 8.514.24
- Split DNF install into separate steps and install CUDA libraries
- Pin onnxruntime-cu118 to 1.20.1
- Mount Prometheus config as a template and inject METRICS_PORT via envsubst (default 3001)
- Expose METRICS_PORT as an environment variable to the container
- Update deployment script to accept optional VESPA_CLI_PATH and remove hard‑coded host/port
- Add validation override for schema‑removal
@coderabbitai

coderabbitai bot commented Sep 18, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Replaces PDF chunk extraction across integrations and services with a new Gemini-based module. Adds the Gemini chunker implementation, a prompt, a CLI test script, and a pdf-lib dependency. Updates API handlers to surface DataSource-specific errors. Removes an old test script and introduces a dedicated PdfPageTooLargeError.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| PDF extraction callsites: server/integrations/dataSource/index.ts, server/integrations/google/worker-utils.ts, server/integrations/microsoft/attachment-utils.ts, server/services/fileProcessor.ts | Swap extractor import/usage to extractTextAndImagesWithChunksFromPDFviaGemini(...) with reduced params; downstream handling unchanged. |
| Gemini chunker implementation: server/lib/chunkPdfWithGemini.ts, server/ai/prompts.ts, server/package.json | Add Gemini-based PDF chunking module, exports, and prompt; introduce pdf-lib dependency for PDF splitting. |
| API error handling: server/api/dataSource.ts, server/api/files.ts, server/integrations/dataSource/errors.ts | Add PdfPageTooLargeError; propagate DataSource errors and user messages through API layers. |
| Scripts: server/scripts/testGeminiFromProcessFile.ts, server/scripts/testPdfDirect.ts | Add Bun script to test Gemini chunking; remove legacy direct PDF test script. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant API as API (files/dataSource)
  participant FPS as FileProcessorService
  participant GC as Gemini Chunker (chunkPdfWithGemini)
  participant VA as Vertex AI Gemini
  U->>API: Upload PDF
  API->>FPS: processFile(buffer, meta)
  FPS->>GC: extractTextAndImagesWithChunksFromPDFviaGemini(data, docId)
  alt Small PDF (<= INLINE_MAX_BYTES)
    GC->>VA: GenerateContent (inline base64 PDF + CHUNKING_PROMPT)
    VA-->>GC: <chunk>…</chunk> blocks
  else Large PDF
    GC->>GC: Split PDF into inline-sized sub-PDFs
    loop For each part
      GC->>VA: GenerateContent (part inline)
      VA-->>GC: <chunk>…</chunk> blocks
    end
  end
  GC-->>FPS: { text_chunks, text_chunk_pos, image_chunks:[], image_chunk_pos:[] }
  FPS-->>API: Normalized processing result
  API-->>U: Response
  note over GC,VA: Image chunks always empty in this path
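
The large-PDF branch of the diagram above can be pictured as a greedy packing step. The sketch below is only an illustration under stated assumptions, not the code in server/lib/chunkPdfWithGemini.ts: `planPdfParts`, its `pageSizes` input, and the fields on this `PdfPageTooLargeError` are hypothetical, and a real implementation would likely measure the bytes of each saved sub-PDF (for example via pdf-lib) instead of trusting per-page estimates.

```typescript
// Hypothetical error shape; the PR's real PdfPageTooLargeError may differ.
class PdfPageTooLargeError extends Error {
  readonly pageNumber: number
  readonly sizeBytes: number
  constructor(pageNumber: number, sizeBytes: number, limitBytes: number) {
    super(`Page ${pageNumber} is ${sizeBytes} bytes; limit is ${limitBytes} bytes`)
    this.name = "PdfPageTooLargeError"
    this.pageNumber = pageNumber
    this.sizeBytes = sizeBytes
  }
}

// Greedily packs consecutive pages into parts whose estimated total size
// stays within maxBytes. pageSizes[i] is the estimated size of page i + 1
// (an assumption for this sketch). Returns 1-based page numbers per part.
function planPdfParts(pageSizes: number[], maxBytes: number): number[][] {
  const parts: number[][] = []
  let current: number[] = []
  let currentBytes = 0

  pageSizes.forEach((size, index) => {
    const pageNumber = index + 1
    // A single page that cannot fit inline can never be sent, so fail early.
    if (size > maxBytes) {
      throw new PdfPageTooLargeError(pageNumber, size, maxBytes)
    }
    // Close the current part when adding this page would exceed the limit.
    if (current.length > 0 && currentBytes + size > maxBytes) {
      parts.push(current)
      current = []
      currentBytes = 0
    }
    current.push(pageNumber)
    currentBytes += size
  })

  if (current.length > 0) parts.push(current)
  return parts
}
```

Each returned part would then correspond to one "GenerateContent (part inline)" call in the loop, with the per-part chunks concatenated in page order.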
sequenceDiagram
  autonumber
  actor U as User
  participant API as API (files/dataSource)
  participant DS as DataSource Layer
  U->>API: Upload
  API->>DS: handle upload
  DS-->>API: throws DataSourceError (e.g., PdfPageTooLargeError)
  alt isDataSourceError(error)
    API-->>U: HTTP error with error.userMessage
  else Other errors
    API-->>U: Existing error handling/message
  end
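
The error path in the second diagram can be sketched roughly as follows. This is a hedged illustration rather than the PR's actual code: the class fields, the structural `isDataSourceError` check, the `toErrorResponse` helper, and the 400/500 status codes are all assumptions chosen for the example; only the names `PdfPageTooLargeError`, `isDataSourceError`, and `userMessage` come from the walkthrough.

```typescript
// Base class for errors that carry a message safe to show to end users.
class DataSourceError extends Error {
  readonly userMessage: string
  constructor(message: string, userMessage: string) {
    super(message)
    this.name = "DataSourceError"
    this.userMessage = userMessage
  }
}

// Raised when one page of a PDF exceeds the inline size limit.
class PdfPageTooLargeError extends DataSourceError {
  constructor(pageNumber: number, limitMb: number) {
    super(
      `PDF page ${pageNumber} exceeds ${limitMb} MB`,
      `PDF page ${pageNumber} is larger than ${limitMb} MB. Please reduce its size and upload again.`,
    )
    this.name = "PdfPageTooLargeError"
  }
}

// Structural type guard: treats any object with a string userMessage as a
// DataSource error (an assumption; the real guard may use instanceof).
function isDataSourceError(error: unknown): error is DataSourceError {
  return (
    typeof error === "object" &&
    error !== null &&
    typeof (error as { userMessage?: unknown }).userMessage === "string"
  )
}

interface ErrorResponse {
  status: number
  message: string
}

// Maps a thrown value to an HTTP-style response, surfacing userMessage only
// for DataSource errors and a generic message for everything else.
function toErrorResponse(error: unknown): ErrorResponse {
  if (isDataSourceError(error)) {
    return { status: 400, message: error.userMessage }
  }
  return { status: 500, message: "Upload failed. Please try again." }
}
```

Keeping the internal `message` separate from `userMessage` lets logs stay detailed while the API response stays user-friendly.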

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • zereraz
  • shivamashtikar
  • kalpadhwaryu
  • devesh-juspay

Poem

Hop hop, I split the scroll with care,
Into tidy chunks from bytes of air.
Gemini hums, the pages sing,
No image crumbs—just text we bring.
If a page’s too plump, I gently squeak:
“Trim that megabyte cheek!” 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title Check | ❓ Inconclusive | The PR title "Feat/pdf layout" is short and indicates a PDF-related change but is vague and does not clearly summarize the primary work in the changeset, which centers on integrating a Gemini-based PDF chunker (new server/lib/chunkPdfWithGemini.ts), updating callers to the new extractor, adding chunking prompts and test scripts, and adding pdf-lib as a dependency. Because the title does not convey the main technical change or intent, it is not sufficiently descriptive for a teammate scanning history to understand the primary impact. Therefore the title check is inconclusive. | Please rename the PR to explicitly state the primary change and use a conventional prefix like feat(pdf):; for example "feat(pdf): integrate Gemini-based PDF chunker and chunking prompt" or "feat(pdf/chunking): add Gemini Flash extractor and splitting logic". Keep the title concise and focused on the main technical change rather than the ambiguous term "layout". |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/pdfLayout

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @Aayushjshah, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant upgrade to the system's PDF content processing capabilities by integrating Google Gemini Flash. The primary goal is to improve the quality and semantic coherence of extracted text and images from PDF documents, making them more suitable for advanced applications like Retrieval-Augmented Generation (RAG). This change streamlines the content ingestion pipeline, ensuring more accurate and structured data extraction from diverse PDF layouts.

Highlights

  • Gemini Integration for PDF Processing: The core change is the integration of Google Gemini Flash via Vertex AI for advanced PDF text and image extraction and semantic chunking.
  • Enhanced PDF Chunking Logic: A new 'chunkPdfWithGemini.ts' module defines detailed rules for OCR, Markdown conversion, HTML table extraction, image descriptions, and semantic chunking (targeting 250-512 words, max 1024 bytes per chunk).
  • Replacement of Existing PDF Processing: The new Gemini-based chunking ('extractTextAndImagesWithChunksFromPDFviaGemini') replaces the previous PDF processing logic in 'server/integrations/dataSource', 'server/integrations/google', 'server/integrations/microsoft', and 'server/services/fileProcessor'.
  • New PDF Parsing Utilities & Test Scripts: Several new Python scripts ('basic_paddleocr_pdf_to_md.py', 'doclingTemp.py', 'docling_pdf_to_md.py', 'paddleocr_pdf_to_md.py', 'pp_structurev3_parse_small2.py') and associated Markdown/JSON output files ('scratch/') have been added, indicating exploration and testing of various PDF parsing and layout extraction methods. A Bun script ('testGeminiFromProcessFile.ts') was also added to test the new Gemini integration.
  • Documentation Templates: A suite of new Markdown templates for project documentation (design, product, requirements, structure, tasks, tech) has been added under 'server/.spec-workflow/templates', along with a README for user-defined templates.
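
To make the chunking highlight more concrete: the walkthrough says the model replies with `<chunk>…</chunk>` blocks that become `text_chunks` and `text_chunk_pos`. The sketch below shows one plausible way to parse such a reply. It is an assumption-laden illustration, not the module's code: `parseChunkBlocks` and `startPos` are hypothetical names, and treating `text_chunk_pos` as a running chunk index is a guess about its meaning.

```typescript
interface ParsedChunks {
  text_chunks: string[]
  text_chunk_pos: number[]
}

// Extracts the trimmed body of every <chunk>...</chunk> block, ignoring any
// text outside the tags and skipping blocks that are empty after trimming.
// startPos lets a caller continue numbering across parts of a split PDF.
function parseChunkBlocks(modelOutput: string, startPos: number = 0): ParsedChunks {
  const pattern = /<chunk>([\s\S]*?)<\/chunk>/g
  const text_chunks: string[] = []
  const text_chunk_pos: number[] = []

  let match: RegExpExecArray | null = pattern.exec(modelOutput)
  while (match !== null) {
    const text = match[1].trim()
    if (text.length > 0) {
      // Position is the index of this chunk in the overall sequence.
      text_chunk_pos.push(startPos + text_chunks.length)
      text_chunks.push(text)
    }
    match = pattern.exec(modelOutput)
  }

  return { text_chunks: text_chunks, text_chunk_pos: text_chunk_pos }
}
```

For a split PDF, a caller could pass the running chunk count as `startPos` for each part so positions stay unique across the whole document.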

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a new PDF processing pipeline using Google's Gemini model, replacing the previous implementation. The core change is in server/lib/chunkPdfWithGemini.ts and its integration into the data source and email attachment processing flows. A number of experimental scripts for PDF processing with other tools like PaddleOCR and Docling have also been added, along with new specification templates.

My review focuses on the new Gemini implementation and the overall health of the codebase. I've found a few critical issues that need to be addressed before merging, such as a disabled pre-commit hook and hardcoded paths in test scripts. There are also opportunities to improve the robustness and clarity of the new PDF processing code. Please see my detailed comments below.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
.husky/pre-commit (1)

1-26: Re-enable pre-commit checks with opt‑out (prevent formatting regressions)

.husky/pre-commit is fully commented out; server/package.json already has a format script (bunx biome format --write ../) and @biomejs/biome + husky are present — restore the hook, add a DISABLE_PRECOMMIT opt‑out, and guard for missing bun/bunx (exit 0) so commits aren’t blocked in non‑bun environments.
Files: .husky/pre-commit (commented), server/package.json (format script, @biomejs/biome, husky dep).

outFull.md (1)

2487-2492: Remove or quarantine third‑party confidential material immediately

This file ends with “CONFIDENTIAL: This material is property of JUSPAY…”. Keeping this in a public repo risks legal/compliance issues and brand misuse. It also includes third‑party emails/phone numbers.

Action:

  • Delete the file or move to a private/internal location with explicit permission and attribution.
  • If retained, replace brand content with anonymized placeholders and add licensing/consent documentation.
-<entire file content>
+<removed: confidential third-party marketing collateral pending legal clearance>
server/pdfChunks.ts (1)

565-575: Replace console.log with structured Logger and gate verbose logs

Console logging in server code is noisy and bypasses log routing. Use Logger at debug level and guard summaries behind a flag.

-              console.log("Image operator detected", {
+              Logger.debug("Image operator detected", {
@@
-              console.log("Processing image", { imageName })
+              Logger.debug("Processing image", { imageName })
@@
-              console.log("Image operator details", {
+              Logger.debug("Image operator details", {
@@
-              console.log("Resolved imageDict", {
+              Logger.debug("Resolved imageDict", {
@@
-                  console.log("Full image details", {
+                  Logger.debug("Full image details", {
@@
-                    console.log("Skipped image with invalid dimensions", {
+                    Logger.debug("Skipped image with invalid dimensions", {
@@
-                    console.log("Skipped large image", {
+                    Logger.debug("Skipped large image", {
@@
-                    console.log("Skipped small image", {
+                    Logger.debug("Skipped small image", {
@@
-                  console.log(
+                  Logger.debug(
                     "Image passed all filters, proceeding with processing",
@@
-                              console.log(
+                              Logger.debug(
                                 `Skipping image with poor description: ${imageName} on page ${pageNum}`,
                               )
@@
-                              console.log(
+                              Logger.debug(
                                 `Skipping image with poor description: ${imageName} on page ${pageNum}`,
                               )
@@
-        if (imageOperatorIds.size > 0) {
-          console.log(`\n=== IMAGE OPERATORS ON PAGE ${pageNum} ===`)
-          console.log(`Found ${imageOperatorIds.size} unique image operator type(s):`)
+        if (process.env.DEBUG_IMAGE_OPS === "1" && imageOperatorIds.size > 0) {
+          Logger.debug(`IMAGE OPERATORS ON PAGE ${pageNum}`)
+          Logger.debug(`Found ${imageOperatorIds.size} unique image operator type(s)`)
@@
-            console.log(`  - fnId: ${fnId} (${operatorName})`)
+            Logger.debug(`fnId: ${fnId} (${operatorName})`)
@@
-          console.log(`Total images processed on page ${pageNum}: ${imagesOnPage}`)
+          Logger.debug(`Total images processed on page ${pageNum}: ${imagesOnPage}`)
-        } else {
-          console.log(`\n=== PAGE ${pageNum} ===`)
-          console.log('No image operators found on this page')
+        } else if (process.env.DEBUG_IMAGE_OPS === "1") {
+          Logger.debug(`PAGE ${pageNum}: No image operators found`)
@@
-    console.log(`\n=== COMPLETE DOCUMENT IMAGE OPERATORS SUMMARY ===`)
+    if (process.env.DEBUG_IMAGE_OPS === "1") Logger.debug("COMPLETE DOCUMENT IMAGE OPERATORS SUMMARY")
-    if (documentImageOperatorIds.size > 0) {
-      console.log(`Found ${documentImageOperatorIds.size} unique image operator type(s) across entire document:`)
+    if (process.env.DEBUG_IMAGE_OPS === "1" && documentImageOperatorIds.size > 0) {
+      Logger.debug(`Found ${documentImageOperatorIds.size} unique image operator type(s) across entire document`)
@@
-        console.log(`  - fnId: ${fnId} (${operatorName})`)
+        Logger.debug(`fnId: ${fnId} (${operatorName})`)
@@
-      console.log(`Total unique image operators: ${documentImageOperatorIds.size}`)
+      Logger.debug(`Total unique image operators: ${documentImageOperatorIds.size}`)
-    } else {
-      console.log('No image operators found in the entire document')
+    } else if (process.env.DEBUG_IMAGE_OPS === "1") {
+      Logger.debug("No image operators found in the entire document")
-    }
-    console.log(`=== END DOCUMENT SUMMARY ===\n`)
+    }
+    if (process.env.DEBUG_IMAGE_OPS === "1") Logger.debug("END DOCUMENT SUMMARY")
@@
-    console.log("Calling PDF document destroy")
+    Logger.debug("Destroying PDF document")

Also applies to: 582-590, 625-631, 655-675, 676-705, 708-714, 775-779, 884-887, 1245-1247, 1312-1327, 1349-1363, 1390-1392
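
The suggestion above repeats the `DEBUG_IMAGE_OPS` check at many call sites. One way to centralize the gate is a small wrapper, sketched below under assumptions: `createGatedDebug` and `LogSink` are hypothetical names, and the exact signature of the project's `Logger.debug` is not shown in this conversation, so the wiring in the trailing comment is illustrative only.

```typescript
// A log sink takes a message and optional structured fields.
type LogSink = (message: string, fields?: Record<string, unknown>) => void

// Returns a logger that forwards to sink only when enabled is true.
// When disabled it returns a no-op, so call sites need no flag checks.
function createGatedDebug(enabled: boolean, sink: LogSink): LogSink {
  if (!enabled) {
    return () => {}
  }
  return (message, fields) => sink(message, fields)
}

// Possible wiring in server code (assumes a Logger with a debug method):
//   const debugImageOps = createGatedDebug(
//     process.env.DEBUG_IMAGE_OPS === "1",
//     (message, fields) => Logger.debug(message, fields),
//   )
//   debugImageOps("Image operator detected", { pageNum })
```

Evaluating the flag once at module load keeps the per-image hot path free of environment lookups, at the cost of requiring a restart to toggle verbose output.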

server/integrations/dataSource/index.ts (1)

209-227: Functional regression: image‑only/scanned PDFs will fail ingestion.

You pass extractImages=true, but Gemini returns image_chunks=[], so PDFs with no extractable text will now throw “No chunks generated...”. Add a guarded legacy fallback (or OCR) when Gemini returns no text.

Apply this diff:

-    const { text_chunks, image_chunks, text_chunk_pos, image_chunk_pos } =
-      await extractTextAndImagesWithChunksFromPDFviaGemini(pdfBuffer, docId, true)
+    let result =
+      await extractTextAndImagesWithChunksFromPDFviaGemini(pdfBuffer, docId, true)
+
+    // Optional: fallback to legacy extractor when Gemini yields no text
+    if (
+      result.text_chunks.length === 0 &&
+      process.env.USE_PDF_LEGACY_FALLBACK === "1"
+    ) {
+      try {
+        const legacy =
+          await extractTextAndImagesWithChunksFromPDFLegacy(pdfBuffer, docId, true)
+        result = legacy
+        Logger.warn({ docId }, "Gemini produced no text; used legacy PDF extractor fallback")
+      } catch (fallbackErr) {
+        Logger.warn({ docId, fallbackErr }, "Legacy PDF extractor fallback failed")
+      }
+    }
+    const { text_chunks, image_chunks, text_chunk_pos, image_chunk_pos } = result

Add the missing import (outside this hunk):

+import { extractTextAndImagesWithChunksFromPDF as extractTextAndImagesWithChunksFromPDFLegacy } from "@/pdfChunks"
server/services/fileProcessor.ts (1)

43-48: Restore PDF image parity: add legacy fallback when Gemini returns no images

Gemini-based PDF extraction can produce zero image_chunks while multiple callers assume image_chunks exist — add the fallback below and the import to restore parity.

Call sites found: server/services/fileProcessor.ts (≈ lines 43–75), server/scripts/testPdfDirect.ts:70, server/integrations/dataSource/index.ts:135,212,255,303.

-        const result = await extractTextAndImagesWithChunksFromPDFviaGemini(
+        let result = await extractTextAndImagesWithChunksFromPDFviaGemini(
           new Uint8Array(buffer),
           vespaDocId,
           extractImages,
           describeImages,
         )
+        // If images were requested but Gemini produced none, optionally pull from legacy
+        if (
+          extractImages &&
+          (!result.image_chunks || result.image_chunks.length === 0) &&
+          process.env.USE_PDF_LEGACY_FALLBACK === "1"
+        ) {
+          try {
+            const legacy =
+              await extractTextAndImagesWithChunksFromPDFLegacy(
+                new Uint8Array(buffer),
+                vespaDocId,
+                extractImages,
+                describeImages,
+              )
+            if (legacy.image_chunks?.length) {
+              result.image_chunks = legacy.image_chunks
+              result.image_chunk_pos = legacy.image_chunk_pos
+            }
+          } catch (e) {
+            console.warn("Legacy PDF image fallback failed:", e)
+          }
+        }

Add import (outside this hunk):

+import { extractTextAndImagesWithChunksFromPDF as extractTextAndImagesWithChunksFromPDFLegacy } from "@/pdfChunks"
🧹 Nitpick comments (53)
small23_output.md (1)

1-2: Confirm intention to commit generated stub.

If this is a generated placeholder, consider moving under scratch/test-fixtures or .gitignore it to keep the repo clean. Add a short header (title/date/tool) for traceability.

small2_output.md (1)

1-6: Stabilize fixture format and location.

  • If auto‑generated, relocate to scratch/test‑fixtures and document the generator.
  • Use consistent page separators (e.g., H2 “## Page n” instead of HTML comments + hr) to ease diffs.
small2_output.json (1)

9-23: Document units and page metrics.

Widths/heights lack units; add units (e.g., "pt" or "px") and optionally dpi in meta to make data self‑describing.

Example:

   "meta": {
+    "units": "pt",
+    "dpi": 144.0,
small23_output.json (1)

9-16: Same schema hygiene as small2_output.json.

Add explicit units/dpi; keep items empty by contract, or include a notes field explaining empties are expected.

server/.spec-workflow/templates/structure-template.md (2)

5-15: Fix markdownlint: add languages to fenced blocks; remove trailing punctuation in heading.

Specify a language for all fences (use text where appropriate) and drop the colon from “Example Structure:”.

Apply:

-```
+```text
 ...
-```
+```


-### Example Structure:
+### Example Structure
-```
+```text
 src/
 └── dashboard/          # Self-contained dashboard subsystem
 ...
-```
+```

Also applies to: 29-31, 54-61, 68-76, 79-85, 88-94, 125-133


1-146: Template clarity nits.

Consider adding a short preamble describing how placeholders (e.g., [Define...]) should be replaced and an example filled section to guide authors.

scratch/small2/document.md (2)

19-21: Use headings for figure titles and add alt text for images (a11y).

Replace bold “Figure n” with headings and provide alt text.

Apply:

-**Figure 1**
-![](images/page-1-img-1.png)
+#### Figure 1
+![Figure 1: overview graphic](images/page-1-img-1.png)

-**Figure 2**
-![](images/page-1-img-2.png)
+#### Figure 2
+![Figure 2: metric tile](images/page-1-img-2.png)

-**Figure 3**
-![](images/page-1-img-3.png)
+#### Figure 3
+![Figure 3: architecture block](images/page-1-img-3.png)

-**Figure 4**
-![](images/page-1-img-4.png)
+#### Figure 4
+![Figure 4: partner logos](images/page-1-img-4.png)

Also applies to: 24-26, 29-31, 34-36


1-11: Provenance and licensing.

Add a front‑matter block noting source PDF, tool, timestamp, and image license to avoid future compliance issues.

server/.spec-workflow/user-templates/README.md (1)

1-65: LGTM; clear override mechanics.

Nice coverage of priority and variables. Consider linking to config.example.toml for precedence in a “See also” line.

out.md (1)

1-50: Clean up headings, broken words, and structure.

Use proper heading levels, fix split words, prefer bullets for metrics.

Apply exemplar edits:

-# juspay1
-
-# for over 12+ years! Engineering Payments Global Scale
+# Juspay
+Engineering payments at global scale for 12+ years

-# Scale
-# 8900 Bn +
-Annualized TPV
+## Scale
+- 8,900 Bn+ annualized TPV

-San
-
-F rancisco
+San Francisco

-# Reliability
-# 200 Mn +
-Txns/day peaking at 20000 tps+
+## Reliability
+- 200 Mn+ txns/day; peaks 20,000+ TPS

-# Network
-2.5 Bn +
-SDK installs with 400Mn+ mobile users
+## Network
+- 2.5 Bn+ SDK installs; 400 Mn+ mobile users

-# I nnovation
+## Innovation
 ...
-Sao P aulo
+Sao Paulo
server/.spec-workflow/templates/product-template.md (1)

26-28: Fix MD053: list items parsed as link reference definitions

Avoid bracket+colon pattern inside list bullets; switch to plain labels.

- - [Metric 1]: [Target]
- - [Metric 2]: [Target]
- - [Metric 3]: [Target]
+ - Metric 1: [Target]
+ - Metric 2: [Target]
+ - Metric 3: [Target]
scratch/juspay/document.md (1)

965-965: Typos: “OBERVABILITY” → “OBSERVABILITY”

Fix obvious spelling in headings to improve clarity and satisfy linters.

-## PAYMENTS OBERVABILITY
+## PAYMENTS OBSERVABILITY
-## PAYMENTS OBERVABILITY
+## PAYMENTS OBSERVABILITY

Also applies to: 997-997

outOld.md (1)

1-1: Generated artifact: store as pretty‑printed JSON or exclude from VCS

Single‑line, schema‑like payload in a .md file is hard to diff and review. Prefer a .json/.jsonl with pretty‑print, and/or mark as generated (gitignore or linguist‑generated).

server/.spec-workflow/templates/design-template.md (1)

59-64: Add code fence languages (MD040)

Label fences to appease markdownlint and improve syntax highlighting.

-```
+```ts
 ...
-```
+```
-```
+```ts
 ...
-```
+```

Also applies to: 67-71

server/.spec-workflow/config.example.toml (1)

24-29: Clarify ephemeral port note

If the app supports OS‑assigned ephemeral ports, mention “0” explicitly; otherwise, remove ambiguity.

 # The port number for the web dashboard.
-# Must be between 1024 and 65535.
-# Default: ephemeral port (automatically assigned)
+# Must be between 1024 and 65535 (or 0 for an OS‑assigned ephemeral port, if supported).
+# Default: ephemeral port (automatically assigned when unset or set to 0)
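
If the dashboard does accept 0 for an OS-assigned port (the review leaves this as "if supported"), the documented rule could be enforced with a check along these lines. This is a minimal sketch under that assumption; `resolveDashboardPort` is a hypothetical helper, not part of the spec-workflow tooling.

```typescript
// Resolves the configured dashboard port.
// - undefined or 0 means "let the OS assign an ephemeral port" (returned as 0)
// - otherwise the value must be an integer between 1024 and 65535
function resolveDashboardPort(configured: number | undefined): number {
  if (configured === undefined || configured === 0) {
    return 0
  }
  const isInteger = Math.floor(configured) === configured
  if (!isInteger || configured < 1024 || configured > 65535) {
    throw new RangeError(
      `Invalid dashboard port ${configured}: use 0 or an integer between 1024 and 65535`,
    )
  }
  return configured
}
```

Returning 0 for the unset case keeps a single code path: most socket APIs interpret port 0 as a request for any free port.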
server/.spec-workflow/templates/requirements-template.md (3)

1-11: Add minimal metadata and scope sections to make the template actionable

Include versioning, status, assumptions, out‑of‑scope, dependencies, and risks so the doc is self‑contained and auditable.

 # Requirements Document

+> Metadata
+> - Version: 0.1
+> - Status: Draft
+> - Owner: <name>
+> - Last Updated: <YYYY-MM-DD>
+
 ## Introduction
@@
 ## Requirements
+
+### Assumptions
+- <list assumptions>
+
+### Out of Scope
+- <explicitly list exclusions>
+
+### Dependencies
+- <systems/PRs/teams>
+
+### Risks
+- <top risks + mitigations>

15-31: Prefer testable, Gherkin-style acceptance criteria

Switch to Given/When/Then to ensure criteria are executable and unambiguous.

-1. WHEN [event] THEN [system] SHALL [response]
-2. IF [precondition] THEN [system] SHALL [response]
-3. WHEN [event] AND [condition] THEN [system] SHALL [response]
+1. Given [precondition] When [event] Then [system] [observable outcome]
+2. Given [context] And [additional context] When [action] Then [result]
+3. Negative: Given [invalid precondition] When [event] Then [error/state]

34-50: Make NFRs measurable

Add concrete SLOs and thresholds for perf, security, reliability, and usability.

-### Performance
-- [Performance requirements]
+### Performance
+- P95 end-to-end latency <= 800 ms; P99 <= 1500 ms under 500 RPS
+- Max memory growth per page <= 64 MB; no unbounded queues
@@
-### Security
-- [Security requirements]
+### Security
+- Data classification: <Public/Internal/Restricted>
+- No PII logged; logs scrubbed; dependency scan clean (no Critical/High)
@@
-### Reliability
-- [Reliability requirements]
+### Reliability
+- 99.9% availability; graceful degradation; idempotent retries with jitter
@@
-### Usability
-- [Usability requirements]
+### Usability
+- Keyboard accessible, WCAG AA on new UI; clear empty/error states
outFull.md (1)

231-231: Fix markdownlint and readability issues (duplicate headings, malformed headings, bare URLs, typos)

Numerous MD024/MD018/MD034 violations and OCR typos (e.g., “OBERVABILITY”, “IIndiGo”, “San F rancisco”). If the doc must stay, lint and clean it.

Quick fixes:

  • Deduplicate repeated headings.
  • Ensure a space after “#”.
  • Wrap URLs in link syntax and add titles.
  • Run a pass to correct obvious OCR artifacts.

Also applies to: 711-715, 739-743, 799-803, 811-851, 1121-1121, 2087-2087, 2271-2271, 2443-2443, 2487-2489

scripts/pp_structurev3_parse_small2.py (3)

26-33: Narrow the import exception and give actionable guidance

Catching Exception masks unrelated errors. Catch ModuleNotFoundError/ImportError only.

-    from paddleocr import PPStructureV3  # type: ignore
-  except Exception as e:
+    from paddleocr import PPStructureV3  # type: ignore
+  except (ModuleNotFoundError, ImportError) as e:

18-21: Avoid hard‑coded absolute paths; allow override via CLI/env

Make the input configurable to run in CI and by other devs.

-import sys
+import sys, os
@@
-    input_file = Path("/Users/aayush.shah/Downloads/small2.pdf").expanduser()
+    input_file = Path(os.environ.get("INPUT_PDF", "/Users/aayush.shah/Downloads/small2.pdf")).expanduser()

Optionally add argparse later for parity with basic_paddle script.


43-49: Defensive access to res.markdown

Some PPStructure outputs differ; guard for attribute/key presence to avoid AttributeError.

-        md_info = res.markdown
+        md_info = getattr(res, "markdown", {}) or {}
scratch/small2/document.jsonl (1)

1-2: Verify licensing/consent for branded content in test fixtures

The JSONL references branded assets and names. Ensure we have rights or replace with synthetic data to avoid distribution issues.

I can generate a synthetic small2 dataset with neutral content if you want to swap this fixture.

scratch/somatosensory/document.md (2)

25-29: Provide alt text for figures to satisfy a11y and markdownlint MD045

Add short, descriptive alt text placeholders.

-**Figure 1**
-![](images/page-1-img-1.png)
+### Figure 1
+![Receptors in human skin – diagram](images/page-1-img-1.png)
@@
-**Figure 1**
-![](images/page-2-img-1.png)
+### Figure 2
+![Muscle spindle schematic](images/page-2-img-1.png)
@@
-**Figure 1**
-![](images/page-4-img-1.png)
+### Figure 3
+![Proprioceptive feedback loop](images/page-4-img-1.png)

Also applies to: 47-51, 87-91


9-22: Replace bold “Figure” labels with headings; fix MD036

Use proper headings instead of emphasis.

-## Anatomy of the Somatosensory System
-FROM WIKIBOOKS 1
+## Anatomy of the Somatosensory System
+From Wikibooks

Also applies to: 41-43, 70-86

server/.spec-workflow/templates/tasks-template.md (2)

75-96: Normalize numbering and IDs to avoid cross‑referencing confusion

Two task groups share “4.”; prefer hierarchical numbering or unique IDs (e.g., API‑4, API‑4.1).

-- [ ] 4. Create API endpoints
+- [ ] 4.0 Create API endpoints
@@
-- [ ] 4.1 Set up routing and middleware
+- [ ] 4.1 Set up routing and middleware

3-20: Add DoD/owner/estimate fields per task

Makes the template execution‑ready in planning tools.

   - Purpose: Establish type safety for feature implementation
+  - Owner: <name>
+  - Estimate: <points/hours>
+  - Definition of Done: <acceptance checklist>
scripts/basic_paddleocr_pdf_to_md.py (2)

166-169: Use the documented PaddleOCR API (ocr.ocr) instead of predict; verify parameter names

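The available API depends on the installed PaddleOCR version: 2.x releases expose .ocr(image) and may not have predict, while predict and use_textline_orientation belong to the newer 3.x API. Pin the version, or confirm both names against the installed package before choosing one.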

-    ocr = PaddleOCR(use_textline_orientation=True, lang=args.lang)
-    markdown = render_markdown(pages, ocr, min_conf=args.min_conf, headings=not args.no_page_headings)
+    ocr = PaddleOCR(lang=args.lang, show_log=False)
+    markdown = render_markdown(pages, ocr, min_conf=args.min_conf, headings=not args.no_page_headings)
-        result = ocr.predict(image_array)
+        result = ocr.ocr(image_array)

Please run a quick sanity test against a 1–2 page PDF to confirm shapes of the returned results.

Also applies to: 134-136


1-1: Shebang present but file likely not executable

Either make it executable in the repo or drop the shebang to avoid EXE001 in lint.

-#!/usr/bin/env python3
+# (Shebang removed; run with `python -m scripts.basic_paddleocr_pdf_to_md`)
server/pdfChunks.ts (4)

329-343: Deduplicate getOperatorName; define once and reuse

Two identical helpers exist; keep one at module scope and reuse for page scope.

-    // Map fnId to operation name for better logging
-    const getOperatorName = (fnId: number): string => {
-      const opNames: { [key: number]: string } = {
-        [PDFJS.OPS.paintImageXObject]: 'paintImageXObject',
-        [PDFJS.OPS.paintImageXObjectRepeat]: 'paintImageXObjectRepeat', 
-        [PDFJS.OPS.paintInlineImageXObject]: 'paintInlineImageXObject',
-        [PDFJS.OPS.paintImageMaskXObject]: 'paintImageMaskXObject'
-      }
-      return opNames[fnId] || `unknownOp_${fnId}`
-    }
+    // (moved to module scope) use getOperatorName(fnId)
@@
-        // Map fnId to operation name for better logging
-        const getOperatorName = (fnId: number): string => {
-          const opNames: { [key: number]: string } = {
-            [PDFJS.OPS.paintImageXObject]: 'paintImageXObject',
-            [PDFJS.OPS.paintImageXObjectRepeat]: 'paintImageXObjectRepeat', 
-            [PDFJS.OPS.paintInlineImageXObject]: 'paintInlineImageXObject',
-            [PDFJS.OPS.paintImageMaskXObject]: 'paintImageMaskXObject'
-          }
-          return opNames[fnId] || `unknownOp_${fnId}`
-        }
+        // use getOperatorName(fnId)

Additional (module scope, near other constants):

const getOperatorName = (fnId: number): string => {
  const opNames: Record<number, string> = {
    [PDFJS.OPS.paintImageXObject]: "paintImageXObject",
    [PDFJS.OPS.paintImageXObjectRepeat]: "paintImageXObjectRepeat",
    [PDFJS.OPS.paintInlineImageXObject]: "paintInlineImageXObject",
    [PDFJS.OPS.paintImageMaskXObject]: "paintImageMaskXObject",
  }
  return opNames[fnId] ?? `unknownOp_${fnId}`
}

Also applies to: 474-482


1193-1197: Log level: reuse description isn’t a warning

Downgrade Logger.warn to debug to avoid alert fatigue.

-                      Logger.warn(
-                        `Reusing description for repeated image ${imageName} on page ${pageNum}`,
-                      )
+                      Logger.debug(
+                        `Reusing description for repeated image ${imageName} on page ${pageNum}`,
+                      )

1378-1379: Remove duplicate debug log

Same message printed twice.

-    Logger.debug("All text chunks", { text_chunks })
     Logger.debug("All text chunks", { text_chunks })

561-565: Switch case with conditional labels is brittle

case extractImages ? OPS.paintImageXObject : null compiles, but harms readability and tooling. Prefer an explicit if (!extractImages) break; guard inside the case or a wrapping if (extractImages) around the switch block.
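
A minimal sketch of the guard-inside-the-case alternative. The `OPS` object, its numeric values, and `handleOp` are hypothetical stand-ins for illustration, not the actual pdf.js operator table or the code in pdfChunks.ts:

```typescript
// Hypothetical stand-in for the pdf.js operator codes; values are illustrative only.
const OPS = { paintImageXObject: 85, showText: 44 } as const

// Hypothetical dispatcher: returns a label describing how the operator was handled.
function handleOp(fnId: number, extractImages: boolean): string {
  switch (fnId) {
    case OPS.paintImageXObject: {
      // The case label stays a plain constant; the feature flag is checked inside.
      if (!extractImages) return "skipped-image"
      return "image"
    }
    case OPS.showText:
      return "text"
    default:
      return "ignored"
  }
}
```

With constant case labels, the switch reads the same regardless of the flag, and linters can still detect duplicate or unreachable cases.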

scripts/doclingTemp.py (2)

266-283: Narrow overly broad exceptions and include context.

Catching a broad Exception hides actionable causes; narrow it where possible, or at least log the exception type.

Apply:

-                except Exception as e:
+                except (ValueError, OSError) as e:
                     log.warning(f"Page {page_no_display}: failed to crop figure bbox={bbox}: {e}")
                     continue
...
-                except Exception as e:
+                except OSError as e:
                     log.warning(f"Page {page_no_display}: failed to save figure: {e}")

1-1: Make script executable or remove shebang.

Either chmod +x scripts/doclingTemp.py or drop the shebang to silence linters.

scripts/docling_pdf_to_md.py (3)

168-173: Rename ambiguous variable l and avoid constant getattr for bbox.

Improves clarity and avoids style violations.

Apply:

-                            l = float(getattr(bbox, "l"))
-                            t = float(getattr(bbox, "t"))
-                            r = float(getattr(bbox, "r"))
-                            b = float(getattr(bbox, "b"))
-                            bbox_list = [l, t, r, b]
+                            left = float(bbox.l)
+                            top = float(bbox.t)
+                            right = float(bbox.r)
+                            bottom = float(bbox.b)
+                            bbox_list = [left, top, right, bottom]

111-116: Narrow try/except around callable invocation.

Only catch TypeError from wrong call signatures; other exceptions should propagate.

Apply:

-                try:
-                    text = text()
-                except Exception:
-                    text = ""
+                try:
+                    text = text()
+                except TypeError:
+                    text = ""

Repeat similarly for block_type invocation below.


1-1: Make script executable or drop shebang.

scripts/paddleocr_pdf_to_md.py (3)

279-285: Fix undefined type in annotation to avoid F821 and ease type checking.

The string "PPStructure" isn’t defined; prefer Any or a minimal Protocol.

Apply:

-from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple
+from typing import Any, Dict, Iterable, List, Optional, Sequence, Tuple
+from typing import Protocol
+
+class _PPStructureLike(Protocol):
+    def predict(self, img: Any) -> List[Dict[str, Any]]: ...
@@
-def process_page(
-    engine: "PPStructure",  # type: ignore[name-defined]
+def process_page(
+    engine: _PPStructureLike,

431-437: Broad exception handling is acceptable here; include the exception type and page for debugging.

Keep resilience but improve observability.

Apply:

-    except Exception as e:
-        sys.stderr.write(f"Failed to rasterize PDF: {e}\n")
+    except Exception as e:
+        sys.stderr.write(f"Failed to rasterize PDF ({type(e).__name__}): {e}\n")

And:

-        except Exception as e:
-            sys.stderr.write(f"Error on page {page_index + 1}: {e}\n")
+        except Exception as e:
+            sys.stderr.write(f"Error on page {page_index + 1} ({type(e).__name__}): {e}\n")

1-1: Make script executable or drop shebang.

scripts/paddle_pdf_to_md.py (2)

14-17: Catch only import errors to avoid masking real bugs.

Don’t swallow unrelated exceptions during import.

Apply:

-except Exception as e:  # pragma: no cover
+except (ModuleNotFoundError, ImportError) as e:  # pragma: no cover

1-1: Make script executable or drop shebang.

server/lib/chunkPdfWithGemini.ts (3)

61-68: Align inline/file threshold with docs and messages.

Comment and error say “15MB” but constant is 17MB; pick one.

Apply:

-// 15 MB threshold for inlineData vs file_data
-const INLINE_MAX_BYTES = 17 * 1024 * 1024
+// 15 MB threshold for inlineData vs file_data
+const INLINE_MAX_BYTES = 15 * 1024 * 1024

Or adjust strings to 17MB.
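
One way to keep the comment, constant, and message from drifting apart again is to derive the message from the constant. This is a sketch only: `INLINE_MAX_MB`, `fitsInline`, and `inlineLimitMessage` are hypothetical names, and 15 is just one of the two values under discussion:

```typescript
// Single source of truth for the threshold (hypothetical name; pick 15 or 17).
const INLINE_MAX_MB = 15
const INLINE_MAX_BYTES = INLINE_MAX_MB * 1024 * 1024

// True when the payload is small enough to send as inlineData.
function fitsInline(sizeBytes: number): boolean {
  return sizeBytes <= INLINE_MAX_BYTES
}

// The limit shown to the user is computed from the constant, so it cannot go stale.
function inlineLimitMessage(sizeBytes: number): string {
  const sizeMb = (sizeBytes / (1024 * 1024)).toFixed(1)
  return `PDF is ${sizeMb}MB; inline limit is ${INLINE_MAX_MB}MB`
}
```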


146-155: Add timeout/retry to external call to avoid hanging request threads.

LLM calls can stall; guard with a timeout and limited retries.

Proposed wrapper:

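async function withTimeout<T>(p: Promise<T>, ms = 180_000): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Vertex call timed out")), ms)
  })
  // Clear the timer once either promise settles so it does not linger after a fast response
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer))
}
// usage
const response = await withTimeout(model.generateContent({ contents: [...] }), 180_000)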

Also consider exponential backoff (e.g., 3 retries on transient 429/5xx).
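
The backoff suggestion could look roughly like the sketch below. All helper names are hypothetical, and the transient check assumes the thrown error carries a numeric `status` or `code` field, which should be verified against the errors the Vertex client actually throws:

```typescript
// Assumption: transient failures expose an HTTP-like status on `status` or `code`.
function isTransient(err: unknown): boolean {
  const e = err as { status?: unknown; code?: unknown } | null | undefined
  const status = Number(e?.status ?? e?.code)
  return status === 429 || (status >= 500 && status < 600)
}

// Delay doubles with each attempt (attempt 0 -> baseMs) and is capped at capMs.
function backoffDelayMs(attempt: number, baseMs = 1_000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt)
}

// Retries fn up to maxRetries times on transient errors; other errors propagate immediately.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let attempt = 0
  while (true) {
    try {
      return await fn()
    } catch (err) {
      if (attempt >= maxRetries || !isTransient(err)) throw err
      await sleep(backoffDelayMs(attempt))
      attempt++
    }
  }
}
```

The `sleep` parameter is injectable so tests can skip the real delays. Adding random jitter to the delay is a common refinement when many workers retry at once.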


156-165: Check finish reason and blocked content; fail fast when output is empty.

Prevents silent success with zero chunks.

Apply:

-  const candidates = (response as any)?.response?.candidates || []
+  const respAny = (response as any)?.response
+  const feedback = respAny?.promptFeedback
+  if (feedback?.blockReason) {
+    throw new Error(`Gemini blocked content: ${feedback.blockReason}`)
+  }
+  const candidates = respAny?.candidates || []
   const parts = candidates[0]?.content?.parts || []
   const text = parts
     .filter((p: any) => typeof p?.text === "string")
     .map((p: any) => p.text as string)
     .join("")
     .trim()
+  if (!text) Logger.warn({ sizeBytes: dataSize }, "Empty response from Gemini")
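
The diff above only warns on empty output and does not inspect the finish reason. A stricter variant is sketched below under stated assumptions: `GeminiLikeResponse` is a reduced, hypothetical shape rather than the SDK's real type, `extractTextOrThrow` is a hypothetical helper, and treating every finish reason other than "STOP" as an error is a design choice that may be too strict for some callers:

```typescript
// Reduced response shape assumed for this sketch; the real SDK types are richer.
interface GeminiLikeResponse {
  promptFeedback?: { blockReason?: string }
  candidates?: Array<{
    finishReason?: string
    content?: { parts?: Array<{ text?: unknown }> }
  }>
}

// Returns the concatenated text, or throws when the output is blocked, empty, or cut short.
function extractTextOrThrow(resp: GeminiLikeResponse | undefined): string {
  const blockReason = resp?.promptFeedback?.blockReason
  if (blockReason) {
    throw new Error(`Gemini blocked content: ${blockReason}`)
  }
  const candidate = resp?.candidates?.[0]
  const finishReason = candidate?.finishReason
  const text = (candidate?.content?.parts ?? [])
    .map((p) => (typeof p.text === "string" ? p.text : ""))
    .join("")
    .trim()
  if (!text) {
    throw new Error(`Empty response from Gemini (finishReason: ${finishReason ?? "unknown"})`)
  }
  if (finishReason && finishReason !== "STOP") {
    // Assumption: a truncated answer (e.g. MAX_TOKENS) should not be chunked silently.
    throw new Error(`Gemini stopped early: ${finishReason}`)
  }
  return text
}
```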
server/integrations/microsoft/attachment-utils.ts (1)

51-56: Note: image flags are now effectively no‑ops for PDFs.

Gemini returns no image chunks; the third boolean won’t change output. Consider a brief comment to avoid confusion for future readers.

-    const pdfResult = await extractTextAndImagesWithChunksFromPDFviaGemini(
+    // Gemini PDF path returns text only; images are not emitted
+    const pdfResult = await extractTextAndImagesWithChunksFromPDFviaGemini(
       pdfBuffer,
       attachmentId,
       false, // Don't extract images for email attachments
     )
server/integrations/google/worker-utils.ts (1)

52-56: Clarify that images are not produced by the Gemini PDF path.

Small comment helps avoid misinterpretation of the boolean.

-    const pdfResult = await extractTextAndImagesWithChunksFromPDFviaGemini(
+    // Gemini extractor emits text only; image arrays remain empty
+    const pdfResult = await extractTextAndImagesWithChunksFromPDFviaGemini(
       pdfBuffer,
       attachmentId,
       false, // Don't extract images for email attachments
     )
server/services/fileProcessor.ts (1)

3-4: Remove or fix the commented import.

The dangling comment is malformed and may trigger linters. Either delete it or fix the path.

-// import { extractTextAndImagesWithChunksFromPDF } from "@/pdf
+// Legacy extractor (optional fallback):
+// import { extractTextAndImagesWithChunksFromPDF as extractTextAndImagesWithChunksFromPDFLegacy } from "@/pdfChunks"
server/scripts/testPdfDirect.ts (3)

7-10: Avoid hard‑coded local paths; accept CLI arg or env.

Make the script portable.

-  let pdfPath = "/Users/aayush.shah/Downloads/juspay.pdf"
+  let pdfPath = process.argv[2] || process.env.TEST_PDF_PATH || process.env.PDF_PATH || ""
+  if (!pdfPath) {
+    throw new Error("Provide a PDF path as CLI arg or set TEST_PDF_PATH/PDF_PATH")
+  }

79-81: Outdated reference to pdfChunks logs.

This script should either use the Gemini path or update the note accordingly.

-    console.log("✓ Check the debug logs above from pdfChunks.ts")
-    console.log("✓ You can see exactly what's being processed in the current knowledge base flow")
+    console.log("✓ If using the Gemini path, check logs from server/lib/chunkPdfWithGemini.ts")

7-10: Align this test with the Gemini extractor and drop unused imports.

Currently it imports and calls the legacy extractor; switch to the Gemini path for consistency with production.

-import { FileProcessorService } from "@/services/fileProcessor"
-import { extractTextAndImagesWithChunksFromPDF } from "@/pdfChunks"
+import { extractTextAndImagesWithChunksFromPDFviaGemini } from "@/lib/chunkPdfWithGemini"
@@
-    const imageResult = await extractTextAndImagesWithChunksFromPDF(
+    const imageResult = await extractTextAndImagesWithChunksFromPDFviaGemini(
       new Uint8Array(pdfBuffer),
       "test-doc-with-images",
-      true,  // extractImages enabled
-      true   // describeImages enabled
+      true,  // flag retained for signature parity (Gemini ignores and returns no images)
+      true
     )

The PR summary claims this script was updated to the Gemini path, but the code still uses the legacy extractor.

Also applies to: 35-40

server/scripts/testGeminiFromProcessFile.ts (1)

91-101: Gate verbose dumps behind a flag to avoid noisy logs and PII leakage.

Print counts by default; full dumps when VERBOSE=1.

-  console.log("All text chunks", { chunks })
-  console.log("All text chunk positions", { chunks_pos })
-  console.log("All image chunks", { image_chunks })
-  console.log("All image chunk positions", { image_chunks_pos })
+  if (process.env.VERBOSE === "1") {
+    console.log("All text chunks", { chunks })
+    console.log("All text chunk positions", { chunks_pos })
+    console.log("All image chunks", { image_chunks })
+    console.log("All image chunk positions", { image_chunks_pos })
+  }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a74dab9 and 394e3eb.

⛔ Files ignored due to path filters (109)
  • scratch/juspay/images/page-1-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-1-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-1-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-1-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-10-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-10-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-10-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-10-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-11-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-11-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-12-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-12-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-12-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-12-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-13-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-13-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-13-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-13-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-15-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-15-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-15-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-15-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-15-img-5.png is excluded by !**/*.png
  • scratch/juspay/images/page-15-img-6.png is excluded by !**/*.png
  • scratch/juspay/images/page-16-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-17-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-17-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-18-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-18-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-19-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-19-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-19-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-20-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-20-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-20-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-20-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-20-img-5.png is excluded by !**/*.png
  • scratch/juspay/images/page-20-img-6.png is excluded by !**/*.png
  • scratch/juspay/images/page-20-img-7.png is excluded by !**/*.png
  • scratch/juspay/images/page-20-img-8.png is excluded by !**/*.png
  • scratch/juspay/images/page-20-img-9.png is excluded by !**/*.png
  • scratch/juspay/images/page-21-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-21-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-22-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-23-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-23-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-24-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-24-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-25-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-25-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-25-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-26-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-26-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-26-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-26-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-27-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-27-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-27-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-27-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-27-img-5.png is excluded by !**/*.png
  • scratch/juspay/images/page-27-img-6.png is excluded by !**/*.png
  • scratch/juspay/images/page-27-img-7.png is excluded by !**/*.png
  • scratch/juspay/images/page-28-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-3-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-3-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-3-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-30-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-31-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-32-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-33-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-34-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-35-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-36-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-36-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-36-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-36-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-37-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-38-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-4-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-4-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-4-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-4-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-40-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-41-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-41-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-41-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-6-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-6-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-6-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-6-img-4.png is excluded by !**/*.png
  • scratch/juspay/images/page-6-img-5.png is excluded by !**/*.png
  • scratch/juspay/images/page-7-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-7-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-7-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-8-img-1.png is excluded by !**/*.png
  • scratch/juspay/images/page-8-img-2.png is excluded by !**/*.png
  • scratch/juspay/images/page-8-img-3.png is excluded by !**/*.png
  • scratch/juspay/images/page-9-img-1.png is excluded by !**/*.png
  • scratch/juspay/multimodal_2025-09-17_223836.parquet is excluded by !**/*.parquet
  • scratch/small2/images/page-1-img-1.png is excluded by !**/*.png
  • scratch/small2/images/page-1-img-2.png is excluded by !**/*.png
  • scratch/small2/images/page-1-img-3.png is excluded by !**/*.png
  • scratch/small2/images/page-1-img-4.png is excluded by !**/*.png
  • scratch/small2/multimodal_2025-09-17_223335.parquet is excluded by !**/*.parquet
  • scratch/small2/multimodal_2025-09-17_223510.parquet is excluded by !**/*.parquet
  • scratch/somatosensory/images/page-1-img-1.png is excluded by !**/*.png
  • scratch/somatosensory/images/page-2-img-1.png is excluded by !**/*.png
  • scratch/somatosensory/images/page-4-img-1.png is excluded by !**/*.png
  • scratch/somatosensory/multimodal_2025-09-17_223619.parquet is excluded by !**/*.parquet
📒 Files selected for processing (35)
  • .husky/pre-commit (1 hunks)
  • out.md (1 hunks)
  • outFull.md (1 hunks)
  • outOld.md (1 hunks)
  • scratch/juspay/document.md (1 hunks)
  • scratch/small2/document.jsonl (1 hunks)
  • scratch/small2/document.md (1 hunks)
  • scratch/somatosensory/document.jsonl (1 hunks)
  • scratch/somatosensory/document.md (1 hunks)
  • scripts/basic_paddleocr_pdf_to_md.py (1 hunks)
  • scripts/doclingTemp.py (1 hunks)
  • scripts/docling_pdf_to_md.py (1 hunks)
  • scripts/paddle_pdf_to_md.py (1 hunks)
  • scripts/paddleocr_pdf_to_md.py (1 hunks)
  • scripts/pp_structurev3_parse_small2.py (1 hunks)
  • server/.spec-workflow/config.example.toml (1 hunks)
  • server/.spec-workflow/templates/design-template.md (1 hunks)
  • server/.spec-workflow/templates/product-template.md (1 hunks)
  • server/.spec-workflow/templates/requirements-template.md (1 hunks)
  • server/.spec-workflow/templates/structure-template.md (1 hunks)
  • server/.spec-workflow/templates/tasks-template.md (1 hunks)
  • server/.spec-workflow/templates/tech-template.md (1 hunks)
  • server/.spec-workflow/user-templates/README.md (1 hunks)
  • server/integrations/dataSource/index.ts (2 hunks)
  • server/integrations/google/worker-utils.ts (2 hunks)
  • server/integrations/microsoft/attachment-utils.ts (2 hunks)
  • server/lib/chunkPdfWithGemini.ts (1 hunks)
  • server/pdfChunks.ts (17 hunks)
  • server/scripts/testGeminiFromProcessFile.ts (1 hunks)
  • server/scripts/testPdfDirect.ts (2 hunks)
  • server/services/fileProcessor.ts (2 hunks)
  • small23_output.json (1 hunks)
  • small23_output.md (1 hunks)
  • small2_output.json (1 hunks)
  • small2_output.md (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:55:46.701Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/gmail-worker.ts:293-294
Timestamp: 2025-05-28T10:55:46.701Z
Learning: There are two separate `parseMail` functions in the codebase: one in `server/integrations/google/gmail-worker.ts` with signature `(email, gmail, client, userEmail)` returning `{ mailData, insertedPdfCount, exist }`, and another in `server/integrations/google/gmail/index.ts` with signature `(email, gmail, userEmail, client, tracker?)` returning `{ mailData, exist }`. Each file calls its own local version correctly.

Applied to files:

  • server/integrations/google/worker-utils.ts
🧬 Code graph analysis (6)
server/integrations/google/worker-utils.ts (1)
server/lib/chunkPdfWithGemini.ts (1)
  • extractTextAndImagesWithChunksFromPDFviaGemini (194-233)
server/integrations/microsoft/attachment-utils.ts (1)
server/lib/chunkPdfWithGemini.ts (1)
  • extractTextAndImagesWithChunksFromPDFviaGemini (194-233)
server/integrations/dataSource/index.ts (1)
server/lib/chunkPdfWithGemini.ts (1)
  • extractTextAndImagesWithChunksFromPDFviaGemini (194-233)
server/scripts/testGeminiFromProcessFile.ts (1)
server/lib/chunkPdfWithGemini.ts (1)
  • extractTextAndImagesWithChunksFromPDFviaGemini (194-233)
server/services/fileProcessor.ts (1)
server/lib/chunkPdfWithGemini.ts (1)
  • extractTextAndImagesWithChunksFromPDFviaGemini (194-233)
server/lib/chunkPdfWithGemini.ts (1)
server/logger/index.ts (2)
  • getLogger (36-93)
  • Subsystem (15-15)
🪛 markdownlint-cli2 (0.17.2)
server/.spec-workflow/templates/product-template.md

26-26: Link and image reference definitions should be needed
Unused link or image reference definition: "metric 1"

(MD053, link-image-reference-definitions)


27-27: Link and image reference definitions should be needed
Unused link or image reference definition: "metric 2"

(MD053, link-image-reference-definitions)


28-28: Link and image reference definitions should be needed
Unused link or image reference definition: "metric 3"

(MD053, link-image-reference-definitions)

server/.spec-workflow/templates/design-template.md

59-59: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


67-67: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

server/.spec-workflow/templates/structure-template.md

5-5: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


54-54: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


68-68: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


79-79: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


88-88: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


125-125: Trailing punctuation in heading
Punctuation: ':'

(MD026, no-trailing-punctuation)


126-126: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

scratch/juspay/document.md

231-231: Multiple headings with the same content

(MD024, no-duplicate-heading)


711-711: Multiple headings with the same content

(MD024, no-duplicate-heading)


715-715: Multiple headings with the same content

(MD024, no-duplicate-heading)


739-739: Multiple headings with the same content

(MD024, no-duplicate-heading)


741-741: Multiple headings with the same content

(MD024, no-duplicate-heading)


743-743: Multiple headings with the same content

(MD024, no-duplicate-heading)


799-799: Multiple headings with the same content

(MD024, no-duplicate-heading)


801-801: Multiple headings with the same content

(MD024, no-duplicate-heading)


803-803: Multiple headings with the same content

(MD024, no-duplicate-heading)


811-811: Multiple headings with the same content

(MD024, no-duplicate-heading)


839-839: Multiple headings with the same content

(MD024, no-duplicate-heading)


841-841: Multiple headings with the same content

(MD024, no-duplicate-heading)


843-843: Multiple headings with the same content

(MD024, no-duplicate-heading)


847-847: Multiple headings with the same content

(MD024, no-duplicate-heading)


851-851: Multiple headings with the same content

(MD024, no-duplicate-heading)


1121-1121: No space after hash on atx style heading

(MD018, no-missing-space-atx)

scratch/small2/document.md

3-3: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


19-19: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


21-21: Images should have alternate text (alt text)

(MD045, no-alt-text)


24-24: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


26-26: Images should have alternate text (alt text)

(MD045, no-alt-text)


29-29: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


31-31: Images should have alternate text (alt text)

(MD045, no-alt-text)


34-34: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


36-36: Images should have alternate text (alt text)

(MD045, no-alt-text)

outFull.md

231-231: Multiple headings with the same content

(MD024, no-duplicate-heading)


711-711: Multiple headings with the same content

(MD024, no-duplicate-heading)


715-715: Multiple headings with the same content

(MD024, no-duplicate-heading)


739-739: Multiple headings with the same content

(MD024, no-duplicate-heading)


741-741: Multiple headings with the same content

(MD024, no-duplicate-heading)


743-743: Multiple headings with the same content

(MD024, no-duplicate-heading)


799-799: Multiple headings with the same content

(MD024, no-duplicate-heading)


801-801: Multiple headings with the same content

(MD024, no-duplicate-heading)


803-803: Multiple headings with the same content

(MD024, no-duplicate-heading)


811-811: Multiple headings with the same content

(MD024, no-duplicate-heading)


839-839: Multiple headings with the same content

(MD024, no-duplicate-heading)


841-841: Multiple headings with the same content

(MD024, no-duplicate-heading)


843-843: Multiple headings with the same content

(MD024, no-duplicate-heading)


847-847: Multiple headings with the same content

(MD024, no-duplicate-heading)


851-851: Multiple headings with the same content

(MD024, no-duplicate-heading)


1121-1121: No space after hash on atx style heading

(MD018, no-missing-space-atx)


2087-2087: Multiple headings with the same content

(MD024, no-duplicate-heading)


2271-2271: Multiple headings with the same content

(MD024, no-duplicate-heading)


2443-2443: Multiple headings with the same content

(MD024, no-duplicate-heading)


2487-2487: Bare URL used

(MD034, no-bare-urls)


2489-2489: Bare URL used

(MD034, no-bare-urls)

scratch/somatosensory/document.md

3-3: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


25-25: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


27-27: Images should have alternate text (alt text)

(MD045, no-alt-text)


47-47: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


49-49: Images should have alternate text (alt text)

(MD045, no-alt-text)


72-72: Multiple headings with the same content

(MD024, no-duplicate-heading)


87-87: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


89-89: Images should have alternate text (alt text)

(MD045, no-alt-text)

🪛 Ruff (0.12.2)
scripts/pp_structurev3_parse_small2.py

1-1: Shebang is present but file is not executable

(EXE001)


28-28: Do not catch blind exception: Exception

(BLE001)

scripts/basic_paddleocr_pdf_to_md.py

1-1: Shebang is present but file is not executable

(EXE001)

scripts/docling_pdf_to_md.py

1-1: Shebang is present but file is not executable

(EXE001)


21-21: Docstring contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ` (GRAVE ACCENT)?

(RUF002)


114-114: Do not catch blind exception: Exception

(BLE001)


125-125: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


126-126: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


127-127: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


128-128: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


130-130: Do not catch blind exception: Exception

(BLE001)


137-137: Do not catch blind exception: Exception

(BLE001)


163-163: Do not catch blind exception: Exception

(BLE001)


168-168: Ambiguous variable name: l

(E741)


168-168: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


169-169: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


170-170: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


171-171: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


173-173: Do not catch blind exception: Exception

(BLE001)


194-194: Do not catch blind exception: Exception

(BLE001)


195-195: Comment contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ` (GRAVE ACCENT)?

(RUF003)


229-232: Consider moving this statement to an else block

(TRY300)


235-235: Do not catch blind exception: Exception

(BLE001)


249-249: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


251-251: Do not call getattr with a constant attribute value. It is not any safer than normal property access.

Replace getattr with attribute access

(B009)


252-253: try-except-pass detected, consider logging the exception

(S110)


252-252: Do not catch blind exception: Exception

(BLE001)


281-281: Do not catch blind exception: Exception

(BLE001)


287-287: Do not catch blind exception: Exception

(BLE001)


306-306: Do not catch blind exception: Exception

(BLE001)


361-361: Unused noqa directive (non-enabled: T201)

Remove unused noqa directive

(RUF100)

scripts/doclingTemp.py

1-1: Shebang is present but file is not executable

(EXE001)


61-61: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


62-62: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


63-63: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


64-64: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


69-69: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


70-70: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


71-71: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


72-72: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


74-74: Avoid specifying long messages outside the exception class

(TRY003)


187-187: Avoid specifying long messages outside the exception class

(TRY003)


211-211: Comment contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ` (GRAVE ACCENT)?

(RUF003)


272-272: Do not catch blind exception: Exception

(BLE001)


281-281: Do not catch blind exception: Exception

(BLE001)


294-294: Comment contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ` (GRAVE ACCENT)?

(RUF003)


309-309: Comment contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ` (GRAVE ACCENT)?

(RUF003)

scripts/paddle_pdf_to_md.py

1-1: Shebang is present but file is not executable

(EXE001)


4-4: Docstring contains ambiguous ’ (RIGHT SINGLE QUOTATION MARK). Did you mean ` (GRAVE ACCENT)?

(RUF002)


14-14: Do not catch blind exception: Exception

(BLE001)

scripts/paddleocr_pdf_to_md.py

1-1: Shebang is present but file is not executable

(EXE001)


170-170: Consider moving this statement to an else block

(TRY300)


171-171: Do not catch blind exception: Exception

(BLE001)


190-190: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


191-191: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


192-192: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


193-193: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


206-206: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


207-207: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


208-208: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


209-209: Value being cast to int is already an integer

Remove unnecessary int call

(RUF046)


279-279: Undefined name PPStructure

(F821)


433-433: Do not catch blind exception: Exception

(BLE001)


452-452: Do not catch blind exception: Exception

(BLE001)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
server/lib/chunkPdfWithGemini.ts (3)

198-210: Fix project ID resolution and error message.

The project ID resolution ignores opts.projectId and common environment variables, but the error message misleadingly references them.

This issue was previously flagged in past reviews. The code still only checks VERTEX_PROJECT_ID but the error message claims to check multiple environment variables and opts.projectId.

Apply this fix:

-  const projectId =
-    process.env.VERTEX_PROJECT_ID ||
-    ""
+  const projectId =
+    opts.projectId ||
+    process.env.VERTEX_PROJECT_ID ||
+    process.env.GOOGLE_CLOUD_PROJECT ||
+    process.env.GCLOUD_PROJECT ||
+    process.env.GCP_PROJECT_ID ||
+    ""

Also update the error message to accurately reflect what's being checked:

-      "Missing GCP project ID. Set VERTEX_PROJECT_ID or GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT/GCP_PROJECT_ID) or pass options.projectId.",
+      "Missing GCP project ID. Set one of: opts.projectId, VERTEX_PROJECT_ID, GOOGLE_CLOUD_PROJECT, GCLOUD_PROJECT, or GCP_PROJECT_ID.",
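As a rough illustration of the precedence suggested above, the lookup and the error text can be driven from one list so they cannot drift apart. This is a minimal sketch, not the merged code: `resolveProjectId`, `PROJECT_ENV_VARS`, and the `Env` type are hypothetical names, and the environment is passed in as a plain object so the helper is easy to test.

```typescript
// Hypothetical helper: names and precedence mirror the review suggestion, not the merged code.
type Env = Record<string, string | undefined>

const PROJECT_ENV_VARS = [
  "VERTEX_PROJECT_ID",
  "GOOGLE_CLOUD_PROJECT",
  "GCLOUD_PROJECT",
  "GCP_PROJECT_ID",
] as const

function resolveProjectId(opts: { projectId?: string }, env: Env): string {
  // An explicit per-call option wins over any environment variable.
  if (opts.projectId) return opts.projectId
  for (const name of PROJECT_ENV_VARS) {
    const value = env[name]
    if (value) return value
  }
  // Build the message from the same list that drives the lookup.
  throw new Error(
    `Missing GCP project ID. Set one of: opts.projectId, ${PROJECT_ENV_VARS.join(", ")}.`,
  )
}
```

In the server this would presumably be called with `process.env` as the second argument.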

254-262: Replace unsafe type casting with proper response validation.

The response parsing uses unsafe (response as any) casting which could lead to runtime errors if the API response structure changes.

This was previously flagged. Consider defining interfaces for the expected response structure and using type guards:

interface VertexResponse {
  response: {
    candidates: Array<{
      content?: {
        parts?: Array<{ text?: string }>
      }
    }>
  }
}

// Narrow an unknown value; the guard checks exactly what the interface requires.
function isValidVertexResponse(value: unknown): value is VertexResponse {
  const inner = (value as { response?: { candidates?: unknown } } | null)?.response
  return Array.isArray(inner?.candidates)
}

Then use it safely:

-  const candidates = (response as any)?.response?.candidates || []
+  if (!isValidVertexResponse(response)) {
+    throw new Error("Invalid response structure from Vertex AI")
+  }
+  const candidates = response.response.candidates || []

319-323: Add empty chunk validation and logging.

The function can silently return empty text chunks when Gemini fails to extract content, which may cause issues downstream.

This was previously identified. Add validation and logging:

  const raw = await extractSemanticChunksFromPdf(data, opts as ChunkPdfOptions)
  const chunks = parseGeminiChunkBlocks(raw)
+ if (chunks.length === 0) {
+   Logger.warn({ docid }, "No <chunk> blocks parsed from Gemini output")
+ }
  for (const c of chunks) {
    text_chunks.push(c)
    text_chunk_pos.push(globalSeq++)
  }

Similarly for the split PDF path at line 331.

🧹 Nitpick comments (1)
server/lib/chunkPdfWithGemini.ts (1)

64-66: Document the size limit rationale.

The 17MB and 100MB limits should be documented with their source and reasoning.

Add comments explaining these specific values:

// Size limits for PDF processing
-const INLINE_MAX_BYTES = 17 * 1024 * 1024 // 17MB - split into chunks
-const MAX_SUPPORTED_BYTES = 100 * 1024 * 1024 // 100MB - hard limit
+const INLINE_MAX_BYTES = 17 * 1024 * 1024 // 17MB - Vertex AI inlineData limit for single requests
+const MAX_SUPPORTED_BYTES = 100 * 1024 * 1024 // 100MB - practical processing limit to prevent resource exhaustion
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 845db22 and 70a95db.

📒 Files selected for processing (7)
  • server/integrations/dataSource/index.ts (2 hunks)
  • server/integrations/google/worker-utils.ts (2 hunks)
  • server/integrations/microsoft/attachment-utils.ts (2 hunks)
  • server/lib/chunkPdfWithGemini.ts (1 hunks)
  • server/package.json (1 hunks)
  • server/scripts/testGeminiFromProcessFile.ts (1 hunks)
  • server/services/fileProcessor.ts (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • server/integrations/google/worker-utils.ts
  • server/scripts/testGeminiFromProcessFile.ts
  • server/services/fileProcessor.ts
🧰 Additional context used
🧬 Code graph analysis (3)
server/integrations/dataSource/index.ts (1)
server/lib/chunkPdfWithGemini.ts (1)
  • extractTextAndImagesWithChunksFromPDFviaGemini (291-344)
server/integrations/microsoft/attachment-utils.ts (1)
server/lib/chunkPdfWithGemini.ts (1)
  • extractTextAndImagesWithChunksFromPDFviaGemini (291-344)
server/lib/chunkPdfWithGemini.ts (1)
server/logger/index.ts (2)
  • getLogger (36-93)
  • Subsystem (15-15)
🔇 Additional comments (8)
server/package.json (1)

93-93: Verify the security status of pdf-lib dependency.

The added pdf-lib ^1.17.1 dependency has no direct vulnerabilities listed in Snyk's vulnerability database for version 1.17.1. Still verify that the package is legitimate and actively maintained before relying on it in production.

server/lib/chunkPdfWithGemini.ts (3)

69-80: LGTM: Clean PDF range extraction implementation.

The saveRange function efficiently creates a new PDF with specified page ranges using the pdf-lib library. The implementation correctly handles the page copying and document creation.


85-148: LGTM: Efficient binary search optimization.

The findMaxFittingCount function uses a well-designed exponential growth + binary search approach to find the optimal page count within size constraints. The single-page error handling and complexity documentation are excellent.
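For readers unfamiliar with the pattern, the exponential growth plus binary search described here can be sketched roughly as follows. This is an illustration under assumptions, not the function from the PR: `sizeOf(count)` is a hypothetical stand-in for "serialize the next `count` pages and measure the bytes", it is synchronous for brevity (the real save is asynchronous), and the size is assumed to grow monotonically with the page count.

```typescript
// Hypothetical sketch: sizeOf stands in for saving a page range and measuring its byte size.
function findMaxFittingCountSketch(
  remaining: number,
  maxBytes: number,
  sizeOf: (count: number) => number,
): number {
  // A single page that is already too large cannot be split any further.
  if (sizeOf(1) > maxBytes) {
    throw new Error("single page exceeds size limit")
  }
  // Phase 1: double the count until it overflows or the pages run out.
  let lo = 1 // largest count known to fit
  let hi = 1
  while (lo < remaining) {
    hi = Math.min(lo * 2, remaining)
    if (sizeOf(hi) <= maxBytes) {
      lo = hi
    } else {
      break
    }
  }
  if (lo === remaining) return lo
  // Phase 2: binary search strictly between lo (fits) and hi (does not fit).
  let left = lo + 1
  let right = hi - 1
  while (left <= right) {
    const mid = Math.floor((left + right) / 2)
    if (sizeOf(mid) <= maxBytes) {
      lo = mid
      left = mid + 1
    } else {
      right = mid - 1
    }
  }
  return lo
}
```

Each probe costs one save, so the number of saves per chunk grows logarithmically with the chunk's page count rather than linearly.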


151-183: LGTM: Effective PDF splitting strategy.

The splitPdfIntoInlineSizedChunks function provides a clean public interface for splitting PDFs into processable chunks with good logging support.

server/integrations/dataSource/index.ts (2)

35-35: LGTM: Clean migration to Gemini-based PDF extractor.

The import change correctly switches from the legacy PDF processor to the new Gemini-based extractor, maintaining the expected function signature and return types.


211-211: LGTM: Function call updated correctly.

The call to extractTextAndImagesWithChunksFromPDFviaGemini correctly removes the third boolean parameter while preserving the return value handling. The downstream processing of text_chunks, image_chunks, etc. remains intact.

server/integrations/microsoft/attachment-utils.ts (2)

20-20: LGTM: Import updated to use Gemini-based extractor.

The import correctly switches to the new Gemini-based PDF processing function while maintaining the same module interface.


51-54: LGTM: Function call signature updated correctly.

The call to extractTextAndImagesWithChunksFromPDFviaGemini correctly removes the third boolean parameter and preserves the filtering logic for non-empty text chunks.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/api/dataSource.ts (1)

109-111: Path traversal via unsanitized filename in temp path.

file.name can contain ../ or path separators; path.join(base, tempFileName) can escape the base dir. Sanitize before composing the path.

Apply:

-  const tempFileName = `${Date.now()}_${randomUUID()}_${file.name}`
-  const filePath = join(DOWNLOADS_DIR_DATASOURCE, tempFileName)
+  const unsafeName = String(file.name || "upload")
+  const baseName = path.basename(unsafeName).replace(/[\/\\]+/g, "_")
+  const safeName = baseName.replace(/[^\w.\-()+\[\] ]+/g, "_").slice(0, 180)
+  const tempFileName = `${Date.now()}_${randomUUID()}_${safeName}`
+  const filePath = join(DOWNLOADS_DIR_DATASOURCE, tempFileName)
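A dependency-free variant of the same idea is sketched below, mainly to make the edge cases explicit. `toSafeFileName` is a hypothetical name, and the "upload" fallback and 180-character cap are taken from the suggestion above rather than from the codebase. It additionally rejects names that reduce to "." or "..", which pass the character allow-list but still refer to directories.

```typescript
// Hypothetical dependency-free variant of the suggested sanitizer.
function toSafeFileName(rawName: unknown, maxLength = 180): string {
  const name = typeof rawName === "string" && rawName.length > 0 ? rawName : "upload"
  // Keep only the last path segment, treating both "/" and "\" as separators.
  const segments = name.split(/[\/\\]+/).filter((s) => s.length > 0)
  const base = segments.pop() ?? "upload"
  // Replace anything outside a conservative allow-list, then cap the length.
  const cleaned = base.replace(/[^\w.\-()+\[\] ]+/g, "_").slice(0, maxLength)
  // "." and ".." survive the allow-list but still navigate directories.
  return cleaned === "" || cleaned === "." || cleaned === ".." ? "upload" : cleaned
}
```

The result is meant to be embedded in the timestamp and UUID prefix as before; the prefix keeps names unique, and the sanitizer keeps them inside the base directory.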
♻️ Duplicate comments (3)
server/lib/chunkPdfWithGemini.ts (3)

154-166: Fix projectId/location resolution and align error text.

opts.projectId and common GCP env vars are ignored; error mentions them anyway. This breaks in standard GCP setups.

Apply:

-  const projectId =
-    process.env.VERTEX_PROJECT_ID ||
-    ""
-
-  const location =
-    process.env.VERTEX_REGION ||
-    "us-central1"
+  const projectId =
+    opts.projectId ||
+    process.env.VERTEX_PROJECT_ID ||
+    process.env.GOOGLE_CLOUD_PROJECT ||
+    process.env.GCLOUD_PROJECT ||
+    process.env.GCP_PROJECT_ID ||
+    ""
+
+  const location =
+    opts.location ||
+    process.env.VERTEX_REGION ||
+    "us-central1"
@@
-  if (!projectId) {
-    throw new Error(
-      "Missing GCP project ID. Set VERTEX_PROJECT_ID or GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT/GCP_PROJECT_ID) or pass options.projectId.",
-    )
-  }
+  if (!projectId) {
+    throw new Error(
+      "Missing GCP project ID. Set one of: opts.projectId, VERTEX_PROJECT_ID, GOOGLE_CLOUD_PROJECT, GCLOUD_PROJECT, or GCP_PROJECT_ID.",
+    )
+  }

188-219: Harden response parsing; throw on empty output.

The current (response as any) cast is brittle, and an empty text response silently yields empty chunks.

Apply:

+import { ContentExtractionError } from "@/integrations/dataSource/errors"
@@
-  // Parse and return raw text
-  const candidates = (response as any)?.response?.candidates || []
-  const parts = candidates[0]?.content?.parts || []
-  const text = parts
-    .filter((p: any) => typeof p?.text === "string")
-    .map((p: any) => p.text as string)
-    .join("")
-    .trim()
+  // Parse and return raw text (defensive)
+  const r: any = response
+  const candidates = Array.isArray(r?.response?.candidates)
+    ? r.response.candidates
+    : []
+  const parts = candidates[0]?.content?.parts
+  const text =
+    Array.isArray(parts)
+      ? parts
+          .filter((p: any) => typeof p?.text === "string")
+          .map((p: any) => String(p.text))
+          .join("")
+          .trim()
+      : ""
+  if (!text) {
+    Logger.warn({ sizeBytes: dataSize }, "Gemini returned empty content")
+    throw new ContentExtractionError("Empty response from Gemini", "PDF")
+  }

271-293: Surface empty-chunk condition to callers.

Avoid silently returning no chunks; fail fast so UI can show userMessage.

Apply:

   if (data.length <= INLINE_MAX_BYTES) {
@@
-    const chunks = parseGeminiChunkBlocks(raw)
+    const chunks = parseGeminiChunkBlocks(raw)
+    if (chunks.length === 0) {
+      throw new ContentExtractionError("No <chunk> blocks parsed from Gemini output", "PDF")
+    }
@@
   } else {
@@
-      const chunks = parseGeminiChunkBlocks(raw)
+      const chunks = parseGeminiChunkBlocks(raw)
+      if (chunks.length === 0) {
+        throw new ContentExtractionError("No <chunk> blocks parsed for a sub-PDF", "PDF")
+      }
🧹 Nitpick comments (6)
server/api/files.ts (2)

31-33: Unused DOWNLOADS_DIR setup.

You create downloads/ but never use it in this module. Remove to avoid side effects.

Apply:

-const DOWNLOADS_DIR = join(process.cwd(), "downloads")
-await mkdir(DOWNLOADS_DIR, { recursive: true })
+// (removed unused downloads dir init)

223-246: Image save: extension may be empty → serving fails.

When the filename has no ".", file.name.split(".").pop() returns the whole name (and an empty string when the name ends in "."), so fullFileName becomes "0.<name>" or "0." and the server, which looks for specific extensions, cannot find the file. Take the extension from the name only when one is present, and fall back to the MIME type otherwise.

Apply:

-      const ext = file.name.split(".").pop()?.toLowerCase() || ""
+      const dot = file.name.lastIndexOf(".")
+      const nameExt = dot > 0 ? file.name.slice(dot + 1).toLowerCase() : ""
+      const mimeExt = (() => {
+        const t = (file.type || "").toLowerCase()
+        if (t.includes("jpeg")) return "jpg"
+        if (t.includes("png")) return "png"
+        if (t.includes("webp")) return "webp"
+        if (t.includes("gif")) return "gif"
+        return ""
+      })()
+      const ext = nameExt || mimeExt
       const fullFileName = `${0}.${ext}`
server/integrations/dataSource/errors.ts (1)

16-27: Optional: differentiate error codes.

Consider a distinct code for PdfPageTooLargeError to allow tailored UI hints.

Apply:

-export class FileValidationError extends DataSourceError {
-  constructor(message: string, userMessage?: string) {
-    super(message, "FILE_VALIDATION_ERROR", userMessage)
-  }
-}
+export class FileValidationError extends DataSourceError {
+  constructor(message: string, userMessage?: string, code = "FILE_VALIDATION_ERROR") {
+    super(message, code, userMessage)
+  }
+}
@@
-export class PdfPageTooLargeError extends FileValidationError {
+export class PdfPageTooLargeError extends FileValidationError {
   constructor(pageNumber: number, maxSizeMB: number, actualBytes: number) {
@@
-    super(message, userMessage)
+    super(message, userMessage, "PDF_PAGE_TOO_LARGE")
   }
 }
server/lib/chunkPdfWithGemini.ts (3)

122-133: Use provided logger instead of console.log.

console.log in server code bypasses structured logging.

Apply:

-    if (logger) {
-     console.log(
-        {
+    if (logger) {
+      logger.info(
+        {
           startPage: start + 1,
           endPage: start + count,
           pagesInChunk: count,
           subSizeBytes: bytes.length,
           maxBytes,
         },
         "Prepared sub-PDF chunk",
       )
     }

22-25: Expose and document size limits.

Consider exporting INLINE_MAX_BYTES and MAX_SUPPORTED_BYTES or moving to config to keep prompt and splitter aligned.


178-187: Optional: move CHUNKING_PROMPT to systemInstruction.

Some Vertex models handle instruction better via systemInstruction; reduces prompt repetition in user parts.

Apply:

-  const messageParts: any[] = [{ text: CHUNKING_PROMPT }]
+  const messageParts: any[] = []
@@
-  const response = await model.generateContent({
-    contents: [
-      {
-        role: "user",
-        parts: messageParts,
-      },
-    ],
-  })
+  const response = await model.generateContent({
+    systemInstruction: { parts: [{ text: CHUNKING_PROMPT }] },
+    contents: [{ role: "user", parts: messageParts }],
+  })
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 70a95db and d42feac.

📒 Files selected for processing (5)
  • server/ai/prompts.ts (1 hunks)
  • server/api/dataSource.ts (2 hunks)
  • server/api/files.ts (2 hunks)
  • server/integrations/dataSource/errors.ts (1 hunks)
  • server/lib/chunkPdfWithGemini.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
server/api/files.ts (1)
server/integrations/dataSource/errors.ts (1)
  • isDataSourceError (137-139)
server/api/dataSource.ts (1)
server/integrations/dataSource/errors.ts (1)
  • isDataSourceError (137-139)
server/lib/chunkPdfWithGemini.ts (3)
server/logger/index.ts (2)
  • getLogger (36-93)
  • Subsystem (15-15)
server/integrations/dataSource/errors.ts (2)
  • PdfPageTooLargeError (30-37)
  • FileSizeExceededError (22-27)
server/ai/prompts.ts (1)
  • CHUNKING_PROMPT (2421-2461)
🔇 Additional comments (5)
server/ai/prompts.ts (1)

2417-2418: LGTM: user-friendly closing sentence.

Safe, non-behavioral change.

server/api/dataSource.ts (2)

201-204: Good: preserve DataSourceError for UI.

This keeps error.userMessage intact for client handling.


116-117: Bun runtime required: enforce Bun or add fallbacks.

Multiple server modules call Bun.* (Bun.write, Bun.file, Bun.gzipSync, Bun.gunzipSync, Bun.serve, Bun.spawn). Confirm that dev, CI, and prod all run on Bun, or replace/guard these calls. Notable locations: server/api/dataSource.ts (lines 116–117), server/server.ts, server/sync-server.ts, server/api/files.ts, server/api/workflow.ts, server/integrations/google/*.ts, server/utils/compression.ts.

server/api/files.ts (1)

142-148: Correct: user-facing error message for DataSourceError.

Using error.userMessage improves UX without leaking internals.

server/integrations/dataSource/errors.ts (1)

29-37: PdfPageTooLargeError: clear messaging and reuse of validation class.

Good addition; pairs well with upstream page splitting logic.

Himanshvarma previously approved these changes Sep 19, 2025
junaid-shirur previously approved these changes Sep 19, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
server/lib/chunkPdfWithGemini.ts (2)

154-166: Fix project/location resolution; current code ignores opts and common GCP envs.

This will throw in environments that only set GOOGLE_CLOUD_PROJECT/GCLOUD_PROJECT, and it prevents per-call overrides. Also update the error message to reflect the actual sources checked.

Apply:

-  const projectId =
-    process.env.VERTEX_PROJECT_ID ||
-    ""
-
-  const location =
-    process.env.VERTEX_REGION ||
-    "us-central1"
+  const projectId =
+    opts.projectId ||
+    process.env.VERTEX_PROJECT_ID ||
+    process.env.GOOGLE_CLOUD_PROJECT ||
+    process.env.GCLOUD_PROJECT ||
+    process.env.GCP_PROJECT_ID ||
+    ""
+
+  const location =
+    opts.location ||
+    process.env.VERTEX_REGION ||
+    "us-central1"
@@
-  if (!projectId) {
-    throw new Error(
-      "Missing GCP project ID. Set VERTEX_PROJECT_ID or GOOGLE_CLOUD_PROJECT (or GCLOUD_PROJECT/GCP_PROJECT_ID) or pass options.projectId.",
-    )
-  }
+  if (!projectId) {
+    throw new Error(
+      "Missing GCP project ID. Set one of: opts.projectId, VERTEX_PROJECT_ID, GOOGLE_CLOUD_PROJECT, GCLOUD_PROJECT, or GCP_PROJECT_ID.",
+    )
+  }

271-293: Detect and surface empty output; include docid in logs for correlation.

Callers may assume non-empty chunks; today this fails silently. At minimum, log loudly; optionally throw behind a flag.

   if (data.length <= INLINE_MAX_BYTES) {
     // Single call path
-    Logger.info("Sending single PDF to Gemini , no splitting needed")
+    Logger.info({ docid }, "Sending single PDF to Gemini; no splitting needed")
     const raw = await extractSemanticChunksFromPdf(data, opts as ChunkPdfOptions)
     const chunks = parseGeminiChunkBlocks(raw)
+    if (chunks.length === 0) {
+      Logger.warn({ docid }, "No <chunk> blocks parsed from Gemini output (single part)")
+    }
     for (const c of chunks) {
       text_chunks.push(c)
       text_chunk_pos.push(globalSeq++)
     }
   } else {
@@
     for (let i = 0; i < subPdfs.length; i++) {
       const part = subPdfs[i]
-      Logger.info({ index: i + 1, bytes: part.length }, "Sending sub-PDF to Gemini")
+      Logger.info({ docid, index: i + 1, bytes: part.length }, "Sending sub-PDF to Gemini")
       const raw = await extractSemanticChunksFromPdf(part, opts as ChunkPdfOptions)
       const chunks = parseGeminiChunkBlocks(raw)
+      if (chunks.length === 0) {
+        Logger.warn({ docid, index: i + 1 }, "No <chunk> blocks parsed from Gemini output (sub part)")
+      }
       for (const c of chunks) {
         text_chunks.push(c)
         text_chunk_pos.push(globalSeq++)
       }
     }

Option (if stricter behavior desired): throw when all parts produce zero chunks and handle upstream.
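A minimal sketch of that stricter option, assuming the per-part raw outputs are already available: warn for each empty part, and throw only when every part came back empty. `collectChunks` and `EmptyExtractionError` are hypothetical names, and `parseChunks` stands in for parseGeminiChunkBlocks.

```typescript
// Hypothetical aggregation step; parseChunks stands in for parseGeminiChunkBlocks.
class EmptyExtractionError extends Error {}

function collectChunks(
  rawParts: string[],
  parseChunks: (raw: string) => string[],
  warn: (msg: string) => void = () => {},
): { text_chunks: string[]; text_chunk_pos: number[] } {
  const text_chunks: string[] = []
  const text_chunk_pos: number[] = []
  let globalSeq = 0
  rawParts.forEach((raw, i) => {
    const chunks = parseChunks(raw)
    // One empty part is only worth a warning; other parts may still carry content.
    if (chunks.length === 0) warn(`part ${i + 1} produced no chunks`)
    for (const c of chunks) {
      text_chunks.push(c)
      text_chunk_pos.push(globalSeq++)
    }
  })
  // Fail only when the whole document produced nothing.
  if (text_chunks.length === 0) {
    throw new EmptyExtractionError("No chunks parsed from any part")
  }
  return { text_chunks, text_chunk_pos }
}
```

Callers that prefer the lenient behavior could catch the error and fall back to an empty result; the existing ContentExtractionError might be a better fit than a new class.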

🧹 Nitpick comments (2)
server/lib/chunkPdfWithGemini.ts (2)

122-133: Use provided logger instead of console.log in splitter.

You accept a logger but don’t use it; this bypasses structured logs in prod.

-    if (logger) {
-     console.log(
-        {
-          startPage: start + 1,
-          endPage: start + count,
-          pagesInChunk: count,
-          subSizeBytes: bytes.length,
-          maxBytes,
-        },
-        "Prepared sub-PDF chunk",
-      )
-    }
+    logger?.info(
+      {
+        startPage: start + 1,
+        endPage: start + count,
+        pagesInChunk: count,
+        subSizeBytes: bytes.length,
+        maxBytes,
+      },
+      "Prepared sub-PDF chunk",
+    )

209-219: Harden response parsing; log non-STOP finish reasons; avoid brittle any-casts.

Prevents silent empties and helps triage safety blocks and quota issues.

-  // Parse and return raw text
-  const candidates = result.response?.candidates ?? []
-  const parts = candidates[0]?.content?.parts ?? []
-  const text = parts
-    .filter((p: any) => typeof p?.text === "string")
-    .map((p: any) => p.text as string)
-    .join("")
-    .trim()
-
-  return text
+  // Parse and return raw text (defensive)
+  const resp = result?.response
+  const candidates = Array.isArray(resp?.candidates) ? resp!.candidates : []
+  if (candidates.length === 0) {
+    Logger.warn(
+      { promptFeedback: resp?.promptFeedback },
+      "Gemini returned no candidates",
+    )
+    return ""
+  }
+  const finishReason = (candidates[0] as any)?.finishReason
+  if (finishReason && finishReason !== "STOP") {
+    Logger.warn({ finishReason }, "Gemini finishReason not STOP")
+  }
+  const parts = Array.isArray((candidates[0] as any)?.content?.parts)
+    ? (candidates[0] as any).content.parts
+    : []
+  const text = parts
+    .map((p: any) => (typeof p?.text === "string" ? p.text : ""))
+    .join("")
+    .trim()
+  return text
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d42feac and bb78cd5.

📒 Files selected for processing (2)
  • server/lib/chunkPdfWithGemini.ts (1 hunks)
  • server/scripts/testPdfDirect.ts (0 hunks)
💤 Files with no reviewable changes (1)
  • server/scripts/testPdfDirect.ts
🧰 Additional context used
🧬 Code graph analysis (1)
server/lib/chunkPdfWithGemini.ts (3)
server/logger/index.ts (2)
  • getLogger (36-93)
  • Subsystem (15-15)
server/integrations/dataSource/errors.ts (2)
  • PdfPageTooLargeError (30-37)
  • FileSizeExceededError (22-27)
server/ai/prompts.ts (1)
  • CHUNKING_PROMPT (2421-2461)
🔇 Additional comments (2)
server/lib/chunkPdfWithGemini.ts (2)

40-104: Splitter search strategy is solid; good balance of saves vs. precision.

Exponential probe + binary search is appropriate here; single-page overflow surfaced via PdfPageTooLargeError is correct.

If you see perf spikes on large-doc workloads, we can add a tiny cache Map keyed by (start,count) to reuse saveRange results during the binary search.
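The cache mentioned above could look roughly like the sketch below. It is an assumption-laden illustration: `memoizeRange` is a hypothetical name, and `save` stands in for a call such as saveRange with the source document already bound. Promises are cached rather than resolved values, so concurrent requests for the same range share a single save.

```typescript
// Hypothetical memoizing wrapper; `save` stands in for saveRange with the document bound.
function memoizeRange<T>(
  save: (start: number, count: number) => Promise<T>,
): (start: number, count: number) => Promise<T> {
  const cache = new Map<string, Promise<T>>()
  return (start, count) => {
    const key = `${start}:${count}`
    let hit = cache.get(key)
    if (!hit) {
      // Store the promise itself so repeated probes reuse the in-flight save.
      hit = save(start, count)
      cache.set(key, hit)
    }
    return hit
  }
}
```

Two caveats with this sketch: a rejected promise stays cached, and each cached entry can hold a multi-megabyte buffer, so the wrapper should probably be created per search and discarded afterwards.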


13-20: Remove unused gcsUri from ChunkPdfOptions

The only match is the declaration in server/lib/chunkPdfWithGemini.ts — drop the field (or implement the GCS flow for PDFs ≥ 15MB). Verify this isn't part of any public API before removing.

 export type ChunkPdfOptions = {
   projectId?: string
   location?: string
   model?: string
-  gcsUri?: string // Optional GCS URI to use for PDFs >= 15MB
   maxOutputTokens?: number
   temperature?: number
 }

@junaid-shirur junaid-shirur merged commit 8ddd939 into main Sep 19, 2025
4 checks passed
@junaid-shirur junaid-shirur deleted the feat/pdfLayout branch September 19, 2025 11:19
Aayushjshah added a commit that referenced this pull request Sep 19, 2025
* pdfChunks file updated for better text processing

* comment fixes

* comment fixes

* comment fixes

* comment fixes

* comment fixes

* comment fixes

* removing old logic

* removing old logic

* Workflow route issue fix and removed logs (#847)

* Fixed file content retrieval: used wrong format for names (#851)

* Workflow Refresh Issue fixed (#860)

* fix/file-process: Now KB files images are processed (#861)

* feat(doc-preview): Added support for viewing xlxs,xls,csv  (#834) (#865)

* Added support for viewing xlxs,xls,csv and title generation based on user forst query and assistant response

* added unititled as default chat title

* fixed generateTitle function

* removed the title generation changes

---------

Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local>

* fix:model-selection (#866)

* fix(citation preview): added back button in citation preview (#846) (#868)

* added back button in citation preview

* comments resolved

* fix:temperorily disabled chunk citations for pdfs (#848) (#870)

* temperorily disabled chunk citations for pdfs

* resolved comments

* fix:title-generation (#852) (#872)

* fix:title-generation

* made the suggested changes

* made the suggested changes

* made the suggested changes

---------

Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local>

* User Management page (#631) (#871)

* testing

* Fix/app exc (#877)

* fix:app exclusion (#855)

* using sync job existance check ,modify all sources

* resolved issue

* resolved comments

* add routeTree and format code

* fix:conditional syncjob check  (#858)

* conditional syncjob check based on local or production

* resolved comments

* fix(docker-compose): add vespa‑deploy service and relax Microsoft credential requirement (#876) (#878)

- Add new vespa‑deploy service built from Dockerfile that copies server/vespa code and runs deploy‑docker.sh
- Mount server/vespa directory into existing Vespa containers
- Update compose commands to always use `docker‑compose … up … --build` so images are rebuilt before start
- Relax Microsoft integration validation: missing MICROSOFT_CLIENT_ID/SECRET now logs a warning and skips sync jobs instead of failing

* fix:replace z.instanceof(File) with z.any() for better compatibility (#798) (#883)

* feat(docker): add GPU support and configurable metrics port (#884) (#885)

- Bump Vespa base image to 8.514.24
- Split DNF install into separate steps and install CUDA libraries
- Pin onnxruntime-cu118 to 1.20.1
- Mount Prometheus config as a template and inject METRICS_PORT via envsubst (default 3001)
- Expose METRICS_PORT as an environment variable to the container
- Update deployment script to accept optional VESPA_CLI_PATH and remove hard‑coded host/port
- Add validation override for schema‑removal

* Adding gemini for pdf processing

* chore: merge main into feat/pdfLayout

* chore: merge main into feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

---------

Co-authored-by: avirupsinha12 <avirup.sinha@juspay.in>
Co-authored-by: Sahil Kumar <119723019+SahilKumar000@users.noreply.github.com>
Co-authored-by: Ravishekhar Yadav <122727655+Ravishekhar7870@users.noreply.github.com>
Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local>
Co-authored-by: Mayank Bansal <mayankbansal51351@gmail.com>
Co-authored-by: Rahul Kumar <118290059+rahul1841@users.noreply.github.com>
Co-authored-by: Himansh varma <126441540+Himanshvarma@users.noreply.github.com>
Co-authored-by: Nasim Sheikh <nasimsheikh688@gmail.com>
Co-authored-by: Shivam Ashtikar <shivam.ashtikar@juspay.in>
shivamashtikar added a commit that referenced this pull request Sep 19, 2025
* pdfChunks file updated for better text processing

* comment fixes

* comment fixes

* comment fixes

* comment fixes

* comment fixes

* comment fixes

* removing old logic

* removing old logic

* Workflow route issue fix and removed logs (#847)

* Fixed file content retrieval: used wrong format for names (#851)

* Workflow Refresh Issue fixed (#860)

* fix/file-process: Now KB files images are processed (#861)

* feat(doc-preview): Added support for viewing xlxs,xls,csv  (#834) (#865)

* Added support for viewing xlxs,xls,csv and title generation based on user forst query and assistant response

* added unititled as default chat title

* fixed generateTitle function

* removed the title generation changes

---------



* fix:model-selection (#866)

* fix(citation preview): added back button in citation preview (#846) (#868)

* added back button in citation preview

* comments resolved

* fix:temperorily disabled chunk citations for pdfs (#848) (#870)

* temperorily disabled chunk citations for pdfs

* resolved comments

* fix:title-generation (#852) (#872)

* fix:title-generation

* made the suggested changes

* made the suggested changes

* made the suggested changes

---------



* User Management page (#631) (#871)

* testing

* Fix/app exc (#877)

* fix:app exclusion (#855)

* using sync job existance check ,modify all sources

* resolved issue

* resolved comments

* add routeTree and format code

* fix:conditional syncjob check  (#858)

* conditional syncjob check based on local or production

* resolved comments

* fix(docker-compose): add vespa-deploy service and relax Microsoft credential requirement (#876) (#878)

- Add new vespa-deploy service built from Dockerfile that copies server/vespa code and runs deploy-docker.sh
- Mount server/vespa directory into existing Vespa containers
- Update compose commands to always use `docker-compose … up … --build` so images are rebuilt before start
- Relax Microsoft integration validation: missing MICROSOFT_CLIENT_ID/SECRET now logs a warning and skips sync jobs instead of failing

* fix:replace z.instanceof(File) with z.any() for better compatibility (#798) (#883)

* feat(docker): add GPU support and configurable metrics port (#884) (#885)

- Bump Vespa base image to 8.514.24
- Split DNF install into separate steps and install CUDA libraries
- Pin onnxruntime-cu118 to 1.20.1
- Mount Prometheus config as a template and inject METRICS_PORT via envsubst (default 3001)
- Expose METRICS_PORT as an environment variable to the container
- Update deployment script to accept optional VESPA_CLI_PATH and remove hard-coded host/port
- Add validation override for schema-removal

* Adding gemini for pdf processing

* chore: merge main into feat/pdfLayout

* chore: merge main into feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

* chore: splitting of large pdfs feat/pdfLayout

---------

Co-authored-by: avirupsinha12 <avirup.sinha@juspay.in>
Co-authored-by: Sahil Kumar <119723019+SahilKumar000@users.noreply.github.com>
Co-authored-by: Ravishekhar Yadav <122727655+Ravishekhar7870@users.noreply.github.com>
Co-authored-by: Ravishekhar Yadav <ravishekhar.yadav@Ravishekhar-Yadav-GK4HXKX6DQ.local>
Co-authored-by: Mayank Bansal <mayankbansal51351@gmail.com>
Co-authored-by: Rahul Kumar <118290059+rahul1841@users.noreply.github.com>
Co-authored-by: Himansh varma <126441540+Himanshvarma@users.noreply.github.com>
Co-authored-by: Nasim Sheikh <nasimsheikh688@gmail.com>
Co-authored-by: Shivam Ashtikar <shivam.ashtikar@juspay.in>
10 participants