Migrate to a Compiled Language (Rust or Go) for Performance & Memory Efficiency #10

@rithulkamesh

The current implementation of DocProc is written in Python. While Python enables rapid prototyping and simplicity, it's becoming a bottleneck in production workloads due to:

  • High memory consumption during large document parsing
  • Slower execution times for concurrent processing, since the GIL in standard CPython builds keeps CPU-bound threads from running in parallel
  • Difficulty in scaling under load without complex optimization

Proposal:

Migrate critical parts of the DocProc backend to a compiled, memory-safe systems language — Rust or Go.


Benefits:

  • Improved performance: Faster execution, especially for CPU-intensive tasks (e.g., parsing, tokenization)
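  • 🧠 Lower memory usage: Avoids Python's per-object overhead, where every value is a heap-allocated, reference-counted object (Go still uses a GC, but stores values far more compactly)
  • 🔒 Safety: Both languages are memory-safe; Rust enforces this at compile time without a GC, while Go relies on a GC
  • 📦 Deployment: Self-contained binaries with no interpreter or virtualenv to ship, which simplifies CI/CD and containerization (fully static linking may need extra build settings, e.g. a musl target for Rust or CGO_ENABLED=0 for Go)
  • 🤝 Interoperability: Can be embedded into the existing Python pipeline via FFI if needed (e.g., PyO3 for Rust, or a Go shared library built with -buildmode=c-shared and loaded through ctypes or cffi)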

Tasks:

  • Identify performance-critical modules in the current pipeline
  • Choose target language (Rust vs Go) based on I/O needs, team comfort, and ecosystem
  • Prototype one processing stage (e.g., text extraction or format parsing)
  • Benchmark vs Python equivalent
  • Plan and execute incremental migration (module-by-module)
  • Update build and deployment workflows to include compiled binaries
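The first and fourth tasks (finding hot spots and benchmarking) could start from a small measurement harness like the sketch below. It uses only the standard library (`time.perf_counter` and `tracemalloc`). The stage functions `extract_text` and `tokenize` are hypothetical stand-ins, not DocProc's actual modules, and `measure_stage` / `rank_stages` are names invented for this illustration.

```python
import time
import tracemalloc
from typing import Callable, Dict, List, Tuple


def measure_stage(stage: Callable[[str], object], document: str,
                  repeats: int = 5) -> Dict[str, float]:
    """Return the best wall-clock time (seconds) and peak traced memory (bytes)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        stage(document)
        best = min(best, time.perf_counter() - start)

    # Memory is measured in a separate run because tracemalloc slows execution.
    tracemalloc.start()
    stage(document)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"seconds": best, "peak_bytes": float(peak)}


# Hypothetical stand-ins for real pipeline stages (assumptions, not DocProc code).
def extract_text(document: str) -> str:
    return " ".join(document.split())


def tokenize(document: str) -> List[str]:
    return document.lower().split()


def rank_stages(stages: Dict[str, Callable[[str], object]],
                document: str) -> List[Tuple[str, Dict[str, float]]]:
    """Measure every stage and sort the results, slowest first."""
    results = {name: measure_stage(fn, document) for name, fn in stages.items()}
    return sorted(results.items(), key=lambda item: item[1]["seconds"], reverse=True)


if __name__ == "__main__":
    sample = "Lorem ipsum dolor sit amet " * 50_000
    stages = {"extract_text": extract_text, "tokenize": tokenize}
    for name, stats in rank_stages(stages, sample):
        print(f"{name}: {stats['seconds'] * 1000:.2f} ms, "
              f"peak {stats['peak_bytes'] / 1e6:.2f} MB")
```

Running the same harness against a Rust or Go prototype (called through its Python binding) would give a like-for-like comparison, with the caveat that `tracemalloc` only sees allocations made through Python's allocator, so native memory would need a separate measurement such as process RSS.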

Considerations:

  • Maintain feature parity with the Python version
  • Ensure cross-platform compatibility (Linux, macOS)
  • Allow fallback to Python for less performance-critical tasks (if needed)

Labels: enhancement (New feature or request)
