An Advanced Document-Intelligence Engine built for AI Engineers by Muhammad Wasif Anwar
“Turning raw pixels into readable intelligence.”
NeuroScan-AI is a next-generation document-understanding system that bridges the gap between computer vision, OCR intelligence, and AI-driven image enhancement.
Designed with precision and modularity, it empowers engineers, researchers, and enterprises to transform raw document captures into structured, searchable, and professional-grade digital assets.
- End-to-End Intelligence: Automates the full workflow: Scan → Enhance → OCR → Export.
- AI-Enhanced Vision: Utilizes adaptive illumination correction, geometric perspective repair, and unsharp filtering for maximum OCR clarity.
- Seamless Dual Interface: Offers both an interactive Streamlit UI and a robust FastAPI backend for integration into enterprise systems.
- Containerized Precision: Every component runs in an isolated, reproducible Docker environment, ensuring identical performance across setups.
- Engineer-First Architecture: Built with clear module separation, making pipelines reusable and extensible for any vision or OCR-based AI project.
“To redefine how machines perceive documents: not as flat images, but as structured, interpretable intelligence.”
NeuroScan-AI isn't just a scanner; it's a document perception engine.
Every pixel processed through it passes through a carefully tuned visual pipeline: enhancing edges, reducing noise, balancing illumination, and extracting text with close attention to fine detail.
| User Type | Purpose |
|---|---|
| AI Engineers | Integrate OCR & enhancement modules in ML pipelines |
| Developers | Deploy production-ready document processing APIs |
| Researchers | Digitize academic papers or experiment datasets |
| Organizations | Automate document archiving with precision scanning |
| Data Scientists | Build structured datasets from unstructured inputs |
NeuroScan-AI is not just another scanner; it's an intelligent document-perception system engineered to make computers truly understand paper.
Every scan, every pixel, every curve of a letter is treated as meaningful data, processed through a chain of AI-powered transformations that make even imperfect captures look professionally digitized.
Built from the ground up using OpenCV, scikit-image, and Tesseract OCR, it merges the precision of computer vision with the adaptability of modern AI engineering.
Whether you're scanning invoices, legal documents, research papers, or handwritten notes, NeuroScan-AI intelligently detects the document frame, corrects distortions, balances lighting, and extracts clean, searchable text.
What sets it apart is its dual-interface ecosystem:
- A sleek Streamlit UI for instant, no-code scanning and previewing.
- A powerful FastAPI backend for developers who want to integrate document intelligence into their own platforms.
All of this is containerized using Docker, so you can deploy it anywhere, from your local workstation to a full enterprise cloud, with the same runtime environment and no manual dependency setup.
The result is a research-grade yet production-ready solution that feels effortless to use but is deeply engineered underneath.
A three-stage perception pipeline (Scan → Enhance → OCR) built for precision and repeatability.
Each stage is modular, so you can use the stages independently or together within your own projects.
Automatically identifies document edges and corrects geometry in real time.
No matter the angle or lighting of your photo, NeuroScan-AI re-shapes it into a perfectly rectangular, printable scan.
Uneven lighting, glare, and shadowed edges are intelligently normalized using morphological and contrast-based enhancement, giving you crisp, uniformly lit outputs suitable for OCR and archiving.
Fine-tuned filtering boosts clarity and edge definition, ensuring even faint or faded text becomes legible without introducing noise or over-sharpening artifacts.
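The sharpening step can be illustrated with unsharp masking, the technique this README names for the Enhance stage. The sketch below is a minimal, dependency-free illustration on a single row of grayscale pixels. It uses a box blur for brevity, whereas the project presumably relies on OpenCV for blurring, and the function names and the `amount` and `radius` parameters are assumptions made for this example, not NeuroScan-AI's actual API.

```python
def box_blur(row, radius=1):
    """Blur a row of grayscale pixels with a simple moving average."""
    blurred = []
    for i in range(len(row)):
        lo = max(0, i - radius)
        hi = min(len(row), i + radius + 1)
        window = row[lo:hi]
        blurred.append(sum(window) / len(window))
    return blurred


def unsharp_mask(row, amount=1.0, radius=1):
    """Sharpen by adding back the difference between the row and its blur."""
    blurred = box_blur(row, radius)
    sharpened = []
    for original, soft in zip(row, blurred):
        value = original + amount * (original - soft)
        # Clamp to the valid 8-bit range so edges do not overflow.
        sharpened.append(int(round(min(255, max(0, value)))))
    return sharpened


row = [50, 50, 50, 200, 200, 200]  # a soft dark-to-light edge
print(unsharp_mask(row))
# [50, 50, 0, 250, 200, 200]
```

Flat regions are left untouched, while the pixels on either side of the edge are pushed apart. That is why a large `amount` can produce halos, the over-sharpening artifact mentioned above.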
Beyond simple text extraction, NeuroScan-AI embeds recognized text directly into PDFs, making them fully searchable while preserving the original layout and structure.
Upload full documents: the system automatically splits, enhances, and runs OCR on every page, stitching them back into a seamless, text-searchable PDF file.
Includes modular utility classes for geometry processing, image I/O, and PDF handling, allowing researchers and developers to extend functionality with ease.
Each core component is unit-tested for consistency, ensuring reproducible results whether you run it once or a thousand times in batch mode.
Fully containerized with environment isolation, making setup frictionless and deployment quick, from personal laptops to enterprise servers.
Use it visually through Streamlit or programmatically through FastAPI endpoints.
Scan, enhance, and extract text either interactively or as part of your automated pipelines.
In essence, NeuroScan-AI turns imperfect, real-world documents into structured, searchable intelligence, bringing clarity where there was once only clutter.
NeuroScan-AI follows a clean, layered architecture that separates experience, orchestration, pipelines, and utilities for reliability and easy extension. The diagram below shows how a request flows from the Streamlit UI (or API client) through the Orchestrator into specialized Scan / Enhance / OCR stages, with shared Utils and Config/Schemas ensuring consistency, all inside a Docker runtime.
- Streamlit UI (Frontend): Drag-and-drop uploads, side-panel controls (OCR lang/OEM/PSM), live previews (original vs scanned), and one-click downloads (searchable PDF / PNG ZIP).
- Orchestrator (Backend Library): Validates inputs, loads config, and dispatches each page through the pipeline stages (Scan → Enhance → OCR), then aggregates multi-page results.
- Pipelines
  - Scan Pipeline: Edge detection, 4-point perspective transform, auto-deskew fallback when no quad is found.
  - Enhance Pipeline: Illumination correction (background removal), CLAHE, adaptive binarization, and unsharp mask for OCR clarity.
  - OCR Pipeline: Tesseract OCR with configurable lang / OEM / PSM and searchable PDF generation.
- Utility Layer: Reusable helpers for:
  - Geometry (order points, perspective transform, deskew)
  - Image I/O (safe decode/encode, RGB/BGR conversion)
  - PDF Handling (pdf2image conversion, multi-page orchestration)
- Configuration & Schemas: Centralized settings (env-driven) and Pydantic models that enforce types/validation across UI and API.
- Docker Container: Bakes the exact runtime (Tesseract, Poppler, OpenCV libs) for reproducible results on any machine or cloud node.
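As an illustration of the "order points" helper listed under the geometry utilities, here is a minimal sketch of one common heuristic for putting four detected corners into a consistent order before a perspective transform. It is an assumption about how such a helper might look, not the contents of `app/utils/geometry.py`; the names `order_points` and `target_size` are hypothetical, and the sum/difference rule can misorder corners of a heavily rotated page.

```python
import math


def order_points(points):
    """Return four (x, y) corners as top-left, top-right, bottom-right, bottom-left."""
    if len(points) != 4:
        raise ValueError("expected exactly four corner points")
    # The top-left corner has the smallest x + y, the bottom-right the largest.
    top_left = min(points, key=lambda p: p[0] + p[1])
    bottom_right = max(points, key=lambda p: p[0] + p[1])
    # The top-right corner has the largest x - y, the bottom-left the smallest.
    top_right = max(points, key=lambda p: p[0] - p[1])
    bottom_left = min(points, key=lambda p: p[0] - p[1])
    return [top_left, top_right, bottom_right, bottom_left]


def target_size(ordered):
    """Estimate the output width and height from the longer of each pair of opposite edges."""
    tl, tr, br, bl = ordered
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)


corners = [(310, 40), (20, 60), (300, 420), (30, 400)]  # arbitrary detection order
ordered = order_points(corners)
print(ordered)
print(target_size(ordered))
```

With the corners ordered, the destination rectangle is simply `(0, 0)`, `(width, 0)`, `(width, height)`, `(0, height)`, which is the pairing a 4-point perspective transform needs.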
- Upload: User drops an image/PDF in Streamlit (or sends to FastAPI).
- Split/Decode: PDFs become per-page RGB images; images are decoded safely.
- Scan: Document borders are detected; perspective corrected; deskew fallback if needed.
- Enhance: Illumination normalized → contrast equalized → adaptive thresholding → unsharp mask.
- OCR: Text recognized (multi-language ready); searchable PDF pages produced.
- Aggregate: Text concatenated; pages merged; artifacts packaged (PDF/ZIP).
- Deliver: Results streamed back to UI/API with sizes, previews, and downloads.
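The flow above can be sketched as a small orchestration loop: each page passes through the stages in order, OCR runs on the result, and the per-page outputs are aggregated. This is a simplified sketch under stated assumptions. `run_pipeline`, `ScanResult`, and the stub stages are hypothetical names for illustration, and the stubs only record which steps ran instead of touching pixels.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Page = Dict[str, Any]  # hypothetical stand-in for a decoded page image plus metadata


@dataclass
class ScanResult:
    text: str
    pages: List[Page] = field(default_factory=list)


def run_pipeline(pages: List[Page],
                 stages: List[Callable[[Page], Page]],
                 ocr: Callable[[Page], str]) -> ScanResult:
    """Apply every stage to every page in order, then OCR and aggregate."""
    processed, texts = [], []
    for page in pages:
        for stage in stages:
            page = stage(page)
        processed.append(page)
        texts.append(ocr(page))
    return ScanResult(text="\n\n".join(texts), pages=processed)


# Stub stages that only record what ran; real ones would transform the image.
def scan(page: Page) -> Page:
    return {**page, "steps": page["steps"] + ["scan"]}


def enhance(page: Page) -> Page:
    return {**page, "steps": page["steps"] + ["enhance"]}


def fake_ocr(page: Page) -> str:
    return f"text of page {page['index']}"


pages = [{"index": i, "steps": []} for i in range(2)]
result = run_pipeline(pages, [scan, enhance], fake_ocr)
print(result.pages[0]["steps"])  # ['scan', 'enhance']
```

Because each stage is just a callable that takes and returns a page, a stage can be swapped or skipped by editing the `stages` list, without changing the loop.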
- Modularity: swap or upgrade any stage (e.g., CRAFT/EAST for text detection) without touching others.
- Reproducibility: deterministic behavior via config + Docker.
- Performance: lightweight CV ops with smart fallbacks; page-wise parallelization is easy to add.
- Integrations: clear API boundaries make it straightforward to plug into CRMs, EHRs, RPA, or MLOps jobs.
Directory tree:

    NeuroScan-AI/
    ├─ app/
    │  ├─ pipelines/
    │  │  ├─ scan.py
    │  │  ├─ enhance.py
    │  │  └─ ocr.py
    │  ├─ utils/
    │  │  ├─ geometry.py
    │  │  ├─ image_io.py
    │  │  └─ pdf.py
    │  ├─ config.py
    │  ├─ schemas.py
    │  └─ main.py
    ├─ web/
    │  └─ streamlit_app.py
    ├─ reports/
    │  └─ architecture.png
    ├─ tests/
    │  ├─ test_geometry.py
    │  └─ test_scan_pipeline.py
    ├─ requirements.txt
    ├─ Dockerfile
    ├─ .env.example
    └─ README.md

NeuroScan-AI is powered by a carefully engineered technology stack that blends state-of-the-art computer vision, OCR, and API frameworks to deliver precision, speed, and modular scalability.
Every component has been hand-picked so that the system behaves consistently across environments, whether you're an AI researcher or a production engineer deploying in the cloud.
| Layer | Technology | Purpose & Role |
|---|---|---|
| Frontend | Streamlit | A lightweight, elegant interface for real-time document uploads, visualization, and download. Enables an intuitive experience for non-technical users. |
| Backend | FastAPI | The heart of NeuroScan-AI: orchestrates scanning, enhancement, and OCR pipelines while exposing clean REST endpoints. |
| OCR Engine | Tesseract OCR | Responsible for the actual text extraction and searchable PDF generation, supporting multilingual configurations and layout-aware recognition. |
| Vision Core | OpenCV, NumPy, scikit-image | Handles image processing: perspective correction, contrast balancing, illumination normalization, and deskewing. |
| File Management | pdf2image, Pillow | Manages high-resolution PDF page conversion, encoding, and seamless I/O operations. |
| Containerization | Docker | Provides a reproducible runtime with all dependencies, ensuring consistent OCR and image-processing results on any system. |
| Testing Framework | pytest | Verifies reliability and reproducibility across modules with automated unit and pipeline tests. |

Every tool was chosen not for trend, but for its precision, maturity, and compatibility in building intelligent vision pipelines.
Setting up NeuroScan-AI takes only a few minutes, whether you're running it locally for research or deploying it as a containerized microservice.

    git clone https://github.com/mwasifanwar/NeuroScan-AI.git
    cd NeuroScan-AI

Before running NeuroScan-AI, install all the required Python dependencies.
These include powerful libraries such as OpenCV, Tesseract bindings, and FastAPI that together drive the document processing pipeline.

    pip install -r requirements.txt

This command installs the Python packages for image enhancement, OCR bindings, and backend orchestration. The Tesseract executable itself is a separate system install, which is why its path is configured in the environment setup.
Before running NeuroScan-AI, you'll need to configure the environment so that the OCR and enhancement pipelines function correctly across all systems.
This setup ensures that Tesseract OCR knows where to locate its executable file, and that the pipeline uses the correct OCR settings.
Start by copying the example configuration file into your working directory:

    cp .env.example .env

Step 2: Edit the .env File
Open the .env file and configure the following settings according to your system:

    TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
    OCR_LANG=eng
    OCR_OEM=3
    OCR_PSM=6

Explanation of Parameters:
- TESSERACT_CMD: The path to your local Tesseract executable. Required for Windows users (on Linux/macOS, it's usually auto-detected).
- OCR_LANG: Language(s) for text extraction. Default: English (eng).
- OCR_OEM: OCR Engine Mode:
  - 0: Legacy Engine
  - 1: Neural Nets LSTM Engine
  - 2: Combined Legacy + LSTM
  - 3: Default Auto-Select (Recommended)
- OCR_PSM: Page Segmentation Mode (layout assumption). Common values: 3 (auto layout), 6 (single uniform block), 11 (sparse text).
Tip
You can enable multilingual OCR easily by combining languages: OCR_LANG=eng+deu+fra
This example enables English, German, and French recognition simultaneously, provided the corresponding Tesseract language data files are installed. It is well suited to international document scanning and multilingual archives.
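To show how these settings might be consumed, here is a minimal sketch that reads them from the environment, validates them, and turns them into Tesseract command-line flags. The README says the project uses Pydantic models for this; the sketch uses a standard-library dataclass instead so that it runs without extra packages, and the `OcrSettings` class and its methods are hypothetical. The `-l`, `--oem`, and `--psm` flags are Tesseract's own command-line options.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class OcrSettings:
    """OCR options read from the environment (names follow the .env example)."""
    lang: str = "eng"
    oem: int = 3
    psm: int = 6

    def __post_init__(self):
        if self.oem not in range(4):
            raise ValueError(f"OCR_OEM must be 0-3, got {self.oem}")
        if self.psm not in range(14):
            raise ValueError(f"OCR_PSM must be 0-13, got {self.psm}")
        # "eng+deu+fra" is valid; "eng+" or "" leaves an empty language code.
        if any(not code for code in self.lang.split("+")):
            raise ValueError(f"OCR_LANG has an empty language code: {self.lang!r}")

    @classmethod
    def from_env(cls, env=None):
        """Build settings from a mapping, defaulting to os.environ."""
        env = os.environ if env is None else env
        return cls(
            lang=env.get("OCR_LANG", "eng"),
            oem=int(env.get("OCR_OEM", "3")),
            psm=int(env.get("OCR_PSM", "6")),
        )

    def tesseract_args(self):
        """Command-line flags understood by the tesseract executable."""
        return ["-l", self.lang, "--oem", str(self.oem), "--psm", str(self.psm)]


settings = OcrSettings.from_env({"OCR_LANG": "eng+deu+fra", "OCR_OEM": "3", "OCR_PSM": "6"})
print(settings.tesseract_args())
# ['-l', 'eng+deu+fra', '--oem', '3', '--psm', '6']
```

Validating at load time means a mistyped value in `.env` fails immediately with a clear message, before any page reaches the OCR stage.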
NeuroScan-AI can operate in two modes: a Visual Mode for intuitive, real-time interaction and a Headless Mode for integration into large-scale automation pipelines or backend systems.
Choose the mode that best fits your workflow.
Launch the Streamlit web interface for real-time document scanning, enhancement, and OCR execution.

    streamlit run web/streamlit_app.py

Once started, the app opens automatically in your browser, providing an interactive experience that requires no coding.
Features at a Glance

| Feature | Description |
|---|---|
| Drag & Drop Uploads | Upload images or PDFs directly from your desktop into the interface. |
| Instant Preview | Compare Original vs Enhanced versions side-by-side with smooth, live rendering. |
| Dynamic OCR Toggle | Enable or disable text extraction instantly without restarting the process. |
| One-Click Downloads | Export processed outputs as searchable PDFs or enhanced PNG ZIPs. |

Perfect for students, researchers, developers, and data engineers who want a hands-on, visual experience without writing a single line of code.
NeuroScan-AI provides a clean, well-documented REST API built with FastAPI, enabling developers to seamlessly integrate document scanning, enhancement, and OCR features into their own systems, workflows, or applications.
Each endpoint is lightweight, asynchronous, and fully compatible with JSON responses, allowing smooth integration with any frontend, automation tool, or enterprise backend.
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Performs a quick API health check: verifies that the service is running and reachable. |
| /scan | POST | Accepts an image or PDF upload, performs the complete pipeline (Scan → Enhance → OCR), and returns structured text output along with metadata such as image dimensions and processing time. |
| /scan/pdf | POST | Converts uploaded documents into fully searchable PDFs, embedding OCR text layers directly into the file while preserving the original visual layout. |


    curl -X POST "http://localhost:8000/scan" \
      -F "file=@document.jpg" \
      -F "ocr=true" \
      -F "lang=eng"

Response: Returns a JSON object containing:
- Extracted text (OCR output)
- Metadata (dimensions, page count, execution time)
- Optional base64 or file download links (depending on configuration)
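For callers who prefer Python to curl, the same request can be sketched with the standard library alone. The form field names (`file`, `ocr`, `lang`) and the URL come from the curl example; the helper names `encode_multipart`, `build_scan_request`, and `send` are hypothetical. Since the exact keys of the JSON reply are not specified here, the sketch simply returns the parsed object.

```python
import json
import urllib.request
import uuid


def encode_multipart(fields, filename, file_bytes, content_type="application/octet-stream"):
    """Encode text fields plus one uploaded file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode(),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        f"Content-Type: {content_type}".encode(),
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_scan_request(base_url, filename, file_bytes, ocr=True, lang="eng"):
    """Build (but do not send) a POST request mirroring the curl example."""
    fields = {"ocr": "true" if ocr else "false", "lang": lang}
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        f"{base_url}/scan",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def send(request):
    """Send the request and parse the JSON reply; needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_scan_request("http://localhost:8000", "document.jpg", b"<image bytes>")
print(request.get_method(), request.full_url)
# POST http://localhost:8000/scan
```

In practice you would read the file with `open("document.jpg", "rb").read()` and call `send(request)` once the API is running locally. A third-party HTTP client would shorten the multipart handling considerably.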
Muhammad Wasif
AI/ML Developer @ Effixly AI
*If perception is the first step toward intelligence, NeuroScan-AI is where machines begin to truly see.*
