NeuroScan-AI is an advanced document-understanding engine built with modern computer vision and OCR pipelines. It performs smart perspective correction, illumination normalization, and adaptive enhancement to transform raw camera captures into clean, searchable, professional-grade documents.

mwasifanwar/NeuroScan-AI


NeuroScan-AI

An Advanced Document-Intelligence Engine built for AI Engineers by Muhammad Wasif Anwar


“Turning raw pixels into readable intelligence.”

NeuroScan-AI is a next-generation document-understanding system that bridges the gap between computer vision, OCR intelligence, and AI-driven image enhancement.
Designed with precision and modularity, it empowers engineers, researchers, and enterprises to transform raw document captures into structured, searchable, and professional-grade digital assets.


What Makes It Different

  • End-to-End Intelligence: Automates the full workflow: Scan → Enhance → OCR → Export.
  • AI-Enhanced Vision: Uses adaptive illumination correction, geometric perspective repair, and unsharp filtering to improve OCR clarity.
  • Seamless Dual Interface: Offers both an interactive Streamlit UI and a robust FastAPI backend for integration into enterprise systems.
  • Containerized Precision: Every component runs in an isolated, reproducible Docker environment, ensuring consistent behavior across setups.
  • Engineer-First Architecture: Built with clear module separation, making pipelines reusable and extensible for any vision or OCR-based AI project.

๐Ÿ—๏ธ Core Mission

๐Ÿง  โ€œTo redefine how machines perceive documents โ€” not as flat images, but as structured, interpretable intelligence.โ€

NeuroScan-AI isnโ€™t just a scanner โ€” itโ€™s a document perception engine.
Every pixel processed through it passes through a scientifically tuned visual pipeline โ€” enhancing edges, reducing noise, balancing illumination, and extracting text with the same fidelity as a human eye trained on detail.


Crafted For

| User Type | Purpose |
|---|---|
| AI Engineers | Integrate OCR & enhancement modules in ML pipelines |
| Developers | Deploy production-ready document processing APIs |
| Researchers | Digitize academic papers or experiment datasets |
| Organizations | Automate document archiving with precision scanning |
| Data Scientists | Build structured datasets from unstructured inputs |

Overview

NeuroScan-AI is not just another scanner; it's an intelligent document-perception system engineered to help computers interpret paper documents.
Every scan is treated as meaningful data and processed through a chain of AI-powered transformations that make even imperfect captures look professionally digitized.

Built from the ground up using OpenCV, scikit-image, and Tesseract OCR, it merges the precision of computer vision with the adaptability of modern AI engineering.
Whether you're scanning invoices, legal documents, research papers, or handwritten notes, NeuroScan-AI detects the document frame, corrects distortions, balances lighting, and extracts clean, searchable text.

What sets it apart is its dual-interface ecosystem:

  • A sleek Streamlit UI for instant, no-code scanning and previewing.
  • A powerful FastAPI backend for developers who want to integrate document intelligence into their own platforms.

All of this is containerized using Docker, so you can deploy it anywhere, from your local workstation to an enterprise cloud, with consistent behavior and no manual dependency setup.
The result is a research-grade yet production-ready solution that feels effortless to use but is deeply engineered underneath.


Key Features

Intelligent Document Processing

A three-stage perception pipeline (Scan → Enhance → OCR) built for precision and repeatability.
Each stage is modular, allowing you to use them independently or together within your own projects.

Perspective Correction & Deskewing

Automatically identifies document edges and corrects geometry in real time.
No matter the angle or lighting of your photo, NeuroScan-AI re-shapes it into a perfectly rectangular, printable scan.
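
The geometry behind this step can be sketched without any imaging library. The example below is a minimal illustration, assuming the four detected corners arrive in arbitrary order as `(x, y)` pixel coordinates. `order_points` and `target_size` are hypothetical names for this sketch and may not match the functions in `app/utils/geometry.py`.

```python
import math

def order_points(pts):
    """Return corners as [top-left, top-right, bottom-right, bottom-left].

    Image coordinates are assumed: x grows to the right, y grows downward.
    """
    tl = min(pts, key=lambda p: p[0] + p[1])  # smallest x + y
    br = max(pts, key=lambda p: p[0] + p[1])  # largest x + y
    tr = min(pts, key=lambda p: p[1] - p[0])  # smallest y - x
    bl = max(pts, key=lambda p: p[1] - p[0])  # largest y - x
    return [tl, tr, br, bl]

def target_size(ordered):
    """Width and height of the rectified page, taken from the longer edges."""
    tl, tr, br, bl = ordered
    width = max(math.dist(tl, tr), math.dist(bl, br))
    height = max(math.dist(tl, bl), math.dist(tr, br))
    return round(width), round(height)

# Corners of a slightly tilted page, in arbitrary order (illustrative values).
corners = [(200, 10), (10, 20), (190, 300), (5, 310)]
ordered = order_points(corners)
size = target_size(ordered)
```

With OpenCV, the ordered corners and a destination rectangle of this size would typically be passed to `cv2.getPerspectiveTransform`, and the resulting matrix to `cv2.warpPerspective`.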

Adaptive Illumination Balancing

Uneven lighting, glare, and shadowed edges are normalized using morphological and contrast-based enhancement, giving you crisp, uniformly lit outputs suitable for OCR and archiving.
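
One common way to do morphological illumination correction is to estimate the paper background with a grayscale closing and then divide it out. The toy sketch below shows that idea on a single row of pixels in pure Python. It is an assumption about the general technique, not the repository's implementation, which the README says is built on OpenCV and scikit-image. All function names here are hypothetical.

```python
def _window(row, i, radius):
    """Pixels within `radius` of index i, truncated at the row borders."""
    return row[max(0, i - radius): i + radius + 1]

def dilate(row, radius):
    """Grayscale dilation: each pixel becomes the local maximum."""
    return [max(_window(row, i, radius)) for i in range(len(row))]

def erode(row, radius):
    """Grayscale erosion: each pixel becomes the local minimum."""
    return [min(_window(row, i, radius)) for i in range(len(row))]

def estimate_background(row, radius=1):
    """Closing (dilate, then erode) removes dark strokes narrower than the window."""
    return erode(dilate(row, radius), radius)

def normalize_illumination(row, radius=1):
    """Divide each pixel by the estimated background and rescale to 0..255."""
    background = estimate_background(row, radius)
    return [min(255, round(255 * p / max(b, 1))) for p, b in zip(row, background)]

# Paper that gets brighter from left to right, with two dark ink pixels.
row = [100, 110, 30, 130, 140, 40, 160, 170]
background = estimate_background(row)
normalized = normalize_illumination(row)
```

After normalization the paper pixels sit at or near white regardless of the lighting gradient, while the two ink pixels remain dark. A real image would use a 2D structuring element larger than the stroke width.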

Contrast & Unsharp Enhancement

Fine-tuned filtering boosts clarity and edge definition, ensuring even faint or faded text becomes legible without introducing noise or over-sharpening artifacts.
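
Unsharp masking adds back a scaled copy of the difference between the image and a blurred version of it: `sharpened = original + amount * (original - blurred)`. The sketch below applies that formula to one row of pixels. It uses a 3-tap box blur in place of the Gaussian blur that is more usual in practice, and `box_blur` and `unsharp_mask` are hypothetical names for this illustration.

```python
def box_blur(row):
    """3-tap box blur with the border pixels repeated at the edges."""
    last = len(row) - 1
    return [
        (row[max(i - 1, 0)] + row[i] + row[min(i + 1, last)]) / 3
        for i in range(len(row))
    ]

def unsharp_mask(row, amount=1.0):
    """Apply original + amount * (original - blurred), clipped to 0..255."""
    blurred = box_blur(row)
    return [
        min(255, max(0, round(p + amount * (p - b))))
        for p, b in zip(row, blurred)
    ]

# A soft step edge from dark (50) to light (200).
edge = [50, 50, 50, 200, 200, 200]
strong = unsharp_mask(edge, amount=1.0)  # [50, 50, 0, 250, 200, 200]
gentle = unsharp_mask(edge, amount=0.5)  # [50, 50, 25, 225, 200, 200]
```

The overshoot on both sides of the edge is what makes text look crisper. A smaller `amount` reduces that overshoot, which is one way to limit over-sharpening artifacts.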

Searchable PDF Generation

Beyond simple text extraction, NeuroScan-AI embeds recognized text directly into PDFs, making them fully searchable while preserving the original layout and structure.

Multi-Page PDF Support

Upload full documents: the system automatically splits, enhances, and runs OCR on every page, then stitches the pages back into a single text-searchable PDF file.

Powerful Utility Layer

Includes modular utility classes for geometry processing, image I/O, and PDF handling, allowing researchers and developers to extend functionality with ease.

Tested & Reliable

Each core component is unit-tested for consistency, ensuring reproducible results whether you run it once or a thousand times in batch mode.

๐Ÿณ Dockerized & Portable

Fully containerized with environment isolation, making setup frictionless and deployment instantaneous โ€” from personal laptops to enterprise servers.

Dual-Mode Access

Use it visually through Streamlit or programmatically through FastAPI endpoints.
Scan, enhance, and extract text either interactively or as part of your automated pipelines.


In essence, NeuroScan-AI turns imperfect, real-world documents into structured, searchable intelligence, bringing clarity where there was once only clutter.

System Architecture

NeuroScan-AI follows a clean, layered architecture that separates experience, orchestration, pipelines, and utilities for reliability and easy extension. The diagram below shows how a request flows from the Streamlit UI (or API client) through the Orchestrator into specialized Scan / Enhance / OCR stages, with shared Utils and Config/Schemas ensuring consistency, all inside a Docker runtime.

[Figure: NeuroScan-AI Architecture]

Components at a Glance

  • Streamlit UI (Frontend): Drag-and-drop uploads, side-panel controls (OCR lang/OEM/PSM), live previews (original vs scanned), and one-click downloads (searchable PDF / PNG ZIP).
  • Orchestrator (Backend Library): Validates inputs, loads config, and dispatches each page to the right pipeline stage: Scan → Enhance → OCR; aggregates multi-page results.
  • Pipelines
    • Scan Pipeline: Edge detection, 4-point perspective transform, auto-deskew fallback when no quad is found.
    • Enhance Pipeline: Illumination correction (background removal), CLAHE, adaptive binarization, and unsharp mask for OCR clarity.
    • OCR Pipeline: Tesseract OCR with configurable lang / OEM / PSM and searchable PDF generation.
  • Utility Layer: Reusable helpers for:
    • Geometry (order points, perspective transform, deskew)
    • Image I/O (safe decode/encode, RGB↔BGR)
    • PDF Handling (pdf2image conversion, multi-page orchestration)
  • Configuration & Schemas: Centralized settings (env-driven) and Pydantic models that enforce types/validation across UI and API.
  • Docker Container: Bakes the exact runtime (Tesseract, Poppler, OpenCV libs) for reproducible results on any machine or cloud node.

Request → Result (Data Flow)

  1. Upload: User drops an image/PDF in Streamlit (or sends to FastAPI).
  2. Split/Decode: PDFs become per-page RGB images; images are decoded safely.
  3. Scan: Document borders are detected; perspective corrected; deskew fallback if needed.
  4. Enhance: Illumination normalized → contrast equalized → adaptive thresholding → unsharp mask.
  5. OCR: Text recognized (multi-language ready); searchable PDF pages produced.
  6. Aggregate: Text concatenated; pages merged; artifacts packaged (PDF/ZIP).
  7. Deliver: Results streamed back to UI/API with sizes, previews, and downloads.
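
The per-page flow in steps 3 to 6 can be sketched as a small orchestration function. This is an illustrative outline only: `run_pipeline` and `PageResult` are hypothetical names, and the three stages are passed in as plain callables so that stand-ins can replace the real Scan, Enhance, and OCR code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

@dataclass
class PageResult:
    image: Any  # enhanced page image (type depends on the stage functions)
    text: str   # text recognized on that page

def run_pipeline(
    pages: Iterable[Any],
    scan: Callable[[Any], Any],
    enhance: Callable[[Any], Any],
    ocr: Callable[[Any], str],
) -> Tuple[List[PageResult], str]:
    """Run Scan -> Enhance -> OCR on every page, then aggregate the text."""
    results = []
    for page in pages:
        scanned = scan(page)         # step 3: perspective correction / deskew
        enhanced = enhance(scanned)  # step 4: illumination, contrast, sharpening
        results.append(PageResult(image=enhanced, text=ocr(enhanced)))
    full_text = "\n\n".join(r.text for r in results)  # step 6: concatenate text
    return results, full_text

# Stand-in stages that operate on strings, just to show the call order.
results, full_text = run_pipeline(
    [" page one ", "page two"],
    scan=str.strip,
    enhance=str.upper,
    ocr=lambda img: f"text:{img}",
)
```

Because each stage is an independent callable, any one of them can be swapped without touching the others.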

Why this layout?

  • Modularity → swap or upgrade any stage (e.g., CRAFT/EAST for text detection) without touching others.
  • Reproducibility → deterministic behavior via config + Docker.
  • Performance → lightweight CV ops with smart fallbacks; page-wise parallelization is easy to add.
  • Integrations → clear API boundaries make it trivial to plug into CRMs, EHRs, RPA, or MLOps jobs.

Directory Structure

    NeuroScan-AI/
    ├─ app/
    │  ├─ pipelines/
    │  │  ├─ scan.py
    │  │  ├─ enhance.py
    │  │  └─ ocr.py
    │  ├─ utils/
    │  │  ├─ geometry.py
    │  │  ├─ image_io.py
    │  │  └─ pdf.py
    │  ├─ config.py
    │  ├─ schemas.py
    │  └─ main.py
    ├─ web/
    │  └─ streamlit_app.py
    ├─ reports/
    │  └─ architecture.png
    ├─ tests/
    │  ├─ test_geometry.py
    │  └─ test_scan_pipeline.py
    ├─ requirements.txt
    ├─ Dockerfile
    ├─ .env.example
    └─ README.md

Tech Stack & Installation Guide

NeuroScan-AI is powered by a carefully chosen technology stack that combines computer vision, OCR, and API frameworks to deliver precision, speed, and modular scalability.
Each component was selected so that the system behaves consistently across environments, whether you're an AI researcher or a production engineer deploying in the cloud.


Core Technologies

| Layer | Technology | Purpose & Role |
|---|---|---|
| Frontend | Streamlit | A lightweight, elegant interface for real-time document uploads, visualization, and download. Enables an intuitive experience for non-technical users. |
| Backend | FastAPI | The heart of NeuroScan-AI: orchestrates scanning, enhancement, and OCR pipelines while exposing clean REST endpoints. |
| OCR Engine | Tesseract OCR | Responsible for the actual text extraction and searchable PDF generation, supporting multilingual configurations and layout-aware recognition. |
| Vision Core | OpenCV, NumPy, scikit-image | Handles image processing: perspective correction, contrast balancing, illumination normalization, and deskewing. |
| File Management | pdf2image, Pillow | Manages high-resolution PDF page conversion, encoding, and seamless I/O operations. |
| Containerization | Docker | Provides reproducible runtime with all dependencies, ensuring identical OCR and image-processing results on any system. |
| Testing Framework | pytest | Guarantees reliability and reproducibility across modules with automated unit and pipeline tests. |

Every tool was chosen not for trend, but for its precision, maturity, and compatibility in building intelligent vision pipelines.


Installation & Setup

Setting up NeuroScan-AI takes only a few minutes, whether you're running it locally for research or deploying it as a containerized microservice.


Step 1: Clone the Repository

git clone https://github.com/mwasifanwar/NeuroScan-AI.git
cd NeuroScan-AI

Step 2: Install Dependencies

Before running NeuroScan-AI, install all the required Python dependencies.
These include powerful libraries such as OpenCV, Tesseract bindings, and FastAPI that together drive the document processing pipeline.

pip install -r requirements.txt

This command installs the Python packages for image enhancement, OCR bindings, and backend orchestration. The Tesseract executable and Poppler (required by pdf2image) are system-level tools and must be installed separately when you are not using the Docker image.

Step 3: Configure the Environment

Before running NeuroScan-AI, you'll need to configure the environment so that the OCR and enhancement pipelines function correctly across all systems.
This setup ensures that Tesseract OCR knows where to locate its executable file, and that the pipeline uses the correct OCR settings.


1. Copy the Sample Environment File

Start by copying the example configuration file into your working directory:

cp .env.example .env

2. Edit the .env File

Open the .env file and configure the following settings according to your system:

TESSERACT_CMD=C:\Program Files\Tesseract-OCR\tesseract.exe
OCR_LANG=eng
OCR_OEM=3
OCR_PSM=6

Explanation of Parameters:

  • TESSERACT_CMD → The path to your local Tesseract executable. Required for Windows users (for Linux/macOS, it's usually auto-detected).
  • OCR_LANG → Language(s) for text extraction. Default: English (eng).
  • OCR_OEM → OCR Engine Mode:
    • 0 → Legacy Engine
    • 1 → Neural Nets LSTM Engine
    • 2 → Combined Legacy + LSTM
    • 3 → Default Auto-Select (Recommended)
  • OCR_PSM → Page Segmentation Mode (layout assumption). Common values: 3 (auto layout), 6 (single uniform block), 11 (sparse text).

Tip

You can enable multilingual OCR easily by combining languages: OCR_LANG=eng+deu+fra

This example enables English, German, and French recognition simultaneously, which suits international document scanning and multilingual archives. Each language listed needs its Tesseract traineddata file installed.
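
As a rough illustration of how these four variables could be consumed, the sketch below reads them from a mapping and builds the option string that Tesseract expects. `OcrSettings` and `load_settings` are hypothetical names, and the fallback values simply mirror the sample `.env` above. The actual `app/config.py` may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

@dataclass(frozen=True)
class OcrSettings:
    tesseract_cmd: Optional[str]  # None means "rely on the system PATH"
    lang: str
    oem: int
    psm: int

    def languages(self) -> List[str]:
        """Split a combined value such as 'eng+deu+fra' into language codes."""
        return self.lang.split("+")

    def tesseract_config(self) -> str:
        """Command-line options in the form Tesseract accepts."""
        return f"--oem {self.oem} --psm {self.psm}"

def load_settings(env: Mapping[str, str] = os.environ) -> OcrSettings:
    """Read OCR settings from env; fallbacks are assumptions, not repo defaults."""
    oem = int(env.get("OCR_OEM", "3"))
    psm = int(env.get("OCR_PSM", "6"))
    if oem not in range(4):   # Tesseract engine modes are 0..3
        raise ValueError(f"OCR_OEM must be 0-3, got {oem}")
    if psm not in range(14):  # Tesseract page segmentation modes are 0..13
        raise ValueError(f"OCR_PSM must be 0-13, got {psm}")
    return OcrSettings(
        tesseract_cmd=env.get("TESSERACT_CMD") or None,
        lang=env.get("OCR_LANG", "eng"),
        oem=oem,
        psm=psm,
    )

settings = load_settings({"OCR_LANG": "eng+deu+fra", "OCR_OEM": "1", "OCR_PSM": "11"})
```

With pytesseract, these values would typically be used as `pytesseract.image_to_string(image, lang=settings.lang, config=settings.tesseract_config())`, after setting `pytesseract.pytesseract.tesseract_cmd` when a custom path is configured.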

Step 4: Run NeuroScan-AI

NeuroScan-AI can operate in two modes:
a Visual Mode for intuitive, real-time interaction and a Headless Mode for integration into large-scale automation pipelines or backend systems.
Choose the mode that best fits your workflow.


Streamlit Frontend (Visual Mode)

Launch the Streamlit web interface for real-time document scanning, enhancement, and OCR execution.

streamlit run web/streamlit_app.py

Once started, the app opens automatically in your browser, providing an interactive experience that requires no coding.

Features at a Glance

| Feature | Description |
|---|---|
| Drag & Drop Uploads | Upload images or PDFs directly from your desktop into the interface. |
| Instant Preview | Compare Original vs Enhanced versions side-by-side with smooth, live rendering. |
| Dynamic OCR Toggle | Enable or disable text extraction instantly without restarting the process. |
| One-Click Downloads | Export processed outputs as searchable PDFs or enhanced PNG ZIPs. |

Perfect for students, researchers, developers, and data engineers who want a hands-on, visual experience without writing a single line of code.

API Endpoints

NeuroScan-AI provides a clean, well-documented REST API built with FastAPI, enabling developers to seamlessly integrate document scanning, enhancement, and OCR features into their own systems, workflows, or applications.

Each endpoint is lightweight, asynchronous, and fully compatible with JSON responses, allowing smooth integration with any frontend, automation tool, or enterprise backend.


| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Performs a quick API health check: verifies that the service is running and reachable. |
| /scan | POST | Accepts an image or PDF upload, performs the complete pipeline (Scan → Enhance → OCR), and returns structured text output along with metadata such as image dimensions and processing time. |
| /scan/pdf | POST | Converts uploaded documents into fully searchable PDFs, embedding OCR text layers directly into the file while preserving the original visual layout. |

Example Usage (via cURL)

curl -X POST "http://localhost:8000/scan" \
  -F "file=@document.jpg" \
  -F "ocr=true" \
  -F "lang=eng"

Response: Returns a JSON object containing:

  • Extracted text (OCR output)
  • Metadata (dimensions, page count, execution time)
  • Optional base64 or file download links (depending on configuration)
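
The same request can be made from Python using only the standard library. The sketch below mirrors the cURL call: the form fields `file`, `ocr`, and `lang` and the `http://localhost:8000` address come from that example, while `build_multipart` and `scan_document` are hypothetical helper names. The JSON keys in the response are not documented here, so the function returns the parsed object as-is.

```python
import json
import os
import urllib.request
import uuid

def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream'
        f"\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def scan_document(path, base_url="http://localhost:8000", ocr=True, lang="eng"):
    """POST a file to /scan and return the decoded JSON response."""
    with open(path, "rb") as fh:
        data = fh.read()
    body, content_type = build_multipart(
        {"ocr": str(ocr).lower(), "lang": lang},
        "file",
        os.path.basename(path),
        data,
    )
    request = urllib.request.Request(
        f"{base_url}/scan",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `scan_document("document.jpg")` against a running server is then equivalent to the cURL command above.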


Author

Muhammad Wasif
AI/ML Developer @ Effixly AI

*If perception is the first step toward intelligence, NeuroScan-AI is where machines begin to truly see.*

