This application extracts four components from a cheque image (signature, receiver, account, and amount) using YOLO object detection. It can also run OCR on the account number.
- Features
- Architecture
- Prerequisites
- Setup
- Running the Application
- API Usage
- Frontend Usage
- Environment Variables
- Project Structure
- Model Requirements
- Troubleshooting
- Object detection for cheque components using YOLO
- OCR for account number recognition using TrOCR
- FastAPI backend with Celery for asynchronous processing
- Modern web frontend with responsive design
- Docker support for easy deployment
- Health checks and logging
- Download functionality for extracted images
The application follows a microservices architecture with the following components:
- Frontend Service: Serves the web interface for user interaction
- Backend API Service: FastAPI application that handles requests and orchestrates processing
- Celery Worker: Processes cheque extraction tasks asynchronously
- Redis: Message broker for Celery and result storage
- YOLO Model: Custom-trained model for cheque component detection
- TrOCR Model: Pre-trained model for account number OCR
- Python 3.10+
- Redis server
- Docker and Docker Compose (for containerized deployment)
- At least 4GB RAM (due to machine learning models)
- YOLO model file (models/YOLOfinetuned.pt)
- Install dependencies:
  pip install -e .
- Or install from requirements.txt:
  pip install -r requirements.txt
- Build the Docker images (this command also starts the services once the build finishes):
  docker compose up --build
- Start all services without rebuilding:
  docker compose up
You need to start each service separately:
- Start Redis server:
  redis-server
- Start Celery worker:
  celery -A src.web.backend.celery_worker worker --loglevel=info
- Start the FastAPI application:
  gunicorn src.web.backend.api:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 --log-level info
- Start the frontend server:
  python main.py
docker compose up
This will start all services:
- Redis database on port 6379
- FastAPI backend on port 8000
- Celery worker for task processing
- Frontend server on port 8080
Access the application at:
- Frontend: http://localhost:8080
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
./start_all.sh
This script starts all services in the background and provides process management.
Once the backend is running, you can access:
- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
- Cheque Extraction: http://localhost:8000/extract
The /extract endpoint accepts:
- file: The cheque image file (PNG, JPG, JPEG, TIFF, BMP, GIF)
- perform_ocr: Boolean flag to enable OCR on the account number
- X-API-KEY: Header with your API key (default: f64bdf6ae22c46efa50b0a98c322ded4)
Example using curl:
curl -X POST "http://localhost:8000/extract" \
-H "X-API-KEY: f64bdf6ae22c46efa50b0a98c322ded4" \
-F "file=@/path/to/cheque.jpg" \
-F "perform_ocr=true"
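For scripted use, the same request can be assembled from Python. The following is a minimal sketch using only the standard library; the function name `build_extract_request` and the `SUPPORTED_EXTENSIONS` set are illustrative and not part of the project, and the extension check simply mirrors the file types listed above. The function builds the request without sending it.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

# File types listed for the /extract endpoint (hypothetical client-side check).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".gif"}


def build_extract_request(image_path, image_bytes, api_key, perform_ocr=True,
                          base_url="http://localhost:8000"):
    """Build (but do not send) a multipart POST to /extract."""
    name = Path(image_path).name
    if Path(name).suffix.lower() not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {name}")

    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
    ocr_value = "true" if perform_ocr else "false"

    # Two form parts, matching the curl example: "file" and "perform_ocr".
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{name}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        image_bytes,
        b"\r\n",
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="perform_ocr"\r\n\r\n',
        ocr_value.encode(),
        b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])

    return urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        method="POST",
        headers={
            "X-API-KEY": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned object to `urllib.request.urlopen`. The shape of the response is not described here, so inspect it against http://localhost:8000/docs before parsing it.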
The web interface walks through cheque processing in a few steps:
- Navigate to http://localhost:8080
- Click "Choose a cheque image" to select a cheque image file
- Optionally check "Perform OCR on account number"
- Click "Extract Information"
- View the results, including:
  - Detected components with confidence scores
  - Cropped images for each component
  - Annotated cheque image
  - Account number (if OCR was performed)
- Download individual images or all images at once
The following environment variables can be set:
- API_KEY: Required API key for authentication (default: f64bdf6ae22c46efa50b0a98c322ded4)
- CELERY_BROKER_URL: Redis URL for Celery (default: redis://localhost:6379/0)
- CELERY_RESULT_BACKEND: Redis URL for Celery results (default: redis://localhost:6379/1)

When using Docker, these are set in the docker-compose.yml file.
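As an illustration of how these variables and their defaults could be read in Python, here is a minimal sketch. The `Settings` class and `load_settings` function are hypothetical names, not the project's actual configuration code; only the variable names and default values come from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    celery_broker_url: str
    celery_result_backend: str


def load_settings(env=None):
    """Read the three documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        api_key=env.get("API_KEY", "f64bdf6ae22c46efa50b0a98c322ded4"),
        celery_broker_url=env.get("CELERY_BROKER_URL", "redis://localhost:6379/0"),
        celery_result_backend=env.get("CELERY_RESULT_BACKEND", "redis://localhost:6379/1"),
    )
```

Calling `load_settings()` with no argument reads from the process environment; passing a plain dict makes the function easy to exercise without touching real environment variables.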
.
├── src/
│   ├── models/
│   │   ├── detection/
│   │   │   └── detect.py          # YOLO detection logic
│   │   └── ocr/
│   │       └── ocr.py             # OCR processing logic
│   └── web/
│       ├── backend/
│       │   ├── api.py             # FastAPI application
│       │   ├── celery_worker.py   # Celery configuration
│       │   ├── schemas.py         # Pydantic models
│       │   └── tasks.py           # Celery tasks
│       └── frontend/
│           ├── index.html         # Main page
│           ├── styles.css         # Styling
│           └── script.js          # Frontend logic
├── models/
│   └── YOLOfinetuned.pt           # YOLO model file (not included in repo)
├── logs/                          # Application logs
├── main.py                        # Frontend server
├── start_all.sh                   # Startup script
├── Dockerfile                     # Docker configuration
├── docker-compose.yml             # Docker Compose configuration
├── requirements.txt               # Python dependencies
└── pyproject.toml                 # Project metadata and dependencies
The application requires two machine learning models:
- YOLO Model: Custom-trained model for cheque component detection
  - File: models/YOLOfinetuned.pt
  - Place this file in the models/ directory before running the application
- TrOCR Model: Pre-trained model for account number OCR
  - Automatically downloaded during Docker build
  - For manual setup, the model will be downloaded on first run
- Permission denied when running start_all.sh:
  chmod +x start_all.sh
- Models not found:
  - Ensure models/YOLOfinetuned.pt exists
  - Check that the TrOCR model was downloaded successfully
- Docker services not starting:
  docker compose logs <service_name>
- Insufficient memory:
  - Allocate at least 4GB RAM to Docker
  - Close other memory-intensive applications
Each service includes health checks:
- Backend: http://localhost:8000/health
Check health status:
curl http://localhost:8000/health -H "X-API-KEY: ..."
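The same check can be scripted, for example to wait for the backend before submitting cheques. This is a minimal standard-library sketch; the helper names `build_health_request` and `is_healthy` are illustrative, and it assumes that an HTTP 200 response means the backend is healthy, since the body of the /health response is not described here.

```python
import urllib.error
import urllib.request


def build_health_request(api_key, base_url="http://localhost:8000"):
    """Build a GET request for /health carrying the X-API-KEY header."""
    return urllib.request.Request(f"{base_url}/health", headers={"X-API-KEY": api_key})


def is_healthy(api_key, base_url="http://localhost:8000", timeout=5.0):
    """Return True on HTTP 200 (assumed to mean healthy), False on any error."""
    try:
        request = build_health_request(api_key, base_url)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses.
        return False
```

A startup script could call `is_healthy` in a loop with a short sleep between attempts until it returns True.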