The description provides a comprehensive overview of the project, including its purpose (industrial text recognition), functionality (OCR, preprocessing, networking), technical details (libraries, performance optimization), and benefits (automation, error reduction).

rkarahul/OCR-Based-Image-Processing-and-Text-Recognition-System

OCR-Based Image Processing and Text Recognition System

This project extracts and maps text from images using Optical Character Recognition (OCR). It preprocesses images to improve text readability, runs OCR with the TrOCR model, maps the detected text to a target format, and sends the results to a remote server over a REST API and sockets. The system is designed for industrial applications that require reliable text extraction.

Features

  • Image Preprocessing: Rotates, crops, and enhances images using OpenCV and PIL for optimal OCR performance.
  • OCR Processing: Utilizes the TrOCR model for accurate text extraction.
  • Text Mapping: Applies custom character mapping to format detected text.
  • Real-Time Communication: Uses socket programming for device interaction and FastAPI for server communication.
  • Data Logging: Saves results to CSV with timestamps and serial numbers.
  • Performance Optimization: Implements multithreading and thread pool execution for efficient processing.
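
The text-mapping and thread-pool features could look roughly like the sketch below. The character table, `map_text`, and `map_many` are hypothetical names chosen for illustration; the project's actual mapping rules are not listed in this README.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table: fixes a few characters OCR commonly confuses and
# drops spaces. The project's real mapping may differ.
CHAR_MAP = str.maketrans({"O": "0", "I": "1", "S": "5", " ": ""})


def map_text(raw: str) -> str:
    """Uppercase the OCR output, then apply the character table."""
    return raw.upper().translate(CHAR_MAP)


def map_many(raw_texts, max_workers=4):
    """Map several OCR strings on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(map_text, raw_texts))


if __name__ == "__main__":
    print(map_many(["ab1 2O", "s5I"]))
```

A thread pool pays off mainly for I/O-bound steps such as network calls or file reads; it is used on plain string mapping here only to keep the example small.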

Tech Stack

  • Languages: Python
  • Libraries: OpenCV, TrOCR (Hugging Face Transformers), PIL (Pillow), FastAPI, NumPy, Requests
  • Tools: Socket Programming, Multithreading, REST API, CSV
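
As an illustration of the socket side, here is a minimal sketch that sends one result as newline-terminated JSON. The framing, `send_result`, and `recv_result` are assumptions made for this example; the wire format the project actually uses is not documented here.

```python
import json
import socket


def send_result(sock: socket.socket, result: dict) -> None:
    """Send one result as a single JSON line (assumed framing)."""
    sock.sendall(json.dumps(result).encode("utf-8") + b"\n")


def recv_result(sock: socket.socket) -> dict:
    """Read until the newline terminator and decode the JSON message.

    Assumes only one message is in flight at a time (request/response).
    """
    buf = bytearray()
    while not buf.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("socket closed before a full message arrived")
        buf.extend(chunk)
    return json.loads(buf.decode("utf-8"))


if __name__ == "__main__":
    # A connected socket pair stands in for the device and the server.
    device, server = socket.socketpair()
    send_result(device, {"code": "ABC123", "cord": [0, 0, 10, 10]})
    print(recv_result(server))
    device.close()
    server.close()
```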

Installation

  1. Clone the repository:
    git clone https://github.com/rkarahul/OCR-Based-Image-Processing-and-Text-Recognition-System.git
    cd OCR-Based-Image-Processing-and-Text-Recognition-System
  2. Create a virtual environment and install dependencies:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
  3. Download the TrOCR model weights (if not automatically handled):
    # Follow instructions from https://huggingface.co/microsoft/trocr-large-printed
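
With the Hugging Face `transformers` library, the weights are normally downloaded and cached on the first call to `from_pretrained`. The sketch below shows one way to run the linked checkpoint on a single image. It assumes `transformers`, `torch`, and Pillow are installed; `recognize_text` is a hypothetical helper, not necessarily how `src/main.py` loads the model.

```python
MODEL_NAME = "microsoft/trocr-large-printed"  # checkpoint linked above


def recognize_text(image_path: str, model_name: str = MODEL_NAME) -> str:
    """Run TrOCR on one image file and return the decoded text."""
    # Imported lazily so this module loads even before the heavy
    # dependencies are installed.
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    # The first call downloads the weights into the local Hugging Face cache.
    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    generated_ids = model.generate(pixel_values)
    return processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

In a long-running service, the processor and model would typically be loaded once and reused rather than reloaded on every call.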

Usage

  1. Place an input image (e.g., org.bmp) in the data/ directory.
  2. Run the main script:
    python src/main.py
  3. View results in the console and check ocr_data.csv for logged data.
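
For reference, logging to ocr_data.csv with a serial number and timestamp might look like the sketch below. The column names and the `log_result` helper are assumptions; check the generated file for the layout the project actually writes.

```python
import csv
from datetime import datetime
from pathlib import Path

FIELDS = ["serial_no", "timestamp", "code"]  # assumed column names


def log_result(code: str, csv_path: str = "ocr_data.csv") -> int:
    """Append one row to the CSV and return its 1-based serial number."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0

    # The serial number continues from the rows already in the file.
    serial = 1
    if not is_new:
        with path.open(newline="") as f:
            serial = sum(1 for _ in csv.DictReader(f)) + 1

    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "serial_no": serial,
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "code": code,
        })
    return serial
```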

Example Output

{
    "code": "mapped_text_12345678901",
    "cord": [1030, 870, 2660, 3224]
}
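
A consumer of this output could parse it as sketched below. The sketch assumes "cord" holds a bounding box as [x_min, y_min, x_max, y_max] in pixels; the README does not state the order, so verify it against the source before relying on it. `parse_result` is a hypothetical helper.

```python
import json


def parse_result(payload: str) -> dict:
    """Decode one result and derive the box size.

    Assumes "cord" is [x_min, y_min, x_max, y_max].
    """
    result = json.loads(payload)
    x1, y1, x2, y2 = result["cord"]
    return {
        "code": result["code"],
        "box": (x1, y1, x2, y2),
        "width": x2 - x1,
        "height": y2 - y1,
    }


if __name__ == "__main__":
    sample = '{"code": "mapped_text_12345678901", "cord": [1030, 870, 2660, 3224]}'
    print(parse_result(sample))
```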

Project Structure

  • src/: Core source code for image processing, OCR, and networking.
  • data/: Sample input/output data.
  • tests/: Unit tests for key functions.
  • requirements.txt: Python dependencies.
  • README.md: Project documentation.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements.

Contact

For questions, reach out to rahul.kumarbihar245@gmail.com.
