OCR-Based Image Processing and Text Recognition System

Image Preprocessing Image

This project implements an advanced system for extracting and mapping text from images using Optical Character Recognition (OCR). It preprocesses images to enhance text readability, performs OCR using the TrOCR model, maps detected text to a specific format, and communicates results to a remote server via REST API and socket programming. The system is designed for industrial applications requiring reliable text extraction.

Features

Image Preprocessing: Rotates, crops, and enhances images using OpenCV and PIL for optimal OCR performance.
OCR Processing: Utilizes the TrOCR model for accurate text extraction.
Text Mapping: Applies custom character mapping to format detected text.
Real-Time Communication: Uses socket programming for device interaction and FastAPI for server communication.
Data Logging: Saves results to CSV with timestamps and serial numbers.
Performance Optimization: Implements multithreading and thread pool execution for efficient processing.

Tech Stack

Languages: Python
Libraries: OpenCV, TrOCR, FastAPI, PIL, NumPy, Requests
Tools: Socket Programming, Multithreading, REST API, CSV

Installation

Clone the repository:

git clone https://github.com/rkarahul/OCR-Image-Processing-System.git
cd OCR-Image-Processing-System

Create a virtual environment and install dependencies:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Download the TrOCR model weights (if not automatically handled):

# Follow instructions from https://huggingface.co/microsoft/trocr-large-printed

Usage

Place an input image (e.g., org.bmp) in the data/ directory.
Run the main script:
```
python src/main.py
```
View results in the console and check ocr_data.csv for logged data.

Example Output

{
    "code": "mapped_text_12345678901",
    "cord": [1030, 870, 2660, 3224]
}

Project Structure

src/: Core source code for image processing, OCR, and networking.
data/: Sample input/output data.
tests/: Unit tests for key functions.
requirements.txt: Python dependencies.
README.md: Project documentation.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any improvements.

Contact

For questions, reach out to rahul.kumarbihar245@gmail.com.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

OCR-Based Image Processing and Text Recognition System

Image Preprocessing Image

Features

Tech Stack

Installation

Usage

Example Output

Project Structure

License

Contributing

Contact

About

Uh oh!

Releases

Packages

Languages

rkarahul/OCR-Based-Image-Processing-and-Text-Recognition-System

Folders and files

Latest commit

History

Repository files navigation

OCR-Based Image Processing and Text Recognition System

Image Preprocessing Image

Features

Tech Stack

Installation

Usage

Example Output

Project Structure

License

Contributing

Contact

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages