A robust real-time OCR (Optical Character Recognition) system that extracts text from a live video feed and places it sequentially within dynamic bounding boxes. The system automatically corrects the mirrored image that webcams typically produce and saves the extracted text to a file.
- Real-time text extraction from live camera feed
- Dynamic bounding boxes that adapt to frame size
- Mirror image correction: The camera feed is flipped horizontally and automatically
- Sequential text placement within predefined regions
- Confidence-based filtering for high-quality text detection
- Comprehensive error handling to prevent crashes
- Configurable settings for easy customization
- Responsive layout: Boxes automatically adjust to frame dimensions
- Sequential placement: Text is placed in order within defined boxes
- Automatic overflow handling: Moves to next box when current is full
- Visual feedback: Bounding boxes are drawn on screen for reference
- Configurable ratios: Easy to modify box positions using frame ratios
- Automatic file saving: Text is saved to file as it's detected
- Timestamped entries: Each text entry includes timestamp
- Session management: Clear file on start or append to existing
- Storage statistics: Track text length and file information
- Camera failure recovery: Handles camera disconnection gracefully
- OCR processing errors: Continues operation even if OCR fails
- Input validation: Validates all inputs to prevent crashes
- Resource cleanup: Proper cleanup of camera and display resources
- Clone the repository:
git clone <repository-url>
cd my_ocr_base_code
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python main.py
Edit config.py to customize the system:
# Dynamic bounding boxes configuration (ratios of frame dimensions)
BOUNDING_BOX_RATIOS = [
(0.1, 0.1, 0.9, 0.25), # Box 1 - Top section (10% to 90% width, 10% to 25% height)
(0.1, 0.3, 0.9, 0.45), # Box 2 - Upper middle
(0.1, 0.5, 0.9, 0.65), # Box 3 - Lower middle
(0.1, 0.7, 0.9, 0.85) # Box 4 - Bottom section
]
# Text storage settings
SAVE_TEXT_TO_FILE = True # Enable/disable text saving
TEXT_FILE_PATH = "extracted_text.txt" # File path for saving text
CLEAR_FILE_ON_START = True # Clear file when starting new session
# OCR settings
OCR_INTERVAL = 15 # Process OCR every N frames
CONFIDENCE_THRESHOLD = 0.5 # Minimum confidence for text detection
python main.py
- q: Quit the application
- c: Clear accumulated text and reset bounding boxes
- Blue rectangles: Dynamic bounding boxes that adapt to frame size
- Green text: Detected text placed within boxes
- White text: Status information at bottom of screen
- Box counter: Shows current box being used (e.g., "Box: 2/4")
- Storage info: Shows file path and text length
- File location: extracted_text.txt (configurable in config.py)
- Format: Timestamped entries with detected text
- Example:
=== OCR Text Extraction Session Started: 2024-01-15 14:30:25 ===
[14:30:26] Hello World
[14:30:28] This is some text
[14:30:30] Another line of text
- Automatic saving: Text is saved as soon as it's detected
- Session tracking: Each session is clearly marked
- Timestamped entries: Know exactly when text was detected
- File statistics: Track file size and modification time
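The file format shown above can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not the project's actual text_manager.py code: the function names `start_session` and `append_entry` are hypothetical, and the `clear` flag is assumed to play the role of CLEAR_FILE_ON_START.

```python
from datetime import datetime


def start_session(path, clear=True, now=None):
    """Write the session header; truncate the file first if clear is True."""
    now = now or datetime.now()
    mode = "w" if clear else "a"  # assumed mapping of CLEAR_FILE_ON_START
    with open(path, mode, encoding="utf-8") as f:
        f.write(
            f"=== OCR Text Extraction Session Started: {now:%Y-%m-%d %H:%M:%S} ===\n"
        )


def append_entry(path, text, now=None):
    """Append one detected line, prefixed with an [HH:MM:SS] timestamp."""
    now = now or datetime.now()
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"[{now:%H:%M:%S}] {text}\n")
```

Opening the file in append mode for every entry keeps each write short, so text already detected stays on disk even if the application stops unexpectedly.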
my_ocr_base_code/
├── main.py # Main application entry point
├── camera_feed.py # Video stream management with mirror correction
├── ocr_processor.py # Text extraction using EasyOCR
├── text_manager.py # Dynamic bounding box and text placement logic
├── ner_processor.py # Named Entity Recognition (unused)
├── utils.py # Text visualization utilities
├── config.py # Configuration settings
├── extracted_text.txt # Extracted text file (created automatically)
└── requirements.txt # Python dependencies
Camera Feed → Mirror Correction → OCR Processing → Text Analysis → Dynamic Bounding Box Placement → File Storage → Display
- Manages dynamic bounding boxes based on frame dimensions
- Handles sequential text placement within responsive boxes
- Provides automatic text storage to files
- Tracks accumulated text and current position
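Sequential placement with overflow can be pictured as a cursor that moves through the boxes in order. The sketch below is illustrative only: the class name `SequentialPlacer`, the per-box capacity measured in lines, and the choice to return None once every box is full are all assumptions, not the behavior of text_manager.py.

```python
class SequentialPlacer:
    """Fill boxes in order, moving to the next box when the current one is full."""

    def __init__(self, num_boxes, max_lines_per_box):
        self.boxes = [[] for _ in range(num_boxes)]
        self.max_lines = max_lines_per_box  # hypothetical capacity rule
        self.current = 0  # index of the box currently being filled

    def add(self, text):
        """Place text; return the index of the box used, or None if all are full."""
        while (
            self.current < len(self.boxes)
            and len(self.boxes[self.current]) >= self.max_lines
        ):
            self.current += 1  # overflow: advance to the next box
        if self.current >= len(self.boxes):
            return None
        self.boxes[self.current].append(text)
        return self.current

    def clear(self):
        """Drop all text and start again from the first box."""
        self.boxes = [[] for _ in self.boxes]
        self.current = 0
```

With two boxes of two lines each, five calls to `add` fill box 0 twice, box 1 twice, and then report that no space is left; `clear` mirrors what the c key is described as doing.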
- Background thread for smooth frame capture
- Automatic mirror image correction
- Camera initialization and cleanup
- Error recovery for camera failures
- Uses EasyOCR for text extraction
- GPU acceleration support with CPU fallback
- Confidence-based filtering
- Comprehensive error handling
The system automatically corrects the mirror effect common in webcams:
- Automatic flipping: Camera feed is horizontally flipped
- Natural appearance: Text appears in correct orientation
- No configuration needed: Works automatically
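A horizontal flip simply reverses every row of the image. With OpenCV this is usually a single call, `cv2.flip(frame, 1)`; the sketch below shows the same operation on a plain nested list so that it runs without any dependencies. The helper name `flip_horizontal` is hypothetical.

```python
def flip_horizontal(frame):
    """Return a mirrored copy of the frame: each row reversed left to right."""
    return [row[::-1] for row in frame]


# A tiny 2x3 "frame" whose pixels are letters, to make the effect visible.
frame = [
    ["a", "b", "c"],
    ["d", "e", "f"],
]
mirrored = flip_horizontal(frame)
```

Flipping twice returns the original frame, which is a quick way to check whether some other layer in the pipeline is also mirroring the image.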
Bounding boxes automatically adapt to different frame sizes:
- Responsive design: Boxes scale with frame dimensions
- Ratio-based positioning: Uses frame ratios instead of fixed pixels
- Automatic recalculation: Updates when frame size changes
- Configurable layout: Easy to modify box positions
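Ratio-based positioning amounts to multiplying each ratio by the current frame width or height. A minimal sketch, assuming the (x1, y1, x2, y2) ordering documented in config.py; the function name `ratios_to_pixels` is hypothetical and the rounding rule is an assumption.

```python
# Same values as the default configuration shown earlier in this README.
BOUNDING_BOX_RATIOS = [
    (0.1, 0.1, 0.9, 0.25),
    (0.1, 0.3, 0.9, 0.45),
    (0.1, 0.5, 0.9, 0.65),
    (0.1, 0.7, 0.9, 0.85),
]


def ratios_to_pixels(ratios, frame_width, frame_height):
    """Convert (x1, y1, x2, y2) ratios to integer pixel coordinates."""
    boxes = []
    for x1, y1, x2, y2 in ratios:
        boxes.append(
            (
                round(x1 * frame_width),
                round(y1 * frame_height),
                round(x2 * frame_width),
                round(y2 * frame_height),
            )
        )
    return boxes
```

Calling the function again whenever the frame size changes gives the automatic recalculation described above: the same ratios yield larger boxes on a 1280x720 frame than on a 640x480 one.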
The system includes comprehensive error handling for:
- Camera initialization failures
- Frame reading errors
- OCR processing failures
- Invalid input data
- Display errors
- File storage issues
- Resource cleanup issues
All errors are logged with appropriate severity levels and the system continues operation where possible.
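The "log and keep going" pattern can be sketched with the standard logging module. This is an illustration under assumptions, not the project's code: `safe_ocr` and the `run_ocr` callable are hypothetical names, and returning an empty list on failure is one possible way to let the main loop continue.

```python
import logging

logger = logging.getLogger("ocr")


def safe_ocr(run_ocr, frame):
    """Call run_ocr(frame); on any failure, log it and return no detections."""
    try:
        return run_ocr(frame)
    except Exception:
        # logger.exception records the message at ERROR level with the traceback.
        logger.exception("OCR processing failed; skipping this frame")
        return []
```

Because the caller always receives a list, a single bad frame produces a log entry instead of ending the session.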
- Frame skipping: OCR runs every 15 frames by default
- Confidence filtering: Only high-confidence text is displayed
- Background processing: Camera feed runs in separate thread
- Async operations: Non-blocking video processing
- Efficient storage: Text is saved incrementally
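Frame skipping and confidence filtering are both small checks. The sketch below assumes detections arrive as (bounding box, text, confidence) tuples, which is the shape EasyOCR's `readtext` returns by default; the helper names and the use of `>=` for the threshold comparison are assumptions.

```python
OCR_INTERVAL = 15  # same default as config.py
CONFIDENCE_THRESHOLD = 0.5  # same default as config.py


def should_run_ocr(frame_index, interval=OCR_INTERVAL):
    """Run OCR only on every interval-th frame (0, 15, 30, ... by default)."""
    return frame_index % interval == 0


def filter_by_confidence(results, threshold=CONFIDENCE_THRESHOLD):
    """Keep the text of detections whose confidence meets the threshold."""
    return [text for _bbox, text, confidence in results if confidence >= threshold]


# Hand-written sample detections, not real OCR output.
sample = [
    ([[0, 0], [10, 0], [10, 5], [0, 5]], "Hello", 0.92),
    ([[0, 6], [10, 6], [10, 11], [0, 11]], "w0rld", 0.31),
]
kept = filter_by_confidence(sample)
```

At a typical 30 frames per second, an interval of 15 means OCR runs about twice per second while the display still updates on every frame.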
Edit config.py:
BOUNDING_BOX_RATIOS = [
(x1_ratio, y1_ratio, x2_ratio, y2_ratio), # Add your custom boxes
# ... more boxes
]
SAVE_TEXT_TO_FILE = True # Enable/disable saving
TEXT_FILE_PATH = "my_text.txt" # Custom file path
CLEAR_FILE_ON_START = False # Append to existing file
OCR_INTERVAL = 15 # Processing frequency
CONFIDENCE_THRESHOLD = 0.5 # Minimum confidence
- Check camera permissions
- Try a different camera source in config.py
- Ensure camera is not in use by another application
- The system corrects the mirror effect automatically
- If the image still appears mirrored, check whether another application is interfering with the camera feed
- Check frame dimensions in console output
- Verify box ratios in config.py
- Ensure camera is providing valid frames
- Check the SAVE_TEXT_TO_FILE setting in config.py
- Verify file path permissions
- Check the extracted_text.txt file directly
- Ensure good lighting conditions
- Check text is clearly visible in camera view
- Adjust CONFIDENCE_THRESHOLD if needed: lower it if valid text is being dropped, raise it if noise is getting through
- Tune OCR_INTERVAL: a lower value runs OCR more often so text updates sooner, at the cost of higher load; a higher value lightens the load
- Increase CONFIDENCE_THRESHOLD to filter more text
- Check GPU availability for OCR acceleration
- OpenCV: Video capture and image processing
- EasyOCR: Text recognition with GPU support
- PyTorch: Deep learning backend
- Transformers: NER capabilities (optional)
- NumPy: Numerical operations
This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add error handling for new features
- Test thoroughly
- Submit a pull request
For issues and questions:
- Check the troubleshooting section
- Review error logs in console output
- Ensure all dependencies are installed
- Verify camera permissions and availability
- Check the extracted_text.txt file directly for stored text