Welcome to the Vision & Text Detection System, a Python-based tool that leverages YOLO (You Only Look Once) for real-time object detection and Tesseract OCR for text recognition. This project captures live screen data, detects objects, extracts text, and organizes it into readable formats. It's designed for multi-process handling, ensuring both vision and text detection processes run seamlessly in parallel.
- Real-Time Object Detection: Uses a YOLO model to detect objects from the screen and logs their coordinates.
- Text Extraction: Extracts text from detected objects' bounding boxes using Tesseract OCR.
- Data Storage: Object coordinates and labels are stored as JSON files, making it easy to track detections over time.
- Multi-Process Functionality: Runs vision detection and text updating concurrently for improved performance.
- Configurable: You can replace the YOLO model with your preferred version.
- Python 3.7+
- Required third-party libraries:
  - ultralytics
  - numpy
  - mss
  - pillow
  - pytesseract
- Standard-library modules used (no installation required): json, logging, multiprocessing, re, time
- Clone the repository:

  ```bash
  git clone https://github.com/bauerhartmut/yolov8-Computervision.git
  ```
- Install the required libraries:

  ```bash
  pip install ultralytics numpy mss pillow pytesseract
  ```
- Install Tesseract OCR:
  - Download and install Tesseract OCR.
  - Make sure the Tesseract path is set correctly in the code. Example:

    ```python
    pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'
    ```
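A quick way to verify the path is to ask Tesseract for its version (a minimal check, assuming pytesseract is already installed):

```python
import pytesseract

# Point pytesseract at the Tesseract binary (adjust for your install location).
pytesseract.pytesseract.tesseract_cmd = r'C:/Program Files/Tesseract-OCR/tesseract.exe'

# Raises TesseractNotFoundError if the path is wrong.
print(pytesseract.get_tesseract_version())
```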
- Configure the YOLO model:
  - Download my YOLOv8 model from Huggingface: Computer_Vision_1.5.3
  - Get the label_description.json file from Huggingface: Label_description.json
The `Vision` class is responsible for handling object detection. It uses the YOLO model to analyze screen captures and detect objects. The results are saved in JSON files.
- Screen Capture: Captures the live screen using the `mss` library.
- Object Detection: YOLO analyzes the captured screen and detects objects.
- JSON Output: Detected objects, along with their coordinates, are stored in `model_view_output/labels.json` and the respective label files.
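For orientation, here is a minimal sketch of a single detection pass (an illustrative reconstruction; the actual `Vision` class in `vision_api.py` may be organized differently, and the JSON layout shown is an assumption):

```python
import json
import os

import mss
import numpy as np
from ultralytics import YOLO

model = YOLO("Computer_Vision_1.3.0.onnx")  # model path from the Customization section

# Grab one frame of the primary monitor (mss returns BGRA; drop the alpha channel).
with mss.mss() as sct:
    frame = np.array(sct.grab(sct.monitors[1]))[:, :, :3]

# Run YOLO on the captured frame and collect boxes per label.
results = model(frame)
detections = {}
for box in results[0].boxes:
    label = results[0].names[int(box.cls)]
    detections.setdefault(label, []).append(box.xyxy[0].tolist())  # [x1, y1, x2, y2]

# Store the per-label detection counts, mirroring labels.json.
os.makedirs("model_view_output", exist_ok=True)
with open("model_view_output/labels.json", "w") as f:
    json.dump({label: len(boxes) for label, boxes in detections.items()}, f)
```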
The `Api` class handles reading the JSON outputs, retrieving object coordinates, and performing OCR to extract text.
- Retrieve Labels & Positions: Methods to get label data and object coordinates from JSON files.
- Text Extraction: Uses Tesseract OCR to extract text from the detected objects' bounding boxes.
- Text Cleanup: Removes non-ASCII characters from the extracted text.
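A minimal sketch of the extraction and cleanup steps (the function name and signature are hypothetical; the actual `Api` methods may differ):

```python
import re

import pytesseract
from PIL import Image

def extract_text(screenshot: Image.Image, box: tuple) -> str:
    """Run OCR on one bounding box and strip non-ASCII characters."""
    x1, y1, x2, y2 = map(int, box)             # coordinates read from a label JSON file
    crop = screenshot.crop((x1, y1, x2, y2))   # cut out the detected object
    text = pytesseract.image_to_string(crop)   # Tesseract OCR on the cropped region
    return re.sub(r'[^\x00-\x7F]', '', text)   # cleanup: drop non-ASCII characters
```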
This project runs two separate processes in parallel:
- Vision Process: Constantly runs object detection on the screen.
- Text Update Process: Continuously extracts and updates text found in the bounding boxes.
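A minimal sketch of how the two processes can be started with Python's standard `multiprocessing` module (the constructor calls are assumptions; see `main.py` for the actual wiring):

```python
from multiprocessing import Process

from vision_api import Api, Vision  # class names from this project

if __name__ == "__main__":
    vision = Vision()  # constructor arguments, if any, are assumptions
    api = Api()

    # Run object detection and text extraction side by side.
    vision_proc = Process(target=vision.start_vision)
    text_proc = Process(target=api.updating_text)
    vision_proc.start()
    text_proc.start()
    vision_proc.join()
    text_proc.join()
```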
The system creates a `model_view_output/` directory with the following files:
- labels.json: Contains the count of detected objects for each label.
- {label}.json: Stores the coordinates for each detected object of that label.
- text.json: Stores coordinates of text-containing objects.
- sumirize.json: A summary of the extracted text and corresponding coordinates.
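These files are plain JSON and can be read by other tools, for example (the label names shown are illustrative):

```python
import json

# labels.json maps each label to its detection count, e.g. {"button": 3, "icon": 7}.
with open("model_view_output/labels.json") as f:
    counts = json.load(f)

for label, count in counts.items():
    print(f"{label}: {count} detection(s)")
```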
- Ensure Tesseract is properly installed and its path is correctly set.
- Start the system by running the main script:

  ```bash
  python main.py
  ```
This will:
- Initiate the Vision Process for real-time object detection.
- Start the Text Updating Process to extract and log text from detected objects.
```
📦 Vision-Text-Detection
 ┣ 📂 model_view_output   # Stores output JSON files
 ┣ 📜 main.py             # Main script that runs the processes
 ┣ 📜 README.md           # Project documentation
 ┣ 📜 requirements.txt    # List of required Python libraries
 ┗ 📜 vision_api.py       # Contains Vision and Api classes
```
- YOLO Model: Replace `vision_model = "Computer_Vision_1.3.0.onnx"` with your own model path.
- Tesseract Path: Update the Tesseract OCR path if it is installed in a different directory.
- Detection Intervals: Modify the `time.sleep()` intervals in the `start_vision()` and `updating_text()` methods for faster or slower detection cycles (see the sketch below).
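For example (the loop shape and the 1-second interval are assumptions, not project defaults):

```python
import time

def detection_loop():
    """Sketch of a polling loop like start_vision() / updating_text()."""
    while True:
        # ... run one detection or OCR pass here ...
        time.sleep(1.0)  # lower for faster cycles (more CPU), raise to reduce load
```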
Feel free to fork the repository, submit issues, and open pull requests for improvements. All contributions are welcome!
This project is licensed under the MIT License.