🖼️ OCR Text Extractor

This Python script automates text extraction from images using Tesseract OCR. It processes every supported image in the test_images/ folder and saves the recognized text as a .txt file in the extracted_texts/ directory, reusing each image's base filename (image1.jpg becomes image1.txt).

📁 Project Structure


OCR-Text-Extractor/
├── OCR.py
├── test_images/
│   ├── image1.jpg
│   └── image2.png
├── extracted_texts/
│   ├── image1.txt
│   └── image2.txt
└── README.md

⚙️ Features

Batch processes .jpg, .jpeg, and .png images.
Supports multiple languages (default: English and Hindi).
Automatically creates the extracted_texts/ folder if it doesn't exist.
Provides informative logging for each processed file.
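
The features above can be sketched as a single batch loop. This is a minimal illustration, not the actual contents of OCR.py: the names process_folder, default_ocr and IMAGE_EXTENSIONS are hypothetical, and the OCR step is passed in as a function so the loop can be tried without Tesseract installed.

```python
import logging
from pathlib import Path
from typing import Callable, List

logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")

# Extensions listed under Features; compared case-insensitively below.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def default_ocr(image_path: Path) -> str:
    """Run Tesseract on one image (needs pillow, pytesseract and the Tesseract binary)."""
    # Imported lazily so the rest of the sketch works without OCR installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang="eng+hin")


def process_folder(in_dir: Path, out_dir: Path,
                   ocr: Callable[[Path], str] = default_ocr) -> List[Path]:
    """OCR every supported image in in_dir and write one .txt per image to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    written = []
    for image_path in sorted(in_dir.iterdir()):
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_EXTENSIONS:
            logging.info("Skipping unsupported entry: %s", image_path.name)
            continue
        text = ocr(image_path)
        target = out_dir / f"{image_path.stem}.txt"
        target.write_text(text, encoding="utf-8")
        logging.info("Saved text from %s to %s", image_path.name, target)
        written.append(target)
    return written
```

With the dependencies installed, calling process_folder(Path("test_images"), Path("extracted_texts")) would mirror the behaviour described above.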

🚀 Getting Started

1. Clone the Repository

git clone https://github.com/Mrigank005/OCR
cd OCR

2. Install Dependencies

Ensure you have Python 3 installed. Then, install the required Python libraries:

pip install pillow pytesseract

3. Install Tesseract OCR Engine

Windows: Download and run the Tesseract OCR Windows installer.
macOS: Use Homebrew:
```
brew install tesseract
```
Linux (Debian/Ubuntu):
```
sudo apt-get install tesseract-ocr
```

Ensure Tesseract is added to your system's PATH.
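
One way to confirm that the Tesseract executable is reachable from Python is to look it up on PATH with the standard library. The helper name find_tesseract is hypothetical and not part of OCR.py; this is only a quick check under the assumption that the binary is called tesseract.

```python
import shutil
from typing import Optional


def find_tesseract(executable: str = "tesseract") -> Optional[str]:
    """Return the full path to the executable if it is on PATH, else None."""
    return shutil.which(executable)


location = find_tesseract()
if location is None:
    print("tesseract was not found on PATH; set tesseract_cmd explicitly.")
else:
    print(f"tesseract found at {location}")
```

If the lookup returns nothing, either add the install directory to PATH or set the path explicitly as described under Customization.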

4. Add Images

Place the images you want to process into the test_images/ directory.

5. Run the Script

python OCR.py

The extracted text files will be saved in the extracted_texts/ directory.

📝 Customization

Language Support: The script defaults to English and Hindi. To modify the languages, edit the langs parameter in the extract_text_and_save function within OCR.py:
```
def extract_text_and_save(image_path, langs=["eng", "hin"]):
```

Refer to the Tesseract OCR language data for available language codes.
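
Tesseract takes multiple languages as a single string joined with plus signs, for example eng+hin. How OCR.py converts its langs list is not shown here, so the helper below (build_lang_argument, a hypothetical name) is only a sketch of one straightforward way to do it.

```python
from typing import Sequence


def build_lang_argument(langs: Sequence[str]) -> str:
    """Join language codes into the 'eng+hin' form that Tesseract accepts."""
    if not langs:
        raise ValueError("at least one language code is required")
    return "+".join(langs)


# Example: the value that would be passed as lang= to pytesseract.image_to_string
print(build_lang_argument(["eng", "hin"]))
```

Each requested language also needs its trained data installed; running tesseract --list-langs shows which languages are available on your system.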

Tesseract Path: If Tesseract isn't in your system's PATH, specify its location in OCR.py:
```
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'/path/to/tesseract'
```

🧪 Sample Output

For an image named page1.jpg in test_images/, the script will generate page1.txt in extracted_texts/ containing the recognized text.
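
The filename mapping described above can be illustrated with pathlib. The function text_path_for is a hypothetical helper, assuming the script keeps the image's base name and swaps the extension for .txt.

```python
from pathlib import Path


def text_path_for(image_path: str, out_dir: str = "extracted_texts") -> Path:
    """Map an image path to the .txt path that shares its base name."""
    return Path(out_dir) / (Path(image_path).stem + ".txt")


print(text_path_for("test_images/page1.jpg"))
```

Under this assumption, page1.jpg and page1.png would both map to page1.txt, so images that differ only by extension would overwrite each other's output.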
