This project extracts text from an image of a store receipt and converts it into a structured JSON object using OpenCV, OCR (Tesseract), and Gemini AI.
- Python 3.x
- OpenCV
- pytesseract
- openai-agents
- Tesseract-OCR (must be installed separately and added to your system PATH)
- Clone the repository:
  git clone <repository-url>
  cd <repository-directory>
- Install the required Python packages:
  pip install -r requirements.txt
- Install Tesseract-OCR (separately):
  - Download and install from Tesseract at UB Mannheim.
  - Add the Tesseract install directory (e.g., C:\Program Files\Tesseract-OCR) to your system PATH.
  - Test installation with:
    tesseract --version
- Set up your Gemini API key:
  - Create a .env file in the project root:
    GEMINI_API_KEY=your_gemini_api_key_here
- Add a receipt image:
  Place your receipt image (e.g., receipt2.png) in the raw_receipts folder.
- Run the script:
  python main.py
- Output:
  - The extracted text will be printed to the console.
  - The structured JSON will be saved to json_receipt/receipt.json (the full path is shown after running).
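The output step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in main.py: the function name `save_receipt` and the example fields in the usage note are hypothetical, and only the json_receipt/receipt.json location comes from this README.

```python
import json
from pathlib import Path


def save_receipt(data: dict, out_dir: str = "json_receipt") -> Path:
    """Write a receipt dict to <out_dir>/receipt.json and return its full path.

    Hypothetical helper for illustration; main.py may do this differently.
    """
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    out_path = folder / "receipt.json"
    # ensure_ascii=False keeps currency symbols and accented names readable
    out_path.write_text(
        json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return out_path.resolve()
```

Calling `save_receipt({"store": "Example Mart", "total": 12.34})` would write the file and return the absolute path, which can then be printed. The `store` and `total` keys are placeholders; the real schema is whatever the script produces.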
- main.py: Main script for image processing, OCR, and AI extraction.
- llm_setup.py: Gemini AI model setup and configuration.
- raw_receipts/: Place your input receipt images here.
- json_receipt/: Output folder for generated JSON files.
- requirements.txt: Python dependencies.
- readme.md: Project documentation.
- Tesseract not found: Ensure Tesseract-OCR is installed and its path is added to your system PATH. Test with tesseract --version in your terminal.
- No internet connection: Gemini AI requires an active internet connection.
- API key errors: Make sure your .env file is present and contains a valid GEMINI_API_KEY.
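Two of the checks above can be automated before running the script. The sketch below is an assumption-laden illustration, not part of the project: `preflight` is a hypothetical helper that uses only the standard library, and it can confirm that a key is present but not that Gemini will accept it.

```python
import shutil
from pathlib import Path
from typing import List


def preflight(env_path: str = ".env") -> List[str]:
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper for illustration only.
    """
    problems = []

    # Check 1: is the tesseract executable discoverable on the system PATH?
    if shutil.which("tesseract") is None:
        problems.append("Tesseract not found on PATH")

    # Check 2: does the .env file define a non-empty GEMINI_API_KEY?
    key = ""
    env_file = Path(env_path)
    if env_file.is_file():
        for line in env_file.read_text(encoding="utf-8").splitlines():
            name, sep, value = line.strip().partition("=")
            if sep and name.strip() == "GEMINI_API_KEY":
                key = value.strip().strip('"').strip("'")
    if not key:
        problems.append(f"GEMINI_API_KEY missing or empty in {env_path}")

    return problems
```

Running `preflight()` from the project root and printing the result gives a quick indication of which troubleshooting item applies. Network connectivity is deliberately not checked here.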