An AI-powered application that extracts structured data from invoice documents using LangChain and vision-capable language models.
- Extract key information from invoice PDF or image files
- Process multiple invoices in a queue
- View results with thumbnails in a user-friendly interface
- Download individual or consolidated results (JSON/CSV)
- Clear all data with a single click
The application extracts the following data from invoices:
- Vendor Name
- Due Date
- Paid Date
- Service Period (From/To)
- Currency
- Net Amount
- VAT/Tax Amount
- Gross Amount
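The fields above map naturally onto a small record type. The following is a minimal sketch, not the application's actual schema: the class name InvoiceRecord, the field names, and the helper records_to_csv are all hypothetical, chosen only to illustrate how one extracted invoice could be represented and flattened into the consolidated CSV download.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from decimal import Decimal
from typing import Optional


@dataclass
class InvoiceRecord:
    """One extracted invoice. Field names are illustrative assumptions."""
    vendor_name: Optional[str] = None
    due_date: Optional[str] = None             # dates kept as strings here
    paid_date: Optional[str] = None
    service_period_from: Optional[str] = None
    service_period_to: Optional[str] = None
    currency: Optional[str] = None             # e.g. "EUR"
    net_amount: Optional[Decimal] = None       # Decimal avoids float rounding
    tax_amount: Optional[Decimal] = None
    gross_amount: Optional[Decimal] = None


def records_to_csv(records):
    """Serialize records to CSV text: one header row, one row per invoice."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(InvoiceRecord)]
    )
    writer.writeheader()
    for record in records:
        # Missing (None) fields are written as empty cells.
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Every field is optional in this sketch because a given invoice may not state a paid date or service period; how the real application handles missing values is not documented here.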
- Create an .env.local file in the invoice-parser/ directory, copy the contents of .env.example into it, and insert your OpenAI API key (the Anthropic key is not used yet).
- Start the application by running:
  ./start.sh
This will:
- Set up a Python virtual environment in the root directory (if needed)
- Install all required dependencies
- Start the Flask backend server on port 5002
- Start the Streamlit frontend on port 8501
- Create a log file for the Streamlit service in the logs directory (the Flask app logs directly to the console)
Press Ctrl+C to stop both services when you're done.
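If it is unclear whether both services came up, a quick connectivity check can help. This is a minimal sketch using only the standard library; the helper names is_listening and check_services are hypothetical and not part of the project, and the ports are the ones stated above.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(services, host="localhost"):
    """Map each service name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in services.items()}


if __name__ == "__main__":
    # Ports as documented for start.sh: Flask on 5002, Streamlit on 8501.
    for name, up in check_services({"flask": 5002, "streamlit": 8501}).items():
        print(f"{name}: {'up' if up else 'down'}")
```

A successful TCP connection only shows that something is listening on the port; it does not confirm that the service behind it is healthy.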
invoice-experiment/
├── start.sh                      # Single command to start everything
├── run.sh                        # Alternative run script
├── venv/                         # Python virtual environment (created by start.sh)
└── invoice-parser/               # Application directory
    ├── app.py                    # Flask backend API
    ├── streamlit_app.py          # Streamlit frontend
    ├── invoice_processor.py      # Core processing logic
    ├── uploads/                  # Uploaded invoice files
    ├── results/                  # Processing results
    ├── thumbnails/               # Invoice thumbnails
    ├── logs/                     # Application logs
    └── requirements.txt          # Project dependencies
If you prefer to set up manually:
- Create a virtual environment in the root directory:
  # From the project root
  python -m venv venv
- Activate the virtual environment:
  - On macOS/Linux:
    # From the project root
    source venv/bin/activate
  - On Windows:
    # From the project root
    venv\Scripts\activate
- Install dependencies:
  # From the project root
  pip install -r invoice-parser/requirements.txt
- Start the Flask backend:
  # From the project root
  cd invoice-parser && python app.py
- In a new terminal (with the virtual environment activated), start the Streamlit frontend:
  # From the project root
  cd invoice-parser && streamlit run streamlit_app.py
- Navigate to http://localhost:8501 in your web browser
- Upload invoice documents (PDF, PNG, JPG, JPEG, TIFF)
- Files are automatically processed in the background
- Use the "Refresh Status" button to update processing status
- View and download results once processing is complete
The backend API is available at http://localhost:5002 with the following endpoints:
- POST /api/upload-invoice: Upload and process an invoice
- GET /api/status/<job_id>: Check the status of a specific job
- GET /api/jobs: List all jobs and their statuses
- GET /api/all-results: Get all processed results
- POST /api/clear-all: Clear all data (uploads, thumbnails, and results)
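Because invoices are processed in the background, a script that calls the API directly has to poll the status endpoint. The sketch below shows one way to do that with the standard library. The response format is not documented here, so the "status" key and the terminal state names "completed" and "failed" are assumptions; get_json and wait_for_job are hypothetical helper names.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:5002"  # backend address documented above


def get_json(path, base_url=BASE_URL):
    """GET a path on the backend and decode the body as JSON."""
    with urllib.request.urlopen(base_url + path, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_job(job_id, fetch=get_json, interval=2.0, max_polls=30,
                 done_states=("completed", "failed")):
    """Poll /api/status/<job_id> until the job reaches a terminal state.

    Assumes the endpoint returns a JSON object with a "status" key; both
    the key and the state names are guesses and may need adjusting.
    """
    for _ in range(max_polls):
        payload = fetch(f"/api/status/{job_id}")
        if payload.get("status") in done_states:
            return payload
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

The fetch parameter is injectable so the polling loop can be exercised without a running server. Against a live backend, get_json("/api/jobs") and get_json("/api/all-results") would query the other GET endpoints in the same way.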
This project is licensed under the MIT License.