This application leverages Amazon Web Services (AWS) and other AI services to automatically extract structured data from invoices. It combines Amazon Textract for optical character recognition (OCR) with Claude AI for advanced data extraction, providing a web-based interface for uploading invoices and viewing the extracted information in a standardized format.
- Upload PDF invoices for processing
- Multiple processing methods with AWS focus:
- Textract + Claude: Our primary method combines Amazon Textract's OCR capabilities with Claude 3.5 Sonnet for multimodal processing and structured information extraction
- Bedrock Claude Sonnet: Direct integration with Amazon Bedrock's Claude Sonnet model
- Bedrock Data Automation: Utilizes Amazon Bedrock's Data Automation capabilities
- Additional processing methods also available as alternatives (GPT-4o and Document Intelligence)
- Interactive UI with real-time feedback
- Display of extracted invoice data in a structured format
- Intelligent post-processing with AWS-based algorithms
- Python 3.8 or higher
- AWS account with access to Amazon Textract and Amazon Bedrock (for Claude 3.5 Sonnet)
- Proper IAM permissions for AWS services
- Optional: Access to other document intelligence and AI services for alternative processing methods
-
Clone the repository
-
Install required dependencies:
pip install -r requirements.txt
-
Configure your environment variables:
a. Copy the provided
.env.sample
file to create your own.env
file:cp .env.sample .env
b. Open the
.env
file in a text editor and replace the empty quotes with your actual API keys and endpoints:# AWS Settings (Primary) AWS_REGION="us-east-1" AWS_ACCESS_KEY_ID="your-aws-access-key" AWS_SECRET_ACCESS_KEY="your-aws-secret-key" BEDROCK_CLAUDE_MODEL_ID="arn:aws:bedrock:us-east-1:302263040839:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0" # Alternative Processing Methods Settings (Optional) DOC_INTELLIGENCE_ENDPOINT="your-doc-intelligence-endpoint" DOC_INTELLIGENCE_KEY="your-doc-intelligence-key" OPENAI_API_KEY="your-openai-key" OPENAI_MODEL="gpt-4o"
This application can be deployed to AWS using multiple approaches. Here are two recommended methods:
-
Prerequisites:
- Docker installed and running
- AWS CLI configured with appropriate permissions
- Amazon ECR repository created
-
Deployment Steps:
a. Build and tag the Docker image:
docker build -t invoice-parsing-app:latest .
b. Log in to Amazon ECR:
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin your-aws-account.dkr.ecr.us-east-1.amazonaws.com
c. Tag and push the Docker image to ECR:
docker tag invoice-parsing-app:latest your-aws-account.dkr.ecr.us-east-1.amazonaws.com/invoice-parsing-app:latest docker push your-aws-account.dkr.ecr.us-east-1.amazonaws.com/invoice-parsing-app:latest
d. Create an ECS cluster, task definition, and service using the AWS console or CLI
e. Configure environment variables in the task definition
-
Prerequisites:
- AWS CLI and EB CLI installed
- Proper IAM permissions
-
Deployment Steps:
a. Initialize Elastic Beanstalk application:
eb init -p docker invoice-parsing-app
b. Create an environment and deploy:
eb create invoice-parsing-environment
c. Configure environment variables:
eb setenv AWS_REGION=us-east-1 AWS_ACCESS_KEY_ID=your-key BEDROCK_CLAUDE_MODEL_ID=arn:aws:bedrock:us-east-1:302263040839:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0
The following files are necessary for running the application:
- app.py - The main Flask application file that contains the server logic
- .env - Environment variables file containing your API keys and endpoints
- .env.sample - Template file with the structure for your .env file
- templates/index.html - The HTML template for the web interface
- requirements.txt - Lists all Python dependencies
- static/ - Directory for CSS, JavaScript, and static assets including screenshots
- uploads/ - Directory where uploaded invoices are stored
Run the application using:
python app.py
The application will be accessible at http://localhost:5000 in your web browser.
- Open the application in your browser
- Upload an invoice PDF using the "Upload Document" button
- Select a processing method from the dropdown:
- AWS-Powered Methods:
- Textract + Claude: Combines Amazon Textract OCR with Claude's multimodal processing (recommended)
- Bedrock Claude Sonnet: Direct integration with Amazon Bedrock's Claude model
- Bedrock Data Automation: Utilizes Amazon Bedrock's automation capabilities
- Alternative Methods:
- DI + GPT-4o with Image: Document Intelligence with GPT-4o vision capabilities
- DI + GPT-4o (No Image): Document Intelligence with GPT-4o text-only processing
- GPT-4o with Image Only: Direct processing with GPT-4o vision capabilities
- DI + Phi-3: Document Intelligence with Microsoft Phi-3 language model
- AWS-Powered Methods:
- Click "Run Analysis" to process the invoice
- View the extracted information displayed on the page
- The application stores uploaded files in the
uploads
directory - Processed results are cached to improve performance
- AWS credentials are securely handled
- Amazon Textract extracts text from invoices while Claude 3.5 Sonnet extracts structured information
- The application includes post-processing logic to ensure critical fields like seller information and tax calculations are complete
- If you encounter environment variable errors, ensure your
.env
file contains all required variables - For PDF rendering issues, ensure you have the necessary system dependencies for pdf2image
- Check application logs for detailed error information