Jmete/pdf-data-py

PDF Data Viewer

A Python application for viewing PDF files and extracting structured data through annotations.

Features

  • High-quality PDF viewing with support for multiple pages
  • Text selection and annotation capabilities
  • Data extraction from PDFs with field mapping
  • Date standardization for extracted data
  • SQLite database storage for annotations
  • CSV export for extracted data
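
The storage and export features can be illustrated with a minimal sketch that uses only the standard library. The `annotations` table layout, its column names, and the `save_annotation` and `export_csv` helpers are assumptions made for illustration; they are not taken from the application's code, and the sample values are made up.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one row per annotated field value.
SCHEMA = """
CREATE TABLE IF NOT EXISTS annotations (
    id INTEGER PRIMARY KEY,
    document TEXT NOT NULL,
    page INTEGER NOT NULL,
    field TEXT NOT NULL,
    value TEXT NOT NULL
)
"""


def save_annotation(conn, document, page, field, value):
    """Insert one annotation and return its row id."""
    cur = conn.execute(
        "INSERT INTO annotations (document, page, field, value) "
        "VALUES (?, ?, ?, ?)",
        (document, page, field, value),
    )
    conn.commit()
    return cur.lastrowid


def export_csv(conn, out):
    """Write all annotations to a file-like object as CSV; return the row count."""
    writer = csv.writer(out)
    writer.writerow(["document", "page", "field", "value"])
    rows = conn.execute(
        "SELECT document, page, field, value FROM annotations ORDER BY id"
    ).fetchall()
    writer.writerows(rows)
    return len(rows)


# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_annotation(conn, "rfq_001.pdf", 1, "customer_name", "Acme Corp")
save_annotation(conn, "rfq_001.pdf", 1, "currency", "USD")

buffer = io.StringIO()
count = export_csv(conn, buffer)
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.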

Requirements

  • Python 3.9+
  • PySide6
  • PyMuPDF (fitz)
  • python-dateutil

Installation

  1. Clone the repository:
git clone https://github.com/Jmete/pdf-data-py.git
cd pdf-data-py
  2. Install the package:
pip install -e .

Usage

Run the application:

python -m pdf_data_viewer.main

Or use the entry point:

pdf-data-viewer

Data Extraction

The application can extract the following fields:

Metadata

  • Document name
  • Customer name
  • Buyer name
  • Buyer email
  • Buyer phone
  • Buyer job position
  • Currency
  • RFQ date
  • Due date
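
Date fields such as the RFQ date and due date are where date standardization applies. The sketch below shows one way to normalize free-form date text to ISO 8601 (`YYYY-MM-DD`). It uses only the standard library so it runs without extra packages; the project lists python-dateutil as a requirement, so the application's own approach may differ. The `standardize_date` name and the list of accepted formats are assumptions, including the choice to read `15/03/2024` day-first.

```python
from datetime import datetime

# Candidate input formats, tried in order. This list is an assumption;
# numeric dates are read day-first here.
CANDIDATE_FORMATS = (
    "%Y-%m-%d",   # 2024-03-15
    "%d/%m/%Y",   # 15/03/2024
    "%d.%m.%Y",   # 15.03.2024
    "%d-%b-%Y",   # 15-Mar-2024
    "%d %B %Y",   # 15 March 2024
    "%B %d, %Y",  # March 15, 2024
)


def standardize_date(text):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    cleaned = " ".join(text.split())  # collapse stray whitespace from PDF text
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for unrecognized input, rather than raising, lets a caller keep the raw text and flag the field for manual review.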

Line Item Data

  • Line item number
  • Material number
  • Part number
  • Description
  • Full description
  • Quantity
  • Unit of measure
  • Requested delivery date
  • Delivery point
  • Manufacturer name
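
As a rough illustration of field mapping, the line item fields listed above could be modeled as a record type, with annotation labels mapped onto its attributes. The `LineItem` class, the `build_line_item` helper, and the label-to-attribute rule (lowercase, spaces replaced by underscores) are hypothetical and not taken from the application's code.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LineItem:
    """Hypothetical record mirroring the line item fields in this README."""
    line_item_number: str
    material_number: Optional[str] = None
    part_number: Optional[str] = None
    description: Optional[str] = None
    full_description: Optional[str] = None
    quantity: Optional[float] = None
    unit_of_measure: Optional[str] = None
    requested_delivery_date: Optional[str] = None
    delivery_point: Optional[str] = None
    manufacturer_name: Optional[str] = None


def build_line_item(annotations):
    """Build a LineItem from a {label: text} mapping of annotated values.

    Labels that do not match a known field are ignored. A missing
    "Line item number" raises TypeError, since that field is required.
    """
    known = {f.name for f in fields(LineItem)}
    values = {}
    for label, text in annotations.items():
        key = label.strip().lower().replace(" ", "_")
        if key in known:
            values[key] = text.strip()
    if "quantity" in values:
        # Strip thousands separators before converting to a number.
        values["quantity"] = float(values["quantity"].replace(",", ""))
    return LineItem(**values)


item = build_line_item({
    "Line item number": "10",
    "Quantity": "1,250",
    "Unit of measure": "EA",
    "Notes": "not a known field, so it is skipped",
})
```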

Development

Project Structure

pdf-data-py/
├── data/                     # Data directory
│   ├── annotations.db        # SQLite database file
│   └── exports/              # CSV export directory
└── pdf_data_viewer/          # Main package
    ├── core/                 # Core functionality
    ├── database/             # Database operations
    ├── ui/                   # User interface
    └── utils/                # Utility functions

Building from source

python setup.py build

License

MIT License

About

PDF Viewer that can extract data using AI
