PDF Data Viewer

A Python application for viewing PDF files and extracting structured data through annotations.

Features

High-quality PDF viewing with support for multiple pages
Text selection and annotation capabilities
Data extraction from PDFs with field mapping
Date standardization for extracted data
SQLite database storage for annotations
CSV export for extracted data

Requirements

Python 3.9+
PySide6
PyMuPDF (fitz)
python-dateutil

Installation

Clone the repository:

git clone https://github.com/Jmete/pdf-data-py.git
cd pdf-data-py

Install the package:

pip install -e .

Usage

Run the application:

python -m pdf_data_viewer.main

Or use the entry point:

pdf-data-viewer

Data Extraction

The application extracts the following fields from annotated PDFs:

Metadata

Document name
Customer name
Buyer name
Buyer email
Buyer phone
Buyer job position
Currency
RFQ date
Due date
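
Date fields such as RFQ date and Due date are standardized after extraction. The sketch below illustrates the idea using only the standard library; it is not the application's own code. The project lists python-dateutil as a requirement and likely relies on it for more flexible parsing, so `standardize_date` and `CANDIDATE_FORMATS` here are hypothetical stand-ins.

```python
from datetime import datetime
from typing import Optional

# Hypothetical list of accepted input formats; the real application may
# accept others (python-dateutil can parse many more without a format list).
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y", "%B %d, %Y", "%d %b %Y")


def standardize_date(raw: str) -> Optional[str]:
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    text = raw.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            # First format that parses wins; order matters for ambiguous input.
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    print(standardize_date("15/03/2024"))   # 2024-03-15
    print(standardize_date("not a date"))   # None
```

Returning None for unparseable text, rather than raising, lets a caller keep the raw annotation and flag it for review; that choice is an assumption of this sketch.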

Line Item Data

Line item number
Material number
Part number
Description
Full description
Quantity
Unit of measure
Requested delivery date
Delivery point
Manufacturer name
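
To illustrate how extracted line items could flow from SQLite storage to a CSV export, here is a minimal sketch using the standard `sqlite3` and `csv` modules. The table name `line_items`, the column names, and the helper functions are assumptions made for this example; the schema actually used in `data/annotations.db` may differ.

```python
import csv
import io
import sqlite3

# Hypothetical column names based on a few of the line item fields listed above.
COLUMNS = ("line_item_number", "part_number", "description", "quantity", "unit_of_measure")


def create_store(conn: sqlite3.Connection) -> None:
    """Create the (assumed) line_items table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS line_items ("
        "line_item_number TEXT, part_number TEXT, description TEXT, "
        "quantity REAL, unit_of_measure TEXT)"
    )


def save_line_item(conn: sqlite3.Connection, item: dict) -> None:
    """Insert one extracted line item; missing fields are stored as NULL."""
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.execute(
        f"INSERT INTO line_items ({', '.join(COLUMNS)}) VALUES ({placeholders})",
        [item.get(col) for col in COLUMNS],
    )


def export_csv(conn: sqlite3.Connection, out) -> None:
    """Write a header row followed by all stored line items to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    rows = conn.execute(f"SELECT {', '.join(COLUMNS)} FROM line_items ORDER BY rowid")
    writer.writerows(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    create_store(conn)
    save_line_item(conn, {"line_item_number": "10", "part_number": "PN-001",
                          "description": "Hex bolt M8", "quantity": 4,
                          "unit_of_measure": "EA"})
    buffer = io.StringIO()
    export_csv(conn, buffer)
    print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, open it with `newline=""` as the `csv` module documentation recommends, for example under the `data/exports/` directory shown in the project structure.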

Development

Project Structure

pdf-data-py/
├── data/                     # Data directory
│   ├── annotations.db        # SQLite database file
│   └── exports/              # CSV export directory
└── pdf_data_viewer/          # Main package
    ├── core/                 # Core functionality
    ├── database/             # Database operations
    ├── ui/                   # User interface
    └── utils/                # Utility functions

Building from source

python setup.py build

License

MIT License
